GitHub user bogdanrdc opened a pull request:
https://github.com/apache/spark/pull/17718
[SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'Enabling/disabling
ignoreCorruptFiles' flaky test
## What changes were proposed in this pull request?
`SharedSQLContext.afterEach` now calls `DebugFilesystem.assertNoOpenStreams` inside `eventually`, and `SQLTestUtils.withTempDir` calls `waitForTasksToFinish` before deleting the directory, as sketched below.
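A minimal sketch of the retry pattern, assuming ScalaTest's `eventually` and the test-only `DebugFilesystem`; the timeout value is an illustrative assumption:
```scala
import org.apache.spark.DebugFilesystem
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Inside SharedSQLContext: streams opened by still-running tasks may be
// closed a moment after the test body returns, so retry the assertion
// instead of failing immediately.
protected override def afterEach(): Unit = {
  super.afterEach()
  eventually(timeout(10.seconds)) {
    DebugFilesystem.assertNoOpenStreams()
  }
}
```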
## How was this patch tested?
Added a new test, but marked it as ignored because it takes about 30 seconds. It can be un-ignored for review.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bogdanrdc/spark SPARK-20407-BACKPORT2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17718.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17718
----
commit 60e02a173ddf335d58852e56611131ec4409ae8b
Author: Tathagata Das <[email protected]>
Date: 2016-12-22T00:43:17Z
[SPARK-18234][SS] Made update mode public
## What changes were proposed in this pull request?
Made update mode public. As part of that, here are the changes:
- Updated `DataStreamWriter` to accept "update" (see the usage sketch below)
- Changed the package of `InternalOutputModes` from o.a.s.sql to o.a.s.sql.catalyst
- Added update-mode state removal with watermark to `StateStoreSaveExec`
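A hedged usage sketch of the now-public mode; the socket source, console sink, and host/port values are illustrative choices, not part of this PR:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("update-mode-demo").getOrCreate()
import spark.implicits._

// A simple streaming aggregation; in update mode only rows whose
// aggregate value changed since the last trigger are emitted.
val counts = spark.readStream
  .format("socket").option("host", "localhost").option("port", "9999")
  .load()
  .groupBy($"value").count()

val query = counts.writeStream
  .outputMode("update") // accepted by DataStreamWriter after this change
  .format("console")
  .start()
```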
## How was this patch tested?
Added new tests in changed modules
Author: Tathagata Das <[email protected]>
Closes #16360 from tdas/SPARK-18234.
(cherry picked from commit 83a6ace0d1be44f70e768348ae6688798c84343e)
Signed-off-by: Tathagata Das <[email protected]>
commit 021952d5808715d0b9d6c716f8b67cd550f7982e
Author: Takeshi YAMAMURO <[email protected]>
Date: 2016-12-22T00:53:33Z
[SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation buffer
## What changes were proposed in this pull request?
This PR fixes a `NullPointerException` caused by the following `limit + aggregate` query:
```
scala> val df = Seq(("a", 1), ("b", 2), ("c", 1), ("d", 5)).toDF("id", "value")
scala> df.limit(2).groupBy("id").count().show
WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 8204, lvsp20hdn012.stubprod.com): java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
```
The root cause is that [`$doAgg()`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L596) skips the initialization of [the buffer iterator](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L603): `BaseLimitExec` sets `stopEarly=true`, so `$doAgg()` exits early without performing the initialization.
## How was this patch tested?
Added a test to `DataFrameAggregateSuite.scala` checking that no exception is thrown for `limit` + aggregate queries.
Author: Takeshi YAMAMURO <[email protected]>
Closes #15980 from maropu/SPARK-18528.
(cherry picked from commit b41ec997786e2be42a8a2a182212a610d08b221b)
Signed-off-by: Herman van Hovell <[email protected]>
commit 9a3c5bd7082474cfb01f021aef103e44d12e2ff1
Author: Burak Yavuz <[email protected]>
Date: 2016-12-22T01:23:48Z
[FLAKY-TEST] InputStreamsSuite.socket input stream
## What changes were proposed in this pull request?
https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.streaming.InputStreamsSuite&test_name=socket+input+stream
## How was this patch tested?
Tested 2,000 times.
Author: Burak Yavuz <[email protected]>
Closes #16343 from brkyvz/sock.
(cherry picked from commit afe36516e4b4031196ee2e0a04980ac49208ea6b)
Signed-off-by: Tathagata Das <[email protected]>
commit 07e2a17d1cb7eade93d482d18a2079e9e6f40f57
Author: Shixiong Zhu <[email protected]>
Date: 2016-12-22T06:02:57Z
[SPARK-18908][SS] Creating StreamingQueryException should check if
logicalPlan is created
## What changes were proposed in this pull request?
This PR audits the places using `logicalPlan` in `StreamExecution` and ensures they all handle the case where `logicalPlan` cannot be created.
In addition, this PR also fixes the following issues in `StreamingQueryException`:
- `StreamingQueryException` and `StreamExecution` have a cyclic dependency: `StreamingQueryException`'s constructor calls `StreamExecution.toDebugString`, which in turn uses `StreamingQueryException`, so the error message ends up containing a `null` value.
- The stack trace is duplicated when calling `Throwable.printStackTrace`, because `StreamingQueryException.toString` already contains the stack trace.
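An illustrative guard pattern (stand-in names, not the actual `StreamExecution` code): consumers that only report state check an initialization flag instead of forcing the lazy plan:
```scala
// Stand-in class showing how a lazy val that can fail to build is guarded.
class PlanHolder(build: () => String) {
  @volatile private var planCreated = false

  lazy val logicalPlan: String = {
    val p = build() // may throw before the assignment completes
    planCreated = true
    p
  }

  // Safe for error reporting: never forces `logicalPlan` itself.
  def toDebugString: String =
    if (planCreated) s"plan: $logicalPlan" else "plan: <not yet created>"
}
```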
## How was this patch tested?
The updated `test("max files per trigger - incorrect values")`. I found
this issue when I switched from `testStream` to the real codes to verify the
failure in this test.
Author: Shixiong Zhu <[email protected]>
Closes #16322 from zsxwing/SPARK-18907.
(cherry picked from commit ff7d82a207e8bef7779c27378f7a50a138627341)
Signed-off-by: Shixiong Zhu <[email protected]>
commit def3690f6889979226478bf9c35a240d7e0662e6
Author: Reynold Xin <[email protected]>
Date: 2016-12-22T07:29:56Z
[SQL] Minor readability improvement for partition handling code
This patch includes minor changes to improve readability of the partition handling code. I'm in the middle of implementing a new feature and found some of the naming and implicit type inference unintuitive.
This patch should have no semantic change and the changes should be covered
by existing test cases.
Author: Reynold Xin <[email protected]>
Closes #16378 from rxin/minor-fix.
(cherry picked from commit 7c5b7b3a2e5a7c1b2d0d8ce655840cad581e47ac)
Signed-off-by: Reynold Xin <[email protected]>
commit ec0d6e21ed85164fd7eb519ec1d017497122c55c
Author: Reynold Xin <[email protected]>
Date: 2016-12-22T07:46:33Z
[DOC] bucketing is applicable to all file-based data sources
## What changes were proposed in this pull request?
Starting with Spark 2.1.0, the bucketing feature is available for all file-based data sources. This patch fixes some function docs that hadn't yet been updated to reflect that.
## How was this patch tested?
N/A
Author: Reynold Xin <[email protected]>
Closes #16349 from rxin/ds-doc.
(cherry picked from commit 2e861df96eacd821edbbd9883121bff67611074f)
Signed-off-by: Reynold Xin <[email protected]>
commit f6853b3e5a068c1bc972eae2370d8bd94026d682
Author: Reynold Xin <[email protected]>
Date: 2016-12-22T18:35:09Z
[SPARK-18973][SQL] Remove SortPartitions and RedistributeData
## What changes were proposed in this pull request?
SortPartitions and RedistributeData logical operators are not actually used
and can be removed. Note that we do have a Sort operator (with global flag
false) that subsumed SortPartitions.
## How was this patch tested?
Also updated test cases to reflect the removal.
Author: Reynold Xin <[email protected]>
Closes #16381 from rxin/SPARK-18973.
(cherry picked from commit 2615100055860faa5f74d3711d4d15ebae6aba25)
Signed-off-by: Herman van Hovell <[email protected]>
commit 132f2297118e29a9bc0830d24063f425dc75892b
Author: Ryan Williams <[email protected]>
Date: 2016-12-22T00:37:20Z
[SPARK-17807][CORE] split test-tags into test-JAR
Remove spark-tags' compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting the test-oriented tags into spark-tags' test JAR.
Alternative to #16303.
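A hedged sbt sketch of what consuming the split artifact looks like; `sparkVersion` is an illustrative placeholder:
```scala
val sparkVersion = "2.1.0" // illustrative

// Test-scoped consumers pull the test tags via the "tests" classifier
// instead of receiving scalatest on the compile classpath transitively.
libraryDependencies +=
  "org.apache.spark" %% "spark-tags" % sparkVersion % "test" classifier "tests"
```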
Author: Ryan Williams <[email protected]>
Closes #16311 from ryan-williams/tt.
(cherry picked from commit afd9bc1d8a85adf88c412d8bc75e46e7ecb4bcdd)
Signed-off-by: Marcelo Vanzin <[email protected]>
commit 5e801034915dd206f720ae89dc00bb2a84ae3d41
Author: Shixiong Zhu <[email protected]>
Date: 2016-12-23T00:21:09Z
[SPARK-18985][SS] Add missing @InterfaceStability.Evolving for Structured
Streaming APIs
## What changes were proposed in this pull request?
Add missing InterfaceStability.Evolving for Structured Streaming APIs
## How was this patch tested?
Compiling the code.
Author: Shixiong Zhu <[email protected]>
Closes #16385 from zsxwing/SPARK-18985.
(cherry picked from commit 2246ce88ae6bf842cf325ee3efcb7bea53f8ca37)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 1857acc717dcd083d21b20ef4d09723c3901bdfb
Author: Shixiong Zhu <[email protected]>
Date: 2016-12-23T00:22:55Z
[SPARK-18972][CORE] Fix the netty thread names for RPC
## What changes were proposed in this pull request?
Right now the threads created by Netty for Spark RPC are named `shuffle-client-**` and `shuffle-server-**`, which is pretty confusing.
This PR just uses the module name in `TransportConf` to set the thread name.
In addition, it also includes the following minor fixes:
- `TransportChannelHandler.channelActive` and `channelInactive` should call the corresponding super methods.
- Make `ShuffleBlockFetcherIterator` throw `NoSuchElementException` if it has no more elements (see the sketch below). Otherwise, if the caller calls `next` without `hasNext`, it will just hang.
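A minimal sketch of the iterator-contract fix (illustrative names, not the actual `ShuffleBlockFetcherIterator` code):
```scala
import java.util.NoSuchElementException

// Before the fix, calling next() past the last element blocked on an
// empty results queue; failing fast matches the Iterator contract.
class FetchResults[T](underlying: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = underlying.hasNext

  override def next(): T = {
    if (!hasNext) {
      throw new NoSuchElementException("no more blocks to fetch")
    }
    underlying.next()
  }
}
```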
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #16380 from zsxwing/SPARK-18972.
(cherry picked from commit f252cb5d161e064d39cc1ed1d9299307a0636174)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 5bafdc45d6493f2ea41cc4bce0faa5f93ff3162c
Author: Shixiong Zhu <[email protected]>
Date: 2016-12-23T23:38:41Z
[SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use
ConcurrentHashMap to make it faster
## What changes were proposed in this pull request?
The time complexity of `ConcurrentHashMap`'s `remove` is O(1), while removing an arbitrary element from a `ConcurrentLinkedQueue` requires a linear scan. Backing `ContextCleaner.referenceBuffer` with a `ConcurrentHashMap` therefore makes removal much faster, as sketched below.
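A hedged sketch of the data-structure swap; the element type is a stand-in for the tracked weak references:
```scala
import java.util.Collections
import java.util.concurrent.ConcurrentHashMap

case class Ref(id: Long) // stand-in for CleanupTaskWeakReference

// A concurrent set view over a ConcurrentHashMap: add and remove are O(1).
val referenceBuffer =
  Collections.newSetFromMap(new ConcurrentHashMap[Ref, java.lang.Boolean]())

referenceBuffer.add(Ref(1))
referenceBuffer.remove(Ref(1)) // constant time, unlike a queue scan
```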
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #16390 from zsxwing/SPARK-18991.
(cherry picked from commit a848f0ba84e37fd95d0f47863ec68326e3296b33)
Signed-off-by: Shixiong Zhu <[email protected]>
commit ca25b1e51f036fb837e3fe8218cb04d7360e049d
Author: Kousuke Saruta <[email protected]>
Date: 2016-12-24T13:02:58Z
[SPARK-18837][WEBUI] Very long stage descriptions do not wrap in the UI
## What changes were proposed in this pull request?
This issue was reported by wangyum.
In the AllJobsPage, JobPage and StagePage, the description width used to be limited, but recently the limitation seems to have been accidentally removed.
The cause is that some tables no longer have the `sortable` class although they used to; `sortable` does not only mark tables as sortable but also limits the width of their child `td` elements.
The reason some tables no longer have the `sortable` class is that another sorting mechanism was introduced by #13620 and #13708 along with the pagination feature.
To fix this issue, I've introduced a new class, `table-cell-width-limited`, which limits the description cell width so the description renders as it did before.
<img width="1260" alt="2016-12-20 1 00 34"
src="https://cloud.githubusercontent.com/assets/4736016/21320478/89141c7a-c654-11e6-8494-f8f91325980b.png">
## How was this patch tested?
Tested manually with my browser.
Author: Kousuke Saruta <[email protected]>
Closes #16338 from sarutak/SPARK-18837.
(cherry picked from commit f2ceb2abe9357942a51bd643683850efd1fc9df7)
Signed-off-by: Sean Owen <[email protected]>
commit ac7107fe70fcd0b584001c10dd624a4d8757109c
Author: Carson Wang <[email protected]>
Date: 2016-12-28T12:12:44Z
[MINOR][DOC] Fix doc of ForeachWriter to use writeStream
## What changes were proposed in this pull request?
Fix the documentation of `ForeachWriter` to use `writeStream` instead of `write` for a streaming Dataset.
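A hedged usage sketch matching the corrected docs; `streamingDs` is an assumed streaming `Dataset[String]`:
```scala
import org.apache.spark.sql.ForeachWriter

val writer = new ForeachWriter[String] {
  def open(partitionId: Long, version: Long): Boolean = true
  def process(value: String): Unit = println(value) // side-effecting sink
  def close(errorOrNull: Throwable): Unit = ()
}

// A streaming Dataset must go through writeStream, not write.
streamingDs.writeStream.foreach(writer).start()
```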
## How was this patch tested?
Docs only.
Author: Carson Wang <[email protected]>
Closes #16419 from carsonwang/FixDoc.
(cherry picked from commit 2a5f52a7146abc05bf70e65eb2267cd869ac4789)
Signed-off-by: Sean Owen <[email protected]>
commit 7197a7bc7061e2908b6430f494dba378378d5d02
Author: Sean Owen <[email protected]>
Date: 2016-12-28T12:17:33Z
[SPARK-18993][BUILD] Unable to build/compile Spark in IntelliJ due to
missing Scala deps in spark-tags
## What changes were proposed in this pull request?
This adds back a direct dependency on Scala library classes from spark-tags
because its Scala annotations need them.
## How was this patch tested?
Existing tests
Author: Sean Owen <[email protected]>
Closes #16418 from srowen/SPARK-18993.
(cherry picked from commit d7bce3bd31ec193274718042dc017706989d7563)
Signed-off-by: Sean Owen <[email protected]>
commit 80d583bd09de54890cddfcc0c6fd807d7200ea75
Author: Tathagata Das <[email protected]>
Date: 2016-12-28T20:11:25Z
[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming
regarding watermarking and status
## What changes were proposed in this pull request?
- Extended the Window operation section with a code snippet and an explanation of watermarking (a hedged sketch of the pattern follows below)
- Extended the Output Modes section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with updated JSON generated by `StreamingQuery.progress`/`status`
- Updated API changes in the `StreamingQueryListener` example
TODO
- [x] Figure showing the watermarking
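A hedged sketch of the watermarking pattern the new section documents, assuming `events` is a streaming DataFrame with `eventTime` and `word` columns:
```scala
import org.apache.spark.sql.functions.{col, window}

// Events arriving more than 10 minutes behind the max event time seen
// so far are considered too late; their window state can be dropped.
val windowedCounts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "10 minutes", "5 minutes"), col("word"))
  .count()
```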
## How was this patch tested?
N/A
## Screenshots
### Section: Windowed Aggregation with Event Time
<img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm"
src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">

<img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm"
src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png">
----------------------------
### Section: Output Modes

----------------------------
### Section: Monitoring


Author: Tathagata Das <[email protected]>
Closes #16294 from tdas/SPARK-18669.
(cherry picked from commit 092c6725bf039bf33299b53791e1958c4ea3f6aa)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 47ab4afed69bb019b4e0f85e26e52dc5cee338df
Author: adesharatushar <[email protected]>
Date: 2016-12-29T22:03:34Z
[SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section
Design Patterns for using foreachRDD
## What changes were proposed in this pull request?
Added missing Java example under section "Design Patterns for using
foreachRDD". Now this section has examples in all 3 languages, improving
consistency of documentation.
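For reference, a hedged Scala rendering of the design pattern that section illustrates; `dstream` and `ConnectionPool` are the guide's placeholders, not concrete APIs:
```scala
// Create heavyweight resources once per partition, not once per record.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    val connection = ConnectionPool.getConnection() // placeholder pool
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return for reuse
  }
}
```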
## How was this patch tested?
Manual.
Generated the docs using the command `SKIP_API=1 jekyll build` and verified the generated HTML page manually.
The example's syntax has been tested for correctness with sample code on Java 1.7 and Spark 2.2.0-SNAPSHOT.
Author: adesharatushar <[email protected]>
Closes #16408 from adesharatushar/streaming-doc-fix.
(cherry picked from commit dba81e1dcdea1e8bd196c88d4810f9a04312acbf)
Signed-off-by: Sean Owen <[email protected]>
commit 20ae11722d82cf3cdaa8c4023e37c1416664917d
Author: Cheng Lian <[email protected]>
Date: 2016-12-30T22:46:30Z
[SPARK-19016][SQL][DOC] Document scalable partition handling
This PR documents the scalable partition handling feature in the body of the programming guide.
Before this PR, we only mention it in the migration guide. It was not super clear that, since 2.1, external datasource tables require an extra `MSCK REPAIR TABLE` command to have per-partition information persisted.
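A hedged sketch of the workflow the new docs describe; the table name, schema, and path are illustrative, and the 2.1-era `OPTIONS (path ...)` spelling is an assumption:
```scala
// An external, partitioned datasource table over pre-existing files.
spark.sql("""
  CREATE TABLE logs (msg STRING, ds STRING)
  USING parquet
  OPTIONS (path '/data/logs')
  PARTITIONED BY (ds)
""")

// Sync partition metadata into the catalog; without this the table
// appears empty because per-partition information is not persisted.
spark.sql("MSCK REPAIR TABLE logs")
```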
N/A.
Author: Cheng Lian <[email protected]>
Closes #16424 from liancheng/scalable-partition-handling-doc.
(cherry picked from commit 871f6114ac0075a1b45eda8701113fa20d647de9)
Signed-off-by: Cheng Lian <[email protected]>
commit 3483defeb82b8333da238b21229e6a8c82820d48
Author: Shixiong Zhu <[email protected]>
Date: 2017-01-01T21:25:44Z
[SPARK-19050][SS][TESTS] Fix EventTimeWatermarkSuite 'delay in months and
years handled correctly'
## What changes were proposed in this pull request?
`monthsSinceEpoch` in this test behaves like `math.floor(num)`, so `monthDiff` can take either of two adjacent values.
## How was this patch tested?
Jenkins.
Author: Shixiong Zhu <[email protected]>
Closes #16449 from zsxwing/watermark-test-hotfix.
(cherry picked from commit 2394047370d2d93bd8bc57b996fee47465c470af)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 63857c8d30ceef9bf998659fc12ea8872c0f36ea
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-01-02T14:41:57Z
[MINOR][DOC] Minor doc change for YARN credential providers
## What changes were proposed in this pull request?
The configuration `spark.yarn.security.tokens.{service}.enabled` is deprecated. Now we should use `spark.yarn.security.credentials.{service}.enabled`. Some places in the docs had not been updated yet.
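A hedged sketch of the rename, instantiating `{service}` as `hive` for illustration:
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Deprecated key: spark.yarn.security.tokens.hive.enabled
// Current key:
conf.set("spark.yarn.security.credentials.hive.enabled", "false")
```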
## How was this patch tested?
N/A. Just doc change.
Author: Liang-Chi Hsieh <[email protected]>
Closes #16444 from viirya/minor-credential-provider-doc.
(cherry picked from commit 0ac2f1e71f62ec925ed0e19c4654759d155efc35)
Signed-off-by: Sean Owen <[email protected]>
commit 517f39833cf789b536defe5ba4b010828d24831f
Author: genmao.ygm <[email protected]>
Date: 2016-11-15T18:32:43Z
[SPARK-18379][SQL] Make the parallelism of parallelPartitionDiscovery
configurable.
## What changes were proposed in this pull request?
The largest parallelism in PartitioningAwareFileIndex
#listLeafFilesInParallel() is 10000 in hard code. We may need to make this
number configurable. And in PR, I reduce it to 100.
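A hedged sketch of setting the new knob; the key name is inferred from the PR description and should be verified against the merged code:
```scala
import org.apache.spark.sql.SparkSession

// Assumed config key introduced by this PR.
val spark = SparkSession.builder()
  .config("spark.sql.sources.parallelPartitionDiscovery.parallelism", "100")
  .getOrCreate()
```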
## How was this patch tested?
Existing unit tests.
Author: genmao.ygm <[email protected]>
Author: dylon <[email protected]>
Closes #15829 from uncleGen/SPARK-18379.
(cherry picked from commit 745ab8bc50da89c42b297de9dcb833e5f2074481)
Signed-off-by: Sean Owen <[email protected]>
commit d489e1dc7ecf7cf081141d3f45f86c39fc3db1fe
Author: Liwei Lin <[email protected]>
Date: 2017-01-02T14:40:06Z
[SPARK-19041][SS] Fix code snippet compilation issues in Structured
Streaming Programming Guide
## What changes were proposed in this pull request?
Currently some code snippets in the programming guide just do not compile.
We should fix them.
## How was this patch tested?
```
SKIP_API=1 jekyll build
```
## Screenshot from part of the change:

Author: Liwei Lin <[email protected]>
Closes #16442 from lw-lin/ss-pro-guide-.
commit 94272a9600405442bfe485b17e55a84b85c25da3
Author: gatorsmile <[email protected]>
Date: 2016-12-31T11:40:28Z
[SPARK-19028][SQL] Fixed non-thread-safe functions used in SessionCatalog
### What changes were proposed in this pull request?
Fixed non-thread-safe functions used in `SessionCatalog` (an illustrative locking pattern is sketched below):
- refreshTable
- lookupRelation
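An illustrative locking pattern only, not the actual `SessionCatalog` code: both methods guard the shared mutable catalog state with a single lock:
```scala
import scala.collection.mutable

// Stand-in for the catalog's shared mutable state.
class Catalog {
  private val cache = mutable.Map[String, String]()

  def refreshTable(name: String): Unit = synchronized {
    cache.remove(name) // invalidate cached metadata under the lock
  }

  def lookupRelation(name: String): Option[String] = synchronized {
    cache.get(name)
  }
}
```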
### How was this patch tested?
N/A
Author: gatorsmile <[email protected]>
Closes #16437 from gatorsmile/addSyncToLookUpTable.
(cherry picked from commit 35e974076dcbc5afde8d4259ce88cb5f29d94920)
Signed-off-by: Wenchen Fan <[email protected]>
commit 776255065c13df7b4505c225546b4b66cd929c76
Author: gatorsmile <[email protected]>
Date: 2017-01-03T19:43:47Z
[SPARK-19048][SQL] Delete Partition Location when Dropping Managed
Partitioned Tables in InMemoryCatalog
### What changes were proposed in this pull request?
The data of a managed table should be deleted after the table is dropped. However, if the partition location is not under the location of the partitioned table, it is not deleted as expected, because users can specify any location for a partition when adding it.
This PR deletes the partition location when dropping managed partitioned tables stored in `InMemoryCatalog` (the scenario is sketched below).
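A hedged sketch of the scenario; the table name, schema, and paths are illustrative:
```scala
spark.sql("CREATE TABLE pt (a INT, ds STRING) USING parquet PARTITIONED BY (ds)")

// The partition data lives outside the table's root directory.
spark.sql("ALTER TABLE pt ADD PARTITION (ds = '2017-01-01') LOCATION '/elsewhere/p1'")

// Before this fix, dropping the managed table removed the table root
// but left /elsewhere/p1 behind.
spark.sql("DROP TABLE pt")
```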
### How was this patch tested?
Added test cases for both HiveExternalCatalog and InMemoryCatalog
Author: gatorsmile <[email protected]>
Closes #16448 from gatorsmile/unsetSerdeProp.
(cherry picked from commit b67b35f76b684c5176dc683e7491fd01b43f4467)
Signed-off-by: gatorsmile <[email protected]>
commit 1ecf1a953ee0f0f0925bb8a3df54d3e762116f1a
Author: Dongjoon Hyun <[email protected]>
Date: 2017-01-04T17:56:11Z
[SPARK-18877][SQL][BACKPORT-2.1] `CSVInferSchema.inferField` on DecimalType should find a common type with `typeSoFar`
## What changes were proposed in this pull request?
CSV type inferencing causes `IllegalArgumentException` on decimal numbers
with heterogeneous precisions and scales because the current logic uses the
last decimal type in a **partition**. Specifically, `inferRowType`, the
**seqOp** of **aggregate**, returns the last decimal type. This PR fixes it to
use `findTightestCommonType`.
**decimal.csv**
```
9.03E+12
1.19E+11
```
**BEFORE**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
 |-- _c0: decimal(3,-9) (nullable = true)

scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
```
**AFTER**
```scala
scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
root
 |-- _c0: decimal(4,-9) (nullable = true)

scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
+---------+
| _c0|
+---------+
|9.030E+12|
| 1.19E+11|
+---------+
```
## How was this patch tested?
Pass the newly added test case.
Author: Dongjoon Hyun <[email protected]>
Closes #16463 from dongjoon-hyun/SPARK-18877-BACKPORT-21.
commit 4ca1788805e4a0131ba8f0ccb7499ee0e0242837
Author: jerryshao <[email protected]>
Date: 2017-01-06T16:07:54Z
[SPARK-19033][CORE] Add admin acls for history server
## What changes were proposed in this pull request?
Currently the HistoryServer's ACLs are derived from the application event log, which means that newly changed ACLs cannot be applied to old data. This becomes a problem when a newly added admin cannot access the history UI of old applications; only new applications are affected.
So here I propose adding admin ACLs for the history server (see the hedged config sketch below): any configured user/group gets view access to all applications, while the view ACLs derived from the application's run time still take effect.
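A hedged config sketch; the key names follow this PR's description and should be verified against the merged docs:
```scala
import org.apache.spark.SparkConf

// Assumed keys: admin users/groups that get view access to every app's UI.
val conf = new SparkConf()
  .set("spark.history.ui.admin.acls", "admin1,admin2")
  .set("spark.history.ui.admin.acls.groups", "ops")
```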
## How was this patch tested?
Unit test added.
Author: jerryshao <[email protected]>
Closes #16470 from jerryshao/SPARK-19033.
(cherry picked from commit 4a4c3dc9ca10e52f7981b225ec44e97247986905)
Signed-off-by: Tom Graves <[email protected]>
commit ce9bfe6db63582d632f7d57cbf37ee7b29135198
Author: zuotingbing <[email protected]>
Date: 2017-01-06T17:57:49Z
[SPARK-19083] sbin/start-history-server.sh script use of $@ without quotes
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19083#
The sbin/start-history-server.sh script uses `$@` without quotes; this affects the number of arguments passed to `HistoryServerArguments::parse(args: List[String])` when arguments contain spaces.
Author: zuotingbing <[email protected]>
Closes #16484 from zuotingbing/sh.
(cherry picked from commit a9a137377e4cf293325ccd7368698f20b5d6b98a)
Signed-off-by: Marcelo Vanzin <[email protected]>
commit ee735a8a85d7f015188f7cb31975f60cc969e453
Author: Tathagata Das <[email protected]>
Date: 2017-01-06T19:29:01Z
[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for
update mode and source/sink options
## What changes were proposed in this pull request?
Updates:
- Updated the Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
- Updated the Output Modes section with Update Mode
- Added options for all the sources and sinks
---------------------------
<img width="931" alt="screen shot 2017-01-03 at 6 09 11 pm"
src="https://cloud.githubusercontent.com/assets/663212/21629740/d21c9bb8-d1df-11e6-915b-488a59589fa6.png">
<img width="933" alt="screen shot 2017-01-03 at 6 10 00 pm"
src="https://cloud.githubusercontent.com/assets/663212/21629749/e22bdabe-d1df-11e6-86d3-7e51d2f28dbc.png">
---------------------------
Author: Tathagata Das <[email protected]>
Closes #16468 from tdas/SPARK-19074.
(cherry picked from commit b59cddaba01cbdf50dbe8fe7ef7b9913bad9552d)
Signed-off-by: Tathagata Das <[email protected]>
commit 86b66216de411f8cbc79ede62b353f7cbb550903
Author: [email protected] <[email protected]>
Date: 2017-01-07T19:07:49Z
[SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for
original and loaded model
## What changes were proposed in this pull request?
While adding DistributedLDAModel training summary for SparkR, I found that the logPrior of the original and the loaded model differ.
For example, in `test("read/write DistributedLDAModel")`, I added the check:
```scala
val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior
val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior
assert(logPrior === logPrior2)
```
The test fails:
```
-4.394180878889078 did not equal -4.294290536919573
```
The reason is that `graph.vertices.aggregate(0.0)(seqOp, _ + _)` only returns the value of a single vertex instead of the aggregation of all vertices. Therefore, when the loaded model does the aggregation in a different order, it returns a different `logPrior`.
Please refer to #16464 for details.
## How was this patch tested?
Add a new unit test for testing logPrior.
Author: [email protected] <[email protected]>
Closes #16491 from wangmiao1981/ldabug.
(cherry picked from commit 036b50347c56a3541c526b1270093163b9b79e45)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit c95b58557dec2f4708d5efd9314edd80e0975fc8
Author: Sean Owen <[email protected]>
Date: 2017-01-07T19:15:51Z
[SPARK-19106][DOCS] Styling for the configuration docs is broken
configuration.html section headings were not specified correctly in markdown, so they weren't being rendered or recognized correctly. Removed extra `<p>` tags and pulled level-4 titles up to level 3, since level 3 had been skipped. This improves the TOC.
Doc build, manual check.
Author: Sean Owen <[email protected]>
Closes #16490 from srowen/SPARK-19106.
(cherry picked from commit 54138f6e89abfc17101b4f2812715784a2b98331)
Signed-off-by: Sean Owen <[email protected]>
commit ecc16220d2d9eace81de44c4b0aff1c364a35e3f
Author: Dongjoon Hyun <[email protected]>
Date: 2017-01-08T02:55:01Z
[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE`
with `LOCATION`
## What changes were proposed in this pull request?
This PR adds a description of the behavior change for `CREATE TABLE ... LOCATION` to `sql-programming-guide.md`, clearly under `Upgrading From Spark SQL 1.6 to 2.0`. The change was introduced in Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).
## How was this patch tested?
```
SKIP_API=1 jekyll build
```
**Newly Added Description**
<img width="913" alt="new"
src="https://cloud.githubusercontent.com/assets/9700541/21743606/7efe2b12-d4ba-11e6-8a0d-551222718ea2.png">
Author: Dongjoon Hyun <[email protected]>
Closes #16400 from dongjoon-hyun/SPARK-18941.
(cherry picked from commit 923e594844a7ad406195b91877f0fb374d5a454b)
Signed-off-by: gatorsmile <[email protected]>
----