GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/15535

    [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener 
APIs for branch-2.0

    This is the branch-2.0 PR of #15530 to make the APIs consistent with 
    master. Since these APIs are experimental and not directly user-facing 
    (StreamingQueryListener is an advanced Structured Streaming API), it's 
    okay to change them in branch-2.0.
    
    ## What changes were proposed in this pull request?
    
    As per @rxin's request, here are the further API changes (see the sketch 
    below):
    - Changed the `Stream(Started/Progress/Terminated)` events to `Stream*Event`
    - Changed the parameter names of `StreamingQueryListener.on***` from 
    `query*` to `event`
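    
    A minimal sketch of how the renamed API reads after this change (event 
    class names assumed from the description above, not copied from the 
    patch; a `SparkSession` named `spark` is assumed):
    
    ```scala
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._
    
    spark.streams.addListener(new StreamingQueryListener {
      // Each callback now takes a single argument named `event` of a *Event class.
      override def onQueryStarted(event: QueryStartedEvent): Unit =
        println("query started")
      override def onQueryProgress(event: QueryProgressEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })
    ```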
    
    ## How was this patch tested?
    Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-17731-1-branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15535
    
----
commit b65b041af8b64413c7d460d4ea110b2044d6f36e
Author: Felix Cheung <felixcheun...@hotmail.com>
Date:   2016-08-22T22:53:10Z

    [SPARK-16508][SPARKR] doc updates and more CRAN check fixes
    
    replace ``` ` ``` in code doc with `\code{thing}`
    remove added `...` for drop(DataFrame)
    fix remaining CRAN check warnings
    
    create doc with knitr
    
    junyangq
    
    Author: Felix Cheung <felixcheun...@hotmail.com>
    
    Closes #14734 from felixcheung/rdoccleanup.
    
    (cherry picked from commit 71afeeea4ec8e67edc95b5d504c557c88a2598b9)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit ff2f873800fcc3d699e52e60fd0e69eb01d12503
Author: Eric Liang <e...@databricks.com>
Date:   2016-08-22T23:32:14Z

    [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in 
block manager replication
    
    ## What changes were proposed in this pull request?
    
    This is a straightforward clone of JoshRosen's original patch. I have 
    follow-up changes to fix block replication for repl-defined classes as 
    well, but those appear to be causing flaky tests, so I'm going to leave 
    that for SPARK-17042.
    
    ## How was this patch tested?
    
    End-to-end test in ReplSuite (also more tests in DistributedSuite from the 
original patch).
    
    Author: Eric Liang <e...@databricks.com>
    
    Closes #14311 from ericl/spark-16550.
    
    (cherry picked from commit 8e223ea67acf5aa730ccf688802f17f6fc10907c)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 225898961bc4bc71d56f33c027adbb2d0929ae5a
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2016-08-23T00:09:32Z

    [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh
    
    ## What changes were proposed in this pull request?
    
    This change adds CRAN documentation checks to be run as a part of 
    `R/run-tests.sh`. As this script is also used by Jenkins, this means that 
    we will get documentation checks on every PR going forward.
    
    ## How was this patch tested?
    
    The documentation checks now run as part of `R/run-tests.sh`, which 
    Jenkins executes on every PR.
    
    Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
    
    Closes #14759 from shivaram/sparkr-cran-jenkins.
    
    (cherry picked from commit 920806ab272ba58a369072a5eeb89df5e9b470a6)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit eaea1c86b897d302107a9b6833a27a2b24ca31a0
Author: Cheng Lian <l...@databricks.com>
Date:   2016-08-23T01:11:47Z

    [SPARK-17182][SQL] Mark Collect as non-deterministic
    
    ## What changes were proposed in this pull request?
    
    This PR marks the abstract class `Collect` as non-deterministic since the 
results of `CollectList` and `CollectSet` depend on the actual order of input 
rows.
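    
    For intuition, a minimal Scala illustration of the order sensitivity (a 
    sketch assuming a `SparkSession` named `spark` is in scope):
    
    ```scala
    import spark.implicits._
    import org.apache.spark.sql.functions.collect_list
    
    // The aggregated array reflects the order rows arrive in, which can
    // change across runs once rows are shuffled or repartitioned:
    val df = Seq((1, "a"), (1, "b"), (1, "c")).toDF("k", "v")
    df.groupBy("k").agg(collect_list("v")).show()
    ```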
    
    ## How was this patch tested?
    
    Existing test cases should be enough.
    
    Author: Cheng Lian <l...@databricks.com>
    
    Closes #14749 from liancheng/spark-17182-non-deterministic-collect.
    
    (cherry picked from commit 2cdd92a7cd6f85186c846635b422b977bdafbcdd)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit d16f9a0b7c464728d7b11899740908e23820a797
Author: Felix Cheung <felixcheun...@hotmail.com>
Date:   2016-08-23T03:15:03Z

    [SPARKR][MINOR] Update R DESCRIPTION file
    
    ## What changes were proposed in this pull request?
    
    Update DESCRIPTION
    
    ## How was this patch tested?
    
    Run install and CRAN tests
    
    Author: Felix Cheung <felixcheun...@hotmail.com>
    
    Closes #14764 from felixcheung/rpackagedescription.
    
    (cherry picked from commit d2b3d3e63e1a9217de6ef507c350308017664a62)
    Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 811a2cef03647c5be29fef522c423921c79b1bc3
Author: Davies Liu <dav...@databricks.com>
Date:   2016-08-23T16:45:13Z

    [SPARK-13286] [SQL] add the next exception of SQLException as cause
    
    Some JDBC drivers (for example PostgreSQL) do not use the underlying 
    exception as the cause, but expose it through a separate API 
    (getNextException), so it is not included in the error logging, making it 
    hard to find the root cause, especially in batch mode.
    
    This PR will pull out the next exception and add it as cause (if it's 
different) or suppressed (if there is another different cause).
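    
    A minimal sketch of the idea (hypothetical helper, not the actual patch):
    
    ```scala
    import java.sql.SQLException
    
    // Surface the driver's chained exception so it shows up in error logs.
    def attachNextException(e: SQLException): SQLException = {
      val next = e.getNextException
      if (next != null && (next ne e)) {
        if (e.getCause == null) e.initCause(next) // attach as the cause...
        else e.addSuppressed(next)                // ...or keep it as suppressed
      }
      e
    }
    ```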
    
    Can't reproduce this on the default JDBC driver, so did not add a 
regression test.
    
    Author: Davies Liu <dav...@databricks.com>
    
    Closes #14722 from davies/keep_cause.
    
    (cherry picked from commit 9afdfc94f49395e69a7959e881c19d787ce00c3e)
    Signed-off-by: Davies Liu <davies....@gmail.com>

commit cc4018996740b3a68d4a557615c59c67b8996ebb
Author: Junyang Qian <junya...@databricks.com>
Date:   2016-08-23T18:22:32Z

    [SPARKR][MINOR] Remove reference link for common Windows environment 
variables
    
    ## What changes were proposed in this pull request?
    
    The PR removes a reference link in the docs for environment variables for 
    common Windows folders. The CRAN check gave code 503 (service unavailable) 
    on the original link.
    
    ## How was this patch tested?
    
    Manual check.
    
    Author: Junyang Qian <junya...@databricks.com>
    
    Closes #14767 from junyangq/SPARKR-RemoveLink.
    
    (cherry picked from commit 8fd63e808e15c8a7e78fef847183c86f332daa91)
    Signed-off-by: Felix Cheung <felixche...@apache.org>

commit a2a7506d06fe9d878d55cf5498f5bfef9a69171c
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-23T20:21:43Z

    [MINOR][DOC] Use standard quotes instead of "curly quote" marks from Mac in 
structured streaming programming guides
    
    This PR fixes curly quotes (`“` and `”`) to standard quotes (`"`).
    
    This is an actual problem when users copy and paste the examples: the 
    copied code would not work.
    
    This seems to happen only in `structured-streaming-programming-guide.md`.
    
    Manually built.
    
    This changes some examples so they render correctly, as shown below:
    
    ![2016-08-23 3 24 
13](https://cloud.githubusercontent.com/assets/6477701/17882878/2a38332e-694a-11e6-8e84-76bdb89151e0.png)
    
    to
    
    ![2016-08-23 3 26 
06](https://cloud.githubusercontent.com/assets/6477701/17882888/376eaa28-694a-11e6-8b88-32ea83997037.png)
    
    Author: hyukjinkwon <gurwls...@gmail.com>
    
    Closes #14770 from HyukjinKwon/minor-quotes.
    
    (cherry picked from commit 588559911de94bbe0932526ee1e1dd36a581a423)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit a772b4b5dea46cda1204a50a4909d40f8933ad77
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-23T20:31:58Z

    [SPARK-17194] Use single quotes when generating SQL for string literals
    
    When Spark emits SQL for a string literal, it should wrap the string in 
single quotes, not double quotes. Databases which adhere more strictly to the 
ANSI SQL standards, such as Postgres, allow only single-quotes to be used for 
denoting string literals (see http://stackoverflow.com/a/1992331/590203).
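    
    A sketch of the quoting rule (hypothetical helper, not the actual 
    `Literal.sql` code): wrap in single quotes and escape embedded single 
    quotes by doubling them, per the ANSI convention.
    
    ```scala
    def sqlStringLiteral(s: String): String =
      "'" + s.replace("'", "''") + "'"
    
    // sqlStringLiteral("it's") == "'it''s'"
    ```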
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #14763 from JoshRosen/SPARK-17194.
    
    (cherry picked from commit bf8ff833e30b39e5e5e35ba8dcac31b79323838c)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit a6e6a047bb9215df55b009957d4c560624d886fc
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-24T06:44:45Z

    [MINOR][SQL] Remove implemented functions from comments of 
'HiveSessionCatalog.scala'
    
    ## What changes were proposed in this pull request?
    This PR removes implemented functions from comments of 
`HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.
    
    ## How was this patch tested?
    Manual.
    
    Author: Weiqing Yang <yangweiqing...@gmail.com>
    
    Closes #14769 from Sherry302/cleanComment.
    
    (cherry picked from commit b9994ad05628077016331e6b411fbc09017b1e63)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit df87f161c9e40a49235ea722f6a662a488b41c4c
Author: Wenchen Fan <wenc...@databricks.com>
Date:   2016-08-24T06:46:09Z

    [SPARK-17186][SQL] remove catalog table type INDEX
    
    ## What changes were proposed in this pull request?
    
    Spark SQL doesn't actually support indexes; the catalog table type `INDEX` 
    comes from Hive. Moreover, most operations in Spark SQL can't handle index 
    tables, e.g. create table, alter table, etc.
    
    Logically, index tables should be invisible to end users, and Hive also 
    generates special table names for index tables to prevent users from 
    accessing them directly. Hive has special SQL syntax to create/show/drop 
    index tables.
    
    On the Spark SQL side, although we can describe an index table directly, 
    the result is unreadable; we should use the dedicated SQL syntax instead 
    (e.g. `SHOW INDEX ON tbl`). Spark SQL can also read an index table 
    directly, but the result is always empty. (Can Hive read index tables 
    directly?)
    
    This PR removes the table type `INDEX`, to make it clear that Spark SQL 
    doesn't currently support indexes.
    
    ## How was this patch tested?
    
    existing tests.
    
    Author: Wenchen Fan <wenc...@databricks.com>
    
    Closes #14752 from cloud-fan/minor2.
    
    (cherry picked from commit 52fa45d62a5a0bc832442f38f9e634c5d8e29e08)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit ce7dce1755a8d36ec7346adc3de26d8fdc4f05e9
Author: Weiqing Yang <yangweiqing...@gmail.com>
Date:   2016-08-24T09:12:44Z

    [MINOR][BUILD] Fix Java CheckStyle Error
    
    As Spark 2.0.1 will be released soon (as mentioned on the Spark dev 
    mailing list), besides the critical bugs, it's better to fix the code 
    style errors before the release.
    
    Before:
    ```
    ./dev/lint-java
    Checkstyle checks failed at following occurrences:
    [ERROR] 
src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525]
 (sizes) LineLength: Line is longer than 100 characters (found 119).
    [ERROR] 
src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64]
 (sizes) LineLength: Line is longer than 100 characters (found 103).
    ```
    After:
    ```
    ./dev/lint-java
    Using `mvn` from path: /usr/local/bin/mvn
    Checkstyle checks passed.
    ```
    Manual.
    
    Author: Weiqing Yang <yangweiqing...@gmail.com>
    
    Closes #14768 from Sherry302/fixjavastyle.
    
    (cherry picked from commit 673a80d2230602c9e6573a23e35fb0f6b832bfca)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 33d79b58735770ac613540c21095a1e404f065b0
Author: VinceShieh <vincent....@intel.com>
Date:   2016-08-24T09:16:58Z

    [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer 
when some quantiles are duplicated
    
    ## What changes were proposed in this pull request?
    
    In cases where QuantileDiscretizer is applied to a numeric column with 
    duplicated elements, we take the unique elements generated from 
    approxQuantiles as the input for Bucketizer.
    
    ## How was this patch tested?
    
    A unit test is added in QuantileDiscretizerSuite.
    
    QuantileDiscretizer.fit would throw an invalid-argument exception when 
    calling setSplits on a list of splits with duplicated elements. 
    Bucketizer.setSplits should only accept a numeric vector of two or more 
    unique cut points, although that may produce fewer buckets than requested.
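    
    A standalone sketch of the idea (illustrative values, not the patch):
    
    ```scala
    // Bucketizer requires strictly increasing split points, so duplicated
    // quantiles are collapsed first, even if that yields fewer buckets:
    val approxQuantiles = Array(1.0, 2.0, 2.0, 2.0, 3.0)
    val splits = approxQuantiles.distinct.sorted
    // splits == Array(1.0, 2.0, 3.0)
    ```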
    
    Signed-off-by: VinceShieh <vincent.xieintel.com>
    
    Author: VinceShieh <vincent....@intel.com>
    
    Closes #14747 from VinceShieh/SPARK-17086.
    
    (cherry picked from commit 92c0eaf348b42b3479610da0be761013f9d81c54)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 29091d7cd60c20bf019dc9c1625a22e80ea50928
Author: Junyang Qian <junya...@databricks.com>
Date:   2016-08-24T17:40:09Z

    [SPARKR][MINOR] Fix doc for show method
    
    ## What changes were proposed in this pull request?
    
    The original doc of `show` put methods for multiple classes together but 
the text only talks about `SparkDataFrame`. This PR tries to fix this problem.
    
    ## How was this patch tested?
    
    Manual test.
    
    Author: Junyang Qian <junya...@databricks.com>
    
    Closes #14776 from junyangq/SPARK-FixShowDoc.
    
    (cherry picked from commit d2932a0e987132c694ed59515b7c77adaad052e6)
    Signed-off-by: Felix Cheung <felixche...@apache.org>

commit 9f924a01b27ebba56080c9ad01b84fff026d5dcd
Author: Sean Owen <so...@cloudera.com>
Date:   2016-08-24T19:04:09Z

    [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the 
same java used in the spark environment
    
    ## What changes were proposed in this pull request?
    
    Update to py4j 0.10.3 to enable JAVA_HOME support
    
    ## How was this patch tested?
    
    Pyspark tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #14748 from srowen/SPARK-16781.
    
    (cherry picked from commit 0b3a4be92ca6b38eef32ea5ca240d9f91f68aa65)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 43273377a38a9136ff5e56929630930f076af5af
Author: Junyang Qian <junya...@databricks.com>
Date:   2016-08-24T23:00:04Z

    [SPARKR][MINOR] Add more examples to window function docs
    
    ## What changes were proposed in this pull request?
    
    This PR adds more examples to window function docs to make them more 
accessible to the users.
    
    It also fixes default value issues for `lag` and `lead`.
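    
    For context, the equivalent usage in Scala (a sketch with assumed column 
    names and a DataFrame `df`; the R docs in this PR describe the same 
    semantics):
    
    ```scala
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{lag, lead}
    
    // When the offset row does not exist, lag/lead return the default value,
    // which is null unless specified otherwise.
    val w = Window.partitionBy("k").orderBy("t")
    df.select(lag("v", 1).over(w).as("prev"), lead("v", 1).over(w).as("next"))
    ```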
    
    ## How was this patch tested?
    
    Manual test, R unit test.
    
    Author: Junyang Qian <junya...@databricks.com>
    
    Closes #14779 from junyangq/SPARKR-FixWindowFunctionDocs.
    
    (cherry picked from commit 18708f76c366c6e01b5865981666e40d8642ac20)
    Signed-off-by: Felix Cheung <felixche...@apache.org>

commit 9f363a690102f04a2a486853c1b89134455518bc
Author: Junyang Qian <junya...@databricks.com>
Date:   2016-08-24T23:04:14Z

    [SPARKR][MINOR] Add installation message for remote master mode and improve 
other messages
    
    ## What changes were proposed in this pull request?
    
    This PR gives an informative message to users when they try to connect to 
    a remote master but don't have the Spark package on their local machine.
    
    As a clarification: for now, automatic installation will only happen if 
    they start SparkR in the R console (rather than from sparkr-shell) and 
    connect to a local master. In the remote master mode, the local Spark 
    package is still needed, but we will not trigger the install.spark 
    function, because the versions have to match those on the cluster, which 
    would involve more user input. Instead, we try to provide a detailed 
    message that may help the users.
    
    Some of the other messages have also been slightly changed.
    
    ## How was this patch tested?
    
    Manual test.
    
    Author: Junyang Qian <junya...@databricks.com>
    
    Closes #14761 from junyangq/SPARK-16579-V1.
    
    (cherry picked from commit 3a60be4b15a5ab9b6e0c4839df99dac7738aa7fe)
    Signed-off-by: Felix Cheung <felixche...@apache.org>

commit 3258f27a881dfeb5ab8bae90c338603fa4b6f9d8
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-25T04:19:35Z

    [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write 
dateFormat/timestampFormat options for CSV and JSON
    
    ## What changes were proposed in this pull request?
    
    This PR backports https://github.com/apache/spark/pull/14279 to 2.0.
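    
    A hedged usage example (option names from the PR title; the input path is 
    hypothetical):
    
    ```scala
    val df = spark.read
      .option("dateFormat", "yyyy-MM-dd")
      .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
      .csv("/path/to/input.csv")
    ```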
    
    ## How was this patch tested?
    
    Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing 
tests cover the default cases.
    
    Author: hyukjinkwon <gurwls...@gmail.com>
    
    Closes #14799 from HyukjinKwon/SPARK-16216-json-csv-backport.

commit aa57083af4cecb595bac09e437607d7142b54913
Author: Sameer Agarwal <samee...@cs.berkeley.edu>
Date:   2016-08-25T04:24:24Z

    [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
    
    ## What changes were proposed in this pull request?
    
    Given that filters based on non-deterministic constraints shouldn't be 
pushed down in the query plan, unnecessarily inferring them is confusing and a 
source of potential bugs. This patch simplifies the inferring logic by simply 
ignoring them.
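    
    For intuition, a sketch of a filter whose predicate is non-deterministic 
    (assuming DataFrames `df` and `other`): inferring constraints such as 
    `isnotnull` from it could let the predicate be moved past the point where 
    its random values were fixed.
    
    ```scala
    import org.apache.spark.sql.functions.rand
    
    // No constraint should be inferred or propagated from this predicate:
    val filtered = df.filter(rand() > 0.5).join(other, "id")
    ```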
    
    ## How was this patch tested?
    
    Added a new test in `ConstraintPropagationSuite`.
    
    Author: Sameer Agarwal <samee...@cs.berkeley.edu>
    
    Closes #14795 from sameeragarwal/deterministic-constraints.
    
    (cherry picked from commit ac27557eb622a257abeb3e8551f06ebc72f87133)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit c1c498006849a7a0a785bc84316e7f494da5f8a8
Author: Sean Owen <so...@cloudera.com>
Date:   2016-08-25T08:45:49Z

    [SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocationInfo 
== null
    
    ## What changes were proposed in this pull request?
    
    Handle a null from Hadoop's getLocationInfo directly instead of catching 
    (and logging) the resulting exception.
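    
    A sketch of the null-guard idea (illustrative, not the actual patch):
    
    ```scala
    import org.apache.hadoop.mapred.SplitLocationInfo
    import org.apache.hadoop.mapreduce.InputSplit
    
    // A null from getLocationInfo just means "no locality information", so
    // substitute an empty array instead of risking an NPE in DEBUG logging.
    def locationInfo(split: InputSplit): Array[SplitLocationInfo] =
      Option(split.getLocationInfo).getOrElse(Array.empty[SplitLocationInfo])
    ```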
    
    ## How was this patch tested?
    
    Jenkins tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #14760 from srowen/SPARK-17193.
    
    (cherry picked from commit 2bcd5d5ce3eaf0eb1600a12a2b55ddb40927533b)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit fb1c697143a5bb2df69d9f2c9cbddc4eb526f047
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-08-25T09:24:40Z

    [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of 
    unsafe-backed data
    
    Currently `MapObjects` does not make copies of unsafe-backed data, leading 
to problems like 
[SPARK-17061](https://issues.apache.org/jira/browse/SPARK-17061) 
[SPARK-17093](https://issues.apache.org/jira/browse/SPARK-17093).
    
    This patch makes `MapObjects` make copies of unsafe-backed data.
    
    Generated code - prior to this patch:
    ```java
    ...
    /* 295 */ if (isNull12) {
    /* 296 */   convertedArray1[loopIndex1] = null;
    /* 297 */ } else {
    /* 298 */   convertedArray1[loopIndex1] = value12;
    /* 299 */ }
    ...
    ```
    
    Generated code - after this patch:
    ```java
    ...
    /* 295 */ if (isNull12) {
    /* 296 */   convertedArray1[loopIndex1] = null;
    /* 297 */ } else {
    /* 298 */   convertedArray1[loopIndex1] = value12 instanceof UnsafeRow? 
value12.copy() : value12;
    /* 299 */ }
    ...
    ```
    
    Add a new test case which would fail without this patch.
    
    Author: Liwei Lin <lwl...@gmail.com>
    
    Closes #14698 from lw-lin/mapobjects-copy.
    
    (cherry picked from commit e0b20f9f24d5c3304bf517a4dcfb0da93be5bc75)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit 88481ea2169e0813cfc326eb1440ddaaf3110f4a
Author: Herman van Hovell <hvanhov...@databricks.com>
Date:   2016-08-25T09:48:13Z

    Revert "[SPARK-17061][SPARK-17093][SQL] MapObjects` should make copies of 
unsafe-backed data"
    
    This reverts commit fb1c697143a5bb2df69d9f2c9cbddc4eb526f047.

commit 184e78b9d640259ba0720574de060841dc912872
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-08-25T12:16:22Z

    [SPARK-17061][SPARK-17093][SQL][BACKPORT] MapObjects should make copies of 
unsafe-backed data
    
    ## What changes were proposed in this pull request?
    This PR backports https://github.com/apache/spark/pull/14698 to branch-2.0.
    
    See that PR for more details. All credit should go to lw-lin.
    
    Author: Herman van Hovell <hvanhov...@databricks.com>
    Author: Liwei Lin <lwl...@gmail.com>
    
    Closes #14806 from hvanhovell/SPARK-17061.

commit 48ecf3d0027e61d4d4ad6711ca2d4064a6b9c9e9
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-08-25T12:18:58Z

    [SPARK-16991][SPARK-17099][SPARK-17120][SQL] Fix Outer Join Elimination 
when Filter's isNotNull Constraints Unable to Filter Out All Null-supplying Rows
    
    ### What changes were proposed in this pull request?
    This PR is to fix an incorrect outer join elimination when the filter's 
    `isNotNull` constraint is unable to filter out all null-supplying rows, 
    for example, `isnotnull(coalesce(b#227, c#238))`.
    
    Users can hit this error when they use a `using`/natural outer join, which 
    is converted to a normal outer join with a `coalesce` expression on the 
    `using` columns. For example,
    ```Scala
        val a = Seq((1, 2), (2, 3)).toDF("a", "b")
        val b = Seq((2, 5), (3, 4)).toDF("a", "c")
        val c = Seq((3, 1)).toDF("a", "d")
        val ab = a.join(b, Seq("a"), "fullouter")
        ab.join(c, "a").explain(true)
    ```
    The dataframe `ab` performs a `using` full-outer join, which is converted 
    to a normal outer join with a `coalesce` expression. Constraint inference 
    generates a `Filter` with the constraint `isnotnull(coalesce(b#227, 
    c#238))`. This then triggers a wrong outer join elimination and produces a 
    wrong result.
    ```
    == Analyzed Logical Plan ==
    Project [a#251, b#227, c#237, d#247]
    +- Join Inner, (a#251 = a#246)
       :- Project [coalesce(a#226, a#236) AS a#251, b#227, c#237]
       :  +- Join FullOuter, (a#226 = a#236)
       :     :- Project [_1#223 AS a#226, _2#224 AS b#227]
       :     :  +- LocalRelation [_1#223, _2#224]
       :     +- Project [_1#233 AS a#236, _2#234 AS c#237]
       :        +- LocalRelation [_1#233, _2#234]
       +- Project [_1#243 AS a#246, _2#244 AS d#247]
          +- LocalRelation [_1#243, _2#244]
    
    == Optimized Logical Plan ==
    Project [a#251, b#227, c#237, d#247]
    +- Join Inner, (a#251 = a#246)
       :- Project [coalesce(a#226, a#236) AS a#251, b#227, c#237]
       :  +- Filter isnotnull(coalesce(a#226, a#236))
       :     +- Join FullOuter, (a#226 = a#236)
       :        :- LocalRelation [a#226, b#227]
       :        +- LocalRelation [a#236, c#237]
       +- LocalRelation [a#246, d#247]
    ```
    
    **A note to the `Committer`**, please also give the credit to dongjoon-hyun 
who submitted another PR for fixing this issue. 
https://github.com/apache/spark/pull/14580
    
    ### How was this patch tested?
    Added test cases
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14661 from gatorsmile/fixOuterJoinElimination.
    
    (cherry picked from commit d2ae6399ee2f0524b88262735adbbcb2035de8fd)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit 2b32a442dfbc8494c30dcb2f6869c9dc7f258ada
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-08-25T12:38:41Z

    [SPARK-17167][2.0][SQL] Issue Exceptions when Analyze Table on In-Memory 
Cataloged Tables
    
    ### What changes were proposed in this pull request?
    Currently, `Analyze Table` is only used for Hive-serde tables. We should 
    issue exceptions in all the other cases. When the tables are data source 
    tables, we already issue an exception. However, when the tables are 
    in-memory cataloged tables, we do not issue any exception.
    
    This PR is to issue an exception when the tables are in-memory cataloged. 
    For example,
    ```SQL
    CREATE TABLE tbl(a INT, b INT) USING parquet
    ```
    `tbl` is a `SimpleCatalogRelation` when the hive support is not enabled.
    
    ### How was this patch tested?
    Added two test cases. One of them is just to improve the test coverage when 
the analyzed table is data source tables.
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14781 from gatorsmile/analyzeInMemoryTable2.

commit 356a359de038e2e9d4d0cb7c0c5b493f7036d7c3
Author: Davies Liu <dav...@databricks.com>
Date:   2016-08-15T19:41:27Z

    [SPARK-16700][PYSPARK][SQL] create DataFrame from dict/Row with schema
    
    In 2.0, we verify the data type against the schema for every row for 
    safety, but with a performance cost. This PR makes the verification 
    optional.
    
    When we verify the data type for StructType, we do not support all the 
    types supported in schema inference (for example, dict); this PR fixes 
    that to make them consistent.
    
    For a Row object created using named arguments, the fields are sorted by 
    name, so their order may differ from the order in the provided schema. 
    This PR fixes that by ignoring the order of fields in this case.
    
    Created regression tests for them.
    
    Author: Davies Liu <dav...@databricks.com>
    
    Closes #14469 from davies/py_dict.

commit 55db26245d69bb02b7d7d5f25029b1a1cd571644
Author: Alex Bozarth <ajboz...@us.ibm.com>
Date:   2016-08-25T16:54:55Z

    [SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData
    
    ## What changes were proposed in this pull request?
    
    This is a backport of #14673, addressing merge conflicts in package.scala 
    that prevented a clean cherry-pick to `branch-2.0` when it was merged to 
    `master`.
    
    Since the History Server currently loads all applications' data, it can 
    OOM if too many applications have a significant task count. This trims 
    tasks by `spark.ui.retainedTasks` (default: 100000).
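    
    The cap is an ordinary Spark configuration entry, e.g. (sketch):
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Keep fewer task entries in the UI data than the default noted above:
    val conf = new SparkConf().set("spark.ui.retainedTasks", "10000")
    ```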
    
    ## How was this patch tested?
    
    Manual testing and dev/run-tests
    
    Author: Alex Bozarth <ajboz...@us.ibm.com>
    
    Closes #14794 from ajbozarth/spark15083-branch-2.0.

commit b3a44306a36d6c1e5583e85961966fa5cf4f7e9a
Author: wm...@hotmail.com <wm...@hotmail.com>
Date:   2016-08-25T19:11:27Z

    [SPARKR][BUILD] ignore cran-check.out under R folder
    
    ## What changes were proposed in this pull request?
    
    The R CRAN check generates a cran-check.out file, which should be ignored 
    by git.
    
    ## How was this patch tested?
    
    Manual test: ran a clean test and `git status` to make sure the file is 
    not included in git.
    
    Author: wm...@hotmail.com <wm...@hotmail.com>
    
    Closes #14774 from wangmiao1981/ignore.
    
    (cherry picked from commit 9958ac0ce2b9e451d400604767bef2fe12a3399d)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit ff2e270ebe3a74c19140cd96f96b7a62723002b1
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-25T22:15:01Z

    [SPARK-17205] Literal.sql should handle Infinity and NaN
    
    This patch updates `Literal.sql` to properly generate SQL for `NaN` and 
`Infinity` float and double literals: these special values need to be handled 
differently from regular values, since simply appending a suffix to the value's 
`toString()` representation will not work for these values.
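    
    A sketch of the special-casing idea (the exact output format here is 
    assumed for illustration, not taken from the patch):
    
    ```scala
    // NaN and Infinity have no plain SQL literal syntax, so round-trip them
    // through a string cast instead of suffixing the toString() form:
    def doubleToSql(v: Double): String =
      if (v.isNaN || v.isInfinite) s"CAST('$v' AS DOUBLE)"
      else s"${v}D"
    ```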
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #14777 from JoshRosen/SPARK-17205.
    
    (cherry picked from commit 3e4c7db4d11c474457e7886a5501108ebab0cf6d)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit 73014a2aa96b538d963f360fd41bac74f358ef46
Author: Michael Allman <mich...@videoamp.com>
Date:   2016-08-25T23:29:04Z

    [SPARK-17231][CORE] Avoid building debug or trace log messages unless the 
respective log level is enabled
    
    This is simply a backport of #14798 to `branch-2.0`. This backport omits 
the change to `ExternalShuffleBlockHandler.java`. In `branch-2.0`, that file 
does not contain the log message that was patched in `master`.
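    
    The guard pattern the title refers to, as a minimal sketch (the logger 
    name and `expensiveSummary` are hypothetical):
    
    ```scala
    import org.slf4j.LoggerFactory
    
    val log = LoggerFactory.getLogger("replication")
    def expensiveSummary(): String = "..." // stands in for a costly computation
    
    // The message string is only built when DEBUG is actually enabled:
    if (log.isDebugEnabled) {
      log.debug(s"block details: ${expensiveSummary()}")
    }
    ```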
    
    Author: Michael Allman <mich...@videoamp.com>
    
    Closes #14811 from 
mallman/spark-17231-logging_perf_improvements-2.0_backport.

----

