[GitHub] spark pull request #15501: Branch 2.0

lastbus Sat, 15 Oct 2016 07:02:55 -0700

GitHub user lastbus opened a pull request:

    https://github.com/apache/spark/pull/15501


    Branch 2.0

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15501
    
----
commit 0297896119e11f23da4b14f62f50ec72b5fac57f
Author: Junyang Qian <[email protected]>
Date:   2016-08-20T13:59:23Z

    [SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings.
    
    This PR tries to fix all the remaining "undocumented/duplicated arguments" 
warnings given by CRAN-check.
    
    One left is the doc for R `stats::glm` exported in SparkR. To mute that
    warning, we also have to provide documentation for all arguments of that
    non-SparkR function.
    
    Some previous conversation is in #14558.
    
    R unit test and `check-cran.sh` script (with no-test).
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14705 from junyangq/SPARK-16508-master.
    
    (cherry picked from commit 01401e965b58f7e8ab615764a452d7d18f1d4bf0)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit e62b29f29f44196a1cbe13004ff4abfd8e5be1c1
Author: Dongjoon Hyun <[email protected]>
Date:   2016-08-21T20:07:47Z

    [SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) 
OVER` correctly
    
    ## What changes were proposed in this pull request?
    
    Currently, `NullPropagation` optimizer replaces `COUNT` on null literals in 
a bottom-up fashion. During that, `WindowExpression` is not covered properly. 
This PR adds the missing propagation logic.
    
    **Before**
    ```scala
    scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
    java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ```
    
    **After**
    ```scala
    scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
    +----------------------------------------------------------------------------------------------+
    |count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
    +----------------------------------------------------------------------------------------------+
    |                                                                                             0|
    +----------------------------------------------------------------------------------------------+
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins test with a new test case.
    
    Author: Dongjoon Hyun <[email protected]>
    
    Closes #14689 from dongjoon-hyun/SPARK-17098.
    
    (cherry picked from commit 91c2397684ab791572ac57ffb2a924ff058bb64f)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 49cc44de3ad5495b2690633791941aa00a62b553
Author: Davies Liu <[email protected]>
Date:   2016-08-22T08:16:03Z

    [SPARK-17115][SQL] decrease the threshold when split expressions
    
    ## What changes were proposed in this pull request?
    
    In 2.0, we changed the threshold for splitting expressions from 16K to 64K,
    which causes very bad performance on wide tables, because the generated
    method can't be JIT compiled by default (it is above the limit of 8K
    bytecode).
    
    This PR will decrease it to 1K, based on the benchmark results for a wide 
table with 400 columns of LongType.
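    
    The splitting idea can be sketched roughly as follows. This is a simplified
    illustration, not Spark's actual code generator: the class
    `ExpressionSplitter`, the method `split`, and the choice to measure the
    threshold in source characters are all assumptions made for the example.
    Each returned block would become its own generated method, which keeps
    individual methods small.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    public class ExpressionSplitter {
        // Hypothetical constant mirroring the "1K" figure in the commit message.
        static final int THRESHOLD = 1024;
    
        /**
         * Greedily groups code snippets into blocks. A new block is started
         * once the current block has grown past the threshold (in characters).
         */
        static List<String> split(List<String> snippets, int threshold) {
            List<String> blocks = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (String snippet : snippets) {
                if (current.length() > threshold) {
                    // Current block is already large enough: close it.
                    blocks.add(current.toString());
                    current.setLength(0);
                }
                current.append(snippet).append('\n');
            }
            if (current.length() > 0) {
                blocks.add(current.toString());
            }
            return blocks;
        }
    }
    ```
    
    With a lower threshold, the same expressions are spread over more, smaller
    blocks.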
    
    It also fixes a bug around splitting expressions in whole-stage codegen (it
    should not split them).
    
    ## How was this patch tested?
    
    Added benchmark suite.
    
    Author: Davies Liu <[email protected]>
    
    Closes #14692 from davies/split_exprs.
    
    (cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f)
    Signed-off-by: Wenchen Fan <[email protected]>

commit 2add45fabeb0ea4f7b17b5bc4910161370e72627
Author: Jagadeesan <[email protected]>
Date:   2016-08-22T08:30:31Z

    [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - 
UNSUPPORTED OPERATIONS]
    
    Changes in the Spark Structured Streaming doc at this link
    
https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations
    
    Author: Jagadeesan <[email protected]>
    
    Closes #14715 from jagadeesanas2/SPARK-17085.
    
    (cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65)
    Signed-off-by: Sean Owen <[email protected]>

commit 79195982a4c6f8b1a3e02069dea00049cc806574
Author: Junyang Qian <[email protected]>
Date:   2016-08-22T17:03:48Z

    [SPARKR][MINOR] Fix Cache Folder Path in Windows
    
    ## What changes were proposed in this pull request?
    
    This PR tries to fix the scheme of local cache folder in Windows. The name 
of the environment variable should be `LOCALAPPDATA` rather than 
`%LOCALAPPDATA%`.
    
    ## How was this patch tested?
    
    Manual test in Windows 7.
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14743 from junyangq/SPARKR-FixWindowsInstall.
    
    (cherry picked from commit 209e1b3c0683a9106428e269e5041980b6cc327f)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 94eff08757cee70c5b31fff7095bbb1e6ebc7ecf
Author: Sean Owen <[email protected]>
Date:   2016-08-22T18:15:53Z

    [SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6
    
    ## What changes were proposed in this pull request?
    
    Collect GC discussion in one section, and document findings about G1 GC
    heap region size.
    
    ## How was this patch tested?
    
    Jekyll doc build
    
    Author: Sean Owen <[email protected]>
    
    Closes #14732 from srowen/SPARK-16320.
    
    (cherry picked from commit 342278c09cf6e79ed4f63422988a6bbd1e7d8a91)
    Signed-off-by: Yin Huai <[email protected]>

commit 6dcc1a3f0cc8f2ed71f7bb6b1493852a58259d2f
Author: Shivaram Venkataraman <[email protected]>
Date:   2016-08-22T19:53:52Z

    [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
    
    ## What changes were proposed in this pull request?
    
    This change adds Xiangrui Meng and Felix Cheung to the maintainers field in 
the package description.
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Author: Shivaram Venkataraman <[email protected]>
    
    Closes #14758 from shivaram/sparkr-maintainers.
    
    (cherry picked from commit 6f3cd36f93c11265449fdce3323e139fec8ab22d)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8
Author: Eric Liang <[email protected]>
Date:   2016-08-22T22:48:35Z

    [SPARK-17162] Range does not support SQL generation
    
    ## What changes were proposed in this pull request?
    
    The range operator previously didn't support SQL generation, which made it
    impossible to use in views.
    
    ## How was this patch tested?
    
    Unit tests.
    
    cc hvanhovell
    
    Author: Eric Liang <[email protected]>
    
    Closes #14724 from ericl/spark-17162.
    
    (cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35)
    Signed-off-by: Reynold Xin <[email protected]>

commit b65b041af8b64413c7d460d4ea110b2044d6f36e
Author: Felix Cheung <[email protected]>
Date:   2016-08-22T22:53:10Z

    [SPARK-16508][SPARKR] doc updates and more CRAN check fixes
    
    replace ``` ` ``` in code doc with `\code{thing}`
    remove added `...` for drop(DataFrame)
    fix remaining CRAN check warnings
    
    create doc with knitr
    
    junyangq
    
    Author: Felix Cheung <[email protected]>
    
    Closes #14734 from felixcheung/rdoccleanup.
    
    (cherry picked from commit 71afeeea4ec8e67edc95b5d504c557c88a2598b9)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit ff2f873800fcc3d699e52e60fd0e69eb01d12503
Author: Eric Liang <[email protected]>
Date:   2016-08-22T23:32:14Z

    [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in 
block manager replication
    
    ## What changes were proposed in this pull request?
    
    This is a straightforward clone of JoshRosen's original patch. I have
    follow-up changes to fix block replication for repl-defined classes as well,
    but those appear to be flaking tests so I'm going to leave that for
    SPARK-17042.
    
    ## How was this patch tested?
    
    End-to-end test in ReplSuite (also more tests in DistributedSuite from the 
original patch).
    
    Author: Eric Liang <[email protected]>
    
    Closes #14311 from ericl/spark-16550.
    
    (cherry picked from commit 8e223ea67acf5aa730ccf688802f17f6fc10907c)
    Signed-off-by: Reynold Xin <[email protected]>

commit 225898961bc4bc71d56f33c027adbb2d0929ae5a
Author: Shivaram Venkataraman <[email protected]>
Date:   2016-08-23T00:09:32Z

    [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh
    
    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    This change adds CRAN documentation checks to be run as a part of 
`R/run-tests.sh` . As this script is also used by Jenkins this means that we 
will get documentation checks on every PR going forward.
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Author: Shivaram Venkataraman <[email protected]>
    
    Closes #14759 from shivaram/sparkr-cran-jenkins.
    
    (cherry picked from commit 920806ab272ba58a369072a5eeb89df5e9b470a6)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit eaea1c86b897d302107a9b6833a27a2b24ca31a0
Author: Cheng Lian <[email protected]>
Date:   2016-08-23T01:11:47Z

    [SPARK-17182][SQL] Mark Collect as non-deterministic
    
    ## What changes were proposed in this pull request?
    
    This PR marks the abstract class `Collect` as non-deterministic since the 
results of `CollectList` and `CollectSet` depend on the actual order of input 
rows.
    
    ## How was this patch tested?
    
    Existing test cases should be enough.
    
    Author: Cheng Lian <[email protected]>
    
    Closes #14749 from liancheng/spark-17182-non-deterministic-collect.
    
    (cherry picked from commit 2cdd92a7cd6f85186c846635b422b977bdafbcdd)
    Signed-off-by: Wenchen Fan <[email protected]>

commit d16f9a0b7c464728d7b11899740908e23820a797
Author: Felix Cheung <[email protected]>
Date:   2016-08-23T03:15:03Z

    [SPARKR][MINOR] Update R DESCRIPTION file
    
    ## What changes were proposed in this pull request?
    
    Update DESCRIPTION
    
    ## How was this patch tested?
    
    Run install and CRAN tests
    
    Author: Felix Cheung <[email protected]>
    
    Closes #14764 from felixcheung/rpackagedescription.
    
    (cherry picked from commit d2b3d3e63e1a9217de6ef507c350308017664a62)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 811a2cef03647c5be29fef522c423921c79b1bc3
Author: Davies Liu <[email protected]>
Date:   2016-08-23T16:45:13Z

    [SPARK-13286] [SQL] add the next expression of SQLException as cause
    
    Some JDBC drivers (for example PostgreSQL) do not use the underlying
    exception as the cause, but have another API (getNextException) to access
    it, so it is not included in the error logging, making it hard for us to
    find the root cause, especially in batch mode.
    
    This PR will pull out the next exception and add it as cause (if it's 
different) or suppressed (if there is another different cause).
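    
    The rule described above can be sketched with the standard
    `java.sql.SQLException` API. The class name `NextExceptionCause` and the
    method `attachNext` are hypothetical names for this example, and the sketch
    is an illustration of the described behaviour rather than the patch itself.
    
    ```java
    import java.sql.SQLException;
    
    public class NextExceptionCause {
        /**
         * Surfaces the driver's "next" exception on the given exception:
         * as the cause if none is set, otherwise as a suppressed exception.
         */
        static SQLException attachNext(SQLException e) {
            SQLException next = e.getNextException();
            if (next != null && next != e.getCause()) {
                if (e.getCause() == null) {
                    // Assumes no cause was supplied at construction time;
                    // initCause throws IllegalStateException otherwise.
                    e.initCause(next);
                } else {
                    // A different cause already exists, so keep both.
                    e.addSuppressed(next);
                }
            }
            return e;
        }
    }
    ```
    
    Either way, the next exception then shows up in the printed stack trace.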
    
    Can't reproduce this on the default JDBC driver, so did not add a 
regression test.
    
    Author: Davies Liu <[email protected]>
    
    Closes #14722 from davies/keep_cause.
    
    (cherry picked from commit 9afdfc94f49395e69a7959e881c19d787ce00c3e)
    Signed-off-by: Davies Liu <[email protected]>

commit cc4018996740b3a68d4a557615c59c67b8996ebb
Author: Junyang Qian <[email protected]>
Date:   2016-08-23T18:22:32Z

    [SPARKR][MINOR] Remove reference link for common Windows environment 
variables
    
    ## What changes were proposed in this pull request?
    
    The PR removes the reference link in the doc for environment variables for
    common Windows folders. The CRAN check gave code 503 (service unavailable)
    on the original link.
    
    ## How was this patch tested?
    
    Manual check.
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14767 from junyangq/SPARKR-RemoveLink.
    
    (cherry picked from commit 8fd63e808e15c8a7e78fef847183c86f332daa91)
    Signed-off-by: Felix Cheung <[email protected]>

commit a2a7506d06fe9d878d55cf5498f5bfef9a69171c
Author: hyukjinkwon <[email protected]>
Date:   2016-08-23T20:21:43Z

    [MINOR][DOC] Use standard quotes instead of "curly quote" marks from Mac in 
structured streaming programming guides
    
    This PR fixes curly quotes (`“` and `”`) to standard quotes (`"`).
    
    This would be an actual problem when users copy and paste the examples,
    since they would not work.
    
    This seems to happen only in `structured-streaming-programming-guide.md`.
    
    Manually built.
    
    This will change some examples to be correctly marked down as below:
    
    ![2016-08-23 3 24 13](https://cloud.githubusercontent.com/assets/6477701/17882878/2a38332e-694a-11e6-8e84-76bdb89151e0.png)
    
    to
    
    ![2016-08-23 3 26 06](https://cloud.githubusercontent.com/assets/6477701/17882888/376eaa28-694a-11e6-8b88-32ea83997037.png)
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #14770 from HyukjinKwon/minor-quotes.
    
    (cherry picked from commit 588559911de94bbe0932526ee1e1dd36a581a423)
    Signed-off-by: Sean Owen <[email protected]>

commit a772b4b5dea46cda1204a50a4909d40f8933ad77
Author: Josh Rosen <[email protected]>
Date:   2016-08-23T20:31:58Z

    [SPARK-17194] Use single quotes when generating SQL for string literals
    
    When Spark emits SQL for a string literal, it should wrap the string in 
single quotes, not double quotes. Databases which adhere more strictly to the 
ANSI SQL standards, such as Postgres, allow only single-quotes to be used for 
denoting string literals (see http://stackoverflow.com/a/1992331/590203).
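    
    A minimal sketch of single-quoted literal generation follows. It assumes
    the ANSI convention of escaping an embedded single quote by doubling it;
    Spark's own generator may escape differently. The names `SqlStringLiteral`
    and `quote` are hypothetical.
    
    ```java
    public class SqlStringLiteral {
        /**
         * Wraps a value in single quotes for use as a SQL string literal,
         * doubling any embedded single quote (ANSI-style escaping).
         */
        static String quote(String value) {
            return "'" + value.replace("'", "''") + "'";
        }
    }
    ```
    
    For instance, the input `it's` would be emitted as `'it''s'`.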
    
    Author: Josh Rosen <[email protected]>
    
    Closes #14763 from JoshRosen/SPARK-17194.
    
    (cherry picked from commit bf8ff833e30b39e5e5e35ba8dcac31b79323838c)
    Signed-off-by: Herman van Hovell <[email protected]>

commit a6e6a047bb9215df55b009957d4c560624d886fc
Author: Weiqing Yang <[email protected]>
Date:   2016-08-24T06:44:45Z

    [MINOR][SQL] Remove implemented functions from comments of 
'HiveSessionCatalog.scala'
    
    ## What changes were proposed in this pull request?
    This PR removes implemented functions from comments of 
`HiveSessionCatalog.scala`: `java_method`, `posexplode`, `str_to_map`.
    
    ## How was this patch tested?
    Manual.
    
    Author: Weiqing Yang <[email protected]>
    
    Closes #14769 from Sherry302/cleanComment.
    
    (cherry picked from commit b9994ad05628077016331e6b411fbc09017b1e63)
    Signed-off-by: Reynold Xin <[email protected]>

commit df87f161c9e40a49235ea722f6a662a488b41c4c
Author: Wenchen Fan <[email protected]>
Date:   2016-08-24T06:46:09Z

    [SPARK-17186][SQL] remove catalog table type INDEX
    
    ## What changes were proposed in this pull request?
    
    Spark SQL doesn't actually support indexes; the catalog table type `INDEX`
    comes from Hive. However, most operations in Spark SQL can't handle index
    tables, e.g. create table, alter table, etc.
    
    Logically, index tables should be invisible to end users, and Hive also
    generates special table names for index tables to keep users from accessing
    them directly. Hive has special SQL syntax to create/show/drop index tables.
    
    On the Spark SQL side, although we can describe an index table directly, the
    result is unreadable; we should use the dedicated SQL syntax to do it (e.g.
    `SHOW INDEX ON tbl`). Spark SQL can also read an index table directly, but
    the result is always empty. (Can Hive read index tables directly?)
    
    This PR removes the table type `INDEX`, to make it clear that Spark SQL
    doesn't currently support indexes.
    
    ## How was this patch tested?
    
    existing tests.
    
    Author: Wenchen Fan <[email protected]>
    
    Closes #14752 from cloud-fan/minor2.
    
    (cherry picked from commit 52fa45d62a5a0bc832442f38f9e634c5d8e29e08)
    Signed-off-by: Reynold Xin <[email protected]>

commit ce7dce1755a8d36ec7346adc3de26d8fdc4f05e9
Author: Weiqing Yang <[email protected]>
Date:   2016-08-24T09:12:44Z

    [MINOR][BUILD] Fix Java CheckStyle Error
    
    As Spark 2.0.1 will be released soon (mentioned in the spark dev mailing 
list), besides the critical bugs, it's better to fix the code style errors 
before the release.
    
    Before:
    ```
    ./dev/lint-java
    Checkstyle checks failed at following occurrences:
    [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
    [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
    ```
    After:
    ```
    ./dev/lint-java
    Using `mvn` from path: /usr/local/bin/mvn
    Checkstyle checks passed.
    ```
    Manual.
    
    Author: Weiqing Yang <[email protected]>
    
    Closes #14768 from Sherry302/fixjavastyle.
    
    (cherry picked from commit 673a80d2230602c9e6573a23e35fb0f6b832bfca)
    Signed-off-by: Sean Owen <[email protected]>

commit 33d79b58735770ac613540c21095a1e404f065b0
Author: VinceShieh <[email protected]>
Date:   2016-08-24T09:16:58Z

    [SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer 
when some quantiles are duplicated
    
    ## What changes were proposed in this pull request?
    
    In cases when QuantileDiscretizer is called on a numeric array with
    duplicated elements, we take the unique elements generated from
    approxQuantiles as input for Bucketizer.
    
    ## How was this patch tested?
    
    A unit test is added in QuantileDiscretizerSuite.
    
    QuantileDiscretizer.fit will throw an illegal exception when calling
    setSplits on a list of splits with duplicated elements. Bucketizer.setSplits
    should only accept a numeric vector of two or more unique cut points,
    although that may produce fewer buckets than requested.
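    
    The effect of de-duplicating the cut points can be sketched in plain Java.
    This is an illustration only, not the ML code itself; `DistinctSplits` and
    `distinctSorted` are hypothetical names.
    
    ```java
    import java.util.Arrays;
    
    public class DistinctSplits {
        /**
         * Returns the unique candidate cut points in ascending order.
         * N unique cut points define N - 1 buckets.
         */
        static double[] distinctSorted(double[] candidates) {
            return Arrays.stream(candidates).distinct().sorted().toArray();
        }
    }
    ```
    
    If the candidates are -Infinity, 1.0, 1.0, 2.0, 2.0, +Infinity, four unique
    cut points remain, which gives three buckets instead of the five that the
    six candidates would suggest.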
    
    Signed-off-by: VinceShieh <vincent.xieintel.com>
    
    Author: VinceShieh <[email protected]>
    
    Closes #14747 from VinceShieh/SPARK-17086.
    
    (cherry picked from commit 92c0eaf348b42b3479610da0be761013f9d81c54)
    Signed-off-by: Sean Owen <[email protected]>

commit 29091d7cd60c20bf019dc9c1625a22e80ea50928
Author: Junyang Qian <[email protected]>
Date:   2016-08-24T17:40:09Z

    [SPARKR][MINOR] Fix doc for show method
    
    ## What changes were proposed in this pull request?
    
    The original doc of `show` put methods for multiple classes together but 
the text only talks about `SparkDataFrame`. This PR tries to fix this problem.
    
    ## How was this patch tested?
    
    Manual test.
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14776 from junyangq/SPARK-FixShowDoc.
    
    (cherry picked from commit d2932a0e987132c694ed59515b7c77adaad052e6)
    Signed-off-by: Felix Cheung <[email protected]>

commit 9f924a01b27ebba56080c9ad01b84fff026d5dcd
Author: Sean Owen <[email protected]>
Date:   2016-08-24T19:04:09Z

    [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the 
same java used in the spark environment
    
    ## What changes were proposed in this pull request?
    
    Update to py4j 0.10.3 to enable JAVA_HOME support
    
    ## How was this patch tested?
    
    Pyspark tests
    
    Author: Sean Owen <[email protected]>
    
    Closes #14748 from srowen/SPARK-16781.
    
    (cherry picked from commit 0b3a4be92ca6b38eef32ea5ca240d9f91f68aa65)
    Signed-off-by: Sean Owen <[email protected]>

commit 43273377a38a9136ff5e56929630930f076af5af
Author: Junyang Qian <[email protected]>
Date:   2016-08-24T23:00:04Z

    [SPARKR][MINOR] Add more examples to window function docs
    
    ## What changes were proposed in this pull request?
    
    This PR adds more examples to window function docs to make them more 
accessible to the users.
    
    It also fixes default value issues for `lag` and `lead`.
    
    ## How was this patch tested?
    
    Manual test, R unit test.
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14779 from junyangq/SPARKR-FixWindowFunctionDocs.
    
    (cherry picked from commit 18708f76c366c6e01b5865981666e40d8642ac20)
    Signed-off-by: Felix Cheung <[email protected]>

commit 9f363a690102f04a2a486853c1b89134455518bc
Author: Junyang Qian <[email protected]>
Date:   2016-08-24T23:04:14Z

    [SPARKR][MINOR] Add installation message for remote master mode and improve 
other messages
    
    ## What changes were proposed in this pull request?
    
    This PR gives an informative message to users when they try to connect to a
    remote master but don't have the Spark package on their local machine.
    
    As a clarification, for now, automatic installation will only happen if
    they start SparkR in the R console (rather than from sparkr-shell) and
    connect to a local master. In remote master mode, a local Spark package is
    still needed, but we will not trigger the install.spark function because the
    versions have to match those on the cluster, which involves more user input.
    Instead, we try to provide a detailed message that may help the users.
    
    Some of the other messages have also been slightly changed.
    
    ## How was this patch tested?
    
    Manual test.
    
    Author: Junyang Qian <[email protected]>
    
    Closes #14761 from junyangq/SPARK-16579-V1.
    
    (cherry picked from commit 3a60be4b15a5ab9b6e0c4839df99dac7738aa7fe)
    Signed-off-by: Felix Cheung <[email protected]>

commit 3258f27a881dfeb5ab8bae90c338603fa4b6f9d8
Author: hyukjinkwon <[email protected]>
Date:   2016-08-25T04:19:35Z

    [SPARK-16216][SQL][BRANCH-2.0] Backport Read/write 
dateFormat/timestampFormat options for CSV and JSON
    
    ## What changes were proposed in this pull request?
    
    This PR backports https://github.com/apache/spark/pull/14279 to 2.0.
    
    ## How was this patch tested?
    
    Unit tests were added in `CSVSuite` and `JsonSuite`. For JSON, existing 
tests cover the default cases.
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #14799 from HyukjinKwon/SPARK-16216-json-csv-backport.

commit aa57083af4cecb595bac09e437607d7142b54913
Author: Sameer Agarwal <[email protected]>
Date:   2016-08-25T04:24:24Z

    [SPARK-17228][SQL] Not infer/propagate non-deterministic constraints
    
    ## What changes were proposed in this pull request?
    
    Given that filters based on non-deterministic constraints shouldn't be 
pushed down in the query plan, unnecessarily inferring them is confusing and a 
source of potential bugs. This patch simplifies the inferring logic by simply 
ignoring them.
    
    ## How was this patch tested?
    
    Added a new test in `ConstraintPropagationSuite`.
    
    Author: Sameer Agarwal <[email protected]>
    
    Closes #14795 from sameeragarwal/deterministic-constraints.
    
    (cherry picked from commit ac27557eb622a257abeb3e8551f06ebc72f87133)
    Signed-off-by: Reynold Xin <[email protected]>

commit c1c498006849a7a0a785bc84316e7f494da5f8a8
Author: Sean Owen <[email protected]>
Date:   2016-08-25T08:45:49Z

    [SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocationInfo 
== null
    
    ## What changes were proposed in this pull request?
    
    Handle null from Hadoop getLocationInfo directly instead of catching (and 
logging) exception
    
    ## How was this patch tested?
    
    Jenkins tests
    
    Author: Sean Owen <[email protected]>
    
    Closes #14760 from srowen/SPARK-17193.
    
    (cherry picked from commit 2bcd5d5ce3eaf0eb1600a12a2b55ddb40927533b)
    Signed-off-by: Sean Owen <[email protected]>

commit fb1c697143a5bb2df69d9f2c9cbddc4eb526f047
Author: Liwei Lin <[email protected]>
Date:   2016-08-25T09:24:40Z

    [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of unsafe-backed data
    
    Currently `MapObjects` does not make copies of unsafe-backed data, leading 
to problems like 
[SPARK-17061](https://issues.apache.org/jira/browse/SPARK-17061) 
[SPARK-17093](https://issues.apache.org/jira/browse/SPARK-17093).
    
    This patch makes `MapObjects` make copies of unsafe-backed data.
    
    Generated code - prior to this patch:
    ```java
    ...
    /* 295 */ if (isNull12) {
    /* 296 */   convertedArray1[loopIndex1] = null;
    /* 297 */ } else {
    /* 298 */   convertedArray1[loopIndex1] = value12;
    /* 299 */ }
    ...
    ```
    
    Generated code - after this patch:
    ```java
    ...
    /* 295 */ if (isNull12) {
    /* 296 */   convertedArray1[loopIndex1] = null;
    /* 297 */ } else {
    /* 298 */   convertedArray1[loopIndex1] = value12 instanceof UnsafeRow? value12.copy() : value12;
    /* 299 */ }
    ...
    ```
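    
    Why the copy matters can be shown with a small stand-alone analogy. In the
    sketch below, `MutableRow` is a hypothetical stand-in for an unsafe-backed
    row whose storage is overwritten on every iteration; it is not a Spark
    class, and `ReusedBufferDemo` is likewise made up for the example.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    public class ReusedBufferDemo {
        /** Hypothetical stand-in for a row backed by a reused buffer. */
        static final class MutableRow {
            int value;
    
            MutableRow copy() {
                MutableRow r = new MutableRow();
                r.value = value;
                return r;
            }
        }
    
        /** Collects n rows produced through one reused MutableRow instance. */
        static List<MutableRow> collect(int n, boolean copy) {
            MutableRow reused = new MutableRow();
            List<MutableRow> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                reused.value = i; // overwrites the shared storage
                out.add(copy ? reused.copy() : reused);
            }
            return out;
        }
    }
    ```
    
    Without the copy, every collected element refers to the same object and
    therefore shows the last value written; with the copy, each element keeps
    the value it had when it was collected.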
    
    Add a new test case which would fail without this patch.
    
    Author: Liwei Lin <[email protected]>
    
    Closes #14698 from lw-lin/mapobjects-copy.
    
    (cherry picked from commit e0b20f9f24d5c3304bf517a4dcfb0da93be5bc75)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 88481ea2169e0813cfc326eb1440ddaaf3110f4a
Author: Herman van Hovell <[email protected]>
Date:   2016-08-25T09:48:13Z

    Revert "[SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of unsafe-backed data"
    
    This reverts commit fb1c697143a5bb2df69d9f2c9cbddc4eb526f047.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
