GitHub user elviento opened a pull request:

    https://github.com/apache/spark/pull/17262

    [SPARK-17261][SQL] Fixed missing closing brace in spark/sql/DataFrameSuite.scala

    ## What changes were proposed in this pull request?
    
    Fixed a missing closing brace at line 1704 of DataFrameSuite.scala on branch-2.0, which was found while running ./dev/make-distribution.sh.
    
    /spark/sql/core/target/scala-2.11/test-classes...
    
/spark/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:1704: 
Missing closing brace `}' assumed here
    [error] }
    [error] ^
    [error] one error found
    [error] Compile failed at Mar 11, 2017 2:36:12 PM [0.610s]
    
    ## How was this patch tested?
    
    Tested by recompiling the test classes under:
    $SPARK_SRC/spark/sql/core/target/scala-2.11/test-classes...
    
    Successful build:
    $SPARK_SRC/dev/make-distribution.sh --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/elviento/spark fix-dataframesuite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17262
    
----
commit f4594900d86bb39358ff19047dfa8c1e4b78aa6b
Author: Andrew Mills <[email protected]>
Date:   2016-09-26T20:41:10Z

    [Docs] Update spark-standalone.md to fix link
    
    Corrected a link to the configuration.html page; it was pointing to a page that does not exist (configurations.html).
    
    Documentation change, verified in preview.
    
    Author: Andrew Mills <[email protected]>
    
    Closes #15244 from ammills01/master.
    
    (cherry picked from commit 00be16df642317137f17d2d7d2887c41edac3680)
    Signed-off-by: Andrew Or <[email protected]>

commit 98bbc4410181741d903a703eac289408cb5b2c5e
Author: Josh Rosen <[email protected]>
Date:   2016-09-27T21:14:27Z

    [SPARK-17618] Guard against invalid comparisons between UnsafeRow and other 
formats
    
    This patch ports changes from #15185 to Spark 2.x. That patch fixed a correctness bug in Spark 1.6.x caused by an invalid `equals()` comparison between an `UnsafeRow` and a row of a different format. Spark 2.x is not affected by that specific correctness bug, but it can still reap the error-prevention benefits of those changes, which modify `UnsafeRow.equals()` to throw an IllegalArgumentException if it is called with an object that is not an `UnsafeRow`.
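    For illustration, a minimal Scala sketch of the guard pattern described above
    (`UnsafeRowSketch` is a made-up stand-in, not the actual Spark class):
    
    ```scala
    // Sketch only: byte-wise comparison is valid solely between two UnsafeRow-style
    // rows, so equals() fails loudly for any other row format instead of silently
    // returning false.
    class UnsafeRowSketch(val bytes: Array[Byte]) {
      override def equals(other: Any): Boolean = other match {
        case o: UnsafeRowSketch => java.util.Arrays.equals(bytes, o.bytes)
        case null => false
        case _ =>
          throw new IllegalArgumentException(
            s"Cannot compare UnsafeRowSketch to ${other.getClass.getName}")
      }
      override def hashCode(): Int = java.util.Arrays.hashCode(bytes)
    }
    ```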
    
    Author: Josh Rosen <[email protected]>
    
    Closes #15265 from JoshRosen/SPARK-17618-master.
    
    (cherry picked from commit 2f84a686604b298537bfd4d087b41594d2aa7ec6)
    Signed-off-by: Josh Rosen <[email protected]>

commit 2cd327ef5e4c3f6b8468ebb2352479a1686b7888
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-09-27T23:00:39Z

    [SPARK-17056][CORE] Fix a wrong assert regarding unroll memory in 
MemoryStore
    
    ## What changes were proposed in this pull request?
    
    There is an assert in MemoryStore's putIteratorAsValues method that checks that unroll memory is not released too aggressively. This assert looks wrong.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #14642 from viirya/fix-unroll-memory.
    
    (cherry picked from commit e7bce9e1876de6ee975ccc89351db58119674aef)
    Signed-off-by: Josh Rosen <[email protected]>

commit 1b02f8820ddaf3f2a0e7acc9a7f27afc20683cca
Author: Josh Rosen <[email protected]>
Date:   2016-09-28T07:59:00Z

    [SPARK-17666] Ensure that RecordReaders are closed by data source file 
scans (backport)
    
    This is a branch-2.0 backport of #15245.
    
    ## What changes were proposed in this pull request?
    
    This patch addresses a potential cause of resource leaks in data source file scans. As reported in [SPARK-17666](https://issues.apache.org/jira/browse/SPARK-17666), tasks which do not fully consume their input may cause file handles / network connections (e.g. S3 connections) to be leaked. Spark's `NewHadoopRDD` uses a TaskContext callback to [close its record readers](https://github.com/apache/spark/blame/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L208), but the new data source file scans will only close record readers once their iterators are fully consumed.
    
    This patch modifies `RecordReaderIterator` and `HadoopFileLinesReader` to 
add `close()` methods and modifies all six implementations of 
`FileFormat.buildReader()` to register TaskContext task completion callbacks to 
guarantee that cleanup is eventually performed.
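    As a rough sketch of the callback idea (the `LineReader` type below is hypothetical;
    the actual change touches `RecordReaderIterator`, `HadoopFileLinesReader`, and the
    `FileFormat.buildReader()` implementations):
    
    ```scala
    import org.apache.spark.TaskContext
    
    // Hypothetical reader standing in for a Hadoop RecordReader wrapper.
    class LineReader(path: String) extends Iterator[String] with AutoCloseable {
      private val source = scala.io.Source.fromFile(path)
      private val lines = source.getLines()
      override def hasNext: Boolean = lines.hasNext
      override def next(): String = lines.next()
      override def close(): Unit = source.close()
    }
    
    def openReader(path: String): Iterator[String] = {
      val reader = new LineReader(path)
      // Close the reader when the task completes, whether or not the iterator
      // was fully consumed (e.g. because of a limit or an early failure).
      Option(TaskContext.get()).foreach { ctx =>
        ctx.addTaskCompletionListener { _ => reader.close() }
      }
      reader
    }
    ```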
    
    ## How was this patch tested?
    
    Tested manually for now.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #15271 from JoshRosen/SPARK-17666-backport.

commit 4d73d5cd82ebc980f996c78f9afb8a97418ab7ab
Author: hyukjinkwon <[email protected]>
Date:   2016-09-28T10:19:04Z

    [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation
    
    ## What changes were proposed in this pull request?
    
    This PR proposes to fix wrongly indented examples in the PySpark documentation.
    
    ```
    -        >>> json_sdf = spark.readStream.format("json")\
    -                                       .schema(sdf_schema)\
    -                                       .load(tempfile.mkdtemp())
    +        >>> json_sdf = spark.readStream.format("json") \\
    +        ...     .schema(sdf_schema) \\
    +        ...     .load(tempfile.mkdtemp())
    ```
    
    ```
    -        people.filter(people.age > 30).join(department, people.deptId == 
department.id)\
    +        people.filter(people.age > 30).join(department, people.deptId == 
department.id) \\
    ```
    
    ```
    -        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, 1.23), 
(2, 4.56)])), \
    -                        LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    +        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, 1.23), 
(2, 4.56)])),
    +        ...             LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    ```
    
    ```
    -        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, -1.23), 
(2, 4.56e-7)])), \
    -                        LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    +        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, -1.23), 
(2, 4.56e-7)])),
    +        ...             LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    ```
    
    ```
    -        ...      for x in iterator:
    -        ...           print(x)
    +        ...     for x in iterator:
    +        ...          print(x)
    ```
    
    ## How was this patch tested?
    
    Manually tested.
    
    **Before**
    
    ![2016-09-26 8 36 
02](https://cloud.githubusercontent.com/assets/6477701/18834471/05c7a478-8431-11e6-94bb-09aa37b12ddb.png)
    
    ![2016-09-26 9 22 
16](https://cloud.githubusercontent.com/assets/6477701/18834472/06c8735c-8431-11e6-8775-78631eab0411.png)
    
    <img width="601" alt="2016-09-27 2 29 27" 
src="https://cloud.githubusercontent.com/assets/6477701/18861294/29c0d5b4-84bf-11e6-99c5-3c9d913c125d.png";>
    
    <img width="1056" alt="2016-09-27 2 29 58" 
src="https://cloud.githubusercontent.com/assets/6477701/18861298/31694cd8-84bf-11e6-9e61-9888cb8c2089.png";>
    
    <img width="1079" alt="2016-09-27 2 30 05" 
src="https://cloud.githubusercontent.com/assets/6477701/18861301/359722da-84bf-11e6-97f9-5f5365582d14.png";>
    
    **After**
    
    ![2016-09-26 9 29 
47](https://cloud.githubusercontent.com/assets/6477701/18834467/0367f9da-8431-11e6-86d9-a490d3297339.png)
    
    ![2016-09-26 9 30 
24](https://cloud.githubusercontent.com/assets/6477701/18834463/f870fae0-8430-11e6-9482-01fc47898492.png)
    
    <img width="515" alt="2016-09-27 2 28 19" 
src="https://cloud.githubusercontent.com/assets/6477701/18861305/3ff88b88-84bf-11e6-902c-9f725e8a8b10.png";>
    
    <img width="652" alt="2016-09-27 3 50 59" 
src="https://cloud.githubusercontent.com/assets/6477701/18863053/592fbc74-84ca-11e6-8dbf-99cf57947de8.png";>
    
    <img width="709" alt="2016-09-27 3 51 03" 
src="https://cloud.githubusercontent.com/assets/6477701/18863060/601607be-84ca-11e6-80aa-a401df41c321.png";>
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #15242 from HyukjinKwon/minor-example-pyspark.
    
    (cherry picked from commit 2190037757a81d3172f75227f7891d968e1f0d90)
    Signed-off-by: Sean Owen <[email protected]>

commit 4c694e452278e46231720e778a80c586b9e565f1
Author: w00228970 <[email protected]>
Date:   2016-09-28T19:02:59Z

    [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch 
failure
    
    | Time        |Thread 1 ,  Job1          | Thread 2 ,  Job2  |
    |:-------------:|:-------------:|:-----:|
    | 1 | abort stage due to FetchFailed |  |
    | 2 | failedStages += failedStage |    |
    | 3 |      |  task failed due to  FetchFailed |
    | 4 |      |  can not post ResubmitFailedStages because failedStages is not 
empty |
    
    Then Job 2 on Thread 2 never resubmits the failed stage and hangs.
    
    We should not add the stage to failedStages when aborting a stage for a fetch failure.
    
    Added a unit test.
    
    Author: w00228970 <[email protected]>
    Author: wangfei <[email protected]>
    
    Closes #15213 from scwf/dag-resubmit.
    
    (cherry picked from commit 46d1203bf2d01b219c4efc7e0e77a844c0c664da)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit d358298f1082edd31489a1b08f428c8e60278d69
Author: Eric Liang <[email protected]>
Date:   2016-09-28T23:19:06Z

    [SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan 
(backport)
    
    This backports https://github.com/apache/spark/pull/15273 to branch-2.0
    
    Also verified the test passes after the patch was applied. rxin
    
    Author: Eric Liang <[email protected]>
    
    Closes #15282 from ericl/spark-17673-2.

commit 0a69477a10adb3969a20ae870436299ef5152788
Author: Herman van Hovell <[email protected]>
Date:   2016-09-28T23:25:10Z

    [SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.
    
    ## What changes were proposed in this pull request?
    We added native versions of `collect_set` and `collect_list` in Spark 2.0. These currently also (try to) collect null values, which is different from the original Hive implementation. This PR fixes this by adding a null check to the `Collect.update` method.
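    A minimal sketch of the resulting behaviour (illustrative only, assuming a local
    SparkSession):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{collect_list, collect_set}
    
    val spark = SparkSession.builder().master("local[*]").appName("collect-nulls").getOrCreate()
    import spark.implicits._
    
    // The column contains two nulls; with this fix they are skipped rather than
    // collected, matching the original Hive behaviour.
    val df = Seq(Some(1), None, Some(2), None, Some(2)).map(Tuple1.apply).toDF("a")
    df.agg(collect_list($"a"), collect_set($"a")).show(truncate = false)
    // collect_list(a) = [1, 2, 2]; collect_set(a) = [1, 2] (element order may vary)
    ```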
    
    ## How was this patch tested?
    Added a regression test to `DataFrameAggregateSuite`.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #15208 from hvanhovell/SPARK-17641.
    
    (cherry picked from commit 7d09232028967978d9db314ec041a762599f636b)
    Signed-off-by: Reynold Xin <[email protected]>

commit 933d2c1ea4e5f5c4ec8d375b5ccaa4577ba4be38
Author: Patrick Wendell <[email protected]>
Date:   2016-09-28T23:27:45Z

    Preparing Spark release v2.0.1-rc4

commit 7d612a7d5277183d3bee3882a687c76dc8ea0e9a
Author: Patrick Wendell <[email protected]>
Date:   2016-09-28T23:27:54Z

    Preparing development version 2.0.2-SNAPSHOT

commit ca8130050964fac8baa568918f0b67c44a7a2518
Author: Takeshi YAMAMURO <[email protected]>
Date:   2016-09-29T12:26:03Z

    [MINOR][DOCS] Fix the docs of spark-streaming with kinesis
    
    ## What changes were proposed in this pull request?
    This PR just fixes the documentation for `spark-kinesis-integration`. Since `SPARK-17418` prevented all the Kinesis artifacts (including the Kinesis example code) from being published, `bin/run-example streaming.KinesisWordCountASL` and `bin/run-example streaming.JavaKinesisWordCountASL` do not work. Instead, users need to fetch the Kinesis jar from Spark Packages.
    
    Author: Takeshi YAMAMURO <[email protected]>
    
    Closes #15260 from maropu/DocFixKinesis.
    
    (cherry picked from commit b2e9731ca494c0c60d571499f68bb8306a3c9fe5)
    Signed-off-by: Sean Owen <[email protected]>

commit 7ffafa3bfecb8bc92b79eddea1ca18166efd3385
Author: 蒋星博 <[email protected]>
Date:   2016-07-13T16:21:27Z

    [SPARK-16343][SQL] Improve the PushDownPredicate rule to push down predicates correctly under non-deterministic conditions.
    
    ## What changes were proposed in this pull request?
    
    Currently our optimizer may reorder predicates to run them more efficiently, but when the condition is non-deterministic, changing the order between the deterministic parts and the non-deterministic parts may change the number of input rows. For example:
    ```SELECT a FROM t WHERE rand() < 0.1 AND a = 1```
    and
    ```SELECT a FROM t WHERE a = 1 AND rand() < 0.1```
    may call rand() a different number of times and therefore produce different output rows.
    
    This PR improves the rule by only pushing down a predicate if it appears before any non-deterministic predicates.
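    A minimal sketch of the ordering rule (plain Scala, not the actual optimizer code):
    only the deterministic predicates that appear before the first non-deterministic
    predicate are safe to push down or reorder.
    
    ```scala
    // `Predicate` here is a stand-in for Catalyst's Expression.
    final case class Predicate(sql: String, deterministic: Boolean)
    
    // Take the prefix of deterministic predicates; everything from the first
    // non-deterministic predicate onwards must keep its position.
    def splitForPushdown(predicates: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
      predicates.span(_.deterministic)
    
    val (pushable, kept) = splitForPushdown(Seq(
      Predicate("a = 1", deterministic = true),
      Predicate("rand() < 0.1", deterministic = false),
      Predicate("b = 2", deterministic = true)))
    // pushable: Seq(a = 1)   kept: Seq(rand() < 0.1, b = 2)
    ```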
    
    ## How was this patch tested?
    
    Expanded the related test cases in FilterPushdownSuite.
    
    Author: 蒋星博 <[email protected]>
    
    Closes #14012 from jiangxb1987/ppd.
    
    (cherry picked from commit f376c37268848dbb4b2fb57677e22ef2bf207b49)
    Signed-off-by: Josh Rosen <[email protected]>

commit f7839e47c3bda86d61c3b2be72c168aab4a5674f
Author: Josh Rosen <[email protected]>
Date:   2016-09-29T02:03:05Z

    [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath 
aggregates
    
    ## What changes were proposed in this pull request?
    
    This patch fixes a minor correctness issue impacting the pushdown of 
filters beneath aggregates. Specifically, if a filter condition references no 
grouping or aggregate columns (e.g. `WHERE false`) then it would be incorrectly 
pushed beneath an aggregate.
    
    Intuitively, the only case where you can push a filter beneath an aggregate 
is when that filter is deterministic and is defined over the grouping columns / 
expressions, since in that case the filter is acting to exclude entire groups 
from the query (like a `HAVING` clause). The existing code would only push 
deterministic filters beneath aggregates when all of the filter's references 
were grouping columns, but this logic missed the case where a filter has no 
references. For example, `WHERE false` is deterministic but is independent of 
the actual data.
    
    This patch fixes this minor bug by adding a new check to ensure that we 
don't push filters beneath aggregates when those filters don't reference any 
columns.
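    A small illustration of why the check matters (a sketch assuming a local SparkSession;
    not the regression test from this PR):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-17712-sketch").getOrCreate()
    spark.range(10).toDF("a").createOrReplaceTempView("t")
    
    // The data-independent filter must stay above the aggregate:
    spark.sql("SELECT * FROM (SELECT count(*) AS c FROM t) sub WHERE false").show()
    // 0 rows: the filter removes the single global-aggregate row.
    
    // What an (incorrect) pushdown beneath the aggregate would be equivalent to:
    spark.sql("SELECT count(*) AS c FROM (SELECT * FROM t WHERE false) sub").show()
    // 1 row with c = 0: a global aggregate over empty input still emits a row.
    ```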
    
    ## How was this patch tested?
    
    New regression test in FilterPushdownSuite.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #15289 from JoshRosen/SPARK-17712.
    
    (cherry picked from commit 37eb9184f1e9f1c07142c66936671f4711ef407d)
    Signed-off-by: Josh Rosen <[email protected]>

commit 7c9450b007205958984f39a881415cdbe75e0c34
Author: Gang Wu <[email protected]>
Date:   2016-09-29T19:51:05Z

    [SPARK-17672] Spark 2.0 history server web Ui takes too long for a single 
application
    
    Added a new API getApplicationInfo(appId: String) to ApplicationHistoryProvider and SparkUI to get application info. With this change, FsHistoryProvider can fetch the info for a single application in O(1) time, compared to O(n) before the change, which used an Iterator.find() interface.
    
    Both the ApplicationCache and OneApplicationResource classes adopt this new API.
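    Conceptually (a sketch with made-up types, not the actual provider code), the lookup
    changes from a linear scan to a keyed lookup:
    
    ```scala
    final case class AppInfoSketch(id: String, name: String)
    
    val listing: Map[String, AppInfoSketch] = Map(
      "app-1" -> AppInfoSketch("app-1", "etl"),
      "app-2" -> AppInfoSketch("app-2", "reporting"))
    
    // Before: O(n) scan over every known application for each page load.
    def findLinear(appId: String): Option[AppInfoSketch] =
      listing.values.find(_.id == appId)
    
    // After: O(1) keyed lookup, in the spirit of the new getApplicationInfo API.
    def getApplicationInfo(appId: String): Option[AppInfoSketch] =
      listing.get(appId)
    ```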
    
     Manual tests.
    
    Author: Gang Wu <[email protected]>
    
    Closes #15247 from wgtmac/SPARK-17671.
    
    (cherry picked from commit cb87b3ced9453b5717fa8e8637b97a2f3f25fdd7)
    Signed-off-by: Andrew Or <[email protected]>

commit 0cdd7370a61618d042417ee387a3c32ee5c924e6
Author: Bjarne Fruergaard <[email protected]>
Date:   2016-09-29T22:39:57Z

    [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with 
SparseVector
    
    ## What changes were proposed in this pull request?
    
    * Changes the implementation of gemv with a transposed SparseMatrix and a SparseVector, both in mllib-local and mllib (identical changes).
    * Adds a test that fails before this change but succeeds with it.
    
    The problem in the previous implementation was that it only incremented `i`, the index enumerating the columns of a row in the SparseMatrix, when the row index of the vector matched the column index of the SparseMatrix. In cases where a particular row of the SparseMatrix has non-zero values at column indices lower than the corresponding non-zero row indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column index at index `i`, and the remaining column indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.
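    For reference, the affected operation looks like this in the mllib-local API
    (illustrative values, not the PR's test case):
    
    ```scala
    import org.apache.spark.ml.linalg.{Matrices, Vectors}
    
    // 3x2 CSC sparse matrix:   1 0
    //                          0 2
    //                          3 0
    val m = Matrices.sparse(3, 2, Array(0, 2, 3), Array(0, 2, 1), Array(1.0, 3.0, 2.0))
    
    // Transposed (2x3) times a sparse vector whose only non-zero entry is at index 2,
    // i.e. a matrix row has non-zeros at lower column indices than the vector's
    // non-zero row index -- the pattern the old gemv path mishandled.
    val v = Vectors.sparse(3, Array(2), Array(5.0))
    println(m.transpose.multiply(v))
    // Mathematically: [1*0 + 0*0 + 3*5, 0*0 + 2*0 + 0*5] = [15.0, 0.0]
    ```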
    
    ## How was this patch tested?
    
    I have run the specific `gemv` tests in both mllib-local and mllib. I am 
currently still running `./dev/run-tests`.
    
    ## ___
    As per instructions, I hereby state that this is my original work and that 
I license the work to the project (Apache Spark) under the project's open 
source license.
    
    Mentioning dbtsai, viirya and brkyvz whom I can see have worked/authored on 
these parts before.
    
    Author: Bjarne Fruergaard <[email protected]>
    
    Closes #15296 from bwahlgreen/bugfix-spark-17721.
    
    (cherry picked from commit 29396e7d1483d027960b9a1bed47008775c4253e)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit a99ea4c9e0e2f91e4b524987788f0acee88e564d
Author: Bryan Cutler <[email protected]>
Date:   2016-09-29T23:31:30Z

    Updated the following PR with minor changes to allow cherry-pick to 
branch-2.0
    
    [SPARK-17697][ML] Fixed bug in summary calculations that pattern match 
against label without casting
    
    When calling LogisticRegression.evaluate or GeneralizedLinearRegression.evaluate with a Dataset whose label column is not of double type, the summary calculations pattern match against a double and throw a MatchError. This fix casts the label column to DoubleType to ensure there is no MatchError.
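    The idea, roughly (a sketch, not the patch itself):
    
    ```scala
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.DoubleType
    
    // Cast the label column up front so downstream summary code that pattern
    // matches on Double never sees an Int/Long/Float label.
    def withDoubleLabel(dataset: DataFrame, labelCol: String = "label"): DataFrame =
      dataset.withColumn(labelCol, col(labelCol).cast(DoubleType))
    ```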
    
    Added unit tests that call evaluate with datasets whose label column has other numeric types.
    
    Author: Bryan Cutler <[email protected]>
    
    Closes #15288 from BryanCutler/binaryLOR-numericCheck-SPARK-17697.
    
    (cherry picked from commit 2f739567080d804a942cfcca0e22f91ab7cbea36)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 744aac8e6ff04d7a3f1e8ccad335605ac8fe2f29
Author: Dongjoon Hyun <[email protected]>
Date:   2016-10-01T05:05:59Z

    [MINOR][DOC] Add an up-to-date description for default serialization during 
shuffling
    
    ## What changes were proposed in this pull request?
    
    This PR aims to make the doc up to date. The documentation is generally correct, but after https://issues.apache.org/jira/browse/SPARK-13926, Spark started choosing Kryo as the default serialization library when shuffling simple types, arrays of simple types, or strings.
    
    ## How was this patch tested?
    
    This is a documentation update.
    
    Author: Dongjoon Hyun <[email protected]>
    
    Closes #15315 from dongjoon-hyun/SPARK-DOC-SERIALIZER.
    
    (cherry picked from commit 15e9bbb49e00b3982c428d39776725d0dea2cdfa)
    Signed-off-by: Reynold Xin <[email protected]>

commit b57e2acb134d94dafc81686da875c5dd3ea35c74
Author: Jagadeesan <[email protected]>
Date:   2016-10-03T09:46:38Z

    [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
    
    ## What changes were proposed in this pull request?
    
    To build R docs (which are built when R tests are run), users need to 
install pandoc and rmarkdown. This was done for Jenkins in 
~~[SPARK-17420](https://issues.apache.org/jira/browse/SPARK-17420)~~
    
    … pandoc]
    
    Author: Jagadeesan <[email protected]>
    
    Closes #15309 from jagadeesanas2/SPARK-17736.
    
    (cherry picked from commit a27033c0bbaae8f31db9b91693947ed71738ed11)
    Signed-off-by: Sean Owen <[email protected]>

commit 613863b116b6cbc9ac83845c68a2d11b3b02f7cb
Author: zero323 <[email protected]>
Date:   2016-10-04T00:57:54Z

    [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow 
__getitem__ contract
    
    ## What changes were proposed in this pull request?
    
    Replaces `ValueError` with `IndexError` when the index passed to `ml` / `mllib` `SparseVector.__getitem__` is out of range. This ensures correct iteration behavior.
    
    Replaces `ValueError` with `IndexError` for `DenseMatrix` and `SparseMatrix` in `ml` / `mllib`.
    
    ## How was this patch tested?
    
    PySpark `ml` / `mllib` unit tests. Additional unit tests to prove that the 
problem has been resolved.
    
    Author: zero323 <[email protected]>
    
    Closes #15144 from zero323/SPARK-17587.
    
    (cherry picked from commit d8399b600cef706c22d381b01fab19c610db439a)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 5843932021cc8bbe0277943c6c480cfeae1b29e2
Author: Herman van Hovell <[email protected]>
Date:   2016-10-04T02:32:59Z

    [SPARK-17753][SQL] Allow a complex expression as the input of a value-based case statement
    
    ## What changes were proposed in this pull request?
    We currently only allow relatively simple expressions as the input for a value-based case statement. Expressions like `case (a > 1) or (b = 2) when true then 1 when false then 0 end` currently fail. This PR adds support for such expressions.
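    For example, with this change a query of the following shape is accepted (a sketch
    assuming a local SparkSession and a small table `t` with integer columns `a` and `b`):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-17753-sketch").getOrCreate()
    import spark.implicits._
    
    Seq((2, 1), (0, 2), (0, 0)).toDF("a", "b").createOrReplaceTempView("t")
    
    // The boolean expression (a > 1) OR (b = 2) is now allowed as the CASE input value.
    spark.sql(
      "SELECT CASE (a > 1) OR (b = 2) WHEN true THEN 1 WHEN false THEN 0 END AS flag FROM t"
    ).show()
    // Expected flags: 1, 1, 0
    ```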
    
    ## How was this patch tested?
    Added a test to the ExpressionParserSuite.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #15322 from hvanhovell/SPARK-17753.
    
    (cherry picked from commit 2bbecdec2023143fd144e4242ff70822e0823986)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 7429199e5b34d5594e3fcedb57eda789d16e26f3
Author: Dongjoon Hyun <[email protected]>
Date:   2016-10-04T04:28:16Z

    [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException 
in Thriftserver
    
    ## What changes were proposed in this pull request?
    
    Currently, Spark Thrift Server raises `IllegalArgumentException` for 
queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT 
if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2.
    
    **Before**
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Error: java.lang.IllegalArgumentException: Unrecognized type name: null 
(state=,code=0)
    Closing: 0: jdbc:hive2://localhost:10000
    
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Error: java.lang.IllegalArgumentException: Unrecognized type name: null 
(state=,code=0)
    Closing: 0: jdbc:hive2://localhost:10000
    ```
    
    **After**
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    +-------+--+
    | NULL  |
    +-------+--+
    | NULL  |
    +-------+--+
    1 row selected (3.242 seconds)
    Beeline version 1.2.1.spark2 by Apache Hive
    Closing: 0: jdbc:hive2://localhost:10000
    
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    Connecting to jdbc:hive2://localhost:10000
    Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
    Driver: Hive JDBC (version 1.2.1.spark2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    +-------------------------+--+
    | (IF(true, NULL, NULL))  |
    +-------------------------+--+
    | NULL                    |
    +-------------------------+--+
    1 row selected (0.201 seconds)
    Beeline version 1.2.1.spark2 by Apache Hive
    Closing: 0: jdbc:hive2://localhost:10000
    ```
    
    ## How was this patch tested?
    
    * Pass the Jenkins tests with a new test suite.
    * Also, manually: after starting the Spark Thrift Server, run the following commands.
    ```sql
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
    $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
    ```
    
    **Hive 1.2**
    ```sql
    hive> create table null_table as select null;
    hive> desc null_table;
    OK
    _c0                     void
    ```
    
    Author: Dongjoon Hyun <[email protected]>
    
    Closes #15325 from dongjoon-hyun/SPARK-17112.
    
    (cherry picked from commit c571cfb2d0e1e224107fc3f0c672730cae9804cb)
    Signed-off-by: Reynold Xin <[email protected]>

commit 3dbe8097facb854195729da7bd577f6c14eb2b2a
Author: ding <[email protected]>
Date:   2016-10-04T07:00:10Z

    [SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer
    
    ## What changes were proposed in this pull request?
    When using PeriodicGraphCheckpointer to persist a graph, sometimes the edges are not persisted. Currently the graph is only persisted when the vertices' storage level is none, but there is a chance that the vertices' storage level is not none while the edges' is none. E.g. for a graph created by an outerJoinVertices operation, the vertices are automatically cached while the edges are not. In that case the edges will not be persisted if we use PeriodicGraphCheckpointer to persist the graph. We need to check the edges' storage level separately and persist them if it is none.
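    The gist of the fix, as a sketch (GraphX API, simplified from the actual checkpointer code):
    
    ```scala
    import org.apache.spark.graphx.Graph
    import org.apache.spark.storage.StorageLevel
    
    // Check vertices and edges separately instead of assuming that a cached
    // vertex RDD implies a cached edge RDD.
    def persistGraphIfNeeded[VD, ED](graph: Graph[VD, ED]): Unit = {
      if (graph.vertices.getStorageLevel == StorageLevel.NONE) {
        graph.vertices.persist()
      }
      if (graph.edges.getStorageLevel == StorageLevel.NONE) {
        graph.edges.persist()
      }
    }
    ```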
    
    ## How was this patch tested?
     Manual tests.
    
    Author: ding <[email protected]>
    
    Closes #15124 from dding3/spark-persisitEdge.
    
    (cherry picked from commit 126baa8d32bc0e7bf8b43f9efa84f2728f02347d)
    Signed-off-by: Joseph K. Bradley <[email protected]>

commit 50f6be7598547fed5190a920fd3cebb4bc908524
Author: Felix Cheung <[email protected]>
Date:   2016-10-04T16:22:26Z

    [SPARKR][DOC] minor formatting and output cleanup for R vignettes
    
    Clean up output, format table, truncate long example output, hide warnings
    
    (new - Left; existing - Right)
    
![image](https://cloud.githubusercontent.com/assets/8969467/19064018/5dcde4d0-89bc-11e6-857b-052df3f52a4e.png)
    
    
![image](https://cloud.githubusercontent.com/assets/8969467/19064034/6db09956-89bc-11e6-8e43-232d5c3fe5e6.png)
    
    
![image](https://cloud.githubusercontent.com/assets/8969467/19064058/88f09590-89bc-11e6-9993-61639e29dfdd.png)
    
    
![image](https://cloud.githubusercontent.com/assets/8969467/19064066/95ccbf64-89bc-11e6-877f-45af03ddcadc.png)
    
    
![image](https://cloud.githubusercontent.com/assets/8969467/19064082/a8445404-89bc-11e6-8532-26d8bc9b206f.png)
    
    Run create-doc.sh manually
    
    Author: Felix Cheung <[email protected]>
    
    Closes #15340 from felixcheung/vignettes.
    
    (cherry picked from commit 068c198e956346b90968a4d74edb7bc820c4be28)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit a9165bb1b704483ad16331945b0968cbb1a97139
Author: Marcelo Vanzin <[email protected]>
Date:   2016-10-04T16:38:44Z

    [SPARK-17549][SQL] Only collect table size stat in driver for cached 
relation.
    
    This reverts commit 9ac68dbc5720026ea92acc61d295ca64d0d3d132. Turns out
    the original fix was correct.
    
    Original change description:
    The existing code caches all stats for all columns for each partition
    in the driver; for a large relation, this causes extreme memory usage,
    which leads to gc hell and application failures.
    
    It seems that only the size in bytes of the data is actually used in the
    driver, so instead just collect that. In executors, the full stats are
    still kept, but that's not a big problem; we expect the data to be distributed
    and thus not incur too much memory pressure in each individual executor.
    
    There are also potential improvements on the executor side, since the data
    being stored currently is very wasteful (e.g. storing boxed types vs.
    primitive types for stats). But that's a separate issue.
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #15304 from vanzin/SPARK-17549.2.
    
    (cherry picked from commit 8d969a2125d915da1506c17833aa98da614a257f)
    Signed-off-by: Marcelo Vanzin <[email protected]>

commit a4f7df423e1e0aa512dfc496bc9de13831eae3f3
Author: Ergin Seyfe <[email protected]>
Date:   2016-10-04T19:39:01Z

    [SPARK-17773][BRANCH-2.0][Input/Output] Add VoidObjectInspector
    
    This is the PR for branch-2.0: PR https://github.com/apache/spark/pull/15337
    
    Added VoidObjectInspector to the list of PrimitiveObjectInspectors.
    
    Executing the following query was failing:
    
    select SOME_UDAF*(a.arr)
    from (
      select Array(null) as arr from dim_one_row
    ) a
    
    After the fix, I get the correct output:
    
    res0: Array[org.apache.spark.sql.Row] = Array([null])
    
    Author: Ergin Seyfe <eseyfefb.com>
    
    Closes #15337 from seyfe/add_void_object_inspector.
    
    Author: Ergin Seyfe <[email protected]>
    
    Closes #15345 from seyfe/add_void_object_inspector_2.0.

commit b8df2e53c38a30f51c710543c81279a59a9ab4fc
Author: Shixiong Zhu <[email protected]>
Date:   2016-10-05T21:54:55Z

    [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of 
BlockManagerSuite
    
    ## What changes were proposed in this pull request?
    
    Mock SparkContext to reduce memory usage of BlockManagerSuite
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15350 from zsxwing/SPARK-17778.
    
    (cherry picked from commit 221b418b1c9db7b04c600b6300d18b034a4f444e)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 3b6463a794a754d630d69398f009c055664dd905
Author: Herman van Hovell <[email protected]>
Date:   2016-10-05T23:05:30Z

    [SPARK-17758][SQL] Last returns wrong result in case of empty partition
    
    ## What changes were proposed in this pull request?
    The result of the `Last` function can be wrong when the last partition 
processed is empty. It can return `null` instead of the expected value. For 
example, this can happen when we process partitions in the following order:
    ```
    - Partition 1 [Row1, Row2]
    - Partition 2 [Row3]
    - Partition 3 []
    ```
    In this case the `Last` function will currently return null instead of the value of `Row3`.
    
    This PR fixes this by adding a `valueSet` flag to the `Last` function.
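    A plain-Scala sketch of the `valueSet` idea (not the Catalyst implementation):
    
    ```scala
    // The buffer only takes the right-hand value when that partition actually saw
    // a row, so an empty trailing partition can no longer reset the result to null.
    final case class LastBuffer[T](value: Option[T], valueSet: Boolean)
    
    def mergeLast[T](left: LastBuffer[T], right: LastBuffer[T]): LastBuffer[T] =
      if (right.valueSet) right else left
    
    val partitions = Seq(Seq(1, 2), Seq(3), Seq.empty[Int])
    val buffers = partitions.map(rows => LastBuffer(rows.lastOption, valueSet = rows.nonEmpty))
    val result = buffers.reduce((l, r) => mergeLast(l, r))
    // result.value == Some(3): the empty Partition 3 does not overwrite Row3.
    ```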
    
    ## How was this patch tested?
    We only used end-to-end tests for `DeclarativeAggregateFunction`s. I have added an evaluator for these functions so we can test them in catalyst, and a `LastTestSuite` to test the `Last` aggregate function.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #15348 from hvanhovell/SPARK-17758.
    
    (cherry picked from commit 5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3)
    Signed-off-by: Yin Huai <[email protected]>

commit 1c2dff1eeeb045f3f5c3c1423ba07371b03965d7
Author: Michael Armbrust <[email protected]>
Date:   2016-10-05T23:48:43Z

    [SPARK-17643] Remove comparable requirement from Offset (backport for 
branch-2.0)
    
    ## What changes were proposed in this pull request?
    
    Backport 
https://github.com/apache/spark/commit/988c71457354b0a443471f501cef544a85b1a76a 
to branch-2.0
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #15362 from zsxwing/SPARK-17643-2.0.

commit 225372adfb843afcbf9928db3989f2f8393ae6d8
Author: Reynold Xin <[email protected]>
Date:   2016-10-06T17:33:45Z

    [SPARK-17798][SQL] Remove redundant Experimental annotations in 
sql.streaming
    
    ## What changes were proposed in this pull request?
    I was looking through API annotations to catch mislabeled APIs, and 
realized DataStreamReader and DataStreamWriter classes are already annotated as 
Experimental, and as a result there is no need to annotate each method within 
them.
    
    ## How was this patch tested?
    N/A
    
    Author: Reynold Xin <[email protected]>
    
    Closes #15373 from rxin/SPARK-17798.
    
    (cherry picked from commit 79accf45ace5549caa0cbab02f94fc87bedb5587)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit a2bf09588ed98ef33028fcf4d72c15f06af2e9ad
Author: Shixiong Zhu <[email protected]>
Date:   2016-10-06T19:51:12Z

    [SPARK-17780][SQL] Report Throwable to user in StreamExecution
    
    ## What changes were proposed in this pull request?
    
    When using an incompatible source for Structured Streaming, Spark may throw a NoClassDefFoundError. It's better to just catch Throwable and report it to the user, since the streaming thread is dying anyway.
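    The pattern, roughly (a sketch, not the actual StreamExecution code):
    
    ```scala
    // The streaming thread is about to die anyway, so catch Throwable -- including
    // NoClassDefFoundError -- and surface it to the user instead of losing it.
    def runStream(body: => Unit)(reportError: Throwable => Unit): Unit = {
      try {
        body
      } catch {
        case t: Throwable =>
          reportError(t)
          throw t
      }
    }
    ```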
    
    ## How was this patch tested?
    
    `test("NoClassDefFoundError from an incompatible source")`
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15352 from zsxwing/SPARK-17780.
    
    (cherry picked from commit 9a48e60e6319d85f2c3be3a3c608dab135e18a73)
    Signed-off-by: Michael Armbrust <[email protected]>

----

