GitHub user zhuangxue opened a pull request:
https://github.com/apache/spark/pull/16181
Branch 2.1
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16181.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16181
----
commit dcbc4265839d8a6f0300af6bcc1e8d8c102ec23f
Author: Susan X. Huynh <[email protected]>
Date: 2016-11-05T17:45:15Z
[SPARK-17964][SPARKR] Enable SparkR with Mesos client mode and cluster mode
## What changes were proposed in this pull request?
Enabled SparkR with Mesos client mode and cluster mode. Just a few changes
were required to get this working on Mesos: (1) removed the SparkR-on-Mesos
error checks and (2) stopped requiring "--class" to be specified for R apps.
The logic to check spark.mesos.executor.home was already in there.
sun-rui
## How was this patch tested?
1. SparkSubmitSuite
2. On local mesos cluster (on laptop): ran SparkR shell, spark-submit
client mode, and spark-submit cluster mode, with the
"examples/src/main/R/dataframe.R" example application.
3. On multi-node mesos cluster: ran SparkR shell, spark-submit client mode,
and spark-submit cluster mode, with the "examples/src/main/R/dataframe.R"
example application. I tested with the following --conf values set:
spark.mesos.executor.docker.image and spark.mesos.executor.home
This contribution is my original work and I license the work to the project
under the project's open source license.
Author: Susan X. Huynh <[email protected]>
Closes #15700 from susanxhuynh/susan-r-branch.
(cherry picked from commit 9a87c313859a6557bbf7bca7239043cb77ea23be)
Signed-off-by: Sean Owen <[email protected]>
commit e9f1d4aaa472cf69165ffe75ed9b92618fa3900f
Author: hyukjinkwon <[email protected]>
Date: 2016-11-06T04:47:33Z
[MINOR][DOCUMENTATION] Fix some minor descriptions in functions
consistently with expressions
## What changes were proposed in this pull request?
This PR proposes to improve documentation and fix some descriptions
equivalent to several minor fixes identified in
https://github.com/apache/spark/pull/15677
Also, this suggests changing `Note:` and `NOTE:` to `.. note::`,
consistent with the others, which renders nicely.
## How was this patch tested?
Jenkins tests and manually.
For PySpark, changing `Note:` and `NOTE:` to `.. note::` renders the document as below:
**From**

*(screenshot omitted)*

**To**

*(screenshot omitted)*
Author: hyukjinkwon <[email protected]>
Closes #15765 from HyukjinKwon/minor-function-doc.
(cherry picked from commit 15d392688456ad9f963417843c52a7b610f771d2)
Signed-off-by: Felix Cheung <[email protected]>
commit c42301f1eb09565cfaa044b05984ed67879bd946
Author: sethah <[email protected]>
Date: 2016-11-06T05:38:07Z
[SPARK-18276][ML] ML models should copy the training summary and set parent
## What changes were proposed in this pull request?
Only some of the models which contain a training summary currently set the
summaries in the copy method. Linear/Logistic regression do, GLR, GMM, KM, and
BKM do not. Additionally, these copy methods did not set the parent pointer of
the copied model. This patch modifies the copy methods of the four models
mentioned above to copy the training summary and set the parent.
## How was this patch tested?
Add unit tests in Linear/Logistic/GeneralizedLinear regression and
GaussianMixture/KMeans/BisectingKMeans to check the parent pointer of the
copied model and check that the copied model has a summary.
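The intended copy semantics can be sketched outside Spark with a minimal, hypothetical `Model` class (illustrative Python, not the actual ML API):

```python
class Model:
    """Minimal sketch of an ML model holding a training summary and a parent estimator."""
    def __init__(self, uid, summary=None, parent=None):
        self.uid = uid
        self.summary = summary
        self.parent = parent

    def copy(self):
        # The fix: a copy must carry over both the training summary
        # and the parent pointer, not just the parameters.
        return Model(self.uid, summary=self.summary, parent=self.parent)

m = Model("kmeans_1", summary={"k": 3}, parent="estimator_1")
c = m.copy()
assert c.summary == m.summary and c.parent == m.parent
```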
Author: sethah <[email protected]>
Closes #15773 from sethah/SPARK-18276.
(cherry picked from commit 23ce0d1e91076d90c1a87d698a94d283d08cf899)
Signed-off-by: Yanbo Liang <[email protected]>
commit dcbf3fd4bd42059aed9c966d4f0cdf58815eb802
Author: hyukjinkwon <[email protected]>
Date: 2016-11-06T14:11:37Z
[SPARK-17854][SQL] rand/randn allows null/long as input seed
## What changes were proposed in this pull request?
This PR proposes that `rand`/`randn` accept `null` as input in Scala/SQL and
`LongType` as input in SQL. A `null` seed is treated as `0`.
So, this PR includes both changes below:
- `null` support
It seems MySQL also accepts this.
``` sql
mysql> select rand(0);
+---------------------+
| rand(0) |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
mysql> select rand(NULL);
+---------------------+
| rand(NULL) |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
```
Hive also does, according to
[HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).
So the codes below:
``` scala
spark.range(1).selectExpr("rand(null)").show()
```
prints..
**Before**
```
Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
```
**After**
```
+-----------------------+
|rand(CAST(NULL AS INT))|
+-----------------------+
| 0.13385709732307427|
+-----------------------+
```
- `LongType` support in SQL.
In addition, it makes the function accept `LongType` consistently
across Scala/SQL.
In more details, the codes below:
``` scala
spark.range(1).select(rand(1), rand(1L)).show()
spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
```
prints..
**Before**
```
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
at
```
**After**
```
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
```
## How was this patch tested?
Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.
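The seed handling described above can be sketched as a small, illustrative helper (hypothetical `normalize_seed`, not Spark's code): a `null` seed is treated as `0`, and both int- and long-range values are accepted.

```python
def normalize_seed(seed):
    """Sketch of the new seed handling: null -> 0; int and long-range seeds accepted."""
    if seed is None:
        return 0
    if isinstance(seed, int):  # Python ints cover both INT and BIGINT ranges
        return seed
    raise TypeError("Input argument to rand must be an integer literal.")

assert normalize_seed(None) == 0      # null seed support
assert normalize_seed(1) == 1
assert normalize_seed(2**40) == 2**40 # LongType seed now accepted
```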
Author: hyukjinkwon <[email protected]>
Closes #15432 from HyukjinKwon/SPARK-17854.
(cherry picked from commit 340f09d100cb669bc6795f085aac6fa05630a076)
Signed-off-by: Sean Owen <[email protected]>
commit d2f2cf68a62a3f8beb7cdfef8393acfdcb785975
Author: Wojciech Szymanski <[email protected]>
Date: 2016-11-06T15:43:13Z
[SPARK-18210][ML] Pipeline.copy does not create an instance with the same
UID
## What changes were proposed in this pull request?
Motivation:
`org.apache.spark.ml.Pipeline.copy(extra: ParamMap)` does not create an
instance with the same UID. It does not conform to the method specification
from its base class `org.apache.spark.ml.param.Params.copy(extra: ParamMap)`
Solution:
- fix for Pipeline UID
- introduced new tests for `org.apache.spark.ml.Pipeline.copy`
- minor improvements in test for `org.apache.spark.ml.PipelineModel.copy`
## How was this patch tested?
Introduced new unit test:
`org.apache.spark.ml.PipelineSuite."Pipeline.copy"`
Improved existing unit test:
`org.apache.spark.ml.PipelineSuite."PipelineModel.copy"`
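The contract being fixed can be sketched with hypothetical classes (illustrative Python, not the Spark ML API): `copy` must return an instance with the same UID as the original.

```python
class Params:
    """Base class: copy(extra) must preserve the instance's uid."""
    def __init__(self, uid):
        self.uid = uid

class Pipeline(Params):
    def __init__(self, uid, stages=()):
        super().__init__(uid)
        self.stages = list(stages)

    def copy(self, extra=None):
        # The fix: reuse the original uid instead of generating a new one.
        return Pipeline(self.uid, self.stages)

p = Pipeline("pipeline_42", ["tokenizer", "lr"])
assert p.copy().uid == p.uid
```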
Author: Wojciech Szymanski <[email protected]>
Closes #15759 from wojtek-szymanski/SPARK-18210.
(cherry picked from commit b89d0556dff0520ab35882382242fbfa7d9478eb)
Signed-off-by: Yanbo Liang <[email protected]>
commit a8fbcdbf252634b1ebc910d8f5e86c16c39167f8
Author: hyukjinkwon <[email protected]>
Date: 2016-11-07T02:52:05Z
[SPARK-18269][SQL] CSV datasource should read null properly when schema is
larger than parsed tokens
## What changes were proposed in this pull request?
Currently, there are three cases when the CSV datasource reads data in
`PERMISSIVE` parse mode.
- schema == parsed tokens (from each line)
No problem to cast the value in the tokens to the field in the schema as
they are equal.
- schema < parsed tokens (from each line)
It slices the tokens into the number of fields in schema.
- schema > parsed tokens (from each line)
It appends `null` to the parsed tokens so that the values can be safely
cast with the schema.
However, when `null` is appended in the third case, we should take `null`
into account when casting the values.
In case of `StringType`, it is fine as `UTF8String.fromString(datum)`
produces `null` when the input is `null`. Therefore, this case will happen only
when schema is explicitly given and schema includes data types that are not
`StringType`.
The codes below:
```scala
val path = "/tmp/a"
Seq("1").toDF().write.text(path)
val schema = StructType(
  StructField("a", IntegerType, true) ::
  StructField("b", IntegerType, true) :: Nil)
spark.read.schema(schema).option("header", "false").csv(path).show()
```
prints
**Before**
```
java.lang.NumberFormatException: null
  at java.lang.Integer.parseInt(Integer.java:542)
  at java.lang.Integer.parseInt(Integer.java:615)
  at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
  at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
  at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:24)
```
**After**
```
+---+----+
| a| b|
+---+----+
| 1|null|
+---+----+
```
## How was this patch tested?
Unit test in `CSVSuite.scala` and `CSVTypeCastSuite.scala`
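The permissive-mode behavior can be sketched with a hypothetical `parse_row` helper (illustrative Python, not the CSV datasource code): short rows are padded with nulls, and the cast then checks for the appended null before converting.

```python
def parse_row(tokens, schema_types):
    """Pad short rows with None, then cast each value null-safely."""
    padded = tokens + [None] * (len(schema_types) - len(tokens))

    def cast(datum, typ):
        if datum is None:  # the fix: account for the appended null before casting
            return None
        return typ(datum)

    return [cast(d, t) for d, t in zip(padded, schema_types)]

# A one-token line read against a two-column int schema yields [1, None]
assert parse_row(["1"], [int, int]) == [1, None]
```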
Author: hyukjinkwon <[email protected]>
Closes #15767 from HyukjinKwon/SPARK-18269.
(cherry picked from commit 556a3b7d07f36c29ceb88fb6c24cc229e0e53ee4)
Signed-off-by: Reynold Xin <[email protected]>
commit 9c78d355c541c2abfb4945e5d67bf0d2ba4b4d16
Author: Wenchen Fan <[email protected]>
Date: 2016-11-07T02:57:13Z
[SPARK-18173][SQL] data source tables should support truncating partition
## What changes were proposed in this pull request?
Previously, `TRUNCATE TABLE ... PARTITION` would always truncate the whole
table for data source tables. This PR fixes that and improves `InMemoryCatalog`
so that the command works with it.
## How was this patch tested?
existing tests
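The fixed semantics can be sketched against a toy in-memory catalog (hypothetical data model, not `InMemoryCatalog` itself): with a partition spec, only matching partitions are cleared; without one, the whole table is.

```python
def truncate_table(partitions, spec=None):
    """Clear rows of partitions matching `spec`; clear all partitions if spec is None."""
    for part in partitions:
        if spec is None or all(part["spec"].get(k) == v for k, v in spec.items()):
            part["rows"].clear()

parts = [{"spec": {"p": "1"}, "rows": [1, 2]},
         {"spec": {"p": "2"}, "rows": [3]}]
truncate_table(parts, {"p": "1"})       # TRUNCATE TABLE t PARTITION (p='1')
assert parts[0]["rows"] == [] and parts[1]["rows"] == [3]
```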
Author: Wenchen Fan <[email protected]>
Closes #15688 from cloud-fan/truncate.
(cherry picked from commit 46b2e499935386e28899d860110a6ab16c107c0c)
Signed-off-by: Reynold Xin <[email protected]>
commit 9ebd5e563d26cf42b9d32e8926de109101360d43
Author: Reynold Xin <[email protected]>
Date: 2016-11-07T06:42:05Z
[SPARK-18167][SQL] Disable flaky hive partition pruning test.
(cherry picked from commit 07ac3f09daf2b28436bc69f76badd1e36d756e4d)
Signed-off-by: Reynold Xin <[email protected]>
commit 2fa1a632ae4e68ffa01fad0d6150219c13355724
Author: Reynold Xin <[email protected]>
Date: 2016-11-07T06:44:55Z
[SPARK-18296][SQL] Use consistent naming for expression test suites
## What changes were proposed in this pull request?
We have an undocumented naming convention to call expression unit tests
ExpressionsSuite, and the end-to-end tests FunctionsSuite. It'd be great to
make all test suites consistent with this naming convention.
## How was this patch tested?
This is a test-only naming change.
Author: Reynold Xin <[email protected]>
Closes #15793 from rxin/SPARK-18296.
(cherry picked from commit 9db06c442cf85e41d51c7b167817f4e7971bf0da)
Signed-off-by: Reynold Xin <[email protected]>
commit 4101029579de920215b426ca6537c1f0e4e4e5ae
Author: gatorsmile <[email protected]>
Date: 2016-11-07T09:16:37Z
[SPARK-16904][SQL] Removal of Hive Built-in Hash Functions and
TestHiveFunctionRegistry
### What changes were proposed in this pull request?
Currently, the Hive built-in `hash` function is not being used in Spark
since Spark 2.0. The public interface does not allow users to unregister the
Spark built-in functions. Thus, users will never use Hive's built-in `hash`
function.
The only exception here is `TestHiveFunctionRegistry`, which allows users
to unregister the built-in functions. Thus, we can load Hive's hash function in
the test cases. If we disable it, 10+ test cases will fail because the results
are different from the Hive golden answer files.
This PR is to remove `hash` from the list of `hiveFunctions` in
`HiveSessionCatalog`. It will also remove `TestHiveFunctionRegistry`. This
removal makes it easier to remove `TestHiveSessionState` in the future.
### How was this patch tested?
N/A
Author: gatorsmile <[email protected]>
Closes #14498 from gatorsmile/removeHash.
(cherry picked from commit 57626a55703a189e03148398f67c36cd0e557044)
Signed-off-by: Reynold Xin <[email protected]>
commit df40ee2b483989a47cb85d248280cc02f527112d
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-11-07T11:18:19Z
[SPARK-18125][SQL] Fix a compilation error in codegen due to splitExpression
## What changes were proposed in this pull request?
As reported in the JIRA, sometimes the generated Java code in codegen will
cause a compilation error.
Code snippet to test it:
``` scala
case class Route(src: String, dest: String, cost: Int)
case class GroupedRoutes(src: String, dest: String, routes: Seq[Route])

val ds = sc.parallelize(Array(
  Route("a", "b", 1),
  Route("a", "b", 2),
  Route("a", "c", 2),
  Route("a", "d", 10),
  Route("b", "a", 1),
  Route("b", "a", 5),
  Route("b", "c", 6))
).toDF.as[Route]

val grped = ds.map(r => GroupedRoutes(r.src, r.dest, Seq(r)))
  .groupByKey(r => (r.src, r.dest))
  .reduceGroups { (g1: GroupedRoutes, g2: GroupedRoutes) =>
    GroupedRoutes(g1.src, g1.dest, g1.routes ++ g2.routes)
  }.map(_._2)
```
The problem here is that in `ReferenceToExpressions` we evaluate the children
vars into local variables, and the result expression is then evaluated using
those variables. In the above case, the result expression code is too long and
is split by `CodegenContext.splitExpression`, so the local variables cannot be
accessed from the split-out methods, causing a compilation error.
## How was this patch tested?
Jenkins tests.
Author: Liang-Chi Hsieh <[email protected]>
Closes #15693 from viirya/fix-codege-compilation-error.
(cherry picked from commit a814eeac6b3c38d1294b88c60cd083fc4d01bd25)
Signed-off-by: Herman van Hovell <[email protected]>
commit 6b332909f044f2d47f49cbf699f2f2f22206decf
Author: Yanbo Liang <[email protected]>
Date: 2016-11-07T12:07:19Z
[SPARK-18291][SPARKR][ML] SparkR glm predict should output original label
when family = binomial.
## What changes were proposed in this pull request?
SparkR ```spark.glm``` predict should output original label when family =
"binomial".
## How was this patch tested?
Add unit test.
You can also run the following code to test:
```R
training <- suppressWarnings(createDataFrame(iris))
training <- training[training$Species %in% c("versicolor", "virginica"), ]
model <- spark.glm(training, Species ~ Sepal_Length + Sepal_Width,
                   family = binomial(link = "logit"))
showDF(predict(model, training))
```
Before this change:
```
+------------+-----------+------------+-----------+----------+-----+-------------------+
|Sepal_Length|Sepal_Width|Petal_Length|Petal_Width|   Species|label|         prediction|
+------------+-----------+------------+-----------+----------+-----+-------------------+
|         7.0|        3.2|         4.7|        1.4|versicolor|  0.0| 0.8271421517601544|
|         6.4|        3.2|         4.5|        1.5|versicolor|  0.0| 0.6044595910413112|
|         6.9|        3.1|         4.9|        1.5|versicolor|  0.0| 0.7916340858281998|
|         5.5|        2.3|         4.0|        1.3|versicolor|  0.0|0.16080518180591158|
|         6.5|        2.8|         4.6|        1.5|versicolor|  0.0| 0.6112229217050189|
|         5.7|        2.8|         4.5|        1.3|versicolor|  0.0| 0.2555087295500885|
|         6.3|        3.3|         4.7|        1.6|versicolor|  0.0| 0.5681507664364834|
|         4.9|        2.4|         3.3|        1.0|versicolor|  0.0|0.05990570219972002|
|         6.6|        2.9|         4.6|        1.3|versicolor|  0.0| 0.6644434078306246|
|         5.2|        2.7|         3.9|        1.4|versicolor|  0.0|0.11293577405862379|
|         5.0|        2.0|         3.5|        1.0|versicolor|  0.0|0.06152372321585971|
|         5.9|        3.0|         4.2|        1.5|versicolor|  0.0|0.35250697207602555|
|         6.0|        2.2|         4.0|        1.0|versicolor|  0.0|0.32267018290814303|
|         6.1|        2.9|         4.7|        1.4|versicolor|  0.0|  0.433391153814592|
|         5.6|        2.9|         3.6|        1.3|versicolor|  0.0| 0.2280744262436993|
|         6.7|        3.1|         4.4|        1.4|versicolor|  0.0| 0.7219848389339459|
|         5.6|        3.0|         4.5|        1.5|versicolor|  0.0|0.23527698971404695|
|         5.8|        2.7|         4.1|        1.0|versicolor|  0.0|  0.285024533520016|
|         6.2|        2.2|         4.5|        1.5|versicolor|  0.0| 0.4107047877447493|
|         5.6|        2.5|         3.9|        1.1|versicolor|  0.0|0.20083561961645083|
+------------+-----------+------------+-----------+----------+-----+-------------------+
```
After this change:
```
+------------+-----------+------------+-----------+----------+-----+----------+
|Sepal_Length|Sepal_Width|Petal_Length|Petal_Width|   Species|label|prediction|
+------------+-----------+------------+-----------+----------+-----+----------+
|         7.0|        3.2|         4.7|        1.4|versicolor|  0.0| virginica|
|         6.4|        3.2|         4.5|        1.5|versicolor|  0.0| virginica|
|         6.9|        3.1|         4.9|        1.5|versicolor|  0.0| virginica|
|         5.5|        2.3|         4.0|        1.3|versicolor|  0.0|versicolor|
|         6.5|        2.8|         4.6|        1.5|versicolor|  0.0| virginica|
|         5.7|        2.8|         4.5|        1.3|versicolor|  0.0|versicolor|
|         6.3|        3.3|         4.7|        1.6|versicolor|  0.0| virginica|
|         4.9|        2.4|         3.3|        1.0|versicolor|  0.0|versicolor|
|         6.6|        2.9|         4.6|        1.3|versicolor|  0.0| virginica|
|         5.2|        2.7|         3.9|        1.4|versicolor|  0.0|versicolor|
|         5.0|        2.0|         3.5|        1.0|versicolor|  0.0|versicolor|
|         5.9|        3.0|         4.2|        1.5|versicolor|  0.0|versicolor|
|         6.0|        2.2|         4.0|        1.0|versicolor|  0.0|versicolor|
|         6.1|        2.9|         4.7|        1.4|versicolor|  0.0|versicolor|
|         5.6|        2.9|         3.6|        1.3|versicolor|  0.0|versicolor|
|         6.7|        3.1|         4.4|        1.4|versicolor|  0.0| virginica|
|         5.6|        3.0|         4.5|        1.5|versicolor|  0.0|versicolor|
|         5.8|        2.7|         4.1|        1.0|versicolor|  0.0|versicolor|
|         6.2|        2.2|         4.5|        1.5|versicolor|  0.0|versicolor|
|         5.6|        2.5|         3.9|        1.1|versicolor|  0.0|versicolor|
+------------+-----------+------------+-----------+----------+-----+----------+
```
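The output change above amounts to thresholding the predicted probability and mapping the index back to the original string label. A minimal sketch (hypothetical `predict_label` helper, not SparkR's implementation):

```python
def predict_label(prob, labels, threshold=0.5):
    """Instead of returning the raw probability, threshold it and
    map the resulting index back to the original label string."""
    return labels[1] if prob > threshold else labels[0]

labels = ["versicolor", "virginica"]
assert predict_label(0.83, labels) == "virginica"   # was 0.8271421517601544
assert predict_label(0.16, labels) == "versicolor"  # was 0.16080518180591158
```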
Author: Yanbo Liang <[email protected]>
Closes #15788 from yanboliang/spark-18291.
(cherry picked from commit daa975f4bfa4f904697bf3365a4be9987032e490)
Signed-off-by: Yanbo Liang <[email protected]>
commit 7a84edb2475446ff3a98e8cc8dcf62ee801fbbb9
Author: Tathagata Das <[email protected]>
Date: 2016-11-07T18:43:36Z
[SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether the
default starting offset is latest
## What changes were proposed in this pull request?
Added a test to check whether the default starting offset is latest.
## How was this patch tested?
new unit test
Author: Tathagata Das <[email protected]>
Closes #15778 from tdas/SPARK-18283.
(cherry picked from commit b06c23db9aedae48c9eba9d702ae82fa5647cfe5)
Signed-off-by: Shixiong Zhu <[email protected]>
commit d1eac3ef4af2f8c58395ff6f8bb58a1806a8c09b
Author: Weiqing Yang <[email protected]>
Date: 2016-11-07T20:33:01Z
[SPARK-17108][SQL] Fix BIGINT and INT comparison failure in spark sql
## What changes were proposed in this pull request?
Add a function to check whether two integral types are compatible when
invoking `acceptsType()` in `DataType`.
## How was this patch tested?
Manually.
E.g.
```
spark.sql("create table t3(a map<bigint, array<string>>)")
spark.sql("select * from t3 where a[1] is not null")
```
Before:
```
cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
org.apache.spark.sql.AnalysisException: cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:82)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:307)
```
After:
Run the sql queries above. No errors.
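The compatibility check described above can be sketched as follows (hypothetical `accepts_type` helper with type names as strings, not Spark's internal API): any two integral types are treated as compatible, so an int literal can index a `map<bigint, ...>`.

```python
INTEGRAL_TYPES = {"tinyint", "smallint", "int", "bigint"}

def accepts_type(expected, actual):
    """Exact match, or both types integral, counts as compatible."""
    if expected == actual:
        return True
    return expected in INTEGRAL_TYPES and actual in INTEGRAL_TYPES

assert accepts_type("bigint", "int")        # a[1] against map<bigint, ...> now resolves
assert not accepts_type("bigint", "string") # unrelated types still rejected
```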
Author: Weiqing Yang <[email protected]>
Closes #15448 from weiqingy/SPARK_17108.
(cherry picked from commit 0d95662e7fff26669d4f70e88fdac7a4128a4f49)
Signed-off-by: Herman van Hovell <[email protected]>
commit 9873d57f2c76d1a6995c4ff5a45be1259a7948f0
Author: Kazuaki Ishizaki <[email protected]>
Date: 2016-11-07T23:14:57Z
[SPARK-17490][SQL] Optimize SerializeFromObject() for a primitive array
Waiting for #13680 to be merged.
This PR optimizes `SerializeFromObject()` for a primitive array. It is
derived from #13758 and addresses one of the problems there with a simpler
approach.
The current implementation always generates `GenericArrayData` from
`SerializeFromObject()` for any type of array in a logical plan. This
involves boxing in the `GenericArrayData` constructor when
`SerializeFromObject()` has a primitive array.
This PR enables generating `UnsafeArrayData` from `SerializeFromObject()`
for a primitive array, which avoids boxing when creating an `ArrayData`
instance in the code generated by Catalyst.
This PR also generates `UnsafeArrayData` for the `RowEncoder.serializeFor`
and `CatalystTypeConverters.createToCatalystConverter` cases.
The performance improvement of `SerializeFromObject()` is up to 2.0x:
```
OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 4.4.11-200.fc22.x86_64
Intel Xeon E3-12xx v2 (Ivy Bridge)

Without this PR
Write an array in Dataset:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Int                                   556 /  608         15.1          66.3       1.0X
Double                               1668 / 1746          5.0         198.8       0.3X

With this PR
Write an array in Dataset:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Int                                   352 /  401         23.8          42.0       1.0X
Double                                821 /  885         10.2          97.9       0.4X
```
Here is an example program of the kind that occurs in mllib, as described in
[SPARK-16070](https://issues.apache.org/jira/browse/SPARK-16070).
```
sparkContext.parallelize(Seq(Array(1, 2)), 1).toDS.map(e => e).show
```
Generated code before applying this PR
``` java
/* 039 */   protected void processNext() throws java.io.IOException {
/* 040 */     while (inputadapter_input.hasNext()) {
/* 041 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 042 */       int[] inputadapter_value = (int[])inputadapter_row.get(0, null);
/* 043 */
/* 044 */       Object mapelements_obj = ((Expression) references[0]).eval(null);
/* 045 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 046 */
/* 047 */       boolean mapelements_isNull = false || false;
/* 048 */       int[] mapelements_value = null;
/* 049 */       if (!mapelements_isNull) {
/* 050 */         Object mapelements_funcResult = null;
/* 051 */         mapelements_funcResult = mapelements_value1.apply(inputadapter_value);
/* 052 */         if (mapelements_funcResult == null) {
/* 053 */           mapelements_isNull = true;
/* 054 */         } else {
/* 055 */           mapelements_value = (int[]) mapelements_funcResult;
/* 056 */         }
/* 057 */
/* 058 */       }
/* 059 */       mapelements_isNull = mapelements_value == null;
/* 060 */
/* 061 */       serializefromobject_argIsNulls[0] = mapelements_isNull;
/* 062 */       serializefromobject_argValue = mapelements_value;
/* 063 */
/* 064 */       boolean serializefromobject_isNull = false;
/* 065 */       for (int idx = 0; idx < 1; idx++) {
/* 066 */         if (serializefromobject_argIsNulls[idx]) { serializefromobject_isNull = true; break; }
/* 067 */       }
/* 068 */
/* 069 */       final ArrayData serializefromobject_value = serializefromobject_isNull ? null : new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_argValue);
/* 070 */       serializefromobject_holder.reset();
/* 071 */
/* 072 */       serializefromobject_rowWriter.zeroOutNullBytes();
/* 073 */
/* 074 */       if (serializefromobject_isNull) {
/* 075 */         serializefromobject_rowWriter.setNullAt(0);
/* 076 */       } else {
/* 077 */         // Remember the current cursor so that we can calculate how many bytes are
/* 078 */         // written later.
/* 079 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
/* 080 */
/* 081 */         if (serializefromobject_value instanceof UnsafeArrayData) {
/* 082 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
/* 083 */           // grow the global buffer before writing data.
/* 084 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
/* 085 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
/* 086 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
/* 087 */
/* 088 */         } else {
/* 089 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
/* 090 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 4);
/* 091 */
/* 092 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
/* 093 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
/* 094 */               serializefromobject_arrayWriter.setNullInt(serializefromobject_index);
/* 095 */             } else {
/* 096 */               final int serializefromobject_element = serializefromobject_value.getInt(serializefromobject_index);
/* 097 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
/* 098 */             }
/* 099 */           }
/* 100 */         }
/* 101 */
/* 102 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
/* 103 */       }
/* 104 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
/* 105 */       append(serializefromobject_result);
/* 106 */       if (shouldStop()) return;
/* 107 */     }
/* 108 */   }
/* 109 */ }
```
Generated code after applying this PR
``` java
/* 035 */   protected void processNext() throws java.io.IOException {
/* 036 */     while (inputadapter_input.hasNext()) {
/* 037 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 038 */       int[] inputadapter_value = (int[])inputadapter_row.get(0, null);
/* 039 */
/* 040 */       Object mapelements_obj = ((Expression) references[0]).eval(null);
/* 041 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 042 */
/* 043 */       boolean mapelements_isNull = false || false;
/* 044 */       int[] mapelements_value = null;
/* 045 */       if (!mapelements_isNull) {
/* 046 */         Object mapelements_funcResult = null;
/* 047 */         mapelements_funcResult = mapelements_value1.apply(inputadapter_value);
/* 048 */         if (mapelements_funcResult == null) {
/* 049 */           mapelements_isNull = true;
/* 050 */         } else {
/* 051 */           mapelements_value = (int[]) mapelements_funcResult;
/* 052 */         }
/* 053 */
/* 054 */       }
/* 055 */       mapelements_isNull = mapelements_value == null;
/* 056 */
/* 057 */       boolean serializefromobject_isNull = mapelements_isNull;
/* 058 */       final ArrayData serializefromobject_value = serializefromobject_isNull ? null : org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(mapelements_value);
/* 059 */       serializefromobject_isNull = serializefromobject_value == null;
/* 060 */       serializefromobject_holder.reset();
/* 061 */
/* 062 */       serializefromobject_rowWriter.zeroOutNullBytes();
/* 063 */
/* 064 */       if (serializefromobject_isNull) {
/* 065 */         serializefromobject_rowWriter.setNullAt(0);
/* 066 */       } else {
/* 067 */         // Remember the current cursor so that we can calculate how many bytes are
/* 068 */         // written later.
/* 069 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
/* 070 */
/* 071 */         if (serializefromobject_value instanceof UnsafeArrayData) {
/* 072 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
/* 073 */           // grow the global buffer before writing data.
/* 074 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
/* 075 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
/* 076 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
/* 077 */
/* 078 */         } else {
/* 079 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
/* 080 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 4);
/* 081 */
/* 082 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
/* 083 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
/* 084 */               serializefromobject_arrayWriter.setNullInt(serializefromobject_index);
/* 085 */             } else {
/* 086 */               final int serializefromobject_element = serializefromobject_value.getInt(serializefromobject_index);
/* 087 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
/* 088 */             }
/* 089 */           }
/* 090 */         }
/* 091 */
/* 092 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
/* 093 */       }
/* 094 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
/* 095 */       append(serializefromobject_result);
/* 096 */       if (shouldStop()) return;
/* 097 */     }
/* 098 */   }
/* 099 */ }
```
Added a test in `DatasetSuite`, `RowEncoderSuite`, and
`CatalystTypeConvertersSuite`
Author: Kazuaki Ishizaki <[email protected]>
Closes #15044 from kiszk/SPARK-17490.
(cherry picked from commit 19cf208063f035d793d2306295a251a9af7e32f6)
Signed-off-by: Herman van Hovell <[email protected]>
commit 4af82d56f79ac3cceb08b702413ae2b35dfea48b
Author: hyukjinkwon <[email protected]>
Date: 2016-11-08T00:54:40Z
[SPARK-18295][SQL] Make to_json function null safe (matching it to
from_json)
## What changes were proposed in this pull request?
This PR proposes to match the behaviour of `to_json` to that of `from_json`
for null safety.
Currently, it throws a `NullPointerException`, but this PR fixes this to
produce `null` instead.
with the data below:
```scala
import spark.implicits._
val df = Seq(Some(Tuple1(Tuple1(1))), None).toDF("a")
df.show()
```
```
+----+
| a|
+----+
| [1]|
|null|
+----+
```
the codes below
```scala
import org.apache.spark.sql.functions._
df.select(to_json($"a")).show()
```
produces..
**Before**
throws a `NullPointerException` as below:
```
java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeFields(JacksonGenerator.scala:138)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator$$anonfun$write$1.apply$mcV$sp(JacksonGenerator.scala:194)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeObject(JacksonGenerator.scala:131)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.write(JacksonGenerator.scala:193)
  at org.apache.spark.sql.catalyst.expressions.StructToJson.eval(jsonExpressions.scala:544)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:48)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:30)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
```
**After**
```
+---------------+
|structtojson(a)|
+---------------+
| {"_1":1}|
| null|
+---------------+
```
## How was this patch tested?
Unit test in `JsonExpressionsSuite.scala` and `JsonFunctionsSuite.scala`.
Author: hyukjinkwon <[email protected]>
Closes #15792 from HyukjinKwon/SPARK-18295.
(cherry picked from commit 3eda05703f02413540f180ade01f0f114e70b9cc)
Signed-off-by: Michael Armbrust <[email protected]>
commit 29f59c73301628fb63086660f64fdb5272a312fe
Author: Ryan Blue <[email protected]>
Date: 2016-11-08T01:36:15Z
[SPARK-18086] Add support for Hive session vars.
## What changes were proposed in this pull request?
This adds support for Hive variables:
* Makes values set via `spark-sql --hivevar name=value` accessible
* Adds `getHiveVar` and `setHiveVar` to the `HiveClient` interface
* Adds a SessionVariables trait for sessions like Hive that support
variables (including Hive vars)
* Adds SessionVariables support to variable substitution
* Adds SessionVariables support to the SET command
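The substitution behaviour can be sketched as follows (illustrative only;
`substitute` is a hypothetical helper, not Spark's actual variable
substitution machinery):

```scala
// Hypothetical sketch of ${hivevar:...} substitution: replace each
// registered Hive variable reference in the query text with its value.
def substitute(sql: String, hiveVars: Map[String, String]): String =
  hiveVars.foldLeft(sql) { case (acc, (name, value)) =>
    acc.replace(s"$${hivevar:$name}", value)
  }
```

For example, a value set via `spark-sql --hivevar region=us-west` would make
`${hivevar:region}` resolve to `us-west` in a query.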
## How was this patch tested?
* Adds a test to all supported Hive versions for accessing Hive variables
* Adds HiveVariableSubstitutionSuite
Author: Ryan Blue <[email protected]>
Closes #15738 from rdblue/SPARK-18086-add-hivevar-support.
(cherry picked from commit 9b0593d5e99bb919c4abb8d0836a126ec2eaf1d5)
Signed-off-by: Reynold Xin <[email protected]>
commit 4943929d85a2aaf404c140d2d2589a597f484976
Author: Liwei Lin <[email protected]>
Date: 2016-11-08T01:49:24Z
[SPARK-18261][STRUCTURED STREAMING] Add statistics to MemorySink for joining
## What changes were proposed in this pull request?
Right now, there is no way to join the output of a memory sink with any
table:
> UnsupportedOperationException: LeafNode MemoryPlan must implement
statistics
This patch adds statistics to MemorySink, making joining snapshots of
memory streams with tables possible.
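Conceptually, the change gives the sink's leaf plan a size estimate; a
minimal sketch (names are hypothetical, not Spark's classes):

```scala
// Hypothetical sketch: a snapshot-backed sink can derive a sizeInBytes
// estimate from its buffered rows, which is what the planner requires
// from every leaf node before it will plan a join.
final case class MemorySinkSketch(batches: Seq[Seq[String]]) {
  def estimatedSizeInBytes: Long =
    batches.flatten.map(_.length.toLong * 2L).sum // assume ~2 bytes/char
}
```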
## How was this patch tested?
Added a test case.
Author: Liwei Lin <[email protected]>
Closes #15786 from lw-lin/memory-sink-stat.
(cherry picked from commit c1a0c66bd2662bc40f312da474c3b95229fe92d0)
Signed-off-by: Reynold Xin <[email protected]>
commit 4cb4e5ff0ab9537758bf0b418ddd40dfe9537609
Author: gatorsmile <[email protected]>
Date: 2016-11-08T02:34:21Z
[SPARK-18217][SQL] Disallow creating permanent views based on temporary
views or UDFs
### What changes were proposed in this pull request?
Based on the discussion in
[SPARK-18209](https://issues.apache.org/jira/browse/SPARK-18209). It doesn't
really make sense to create permanent views based on temporary views or
temporary UDFs.
To disallow this and issue exceptions, this PR needs to detect whether a
temporary view/UDF is being used when defining a permanent view.
Basically, this PR can be split into two sub-tasks:
**Task 1:** detecting a temporary view from the query plan of view
definition.
When finding an unresolved temporary view, Analyzer replaces it by a
`SubqueryAlias` with the corresponding logical plan, which is stored in an
in-memory HashMap. After replacement, it is impossible to detect whether the
`SubqueryAlias` is added/generated from a temporary view. Thus, to detect the
usage of a temporary view in view definition, this PR traverses the unresolved
logical plan and uses the name of an `UnresolvedRelation` to detect whether it
is a (global) temporary view.
**Task 2:** detecting a temporary UDF from the query plan of view
definition.
Detecting usage of a temporary UDF in a view definition is not
straightforward. First, in the analyzed plan, functions are represented in
multiple forms. More importantly, some classes (e.g., `HiveGenericUDF`) are
not accessible from `CreateViewCommand`, which is part of `sql/core`. Thus,
we use the unanalyzed plan `child` of `CreateViewCommand` to detect the
usage of a temporary UDF. Because the plan has already been successfully
analyzed, we can assume the functions have been defined/registered.
Second, in Spark, functions come in four forms: Spark built-in functions,
built-in hash functions, permanent UDFs, and temporary UDFs. We have no
direct way to determine whether a function is temporary, so we introduced a
function `isTemporaryFunction` in `SessionCatalog`, which contains the
detailed logic for that determination.
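Task 1's traversal can be sketched on a toy plan (the types here are
illustrative only; Spark's real plan nodes live in `sql/catalyst`):

```scala
// Hypothetical toy plan types standing in for Catalyst's logical plans.
sealed trait Plan { def children: Seq[Plan] }
final case class UnresolvedRelation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
}
final case class Project(children: Seq[Plan]) extends Plan

// Walk the unresolved plan and flag any relation whose name matches a
// registered temporary view, mirroring the detection described in Task 1.
def referencesTempView(plan: Plan, tempViews: Set[String]): Boolean =
  plan match {
    case UnresolvedRelation(name) if tempViews(name) => true
    case p => p.children.exists(referencesTempView(_, tempViews))
  }
```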
### How was this patch tested?
Added test cases.
Author: gatorsmile <[email protected]>
Closes #15764 from gatorsmile/blockTempFromPermViewCreation.
(cherry picked from commit 1da64e1fa0970277d1fb47dec8adca47b068b1ec)
Signed-off-by: Reynold Xin <[email protected]>
commit c8879bf1ee2af9ccd5d5656571d931d2fc1da024
Author: fidato <[email protected]>
Date: 2016-11-08T02:41:17Z
[SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles
## What changes were proposed in this pull request?
This pull request contains the fix for the critical bug SPARK-16575. It
rectifies the BinaryFileRDD partition calculation: an RDD created with
`sc.binaryFiles` always ended up with only two partitions, regardless of
input size.
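The intended behaviour can be sketched as a size-based partition count
(illustrative only; the real computation in `BinaryFileRDD` involves Hadoop
input-split settings):

```scala
// Hypothetical sketch: the partition count should scale with the total
// input size instead of collapsing to a small constant.
def numPartitions(fileSizes: Seq[Long], maxSplitBytes: Long): Int =
  math.max(1, math.ceil(fileSizes.sum.toDouble / maxSplitBytes).toInt)
```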
## How was this patch tested?
The original issue (i.e., `getNumPartitions` on a binary files RDD always
returning two partitions) was first reproduced and then retested with the
changes. The unit tests were also run and pass.
This contribution is my original work and I license the work to the project
under the project's open source license
srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have
a look .
Author: fidato <[email protected]>
Closes #15327 from fidato13/SPARK-16575.
(cherry picked from commit 6f3697136aa68dc39d3ce42f43a7af554d2a3bf9)
Signed-off-by: Reynold Xin <[email protected]>
commit ee400f67a471c9445d9d7e4957113fc62bff6abf
Author: Kazuaki Ishizaki <[email protected]>
Date: 2016-11-08T11:01:54Z
[SPARK-18207][SQL] Fix a compilation error due to HashExpression.doGenCode
This PR avoids a compilation error caused by Java bytecode exceeding the
64 KB method-size limit. The error occurs because the Java code generated to
compute a hash value for a row is too big. This PR fixes the compilation
error by splitting the big code chunk into multiple methods, calling
`CodegenContext.splitExpression` in `HashExpression.doGenCode`.
The test case requires computing the hash code of a row with 1000 String
fields. `HashExpression.doGenCode` generates all of the Java code for this
computation in one function, so the size of the corresponding Java bytecode
exceeds 64 KB.
Generated code without this PR
````java
/* 027 */ public UnsafeRow apply(InternalRow i) {
/* 028 */ boolean isNull = false;
/* 029 */
/* 030 */ int value1 = 42;
/* 031 */
/* 032 */ boolean isNull2 = i.isNullAt(0);
/* 033 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
/* 034 */ if (!isNull2) {
/* 035 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value2.getBaseObject(),
value2.getBaseOffset(), value2.numBytes(), value1);
/* 036 */ }
/* 037 */
/* 038 */
/* 039 */ boolean isNull3 = i.isNullAt(1);
/* 040 */ UTF8String value3 = isNull3 ? null : (i.getUTF8String(1));
/* 041 */ if (!isNull3) {
/* 042 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value3.getBaseObject(),
value3.getBaseOffset(), value3.numBytes(), value1);
/* 043 */ }
/* 044 */
/* 045 */
...
/* 7024 */
/* 7025 */ boolean isNull1001 = i.isNullAt(999);
/* 7026 */ UTF8String value1001 = isNull1001 ? null :
(i.getUTF8String(999));
/* 7027 */ if (!isNull1001) {
/* 7028 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value1001.getBaseObject(),
value1001.getBaseOffset(), value1001.numBytes(), value1);
/* 7029 */ }
/* 7030 */
/* 7031 */
/* 7032 */ boolean isNull1002 = i.isNullAt(1000);
/* 7033 */ UTF8String value1002 = isNull1002 ? null :
(i.getUTF8String(1000));
/* 7034 */ if (!isNull1002) {
/* 7035 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value1002.getBaseObject(),
value1002.getBaseOffset(), value1002.numBytes(), value1);
/* 7036 */ }
````
Generated code with this PR
````java
/* 3807 */ private void apply_249(InternalRow i) {
/* 3808 */
/* 3809 */ boolean isNull998 = i.isNullAt(996);
/* 3810 */ UTF8String value998 = isNull998 ? null :
(i.getUTF8String(996));
/* 3811 */ if (!isNull998) {
/* 3812 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value998.getBaseObject(),
value998.getBaseOffset(), value998.numBytes(), value1);
/* 3813 */ }
/* 3814 */
/* 3815 */ boolean isNull999 = i.isNullAt(997);
/* 3816 */ UTF8String value999 = isNull999 ? null :
(i.getUTF8String(997));
/* 3817 */ if (!isNull999) {
/* 3818 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value999.getBaseObject(),
value999.getBaseOffset(), value999.numBytes(), value1);
/* 3819 */ }
/* 3820 */
/* 3821 */ boolean isNull1000 = i.isNullAt(998);
/* 3822 */ UTF8String value1000 = isNull1000 ? null :
(i.getUTF8String(998));
/* 3823 */ if (!isNull1000) {
/* 3824 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value1000.getBaseObject(),
value1000.getBaseOffset(), value1000.numBytes(), value1);
/* 3825 */ }
/* 3826 */
/* 3827 */ boolean isNull1001 = i.isNullAt(999);
/* 3828 */ UTF8String value1001 = isNull1001 ? null :
(i.getUTF8String(999));
/* 3829 */ if (!isNull1001) {
/* 3830 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value1001.getBaseObject(),
value1001.getBaseOffset(), value1001.numBytes(), value1);
/* 3831 */ }
/* 3832 */
/* 3833 */ }
/* 3834 */
...
/* 4532 */ private void apply_0(InternalRow i) {
/* 4533 */
/* 4534 */ boolean isNull2 = i.isNullAt(0);
/* 4535 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
/* 4536 */ if (!isNull2) {
/* 4537 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value2.getBaseObject(),
value2.getBaseOffset(), value2.numBytes(), value1);
/* 4538 */ }
/* 4539 */
/* 4540 */ boolean isNull3 = i.isNullAt(1);
/* 4541 */ UTF8String value3 = isNull3 ? null : (i.getUTF8String(1));
/* 4542 */ if (!isNull3) {
/* 4543 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value3.getBaseObject(),
value3.getBaseOffset(), value3.numBytes(), value1);
/* 4544 */ }
/* 4545 */
/* 4546 */ boolean isNull4 = i.isNullAt(2);
/* 4547 */ UTF8String value4 = isNull4 ? null : (i.getUTF8String(2));
/* 4548 */ if (!isNull4) {
/* 4549 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value4.getBaseObject(),
value4.getBaseOffset(), value4.numBytes(), value1);
/* 4550 */ }
/* 4551 */
/* 4552 */ boolean isNull5 = i.isNullAt(3);
/* 4553 */ UTF8String value5 = isNull5 ? null : (i.getUTF8String(3));
/* 4554 */ if (!isNull5) {
/* 4555 */ value1 =
org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value5.getBaseObject(),
value5.getBaseOffset(), value5.numBytes(), value1);
/* 4556 */ }
/* 4557 */
/* 4558 */ }
...
/* 7344 */ public UnsafeRow apply(InternalRow i) {
/* 7345 */ boolean isNull = false;
/* 7346 */
/* 7347 */ value1 = 42;
/* 7348 */ apply_0(i);
/* 7349 */ apply_1(i);
...
/* 7596 */ apply_248(i);
/* 7597 */ apply_249(i);
/* 7598 */ apply_250(i);
/* 7599 */ apply_251(i);
...
````
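The splitting strategy itself can be sketched generically (illustrative
only; `CodegenContext`'s expression splitting is the real mechanism inside
Spark):

```scala
// Hypothetical sketch: chunk a long list of generated statements into
// helper methods, each safely below the 64 KB bytecode limit, plus a
// top-level method that calls them in order.
def splitIntoMethods(stmts: Seq[String], perMethod: Int): Seq[String] = {
  val helpers = stmts.grouped(perMethod).zipWithIndex.map { case (chunk, i) =>
    s"private void apply_$i(InternalRow i) { ${chunk.mkString(" ")} }"
  }.toSeq
  val calls = helpers.indices.map(i => s"apply_$i(i);").mkString(" ")
  helpers :+ s"public UnsafeRow apply(InternalRow i) { $calls }"
}
```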
Add a new test in `DataFrameSuite`
Author: Kazuaki Ishizaki <[email protected]>
Closes #15745 from kiszk/SPARK-18207.
(cherry picked from commit 47731e1865fa1e3a8881a1f4420017bdc026e455)
Signed-off-by: Herman van Hovell <[email protected]>
commit 3b360e57a249b33b1b50e58d01a1b78a1c922d88
Author: root <root@izbp1gsnrlfzjxh82cz80vz.(none)>
Date: 2016-11-08T11:09:32Z
[SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException when a
UDAF has a foldable TypeCheck
## What changes were proposed in this pull request?
In the `RewriteDistinctAggregates` rewrite function, after the UDAF's
children are mapped to `AttributeReference`s, a UDAF (such as
`ApproximatePercentile`) with a foldable type check on its input will fail,
because an `AttributeReference` is not foldable. The UDAF is then left
unresolved, and calling nullify on the unresolved object throws an
exception.
In this PR, only unfoldable children are mapped to `AttributeReference`s,
which avoids the UDAF's foldable type check. Likewise, only unfoldable
children are expanded; there is no need to expand a static (foldable) value.
**Before sql result**
> select percentile_approx(key, 0.99999), count(distinct key), sum(distinct
key) from src limit 1
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
to dataType on unresolved object, tree: 'percentile_approx(CAST(src.`key` AS
DOUBLE), CAST(0.99999BD AS DOUBLE), 10000)
> at
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:92)
> at
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$.org$apache$spark$sql$catalyst$optimizer$RewriteDistinctAggregates$$nullify(RewriteDistinctAggregates.scala:261)
**After sql result**
> select percentile_approx(key, 0.99999), count(distinct key), sum(distinct
key) from src limit 1
> [498.0,309,79136]
## How was this patch tested?
Add a test case in HiveUDFSuit.
Author: root <root@iZbp1gsnrlfzjxh82cz80vZ.(none)>
Closes #15668 from windpiger/RewriteDistinctUDAFUnresolveExcep.
(cherry picked from commit c291bd2745a8a2e4ba91d8697879eb8da10287e2)
Signed-off-by: Herman van Hovell <[email protected]>
commit ef6b6d3d4790c1da7e3fddb961dd8efc977e033f
Author: chie8842 <[email protected]>
Date: 2016-11-08T13:45:37Z
[SPARK-13770][DOCUMENTATION][ML] Document the ML feature Interaction
I created Scala and Java examples and added documentation.
Author: chie8842 <[email protected]>
Closes #15658 from hayashidac/SPARK-13770.
(cherry picked from commit ee2e741ac16b01d9cae0eadd35af774547bbd415)
Signed-off-by: Sean Owen <[email protected]>
commit 9595a71066ff222b28c7505db6b9426d78acaea8
Author: Wenchen Fan <[email protected]>
Date: 2016-11-08T14:28:29Z
[SPARK-18346][SQL] TRUNCATE TABLE should fail if no partition is matched
for the given non-partial partition spec
## What changes were proposed in this pull request?
a follow up of https://github.com/apache/spark/pull/15688
## How was this patch tested?
updated test in `DDLSuite`
Author: Wenchen Fan <[email protected]>
Closes #15805 from cloud-fan/truncate.
(cherry picked from commit 73feaa30ebfb62c81c7ce2c60ce2163611dd8852)
Signed-off-by: Wenchen Fan <[email protected]>
commit 876eee2b1610d7de5ed6f86d06bf6105d7c9de16
Author: Kishor Patil <[email protected]>
Date: 2016-11-08T18:13:09Z
[SPARK-18357] Fix yarn files/archive broken issue and unit tests
## What changes were proposed in this pull request?
PR #15627 broke functionality: with yarn, --files and --archives did not
accept any files.
This patch ensures that --files and --archives accept unique files.
## How was this patch tested?
A. I added unit tests.
B. Also, manually tested --files with --archives to throw exception if
duplicate files are specified and continue if unique files are specified.
Author: Kishor Patil <[email protected]>
Closes #15810 from kishorvpatil/SPARK18357.
(cherry picked from commit 245e5a2f80e3195b7f8a38b480b29bfc23af66bf)
Signed-off-by: Tom Graves <[email protected]>
commit 21bbf94b41fbd193e370a3820131e449aaf0e3db
Author: Joseph K. Bradley <[email protected]>
Date: 2016-11-08T20:58:29Z
[SPARK-17748][ML] Minor cleanups to one-pass linear regression with elastic
net
## What changes were proposed in this pull request?
* Made `SingularMatrixException` private[ml]
* WeightedLeastSquares: Changed to allow tol >= 0 instead of only tol > 0
## How was this patch tested?
existing tests
Author: Joseph K. Bradley <[email protected]>
Closes #15779 from jkbradley/wls-cleanups.
(cherry picked from commit 26e1c53aceee37e3687a372ff6c6f05463fd8a94)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit ba80eaf72a9d78a3838677595b42b4bffdda0357
Author: Shixiong Zhu <[email protected]>
Date: 2016-11-08T21:14:56Z
[SPARK-18280][CORE] Fix potential deadlock in
`StandaloneSchedulerBackend.dead`
## What changes were proposed in this pull request?
`StandaloneSchedulerBackend.dead` is called from an RPC thread, so it
should not call `SparkContext.stop` on the same thread: `SparkContext.stop`
blocks until all RPC threads exit, so calling it inside an RPC thread
deadlocks.
This PR adds a thread-local flag inside RPC threads; `SparkContext.stop`
uses it to decide whether to launch a new thread to stop the SparkContext.
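The flag-and-dispatch idea can be sketched as follows (hypothetical names;
Spark's actual code lives in `SparkContext` and the RPC layer):

```scala
// Hypothetical sketch: RPC threads set a ThreadLocal flag; a stop started
// from such a thread is handed off to a fresh thread so it never blocks
// waiting for the very thread pool it is running on.
object StopHelper {
  private val inRpcThread = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }
  def markRpcThread(): Unit = inRpcThread.set(true)

  def stopSafely(doStop: () => Unit): Unit =
    if (inRpcThread.get()) {
      val t = new Thread(new Runnable { def run(): Unit = doStop() })
      t.setDaemon(true)
      t.start()
    } else {
      doStop() // safe to stop synchronously outside RPC threads
    }
}
```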
## How was this patch tested?
Jenkins
Author: Shixiong Zhu <[email protected]>
Closes #15775 from zsxwing/SPARK-18280.
commit 988f9080a08e861c34a8734de8304e6e0e5a22c7
Author: Burak Yavuz <[email protected]>
Date: 2016-11-08T23:08:09Z
[SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
## What changes were proposed in this pull request?
If the rename operation in the state store fails (`fs.rename` returns
`false`), the StateStore should throw an exception and have the task retry.
Currently, when a rename fails, nothing happens immediately during
execution. However, snapshot operations will then fail, and any subsequent
recovery attempt (executor failure / checkpoint recovery) also fails.
## How was this patch tested?
Unit test
Author: Burak Yavuz <[email protected]>
Closes #15804 from brkyvz/rename-state.
(cherry picked from commit 6f7ecb0f2975d24a71e4240cf623f5bd8992bbeb)
Signed-off-by: Tathagata Das <[email protected]>
commit 98dd7ac719d592e64488a4ecd1ea3543b326fe29
Author: Felix Cheung <[email protected]>
Date: 2016-11-09T00:00:45Z
[SPARK-18239][SPARKR] Gradient Boosted Tree for R
## What changes were proposed in this pull request?
Gradient Boosted Tree in R.
With a few minor improvements to RandomForest in R.
Since this is relatively isolated I'd like to target this for branch-2.1
## How was this patch tested?
manual tests, unit tests
Author: Felix Cheung <[email protected]>
Closes #15746 from felixcheung/rgbt.
(cherry picked from commit 55964c15a7b639f920dfe6c104ae4fdcd673705c)
Signed-off-by: Felix Cheung <[email protected]>
commit 0dc14f12917626a5d7f0c9a21e4edd0b63587470
Author: Eric Liang <[email protected]>
Date: 2016-11-09T07:00:46Z
[SPARK-18333][SQL] Revert hacks in parquet and orc reader to support case
insensitive resolution
## What changes were proposed in this pull request?
These are no longer needed after
https://issues.apache.org/jira/browse/SPARK-17183
cc cloud-fan
## How was this patch tested?
Existing parquet and orc tests.
Author: Eric Liang <[email protected]>
Closes #15799 from ericl/sc-4929.
(cherry picked from commit 4afa39e223c70e91b6ee19e9ea76fa9115203d74)
Signed-off-by: Wenchen Fan <[email protected]>
----