GitHub user ahshahid opened a pull request:

    https://github.com/apache/spark/pull/15549

    Bootstrap perf

    Reduces the generated code for a struct when the struct's fields all share the same primitive data type.
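
    For illustration, a minimal sketch of the targeted shape (the column names and the `spark` session are assumptions, not from the patch):

    ```scala
    import spark.implicits._
    import org.apache.spark.sql.functions.struct

    // A struct whose fields are all the same primitive type (bigint here) --
    // the case in which this change shrinks the generated code.
    val df = spark.range(3).selectExpr("id AS x", "id * 2 AS y", "id * 3 AS z")
    val points = df.select(struct($"x", $"y", $"z").as("point"))
    ```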
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SnappyDataInc/spark bootstrap_perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15549
    
----
commit 0e9333b275c8307a24bb6c7e8409ea48d4bac3d6
Author: Ryan Blue <b...@apache.org>
Date:   2016-07-08T19:37:26Z

    [SPARK-16420] Ensure compression streams are closed.
    
    ## What changes were proposed in this pull request?
    
    This uses the try/finally pattern to ensure streams are closed after use. 
`UnsafeShuffleWriter` wasn't closing compression streams, causing them to leak 
resources until garbage collected. This was causing a problem with codecs that 
use off-heap memory.
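
    A minimal sketch of the pattern (using a JDK GZIP stream as a stand-in; this is not the actual `UnsafeShuffleWriter` code):

    ```scala
    import java.io.FileOutputStream
    import java.util.zip.GZIPOutputStream

    val out = new GZIPOutputStream(new FileOutputStream("/tmp/example.gz"))
    try {
      out.write(Array[Byte](1, 2, 3))
    } finally {
      out.close() // runs even if write() throws, so codec resources are freed
    }
    ```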
    
    ## How was this patch tested?
    
    Current tests are sufficient. This should not change behavior.
    
    Author: Ryan Blue <b...@apache.org>
    
    Closes #14093 from rdblue/SPARK-16420-unsafe-shuffle-writer-leak.
    
    (cherry picked from commit 67e085ef6dd62774095f3187844c091db1a6a72c)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit e3424fd7716d0c3f6ce82acd200bda704e42d3eb
Author: wujian <jan.chou...@gmail.com>
Date:   2016-07-08T21:38:05Z

    [SPARK-16281][SQL] Implement parse_url SQL function
    
    ## What changes were proposed in this pull request?
    
    This PR adds the parse_url SQL function in order to remove a Hive fallback.
    
    A new implementation of #13999
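
    A hedged usage sketch (`spark` is an active SparkSession; the URL and keys are made up):

    ```scala
    // parse_url(url, partToExtract[, key]) extracts one component of a URL.
    spark.sql("""
      SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST'),
             parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query')
    """).show()
    // -> spark.apache.org, 1
    ```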
    
    ## How was this patch tested?
    
    Passes the existing tests, including new test cases.
    
    Author: wujian <jan.chou...@gmail.com>
    
    Closes #14008 from janplus/SPARK-16281.
    
    (cherry picked from commit f5fef69143b2a83bb8b168b7417e92659af0c72c)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 07f562f5881f1896a41077a367c31af704551d78
Author: Yin Huai <yh...@databricks.com>
Date:   2016-07-08T22:56:46Z

    [SPARK-16453][BUILD] release-build.sh is missing hive-thriftserver for 
scala 2.10
    
    ## What changes were proposed in this pull request?
    This PR adds the hive-thriftserver profile to the Scala 2.10 build created by release-build.sh.
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #14108 from yhuai/SPARK-16453.
    
    (cherry picked from commit 60ba436b7010436c77dfe5219a9662accc25bffa)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 463cbf72fd6db1d0646df432f56cd121b0eed625
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-08T23:07:12Z

    [SPARK-16387][SQL] JDBC Writer should use dialect to quote field names.
    
    ## What changes were proposed in this pull request?
    
    Currently, the JDBC writer uses dialects to get data types, but doesn't use them to quote field names. This PR uses dialects to quote the field names, too.
    
    **Reported Error Scenario (MySQL case)**
    ```scala
    scala> val url="jdbc:mysql://localhost:3306/temp"
    scala> val prop = new java.util.Properties
    scala> prop.setProperty("user","root")
    scala> val df = spark.createDataset(Seq("a","b","c")).toDF("order")
    scala> df.write.mode("overwrite").jdbc(url, "temptable", prop)
    ...MySQLSyntaxErrorException: ... near 'order TEXT )
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins tests and manually do the above case.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14107 from dongjoon-hyun/SPARK-16387.
    
    (cherry picked from commit 3b22291b5f0317609cd71ce7af78e4c5063d66e8)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit c425230fdf1654aecaa84aba02b6844923c56d61
Author: cody koeninger <c...@koeninger.org>
Date:   2016-07-09T00:47:58Z

    [SPARK-13569][STREAMING][KAFKA] pattern based topic subscription
    
    ## What changes were proposed in this pull request?
    Allow Kafka topic subscriptions based on a regex pattern.
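
    A usage sketch against the Kafka 0.10 direct stream API (the broker address, group id, and topic pattern are assumptions):

    ```scala
    import java.util.regex.Pattern
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val ssc = new StreamingContext(
      new SparkConf().setAppName("PatternSubscribe").setMaster("local[2]"),
      Seconds(5))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group")

    // Subscribe to every topic whose name matches the regex "events-.*".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.SubscribePattern[String, String](
        Pattern.compile("events-.*"), kafkaParams))
    ```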
    
    ## How was this patch tested?
    Unit tests, manual tests
    
    Author: cody koeninger <c...@koeninger.org>
    
    Closes #14026 from koeninger/SPARK-13569.
    
    (cherry picked from commit fd6e8f0e2269a2e7f24f79d5c2041816ea308c86)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 16202ba684eae8d200e063abfe154c3d1b8106a5
Author: Sean Owen <so...@cloudera.com>
Date:   2016-07-09T03:17:50Z

    [SPARK-16376][WEBUI][SPARK WEB UI][APP-ID] HTTP ERROR 500 when using rest 
api "/applications//jobs" if array "stageIds" is empty
    
    ## What changes were proposed in this pull request?
    
    Avoid the error from taking the max of an empty Seq when stageIds is empty. This fixes the immediate problem; I don't know whether the output is meaningful, but at least it is no longer an error.
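
    The guard amounts to something like this (a sketch, not the exact patch):

    ```scala
    val stageIds = Seq.empty[Int]
    // stageIds.max throws UnsupportedOperationException on an empty Seq,
    // so fall back to a sentinel when there are no stages.
    val lastStageId = if (stageIds.isEmpty) -1 else stageIds.max
    ```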
    
    ## How was this patch tested?
    
    Jenkins tests
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #14105 from srowen/SPARK-16376.
    
    (cherry picked from commit 6cef0183c0f0392dad78fec54635afdb9341b7f3)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 5024c4cb8f08019197670151d9bf9299e30586e4
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-09T03:18:49Z

    [SPARK-16432] Empty blocks fail to serialize due to assert in 
ChunkedByteBuffer
    
    ## What changes were proposed in this pull request?
    
    It's possible to also change the callers to not pass in empty chunks, but 
it seems cleaner to just allow `ChunkedByteBuffer` to handle empty arrays. cc 
JoshRosen
    
    ## How was this patch tested?
    
    Unit tests, also checked that the original reproduction case in 
https://github.com/apache/spark/pull/11748#issuecomment-230760283 is resolved.
    
    Author: Eric Liang <e...@databricks.com>
    
    Closes #14099 from ericl/spark-16432.
    
    (cherry picked from commit d8b06f18dc3e35938d15099beac98221d6f528b5)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 50d7002b6aa95bad2a89f771f02e629ca7fc524f
Author: Michael Gummelt <mgumm...@mesosphere.io>
Date:   2016-07-09T03:20:26Z

    [SPARK-11857][MESOS] Deprecate fine grained
    
    ## What changes were proposed in this pull request?
    
    Documentation changes to indicate that fine-grained mode is now deprecated. 
 No code changes were made, and all fine-grained mode instructions were left in 
place.  We can remove all of that once the deprecation cycle completes (Does 
Spark have a standard deprecation cycle?  One major version?)
    
    Blocked on https://github.com/apache/spark/pull/14059
    
    ## How was this patch tested?
    
    Viewed in Github
    
    Author: Michael Gummelt <mgumm...@mesosphere.io>
    
    Closes #14078 from mgummelt/deprecate-fine-grained.
    
    (cherry picked from commit b1db26acc51003e68e4e8d7d324cf74e3aa03cfd)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit a33643cbf0f8b68bde5bd6f9a706ee0f5be377f9
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-07-09T12:35:45Z

    [SPARK-16401][SQL] Data Source API: Enable Extending RelationProvider and 
CreatableRelationProvider without Extending SchemaRelationProvider
    
    #### What changes were proposed in this pull request?
    When users implement a data source by extending only `RelationProvider` and `CreatableRelationProvider`, they hit an error when resolving the relation.
    ```Scala
    spark.read
      .format("org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema")
      .load()
      .write
      .format("org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema")
      .save()
    ```
    
    The error they hit looks like this:
    ```
    org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema does not 
allow user-specified schemas.;
    org.apache.spark.sql.AnalysisException: 
org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema does not 
allow user-specified schemas.;
        at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:319)
        at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:494)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    ```
    
    Actually, the bug fix is simple. 
[`DataSource.createRelation(sparkSession.sqlContext, mode, options, 
data)`](https://github.com/gatorsmile/spark/blob/dd644f8117e889cebd6caca58702a7c7e3d88bef/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L429)
 already returns a BaseRelation. We should not assign schema to 
`userSpecifiedSchema`. That schema assignment only makes sense for the data 
sources that extend `FileFormat`.
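
    A minimal sketch of such a source (the class and schema are hypothetical, not the test source used in this patch):

    ```scala
    import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
    import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}
    import org.apache.spark.sql.types.StructType

    class DefaultSource extends RelationProvider with CreatableRelationProvider {
      private def relation(ctx: SQLContext): BaseRelation = new BaseRelation {
        override val sqlContext: SQLContext = ctx
        override val schema: StructType = new StructType().add("id", "bigint")
      }

      // Read path: no user-specified schema involved.
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation = relation(sqlContext)

      // Write path: already returns a BaseRelation, so its schema should not
      // be copied into userSpecifiedSchema (the fix described above).
      override def createRelation(
          sqlContext: SQLContext,
          mode: SaveMode,
          parameters: Map[String, String],
          data: DataFrame): BaseRelation = relation(sqlContext)
    }
    ```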
    
    #### How was this patch tested?
    Added a test case.
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14075 from gatorsmile/dataSource.
    
    (cherry picked from commit 7374e518e2641fddfe57003340db410224b37581)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 139d5eae038d846081e8c92518bdf7923d984afa
Author: Reynold Xin <r...@databricks.com>
Date:   2016-07-11T05:05:16Z

    [SPARK-16476] Restructure MimaExcludes for easier union excludes
    
    ## What changes were proposed in this pull request?
    It is currently fairly difficult to have proper mima excludes when we cut a 
version branch. I'm proposing a small change to take the exclude list out of 
the exclude function, and put it in a variable so we can easily union excludes.
    
    After this change, we can bump pom.xml version to 2.1.0-SNAPSHOT, without 
bumping the diff base version. Note that I also deleted all the exclude rules 
for version 1.x, to cut down the size of the file.
    
    ## How was this patch tested?
    N/A - this is a build infra change.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14128 from rxin/SPARK-16476.
    
    (cherry picked from commit 52b5bb0b7fabe6cc949f514c548f9fbc6a4fa181)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit aa8cbcd199b5dcfd95b6a5e6f214f291e27d5781
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-07-11T08:21:13Z

    [SPARK-16355][SPARK-16354][SQL] Fix Bugs When LIMIT/TABLESAMPLE is 
Non-foldable, Zero or Negative
    
    #### What changes were proposed in this pull request?
    **Issue 1:** When a query contains LIMIT/TABLESAMPLE 0, the statistics can be zero. The results are correct, but this can cause a huge performance regression. For example,
    ```Scala
    Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
      .createOrReplaceTempView("test")
    val df1 = spark.table("test")
    val df2 = spark.table("test").limit(0)
    val df = df1.join(df2, Seq("k"), "left")
    ```
    The statistics of both `df` and `df2` are zero. The statistics values should never be zero; otherwise the `sizeInBytes` of a `BinaryNode` will also be zero (it is the product of the children's). This PR raises the value to `1` when the number of rows is 0.
    
    **Issue 2:** When a query contains a negative LIMIT/TABLESAMPLE, we should raise an exception. Negative values can break implementation assumptions in multiple places, for example statistics calculation. Below are example queries.
    ```SQL
    SELECT * FROM testData TABLESAMPLE (-1 rows)
    SELECT * FROM testData LIMIT -1
    ```
    This PR issues an appropriate exception in this case.
    
    **Issue 3:** Spark SQL follows Hive's restriction on the LIMIT clause: the argument must evaluate to a constant value. It can be a numeric literal, or another kind of numeric expression involving operators, casts, and function return values. You cannot refer to a column or use a subquery. Currently, we do not detect whether the expression in the LIMIT clause is foldable; if it is not, we may issue a strange error message. For example,
    ```SQL
    SELECT * FROM testData LIMIT rand() > 0.2
    ```
    Then, a misleading error message is issued, like
    ```
    assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
    +- Project [key#11, value#12, rand(-1441968339187861415) AS 
_nondeterministic#203]
       +- LocalLimit (_nondeterministic#202 > 0.2)
          +- Project [key#11, value#12, rand(-1308350387169017676) AS 
_nondeterministic#202]
             +- LogicalRDD [key#11, value#12]
    
    java.lang.AssertionError: assertion failed: No plan for GlobalLimit 
(_nondeterministic#203 > 0.2)
    +- Project [key#11, value#12, rand(-1441968339187861415) AS 
_nondeterministic#203]
       +- LocalLimit (_nondeterministic#202 > 0.2)
          +- Project [key#11, value#12, rand(-1308350387169017676) AS 
_nondeterministic#202]
             +- LogicalRDD [key#11, value#12]
    ```
    This PR detects it and then issues a meaningful error message.
    
    #### How was this patch tested?
    Added test cases.
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14034 from gatorsmile/limit.
    
    (cherry picked from commit e22627894126dceb7491300b63f1fe028b1e2e2c)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 7e4ba66d938a8bf312e991dfa034d420a0b7b360
Author: Xin Ren <iamsh...@126.com>
Date:   2016-07-11T12:05:28Z

    [SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide for R 
language binding
    
    https://issues.apache.org/jira/browse/SPARK-16381
    
    ## What changes were proposed in this pull request?
    
    Update SQL examples and programming guide for R language binding.
    
    Here I just followed the example at 
https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction and created a separate R file to store all the example code.
    
    ## How was this patch tested?
    
    Manual test on my local machine.
    Screenshot as below:
    
    ![screen shot 2016-07-06 at 4 52 25 
pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png)
    
    Author: Xin Ren <iamsh...@126.com>
    
    Closes #14082 from keypointt/SPARK-16381.
    
    (cherry picked from commit 9cb1eb7af779e74165552977002158a7dad9bb09)
    Signed-off-by: Cheng Lian <l...@databricks.com>

commit f97dd8a8fd61ab1964b4a7dc4fd0ddecf801c612
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-11T13:15:47Z

    [SPARK-16459][SQL] Prevent dropping current database
    
    This PR prevents dropping the current database, to avoid errors like the following.
    
    ```scala
    scala> sql("create database delete_db")
    scala> sql("use delete_db")
    scala> sql("drop database delete_db")
    scala> sql("create table t as select 1")
    org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 
`delete_db` not found;
    ```
    
    Passes the Jenkins tests, including an updated test case.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14115 from dongjoon-hyun/SPARK-16459.
    
    (cherry picked from commit 7ac79da0e4607f7f89a3617edf53c2b174b378e8)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit 72cf743240c2f36fb45f5bf44be2ca16367320fc
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-07-11T19:42:43Z

    [SPARK-16318][SQL] Implement all remaining xpath functions (branch-2.0)
    
    ## What changes were proposed in this pull request?
    This patch implements all remaining xpath functions that Hive supports but Spark does not natively support: xpath_int, xpath_short, xpath_long, xpath_float, xpath_double, xpath_string, and xpath.
    
    This is based on https://github.com/apache/spark/pull/13991 but for 
branch-2.0.
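
    Hedged usage sketches (the XML literals are made up):

    ```scala
    spark.sql("SELECT xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()')").show()
    // -> ["b1", "b2"]
    spark.sql("SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)')").show()
    // -> 3
    ```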
    
    ## How was this patch tested?
    Added unit tests and end-to-end tests.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14131 from petermaxlee/xpath-branch-2.0.

commit aea33bf05fef49683eaa858f653aad5a30f37e4a
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-11T20:45:22Z

    [SPARK-16458][SQL] SessionCatalog should support `listColumns` for 
temporary tables
    
    ## What changes were proposed in this pull request?
    
    Temporary tables are used frequently, but `spark.catalog.listColumns` does not support them. This PR makes `SessionCatalog` support column listing for temporary tables.
    
    **Before**
    ```scala
    scala> spark.range(10).createOrReplaceTempView("t1")
    
    scala> spark.catalog.listTables().collect()
    res1: Array[org.apache.spark.sql.catalog.Table] = Array(Table[name=`t1`, 
tableType=`TEMPORARY`, isTemporary=`true`])
    
    scala> spark.catalog.listColumns("t1").collect()
    org.apache.spark.sql.AnalysisException: Table `t1` does not exist in 
database `default`.;
    ```
    
    **After**
    ```
    scala> spark.catalog.listColumns("t1").collect()
    res2: Array[org.apache.spark.sql.catalog.Column] = Array(Column[name='id', 
description='id', dataType='bigint', nullable='false', isPartition='false', 
isBucket='false'])
    ```
    ## How was this patch tested?
    
    Passes the Jenkins tests, including a new test case.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14114 from dongjoon-hyun/SPARK-16458.
    
    (cherry picked from commit 840853ed06d63694bf98b21a889a960aac6ac0ac)
    Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>

commit b938ca76ebd92e17233addfc29cb7c3692957a7b
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2016-07-11T21:31:11Z

    [SPARKR][DOC] SparkR ML user guides update for 2.0
    
    ## What changes were proposed in this pull request?
    * Update the SparkR ML sections to make them consistent with the SparkR API docs.
    * Since #13972 added labelling support to the ```include_example``` Jekyll plugin, we can split the single ```ml.R``` example file into multiple line blocks with different labels and include them under different algorithms/models in the generated HTML page.
    
    ## How was this patch tested?
    Docs-only update; manually checked the generated docs.
    
    Author: Yanbo Liang <yblia...@gmail.com>
    
    Closes #14011 from yanboliang/r-user-guide-update.
    
    (cherry picked from commit 2ad031be67c7a0f0c4895c084c891330a9ec935e)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit cb463b6db30491e4e881b8fb5981dfdbf9e73d34
Author: Felix Cheung <felixcheun...@hotmail.com>
Date:   2016-07-11T21:34:48Z

    [SPARK-16144][SPARKR] update R API doc for mllib
    
    ## What changes were proposed in this pull request?
    
    From SPARK-16140/PR #13921: the issue is that we left the write.ml doc empty:
    
![image](https://cloud.githubusercontent.com/assets/8969467/16481934/856dd0ea-3e62-11e6-9474-e4d57d1ca001.png)
    
    Here's what I meant as the fix:
    
![image](https://cloud.githubusercontent.com/assets/8969467/16481943/911f02ec-3e62-11e6-9d68-17363a9f5628.png)
    
    
![image](https://cloud.githubusercontent.com/assets/8969467/16481950/9bc057aa-3e62-11e6-8127-54870701c4b1.png)
    
    I didn't realize there was already a JIRA on this. mengxr yanboliang
    
    ## How was this patch tested?
    
    Checked the generated docs.
    
    Author: Felix Cheung <felixcheun...@hotmail.com>
    
    Closes #13993 from felixcheung/rmllibdoc.
    
    (cherry picked from commit 7f38b9d5f469b2550bc481cbf9adb9acc3779712)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 02d584ccbb95daae3607bd733ab37764ec454c84
Author: James Thomas <jamesjoetho...@gmail.com>
Date:   2016-07-12T00:57:51Z

    [SPARK-16114][SQL] structured streaming event time window example
    
    ## What changes were proposed in this pull request?
    
    A structured streaming example with event time windowing.
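
    The core of such an example looks like this (a sketch; how the streaming Dataset `events` with columns `timestamp` and `word` is produced is an assumption):

    ```scala
    import org.apache.spark.sql.functions.window
    import spark.implicits._

    // Count words within 10-minute event-time windows sliding every 5 minutes.
    val windowedCounts = events
      .groupBy(window($"timestamp", "10 minutes", "5 minutes"), $"word")
      .count()
    ```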
    
    ## How was this patch tested?
    
    Run locally
    
    Author: James Thomas <jamesjoetho...@gmail.com>
    
    Closes #13957 from jjthomas/current.
    
    (cherry picked from commit 9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 81d7f484ac3b68792e49e47c2b7c9994cf17487a
Author: Xin Ren <iamsh...@126.com>
Date:   2016-07-12T01:09:14Z

    [MINOR][STREAMING][DOCS] Minor changes on kinesis integration
    
    ## What changes were proposed in this pull request?
    
    Some minor changes to the documentation page "Spark Streaming + Kinesis Integration".
    
    Moved "streaming-kinesis-arch.png" to before the bullet list instead of between the bullets.
    
    ## How was this patch tested?
    
    Tested manually, on my local machine.
    
    Author: Xin Ren <iamsh...@126.com>
    
    Closes #14097 from keypointt/kinesisDoc.
    
    (cherry picked from commit 05d7151ccbccdd977ec2f2301d5b12566018c988)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit b716e104b917a598d4e56abcfa1517a36b9232a6
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2016-07-12T01:11:06Z

    [SPARK-16433][SQL] Improve StreamingQuery.explain when no data arrives
    
    ## What changes were proposed in this pull request?
    
    Display `No physical plan. Waiting for data.` instead of `N/A`  for 
StreamingQuery.explain when no data arrives because `N/A` doesn't provide 
meaningful information.
    
    ## How was this patch tested?
    
    Existing unit tests.
    
    Author: Shixiong Zhu <shixi...@databricks.com>
    
    Closes #14100 from zsxwing/SPARK-16433.
    
    (cherry picked from commit 91a443b849e4d1ccc50a32b25fdd2bb502cf9b84)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit b37177c22f5c0f927b8d9f3a38dba9617d36c944
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-07-12T01:41:36Z

    [SPARK-16430][SQL][STREAMING] Fixed bug in the maxFilesPerTrigger in 
FileStreamSource
    
    ## What changes were proposed in this pull request?
    
    An incorrect list of files was being allocated to a batch, which caused a file to be read multiple times across batches.
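
    A usage sketch of the option in question (the input path is hypothetical):

    ```scala
    // Cap each micro-batch at one new file; with the fix, each file should be
    // assigned to exactly one batch.
    val fileStream = spark.readStream
      .format("text")
      .option("maxFilesPerTrigger", "1")
      .load("/tmp/input")
    ```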
    
    ## How was this patch tested?
    
    Added unit tests
    
    Author: Tathagata Das <tathagata.das1...@gmail.com>
    
    Closes #14143 from tdas/SPARK-16430-1.
    
    (cherry picked from commit e50efd53f073890d789a8448f850cc219cca7708)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 689261465ad1dd443ebf764ad837243418b986ef
Author: Sameer Agarwal <sam...@databricks.com>
Date:   2016-07-12T03:26:01Z

    [SPARK-16488] Fix codegen variable namespace collision in pmod and 
partitionBy
    
    This patch fixes a variable namespace collision bug in pmod and partitionBy.
    
    Regression test for one possible occurrence. A more general fix in 
`ExpressionEvalHelper.checkEvaluation` will be in a subsequent PR.
    
    Author: Sameer Agarwal <sam...@databricks.com>
    
    Closes #14144 from sameeragarwal/codegen-bug.
    
    (cherry picked from commit 9cc74f95edb6e4f56151966139cd0dc24e377949)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 9e0d2e22637f6cef2ab91aadcdeb8f06f677e397
Author: WeichenXu <weichenxu...@outlook.com>
Date:   2016-07-12T08:23:59Z

    [MINOR][ML] update comment that is inconsistent with code in ml.regression.LinearRegression
    
    ## What changes were proposed in this pull request?
    
    In the `train` method of `ml.regression.LinearRegression`, when handling the `std(label) == 0` case, the code replaces `std(label)` with `mean(label)`, but the corresponding comment is inconsistent; I update it.
    
    ## How was this patch tested?
    
    N/A
    
    Author: WeichenXu <weichenxu...@outlook.com>
    
    Closes #14121 from WeichenXu123/update_lr_comment.
    
    (cherry picked from commit fc11c509e234c5414687f7fbd13af113a1f52f10)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 7b63e7d924cb82db37fe5d0f9b35f556bab37d39
Author: WeichenXu <weichenxu...@outlook.com>
Date:   2016-07-12T12:04:34Z

    [SPARK-16470][ML][OPTIMIZER] Check whether linear regression training actually reaches convergence and add warning if not
    
    ## What changes were proposed in this pull request?
    
    In `ml.regression.LinearRegression`, we use the breeze `LBFGS` and `OWLQN` optimizers to train the model, but do not check whether the result returned by breeze's optimizer actually reached convergence.
    
    The breeze `LBFGS` and `OWLQN` optimizers may finish iterating in any of the following situations:
    
    1) the max iteration number is reached
    2) the function value converges
    3) the objective function stops improving
    4) the gradient converges
    5) the search fails (due to some internal numerical error)
    
    This adds warning-printing code so that if the result is (1), (3), or (5) above, a warning with the respective reason string is printed.
    
    ## How was this patch tested?
    
    Manual.
    
    Author: WeichenXu <weichenxu...@outlook.com>
    
    Closes #14122 from WeichenXu123/add_lr_not_convergence_warn.
    
    (cherry picked from commit 6cb75db9ab1a4f227069bec2763b89546b88b0ee)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit f419476546f133040a21d7662b6509185f1a5d53
Author: Reynold Xin <r...@databricks.com>
Date:   2016-07-12T17:07:23Z

    [SPARK-16489][SQL] Guard against variable reuse mistakes in expression code 
generation
    
    In code generation, it is incorrect for an expression to reuse variable names across different instances of itself. As an example, SPARK-16488 reports a bug in which the pmod expression reuses the variable name "r".
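
    The codegen-side remedy is to draw variable names from the `CodegenContext` instead of hard-coding them; a small sketch:

    ```scala
    import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

    val ctx = new CodegenContext
    val r1 = ctx.freshName("r") // a unique name derived from "r"
    val r2 = ctx.freshName("r") // a different name, so two instances of the
                                // same expression cannot collide
    ```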
    
    This patch updates ExpressionEvalHelper test harness to always project two 
instances of the same expression, which will help us catch variable reuse 
problems in expression unit tests. This patch also fixes the bug in crc32 
expression.
    
    This is a test harness change, but I also created a new test suite for 
testing the test harness.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14146 from rxin/SPARK-16489.
    
    (cherry picked from commit c377e49e38a290e5c4fbc178278069788674dfb7)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 2f47b37784288b533405d7ef1cad1f7bac324ec0
Author: sharkd <sharkd...@gmail.com>
Date:   2016-07-12T17:10:35Z

    [SPARK-16414][YARN] Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf on yarn cluster mode"
    
    ## What changes were proposed in this pull request?
    
    The `SparkHadoopUtil` singleton was instantiated before `ApplicationMaster` in `ApplicationMaster.main` when deploying Spark in yarn cluster mode, so the `conf` in the `SparkHadoopUtil` singleton didn't include the user's configuration.
    
    So we should load the properties file with the Spark configuration and set its entries as system properties before `SparkHadoopUtil` is first instantiated.
    
    ## How was this patch tested?
    
    Add a test case
    
    Author: sharkd <sharkd...@gmail.com>
    Author: sharkdtu <shark...@tencent.com>
    
    Closes #14088 from sharkdtu/master.
    
    (cherry picked from commit d513c99c19e229f72d03006e251725a43c13fefd)

commit 4303d292b55fc8709780994b05b41e73a52c001a
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-07-13T00:05:20Z

    [SPARK-16284][SQL] Implement reflect SQL function
    
    ## What changes were proposed in this pull request?
    This patch implements the reflect SQL function, which can be used to invoke a Java method from SQL. Slightly differently from Hive, this implementation requires the class name and the method name to be literals. It also supports a smaller set of data types, and requires the method to be static, as suggested by rxin in #13969.
    
    java_method is an alias for reflect, so this should also resolve 
SPARK-16277.
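
    Hedged usage sketch:

    ```scala
    // Class and method names must be literals; the method must be static.
    spark.sql("SELECT reflect('java.lang.String', 'valueOf', 1)").show()   // "1"
    spark.sql("SELECT java_method('java.util.UUID', 'randomUUID')").show()
    ```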
    
    ## How was this patch tested?
    Added expression unit tests and an end-to-end test.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14138 from petermaxlee/reflect-static.
    
    (cherry picked from commit 56bd399a86c4e92be412d151200cb5e4a5f6a48a)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 41df62c595474d7afda6dbe76a558d8cb3be7ff2
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-13T06:09:02Z

    [SPARK-16514][SQL] Fix various regex codegen bugs
    
    ## What changes were proposed in this pull request?
    
    RegexExtract and RegexReplace currently crash on non-nullable input due to the use of a hard-coded local variable name (e.g., compilation fails with `java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 85, Column 26: Redefinition of local variable "m"`).
    
    This changes those variables to use fresh names, and also in a few other 
places.
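
    A sketch of the kind of query that exercises this path (non-nullable input, two regex expressions in one projection):

    ```scala
    // `id` is non-nullable; before the fix, the generated code could redefine
    // the same hard-coded local variable for the regex matcher.
    spark.range(3).selectExpr(
      "regexp_replace(CAST(id AS STRING), '1', 'one')",
      "regexp_extract(CAST(id AS STRING), '([0-9])', 1)"
    ).show()
    ```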
    
    ## How was this patch tested?
    
    Unit tests. rxin
    
    Author: Eric Liang <e...@databricks.com>
    
    Closes #14168 from ericl/sc-3906.
    
    (cherry picked from commit 1c58fa905b6543d366d00b2e5394dfd633987f6d)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 5173f847c55a7b810d1c494c8b23c740ba110c39
Author: aokolnychyi <okolnychyyan...@gmail.com>
Date:   2016-07-13T08:12:05Z

    [SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examples
    
    - Hard-coded Spark SQL sample snippets were moved into source files under the examples sub-project.
    - Removed the inconsistencies between the Scala and Java Spark SQL examples.
    - The Scala and Java Spark SQL examples were updated.
    
    The work is still in progress. All involved examples were tested manually. 
An additional round of testing will be done after the code review.
    
    
![image](https://cloud.githubusercontent.com/assets/6235869/16710314/51851606-462a-11e6-9fbe-0818daef65e4.png)
    
    Author: aokolnychyi <okolnychyyan...@gmail.com>
    
    Closes #14119 from aokolnychyi/spark_16303.
    
    (cherry picked from commit 772c213ec702c80d0f25aa6f30b2dffebfbe2d0d)
    Signed-off-by: Cheng Lian <l...@databricks.com>

commit 4b93a833b75d72043fd7770250c25247e690666d
Author: Sean Owen <so...@cloudera.com>
Date:   2016-07-13T09:44:07Z

    [SPARK-15889][STREAMING] Follow-up fix to erroneous condition in StreamTest
    
    ## What changes were proposed in this pull request?
    
    A second form of AssertQuery now actually invokes the condition; this also avoids a build warning.
    
    ## How was this patch tested?
    
    Jenkins; running StreamTest
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #14133 from srowen/SPARK-15889.2.
    
    (cherry picked from commit c190d89bd3cf677400c49238498207b87da9ee78)
    Signed-off-by: Sean Owen <so...@cloudera.com>

----

