GitHub user kishorbp opened a pull request:

    https://github.com/apache/spark/pull/16752

    Branch 2.0

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16752
    
----
commit b25a8e6e167717fbe92e6a9b69a8a2510bf926ca
Author: frreiss <[email protected]>
Date:   2016-09-22T09:31:15Z

    [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.
    
    ## What changes were proposed in this pull request?
    
    Modified the documentation to clarify that `build/mvn` and `pom.xml` always 
add Java 7-specific parameters to `MAVEN_OPTS`, and that developers can safely 
ignore warnings about `-XX:MaxPermSize` that may result from compiling or 
running tests with Java 8.
    
    ## How was this patch tested?
    
    Rebuilt HTML documentation, made sure that building-spark.html displays 
correctly in a browser.
    
    Author: frreiss <[email protected]>
    
    Closes #15005 from frreiss/fred-17421a.
    
    (cherry picked from commit 646f383465c123062cbcce288a127e23984c7c7f)
    Signed-off-by: Sean Owen <[email protected]>

commit f14f47f072a392df0ebe908f1c57b6eb858105b7
Author: Shivaram Venkataraman <[email protected]>
Date:   2016-09-22T18:52:42Z

    Skip building R vignettes if Spark is not built
    
    ## What changes were proposed in this pull request?
    
    When we build the docs separately, we don't have the JAR files from the
    Spark build in the same tree. As the SparkR vignettes need to launch a
    SparkContext to be built, we skip building them if the JAR files don't exist.
    
    ## How was this patch tested?
    
    To test this we can run the following:
    ```
    build/mvn -DskipTests -Psparkr clean
    ./R/create-docs.sh
    ```
    You should see a line `Skipping R vignettes as Spark JARs not found` at the end.
    
    Author: Shivaram Venkataraman <[email protected]>
    
    Closes #15200 from shivaram/sparkr-vignette-skip.
    
    (cherry picked from commit 9f24a17c59b1130d97efa7d313c06577f7344338)
    Signed-off-by: Reynold Xin <[email protected]>

commit 243bdb11d89ee379acae1ea1ed78df10797e86d1
Author: Burak Yavuz <[email protected]>
Date:   2016-09-22T20:05:41Z

    [SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames
    
    Consider you have a bucket as `s3a://some-bucket`
    and under it you have files:
    ```
    s3a://some-bucket/file1.parquet
    s3a://some-bucket/file2.parquet
    ```
    Getting the parent path of `s3a://some-bucket/file1.parquet` yields
    `s3a://some-bucket/` and the ListingFileCatalog uses this as the key in the 
hash map.
    
    When catalog.allFiles is called, we use `s3a://some-bucket` (no slash at 
the end) to get the list of files, and we're left with an empty list!
    
    This PR fixes the issue by adding a `/` at the end of the `URI` iff the given
    `Path` doesn't have a parent, i.e. is the root. This is a no-op if the path
    already has a `/` at the end, and is handled through Hadoop's `Path`
    path-merging semantics.
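
    A minimal sketch of the normalization described above (illustrative only,
    assuming Hadoop's `Path`/`URI` APIs; not the exact `ListingFileCatalog` code):

    ```scala
    import org.apache.hadoop.fs.Path

    // Append a trailing "/" only when the path is a filesystem root, i.e. it has
    // no parent (s3a://some-bucket -> s3a://some-bucket/); other paths untouched.
    def qualifiedPathString(path: Path): String = {
      val uriString = path.toUri.toString
      if (path.getParent == null && !uriString.endsWith("/")) uriString + "/"
      else uriString
    }
    ```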
    
    Unit test in `FileCatalogSuite`.
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #15169 from brkyvz/SPARK-17613.
    
    (cherry picked from commit 85d609cf25c1da2df3cd4f5d5aeaf3cbcf0d674c)
    Signed-off-by: Josh Rosen <[email protected]>

commit 47fc0b9f40d814bc8e19f86dad591d4aed467222
Author: Shixiong Zhu <[email protected]>
Date:   2016-09-22T21:26:45Z

    [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process 
is dead
    
    ## What changes were proposed in this pull request?
    
    When the Python process is dead, the JVM StreamingContext is still running,
    so we will see a lot of Py4JExceptions before the JVM process exits. It's
    better to stop the JVM StreamingContext to avoid those annoying logs.
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15201 from zsxwing/stop-jvm-ssc.
    
    (cherry picked from commit 3cdae0ff2f45643df7bc198cb48623526c7eb1a6)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 0a593db360b3b7771f45f482cf45e8500f0faa76
Author: Herman van Hovell <[email protected]>
Date:   2016-09-22T21:29:27Z

    [SPARK-17616][SQL] Support a single distinct aggregate combined with a 
non-partial aggregate
    
    We currently cannot execute an aggregate that contains a single distinct
    aggregate function and one or more non-partially-plannable aggregate
    functions, for example:
    ```sql
    select   grp,
             collect_list(col1),
             count(distinct col2)
    from     tbl_a
    group by 1
    ```
    This is a regression from Spark 1.6. It is caused by the fact that the
    single-distinct aggregation code path assumes that all aggregates can be
    planned in two phases (i.e., are partially aggregatable). This PR works
    around the issue by triggering the `RewriteDistinctAggregates` rule in such
    cases (similar to the approach taken in 1.6).
    
    Created `RewriteDistinctAggregatesSuite` which checks if the aggregates 
with distinct aggregate functions get rewritten into two `Aggregates` and an 
`Expand`. Added a regression test to `DataFrameAggregateSuite`.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #15187 from hvanhovell/SPARK-17616.
    
    (cherry picked from commit 0d634875026ccf1eaf984996e9460d7673561f80)
    Signed-off-by: Herman van Hovell <[email protected]>

commit c2cb84165960998821c53d6a45507df639aa1425
Author: Burak Yavuz <[email protected]>
Date:   2016-09-23T00:22:04Z

    [SPARK-17599][SPARK-17569] Backport #15153 and #15122 to Spark 2.0 branch
    
    ## What changes were proposed in this pull request?
    
    This backports PR #15153 and PR #15122 to the Spark 2.0 branch for Structured
    Streaming. It is structured a bit differently because similar code paths
    already existed in the 2.0 branch. The unit test makes sure that neither
    behavior breaks.
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #15202 from brkyvz/backports-to-streaming.

commit 04141ad49806a48afccc236b699827997142bd57
Author: Patrick Wendell <[email protected]>
Date:   2016-09-23T00:43:50Z

    Preparing Spark release v2.0.1-rc2

commit c393d86d188bd94b8713c4e0f0885b3adf49176e
Author: Patrick Wendell <[email protected]>
Date:   2016-09-23T00:43:58Z

    Preparing development version 2.0.2-SNAPSHOT

commit 22216d6bd4270095f175d9f4333afe07e07a7303
Author: gatorsmile <[email protected]>
Date:   2016-09-23T01:56:40Z

    [SPARK-17502][17609][SQL][BACKPORT][2.0] Fix Multiple Bugs in DDL 
Statements on Temporary Views
    
    ### What changes were proposed in this pull request?
    This PR is to backport https://github.com/apache/spark/pull/15054 and 
https://github.com/apache/spark/pull/15160 to Spark 2.0.
    
    - When the permanent tables/views do not exist but the temporary view 
exists, the expected error should be `NoSuchTableException` for 
partition-related ALTER TABLE commands. However, it always reports a confusing 
error message. For example,
    ```
    Partition spec is invalid. The spec (a, b) must match the partition spec () 
defined in table '`testview`';
    ```
    - When the permanent tables/views do not exist but the temporary view 
exists, the expected error should be `NoSuchTableException` for `ALTER TABLE 
... UNSET TBLPROPERTIES`. However, it reports a missing table property. For 
example,
    ```
    Attempted to unset non-existent property 'p' in table '`testView`';
    ```
    - When `ANALYZE TABLE` is called on a view or a temporary view, we should 
issue an error message. However, it reports a strange error:
    ```
    ANALYZE TABLE is not supported for Project
    ```
    
    - When inserting into a temporary view that is generated from `Range`, we 
will get the following error message:
    ```
    assertion failed: No plan for 'InsertIntoTable Range (0, 10, step=1, 
splits=Some(1)), false, false
    +- Project [1 AS 1#20]
       +- OneRowRelation$
    ```
    
    This PR is to fix the above four issues.
    
    There is no place in Spark SQL that needs `SessionCatalog.tableExists` to
    check temp views, so this PR makes `SessionCatalog.tableExists` only check
    permanent tables/views and removes some hacks.
    
    ### How was this patch tested?
    Added multiple test cases
    
    Author: gatorsmile <[email protected]>
    
    Closes #15174 from gatorsmile/PR15054Backport.

commit 54d4eee51eca364d9334141f62e0478343345d06
Author: Gayathri Murali <[email protected]>
Date:   2016-09-23T05:44:20Z

    [SPARK-16240][ML] ML persistence backward compatibility for LDA - 2.0 
backport
    
    ## What changes were proposed in this pull request?
    
    Allow Spark 2.x to load instances of LDA, LocalLDAModel, and 
DistributedLDAModel saved from Spark 1.6.
    Backport of https://github.com/apache/spark/pull/15034 for branch-2.0
    
    ## How was this patch tested?
    
    I tested this manually, saving the 3 types from 1.6 and loading them into 
master (2.x).  In the future, we can add generic tests for testing backwards 
compatibility across all ML models in SPARK-15573.
    
    Author: Gayathri Murali <[email protected]>
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #15205 from jkbradley/lda-backward-2.0.

commit d3f90e71af57162afc0648adbc52b810a883ceac
Author: Shixiong Zhu <[email protected]>
Date:   2016-09-23T06:35:08Z

    [SPARK-17640][SQL] Avoid using -1 as the default batchId for 
FileStreamSource.FileEntry
    
    ## What changes were proposed in this pull request?
    
    Avoid using -1 as the default batchId for FileStreamSource.FileEntry so that
    we can make sure we never write any FileEntry(..., batchId = -1) into the
    log. This also prevents people from misusing it in the future (#15203 is an
    example).
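
    A hedged sketch of the idea (field names are assumptions, not the actual
    `FileStreamSource.FileEntry` definition): drop the `-1` default so every call
    site has to supply a real batch id explicitly.

    ```scala
    // Before: batchId silently defaulted to -1 and could leak into the log.
    // case class FileEntry(path: String, timestamp: Long, batchId: Long = -1L)

    // After: no default, so omitting the batch id is a compile-time error.
    case class FileEntry(path: String, timestamp: Long, batchId: Long)
    ```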
    
    ## How was this patch tested?
    
    Jenkins.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15206 from zsxwing/cleanup.
    
    (cherry picked from commit 62ccf27ab4b55e734646678ae78b7e812262d14b)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 1a8ea000e7e16bdee54c47ab0f5e197c15f200a6
Author: Jeff Zhang <[email protected]>
Date:   2016-09-23T18:37:43Z

    [SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when 
running sparkr in RStudio
    
    ## What changes were proposed in this pull request?
    
    Spark adds sparkr.zip to the archives only in YARN mode (SparkSubmit.scala):
    ```
        if (args.isR && clusterManager == YARN) {
          val sparkRPackagePath = RUtils.localSparkRPackagePath
          if (sparkRPackagePath.isEmpty) {
            printErrorAndExit("SPARK_HOME does not exist for R application in 
YARN mode.")
          }
          val sparkRPackageFile = new File(sparkRPackagePath.get, 
SPARKR_PACKAGE_ARCHIVE)
          if (!sparkRPackageFile.exists()) {
            printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R 
application in YARN mode.")
          }
          val sparkRPackageURI = 
Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString
    
          // Distribute the SparkR package.
          // Assigns a symbol link name "sparkr" to the shipped package.
          args.archives = mergeFileLists(args.archives, sparkRPackageURI + 
"#sparkr")
    
          // Distribute the R package archive containing all the built R 
packages.
          if (!RUtils.rPackages.isEmpty) {
            val rPackageFile =
              RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), 
R_PACKAGE_ARCHIVE)
            if (!rPackageFile.exists()) {
              printErrorAndExit("Failed to zip all the built R packages.")
            }
    
            val rPackageURI = 
Utils.resolveURI(rPackageFile.getAbsolutePath).toString
            // Assigns a symbol link name "rpkg" to the shipped package.
            args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
          }
        }
    ```
    So it is necessary to pass spark.master from the R process to the JVM;
    otherwise sparkr.zip won't be distributed to the executors. Besides that, I
    also pass spark.yarn.keytab/spark.yarn.principal to the Spark side, because
    the JVM process needs them to access a secured cluster.
    
    ## How was this patch tested?
    
    Verified manually in RStudio using the following code.
    ```
    Sys.setenv(SPARK_HOME="/Users/jzhang/github/spark")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    sparkR.session(master="yarn-client", sparkConfig = list(spark.executor.instances="1"))
    df <- as.DataFrame(mtcars)
    head(df)
    ```
    
    …
    
    Author: Jeff Zhang <[email protected]>
    
    Closes #14784 from zjffdu/SPARK-17210.
    
    (cherry picked from commit f62ddc5983a08d4d54c0a9a8210dd6cbec555671)
    Signed-off-by: Felix Cheung <[email protected]>

commit 452e468f280d69c930782a7588a87a816cc9585a
Author: Yanbo Liang <[email protected]>
Date:   2016-09-23T19:50:22Z

    [SPARK-17577][CORE][2.0 BACKPORT] Update SparkContext.addFile to make it 
work well on Windows
    
    ## What changes were proposed in this pull request?
    Update `SparkContext.addFile` to correct the use of `URI` and `Path` so that
    it works well on Windows. This is the branch-2.0 backport; more details at
    #15131.
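
    Illustrative only (not the exact patch): on Windows a local path such as
    `C:\tmp\data.txt` has to go through a `file:` URI before being wrapped in a
    Hadoop `Path`, otherwise the drive letter is parsed as a URI scheme.

    ```scala
    import java.io.File
    import org.apache.hadoop.fs.Path

    // Convert a local OS path into a Hadoop Path via a proper file: URI.
    def toHadoopPath(localPath: String): Path =
      new Path(new File(localPath).toURI)  // e.g. file:/C:/tmp/data.txt
    ```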
    
    ## How was this patch tested?
    Backport, checked by appveyor.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #15217 from yanboliang/uri-2.0.

commit b111a81f2a5547e2357d66db4ba2f05ce69a52a6
Author: Shivaram Venkataraman <[email protected]>
Date:   2016-09-23T21:35:18Z

    [SPARK-17651][SPARKR] Set R package version number along with mvn
    
    This PR sets the R package version while tagging releases. Note that since R
    doesn't accept `-SNAPSHOT` in the version number field, we remove it while
    setting the next version.
    
    Tested manually by running locally
    
    Author: Shivaram Venkataraman <[email protected]>
    
    Closes #15223 from shivaram/sparkr-version-change.
    
    (cherry picked from commit 7c382524a959a2bc9b3d2fca44f6f0b41aba4e3c)
    Signed-off-by: Reynold Xin <[email protected]>

commit 9d28cc10357a8afcfb2fa2e6eecb5c2cc2730d17
Author: Patrick Wendell <[email protected]>
Date:   2016-09-23T21:38:07Z

    Preparing Spark release v2.0.1-rc3

commit 5bc5b49fa0a5f3d395457aceff268938317f3718
Author: Patrick Wendell <[email protected]>
Date:   2016-09-23T21:38:13Z

    Preparing development version 2.0.2-SNAPSHOT

commit 9e91a1009e6f916245b4d4018de1664ea3decfe7
Author: Dhruve Ashar <[email protected]>
Date:   2016-09-23T21:59:53Z

    [SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue size 
configurable (branch 2.0)
    
    ## What changes were proposed in this pull request?
    
    Backport #14269 to 2.0.
    
    ## How was this patch tested?
    
    Jenkins.
    
    Author: Dhruve Ashar <[email protected]>
    
    Closes #15222 from zsxwing/SPARK-15703-2.0.

commit ed545763adc3f50569581c9b017b396e8997ac31
Author: Sean Owen <[email protected]>
Date:   2016-09-24T07:06:41Z

    [SPARK-10835][ML] Word2Vec should accept non-null string array, in addition 
to existing null string array
    
    ## What changes were proposed in this pull request?
    
    To match Tokenizer and for compatibility with Word2Vec, output a nullable
    string array type in NGram.
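
    A hedged sketch of the schema point above (assumed helper, not the actual
    Word2Vec/NGram code): a check that accepts a string-array column whether or
    not its elements are nullable.

    ```scala
    import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

    // containsNull may be true or false; both forms are accepted.
    def isStringArray(dt: DataType): Boolean = dt match {
      case ArrayType(StringType, _) => true
      case _                        => false
    }
    ```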
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Sean Owen <[email protected]>
    
    Closes #15179 from srowen/SPARK-10835.
    
    (cherry picked from commit f3fe55439e4c865c26502487a1bccf255da33f4a)
    Signed-off-by: Sean Owen <[email protected]>

commit 88ba2e1d0492039ee2cb1caa16160ec24bea3992
Author: Burak Yavuz <[email protected]>
Date:   2016-09-26T05:57:31Z

    [SPARK-17650] Malformed URLs throw exceptions before bricking Executors
    
    ## What changes were proposed in this pull request?
    
    When a malformed URL is sent to Executors through `sc.addJar` or
    `sc.addFile`, the executors become unusable, because they constantly throw
    `MalformedURLException`s and can never acknowledge that the file or jar is
    simply bad input.

    This PR tries to fix that problem by making sure malformed URLs can never be
    submitted through `sc.addJar` and `sc.addFile`. Another solution would be to
    blacklist bad files and jars on Executors: maybe fail the first time, and
    then ignore the second time (but print a warning message).
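
    A minimal sketch of the fail-fast idea (illustrative; the helper name is an
    assumption, not the actual code added by this patch):

    ```scala
    import java.net.{URI, URISyntaxException}

    // Reject malformed paths on the driver instead of letting every executor
    // throw MalformedURLException over and over.
    def validatePath(path: String): Unit = {
      try {
        new URI(path)
      } catch {
        case e: URISyntaxException =>
          throw new IllegalArgumentException(s"Malformed path: $path", e)
      }
    }
    ```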
    
    ## How was this patch tested?
    
    Unit tests in SparkContextSuite
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #15224 from brkyvz/SPARK-17650.
    
    (cherry picked from commit 59d87d24079bc633e63ce032f0a5ddd18a3b02cb)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit cf5324127856381c40ba952e35bdb99a717163fa
Author: Shixiong Zhu <[email protected]>
Date:   2016-09-26T17:44:35Z

    [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus
    
    ## What changes were proposed in this pull request?
    
    Log how many Spark events got dropped in LiveListenerBus so that the user 
can get insights on how to set a correct event queue size.
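
    A rough sketch of the approach (names are assumptions, not the actual
    `LiveListenerBus` fields): keep an atomic drop counter and log the running
    total periodically rather than logging every single dropped event.

    ```scala
    import java.util.concurrent.atomic.AtomicLong

    val droppedEventsCounter = new AtomicLong(0L)

    def onDropEvent(): Unit = {
      val dropped = droppedEventsCounter.incrementAndGet()
      if (dropped % 10000 == 1) {
        // In Spark this would go through the logging framework.
        println(s"Dropped $dropped events because the listener bus is full. " +
          "Consider increasing the event queue size.")
      }
    }
    ```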
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15220 from zsxwing/SPARK-17649.
    
    (cherry picked from commit bde85f8b70138a51052b613664facbc981378c38)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 8a58f2e8ec413591ec00da1e37b91b1bf49e4d1d
Author: Sameer Agarwal <[email protected]>
Date:   2016-09-26T20:21:08Z

    [SPARK-17652] Fix confusing exception message while reserving capacity
    
    ## What changes were proposed in this pull request?
    
    This minor patch fixes a confusing exception message while reserving 
additional capacity in the vectorized parquet reader.
    
    ## How was this patch tested?
    
    Existing unit tests.
    
    Author: Sameer Agarwal <[email protected]>
    
    Closes #15225 from sameeragarwal/error-msg.
    
    (cherry picked from commit 7c7586aef9243081d02ea5065435234b5950ab66)
    Signed-off-by: Yin Huai <[email protected]>

commit f4594900d86bb39358ff19047dfa8c1e4b78aa6b
Author: Andrew Mills <[email protected]>
Date:   2016-09-26T20:41:10Z

    [Docs] Update spark-standalone.md to fix link
    
    Corrected a link to the configuration.html page; it was pointing to a page
    that does not exist (configurations.html).
    
    Documentation change, verified in preview.
    
    Author: Andrew Mills <[email protected]>
    
    Closes #15244 from ammills01/master.
    
    (cherry picked from commit 00be16df642317137f17d2d7d2887c41edac3680)
    Signed-off-by: Andrew Or <[email protected]>

commit 98bbc4410181741d903a703eac289408cb5b2c5e
Author: Josh Rosen <[email protected]>
Date:   2016-09-27T21:14:27Z

    [SPARK-17618] Guard against invalid comparisons between UnsafeRow and other 
formats
    
    This patch ports changes from #15185 to Spark 2.x. That patch fixed a
    correctness bug in Spark 1.6.x caused by an invalid `equals()` comparison
    between an `UnsafeRow` and another row of a different format. Spark 2.x is
    not affected by that specific correctness bug, but it can still reap the
    error-prevention benefits of that patch's changes, which modify
    `UnsafeRow.equals()` to throw an IllegalArgumentException if it is called
    with an object that is not an `UnsafeRow`.
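
    A simplified, self-contained sketch of that guard (hypothetical class; only
    the error-prevention behaviour is shown, whereas the real `UnsafeRow.equals`
    compares the rows' underlying bytes):

    ```scala
    class UnsafeRowLike(val bytes: Array[Byte]) {
      override def equals(other: Any): Boolean = other match {
        case o: UnsafeRowLike => java.util.Arrays.equals(bytes, o.bytes)
        case null             => false
        case _ =>
          // Comparing against a row of another format is almost certainly a bug,
          // so fail loudly instead of silently returning false.
          throw new IllegalArgumentException(
            s"Cannot compare UnsafeRowLike to ${other.getClass.getName}")
      }
      override def hashCode(): Int = java.util.Arrays.hashCode(bytes)
    }
    ```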
    
    Author: Josh Rosen <[email protected]>
    
    Closes #15265 from JoshRosen/SPARK-17618-master.
    
    (cherry picked from commit 2f84a686604b298537bfd4d087b41594d2aa7ec6)
    Signed-off-by: Josh Rosen <[email protected]>

commit 2cd327ef5e4c3f6b8468ebb2352479a1686b7888
Author: Liang-Chi Hsieh <[email protected]>
Date:   2016-09-27T23:00:39Z

    [SPARK-17056][CORE] Fix a wrong assert regarding unroll memory in 
MemoryStore
    
    ## What changes were proposed in this pull request?
    
    There is an assert in MemoryStore's putIteratorAsValues method which is used
    to check that unroll memory is not released too much. This assert looks
    wrong.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #14642 from viirya/fix-unroll-memory.
    
    (cherry picked from commit e7bce9e1876de6ee975ccc89351db58119674aef)
    Signed-off-by: Josh Rosen <[email protected]>

commit 1b02f8820ddaf3f2a0e7acc9a7f27afc20683cca
Author: Josh Rosen <[email protected]>
Date:   2016-09-28T07:59:00Z

    [SPARK-17666] Ensure that RecordReaders are closed by data source file 
scans (backport)
    
    This is a branch-2.0 backport of #15245.
    
    ## What changes were proposed in this pull request?
    
    This patch addresses a potential cause of resource leaks in data source 
file scans. As reported in 
[SPARK-17666](https://issues.apache.org/jira/browse/SPARK-17666), tasks which 
do not fully-consume their input may cause file handles / network connections 
(e.g. S3 connections) to be leaked. Spark's `NewHadoopRDD` uses a TaskContext 
callback to [close its record 
readers](https://github.com/apache/spark/blame/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L208),
 but the new data source file scans will only close record readers once their 
iterators are fully-consumed.
    
    This patch modifies `RecordReaderIterator` and `HadoopFileLinesReader` to 
add `close()` methods and modifies all six implementations of 
`FileFormat.buildReader()` to register TaskContext task completion callbacks to 
guarantee that cleanup is eventually performed.
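
    The general pattern is sketched below (illustrative; the exact wiring in each
    `FileFormat.buildReader()` implementation differs):

    ```scala
    import java.io.Closeable
    import org.apache.spark.TaskContext

    // Register a task-completion callback so the reader is closed even when its
    // iterator is never fully consumed.
    def closeOnTaskCompletion(reader: Closeable): Unit = {
      Option(TaskContext.get()).foreach { ctx =>
        ctx.addTaskCompletionListener { _ => reader.close() }
      }
    }
    ```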
    
    ## How was this patch tested?
    
    Tested manually for now.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #15271 from JoshRosen/SPARK-17666-backport.

commit 4d73d5cd82ebc980f996c78f9afb8a97418ab7ab
Author: hyukjinkwon <[email protected]>
Date:   2016-09-28T10:19:04Z

    [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation
    
    ## What changes were proposed in this pull request?
    
    This PR proposes to fix wrongly indented examples in PySpark documentation
    
    ```
    -        >>> json_sdf = spark.readStream.format("json")\
    -                                       .schema(sdf_schema)\
    -                                       .load(tempfile.mkdtemp())
    +        >>> json_sdf = spark.readStream.format("json") \\
    +        ...     .schema(sdf_schema) \\
    +        ...     .load(tempfile.mkdtemp())
    ```
    
    ```
    -        people.filter(people.age > 30).join(department, people.deptId == 
department.id)\
    +        people.filter(people.age > 30).join(department, people.deptId == 
department.id) \\
    ```
    
    ```
    -        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, 1.23), 
(2, 4.56)])), \
    -                        LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    +        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, 1.23), 
(2, 4.56)])),
    +        ...             LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    ```
    
    ```
    -        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, -1.23), 
(2, 4.56e-7)])), \
    -                        LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    +        >>> examples = [LabeledPoint(1.1, Vectors.sparse(3, [(0, -1.23), 
(2, 4.56e-7)])),
    +        ...             LabeledPoint(0.0, Vectors.dense([1.01, 2.02, 
3.03]))]
    ```
    
    ```
    -        ...      for x in iterator:
    -        ...           print(x)
    +        ...     for x in iterator:
    +        ...          print(x)
    ```
    
    ## How was this patch tested?
    
    Manually tested.
    
    **Before**
    
    ![2016-09-26 8 36 
02](https://cloud.githubusercontent.com/assets/6477701/18834471/05c7a478-8431-11e6-94bb-09aa37b12ddb.png)
    
    ![2016-09-26 9 22 
16](https://cloud.githubusercontent.com/assets/6477701/18834472/06c8735c-8431-11e6-8775-78631eab0411.png)
    
    <img width="601" alt="2016-09-27 2 29 27" 
src="https://cloud.githubusercontent.com/assets/6477701/18861294/29c0d5b4-84bf-11e6-99c5-3c9d913c125d.png";>
    
    <img width="1056" alt="2016-09-27 2 29 58" 
src="https://cloud.githubusercontent.com/assets/6477701/18861298/31694cd8-84bf-11e6-9e61-9888cb8c2089.png";>
    
    <img width="1079" alt="2016-09-27 2 30 05" 
src="https://cloud.githubusercontent.com/assets/6477701/18861301/359722da-84bf-11e6-97f9-5f5365582d14.png";>
    
    **After**
    
    ![2016-09-26 9 29 
47](https://cloud.githubusercontent.com/assets/6477701/18834467/0367f9da-8431-11e6-86d9-a490d3297339.png)
    
    ![2016-09-26 9 30 
24](https://cloud.githubusercontent.com/assets/6477701/18834463/f870fae0-8430-11e6-9482-01fc47898492.png)
    
    <img width="515" alt="2016-09-27 2 28 19" 
src="https://cloud.githubusercontent.com/assets/6477701/18861305/3ff88b88-84bf-11e6-902c-9f725e8a8b10.png";>
    
    <img width="652" alt="2016-09-27 3 50 59" 
src="https://cloud.githubusercontent.com/assets/6477701/18863053/592fbc74-84ca-11e6-8dbf-99cf57947de8.png";>
    
    <img width="709" alt="2016-09-27 3 51 03" 
src="https://cloud.githubusercontent.com/assets/6477701/18863060/601607be-84ca-11e6-80aa-a401df41c321.png";>
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #15242 from HyukjinKwon/minor-example-pyspark.
    
    (cherry picked from commit 2190037757a81d3172f75227f7891d968e1f0d90)
    Signed-off-by: Sean Owen <[email protected]>

commit 4c694e452278e46231720e778a80c586b9e565f1
Author: w00228970 <[email protected]>
Date:   2016-09-28T19:02:59Z

    [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch 
failure
    
    | Time | Thread 1, Job 1 | Thread 2, Job 2 |
    |:----:|:----------------|:----------------|
    | 1 | abort stage due to FetchFailed | |
    | 2 | failedStages += failedStage | |
    | 3 | | task failed due to FetchFailed |
    | 4 | | cannot post ResubmitFailedStages because failedStages is not empty |
    
    Then job 2 of thread 2 never resubmits the failed stage and hangs.

    We should not add to failedStages when aborting a stage for a fetch failure.

    Added a unit test.
    
    Author: w00228970 <[email protected]>
    Author: wangfei <[email protected]>
    
    Closes #15213 from scwf/dag-resubmit.
    
    (cherry picked from commit 46d1203bf2d01b219c4efc7e0e77a844c0c664da)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit d358298f1082edd31489a1b08f428c8e60278d69
Author: Eric Liang <[email protected]>
Date:   2016-09-28T23:19:06Z

    [SPARK-17673][SQL] Incorrect exchange reuse with RowDataSourceScan 
(backport)
    
    This backports https://github.com/apache/spark/pull/15273 to branch-2.0
    
    Also verified the test passes after the patch was applied. rxin
    
    Author: Eric Liang <[email protected]>
    
    Closes #15282 from ericl/spark-17673-2.

commit 0a69477a10adb3969a20ae870436299ef5152788
Author: Herman van Hovell <[email protected]>
Date:   2016-09-28T23:25:10Z

    [SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.
    
    ## What changes were proposed in this pull request?
    We added native versions of `collect_set` and `collect_list` in Spark 2.0.
    These currently also (try to) collect null values, which differs from the
    original Hive implementation. This PR fixes this by adding a null check to
    the `Collect.update` method.
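
    A hedged sketch of the null check (the real `Collect.update` operates on an
    aggregation buffer and an `InternalRow`; only the guard is illustrated here):

    ```scala
    import scala.collection.mutable

    def update(buffer: mutable.ArrayBuffer[Any], value: Any): Unit = {
      // Skip nulls so collect_list/collect_set match Hive's behaviour.
      if (value != null) {
        buffer += value
      }
    }
    ```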
    
    ## How was this patch tested?
    Added a regression test to `DataFrameAggregateSuite`.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #15208 from hvanhovell/SPARK-17641.
    
    (cherry picked from commit 7d09232028967978d9db314ec041a762599f636b)
    Signed-off-by: Reynold Xin <[email protected]>

commit 933d2c1ea4e5f5c4ec8d375b5ccaa4577ba4be38
Author: Patrick Wendell <[email protected]>
Date:   2016-09-28T23:27:45Z

    Preparing Spark release v2.0.1-rc4

----

