GitHub user paramtatini opened a pull request:
https://github.com/apache/spark/pull/10304
Branch 1.6
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10304.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10304
----
commit ff156a3a660e1730de220b404a61e1bda8b7682e
Author: Wenchen Fan <[email protected]>
Date: 2015-11-20T20:04:42Z
[SPARK-11819][SQL] nice error message for missing encoder
Before this PR, when users tried to get an encoder for an unsupported class,
they got only a very terse error message like `Encoder for type xxx is not
supported`.
After this PR, the error message is much friendlier, for example:
```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```
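The useful part of the new message is the path from the root class down to the offending type. The sketch below (plain Python, hypothetical names, not Spark's actual Scala implementation) shows the general technique: recursively walk a nested type and accumulate a breadcrumb path so the final error can name every enclosing level.

```python
# Illustrative sketch only: lists stand in for array types, dicts for
# struct types, and SUPPORTED for the set of encodable leaf types.
SUPPORTED = {int, float, str, bool}

def check_encodable(tp, path=()):
    """Recursively verify a type is encodable, tracking where we are."""
    if isinstance(tp, list):               # [elem] models an array type
        elem = tp[0]
        check_encodable(elem, path + (f'- array element class: "{elem.__name__}"',))
        return
    if isinstance(tp, dict):               # {"field": type} models a struct
        for name, field_tp in tp.items():
            check_encodable(field_tp, path + (f'- field (name: "{name}")',))
        return
    if tp in SUPPORTED:
        return
    # Innermost location first, like the sample message above.
    lines = [f'No Encoder found for {tp.__name__}', *reversed(path)]
    raise TypeError("\n".join(lines))
```

For example, `check_encodable({"arrayField": [object]})` fails with a message that names the unsupported element type, the array, and the field it lives in, instead of a bare "not supported".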
Author: Wenchen Fan <[email protected]>
Closes #9810 from cloud-fan/error-message.
(cherry picked from commit 3b9d2a347f9c796b90852173d84189834e499e25)
Signed-off-by: Michael Armbrust <[email protected]>
commit 6fc96875460d881f13ec3082c4a2b32144ea45e9
Author: Josh Rosen <[email protected]>
Date: 2015-11-20T21:17:35Z
[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
This patch reduces some RPC timeouts in order to speed up the slow
"AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to
take two minutes to run.
Author: Josh Rosen <[email protected]>
Closes #9869 from JoshRosen/SPARK-11650.
(cherry picked from commit 652def318e47890bd0a0977dc982cc07f99fb06a)
Signed-off-by: Josh Rosen <[email protected]>
commit 9c8e17984d95a8d225525a592a921a5af81e4440
Author: Nong Li <[email protected]>
Date: 2015-11-20T22:19:34Z
[SPARK-11724][SQL] Change casting between int and timestamp to consistently
treat int in seconds.
Hive has since changed this behavior as well.
https://issues.apache.org/jira/browse/HIVE-3454
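The convention this commit standardizes can be illustrated in plain Python (not Spark code): an integer cast to a timestamp is interpreted as seconds since the Unix epoch, and the reverse cast truncates back to whole seconds.

```python
from datetime import datetime, timezone

def int_to_timestamp(seconds: int) -> datetime:
    # int -> timestamp: the integer is seconds since 1970-01-01T00:00:00Z
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def timestamp_to_int(ts: datetime) -> int:
    # timestamp -> int: truncate sub-second precision back to whole seconds
    return int(ts.timestamp())

ts = int_to_timestamp(1_448_057_074)  # an epoch value from late Nov 2015
```

The round trip `timestamp_to_int(int_to_timestamp(n)) == n` holds for any integer `n`, which is exactly the consistency the commit title asks for (milliseconds-based interpretations break this property).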
Author: Nong Li <[email protected]>
Author: Nong Li <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9685 from nongli/spark-11724.
(cherry picked from commit 9ed4ad4265cf9d3135307eb62dae6de0b220fc21)
Signed-off-by: Yin Huai <[email protected]>
commit 0c23dd52d64d4a3448fb7d21b0e40d13f885bcfa
Author: Shixiong Zhu <[email protected]>
Date: 2015-11-20T22:23:01Z
[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in
TransformFunction and TransformFunctionSerializer
TransformFunction and TransformFunctionSerializer don't rethrow exceptions,
so when any exception happens, they just return None. This causes weird NPEs
downstream and confuses people.
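The bug pattern and the fix can be sketched minimally in Python (hypothetical names, not the actual PySpark code): a callback wrapper that swallows exceptions and returns None moves the failure far away from its cause, while rethrowing (after recording the error) surfaces the real stack trace at the point of failure.

```python
class TransformWrapper:
    """Toy stand-in for a streaming transform callback wrapper."""

    def __init__(self, func):
        self.func = func
        self.failure = None

    def call_swallowing(self, *args):
        # The old behavior: any error silently becomes None, so the caller
        # later hits a confusing NoneType error far from the real bug.
        try:
            return self.func(*args)
        except Exception:
            return None

    def call_rethrowing(self, *args):
        # The fixed behavior: record the error so the driver can report it,
        # then re-raise so the original traceback is not lost.
        try:
            return self.func(*args)
        except Exception as e:
            self.failure = e
            raise
```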
Author: Shixiong Zhu <[email protected]>
Closes #9847 from zsxwing/pyspark-streaming-exception.
(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das <[email protected]>
commit fbe6888cc0c8a16531a4ba7ce5235b84474f1a7b
Author: Josh Rosen <[email protected]>
Date: 2015-11-20T22:31:26Z
[SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite
tests
In PersistenceEngineSuite, we do not call `close()` on the
PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine,
this causes us to leak a ZooKeeper client, causing the logs of unrelated tests
to be periodically spammed with connection error messages from that client:
```
15/11/20 05:13:35.789
pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741)
INFO ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown
error)
15/11/20 05:13:35.790
pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741)
WARN ClientCnxn: Session 0x15124ff48dd0000 for server null, unexpected error,
closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
```
This patch fixes this by using a `finally` block.
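The shape of the fix, sketched in Python with a stand-in engine class (hypothetical, not the actual Scala test code): run the test body inside `try` and close the engine in `finally`, so the underlying (ZooKeeper) client is released even when an assertion fails mid-test.

```python
class FakePersistenceEngine:
    """Toy engine whose close() we must guarantee runs."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_persistence_test(engine, body):
    # finally guarantees cleanup whether body succeeds or raises,
    # preventing the leaked-client log spam described above.
    try:
        body(engine)
    finally:
        engine.close()
```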
Author: Josh Rosen <[email protected]>
Closes #9864 from JoshRosen/close-zookeeper-client-in-tests.
(cherry picked from commit 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6)
Signed-off-by: Josh Rosen <[email protected]>
commit b9b0e17473e98d3d19b88abaf5ffcfdd6a2a2ea8
Author: Jean-Baptiste Onofré <[email protected]>
Date: 2015-11-20T22:45:40Z
[SPARK-11716][SQL] UDFRegistration just drops the input type when
re-creating the UserDefinedFunction
https://issues.apache.org/jira/browse/SPARK-11716
This one is #9739 plus a regression test. When committing it, please make
sure the author is jbonofre.
You can find the original PR at https://github.com/apache/spark/pull/9739
closes #9739
Author: Jean-Baptiste Onofré <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9868 from yhuai/SPARK-11716.
(cherry picked from commit 03ba56d78f50747710d01c27d409ba2be42ae557)
Signed-off-by: Michael Armbrust <[email protected]>
commit 11a11f0ffcb6d7f6478239cfa3fb5d95877cddab
Author: felixcheung <[email protected]>
Date: 2015-11-20T23:10:55Z
[SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help
information for SparkR:::summary correctly
Fix use of aliases and change uses of rdname and seealso.
`aliases` is the hint for `?` - it should not be linked to some other name;
those links belong in `seealso`.
https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html
Clean up usage on family, as multiple use of family with the same rdname is
causing duplicated See Also html blocks (like
http://spark.apache.org/docs/latest/api/R/count.html)
Also changed some rdname entries to the dplyr-like variants for better
visibility to R users in the R docs, e.g. rbind, summary, mutate, summarize.
shivaram yanboliang
Author: felixcheung <[email protected]>
Closes #9750 from felixcheung/rdocaliases.
(cherry picked from commit a6239d587c638691f52eca3eee905c53fbf35a12)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 0665fb5eae931ee93e320da9fedcfd6649ed004e
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:17:17Z
[SPARK-11636][SQL] Support classes defined in the REPL with Encoders
#theScaryParts (i.e. changes to the repl, executor classloaders and
codegen)...
Author: Michael Armbrust <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9825 from marmbrus/dataset-replClasses2.
(cherry picked from commit 4b84c72dfbb9ddb415fee35f69305b5d7b280891)
Signed-off-by: Michael Armbrust <[email protected]>
commit 1dde97176c799d89ab8ccc991a862d70e74dbee3
Author: Vikas Nelamangala <[email protected]>
Date: 2015-11-20T23:18:41Z
[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md
using include_example
Author: Vikas Nelamangala <[email protected]>
Closes #9689 from vikasnp/master.
(cherry picked from commit ed47b1e660b830e2d4fac8d6df93f634b260393c)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 7437a7f5bd06fd304265ab4e708a97fcd8492839
Author: Nong Li <[email protected]>
Date: 2015-11-20T23:30:53Z
[SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
This mainly moves SqlNewHadoopRDD to the sql package. There is some state
that is shared with core, and I've left that in core. This allows some other
associated minor cleanup.
Author: Nong Li <[email protected]>
Closes #9845 from nongli/spark-11787.
(cherry picked from commit 58b4e4f88a330135c4cec04a30d24ef91bc61d91)
Signed-off-by: Reynold Xin <[email protected]>
commit 7e06d51d5637d4f8e042a1a230ee48591d08236f
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:36:30Z
[SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
In this PR I delete a method that breaks type inference for aggregators
(only in the REPL).
The error when this method is present is:
```
<console>:38: error: missing parameter type for expanded function ((x$2) =>
x$2._2)
ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect()
```
Author: Michael Armbrust <[email protected]>
Closes #9870 from marmbrus/dataset-repl-agg.
(cherry picked from commit 968acf3bd9a502fcad15df3e53e359695ae702cc)
Signed-off-by: Michael Armbrust <[email protected]>
commit e0bb4e09c7b04bc8926a4c0658fc2c51db8fb04c
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:38:04Z
[SPARK-11890][SQL] Fix compilation for Scala 2.11
Author: Michael Armbrust <[email protected]>
Closes #9871 from marmbrus/scala211-break.
(cherry picked from commit 68ed046836975b492b594967256d3c7951b568a5)
Signed-off-by: Michael Armbrust <[email protected]>
commit 7582425d193d46c3f14b666b551dd42ff54d7ad7
Author: Patrick Wendell <[email protected]>
Date: 2015-11-20T23:43:02Z
Preparing Spark release v1.6.0-preview1
commit d409afdbceb40ea90b1d20656e8ce79bff2ab71f
Author: Patrick Wendell <[email protected]>
Date: 2015-11-20T23:43:08Z
Preparing development version 1.6.0-SNAPSHOT
commit 285e4017a445279d39852cd616f01d1d7f2139dd
Author: Michael Armbrust <[email protected]>
Date: 2015-11-21T00:02:03Z
[HOTFIX] Fix Java Dataset Tests
commit 33d856df53689d7fd515a21ec4f34d1d5c74a958
Author: Xiangrui Meng <[email protected]>
Date: 2015-11-21T00:52:20Z
Revert "[SPARK-11689][ML] Add user guide and example code for LDA under
spark.ml"
This reverts commit 92d3563fd0cf0c3f4fe037b404d172125b24cf2f.
commit 95dfac0dd073680f20ce94a3abb95d12684c8e1d
Author: Wenchen Fan <[email protected]>
Date: 2015-11-21T07:31:19Z
[SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build
It seems Scala 2.11 doesn't support defining private methods in `trait xxx`
and using them in `object xxx extends xxx`.
Author: Wenchen Fan <[email protected]>
Closes #9879 from cloud-fan/follow.
(cherry picked from commit 7d3f922c4ba76c4193f98234ae662065c39cdfb1)
Signed-off-by: Reynold Xin <[email protected]>
commit 7016b086743a8da9e6eca77c2a94f1e88c5291f6
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T08:10:13Z
[SPARK-11900][SQL] Add since version for all encoders
Author: Reynold Xin <[email protected]>
Closes #9881 from rxin/SPARK-11900.
(cherry picked from commit 54328b6d862fe62ae01bdd87df4798ceb9d506d6)
Signed-off-by: Reynold Xin <[email protected]>
commit 05547183bf653abeffd76d9242c9c05215a455b6
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T08:54:18Z
[SPARK-11901][SQL] API audit for Aggregator.
Author: Reynold Xin <[email protected]>
Closes #9882 from rxin/SPARK-11901.
(cherry picked from commit 596710268e29e8f624c3ba2fade08b66ec7084eb)
Signed-off-by: Reynold Xin <[email protected]>
commit 8c718a577e32d9f91dc4cacd58dab894e366d93d
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T23:00:37Z
[SPARK-11899][SQL] API audit for GroupedDataset.
1. Renamed map to mapGroup, flatMap to flatMapGroup.
2. Renamed asKey -> keyAs.
3. Added more documentation.
4. Changed type parameter T to V on GroupedDataset.
5. Added since versions for all functions.
Author: Reynold Xin <[email protected]>
Closes #9880 from rxin/SPARK-11899.
(cherry picked from commit ff442bbcffd4f93cfcc2f76d160011e725d2fb3f)
Signed-off-by: Reynold Xin <[email protected]>
commit b004a104f62849b393047aa8ea45542c871198e7
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-11-22T18:36:47Z
[SPARK-11908][SQL] Add NullType support to RowEncoder
JIRA: https://issues.apache.org/jira/browse/SPARK-11908
We should add NullType support to RowEncoder.
Author: Liang-Chi Hsieh <[email protected]>
Closes #9891 from viirya/rowencoder-nulltype.
(cherry picked from commit 426004a9c9a864f90494d08601e6974709091a56)
Signed-off-by: Michael Armbrust <[email protected]>
commit f8369412d22de0fc75b2aab4d72ad298fc30cc6f
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T18:59:54Z
Preparing Spark release v1.6.0-preview1
commit 9d10ba76fdff22f6172e775a45a07477300dd618
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T18:59:59Z
Preparing development version 1.6.0-SNAPSHOT
commit 308381420f51b6da1007ea09a02d740613a226e0
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T19:41:18Z
Preparing Spark release v1.6.0-preview2
commit fc4b88f3bce31184aa43b386f44d699555e17443
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T19:41:24Z
Preparing development version 1.6.0-SNAPSHOT
commit a36d9bc7528ab8e6fe5e002f9b9b0a51a5b93568
Author: Xiangrui Meng <[email protected]>
Date: 2015-11-23T05:45:46Z
[SPARK-11895][ML] rename and refactor DatasetExample under mllib/examples
We used the name `Dataset` to refer to `SchemaRDD` in 1.2 in ML pipelines
and created this example file. Since `Dataset` has a new meaning in Spark 1.6,
we should rename it to avoid confusion. This PR also removes support for dense
format to simplify the example code.
cc: yinxusen
Author: Xiangrui Meng <[email protected]>
Closes #9873 from mengxr/SPARK-11895.
(cherry picked from commit fe89c1817d668e46adf70d0896c42c22a547c76a)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 835b5488ff644e2f51442943adffd3cd682703ac
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-23T05:48:48Z
[SPARK-6791][ML] Add read/write for CrossValidator and Evaluators
I believe this works for general estimators within CrossValidator,
including compound estimators. (See the complex unit test.)
Added read/write for all 3 Evaluators as well.
CC: mengxr yanboliang
Author: Joseph K. Bradley <[email protected]>
Closes #9848 from jkbradley/cv-io.
(cherry picked from commit a6fda0bfc16a13b28b1cecc96f1ff91363089144)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 7f9d3358afd7e266c79e9989e4d874cd1183f474
Author: Timothy Hunter <[email protected]>
Date: 2015-11-23T05:51:42Z
[SPARK-11835] Adds a sidebar menu to MLlib's documentation
This PR adds a sidebar menu when browsing the user guide of MLlib. It uses
a YAML file to describe the structure of the documentation. It should be
trivial to adapt this to the other projects.

Author: Timothy Hunter <[email protected]>
Closes #9826 from thunterdb/spark-11835.
(cherry picked from commit fc4b792d287095d70379a51f117c225d8d857078)
Signed-off-by: Xiangrui Meng <[email protected]>
commit d482dced313d1d837508d3f449261419c8543c1d
Author: Yanbo Liang <[email protected]>
Date: 2015-11-23T05:56:07Z
[SPARK-11912][ML] ml.feature.PCA minor refactor
Like [SPARK-11852](https://issues.apache.org/jira/browse/SPARK-11852),
```k``` is a param and we should save it under ```metadata/``` rather than
under both ```data/``` and ```metadata/```. Refactor the constructor of
```ml.feature.PCAModel``` to take only ```pc```, but construct
```mllib.feature.PCAModel``` inside ```transform```.
Author: Yanbo Liang <[email protected]>
Closes #9897 from yanboliang/spark-11912.
(cherry picked from commit d9cf9c21fc6b1aa22e68d66760afd42c4e1c18b8)
Signed-off-by: Xiangrui Meng <[email protected]>
commit bad93d9f3a24a7ee024541569c6f3de88aad2fda
Author: BenFradet <[email protected]>
Date: 2015-11-23T06:05:01Z
[SPARK-11902][ML] Unhandled case in VectorAssembler#transform
There is an unhandled case in the transform method of VectorAssembler if
one of the input columns doesn't have one of the supported types DoubleType,
NumericType, BooleanType, or VectorUDT.
So, if you try to transform a column of StringType, you get a cryptic
"scala.MatchError: StringType".
This PR fixes this by throwing a SparkException when dealing with an
unknown column type.
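A Python rendering of the idea (illustrative only, with made-up helper names): instead of letting an unhandled input type escape as a cryptic MatchError, add an explicit default branch that raises an exception naming the offending type.

```python
def assemble_width(col_type: str) -> int:
    """Return how many output slots a column of this type contributes."""
    if col_type in ("DoubleType", "NumericType", "BooleanType"):
        return 1                  # scalar columns contribute one slot
    if col_type == "VectorUDT":
        return -1                 # vector width is known only at runtime
    # Explicit, descriptive error instead of a bare MatchError/KeyError.
    raise ValueError(
        f"VectorAssembler does not support the {col_type} type")
```

With this default branch, a StringType column fails with a message that says what went wrong and where, rather than a pattern-match crash.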
Author: BenFradet <[email protected]>
Closes #9885 from BenFradet/SPARK-11902.
(cherry picked from commit 4be360d4ee6cdb4d06306feca38ddef5212608cf)
Signed-off-by: Xiangrui Meng <[email protected]>
----