GitHub user paramtatini opened a pull request:
https://github.com/apache/spark/pull/10304
Branch 1.6
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-1.6
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10304.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10304
----
commit ff156a3a660e1730de220b404a61e1bda8b7682e
Author: Wenchen Fan <[email protected]>
Date: 2015-11-20T20:04:42Z
[SPARK-11819][SQL] nice error message for missing encoder
Before this PR, when users tried to get an encoder for an unsupported class,
they got only a very terse error message like `Encoder for type xxx is not
supported`.
After this PR, the error message is much friendlier, for example:
```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```
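The useful part of the new message is the path from the root class down to the offending type. The sketch below (plain Python, hypothetical names, not Spark's actual Scala implementation) shows the general technique: recursively walk a nested type and accumulate a breadcrumb path so the final error can name every enclosing level.

```python
# Illustrative sketch only: lists stand in for array types, dicts for
# struct types, and SUPPORTED for the set of encodable leaf types.
SUPPORTED = {int, float, str, bool}

def check_encodable(tp, path=()):
    """Recursively verify a type is encodable, tracking where we are."""
    if isinstance(tp, list):               # [elem] models an array type
        elem = tp[0]
        check_encodable(elem, path + (f'- array element class: "{elem.__name__}"',))
        return
    if isinstance(tp, dict):               # {"field": type} models a struct
        for name, field_tp in tp.items():
            check_encodable(field_tp, path + (f'- field (name: "{name}")',))
        return
    if tp in SUPPORTED:
        return
    # Innermost location first, like the sample message above.
    lines = [f'No Encoder found for {tp.__name__}', *reversed(path)]
    raise TypeError("\n".join(lines))
```

For example, `check_encodable({"arrayField": [object]})` fails with a message that names the unsupported element type, the array, and the field it lives in, instead of a bare "not supported".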
Author: Wenchen Fan <[email protected]>
Closes #9810 from cloud-fan/error-message.
(cherry picked from commit 3b9d2a347f9c796b90852173d84189834e499e25)
Signed-off-by: Michael Armbrust <[email protected]>
commit 6fc96875460d881f13ec3082c4a2b32144ea45e9
Author: Josh Rosen <[email protected]>
Date: 2015-11-20T21:17:35Z
[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
This patch reduces some RPC timeouts in order to speed up the slow
"AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to
take two minutes to run.
Author: Josh Rosen <[email protected]>
Closes #9869 from JoshRosen/SPARK-11650.
(cherry picked from commit 652def318e47890bd0a0977dc982cc07f99fb06a)
Signed-off-by: Josh Rosen <[email protected]>
commit 9c8e17984d95a8d225525a592a921a5af81e4440
Author: Nong Li <[email protected]>
Date: 2015-11-20T22:19:34Z
[SPARK-11724][SQL] Change casting between int and timestamp to consistently
treat int in seconds.
Hive has since changed this behavior as well.
https://issues.apache.org/jira/browse/HIVE-3454
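The convention this commit standardizes can be illustrated in plain Python (not Spark code): an integer cast to a timestamp is interpreted as seconds since the Unix epoch, and the reverse cast truncates back to whole seconds.

```python
from datetime import datetime, timezone

def int_to_timestamp(seconds: int) -> datetime:
    # int -> timestamp: the integer is seconds since 1970-01-01T00:00:00Z
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def timestamp_to_int(ts: datetime) -> int:
    # timestamp -> int: truncate sub-second precision back to whole seconds
    return int(ts.timestamp())

ts = int_to_timestamp(1_448_057_074)  # an epoch value from late Nov 2015
```

The round trip `timestamp_to_int(int_to_timestamp(n)) == n` holds for any integer `n`, which is exactly the consistency the commit title asks for (milliseconds-based interpretations break this property).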
Author: Nong Li <[email protected]>
Author: Nong Li <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9685 from nongli/spark-11724.
(cherry picked from commit 9ed4ad4265cf9d3135307eb62dae6de0b220fc21)
Signed-off-by: Yin Huai <[email protected]>
commit 0c23dd52d64d4a3448fb7d21b0e40d13f885bcfa
Author: Shixiong Zhu <[email protected]>
Date: 2015-11-20T22:23:01Z
[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in
TransformFunction and TransformFunctionSerializer
TransformFunction and TransformFunctionSerializer don't rethrow exceptions,
so when any exception happens, they just return None. This causes weird NPEs
downstream and confuses people.
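The bug pattern and the fix can be sketched minimally in Python (hypothetical names, not the actual PySpark code): a callback wrapper that swallows exceptions and returns None moves the failure far away from its cause, while rethrowing (after recording the error) surfaces the real stack trace at the point of failure.

```python
class TransformWrapper:
    """Toy stand-in for a streaming transform callback wrapper."""

    def __init__(self, func):
        self.func = func
        self.failure = None

    def call_swallowing(self, *args):
        # The old behavior: any error silently becomes None, so the caller
        # later hits a confusing NoneType error far from the real bug.
        try:
            return self.func(*args)
        except Exception:
            return None

    def call_rethrowing(self, *args):
        # The fixed behavior: record the error so the driver can report it,
        # then re-raise so the original traceback is not lost.
        try:
            return self.func(*args)
        except Exception as e:
            self.failure = e
            raise
```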
Author: Shixiong Zhu <[email protected]>
Closes #9847 from zsxwing/pyspark-streaming-exception.
(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das <[email protected]>
commit fbe6888cc0c8a16531a4ba7ce5235b84474f1a7b
Author: Josh Rosen <[email protected]>
Date: 2015-11-20T22:31:26Z
[SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite
tests
In PersistenceEngineSuite, we do not call `close()` on the
PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine,
this causes us to leak a ZooKeeper client, causing the logs of unrelated tests
to be periodically spammed with connection error messages from that client:
```
15/11/20 05:13:35.789
pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741)
INFO ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown
error)
15/11/20 05:13:35.790
pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741)
WARN ClientCnxn: Session 0x15124ff48dd0000 for server null, unexpected error,
closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
```
This patch fixes this by using a `finally` block.
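The shape of the fix, sketched in Python with a stand-in engine class (hypothetical, not the actual Scala test code): run the test body inside `try` and close the engine in `finally`, so the underlying (ZooKeeper) client is released even when an assertion fails mid-test.

```python
class FakePersistenceEngine:
    """Toy engine whose close() we must guarantee runs."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_persistence_test(engine, body):
    # finally guarantees cleanup whether body succeeds or raises,
    # preventing the leaked-client log spam described above.
    try:
        body(engine)
    finally:
        engine.close()
```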
Author: Josh Rosen <[email protected]>
Closes #9864 from JoshRosen/close-zookeeper-client-in-tests.
(cherry picked from commit 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6)
Signed-off-by: Josh Rosen <[email protected]>
commit b9b0e17473e98d3d19b88abaf5ffcfdd6a2a2ea8
Author: Jean-Baptiste Onofré <[email protected]>
Date: 2015-11-20T22:45:40Z
[SPARK-11716][SQL] UDFRegistration just drops the input type when
re-creating the UserDefinedFunction
https://issues.apache.org/jira/browse/SPARK-11716
This one is #9739 plus a regression test. When committing it, please make
sure the author is jbonofre.
You can find the original PR at https://github.com/apache/spark/pull/9739
closes #9739
Author: Jean-Baptiste Onofré <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9868 from yhuai/SPARK-11716.
(cherry picked from commit 03ba56d78f50747710d01c27d409ba2be42ae557)
Signed-off-by: Michael Armbrust <[email protected]>
commit 11a11f0ffcb6d7f6478239cfa3fb5d95877cddab
Author: felixcheung <[email protected]>
Date: 2015-11-20T23:10:55Z
[SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help
information for SparkR:::summary correctly
Fix use of aliases and change uses of rdname and seealso.
`aliases` is the hint for `?` - it should not be linked to some other name;
those links belong in `seealso`.
https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html
Clean up usage on family, as multiple use of family with the same rdname is
causing duplicated See Also html blocks (like
http://spark.apache.org/docs/latest/api/R/count.html)
Also changed some rdname entries to the dplyr-like variants for better
visibility to R users in the R docs, e.g. rbind, summary, mutate, summarize.
shivaram yanboliang
Author: felixcheung <[email protected]>
Closes #9750 from felixcheung/rdocaliases.
(cherry picked from commit a6239d587c638691f52eca3eee905c53fbf35a12)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 0665fb5eae931ee93e320da9fedcfd6649ed004e
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:17:17Z
[SPARK-11636][SQL] Support classes defined in the REPL with Encoders
#theScaryParts (i.e. changes to the repl, executor classloaders and
codegen)...
Author: Michael Armbrust <[email protected]>
Author: Yin Huai <[email protected]>
Closes #9825 from marmbrus/dataset-replClasses2.
(cherry picked from commit 4b84c72dfbb9ddb415fee35f69305b5d7b280891)
Signed-off-by: Michael Armbrust <[email protected]>
commit 1dde97176c799d89ab8ccc991a862d70e74dbee3
Author: Vikas Nelamangala <[email protected]>
Date: 2015-11-20T23:18:41Z
[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md
using include_example
Author: Vikas Nelamangala <[email protected]>
Closes #9689 from vikasnp/master.
(cherry picked from commit ed47b1e660b830e2d4fac8d6df93f634b260393c)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 7437a7f5bd06fd304265ab4e708a97fcd8492839
Author: Nong Li <[email protected]>
Date: 2015-11-20T23:30:53Z
[SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
This mainly moves SqlNewHadoopRDD to the sql package. There is some state
that is shared with core, and I've left that in core. This allows some other
associated minor cleanup.
Author: Nong Li <[email protected]>
Closes #9845 from nongli/spark-11787.
(cherry picked from commit 58b4e4f88a330135c4cec04a30d24ef91bc61d91)
Signed-off-by: Reynold Xin <[email protected]>
commit 7e06d51d5637d4f8e042a1a230ee48591d08236f
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:36:30Z
[SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
In this PR I delete a method that breaks type inference for aggregators
(only in the REPL).
The error when this method is present is:
```
<console>:38: error: missing parameter type for expanded function ((x$2) =>
x$2._2)
ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect()
```
Author: Michael Armbrust <[email protected]>
Closes #9870 from marmbrus/dataset-repl-agg.
(cherry picked from commit 968acf3bd9a502fcad15df3e53e359695ae702cc)
Signed-off-by: Michael Armbrust <[email protected]>
commit e0bb4e09c7b04bc8926a4c0658fc2c51db8fb04c
Author: Michael Armbrust <[email protected]>
Date: 2015-11-20T23:38:04Z
[SPARK-11890][SQL] Fix compilation for Scala 2.11
Author: Michael Armbrust <[email protected]>
Closes #9871 from marmbrus/scala211-break.
(cherry picked from commit 68ed046836975b492b594967256d3c7951b568a5)
Signed-off-by: Michael Armbrust <[email protected]>
commit 7582425d193d46c3f14b666b551dd42ff54d7ad7
Author: Patrick Wendell <[email protected]>
Date: 2015-11-20T23:43:02Z
Preparing Spark release v1.6.0-preview1
commit d409afdbceb40ea90b1d20656e8ce79bff2ab71f
Author: Patrick Wendell <[email protected]>
Date: 2015-11-20T23:43:08Z
Preparing development version 1.6.0-SNAPSHOT
commit 285e4017a445279d39852cd616f01d1d7f2139dd
Author: Michael Armbrust <[email protected]>
Date: 2015-11-21T00:02:03Z
[HOTFIX] Fix Java Dataset Tests
commit 33d856df53689d7fd515a21ec4f34d1d5c74a958
Author: Xiangrui Meng <[email protected]>
Date: 2015-11-21T00:52:20Z
Revert "[SPARK-11689][ML] Add user guide and example code for LDA under
spark.ml"
This reverts commit 92d3563fd0cf0c3f4fe037b404d172125b24cf2f.
commit 95dfac0dd073680f20ce94a3abb95d12684c8e1d
Author: Wenchen Fan <[email protected]>
Date: 2015-11-21T07:31:19Z
[SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build
It seems Scala 2.11 doesn't support defining private methods in `trait xxx`
and using them in `object xxx extends xxx`.
Author: Wenchen Fan <[email protected]>
Closes #9879 from cloud-fan/follow.
(cherry picked from commit 7d3f922c4ba76c4193f98234ae662065c39cdfb1)
Signed-off-by: Reynold Xin <[email protected]>
commit 7016b086743a8da9e6eca77c2a94f1e88c5291f6
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T08:10:13Z
[SPARK-11900][SQL] Add since version for all encoders
Author: Reynold Xin <[email protected]>
Closes #9881 from rxin/SPARK-11900.
(cherry picked from commit 54328b6d862fe62ae01bdd87df4798ceb9d506d6)
Signed-off-by: Reynold Xin <[email protected]>
commit 05547183bf653abeffd76d9242c9c05215a455b6
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T08:54:18Z
[SPARK-11901][SQL] API audit for Aggregator.
Author: Reynold Xin <[email protected]>
Closes #9882 from rxin/SPARK-11901.
(cherry picked from commit 596710268e29e8f624c3ba2fade08b66ec7084eb)
Signed-off-by: Reynold Xin <[email protected]>
commit 8c718a577e32d9f91dc4cacd58dab894e366d93d
Author: Reynold Xin <[email protected]>
Date: 2015-11-21T23:00:37Z
[SPARK-11899][SQL] API audit for GroupedDataset.
1. Renamed map to mapGroup, flatMap to flatMapGroup.
2. Renamed asKey -> keyAs.
3. Added more documentation.
4. Changed type parameter T to V on GroupedDataset.
5. Added since versions for all functions.
Author: Reynold Xin <[email protected]>
Closes #9880 from rxin/SPARK-11899.
(cherry picked from commit ff442bbcffd4f93cfcc2f76d160011e725d2fb3f)
Signed-off-by: Reynold Xin <[email protected]>
commit b004a104f62849b393047aa8ea45542c871198e7
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-11-22T18:36:47Z
[SPARK-11908][SQL] Add NullType support to RowEncoder
JIRA: https://issues.apache.org/jira/browse/SPARK-11908
We should add NullType support to RowEncoder.
Author: Liang-Chi Hsieh <[email protected]>
Closes #9891 from viirya/rowencoder-nulltype.
(cherry picked from commit 426004a9c9a864f90494d08601e6974709091a56)
Signed-off-by: Michael Armbrust <[email protected]>
commit f8369412d22de0fc75b2aab4d72ad298fc30cc6f
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T18:59:54Z
Preparing Spark release v1.6.0-preview1
commit 9d10ba76fdff22f6172e775a45a07477300dd618
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T18:59:59Z
Preparing development version 1.6.0-SNAPSHOT
commit 308381420f51b6da1007ea09a02d740613a226e0
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T19:41:18Z
Preparing Spark release v1.6.0-preview2
commit fc4b88f3bce31184aa43b386f44d699555e17443
Author: Patrick Wendell <[email protected]>
Date: 2015-11-22T19:41:24Z
Preparing development version 1.6.0-SNAPSHOT
commit a36d9bc7528ab8e6fe5e002f9b9b0a51a5b93568
Author: Xiangrui Meng <[email protected]>
Date: 2015-11-23T05:45:46Z
[SPARK-11895][ML] rename and refactor DatasetExample under mllib/examples
We used the name `Dataset` to refer to `SchemaRDD` in 1.2 in ML pipelines
and created this example file. Since `Dataset` has a new meaning in Spark 1.6,
we should rename it to avoid confusion. This PR also removes support for dense
format to simplify the example code.
cc: yinxusen
Author: Xiangrui Meng <[email protected]>
Closes #9873 from mengxr/SPARK-11895.
(cherry picked from commit fe89c1817d668e46adf70d0896c42c22a547c76a)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 835b5488ff644e2f51442943adffd3cd682703ac
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-23T05:48:48Z
[SPARK-6791][ML] Add read/write for CrossValidator and Evaluators
I believe this works for general estimators within CrossValidator,
including compound estimators. (See the complex unit test.)
Added read/write for all 3 Evaluators as well.
CC: mengxr yanboliang
Author: Joseph K. Bradley <[email protected]>
Closes #9848 from jkbradley/cv-io.
(cherry picked from commit a6fda0bfc16a13b28b1cecc96f1ff91363089144)
Signed-off-by: Xiangrui Meng <[email protected]>
commit 7f9d3358afd7e266c79e9989e4d874cd1183f474
Author: Timothy Hunter <[email protected]>
Date: 2015-11-23T05:51:42Z
[SPARK-11835] Adds a sidebar menu to MLlib's documentation
This PR adds a sidebar menu when browsing the user guide of MLlib. It uses
a YAML file to describe the structure of the documentation. It should be
trivial to adapt this to the other projects.

Author: Timothy Hunter <[email protected]>
Closes #9826 from thunterdb/spark-11835.
(cherry picked from commit fc4b792d287095d70379a51f117c225d8d857078)
Signed-off-by: Xiangrui Meng <[email protected]>
commit d482dced313d1d837508d3f449261419c8543c1d
Author: Yanbo Liang <[email protected]>
Date: 2015-11-23T05:56:07Z
[SPARK-11912][ML] ml.feature.PCA minor refactor
Like [SPARK-11852](https://issues.apache.org/jira/browse/SPARK-11852),
```k``` is a param and we should save it under ```metadata/``` rather than
under both ```data/``` and ```metadata/```. Refactor the constructor of
```ml.feature.PCAModel``` to take only ```pc```, but construct
```mllib.feature.PCAModel``` inside ```transform```.
Author: Yanbo Liang <[email protected]>
Closes #9897 from yanboliang/spark-11912.
(cherry picked from commit d9cf9c21fc6b1aa22e68d66760afd42c4e1c18b8)
Signed-off-by: Xiangrui Meng <[email protected]>
commit bad93d9f3a24a7ee024541569c6f3de88aad2fda
Author: BenFradet <[email protected]>
Date: 2015-11-23T06:05:01Z
[SPARK-11902][ML] Unhandled case in VectorAssembler#transform
There is an unhandled case in the transform method of VectorAssembler if
one of the input columns doesn't have one of the supported types DoubleType,
NumericType, BooleanType, or VectorUDT.
So, if you try to transform a column of StringType, you get a cryptic
"scala.MatchError: StringType".
This PR fixes this by throwing a SparkException when dealing with an
unknown column type.
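A Python rendering of the idea (illustrative only, with made-up helper names): instead of letting an unhandled input type escape as a cryptic MatchError, add an explicit default branch that raises an exception naming the offending type.

```python
def assemble_width(col_type: str) -> int:
    """Return how many output slots a column of this type contributes."""
    if col_type in ("DoubleType", "NumericType", "BooleanType"):
        return 1                  # scalar columns contribute one slot
    if col_type == "VectorUDT":
        return -1                 # vector width is known only at runtime
    # Explicit, descriptive error instead of a bare MatchError/KeyError.
    raise ValueError(
        f"VectorAssembler does not support the {col_type} type")
```

With this default branch, a StringType column fails with a message that says what went wrong and where, rather than a pattern-match crash.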
Author: BenFradet <[email protected]>
Closes #9885 from BenFradet/SPARK-11902.
(cherry picked from commit 4be360d4ee6cdb4d06306feca38ddef5212608cf)
Signed-off-by: Xiangrui Meng <[email protected]>
----