GitHub user kevincox opened a pull request:
https://github.com/apache/spark/pull/12335
[SPARK-11321] [SQL] Python non null udfs
## What changes were proposed in this pull request?
This patch allows Python UDFs to return non-nullable values.
## How was this patch tested?
This was tested by running PySpark jobs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kevincox/spark python-non-null-udfs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12335.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12335
----
commit 2ddd10486b91619117b0c236c86e4e0f39869cfa
Author: anabranch <[email protected]>
Date: 2015-12-11T20:55:56Z
[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation
Adding in Pipeline Import and Export Documentation.
Author: anabranch <[email protected]>
Author: Bill Chambers <[email protected]>
Closes #10179 from anabranch/master.
(cherry picked from commit aa305dcaf5b4148aba9e669e081d0b9235f50857)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit bfcc8cfee7219e63d2f53fc36627f95dc60428eb
Author: Mike Dusenberry <[email protected]>
Date: 2015-12-11T22:21:33Z
[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure
Issue
As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our
PySpark `RowMatrix` constructor. As discussed on the dev list
[here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html),
there appears to be an issue with type erasure with RDDs coming from Java, and
by extension from PySpark. Although we are attempting to construct a
`RowMatrix` from an `RDD[Vector]` in
[PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115),
the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when
calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException`
in which an `Object` cannot be cast to a Spark `Vector`. As noted in the
aforementioned dev list thread, this issue was also encountered with
`DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a
`Vector` type. `IndexedRowMatrix` and `CoordinateMatrix` do not appear to
have this issue, likely due to their related helper
functions in `PythonMLlibAPI` creating the RDDs explicitly from DataFrames with
pattern matching, thus preserving the types.
This PR currently contains that retagging fix applied to the
`createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441,
so once this is merged, the other can be rebased.
cc holdenk
Author: Mike Dusenberry <[email protected]>
Closes #9458 from
dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.
(cherry picked from commit 1b8220387e6903564f765fabb54be0420c3e99d7)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 75531c77e85073c7be18985a54c623710894d861
Author: BenFradet <[email protected]>
Date: 2015-12-11T23:43:00Z
[SPARK-12217][ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the
ml-features documentation.
I wonder if I should also add a snippet to the code example, input welcome.
Author: BenFradet <[email protected]>
Closes #10257 from BenFradet/SPARK-12217.
(cherry picked from commit aea676ca2d07c72b1a752e9308c961118e5bfc3c)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit c2f20469d5b53a027b022e3c4a9bea57452c5ba6
Author: Yanbo Liang <[email protected]>
Date: 2015-12-12T02:02:24Z
[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to
dataframe_example.py
Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to
avoid confusion.
#9873 finished the work of Scala example, here we focus on the Python one.
Move dataset_example.py to ```examples/ml``` and rename to
```dataframe_example.py```.
BTW, fix minor missing issues of #9873.
cc mengxr
Author: Yanbo Liang <[email protected]>
Closes #9957 from yanboliang/SPARK-11978.
(cherry picked from commit a0ff6d16ef4bcc1b6ff7282e82a9b345d8449454)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 03d801587936fe92d4e7541711f1f41965e64956
Author: Ankur Dave <[email protected]>
Date: 2015-12-12T03:07:48Z
[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions
Modifies the String overload to call the Column overload and ensures this
is called in a test.
Author: Ankur Dave <[email protected]>
Closes #10271 from ankurdave/SPARK-12298.
(cherry picked from commit 1e799d617a28cd0eaa8f22d103ea8248c4655ae5)
Signed-off-by: Yin Huai <[email protected]>
commit 47461fea7c079819de6add308f823c7a8294f891
Author: gatorsmile <[email protected]>
Date: 2015-12-12T04:55:16Z
[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test
cases
The existing sample functions are missing the parameter `seed`, even though
the corresponding function interface in `generics` has one. Thus, although
callers can pass a 'seed', its value is never used.
This could cause SparkR unit tests to fail. For example, I hit it in another
PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
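The fix amounts to actually threading the `seed` argument through to the
sampler. A minimal Python analogue of the idea (function and names are
illustrative, not SparkR's actual API):

```python
import random

def sample_fraction(items, fraction, seed=None):
    # Seed an explicit RNG so callers that pass `seed` get reproducible
    # samples; the original bug accepted the argument but never used it.
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]

data = list(range(100))
a = sample_fraction(data, 0.3, seed=42)
b = sample_fraction(data, 0.3, seed=42)
assert a == b  # same seed, same sample: tests become deterministic
```

With the dropped-seed bug, the two calls above would generally disagree,
which is exactly the kind of nondeterminism that makes unit tests flaky.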
Author: gatorsmile <[email protected]>
Closes #10160 from gatorsmile/sampleR.
(cherry picked from commit 1e3526c2d3de723225024fedd45753b556e18fc6)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 2679fce717704bc6e64e726d1b754a6a48148770
Author: Jean-Baptiste Onofré <[email protected]>
Date: 2015-12-12T08:51:52Z
[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait
in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver
Author: Jean-Baptiste Onofré <[email protected]>
Closes #10203 from jbonofre/SPARK-11193.
(cherry picked from commit 03138b67d3ef7f5278ea9f8b9c75f0e357ef79d8)
Signed-off-by: Sean Owen <[email protected]>
commit e05364baa34cae1d359ebcec1a0a61abf86d464d
Author: Xusen Yin <[email protected]>
Date: 2015-12-13T01:47:01Z
[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md
https://issues.apache.org/jira/browse/SPARK-12199
Follow-up PR of SPARK-11551. Fix some errors in ml-features.md
mengxr
Author: Xusen Yin <[email protected]>
Closes #10193 from yinxusen/SPARK-12199.
(cherry picked from commit 98b212d36b34ab490c391ea2adf5b141e4fb9289)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit d7e3bfd7d33b8fba44ef80932c0d40fb68075cb4
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-13T05:58:55Z
[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct
disconnection message
Author: Shixiong Zhu <[email protected]>
Closes #10261 from zsxwing/SPARK-12267.
(cherry picked from commit 8af2f8c61ae4a59d129fb3530d0f6e9317f4bff8)
Signed-off-by: Shixiong Zhu <[email protected]>
commit fbf16da2e53acc8678bd1454b0749d1923d4eddf
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-14T06:06:39Z
[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in
the shutdown hook
1. Make sure workers and masters exit so that no worker or master will
still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the
shutdown hook.
This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from
RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at
org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException: Shutdown hooks cannot be modified during
shutdown.
at
org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
at
org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
at
org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
at
org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
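The second fix above can be sketched as a small state-transition guard — a
toy Python model, with state names and the legal-transition set invented for
illustration (this is not the actual Master/Worker code):

```python
RUNNING, FAILED, EXITED = "RUNNING", "FAILED", "EXITED"
LEGAL = {(RUNNING, FAILED), (RUNNING, EXITED)}

class Executor:
    def __init__(self):
        self.state = RUNNING

    def transition(self, new_state):
        # The master asserts transitions are legal; re-reporting RUNNING
        # while already RUNNING trips the assertion shown in the trace.
        assert (self.state, new_state) in LEGAL, (
            f"executor state transfer from {self.state} "
            f"to {new_state} is illegal")
        self.state = new_state

def shutdown_hook(executor):
    # Fix: anything still RUNNING at shutdown is reported as FAILED,
    # never re-reported as RUNNING.
    if executor.state == RUNNING:
        executor.transition(FAILED)

e = Executor()
shutdown_hook(e)
assert e.state == FAILED
```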
Author: Shixiong Zhu <[email protected]>
Closes #10269 from zsxwing/executor-state.
(cherry picked from commit 2aecda284e22ec608992b6221e2f5ffbd51fcd24)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 94ce5025f894f01602732b543bc14901e169cc65
Author: yucai <[email protected]>
Date: 2015-12-14T07:08:21Z
[SPARK-12275][SQL] No plan for BroadcastHint in some condition
When SparkStrategies.BasicOperators hits its "case BroadcastHint(child) =>
apply(child)", it only recursively invokes BasicOperators.apply on this
"child". Many strategies therefore get no chance to process the plan, which
probably leads to the "No plan" issue, so we use planLater to go through all
strategies.
https://issues.apache.org/jira/browse/SPARK-12275
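The difference can be seen in a toy model of the planning loop — strategy
names and tuple-encoded plans below are invented for illustration, not
Spark's actual Catalyst code:

```python
# Each strategy either returns a physical plan or None. Recursing through
# only ONE strategy's apply() (the bug) misses the strategy that actually
# knows how to plan the child; plan_later re-runs the full strategy list.

def basic_operators(plan, plan_later):
    if plan[0] == "BroadcastHint":
        return plan_later(plan[1])   # fix: go through all strategies
    if plan[0] == "Project":
        return ("ProjectExec", plan_later(plan[1]))
    return None

def join_strategy(plan, plan_later):
    if plan[0] == "Join":
        return ("BroadcastJoinExec",)
    return None

STRATEGIES = [basic_operators, join_strategy]

def plan_later(plan):
    for strategy in STRATEGIES:
        result = strategy(plan, plan_later)
        if result is not None:
            return result
    raise RuntimeError("No plan for " + plan[0])

# With plan_later, the hint's child reaches join_strategy and gets planned;
# recursing via basic_operators alone would have raised "No plan".
physical = plan_later(("BroadcastHint", ("Join",)))
assert physical == ("BroadcastJoinExec",)
```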
Author: yucai <[email protected]>
Closes #10265 from yucai/broadcast_hint.
(cherry picked from commit ed87f6d3b48a85391628c29c43d318c26e2c6de7)
Signed-off-by: Yin Huai <[email protected]>
commit c0f0f6cb0fef6e939744b60fdd4911c718f8fac5
Author: BenFradet <[email protected]>
Date: 2015-12-14T13:50:30Z
[MINOR][DOC] Fix broken word2vec link
Follow-up of
[SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193
where a broken link has been left as is.
Author: BenFradet <[email protected]>
Closes #10282 from BenFradet/SPARK-12199.
(cherry picked from commit e25f1fe42747be71c6b6e6357ca214f9544e3a46)
Signed-off-by: Sean Owen <[email protected]>
commit 352a0c80f4833a97916a75388ef290067c2dbede
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-12-15T00:13:55Z
[SPARK-12327] Disable commented code lintr temporarily
cc yhuai felixcheung shaneknapp
Author: Shivaram Venkataraman <[email protected]>
Closes #10300 from shivaram/comment-lintr-disable.
(cherry picked from commit fb3778de685881df66bf0222b520f94dca99e8c8)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 23c8846050b307fdfe2307f7e7ca9d0f69f969a9
Author: jerryshao <[email protected]>
Date: 2015-12-15T17:41:40Z
[STREAMING][MINOR] Fix typo in function name of StateImpl
cc tdas zsxwing, please review. Thanks a lot.
Author: jerryshao <[email protected]>
Closes #10305 from jerryshao/fix-typo-state-impl.
(cherry picked from commit bc1ff9f4a41401599d3a87fb3c23a2078228a29b)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 80d261718c1157e5cd4b0ac27e36ef919ea65afa
Author: Michael Armbrust <[email protected]>
Date: 2015-12-15T23:03:33Z
Update branch-1.6 for 1.6.0 release
Author: Michael Armbrust <[email protected]>
Closes #10317 from marmbrus/versions.
commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell <[email protected]>
Date: 2015-12-15T23:09:57Z
Preparing Spark release v1.6.0-rc3
commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell <[email protected]>
Date: 2015-12-15T23:10:04Z
Preparing development version 1.6.0-SNAPSHOT
commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu <[email protected]>
Date: 2015-12-16T02:15:10Z
[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling
setConf
This is continuation of SPARK-12056 where change is applied to
SqlNewHadoopRDD.scala
andrewor14
FYI
Author: tedyu <[email protected]>
Closes #10164 from tedyu/master.
(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or <[email protected]>
commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen <[email protected]>
Date: 2015-12-16T02:20:00Z
[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos
cluster mode.
Adding more documentation about submitting jobs with mesos cluster mode.
Author: Timothy Chen <[email protected]>
Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or <[email protected]>
commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen <[email protected]>
Date: 2015-12-16T02:25:22Z
[SPARK-9886][CORE] Fix to use ShutdownHookManager in
ExternalBlockStore.scala
Author: Naveen <[email protected]>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or <[email protected]>
commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler <[email protected]>
Date: 2015-12-16T02:28:16Z
[SPARK-12062][CORE] Change Master to async rebuild UI when application
completes
This change builds the event history of completed apps asynchronously so
the RPC thread will not be blocked and allow new workers to register/remove if
the event log history is very large and takes a long time to rebuild.
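The pattern is to hand the slow rebuild off to a background thread so the
message loop returns immediately. A minimal Python sketch under that
assumption (function names are hypothetical, not the Master's actual code):

```python
import threading

def rebuild_ui(app_id, done):
    # Stand-in for the potentially slow work of replaying a large
    # event log to reconstruct the application's history UI.
    done.set()

def on_application_finished(app_id):
    # Fix: run the rebuild on a background thread so the RPC thread
    # is not blocked and can keep registering/removing workers.
    done = threading.Event()
    worker = threading.Thread(target=rebuild_ui, args=(app_id, done),
                              daemon=True)
    worker.start()
    return done  # caller returns immediately; rebuild completes later

done = on_application_finished("app-0001")
done.wait(timeout=5)
assert done.is_set()
```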
Author: Bryan Cutler <[email protected]>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or <[email protected]>
commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan <[email protected]>
Date: 2015-12-16T02:29:19Z
[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
Author: Wenchen Fan <[email protected]>
Closes #8645 from cloud-fan/test.
(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or <[email protected]>
commit a2d584ed9ab3c073df057bed5314bdf877a47616
Author: Timothy Hunter <[email protected]>
Date: 2015-12-16T18:12:33Z
[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the
browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).
Note that I am not a CSS expert, so I can only address comments up to some
extent.
Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
Author: Timothy Hunter <[email protected]>
Closes #10297 from thunterdb/12324.
(cherry picked from commit a6325fc401f68d9fa30cc947c44acc9d64ebda7b)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit ac0e2ea7c712e91503b02ae3c12fa2fcf5079886
Author: Yanbo Liang <[email protected]>
Date: 2015-12-16T18:34:30Z
[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate
```saveAsParquetFile```.
Author: Yanbo Liang <[email protected]>
Closes #10281 from yanboliang/spark-12310.
(cherry picked from commit 22f6cd86fc2e2d6f6ad2c3aae416732c46ebf1b1)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 16edd933d7323f8b6861409bbd62bc1efe244c14
Author: Yu ISHIKAWA <[email protected]>
Date: 2015-12-16T18:43:45Z
[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml
cc jkbradley
Author: Yu ISHIKAWA <[email protected]>
Closes #10244 from yu-iskw/SPARK-12215.
(cherry picked from commit 26d70bd2b42617ff731b6e9e6d77933b38597ebe)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit f815127294c06320204d9affa4f35da7ec3a710d
Author: Jeff Zhang <[email protected]>
Date: 2015-12-16T18:32:32Z
[SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram Please help review.
Author: Jeff Zhang <[email protected]>
Closes #10290 from zjffdu/SPARK-12318.
(cherry picked from commit 2eb5af5f0d3c424dc617bb1a18dd0210ea9ba0bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit e5b85713d8a0dbbb1a0a07481f5afa6c5098147f
Author: Timothy Chen <[email protected]>
Date: 2015-12-16T18:54:15Z
[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with
Mesos cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the
spark-submit script was recently changed so that spark-class looks in
SPARK_HOME first when it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode
with Mesos, since Mesos shouldn't use this configuration but should use
spark.executor.home instead.
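The change boils down to filtering one variable out of the environment
forwarded to the cluster. A minimal sketch of that filtering in Python
(the helper name is illustrative, not the actual spark-submit code):

```python
def child_env(base_env):
    # Drop SPARK_HOME from the environment passed along for Mesos
    # cluster mode; executors should resolve spark.executor.home
    # instead of the client machine's SPARK_HOME.
    return {k: v for k, v in base_env.items() if k != "SPARK_HOME"}

env = child_env({"SPARK_HOME": "/opt/spark", "PATH": "/usr/bin"})
assert "SPARK_HOME" not in env
assert env["PATH"] == "/usr/bin"  # everything else passes through
```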
Author: Timothy Chen <[email protected]>
Closes #10332 from tnachen/scheduler_ui.
(cherry picked from commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490)
Signed-off-by: Andrew Or <[email protected]>
commit e1adf6d7d1c755fb16a0030e66ce9cff348c3de8
Author: Yu ISHIKAWA <[email protected]>
Date: 2015-12-16T18:55:42Z
[SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for
bisecting k-means
This PR includes only an example code in order to finish it quickly.
I'll send another PR for the docs soon.
Author: Yu ISHIKAWA <[email protected]>
Closes #9952 from yu-iskw/SPARK-6518.
(cherry picked from commit 7b6dc29d0ebbfb3bb941130f8542120b6bc3e234)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 168c89e07c51fa24b0bb88582c739cec0acb44d7
Author: Patrick Wendell <[email protected]>
Date: 2015-12-16T19:23:41Z
Preparing Spark release v1.6.0-rc3
commit aee88eb55b89bfdc763fd30f7574d2aa7de4bf39
Author: Patrick Wendell <[email protected]>
Date: 2015-12-16T19:23:52Z
Preparing development version 1.6.0-SNAPSHOT
----