GitHub user kevincox opened a pull request:
https://github.com/apache/spark/pull/12335
[SPARK-11321] [SQL] Python non null udfs
## What changes were proposed in this pull request?
This patch allows Python UDFs to return non-nullable values.
## How was this patch tested?
This was tested by running PySpark jobs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kevincox/spark python-non-null-udfs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12335.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12335
----
commit 2ddd10486b91619117b0c236c86e4e0f39869cfa
Author: anabranch <[email protected]>
Date: 2015-12-11T20:55:56Z
[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation
Adding in Pipeline Import and Export Documentation.
Author: anabranch <[email protected]>
Author: Bill Chambers <[email protected]>
Closes #10179 from anabranch/master.
(cherry picked from commit aa305dcaf5b4148aba9e669e081d0b9235f50857)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit bfcc8cfee7219e63d2f53fc36627f95dc60428eb
Author: Mike Dusenberry <[email protected]>
Date: 2015-12-11T22:21:33Z
[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure
Issue
As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our
PySpark `RowMatrix` constructor. As discussed on the dev list
[here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html),
there appears to be an issue with type erasure with RDDs coming from Java, and
by extension from PySpark. Although we are attempting to construct a
`RowMatrix` from an `RDD[Vector]` in
[PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115),
the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when
calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException`
in which an `Object` cannot be cast to a Spark `Vector`. As noted in the
aforementioned dev list thread, this issue was also encountered with
`DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a
`Vector` type. `IndexedRowMatrix` and `CoordinateMatrix` do not appear to
have this issue, likely due to their related helper
functions in `PythonMLlibAPI` creating the RDDs explicitly from DataFrames with
pattern matching, thus preserving the types.
This PR currently contains that retagging fix applied to the
`createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441,
so once this is merged, the other can be rebased.
cc holdenk
Author: Mike Dusenberry <[email protected]>
Closes #9458 from
dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.
(cherry picked from commit 1b8220387e6903564f765fabb54be0420c3e99d7)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 75531c77e85073c7be18985a54c623710894d861
Author: BenFradet <[email protected]>
Date: 2015-12-11T23:43:00Z
[SPARK-12217][ML] Document invalid handling for StringIndexer
Added a paragraph regarding StringIndexer#setHandleInvalid to the
ml-features documentation.
I wonder if I should also add a snippet to the code example, input welcome.
Author: BenFradet <[email protected]>
Closes #10257 from BenFradet/SPARK-12217.
(cherry picked from commit aea676ca2d07c72b1a752e9308c961118e5bfc3c)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit c2f20469d5b53a027b022e3c4a9bea57452c5ba6
Author: Yanbo Liang <[email protected]>
Date: 2015-12-12T02:02:24Z
[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to
dataframe_example.py
Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to
avoid confusion.
#9873 finished the work of Scala example, here we focus on the Python one.
Move dataset_example.py to ```examples/ml``` and rename to
```dataframe_example.py```.
BTW, fix minor missing issues of #9873.
cc mengxr
Author: Yanbo Liang <[email protected]>
Closes #9957 from yanboliang/SPARK-11978.
(cherry picked from commit a0ff6d16ef4bcc1b6ff7282e82a9b345d8449454)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 03d801587936fe92d4e7541711f1f41965e64956
Author: Ankur Dave <[email protected]>
Date: 2015-12-12T03:07:48Z
[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions
Modifies the String overload to call the Column overload and ensures this
is called in a test.
Author: Ankur Dave <[email protected]>
Closes #10271 from ankurdave/SPARK-12298.
(cherry picked from commit 1e799d617a28cd0eaa8f22d103ea8248c4655ae5)
Signed-off-by: Yin Huai <[email protected]>
commit 47461fea7c079819de6add308f823c7a8294f891
Author: gatorsmile <[email protected]>
Date: 2015-12-12T04:55:16Z
[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test
cases
The existing sample functions are missing the parameter `seed`, even though
the corresponding function interface in `generics` has one. Thus, although
callers can pass a 'seed', its value is never used.
This could cause SparkR unit tests to fail. For example, I hit it in another
PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
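The fix amounts to actually threading the `seed` argument through to the
sampler. A minimal Python analogue of the idea (function and names are
illustrative, not SparkR's actual API):

```python
import random

def sample_fraction(items, fraction, seed=None):
    # Seed an explicit RNG so callers that pass `seed` get reproducible
    # samples; the original bug accepted the argument but never used it.
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]

data = list(range(100))
a = sample_fraction(data, 0.3, seed=42)
b = sample_fraction(data, 0.3, seed=42)
assert a == b  # same seed, same sample: tests become deterministic
```

With the dropped-seed bug, the two calls above would generally disagree,
which is exactly the kind of nondeterminism that makes unit tests flaky.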
Author: gatorsmile <[email protected]>
Closes #10160 from gatorsmile/sampleR.
(cherry picked from commit 1e3526c2d3de723225024fedd45753b556e18fc6)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 2679fce717704bc6e64e726d1b754a6a48148770
Author: Jean-Baptiste Onofré <[email protected]>
Date: 2015-12-12T08:51:52Z
[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait
in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver
Author: Jean-Baptiste Onofré <[email protected]>
Closes #10203 from jbonofre/SPARK-11193.
(cherry picked from commit 03138b67d3ef7f5278ea9f8b9c75f0e357ef79d8)
Signed-off-by: Sean Owen <[email protected]>
commit e05364baa34cae1d359ebcec1a0a61abf86d464d
Author: Xusen Yin <[email protected]>
Date: 2015-12-13T01:47:01Z
[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md
https://issues.apache.org/jira/browse/SPARK-12199
Follow-up PR of SPARK-11551. Fix some errors in ml-features.md
mengxr
Author: Xusen Yin <[email protected]>
Closes #10193 from yinxusen/SPARK-12199.
(cherry picked from commit 98b212d36b34ab490c391ea2adf5b141e4fb9289)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit d7e3bfd7d33b8fba44ef80932c0d40fb68075cb4
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-13T05:58:55Z
[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct
disconnection message
Author: Shixiong Zhu <[email protected]>
Closes #10261 from zsxwing/SPARK-12267.
(cherry picked from commit 8af2f8c61ae4a59d129fb3530d0f6e9317f4bff8)
Signed-off-by: Shixiong Zhu <[email protected]>
commit fbf16da2e53acc8678bd1454b0749d1923d4eddf
Author: Shixiong Zhu <[email protected]>
Date: 2015-12-14T06:06:39Z
[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in
the shutdown hook
1. Make sure workers and masters exit so that no worker or master will
still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the
shutdown hook.
This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from
RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at
org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException: Shutdown hooks cannot be modified during
shutdown.
at
org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
at
org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
at
org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
at
org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
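The second fix above can be sketched as a small state-transition guard — a
toy Python model, with state names and the legal-transition set invented for
illustration (this is not the actual Master/Worker code):

```python
RUNNING, FAILED, EXITED = "RUNNING", "FAILED", "EXITED"
LEGAL = {(RUNNING, FAILED), (RUNNING, EXITED)}

class Executor:
    def __init__(self):
        self.state = RUNNING

    def transition(self, new_state):
        # The master asserts transitions are legal; re-reporting RUNNING
        # while already RUNNING trips the assertion shown in the trace.
        assert (self.state, new_state) in LEGAL, (
            f"executor state transfer from {self.state} "
            f"to {new_state} is illegal")
        self.state = new_state

def shutdown_hook(executor):
    # Fix: anything still RUNNING at shutdown is reported as FAILED,
    # never re-reported as RUNNING.
    if executor.state == RUNNING:
        executor.transition(FAILED)

e = Executor()
shutdown_hook(e)
assert e.state == FAILED
```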
Author: Shixiong Zhu <[email protected]>
Closes #10269 from zsxwing/executor-state.
(cherry picked from commit 2aecda284e22ec608992b6221e2f5ffbd51fcd24)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 94ce5025f894f01602732b543bc14901e169cc65
Author: yucai <[email protected]>
Date: 2015-12-14T07:08:21Z
[SPARK-12275][SQL] No plan for BroadcastHint in some condition
When SparkStrategies.BasicOperators hits its "case BroadcastHint(child) =>
apply(child)", it only recursively invokes BasicOperators.apply on this
"child". Many strategies therefore get no chance to process the plan, which
probably leads to the "No plan" issue, so we use planLater to go through all
strategies.
https://issues.apache.org/jira/browse/SPARK-12275
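The difference can be seen in a toy model of the planning loop — strategy
names and tuple-encoded plans below are invented for illustration, not
Spark's actual Catalyst code:

```python
# Each strategy either returns a physical plan or None. Recursing through
# only ONE strategy's apply() (the bug) misses the strategy that actually
# knows how to plan the child; plan_later re-runs the full strategy list.

def basic_operators(plan, plan_later):
    if plan[0] == "BroadcastHint":
        return plan_later(plan[1])   # fix: go through all strategies
    if plan[0] == "Project":
        return ("ProjectExec", plan_later(plan[1]))
    return None

def join_strategy(plan, plan_later):
    if plan[0] == "Join":
        return ("BroadcastJoinExec",)
    return None

STRATEGIES = [basic_operators, join_strategy]

def plan_later(plan):
    for strategy in STRATEGIES:
        result = strategy(plan, plan_later)
        if result is not None:
            return result
    raise RuntimeError("No plan for " + plan[0])

# With plan_later, the hint's child reaches join_strategy and gets planned;
# recursing via basic_operators alone would have raised "No plan".
physical = plan_later(("BroadcastHint", ("Join",)))
assert physical == ("BroadcastJoinExec",)
```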
Author: yucai <[email protected]>
Closes #10265 from yucai/broadcast_hint.
(cherry picked from commit ed87f6d3b48a85391628c29c43d318c26e2c6de7)
Signed-off-by: Yin Huai <[email protected]>
commit c0f0f6cb0fef6e939744b60fdd4911c718f8fac5
Author: BenFradet <[email protected]>
Date: 2015-12-14T13:50:30Z
[MINOR][DOC] Fix broken word2vec link
Follow-up of
[SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193
where a broken link has been left as is.
Author: BenFradet <[email protected]>
Closes #10282 from BenFradet/SPARK-12199.
(cherry picked from commit e25f1fe42747be71c6b6e6357ca214f9544e3a46)
Signed-off-by: Sean Owen <[email protected]>
commit 352a0c80f4833a97916a75388ef290067c2dbede
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-12-15T00:13:55Z
[SPARK-12327] Disable commented code lintr temporarily
cc yhuai felixcheung shaneknapp
Author: Shivaram Venkataraman <[email protected]>
Closes #10300 from shivaram/comment-lintr-disable.
(cherry picked from commit fb3778de685881df66bf0222b520f94dca99e8c8)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 23c8846050b307fdfe2307f7e7ca9d0f69f969a9
Author: jerryshao <[email protected]>
Date: 2015-12-15T17:41:40Z
[STREAMING][MINOR] Fix typo in function name of StateImpl
cc tdas zsxwing, please review. Thanks a lot.
Author: jerryshao <[email protected]>
Closes #10305 from jerryshao/fix-typo-state-impl.
(cherry picked from commit bc1ff9f4a41401599d3a87fb3c23a2078228a29b)
Signed-off-by: Shixiong Zhu <[email protected]>
commit 80d261718c1157e5cd4b0ac27e36ef919ea65afa
Author: Michael Armbrust <[email protected]>
Date: 2015-12-15T23:03:33Z
Update branch-1.6 for 1.6.0 release
Author: Michael Armbrust <[email protected]>
Closes #10317 from marmbrus/versions.
commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell <[email protected]>
Date: 2015-12-15T23:09:57Z
Preparing Spark release v1.6.0-rc3
commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell <[email protected]>
Date: 2015-12-15T23:10:04Z
Preparing development version 1.6.0-SNAPSHOT
commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu <[email protected]>
Date: 2015-12-16T02:15:10Z
[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling
setConf
This is continuation of SPARK-12056 where change is applied to
SqlNewHadoopRDD.scala
andrewor14
FYI
Author: tedyu <[email protected]>
Closes #10164 from tedyu/master.
(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or <[email protected]>
commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen <[email protected]>
Date: 2015-12-16T02:20:00Z
[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos
cluster mode.
Adding more documentation about submitting jobs with mesos cluster mode.
Author: Timothy Chen <[email protected]>
Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or <[email protected]>
commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen <[email protected]>
Date: 2015-12-16T02:25:22Z
[SPARK-9886][CORE] Fix to use ShutdownHookManager in
ExternalBlockStore.scala
Author: Naveen <[email protected]>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or <[email protected]>
commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler <[email protected]>
Date: 2015-12-16T02:28:16Z
[SPARK-12062][CORE] Change Master to async rebuild UI when application
completes
This change builds the event history of completed apps asynchronously so
the RPC thread will not be blocked and allow new workers to register/remove if
the event log history is very large and takes a long time to rebuild.
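The pattern is to hand the slow rebuild off to a background thread so the
message loop returns immediately. A minimal Python sketch under that
assumption (function names are hypothetical, not the Master's actual code):

```python
import threading

def rebuild_ui(app_id, done):
    # Stand-in for the potentially slow work of replaying a large
    # event log to reconstruct the application's history UI.
    done.set()

def on_application_finished(app_id):
    # Fix: run the rebuild on a background thread so the RPC thread
    # is not blocked and can keep registering/removing workers.
    done = threading.Event()
    worker = threading.Thread(target=rebuild_ui, args=(app_id, done),
                              daemon=True)
    worker.start()
    return done  # caller returns immediately; rebuild completes later

done = on_application_finished("app-0001")
done.wait(timeout=5)
assert done.is_set()
```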
Author: Bryan Cutler <[email protected]>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or <[email protected]>
commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan <[email protected]>
Date: 2015-12-16T02:29:19Z
[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
Author: Wenchen Fan <[email protected]>
Closes #8645 from cloud-fan/test.
(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or <[email protected]>
commit a2d584ed9ab3c073df057bed5314bdf877a47616
Author: Timothy Hunter <[email protected]>
Date: 2015-12-16T18:12:33Z
[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the
browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).
Note that I am not a CSS expert, so I can only address comments up to some
extent.
Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">
When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">
Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">
Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm"
src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">
Author: Timothy Hunter <[email protected]>
Closes #10297 from thunterdb/12324.
(cherry picked from commit a6325fc401f68d9fa30cc947c44acc9d64ebda7b)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit ac0e2ea7c712e91503b02ae3c12fa2fcf5079886
Author: Yanbo Liang <[email protected]>
Date: 2015-12-16T18:34:30Z
[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate
```saveAsParquetFile```.
Author: Yanbo Liang <[email protected]>
Closes #10281 from yanboliang/spark-12310.
(cherry picked from commit 22f6cd86fc2e2d6f6ad2c3aae416732c46ebf1b1)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit 16edd933d7323f8b6861409bbd62bc1efe244c14
Author: Yu ISHIKAWA <[email protected]>
Date: 2015-12-16T18:43:45Z
[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml
cc jkbradley
Author: Yu ISHIKAWA <[email protected]>
Closes #10244 from yu-iskw/SPARK-12215.
(cherry picked from commit 26d70bd2b42617ff731b6e9e6d77933b38597ebe)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit f815127294c06320204d9affa4f35da7ec3a710d
Author: Jeff Zhang <[email protected]>
Date: 2015-12-16T18:32:32Z
[SPARK-12318][SPARKR] Save mode in SparkR should be error by default
shivaram Please help review.
Author: Jeff Zhang <[email protected]>
Closes #10290 from zjffdu/SPARK-12318.
(cherry picked from commit 2eb5af5f0d3c424dc617bb1a18dd0210ea9ba0bc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
commit e5b85713d8a0dbbb1a0a07481f5afa6c5098147f
Author: Timothy Chen <[email protected]>
Date: 2015-12-16T18:54:15Z
[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with
Mesos cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the
spark-submit script was recently changed so that spark-class looks in
SPARK_HOME first when it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode
with Mesos, since Mesos shouldn't use this configuration but should use
spark.executor.home instead.
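The change boils down to filtering one variable out of the environment
forwarded to the cluster. A minimal sketch of that filtering in Python
(the helper name is illustrative, not the actual spark-submit code):

```python
def child_env(base_env):
    # Drop SPARK_HOME from the environment passed along for Mesos
    # cluster mode; executors should resolve spark.executor.home
    # instead of the client machine's SPARK_HOME.
    return {k: v for k, v in base_env.items() if k != "SPARK_HOME"}

env = child_env({"SPARK_HOME": "/opt/spark", "PATH": "/usr/bin"})
assert "SPARK_HOME" not in env
assert env["PATH"] == "/usr/bin"  # everything else passes through
```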
Author: Timothy Chen <[email protected]>
Closes #10332 from tnachen/scheduler_ui.
(cherry picked from commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490)
Signed-off-by: Andrew Or <[email protected]>
commit e1adf6d7d1c755fb16a0030e66ce9cff348c3de8
Author: Yu ISHIKAWA <[email protected]>
Date: 2015-12-16T18:55:42Z
[SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for
bisecting k-means
This PR includes only an example code in order to finish it quickly.
I'll send another PR for the docs soon.
Author: Yu ISHIKAWA <[email protected]>
Closes #9952 from yu-iskw/SPARK-6518.
(cherry picked from commit 7b6dc29d0ebbfb3bb941130f8542120b6bc3e234)
Signed-off-by: Joseph K. Bradley <[email protected]>
commit 168c89e07c51fa24b0bb88582c739cec0acb44d7
Author: Patrick Wendell <[email protected]>
Date: 2015-12-16T19:23:41Z
Preparing Spark release v1.6.0-rc3
commit aee88eb55b89bfdc763fd30f7574d2aa7de4bf39
Author: Patrick Wendell <[email protected]>
Date: 2015-12-16T19:23:52Z
Preparing development version 1.6.0-SNAPSHOT
----