[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 Thanks @HyukjinKwon @BryanCutler for the review! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 LGTM. Merged to master.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93686/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93686/testReport)** for PR 21650 at commit [`8e995e8`](https://github.com/apache/spark/commit/8e995e81542852ff4af43883db79cdfbe9aca1ad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 retest please
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93688/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93688/testReport)** for PR 21650 at commit [`f3a45a5`](https://github.com/apache/spark/commit/f3a45a576b6a186f3694e6bd0f22a8198a9d19a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93688/testReport)** for PR 21650 at commit [`f3a45a5`](https://github.com/apache/spark/commit/f3a45a576b6a186f3694e6bd0f22a8198a9d19a2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1422/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93686/testReport)** for PR 21650 at commit [`8e995e8`](https://github.com/apache/spark/commit/8e995e81542852ff4af43883db79cdfbe9aca1ad). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1421/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93667/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93667/testReport)** for PR 21650 at commit [`6b22fea`](https://github.com/apache/spark/commit/6b22fea5b42b40d2eb92d931e76d183518533717). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93668/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93668/testReport)** for PR 21650 at commit [`b25936d`](https://github.com/apache/spark/commit/b25936d4c5216904f0ca3cf33df4b5c7130aa8f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1402/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1403/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93668/testReport)** for PR 21650 at commit [`b25936d`](https://github.com/apache/spark/commit/b25936d4c5216904f0ca3cf33df4b5c7130aa8f8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @BryanCutler @HyukjinKwon I updated the PR based on Bryan's suggestion. Please take a look and let me know if you have further comments. Thanks!
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93667/testReport)** for PR 21650 at commit [`6b22fea`](https://github.com/apache/spark/commit/6b22fea5b42b40d2eb92d931e76d183518533717). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @HyukjinKwon I think Bryan's implementation looks promising. Please let me take a look.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 Hm, then how about giving it a try in a follow-up, @BryanCutler, if you see some value in it?
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21650

> ehh .. @BryanCutler, WDYT about just doing the previous one for now? The approach you suggested sounds efficient of course but.. this isn't a hot path so I think the previous way is fine too .. since that's a bit cleaner (but a bit less efficient), and partly because the code freeze is close

I didn't make the suggestion for performance. It was because, looking at the previous code, it took me a while to realize the intent was to find the first evaluable udf and then all others matching that eval type. I think the previous code kind of masked that and made it more complicated to follow.

I wasn't really sure how the expression tree was evaluated, so my suggestion didn't handle chained expressions. The problem was that the eval type was being set while checking the children nodes; instead, it should only be set after all children are determined to be the same type. I'll update the above code again, which passes all tests, as far as I can tell. I still prefer this approach, but I'm not a sql expert ;)
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 ehh .. @BryanCutler, WDYT about just doing the previous one for now? The approach you suggested sounds efficient of course but.. this isn't a hot path so I think the previous way is fine too .. since that's a bit cleaner (but a bit less efficient), and partly because the code freeze is close.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93546/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93546/testReport)** for PR 21650 at commit [`2bc906d`](https://github.com/apache/spark/commit/2bc906de5a12dcc452e6855aa30d27021c446e17). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93546/testReport)** for PR 21650 at commit [`2bc906d`](https://github.com/apache/spark/commit/2bc906de5a12dcc452e6855aa30d27021c446e17). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1312/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21650 I'm okay with the approach in https://github.com/apache/spark/pull/21650#issuecomment-407506043 too, but it should be really simplified. Either way, LGTM.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @BryanCutler Thanks for taking a look at this! Yeah I think this works too. Let me update the code and try it. Thanks again!
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21650

I gave it a shot to extract the UDFs in one traversal, using the first occurrence of either pandas or batch udf. I think it's much clearer:

```scala
object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper {

  private class FirstEvalType() {
    var evalType = -1
    def isEvalTypeSet(): Boolean = evalType >= 0
  }

  private def canEvaluateInPython(e: PythonUDF, firstEvalType: FirstEvalType): Boolean = {
    if (firstEvalType.isEvalTypeSet() && e.evalType != firstEvalType.evalType) {
      false
    } else {
      firstEvalType.evalType = e.evalType
      e.children match {
        // single PythonUDF child could be chained and evaluated in Python
        case Seq(u: PythonUDF) => canEvaluateInPython(u, firstEvalType)
        // Python UDF can't be evaluated directly in JVM
        case children => !children.exists(hasScalarPythonUDF)
      }
    }
  }

  private def collectEvaluableUDFs(
      expr: Expression,
      firstEvalType: FirstEvalType): Seq[PythonUDF] = expr match {
    case udf: PythonUDF
        if PythonUDF.isScalarPythonUDF(udf) && canEvaluateInPython(udf, firstEvalType) =>
      Seq(udf)
    case e => e.children.flatMap(collectEvaluableUDFs(_, firstEvalType))
  }

  private def extract(plan: SparkPlan): SparkPlan = {
    val udfs = plan.expressions.flatMap(collectEvaluableUDFs(_, new FirstEvalType))
    ...
```

This does pass around a mutable object. I guess you could do about the same using an Option that gets returned, but that might not look as nice.
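For readers following along, here is a minimal sketch of what the `Option`-returning variant mentioned above might look like. It is not the code from the PR: `Expr`, `Col`, `Add`, `Udf`, and `CollectUdfs` are hypothetical toy stand-ins for Spark's expression classes, and the integer eval types are arbitrary placeholders. It also follows the later remark in this thread that the eval type should only be fixed once a UDF and its chained children are known to share one type.

```scala
// Toy expression tree: hypothetical stand-ins, not Spark's Expression/PythonUDF.
sealed trait Expr { def children: Seq[Expr] }
final case class Col(name: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
final case class Udf(name: String, evalType: Int, children: Seq[Expr]) extends Expr

object CollectUdfs {
  private def containsUdf(e: Expr): Boolean = e match {
    case _: Udf => true
    case other  => other.children.exists(containsUdf)
  }

  // True if `u`, and any chain of single-UDF children, can run with eval type `t`.
  private def canEvaluate(u: Udf, t: Int): Boolean =
    u.evalType == t && (u.children match {
      case Seq(child: Udf) => canEvaluate(child, t)  // chained UDFs must share the type
      case cs              => !cs.exists(containsUdf) // no UDF hidden under other nodes
    })

  // Returns the collected UDFs and the eval type fixed so far (None until the
  // first evaluable UDF is found). The state is threaded through immutably.
  def collect(e: Expr, first: Option[Int]): (Seq[Udf], Option[Int]) = e match {
    case u: Udf if canEvaluate(u, first.getOrElse(u.evalType)) =>
      // Only here, after the whole chain checked out, is the eval type fixed.
      (Seq(u), first.orElse(Some(u.evalType)))
    case other =>
      other.children.foldLeft((Seq.empty[Udf], first)) {
        case ((acc, state), child) =>
          val (found, next) = collect(child, state)
          (acc ++ found, next)
      }
  }
}
```

Calling `CollectUdfs.collect(expr, None)` on a tree that mixes eval types returns only the UDFs matching the first evaluable one, together with that eval type. The trade-off hinted at above is visible: the fold that threads the state is noisier than mutating a shared holder.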
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @BryanCutler I've addressed most of your comments and explained the ones that I didn't change. Do you mind taking another look? Thanks!
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93450/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93450/testReport)** for PR 21650 at commit [`78f2ebf`](https://github.com/apache/spark/commit/78f2ebf3b11fe8849fe0d41300f74319ca174d42). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93451/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93451/testReport)** for PR 21650 at commit [`4c9c007`](https://github.com/apache/spark/commit/4c9c007858aef65c2c190b35673404dd61279369). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1239/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93451/testReport)** for PR 21650 at commit [`4c9c007`](https://github.com/apache/spark/commit/4c9c007858aef65c2c190b35673404dd61279369). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1238/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #93450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93450/testReport)** for PR 21650 at commit [`78f2ebf`](https://github.com/apache/spark/commit/78f2ebf3b11fe8849fe0d41300f74319ca174d42). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21650 ping @BryanCutler Any update about this PR?
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21650 I think the previous behavior was to not allow mixing pandas and regular udfs, but you're probably right that there are some cases where data could be handled differently. I'll try to look at this more in depth today.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @BryanCutler I think your suggestion would change the behavior. ArrowEvalExec and BatchEvalExec still behave differently in corner cases, for example type coercion (ArrowEvalExec supports type coercion but BatchEvalExec doesn't) and the timestamp type (a regular UDF expects a Python datetime for timestamps, while a pandas UDF expects pd.Timestamp). I think this is probably a good future improvement, but not great for this Jira because of the behavior change. WDYT?
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21650 I had an idea of a slightly different approach.. Would it be possible to "promote" the regular `udf` to a `pandas_udf`? By this I mean wrap the function using `apply()` so that it takes pd.Series as inputs and returns another pd.Series. Then we can send the entire mix of `udf`s and `pandas_udf`s to the worker in one shot, instead of separate evaluations. Since the user is already using `pandas_udf`s, we know that the worker supports it, and I think the performance would be much better. Is there any downside or issues with doing it this way?
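The "promotion" idea amounts to lifting a function over single values into a function over whole columns. Below is a rough, language-neutral sketch in Scala. The proposal itself concerns Python, where the wrapper would take and return pd.Series; `PromoteUdf`, `promote`, and the use of `Vector` as a column batch are illustrative assumptions, not anything from the PR.

```scala
object PromoteUdf {
  // Lift a row-at-a-time function into one that maps a whole column batch.
  // Vector stands in for a column of values (pd.Series in the Python setting).
  def promote[A, B](f: A => B): Vector[A] => Vector[B] =
    batch => batch.map(f)
}
```

A lifted function like `PromoteUdf.promote((x: Int) => x + 1)` can then be applied to a batch alongside functions that were written batch-wise from the start. The sketch does not capture differences in how values are converted on the way in and out of each kind of function.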
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92482/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92482/testReport)** for PR 21650 at commit [`ce5e7f5`](https://github.com/apache/spark/commit/ce5e7f53cff3c5657fe2e99f2f2a57176d009cce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/589/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92482/testReport)** for PR 21650 at commit [`ce5e7f5`](https://github.com/apache/spark/commit/ce5e7f53cff3c5657fe2e99f2f2a57176d009cce).
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92443/ Test PASSed.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92443/testReport)** for PR 21650 at commit [`674e361`](https://github.com/apache/spark/commit/674e36136911839df00635eff8abb3c405e537d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/557/ Test PASSed.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed.
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92443/testReport)** for PR 21650 at commit [`674e361`](https://github.com/apache/spark/commit/674e36136911839df00635eff8abb3c405e537d4).
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21650

Would you mind changing case (1) in your description? It threw me off a little, as they looked independent at first glance. Maybe something like:

```
df = spark.range(0, 1).toDF('v') \
    .withColumn('foo', f1(df['v'])) \
    .withColumn('bar', f2(df['v']))
```
[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @viirya I have added the query plan output. @maropu I updated the PR title. Thanks!