[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-25 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 Thanks all for the discussion and reviews! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

2018-06-23 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Merged to master.

2018-06-22 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92223/ Test PASSed.

2018-06-22 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-22 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92223/testReport)** for PR 21427 at commit

2018-06-22 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92223/testReport)** for PR 21427 at commit

2018-06-22 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/418/

2018-06-22 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-19 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Seems fine and I am okay with it.

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92077/ Test PASSed.

2018-06-19 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92077/testReport)** for PR 21427 at commit

2018-06-19 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92077/testReport)** for PR 21427 at commit

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4208/

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/314/

2018-06-19 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 retest this please

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92066/ Test FAILed.

2018-06-19 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test FAILed.

2018-06-19 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92066/testReport)** for PR 21427 at commit

2018-06-18 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92066/testReport)** for PR 21427 at commit

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4202/

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/308/

2018-06-18 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 retest this please

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test FAILed.

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92048/ Test FAILed.

2018-06-18 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92048/testReport)** for PR 21427 at commit

2018-06-18 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I updated to use a conf that reverts to column assignment by position, regardless of the type of column labels in the Pandas DataFrame, and added a test for this. I also put this in a

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4187/

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-06-18 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/293/

2018-06-18 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #92048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92048/testReport)** for PR 21427 at commit

2018-06-12 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Okay... but let's make sure this case stays a special one ...

2018-06-12 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I agree with you @HyukjinKwon but we have had lots of discussion here and are still split, so let's just add the config. There is still the off chance that someone upgrading from 2.3.0

2018-06-12 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 2.3.1 wouldn't have this behaviour change and we marked this as experimental. So, on the other hand, it will probably give us more time to make clear that this is discouraged in production and there

2018-06-11 cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21427 > But we marked this as experimental ... That's also special to this case: we marked it as experimental in 2.3.1. Not a lot of behavior changes are similar to this one. To

2018-06-11 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 > we can't just change the behavior. We think the old behavior doesn't make sense and users should change their code, but users may not think in this way. I think this basically means we

2018-06-11 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Okay, but I get that it can make going ahead smoother. I am okay with it.

2018-06-11 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 But we marked this as experimental. If we treat the old API and the new experimental API in the same way, I wonder why we have them. One thing I am less clear on is what kind of scenario we are worried

2018-06-11 cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21427 Pandas UDF is already in 2 releases (2.3.0 and 2.3.1); we can't just change the behavior. We think the old behavior doesn't make sense and users should change their code, but users may not think

2018-06-11 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 How about we start classifying configurations by version, if it sounds better to have a configuration for each behaviour change? For example, we could have a postfix like spark_23

2018-06-11 gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21427 @icexelloss @HyukjinKwon It is always simple to deprecate the confs in the release of Spark 3.0. Let us make it configurable in the next release which is Spark 2.4.

2018-06-10 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 It's at least not as trivial as on the Scala side. I am okay with it, but please make clear which cases we will allow with this configuration.

2018-06-08 cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21427 If adding the config is trivial, let's add it. We can pick the new behavior by default.

2018-06-08 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I would prefer if we could do this without a config because while the current behavior can work if the user knows what they are doing, it can also fail very easily and not obviously. So to me
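To illustrate how positional matching can "fail very easily and not obviously", here is a small pure-Python model with no Spark involved. It assumes a hypothetical UDF that emits two same-typed columns in the opposite order from the declared schema. Positional matching swaps them without raising any error, while name-based matching gets them right.

```python
# Declared output schema of a hypothetical grouped-map UDF: (id, count)
schema_names = ["id", "count"]

# The UDF happens to build its result with the columns in the opposite order.
result_labels = ["count", "id"]
result_columns = [[10, 20], [1, 2]]

# Positional matching: column i of the result becomes schema field i.
positional = dict(zip(schema_names, result_columns))

# Name-based matching: look each schema field up by its label.
by_label = dict(zip(result_labels, result_columns))
by_name = {name: by_label[name] for name in schema_names}

print(positional)  # {'id': [10, 20], 'count': [1, 2]}  <- silently swapped
print(by_name)     # {'id': [1, 2], 'count': [10, 20]}
```

Because both columns hold integers, no type check would flag the swap. The error only surfaces, if at all, downstream in the data.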

2018-06-08 felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21427 I think the config switch is for maintaining backward compatibility in case someone is hit by this, so I think it's a good idea.

2018-06-08 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Yeah, I generally agree with that and feel the same way. I think I have talked about this with @gatorsmile and @cloud-fan multiple times. Here is my thought: we should not make a configuration

2018-06-08 icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 @HyukjinKwon I agree with you that 99% of people will almost certainly not use the config. I think @gatorsmile's concern is that in the rare case that some people are actually depending on the existing

2018-06-08 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 I'm okay if that's the only way to get through here, but I must say I wonder who's going to intentionally switch this off. This now sounds more like a bug or a design issue to be fixed

2018-06-08 icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 I ran into @ueshin and @gatorsmile at the Summit. It seems the preferable way to move forward is to have a configuration to fall back to the existing behavior and change the default behavior

2018-05-31 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I made https://issues.apache.org/jira/browse/SPARK-2 to improve the documentation of the current behavior, until we can resolve this.

2018-05-30 icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 @gatorsmile @ueshin Thanks for joining the discussion! I wonder if you agree that at least for case (1) here https://github.com/apache/spark/pull/21427#issuecomment-392070950, we should

2018-05-30 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 Here are some examples that currently work, but would no longer work under the proposed fix. These are all cases where columns are named with strings, but the names do not match the schema (let
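The kind of case described above can be modelled in plain Python: the labels are strings but do not match the schema names, so a strict name-based lookup fails while positional matching still succeeds. `match_by_position` and `match_by_name` are hypothetical helpers for illustration, not Spark APIs, and the specific labels are made up.

```python
def match_by_position(labels, columns, schema_names):
    """Legacy-style matching: ignore the labels and use column order."""
    return dict(zip(schema_names, columns))


def match_by_name(labels, columns, schema_names):
    """Name-based matching: every schema field must appear as a label."""
    by_label = dict(zip(labels, columns))
    missing = [name for name in schema_names if name not in by_label]
    if missing:
        raise KeyError("schema fields %s not found in labels %s"
                       % (missing, list(labels)))
    return {name: by_label[name] for name in schema_names}


schema_names = ["id", "v"]
labels = ["key", "value"]  # string labels that differ from the schema names
columns = [[1, 2], [0.1, 0.2]]

print(match_by_position(labels, columns, schema_names))  # {'id': [1, 2], 'v': [0.1, 0.2]}
try:
    match_by_name(labels, columns, schema_names)
except KeyError as err:
    print("name-based matching fails:", err)
```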

2018-05-30 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 Thanks @ueshin and @gatorsmile for taking a look, I agree the proposed fix changes some behavior, but I think that behavior is either error-prone or doesn't make much sense. Let me put up some

2018-05-30 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 The thing is, we already do a similar thing via createDataFrame with lists and dictionaries. I believe @icexelloss borrowed this idea from there: ``` >>> spark.createDataFrame([["a", 1],
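The analogy to `createDataFrame` is that a list-like record is read positionally while a dict-like record is read by key. The function below is a hypothetical pure-Python model of that idea, not PySpark's actual conversion code.

```python
def record_to_row(record, schema_names):
    """Hypothetical model: map one input record onto the schema's fields."""
    if isinstance(record, dict):
        # Dict-like records: resolve each field by key, order does not matter.
        return tuple(record[name] for name in schema_names)
    # List/tuple records: resolve fields by position.
    if len(record) != len(schema_names):
        raise ValueError("record length does not match schema")
    return tuple(record)


schema_names = ["name", "age"]
print(record_to_row(["a", 1], schema_names))                 # ('a', 1)
print(record_to_row({"age": 1, "name": "a"}, schema_names))  # ('a', 1)
```

The proposal applies the same split to grouped-map results: string column labels play the role of dict keys, and unlabelled (default integer) columns play the role of list positions.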

2018-05-30 gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21427 Please let users decide whether they are resolved by names or by position.

2018-05-29 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 For configuration, I wasn't sure if we should send the whole configuration map to the worker.py side or if we should fix the way the command is written, and I was also thinking of the current timezone way

2018-05-29 ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21427 I guess sending configurations is not that difficult. We can write configs (as `Map[String, String]` for further configurations in the future?) before `PythonUDFRunner.writeUDFs(dataOut, funcs,
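One way to picture "write configs as a `Map[String, String]` before the UDFs" is a count-prefixed list of length-prefixed UTF-8 key/value pairs. The sketch below assumes big-endian 4-byte integers (the convention of Java's `DataOutputStream.writeInt`). The function names, the second config key, and the exact layout are assumptions for illustration, not the wire format Spark actually uses.

```python
import io
import struct


def write_conf(stream, conf):
    """Write a str->str map: entry count, then length-prefixed UTF-8 key/value pairs."""
    stream.write(struct.pack(">i", len(conf)))
    for key, value in conf.items():
        for text in (key, value):
            data = text.encode("utf-8")
            stream.write(struct.pack(">i", len(data)))
            stream.write(data)


def read_int(stream):
    """Read one big-endian 4-byte signed integer."""
    return struct.unpack(">i", stream.read(4))[0]


def read_conf(stream):
    """Inverse of write_conf, as a Python worker might read it."""
    conf = {}
    for _ in range(read_int(stream)):
        key = stream.read(read_int(stream)).decode("utf-8")
        value = stream.read(read_int(stream)).decode("utf-8")
        conf[key] = value
    return conf


buf = io.BytesIO()
# "assign_columns_by_name" is a hypothetical key used only for this example.
write_conf(buf, {"spark.sql.session.timeZone": "UTC", "assign_columns_by_name": "true"})
buf.seek(0)
print(read_conf(buf))
```

Sending a generic map rather than one dedicated field per setting means later configs can be added without changing the framing again.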

2018-05-29 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Yup, my impression was that there could be a corner case too, but I wasn't sure how much sense the corner case makes, and I haven't checked it closely yet. I believe elaborating the case might be

2018-05-29 ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21427 I'm sorry for the late review, but I think the current fix is still a behavior change.

2018-05-29 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91275/ Test PASSed.

2018-05-29 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-05-29 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit

2018-05-29 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-05-29 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3684/

2018-05-29 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit

2018-05-25 icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 @BryanCutler could you also add to the grouped_map documentation to explain the behavior?

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91167/ Test PASSed.

2018-05-25 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91167/testReport)** for PR 21427 at commit

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3595/

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-05-25 BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 > what will happen if the given schema names are numbers? could we recognise it? The schema names will be integers as strings, so they are treated the same as any string. This is only if the

2018-05-25 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91167/testReport)** for PR 21427 at commit

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test FAILed.

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91166/ Test FAILed.

2018-05-25 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91166/testReport)** for PR 21427 at commit

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed.

2018-05-25 AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3594/

2018-05-25 SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91166/testReport)** for PR 21427 at commit

2018-05-25 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 SGTM if it works.

2018-05-25 icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 I think so:
```
>>> type(pd.DataFrame({'1': [1], '2': [2]}).columns)
>>> type(pd.DataFrame([[1, 2.0, "hello"]]).columns)
```

2018-05-25 HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 What will happen if the given schema names are numbers? Could we recognise it?

2018-05-25 rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 If we can fix it without breaking existing behavior that would be awesome. On Fri, May 25, 2018 at 9:59 AM Bryan Cutler wrote: > I've been thinking about

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I've been thinking about this and came to the same conclusion as @icexelloss here https://github.com/apache/spark/pull/21427#issuecomment-392070950 that we could really support both names and
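One way to support both names and positions, sketched with pandas under the assumption that a result counts as user-labeled when any of its column labels is a string: in that case columns are matched to the schema by name, otherwise they are taken by position. `assign_result_columns` is a hypothetical helper for illustration, not the code that was merged into Spark.

```python
from typing import List

import pandas as pd


def assign_result_columns(result: pd.DataFrame, schema_names: List[str]) -> pd.DataFrame:
    """Hypothetical helper: align a grouped-map UDF result with the declared schema.

    If any column label is a string, match by name (column order in the result
    does not matter). Otherwise fall back to position, which is the only option
    for frames with a default RangeIndex.
    """
    if any(isinstance(label, str) for label in result.columns):
        # By name: raises KeyError if a schema field is missing from the result.
        return result[list(schema_names)]
    # By position: take the first len(schema_names) columns and relabel them.
    out = result.iloc[:, : len(schema_names)].copy()
    out.columns = list(schema_names)
    return out


# Labeled result whose columns are in a different order than the schema.
by_name = assign_result_columns(pd.DataFrame({'b': [2], 'a': [1]}), ['a', 'b'])

# Unlabeled result: only position is available.
by_pos = assign_result_columns(pd.DataFrame([[1, 2]]), ['x', 'y'])
```

With by-name matching, `by_name['a']` holds the value the user put under `'a'` even though `'a'` was the second column of the returned frame.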

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 I opened https://issues.apache.org/jira/browse/SPARK-24392 to continue the discussion about changing this to experimental. IMO it was a bit of an oversight to not do so initially, but agree

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 BTW, we should really mark this as experimental and allow some behaviour changes. I guess that's what we meant by Experimental:

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Yeah, that's what I meant. Having a configuration would be desirable, but I doubt we should extend it further that way. A one-time thing should be fine too. I roughly guess that's going to be a min

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 @rxin @gatorsmile thanks for joining the discussion! On the configuration side, we have already some mechanism to do so for the "timezone" config:
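If a configuration is used to keep the old positional behaviour available, the worker-side decision could be as small as the sketch below. It assumes the configuration values reach the Python worker as a plain dict of strings; the key name and `column_order` are hypothetical, and how the value actually gets shipped (the existing "timezone" mechanism or a protocol change) is exactly what this thread is debating.

```python
from typing import Dict, List, Sequence

# Hypothetical key, for illustration only; not a real Spark configuration name.
ASSIGN_BY_NAME_KEY = "example.groupedMap.assignColumnsByName"


def column_order(labels: Sequence, schema_names: List[str],
                 conf: Dict[str, str]) -> List[int]:
    """Return the index of the result column to emit for each schema field.

    The flag defaults to "true" (match by name when the user labeled columns
    with strings); setting it to "false" keeps the legacy positional behaviour.
    """
    by_name = conf.get(ASSIGN_BY_NAME_KEY, "true").lower() == "true"
    if by_name and any(isinstance(label, str) for label in labels):
        index_of = {label: i for i, label in enumerate(labels)}
        # Raises KeyError if a schema field has no matching result column.
        return [index_of[name] for name in schema_names]
    return list(range(len(schema_names)))
```

A default of "true" fixes the reported issue, while users who relied on positional assignment can opt back in without code changes.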

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 I need to take a look too but sounds possible. WDYT @BryanCutler? BTW, fixing it there would be most appropriate, since that's actually where the problem started. ---

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 On the config part, I haven’t looked at the code but can’t we just reorder the columns on the JVM side? Why do we need to reorder them on the Python side? On Fri, May 25, 2018 at

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 I believe it was just a mistake to correct: we forgot to mark it experimental. It's pretty unstable and many JIRAs are being opened. @BryanCutler mind if I ask to go ahead if you find some

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 I agree it should have started experimental. It is pretty weird to mark something experimental after the fact, though. On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 BTW, what do you think about adding a blocker to set this feature as experimental @rxin? I think it's a pretty new feature and it should be reasonable to call it experimental. ---

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Because it needs to change our pickle protocol to access the configuration, if I remember this correctly. cc @ueshin too. ---

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 Why is it difficult? On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon wrote: > but as I said it's difficult to have a configuration there. Shall we just
