[GitHub] [spark] AmplabJenkins removed a comment on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5
AmplabJenkins removed a comment on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736#issuecomment-496794227 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5
AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736#issuecomment-496794643 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5
AmplabJenkins removed a comment on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736#issuecomment-496794099 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5
AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736#issuecomment-496794227 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5
AmplabJenkins commented on issue #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736#issuecomment-496794099 Can one of the admins verify this patch?
[GitHub] [spark] igreenfield commented on issue #24729: [SPARK-27862][Build] Move to json4s 3.6.5
igreenfield commented on issue #24729: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24729#issuecomment-496793245 Created also for master
[GitHub] [spark] igreenfield opened a new pull request #24736: [SPARK-27862][Build] Move to json4s 3.6.5
igreenfield opened a new pull request #24736: [SPARK-27862][Build] Move to json4s 3.6.5 URL: https://github.com/apache/spark/pull/24736 Add scala-xml 1.2.0 What changes were proposed in this pull request? Move to json4s version 3.6.5 How was this patch tested? run: build/mvn clean package
[GitHub] [spark] zhengruifeng commented on issue #24727: [SPARK-27867][ML] RegressionEvaluator cache lastest RegressionMetrics to avoid duplicated computation
zhengruifeng commented on issue #24727: [SPARK-27867][ML] RegressionEvaluator cache lastest RegressionMetrics to avoid duplicated computation URL: https://github.com/apache/spark/pull/24727#issuecomment-496792285 @srowen But the current `Evaluator` does not expose a method to obtain metrics other than the one set by `metricName`. If we want two metrics, we have to compute twice. Or should we modify `Evaluator` to support `setMetricNames` and return an array?
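The trade-off raised in the comment above (computing each metric in a separate pass versus returning several at once) can be illustrated with a minimal plain-Python sketch. It is not Spark's `RegressionEvaluator` or `RegressionMetrics` code: the function names `regression_metrics` and `evaluate_many` are hypothetical, the input is assumed to be an in-memory sequence of (prediction, label) pairs, and the R² formula is the ordinary textbook definition, which may differ from the options Spark supports.

```python
import math
from typing import Dict, List, Sequence, Tuple


def regression_metrics(pairs: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Compute several regression metrics from (prediction, label) pairs in one pass."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (prediction, label) pair")
    sq_err = abs_err = label_sum = label_sq_sum = 0.0
    for pred, label in pairs:
        diff = label - pred
        sq_err += diff * diff          # running sum of squared errors
        abs_err += abs(diff)           # running sum of absolute errors
        label_sum += label
        label_sq_sum += label * label
    mse = sq_err / n
    # Total sum of squares of the labels around their mean.
    ss_tot = label_sq_sum - label_sum * label_sum / n
    r2 = 1.0 - sq_err / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": abs_err / n, "r2": r2}


def evaluate_many(pairs: Sequence[Tuple[float, float]],
                  metric_names: List[str]) -> List[float]:
    """Hypothetical analogue of a `setMetricNames`-style call: one pass, many metrics."""
    metrics = regression_metrics(pairs)
    return [metrics[name] for name in metric_names]
```

Calling `evaluate_many(pairs, ["mae", "r2"])` walks the data once and returns the values in the requested order, whereas a single-`metricName` interface would need one evaluation per metric unless the intermediate sums are cached.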
[GitHub] [spark] AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496790279 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496790283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105889/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496790279 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496790283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105889/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
SparkQA commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496789836 **[Test build #105889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105889/testReport)** for PR 24643 at commit [`7cc4a92`](https://github.com/apache/spark/commit/7cc4a92b5d7ebbf421421f1c5d8da0bd0e671a49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
SparkQA removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766421 **[Test build #105889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105889/testReport)** for PR 24643 at commit [`7cc4a92`](https://github.com/apache/spark/commit/7cc4a92b5d7ebbf421421f1c5d8da0bd0e671a49).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496787849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105890/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496787847 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496787849 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105890/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496787847 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
SparkQA commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496787757 **[Test build #105890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105890/testReport)** for PR 24721 at commit [`4258665`](https://github.com/apache/spark/commit/425866578f8f18c861e64666e2b454376b6594fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
SparkQA removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773836 **[Test build #105890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105890/testReport)** for PR 24721 at commit [`4258665`](https://github.com/apache/spark/commit/425866578f8f18c861e64666e2b454376b6594fc).
[GitHub] [spark] cloud-fan edited a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
cloud-fan edited a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781526 cc @ueshin @viirya @rednaxelafx @gatorsmile @maropu
[GitHub] [spark] SparkQA commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
SparkQA commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496783581 **[Test build #105892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105892/testReport)** for PR 24719 at commit [`509761b`](https://github.com/apache/spark/commit/509761b4473611d393598e545a5fe14daaab5326).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
AmplabJenkins removed a comment on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496783301 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11148/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
AmplabJenkins commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496783299 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
AmplabJenkins commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496783301 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11148/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
AmplabJenkins removed a comment on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496783299 Merged build finished. Test PASSed.
[GitHub] [spark] gengliangwang commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase
gengliangwang commented on issue #24719: [SPARK-27849][SQL] Redact treeString of FileTable and DataSourceV2ScanExecBase URL: https://github.com/apache/spark/pull/24719#issuecomment-496782905 @dongjoon-hyun I think the test suites are much better now. Please review it again, thanks!
[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496782716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105886/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496782715 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496782715 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496782716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105886/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
AmplabJenkins removed a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781889 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496759896 **[Test build #105886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105886/testReport)** for PR 24671 at commit [`f23c1b7`](https://github.com/apache/spark/commit/f23c1b70b9dcf8e4dce43e5dc217ea8822ddfae3).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
AmplabJenkins removed a comment on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11147/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead. URL: https://github.com/apache/spark/pull/24671#issuecomment-496782380 **[Test build #105886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105886/testReport)** for PR 24671 at commit [`f23c1b7`](https://github.com/apache/spark/commit/f23c1b70b9dcf8e4dce43e5dc217ea8822ddfae3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
SparkQA commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496782273 **[Test build #105891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105891/testReport)** for PR 24735 at commit [`55677c0`](https://github.com/apache/spark/commit/55677c04eb7e0782efc756073e5bf85b59b1aa1a).
[GitHub] [spark] gatorsmile commented on issue #24372: [SPARK-27462][SQL] Enhance insert into hive table that could choose some columns in target table flexibly.
gatorsmile commented on issue #24372: [SPARK-27462][SQL] Enhance insert into hive table that could choose some columns in target table flexibly. URL: https://github.com/apache/spark/pull/24372#issuecomment-496782130 We are trying our best to reduce coupling with Hive. Having native support for Default in schema specification in Spark is what we need.
[GitHub] [spark] AmplabJenkins commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
AmplabJenkins commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11147/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
AmplabJenkins commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781889 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
cloud-fan commented on issue #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#issuecomment-496781526 cc @ueshin @viirya @rednaxelafx @gatorsmile
[GitHub] [spark] cloud-fan opened a new pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
cloud-fan opened a new pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
URL: https://github.com/apache/spark/pull/24735

## What changes were proposed in this pull request?

For simplicity, all `LambdaVariable`s are globally unique, to avoid any potential conflicts. However, this causes a perf problem: we can never hit the codegen cache for encoder expressions that deal with collections (which means they contain `LambdaVariable`).

To overcome this problem, `LambdaVariable` should have per-query unique IDs. This PR does 2 things:
1. refactor `LambdaVariable` to carry an ID, so that it's easier to change the ID.
2. add an optimizer rule to reassign `LambdaVariable` IDs, which are per-query unique.

## How was this patch tested?

new tests
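The caching argument in the description above can be illustrated with a small Java sketch. It is not Spark code: the class `PerQueryIds` and the method `reassign` are hypothetical names, and a plain list of numeric IDs stands in for the `LambdaVariable`s of one query. The idea is that two structurally identical queries carry different globally unique IDs, so anything keyed on those IDs never matches, whereas renumbering per query from 0 in order of first appearance makes them identical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerQueryIds {
    /**
     * Rewrites IDs to 0, 1, 2, ... in order of first appearance.
     * Positions that shared an ID before still share one afterwards.
     */
    static List<Long> reassign(List<Long> globalIds) {
        Map<Long, Long> mapping = new HashMap<>();
        List<Long> result = new ArrayList<>();
        for (Long id : globalIds) {
            Long fresh = mapping.get(id);
            if (fresh == null) {
                // First time this ID is seen in the query: hand out the next small ID.
                fresh = (long) mapping.size();
                mapping.put(id, fresh);
            }
            result.add(fresh);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two runs of the "same" query, each with its own globally unique IDs.
        List<Long> firstRun = Arrays.asList(101L, 102L, 101L);
        List<Long> secondRun = Arrays.asList(957L, 958L, 957L);
        System.out.println(firstRun.equals(secondRun));
        System.out.println(reassign(firstRun).equals(reassign(secondRun)));
    }
}
```

In this sketch the raw ID lists differ while the reassigned lists are equal, which is the property a cache keyed on the expression would need.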
[GitHub] [spark] felixcheung commented on a change in pull request #24730: [SPARK-27835][Core] Resource Scheduling: change driver config from addresses
felixcheung commented on a change in pull request #24730: [SPARK-27835][Core] Resource Scheduling: change driver config from addresses
URL: https://github.com/apache/spark/pull/24730#discussion_r288389257

## File path: core/src/main/scala/org/apache/spark/ResourceDiscoverer.scala ##

@@ -132,4 +132,20 @@ private[spark] object ResourceDiscoverer extends Logging {
       }
     }
   }
+
+  def parseAllocatedFromJsonFile(resourcesFile: String): Map[String, ResourceInformation] = {
+    implicit val formats = DefaultFormats
+    // case class to make json4s parsing easy
+    case class JsonResourceInformation(val name: String, val addresses: Array[String])
+    val resourceInput = new BufferedInputStream(new FileInputStream(resourcesFile))
+    val resources = try {
+      parse(resourceInput).extract[Seq[JsonResourceInformation]]
+    } catch {
+      case e@(_: MappingException | _: MismatchedInputException | _: ClassCastException) =>
+        throw new SparkException(s"Exception parsing the resources in $resourcesFile", e)
+    } finally {
+      resourceInput.close()
+    }
+    resources.map(r => (r.name, new ResourceInformation(r.name, r.addresses))).toMap

Review comment:
Could `Seq[JsonResourceInformation]` contain duplicated names? It might be (very marginally) better to do `resource.toMap.map(...)`
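The duplicate-name concern can be made concrete with a short sketch. Scala's `toMap` keeps the last pair when a key repeats, so a repeated resource name would be dropped without any error. The Java analogue below is only an illustration under assumptions: `DuplicateNames`, `Entry`, `lastWins` and `strict` are hypothetical names, not part of Spark, and `Entry` merely stands in for the parsed JSON record.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateNames {
    /** Hypothetical stand-in for one parsed JSON entry; not a Spark class. */
    static final class Entry {
        final String name;
        final List<String> addresses;

        Entry(String name, String... addresses) {
            this.name = name;
            this.addresses = Arrays.asList(addresses);
        }
    }

    /** Last-wins conversion: a later entry silently replaces an earlier one with the same name. */
    static Map<String, List<String>> lastWins(List<Entry> entries) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (Entry e : entries) {
            byName.put(e.name, e.addresses);
        }
        return byName;
    }

    /** Strict conversion: rejects the input as soon as a name repeats. */
    static Map<String, List<String>> strict(List<Entry> entries) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (Entry e : entries) {
            // putIfAbsent returns the existing value when the key is already present.
            if (byName.putIfAbsent(e.name, e.addresses) != null) {
                throw new IllegalArgumentException("Duplicate resource name: " + e.name);
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("gpu", "0", "1"),
            new Entry("gpu", "2"));
        // Only the second "gpu" entry survives the last-wins conversion.
        System.out.println(lastWins(entries));
        try {
            strict(entries);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Whether a duplicate should be an error or should be merged is a design decision the review thread does not settle; the sketch only shows that the two behaviors differ.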
[GitHub] [spark] cloud-fan commented on issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
cloud-fan commented on issue #22571: [SPARK-25392][Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page URL: https://github.com/apache/spark/pull/22571#issuecomment-496779646 I don't know this part well, and I have no idea why this problem occurs. Would be great to see more details/analysis.
[GitHub] [spark] felixcheung commented on a change in pull request #24732: [SPARK-27868][core] Better default value and documentation for socket server backlog.
felixcheung commented on a change in pull request #24732: [SPARK-27868][core] Better default value and documentation for socket server backlog.
URL: https://github.com/apache/spark/pull/24732#discussion_r288388516

## File path: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ##

@@ -108,8 +108,8 @@ public int numConnectionsPerPeer() {
     return conf.getInt(SPARK_NETWORK_IO_NUMCONNECTIONSPERPEER_KEY, 1);
   }

-  /** Requested maximum length of the queue of incoming connections. Default -1 for no backlog. */
-  public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, -1); }
+  /** Requested maximum length of the queue of incoming connections. Default is 64. */
+  public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, 64); }

Review comment:
What's the difference between setting the default to -1 and setting it to 64? Does this change any existing behavior?
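One way to picture the difference the reviewer asks about is a sketch with the JDK's own `java.net.ServerSocket`. This is an assumption-laden illustration, not Spark's transport code: the quoted diff does not show how `backLog()` is consumed, and `BacklogSketch`, `isExplicit` and `open` are hypothetical names. In the JDK, the backlog argument is a request for the maximum number of pending connections, and a value less than or equal to 0 means an implementation-specific default is used.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class BacklogSketch {
    /** True when the configured value should be passed on as an explicit backlog request. */
    static boolean isExplicit(int configuredBacklog) {
        return configuredBacklog > 0;
    }

    /** Opens a listening socket on any free port (port 0), honoring the backlog only if explicit. */
    static ServerSocket open(int configuredBacklog) throws IOException {
        return isExplicit(configuredBacklog)
            ? new ServerSocket(0, configuredBacklog) // explicit request, e.g. 64
            : new ServerSocket(0);                   // no request: JDK/OS default applies
    }

    public static void main(String[] args) throws IOException {
        for (int backlog : new int[] {-1, 64}) {
            try (ServerSocket socket = open(backlog)) {
                System.out.println("configured=" + backlog
                    + " explicit=" + isExplicit(backlog)
                    + " port=" + socket.getLocalPort());
            }
        }
    }
}
```

Under this reading, -1 leaves the queue length to the platform while 64 pins a requested value; the operating system may still cap or round whatever is requested.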
[GitHub] [spark] felixcheung commented on a change in pull request #24709: [SPARK-27841][SQL] Improve UTF8String to/fromString()/numBytesForFirstByte() performance
felixcheung commented on a change in pull request #24709: [SPARK-27841][SQL] Improve UTF8String to/fromString()/numBytesForFirstByte() performance
URL: https://github.com/apache/spark/pull/24709#discussion_r288388247

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ##

@@ -1217,6 +1264,20 @@ public boolean toByte(IntWrapper intWrapper) {
   @Override
   public String toString() {
+    byte[] bytes = getBytes();
+    // Optimization for ASCII characters: use deprecated string API which
+    // skips charset encoder and simply casts each byte into a char.
+    if (isAscii(bytes)) {
+      return new String(bytes, 0);

Review comment:
It does have to check all bytes, though? Doesn't this grow linearly with the length of bytes?
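The reviewer's point about cost can be seen in a minimal sketch of such a fast path. The PR's actual `isAscii` is not shown in the quoted diff, so the version below is an assumption, as are the names `AsciiFastPath` and `decode`. The sketch also swaps the deprecated `new String(bytes, 0)` constructor for `StandardCharsets.ISO_8859_1`, which likewise maps each byte to one char.

```java
import java.nio.charset.StandardCharsets;

public class AsciiFastPath {
    /** Linear scan: every byte is inspected until a non-ASCII byte is found. */
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) { // high bit set, so this is not 7-bit ASCII
                return false;
            }
        }
        return true;
    }

    /** Uses a byte-to-char mapping for pure ASCII input and full UTF-8 decoding otherwise. */
    static String decode(byte[] bytes) {
        if (isAscii(bytes)) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(isAscii(ascii) + " " + decode(ascii));
        System.out.println(isAscii(accented) + " " + decode(accented));
    }
}
```

The scan is indeed O(n) in the number of bytes, so any gain depends on the check plus the simple mapping being cheaper than running the UTF-8 decoder; that is a question for benchmarks, which this sketch does not provide.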
[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-49684 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
SparkQA removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766406 **[Test build #105888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105888/testReport)** for PR 24717 at commit [`7d83ac3`](https://github.com/apache/spark/commit/7d83ac39df2a45fca2990e855802055f60804cc6).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-49686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105888/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-49684 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-49686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105888/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496777624 **[Test build #105888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105888/testReport)** for PR 24717 at commit [`7d83ac3`](https://github.com/apache/spark/commit/7d83ac39df2a45fca2990e855802055f60804cc6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] felixcheung commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
felixcheung commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496776866 > Yes, you are right, this is the same issue as `toLocalIterator` in #24070 and needs to be fixed. This is a real problem for branch-2.4 which, like you said, could cause `toPandas` to return a partial result without raising the error. @HyukjinKwon do you think it would make sense to patch branch-2.4 with a manual fix? This sounds important...
[GitHub] [spark] viirya commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations
viirya commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations URL: https://github.com/apache/spark/pull/24700#issuecomment-496775395 `spark.sql.execution.arrow.pyspark.enabled` looks slightly better.
[GitHub] [spark] SparkQA commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
SparkQA commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773836 **[Test build #105890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105890/testReport)** for PR 24721 at commit [`4258665`](https://github.com/apache/spark/commit/425866578f8f18c861e64666e2b454376b6594fc).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11146/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins removed a comment on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table
AmplabJenkins commented on issue #24721: [SPARK-27856][SQL] do not forcibly add cast when inserting table URL: https://github.com/apache/spark/pull/24721#issuecomment-496773587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11146/ Test PASSed.
[GitHub] [spark] lu-wang-dl closed pull request #24705: [SPARK-22340][PYTHON] Save localProperties in thread.local
lu-wang-dl closed pull request #24705: [SPARK-22340][PYTHON] Save localProperties in thread.local URL: https://github.com/apache/spark/pull/24705
[GitHub] [spark] lu-wang-dl commented on issue #24705: [SPARK-22340][PYTHON] Save localProperties in thread.local
lu-wang-dl commented on issue #24705: [SPARK-22340][PYTHON] Save localProperties in thread.local URL: https://github.com/apache/spark/pull/24705#issuecomment-496772401 Close this PR now. We will design this more carefully.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496767946 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496767951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105884/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496767951 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105884/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496767946 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
SparkQA removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496737022 **[Test build #105884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105884/testReport)** for PR 24689 at commit [`cecbea0`](https://github.com/apache/spark/commit/cecbea0c83931779572058a07c203697a0e74ceb).
[GitHub] [spark] SparkQA commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
SparkQA commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-496767668 **[Test build #105884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105884/testReport)** for PR 24689 at commit [`cecbea0`](https://github.com/apache/spark/commit/cecbea0c83931779572058a07c203697a0e74ceb). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class Analyzer(`
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496767315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105883/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496767308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496767308 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496767315 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105883/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496735671 **[Test build #105883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105883/testReport)** for PR 24677 at commit [`4f57b7d`](https://github.com/apache/spark/commit/4f57b7d8e6950990566b6de867cdc2039644b574). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496766966 **[Test build #105883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105883/testReport)** for PR 24677 at commit [`4f57b7d`](https://github.com/apache/spark/commit/4f57b7d8e6950990566b6de867cdc2039644b574). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] WeichenXu123 commented on a change in pull request #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
WeichenXu123 commented on a change in pull request #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#discussion_r288377915
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala ##
@@ -86,28 +86,11 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute]
       sessionLocalTimeZone,
       pythonRunnerConf).compute(batchIter, context.partitionId(), context)

-    new Iterator[InternalRow] {
-
-      private var currentIter = if (columnarBatchIter.hasNext) {
-        val batch = columnarBatchIter.next()
-        val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
-        assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
-          s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
-        batch.rowIterator.asScala
-      } else {
-        Iterator.empty
-      }
-
-      override def hasNext: Boolean = currentIter.hasNext || {
-        if (columnarBatchIter.hasNext) {
-          currentIter = columnarBatchIter.next().rowIterator.asScala
-          hasNext
-        } else {
-          false
-        }
-      }
-
-      override def next(): InternalRow = currentIter.next()
+    columnarBatchIter.flatMap { batch =>
+      val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
+      assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
+        s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
+      batch.rowIterator.asScala
Review comment: Some explanation here: the implementation on master has an issue. The member `private var currentIter = ...` is evaluated as soon as the returned iterator is created, and evaluating `currentIter` requires reading the first element of `columnarBatchIter`. But note that this block of code is called from `EvalPythonExec.doExecute`, where we should only construct the iterator, not read from it. Reading should start only after the whole iterator pipeline has been constructed.
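The difference described in the review comment above can be reproduced outside Spark. The following is a minimal sketch, not the PR's code: `TrackingSource`, `eagerRows`, and `lazyRows` are hypothetical names, and plain `Seq[Int]` batches stand in for columnar batches. It counts how many batches are pulled from upstream at construction time for each shape.

```scala
object LazyIteratorSketch {
  // Hypothetical stand-in for `columnarBatchIter`: counts how often upstream is read.
  final class TrackingSource(batches: Seq[Seq[Int]]) extends Iterator[Seq[Int]] {
    private val underlying = batches.iterator
    var pulls = 0
    override def hasNext: Boolean = underlying.hasNext
    override def next(): Seq[Int] = { pulls += 1; underlying.next() }
  }

  // Shape of the removed code: the field initializer runs when the iterator
  // object is constructed, so the first batch is read before anyone consumes rows.
  def eagerRows(source: Iterator[Seq[Int]]): Iterator[Int] = new Iterator[Int] {
    private var current: Iterator[Int] =
      if (source.hasNext) source.next().iterator else Iterator.empty

    override def hasNext: Boolean = current.hasNext || {
      if (source.hasNext) {
        current = source.next().iterator
        hasNext
      } else {
        false
      }
    }

    override def next(): Int = current.next()
  }

  // Shape of the added code: flatMap only wraps the source; nothing is read
  // until the returned iterator is consumed.
  def lazyRows(source: Iterator[Seq[Int]]): Iterator[Int] =
    source.flatMap(batch => batch.iterator)

  def main(args: Array[String]): Unit = {
    val a = new TrackingSource(Seq(Seq(1, 2), Seq(3)))
    val eager = eagerRows(a)
    println(a.pulls) // prints 1: one batch was read during construction

    val b = new TrackingSource(Seq(Seq(1, 2), Seq(3)))
    val deferred = lazyRows(b)
    println(b.pulls) // prints 0: nothing has been read yet

    // Both shapes yield the same rows once consumed.
    println(eager.toList == deferred.toList) // prints true
  }
}
```

Both variants produce identical rows; they differ only in when the first upstream read happens, which is the point the reviewer raises about constructing the pipeline before reading from it.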
[GitHub] [spark] AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-496766054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11143/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-496766048 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
SparkQA commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-496766399 **[Test build #105887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105887/testReport)** for PR 24734 at commit [`4d58419`](https://github.com/apache/spark/commit/4d58419df523ebc5427287a4c84bfb28968ad32c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766406 **[Test build #105888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105888/testReport)** for PR 24717 at commit [`7d83ac3`](https://github.com/apache/spark/commit/7d83ac39df2a45fca2990e855802055f60804cc6). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
SparkQA commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766421 **[Test build #105889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105889/testReport)** for PR 24643 at commit [`7cc4a92`](https://github.com/apache/spark/commit/7cc4a92b5d7ebbf421421f1c5d8da0bd0e671a49). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496766139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105882/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496766126 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766077 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766046 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766046 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496766126 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11145/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496766139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105882/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766077 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-496766048 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
AmplabJenkins removed a comment on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496766080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11145/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11144/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-496766054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11143/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics URL: https://github.com/apache/spark/pull/24717#issuecomment-496766052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11144/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496734231 **[Test build #105882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105882/testReport)** for PR 24677 at commit [`4f57b7d`](https://github.com/apache/spark/commit/4f57b7d8e6950990566b6de867cdc2039644b574). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496765790 **[Test build #105882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105882/testReport)** for PR 24677 at commit [`4f57b7d`](https://github.com/apache/spark/commit/4f57b7d8e6950990566b6de867cdc2039644b574). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] WeichenXu123 commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series
WeichenXu123 commented on issue #24643: [SPARK-26412][PySpark][SQL][WIP] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series URL: https://github.com/apache/spark/pull/24643#issuecomment-496765673 I split the "add per-batch flush" change out into a new PR: https://github.com/apache/spark/pull/24734
[GitHub] [spark] viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed
viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed URL: https://github.com/apache/spark/pull/24699#discussion_r288376133
## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ##
@@ -992,12 +992,20 @@ private[spark] class BlockManager(
   /**
    * Release a lock on the given block with explicit TID.
Review comment: `with explicit TaskContext`
[GitHub] [spark] viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed
viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed URL: https://github.com/apache/spark/pull/24699#discussion_r288374574
## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ##
@@ -992,12 +992,20 @@ private[spark] class BlockManager(
   /**
    * Release a lock on the given block with explicit TID.
-   * The param `taskAttemptId` should be passed in case we can't get the correct TID from
-   * TaskContext, for example, the input iterator of a cached RDD iterates to the end in a child
+   * The param `taskContext` should be passed in case we can't get the correct TaskContext
+   * for example, the input iterator of a cached RDD iterates to the end in a child
    * thread.
    */
-  def releaseLock(blockId: BlockId, taskAttemptId: Option[Long] = None): Unit = {
-    blockInfoManager.unlock(blockId, taskAttemptId)
+  def releaseLock(blockId: BlockId, taskContext: Option[TaskContext] = None): Unit = {
+    val taskAttemptId = taskContext.map(_.taskAttemptId())
+    // SPARK-27666. Child thread spawned from task thread could produce race condition
+    // on block lock releasing. We should prevent child thread from releasing un-locked
+    // block when task thread has already finished.
+    if (taskContext.isDefined && taskContext.map(_.isCompleted()).get) {
+      logWarning(s"Task $taskAttemptId already completed, not releasing lock for $blockId")
Review comment: `${taskAttemptId.get}`
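The reason for the suggested `${taskAttemptId.get}` is that interpolating an `Option` directly prints its wrapper. A small sketch under stated assumptions: `FakeTaskContext` and `releaseDecision` are hypothetical stand-ins (not Spark's `TaskContext` or `BlockManager.releaseLock`), and the pattern match is only one possible way to express the same guard as `taskContext.isDefined && taskContext.map(_.isCompleted()).get`.

```scala
object ReleaseLockSketch {
  // Hypothetical, minimal stand-in for the two TaskContext members used in the diff.
  final case class FakeTaskContext(taskAttemptId: Long, isCompleted: Boolean)

  // Returns the message that would be logged; a real implementation would
  // log a warning or release the lock instead of returning a String.
  def releaseDecision(blockId: String, taskContext: Option[FakeTaskContext]): String =
    taskContext match {
      // Matching on Some gives direct access to the id, so no `.get` is needed.
      case Some(ctx) if ctx.isCompleted =>
        s"Task ${ctx.taskAttemptId} already completed, not releasing lock for $blockId"
      case _ =>
        s"releasing lock for $blockId"
    }

  def main(args: Array[String]): Unit = {
    val taskAttemptId: Option[Long] = Some(7L)
    println(s"Task $taskAttemptId")       // prints "Task Some(7)"
    println(s"Task ${taskAttemptId.get}") // prints "Task 7"

    println(releaseDecision("rdd_0_1", Some(FakeTaskContext(7L, isCompleted = true))))
    println(releaseDecision("rdd_0_1", None))
  }
}
```

Calling `.get` is safe in the reviewed line only because the surrounding `if` has already checked `taskContext.isDefined`; the pattern match makes that dependency explicit.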
[GitHub] [spark] viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed
viirya commented on a change in pull request #24699: [SPARK-27666][CORE] Do not release lock while TaskContext already completed URL: https://github.com/apache/spark/pull/24699#discussion_r288374379
## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ##
@@ -992,12 +992,20 @@ private[spark] class BlockManager(
   /**
    * Release a lock on the given block with explicit TID.
-   * The param `taskAttemptId` should be passed in case we can't get the correct TID from
-   * TaskContext, for example, the input iterator of a cached RDD iterates to the end in a child
+   * The param `taskContext` should be passed in case we can't get the correct TaskContext
+   * for example, the input iterator of a cached RDD iterates to the end in a child
Review comment: nit: a missing `,` before `for example`.
[GitHub] [spark] WeichenXu123 opened a new pull request #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline)
WeichenXu123 opened a new pull request #24734: [SPARK-27870][SQL][PySpark] Flush each batch for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734
## What changes were proposed in this pull request?
Flush each batch for pandas UDFs. This can improve performance when multiple pandas UDF plans are pipelined. When each batch is flushed promptly, downstream pandas UDFs can start working as early as possible, and the pipelining helps hide the downstream UDFs' computation time. For example, when the first UDF starts computing on batch-3, the second pipelined UDF can start computing on batch-2, and the third pipelined UDF can start computing on batch-1. If batches are not flushed promptly, the downstream UDFs lag too far behind, which may increase the total processing time.
## How was this patch tested?
N/A
Please review https://spark.apache.org/contributing.html before opening a pull request.
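The effect of flushing per batch can be illustrated with plain JDK streams. This is a sketch under assumptions, not Spark's implementation: `visibleAfterEachBatch` is a hypothetical helper, a `ByteArrayOutputStream` stands in for the downstream consumer, and the 8192-byte buffer size is chosen only for illustration. It records how many bytes the downstream side can see after each batch is written.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream}

object FlushPerBatchSketch {
  // Returns, for each batch, the number of bytes visible downstream right
  // after that batch was written.
  def visibleAfterEachBatch(batches: List[Array[Byte]], flushEachBatch: Boolean): List[Int] = {
    val downstream = new ByteArrayOutputStream()
    val out = new BufferedOutputStream(downstream, 8192)
    val visible = batches.map { batch =>
      out.write(batch)
      // Without this flush, small batches stay in the buffer and the
      // downstream side has nothing to work on yet.
      if (flushEachBatch) out.flush()
      downstream.size()
    }
    out.flush()
    visible
  }

  def main(args: Array[String]): Unit = {
    // Three 100-byte batches, each far smaller than the 8192-byte buffer.
    val batches = List.fill(3)(Array.fill[Byte](100)(1.toByte))
    println(visibleAfterEachBatch(batches, flushEachBatch = true) == List(100, 200, 300)) // prints true
    println(visibleAfterEachBatch(batches, flushEachBatch = false) == List(0, 0, 0))      // prints true
  }
}
```

With a flush after every batch the consumer sees data as soon as each batch is complete, which is what lets a downstream stage begin on batch-1 while the upstream stage is still producing batch-2.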
[GitHub] [spark] AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling
AmplabJenkins removed a comment on issue #24374: [SPARK-27366][CORE] Support GPU Resources in Spark job scheduling URL: https://github.com/apache/spark/pull/24374#issuecomment-496760966 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105885/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org