[GitHub] [spark] AmplabJenkins removed a comment on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
AmplabJenkins removed a comment on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API URL: https://github.com/apache/spark/pull/27561#issuecomment-586072578 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
AmplabJenkins commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API URL: https://github.com/apache/spark/pull/27561#issuecomment-586072585 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23143/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
AmplabJenkins commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API URL: https://github.com/apache/spark/pull/27561#issuecomment-586072578 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #24902: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
cloud-fan commented on issue #24902: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue URL: https://github.com/apache/spark/pull/24902#issuecomment-586072349 Thanks @marmbrus for pointing out that this is already discussed officially on the mailing list. I'm trying to build a clear document about how to deal with behavior changes and make it more smooth for users to upgrade. But this may take a while as I need to collect many examples. @dongjoon-hyun shall we move forward to revert this kind of cosmetic changes from 3.0 first? I don't want to block 3.0 until I finish the document.
[GitHub] [spark] SparkQA commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
SparkQA commented on issue #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API URL: https://github.com/apache/spark/pull/27561#issuecomment-586071737 **[Test build #118386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118386/testReport)** for PR 27561 at commit [`4e9ddf1`](https://github.com/apache/spark/commit/4e9ddf1ecc82a9d43d50c261dadf2be2d2b60395).
[GitHub] [spark] liangz1 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
liangz1 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379226694

## File path: python/pyspark/sql/dataframe.py ##
@@ -2153,6 +2153,22 @@ def transform(self, func):
             "should have been DataFrame." % type(result)
         return result

+    @since(3.0)
+    def sameSemantics(self, other):
+        """
+        Return true when the query plan of the given :class:`DataFrame` will return the same
+        results as this :class:`DataFrame`.
+        """
+        return self._jdf.sameSemantics(other)

Review comment: It becomes the same error as the second one:

```
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/liang.zhang/work/repos/apache/spark/python/pyspark/sql/dataframe.py", line 2162, in sameSemantics
    return self._jdf.sameSemantics(other._jdf)
  File "/Users/liang.zhang/mypy3/lib/python3.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/liang.zhang/work/repos/apache/spark/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/Users/liang.zhang/mypy3/lib/python3.7/site-packages/py4j/protocol.py", line 332, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o42.sameSemantics. Trace:
py4j.Py4JException: Method sameSemantics([class org.apache.spark.sql.Dataset]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```
[GitHub] [spark] liangz1 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
liangz1 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379226303

## File path: python/pyspark/sql/dataframe.py ##
@@ -2153,6 +2153,22 @@ def transform(self, func):
             "should have been DataFrame." % type(result)
         return result

+    @since(3.0)
+    def sameSemantics(self, other):
+        """
+        Return true when the query plan of the given :class:`DataFrame` will return the same
+        results as this :class:`DataFrame`.
+        """
+        return self._jdf.sameSemantics(other)
+
+    @since(3.0)
+    def semanticHash(self):
+        """
+
+        :return:
+        """
+        return self._jdf.semanticHash(None)

Review comment: It shows the same error...
[GitHub] [spark] WeichenXu123 edited a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
WeichenXu123 edited a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069714 @cloud-fan @HyukjinKwon Any comments ?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
HyukjinKwon commented on a change in pull request #27561: [SPARK-30810][SQL] Parses and convert a CSV Dataset having different column from 'value' in csv(dataset) API
URL: https://github.com/apache/spark/pull/27561#discussion_r379225857

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ##
@@ -33,11 +33,12 @@ object CSVUtils {
 // with the one below, `filterCommentAndEmpty` but execution path is different. One of them
 // might have to be removed in the near future if possible.
 import lines.sqlContext.implicits._
-val nonEmptyLines = lines.filter(length(trim($"value")) > 0)

Review comment: @MaxGekk and @cloud-fan, I came up with a better idea to avoid relying on string format in `col`. Can you take a look again? I think this way is safer.
[GitHub] [spark] WeichenXu123 commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
WeichenXu123 commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069714 @cloud-fan Any comments ?
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
AmplabJenkins removed a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23142/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
AmplabJenkins removed a comment on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069480 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
AmplabJenkins commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069480 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
AmplabJenkins commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586069483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23142/ Test PASSed.
[GitHub] [spark] WeichenXu123 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
WeichenXu123 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379225187

## File path: python/pyspark/sql/dataframe.py ##
@@ -2153,6 +2153,22 @@ def transform(self, func):
             "should have been DataFrame." % type(result)
         return result

+    @since(3.0)
+    def sameSemantics(self, other):
+        """
+        Return true when the query plan of the given :class:`DataFrame` will return the same
+        results as this :class:`DataFrame`.
+        """
+        return self._jdf.sameSemantics(other)
+
+    @since(3.0)
+    def semanticHash(self):
+        """
+
+        :return:
+        """
+        return self._jdf.semanticHash(None)

Review comment: should be `self._jdf.semanticHash()`
[GitHub] [spark] WeichenXu123 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
WeichenXu123 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379225118

## File path: python/pyspark/sql/dataframe.py ##
@@ -2153,6 +2153,22 @@ def transform(self, func):
             "should have been DataFrame." % type(result)
         return result

+    @since(3.0)
+    def sameSemantics(self, other):
+        """
+        Return true when the query plan of the given :class:`DataFrame` will return the same
+        results as this :class:`DataFrame`.
+        """
+        return self._jdf.sameSemantics(other)

Review comment: should be `self._jdf.sameSemantics(other._jdf)`
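
Editor's illustration of the two review suggestions on this diff (`other._jdf` instead of `other`, and `semanticHash()` instead of `semanticHash(None)`). This is only a sketch of the wrapper pattern, not the code from the pull request: `FakeJavaDataset` and the `DataFrame` class below are hypothetical stand-ins, no PySpark or Py4J is involved, and plan equality is reduced to string comparison purely for illustration.

```python
class FakeJavaDataset:
    """Hypothetical stand-in for the JVM-side Dataset reached through Py4J."""

    def __init__(self, plan):
        self.plan = plan

    def sameSemantics(self, other):
        # The JVM-side method accepts another JVM Dataset, not a Python wrapper.
        if not isinstance(other, FakeJavaDataset):
            raise TypeError("sameSemantics expects a JVM Dataset")
        return self.plan == other.plan

    def semanticHash(self):
        # Takes no arguments, so semanticHash(None) would be rejected.
        return hash(self.plan)


class DataFrame:
    """Hypothetical Python wrapper that keeps the JVM object in `_jdf`."""

    def __init__(self, jdf):
        self._jdf = jdf

    def sameSemantics(self, other):
        # Unwrap the Python-side DataFrame before delegating to the JVM object.
        return self._jdf.sameSemantics(other._jdf)

    def semanticHash(self):
        # Delegate with no arguments.
        return self._jdf.semanticHash()


a = DataFrame(FakeJavaDataset("select 1"))
b = DataFrame(FakeJavaDataset("select 1"))
c = DataFrame(FakeJavaDataset("select 2"))

print(a.sameSemantics(b))
print(a.sameSemantics(c))
print(a.semanticHash() == b.semanticHash())
```

In this sketch, passing the wrapper itself (`a._jdf.sameSemantics(b)`) raises a `TypeError`, which mirrors the kind of argument-type mismatch the review comments are about.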
[GitHub] [spark] SparkQA commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
SparkQA commented on issue #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method URL: https://github.com/apache/spark/pull/27565#issuecomment-586068918 **[Test build #118385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118385/testReport)** for PR 27565 at commit [`284d7ad`](https://github.com/apache/spark/commit/284d7ad3de0a15a6b6aebf92c7b9e32349607048).
[GitHub] [spark] AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586062873 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586062883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118380/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586062883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118380/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586062873 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
SparkQA removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586019235 **[Test build #118380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118380/testReport)** for PR 26440 at commit [`af55030`](https://github.com/apache/spark/commit/af550303e0b929dc9f7436bcfb36438ff36b8208).
[GitHub] [spark] SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586062402 **[Test build #118380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118380/testReport)** for PR 26440 at commit [`af55030`](https://github.com/apache/spark/commit/af550303e0b929dc9f7436bcfb36438ff36b8208). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on issue #27002: [SPARK-30346][CORE]Improve logging when events dropped
HeartSaVioR commented on issue #27002: [SPARK-30346][CORE]Improve logging when events dropped URL: https://github.com/apache/spark/pull/27002#issuecomment-586061828 @liupc Any update here?
[GitHub] [spark] HyukjinKwon commented on issue #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame, Stream}{Reader, Writer}`.
HyukjinKwon commented on issue #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame,Stream}{Reader,Writer}`. URL: https://github.com/apache/spark/pull/27553#issuecomment-586061374 Merged to master, branch-3.0 and branch-2.4. Thanks, @DavidToneian for the contribution.
[GitHub] [spark] HyukjinKwon closed pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame, Stream}{Reader, Writer}`.
HyukjinKwon closed pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame,Stream}{Reader,Writer}`. URL: https://github.com/apache/spark/pull/27553
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame, Stream}{Reader
HyukjinKwon commented on a change in pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame,Stream}{Reader,Writer}`.
URL: https://github.com/apache/spark/pull/27553#discussion_r379217246

## File path: python/pyspark/sql/streaming.py ##
@@ -276,9 +276,9 @@ def resetTerminated(self):
 class DataStreamReader(OptionUtils):
     """
-    Interface used to load a streaming :class:`DataFrame` from external storage systems
-    (e.g. file systems, key-value stores, etc). Use :func:`spark.readStream`
-    to access this.
+    Interface used to load a streaming :class:`DataFrame ` from external

Review comment: okie
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame, Stream}{Reader
HyukjinKwon commented on a change in pull request #27553: [PYSPARK][DOCS] [MINOR]Changed `:func:` to `:attr:` Sphinx roles, fixed links in documentation of `Data{Frame,Stream}{Reader,Writer}`.
URL: https://github.com/apache/spark/pull/27553#discussion_r379217221

## File path: python/pyspark/sql/readwriter.py ##
@@ -616,7 +616,7 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar
 class DataFrameWriter(OptionUtils):
     """
     Interface used to write a :class:`DataFrame` to external storage systems
-    (e.g. file systems, key-value stores, etc). Use :func:`DataFrame.write`
+    (e.g. file systems, key-value stores, etc). Use :attr:`DataFrame.write`

Review comment: Okay.
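
Editor's illustration of why `:attr:` is plausibly the better Sphinx role here: `DataFrame.write` is reached by attribute access rather than by a call. The classes below are hypothetical stand-ins written for this sketch, not the PySpark implementation, and the docstring text is made up for the example.

```python
class DataFrameWriter:
    """Hypothetical stand-in for the writer interface."""

    def __init__(self, df):
        self._df = df


class DataFrame:
    """Hypothetical stand-in for the DataFrame class."""

    @property
    def write(self):
        """
        Interface for saving the content of this DataFrame.

        Other docstrings can link here with :attr:`DataFrame.write`,
        since this is exposed as an attribute, not a function call.
        """
        return DataFrameWriter(self)


df = DataFrame()
writer = df.write  # attribute access, no call parentheses

print(type(writer).__name__)
print(isinstance(DataFrame.write, property))
```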
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586060123 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118384/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
AmplabJenkins removed a comment on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586059906 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586056733 **[Test build #118384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118384/testReport)** for PR 27563 at commit [`165fefa`](https://github.com/apache/spark/commit/165fefad3a6580aa5b82cfc5468deb15d36d0146).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586057489 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23141/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586060117 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
AmplabJenkins commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586060284 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586060123 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118384/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586060117 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586060101 **[Test build #118384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118384/testReport)** for PR 27563 at commit [`165fefa`](https://github.com/apache/spark/commit/165fefad3a6580aa5b82cfc5468deb15d36d0146).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
AmplabJenkins commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586059906 Can one of the admins verify this patch?
[GitHub] [spark] jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586059445 retest this please
[GitHub] [spark] Ngone51 commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
Ngone51 commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586059347
> Oh, is this backward compatible, @Ngone51? I'm wondering whether all of these are new configurations added in 3.0.0.
@dongjoon-hyun We're only targeting configs added in 3.0, so this PR does not maintain any backward compatibility. I'll check them again to be sure.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+
HyukjinKwon commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+ URL: https://github.com/apache/spark/pull/26496#discussion_r379215351
## File path: docs/pyspark-migration-guide.md ##
@@ -87,6 +87,8 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
   - Since Spark 3.0, `Column.getItem` is fixed such that it does not call `Column.apply`. Consequently, if `Column` is used as an argument to `getItem`, the indexing operator should be used. For example, `map_col.getItem(col('id'))` should be replaced with `map_col[col('id')]`.
+  - As of Spark 3.0 `Row` field names are no longer sorted alphabetically when constructing with named arguments for Python versions 3.6 and above, and the order of fields will match that as entered. To enable sorted fields by default, as in Spark 2.4, set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to "true". For Python versions less than 3.6, the field names will be sorted alphabetically as the only option.
Review comment: +1. Let me fix it.
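To make the `Row` ordering note in the diff above concrete, here is a minimal plain-Python sketch. It does not use PySpark: `row_field_names` and its `sort_fields` flag are hypothetical names invented for illustration, standing in for the Spark 2.4 behaviour (alphabetical) and the Spark 3.0 behaviour on Python 3.6+ (order as entered).

```python
def row_field_names(sort_fields=False, **fields):
    """Return field names in the order the migration note describes.

    Hypothetical helper, not PySpark code:
    sort_fields=True  mimics Spark 2.4 (alphabetical order);
    sort_fields=False mimics Spark 3.0 on Python 3.6+ (order as entered).
    """
    # Keyword arguments keep their call-site order on Python 3.6+.
    names = list(fields)
    return sorted(names) if sort_fields else names


# Order as entered versus alphabetical order.
print(row_field_names(name="Alice", age=11))
print(row_field_names(sort_fields=True, name="Alice", age=11))
```

The sketch only models field ordering. How the real `Row` class reads `PYSPARK_ROW_FIELD_SORTING_ENABLED` is not shown here.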
[GitHub] [spark] jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586057810 Jenkins retest please
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586057296 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118383/ Test FAILed.
[GitHub] [spark] jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586057657 I'm reverting this commit because the Jenkins test was not triggered. We can merge it again after the test passes.
[GitHub] [spark] sarthfrey opened a new pull request #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
sarthfrey opened a new pull request #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395

### What changes were proposed in this pull request?
The `allGather` method is added to the `BarrierTaskContext`. This method provides the same functionality as the `BarrierTaskContext.barrier` method; it blocks the task until all tasks make the call, at which time they may continue execution. In addition, the `allGather` method takes an input message. Upon returning from the `allGather`, the task receives a list of all the messages sent by all the tasks that made the `allGather` call.

### Why are the changes needed?
There are many situations where having the tasks communicate in a synchronized way is useful. One simple example is if each task needs to start a server to serve requests from one another; first the tasks must find a free port (the result of which is undetermined beforehand) and then start making requests, but to do so they each must know the ports chosen by the other tasks. An `allGather` method would allow them to inform each other of the port they will run on.

### Does this PR introduce any user-facing change?
Yes, a `BarrierTaskContext.allGather` method will be available through the Scala, Java, and Python APIs.

### How was this patch tested?
Most of the code path is already covered by tests for the `barrier` method, since this PR includes a refactor so that much code is shared by the `barrier` and `allGather` methods. However, a test is added to assert that an all gather on each task's partition ID will return a list of every partition ID.

An example through the Python API:
```python
>>> from pyspark import BarrierTaskContext
>>>
>>> def f(iterator):
...     context = BarrierTaskContext.get()
...     return [context.allGather('{}'.format(context.partitionId()))]
...
>>> sc.parallelize(range(4), 4).barrier().mapPartitions(f).collect()[0]
[u'3', u'1', u'0', u'2']
```
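The port-exchange scenario described in the pull request above can be sketched without a Spark cluster. The following is a local simulation only, built on `threading.Barrier` from the standard library; it is not the PR's implementation. The names `AllGatherSim` and `run_tasks` and the port numbers are made up for illustration, and results are ordered by task ID here purely for determinism, whereas the PR's example output shows no particular order.

```python
import threading


class AllGatherSim:
    """Hypothetical stand-in for the all-gather semantics: every task
    contributes one message, blocks until all tasks have arrived, and
    then receives the full list of messages."""

    def __init__(self, num_tasks):
        self._messages = [None] * num_tasks
        self._barrier = threading.Barrier(num_tasks)

    def all_gather(self, task_id, message):
        self._messages[task_id] = message
        # Block until every task has stored its message.
        self._barrier.wait()
        return list(self._messages)


def run_tasks(num_tasks):
    """Run num_tasks threads that each announce a (made-up) port."""
    sim = AllGatherSim(num_tasks)
    results = [None] * num_tasks

    def task(task_id):
        port = 5000 + task_id  # pretend this is the free port the task found
        results[task_id] = sim.all_gather(task_id, str(port))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Every task ends up with the same list of all announced ports.
for task_id, ports in enumerate(run_tasks(4)):
    print(task_id, ports)
```

The simulation supports a single round per `AllGatherSim` instance. Reusing it for repeated rounds would need a second synchronization point before the message slots are overwritten.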
[GitHub] [spark] jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
jiangxb1987 commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586057720 add to whitelist
[GitHub] [spark] AmplabJenkins removed a comment on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
AmplabJenkins removed a comment on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-580106541 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0
HyukjinKwon commented on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 URL: https://github.com/apache/spark/pull/27529#issuecomment-586057499 Oh, I thought it was a single env. If this is the case, I think it's fine not to backport.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586057489 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23141/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586057292 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586057483 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586057483 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586052972 **[Test build #118383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118383/testReport)** for PR 27495 at commit [`519f47f`](https://github.com/apache/spark/commit/519f47fc7de0f753c6202bada0fa6972111d115f).
[GitHub] [spark] SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586057216 **[Test build #118383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118383/testReport)** for PR 27495 at commit [`519f47f`](https://github.com/apache/spark/commit/519f47fc7de0f753c6202bada0fa6972111d115f).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] shaneknapp edited a comment on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0
shaneknapp edited a comment on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 URL: https://github.com/apache/spark/pull/27529#issuecomment-586056059

for 2.4, python 3.6.8:
```
-bash-4.1$ python -c "import pandas; import pyarrow; print('pandas: %s' % pandas.__version__); print('pyarrow: %s' % pyarrow.__version__)"
pandas: 0.19.2
pyarrow: 0.8.0
```

for master/3.0, python 3.6.8:
```
-bash-4.1$ python -c "import pandas; import pyarrow; print('pandas: %s' % pandas.__version__); print('pyarrow: %s' % pyarrow.__version__)"
pandas: 0.24.2
pyarrow: 0.15.1
```
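The one-liner above fails with an `ImportError` if either package is missing. A slightly more defensive variant, written as a small script, might look like the sketch below. The helper name `report_versions` is an assumption for illustration and is not part of any Spark tooling.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or to a placeholder
    when the module is missing or exposes no version attribute."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Same two packages as the shell one-liner above.
    for name, version in report_versions(["pandas", "pyarrow"]).items():
        print("%s: %s" % (name, version))
```

Running it inside each Jenkins environment would give the same information as the shell command, with a readable message in place of a traceback when a package is absent.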
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586057296 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118383/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586057292 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586056733 **[Test build #118384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118384/testReport)** for PR 27563 at commit [`165fefa`](https://github.com/apache/spark/commit/165fefad3a6580aa5b82cfc5468deb15d36d0146).
[GitHub] [spark] shaneknapp edited a comment on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0
shaneknapp edited a comment on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 URL: https://github.com/apache/spark/pull/27529#issuecomment-586056059

for 2.4, python 3.6.8:
```
-bash-4.1$ python -c "import pandas; import pyarrow; print('pandas: %s' % pandas.__version__); print('pyarrow: %s' % pyarrow.__version__)"
pandas: 0.19.2
pyarrow: 0.8.0
```
[GitHub] [spark] shaneknapp commented on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0
shaneknapp commented on issue #27529: [SPARK-30777][PYTHON][TESTS] Fix test failures for Pandas >= 1.0.0 URL: https://github.com/apache/spark/pull/27529#issuecomment-586056059 for 2.4, python 3.6.8: ''' -bash-4.1$ python -c "import pandas; import pyarrow; print('pandas: %s' % pandas.__version__); print('pyarrow: %s' % pyarrow.__version__)" pandas: 0.19.2 pyarrow: 0.8.0 '''
[GitHub] [spark] jiangxb1987 commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
jiangxb1987 commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-586053686 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586053606 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586053538 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118382/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586053535 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586053611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23140/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586053611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23140/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586053606 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586053538 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118382/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586053535 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
SparkQA removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586030807 **[Test build #118382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118382/testReport)** for PR 27571 at commit [`9639e14`](https://github.com/apache/spark/commit/9639e144c943b0c87e08d21e7d984595bd08518d).
[GitHub] [spark] SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586053288 **[Test build #118382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118382/testReport)** for PR 27571 at commit [`9639e14`](https://github.com/apache/spark/commit/9639e144c943b0c87e08d21e7d984595bd08518d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586052972 **[Test build #118383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118383/testReport)** for PR 27495 at commit [`519f47f`](https://github.com/apache/spark/commit/519f47fc7de0f753c6202bada0fa6972111d115f).
[GitHub] [spark] viirya commented on issue #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable`
viirya commented on issue #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable` URL: https://github.com/apache/spark/pull/26929#issuecomment-586052696 Looks good and thanks for working on this.
[GitHub] [spark] jiangxb1987 commented on a change in pull request #27002: [SPARK-30346][CORE]Improve logging when events dropped
jiangxb1987 commented on a change in pull request #27002: [SPARK-30346][CORE]Improve logging when events dropped URL: https://github.com/apache/spark/pull/27002#discussion_r379206377

## File path: core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala ##
@@ -167,20 +170,29 @@ private class AsyncEventQueue(
     }
     logTrace(s"Dropping event $event")
-    val droppedCount = droppedEventsCounter.get
+    val droppedCount = droppedEventsCounter.get - lastDroppedEventsCounter
+    val lastReportTime = lastReportTimestamp.get
+    val curTime = System.currentTimeMillis()
     if (droppedCount > 0) {
       // Don't log too frequently
-      if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
-        // There may be multiple threads trying to decrease droppedEventsCounter.
-        // Use "compareAndSet" to make sure only one thread can win.
-        // And if another thread is increasing droppedEventsCounter, "compareAndSet" will fail and
-        // then that thread will update it.
-        if (droppedEventsCounter.compareAndSet(droppedCount, 0)) {

Review comment: Can we just remove this check and use `droppedEventsCounter.getAndSet()` instead?
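The difference between the two atomic operations named in this review comment can be sketched roughly as follows. This is a plain-Python model, not Spark code: `AtomicCounter` and its method names are hypothetical stand-ins for a Java `AtomicLong`, implemented with a lock only so that the example is self-contained.

```python
import threading


class AtomicCounter:
    """Hypothetical lock-based stand-in for a Java AtomicLong."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def increment_and_get(self):
        with self._lock:
            self._value += 1
            return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if nobody changed the value since `expected` was read.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

    def get_and_set(self, new):
        # Always succeeds; returns the value that was replaced.
        with self._lock:
            old = self._value
            self._value = new
            return old


# compare-and-set: an increment between the read and the reset makes the reset fail.
cas_counter = AtomicCounter(5)
seen = cas_counter.get()
cas_counter.increment_and_get()  # simulates another thread dropping an event
cas_reset = cas_counter.compare_and_set(seen, 0)

# get-and-set: the reset cannot fail, and the returned count includes the late increment.
gas_counter = AtomicCounter(5)
gas_counter.increment_and_get()
dropped = gas_counter.get_and_set(0)
```

In this model the compare-and-set caller has to cope with a failed reset, whereas the get-and-set caller reads and clears the counter in one step, which appears to be the simplification the reviewer is asking about.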
[GitHub] [spark] xuanyuanking commented on issue #27560: [SPARK-30809][SQL] Review and fix issues in SQL API docs
xuanyuanking commented on issue #27560: [SPARK-30809][SQL] Review and fix issues in SQL API docs URL: https://github.com/apache/spark/pull/27560#issuecomment-586050561 cc @cloud-fan
[GitHub] [spark] xuanyuanking edited a comment on issue #27560: [SPARK-30809][SQL] Review and fix issues in SQL API docs
xuanyuanking edited a comment on issue #27560: [SPARK-30809][SQL] Review and fix issues in SQL API docs URL: https://github.com/apache/spark/pull/27560#issuecomment-586050561 cc @cloud-fan @gatorsmile
[GitHub] [spark] viirya commented on a change in pull request #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable`
viirya commented on a change in pull request #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable` URL: https://github.com/apache/spark/pull/26929#discussion_r379204978

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala ##
@@ -59,10 +59,30 @@ class InMemoryTable(
   def rows: Seq[InternalRow] = dataMap.values.flatMap(_.rows).toSeq

-  private val partFieldNames = partitioning.flatMap(_.references).toSeq.flatMap(_.fieldNames)
-  private val partIndexes = partFieldNames.map(schema.fieldIndex)
+  private val partCols: Array[Array[String]] = partitioning.flatMap(_.references).map { ref =>
+    schema.findNestedField(ref.fieldNames(), includeCollections = false) match {
+      case Some(_) => ref.fieldNames()
+      case None => throw new IllegalArgumentException(s"${ref.describe()} does not exist.")
+    }
+  }

-  private def getKey(row: InternalRow): Seq[Any] = partIndexes.map(row.toSeq(schema)(_))
+  private def getKey(row: InternalRow): Seq[Any] = {
+    def extractor(fieldNames: Array[String], schema: StructType, row: InternalRow): Any = {
+      val index = schema.fieldIndex(fieldNames(0))
+      val value = row.toSeq(schema).apply(index)
+      if (fieldNames.length > 1) {
+        (value, schema(index).dataType) match {
+          case (row: InternalRow, nestedSchema: StructType) =>
+            extractor(fieldNames.slice(1, fieldNames.length), nestedSchema, row)
+          case (_, dataType) =>
+            throw new IllegalArgumentException(s"Unsupported type, ${dataType.simpleString}")
+        }
+      } else {
+        value
+      }
+    }
+    partCols.map(filedNames => extractor(filedNames, schema, row))

Review comment: filedNames? fieldNames?
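The recursive lookup in the quoted `getKey` change can be illustrated with a small Python sketch. It is only a model under stated assumptions: nested dictionaries stand in for `InternalRow` and `StructType`, and the names `extract` and `get_key` are hypothetical, chosen to mirror `extractor` and `getKey` in the diff.

```python
def extract(field_names, row):
    """Follow a path of field names into a nested row (dicts model struct values)."""
    value = row[field_names[0]]
    if len(field_names) > 1:
        # Only struct-like values can be descended into, as in the Scala match.
        if not isinstance(value, dict):
            raise ValueError(f"Unsupported type, {type(value).__name__}")
        return extract(field_names[1:], value)
    return value


def get_key(part_cols, row):
    """Build the partition key: one extracted value per partition column path."""
    return [extract(field_names, row) for field_names in part_cols]


row = {"id": 1, "ts": {"year": 2020, "month": 2}}
key = get_key([["id"], ["ts", "year"]], row)
```

Here `key` combines a top-level column with a nested one, which is the case the pull request adds support for; a path that tries to descend into a non-struct value raises an error instead of silently returning something.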
[GitHub] [spark] AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586040107 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins removed a comment on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586040115 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23137/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586040107 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
AmplabJenkins commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586040115 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23137/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586039212 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23137/
[GitHub] [spark] zsxwing commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+
zsxwing commented on a change in pull request #26496: [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for version 3.6+ URL: https://github.com/apache/spark/pull/26496#discussion_r379194091

## File path: docs/pyspark-migration-guide.md ##
@@ -87,6 +87,8 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.
   - Since Spark 3.0, `Column.getItem` is fixed such that it does not call `Column.apply`. Consequently, if `Column` is used as an argument to `getItem`, the indexing operator should be used. For example, `map_col.getItem(col('id'))` should be replaced with `map_col[col('id')]`.

+  - As of Spark 3.0 `Row` field names are no longer sorted alphabetically when constructing with named arguments for Python versions 3.6 and above, and the order of fields will match that as entered. To enable sorted fields by default, as in Spark 2.4, set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to "true". For Python versions less than 3.6, the field names will be sorted alphabetically as the only option.

Review comment: nit: Could we mention that this must be set for all processes? For example: set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to "true" for **executors and driver**. This env must be consistent on all executors and driver. Any inconsistency may cause failures or incorrect answers.
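The ordering behaviour described in the migration note, and the reason the reviewer wants the setting to be consistent across processes, can be modelled without PySpark. The sketch below is an assumption-laden illustration: `make_fields` is a hypothetical helper, not the PySpark `Row` constructor, and it only reads the environment variable named in the diff.

```python
import os


def make_fields(**kwargs):
    """Return field names in the order a Row-like constructor would keep them (model only)."""
    sorting = os.environ.get("PYSPARK_ROW_FIELD_SORTING_ENABLED", "false").lower() == "true"
    names = list(kwargs)  # Python 3.6+ preserves keyword-argument order
    return sorted(names) if sorting else names


# One process with sorting disabled keeps the order as entered.
os.environ["PYSPARK_ROW_FIELD_SORTING_ENABLED"] = "false"
as_entered = make_fields(name="a", age=1)

# Another process with sorting enabled sees the fields in alphabetical order.
os.environ["PYSPARK_ROW_FIELD_SORTING_ENABLED"] = "true"
sorted_fields = make_fields(name="a", age=1)
```

If the driver behaved like the first call and an executor like the second, the same positional index would refer to different fields on each side, which is the kind of inconsistency the review comment warns about.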
[GitHub] [spark] mengxr commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
mengxr commented on issue #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395#issuecomment-586035904 Merged into master and branch-3.0. Thanks!
[GitHub] [spark] asfgit closed pull request #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
asfgit closed pull request #27395: [SPARK-30667][CORE] Add allGather method to BarrierTaskContext URL: https://github.com/apache/spark/pull/27395
[GitHub] [spark] jiangxb1987 commented on a change in pull request #26971: [SPARK-30320][SQL] Fix insert overwrite to DataSource table with dynamic partition error
jiangxb1987 commented on a change in pull request #26971: [SPARK-30320][SQL] Fix insert overwrite to DataSource table with dynamic partition error URL: https://github.com/apache/spark/pull/26971#discussion_r379190648

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ##
@@ -1521,4 +1521,10 @@ package object config {
       .bytesConf(ByteUnit.BYTE)
       .createOptional

+  private[spark] val MAX_LOCAL_TASK_FAILURES = ConfigBuilder("spark.task.local.maxFailures")
+    .doc("The max failure times for a task while SparkContext running in Local mode, " +

Review comment: How could you launch a speculative task when running under local mode?
[GitHub] [spark] SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support
SparkQA commented on issue #26440: [SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support URL: https://github.com/apache/spark/pull/26440#issuecomment-586033832 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23137/
[GitHub] [spark] jiangxb1987 commented on issue #27222: [SPARK-30519][CORE] Use spark.executorEnv.HADOOP_USER_NAME to set executor's hadoop user
jiangxb1987 commented on issue #27222: [SPARK-30519][CORE] Use spark.executorEnv.HADOOP_USER_NAME to set executor's hadoop user URL: https://github.com/apache/spark/pull/27222#issuecomment-586033002 This is a behavior change, and I don't think we should introduce it with the approach proposed here. If both `SPARK_USER` and `HADOOP_USER_NAME` are specified in the env, we should prioritize `SPARK_USER` over `HADOOP_USER_NAME`. Also, I'm hesitant about whether we should support `HADOOP_USER_NAME` at all.
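The precedence the reviewer asks for can be written down as a tiny lookup. This is a sketch of the suggested rule only, not of how Spark resolves the user: `resolve_user` and the `"unknown"` fallback are hypothetical, and the environment is passed in as a plain dictionary.

```python
def resolve_user(env, default="unknown"):
    """Pick the user name, preferring SPARK_USER over HADOOP_USER_NAME (suggested rule)."""
    return env.get("SPARK_USER") or env.get("HADOOP_USER_NAME") or default


both = resolve_user({"SPARK_USER": "alice", "HADOOP_USER_NAME": "bob"})
hadoop_only = resolve_user({"HADOOP_USER_NAME": "bob"})
neither = resolve_user({})
```

With both variables set, `SPARK_USER` wins; `HADOOP_USER_NAME` is consulted only when `SPARK_USER` is absent or empty.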
[GitHub] [spark] rdblue commented on issue #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable`
rdblue commented on issue #26929: [SPARK-30289][SQL] Partitioned by Nested Column for `InMemoryTable` URL: https://github.com/apache/spark/pull/26929#issuecomment-586032685 +1 Thanks for updating tests, @dbtsai. This looks good to me and it's great to have cases for partitioning by nested fields.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586031351 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23139/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins removed a comment on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586031345 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586031351 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23139/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
AmplabJenkins commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586031345 Merged build finished. Test PASSed.
[GitHub] [spark] jiangxb1987 commented on a change in pull request #27050: [SPARK-30388][Core] Mark running map stages of finished job as finished, and cancel running tasks
jiangxb1987 commented on a change in pull request #27050: [SPARK-30388][Core] Mark running map stages of finished job as finished, and cancel running tasks URL: https://github.com/apache/spark/pull/27050#discussion_r379186506

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ##
@@ -2003,7 +2000,7 @@ private[spark] class DAGScheduler(
       if (runningStages.contains(stage)) {
         try { // cancelTasks will fail if a SchedulerBackend does not implement killTask
           taskScheduler.cancelTasks(stageId, shouldInterruptTaskThread(job))
-          markStageAsFinished(stage, Some(failureReason))
+          markStageAsFinished(stage, reason)

Review comment: If the reason is None, the `completionTime` of the stage would be updated, so it would no longer be the time at which the stage actually succeeded.
[GitHub] [spark] SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
SparkQA commented on issue #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#issuecomment-586030807 **[Test build #118382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118382/testReport)** for PR 27571 at commit [`9639e14`](https://github.com/apache/spark/commit/9639e144c943b0c87e08d21e7d984595bd08518d).
[GitHub] [spark] jiangxb1987 commented on a change in pull request #27050: [SPARK-30388][Core] Mark running map stages of finished job as finished, and cancel running tasks
jiangxb1987 commented on a change in pull request #27050: [SPARK-30388][Core] Mark running map stages of finished job as finished, and cancel running tasks URL: https://github.com/apache/spark/pull/27050#discussion_r379184969

## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ##
@@ -2003,7 +2000,7 @@ private[spark] class DAGScheduler(
       if (runningStages.contains(stage)) {

Review comment: Please also update the comment above.
[GitHub] [spark] jiangxb1987 commented on issue #27419: [WIP][SPARK-30694][SHUFFLE]If exception occured while fetching blocks by ExternalBlockClient, fail early when External Shuffle Service is not al
jiangxb1987 commented on issue #27419: [WIP][SPARK-30694][SHUFFLE]If exception occured while fetching blocks by ExternalBlockClient, fail early when External Shuffle Service is not alive URL: https://github.com/apache/spark/pull/27419#issuecomment-586026054 If you skip the retry, how can you know it's not a transient network problem?
[GitHub] [spark] marmbrus commented on issue #24902: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
marmbrus commented on issue #24902: [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue URL: https://github.com/apache/spark/pull/24902#issuecomment-586025104 Reordering function parameters to match another system, for a method that is otherwise working correctly, sounds exactly like a cosmetic change to me. And as I pointed out, this has been discussed officially on the mailing list. I gave one example, but I can assure you this is not the only one. Don't read just his specific example, but rather also understand the motivation he gives. The Spark project always has been concerned about unnecessary pain being inflicted on users during an upgrade. He encourages us to think about "the tradeoff in terms of creating an update barrier for existing users". I'm also not saying we should *never* silently change behavior. However, in general, silent behavior changes are a big red flag to me. I think they are extra costly to users for the reasons listed above. I believe @cloud-fan is working on collecting a bunch of examples so that he can propose a framework on the mailing list to make sure we evaluate these cases consistently.
[GitHub] [spark] jiangxb1987 commented on issue #26984: [SPARK-27641][CORE] Fix MetricsSystem to remove metrics of the specific source by exactly match
jiangxb1987 commented on issue #26984: [SPARK-27641][CORE] Fix MetricsSystem to remove metrics of the specific source by exactly match URL: https://github.com/apache/spark/pull/26984#issuecomment-586024740 What's the behavior change you are proposing? Your new test case passes under the current code base.