[GitHub] [spark] MaxGekk commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
MaxGekk commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613241629 @cloud-fan This PR only changes comments. Can it be merged if the build succeeds? I just want to avoid rebasing my other PRs that might conflict with this one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407884363 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: As the PR title includes `bin` and `sbin`, this PR also changes the standalone ExternalShuffleService as follows. We had better mention that. ``` bin/spark-class org.apache.spark.deploy.ExternalShuffleService ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407883142 ## File path: launcher/src/test/java/org/apache/spark/launcher/CommandBuilderUtilsSuite.java ## @@ -99,6 +99,11 @@ public void testJavaMajorVersion() { assertEquals(10, javaMajorVersion("10")); } + @Test + public void testGetJarsDir() { +assertEquals("/spark/home/jars", getJarsDir("/spark/home")); Review comment: Hi, @marblejenka . This seems to add only half of the test coverage. Since this PR's key contribution is to introduce `ENV_SPARK_JARS_DIR`, shall we have test coverage for `ENV_SPARK_JARS_DIR`, too?
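For context on the review comment above, here is a minimal sketch of an environment-variable-aware jars-dir helper. The names (`getJarsDir`, the `SPARK_JARS_DIR` variable) follow the review discussion, but the implementation is hypothetical; the actual helper in the launcher's `CommandBuilderUtils` may differ.

```scala
// Hypothetical sketch only: mirrors the behaviour discussed in the review,
// not the actual Spark launcher implementation.
object JarsDirSketch {
  val EnvSparkJarsDir = "SPARK_JARS_DIR"

  // Prefer the SPARK_JARS_DIR environment variable; otherwise fall back
  // to the default "<sparkHome>/jars" directory.
  def getJarsDir(sparkHome: String, env: Map[String, String] = sys.env): String =
    env.getOrElse(EnvSparkJarsDir, s"$sparkHome/jars")
}
```

A test covering both branches, as the comment suggests, would exercise `getJarsDir` once with the variable unset (expecting the default) and once with it set (expecting the override).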
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407881833 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: Why do we limit to only `sbin` here? Is there a reason?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407881093 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: nit, `" you` -> `", you`
[GitHub] [spark] dongjoon-hyun commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613237532 Thank you for pinging me, @maropu .
[GitHub] [spark] dongjoon-hyun commented on issue #28192: [SPARK-31420][WEBUI] Infinite timeline redraw in job details page
dongjoon-hyun commented on issue #28192: [SPARK-31420][WEBUI] Infinite timeline redraw in job details page URL: https://github.com/apache/spark/pull/28192#issuecomment-613237036 +1 for @gengliangwang's opinion on backporting.
[GitHub] [spark] cloud-fan edited a comment on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan edited a comment on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613236857 Sorry I didn't run tests when backporting as it only has a very small conflict. I've fixed it in branch-3.0 after local tests pass. Thanks for catching @dongjoon-hyun !
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27364: [SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts
dongjoon-hyun edited a comment on issue #27364: [SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts URL: https://github.com/apache/spark/pull/27364#issuecomment-613236056 If you are using non-root, that's good. The above link shows that NFS mounts do not work for non-root users, so I assumed that you might use root to avoid that. I also don't want to assume a root user. - https://discuss.kubernetes.io/t/nfs-mount-is-not-working/6458 Then, could you describe, with a `Minikube` example, how to verify your PR in the following section of the PR description? ``` ## How was this patch tested? Test cases were added just like the existing EmptyDir support. Also the code were ran against my Kubernetes and NFS using pyspark. ``` As you know, Apache Spark uses Minikube as the official test environment instead of GCE or EKS. Also, without additional configuration like SecurityContext, this PR doesn't work in vanilla `Minikube`.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613235282 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613235287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25942/ Test PASSed.
[GitHub] [spark] yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support
yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613234916 Updated, thank you very much @maropu
[GitHub] [spark] SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support
SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613234973 **[Test build #121255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121255/testReport)** for PR 28106 at commit [`0fc69ca`](https://github.com/apache/spark/commit/0fc69ca4eb2948307e87798dceee137719d1).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#discussion_r407874752 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -172,17 +176,8 @@ object DateTimeUtils { * @return The number of micros since epoch from `java.sql.Timestamp`. */ def fromJavaTimestamp(t: Timestamp): SQLTimestamp = { -val era = if (t.before(julianCommonEraStart)) 0 else 1 -val localDateTime = LocalDateTime.of( - t.getYear + 1900, t.getMonth + 1, 1, - t.getHours, t.getMinutes, t.getSeconds, t.getNanos) - .`with`(ChronoField.ERA, era) - // Add days separately to convert dates existed in Julian calendar but not - // in Proleptic Gregorian calendar. For example, 1000-02-29 is valid date - // in Julian calendar because 1000 is a leap year but 1000 is not a leap - // year in Proleptic Gregorian calendar. And 1000-02-29 doesn't exist in it. - .plusDays(t.getDate - 1) // Returns the next valid date after `date.getDate - 1` days -instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant) +val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS Review comment: This seems to be missing in `branch-3.0`. ``` $ git grep millisToMicros sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala: val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#discussion_r407874752 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -172,17 +176,8 @@ object DateTimeUtils { * @return The number of micros since epoch from `java.sql.Timestamp`. */ def fromJavaTimestamp(t: Timestamp): SQLTimestamp = { -val era = if (t.before(julianCommonEraStart)) 0 else 1 -val localDateTime = LocalDateTime.of( - t.getYear + 1900, t.getMonth + 1, 1, - t.getHours, t.getMinutes, t.getSeconds, t.getNanos) - .`with`(ChronoField.ERA, era) - // Add days separately to convert dates existed in Julian calendar but not - // in Proleptic Gregorian calendar. For example, 1000-02-29 is valid date - // in Julian calendar because 1000 is a leap year but 1000 is not a leap - // year in Proleptic Gregorian calendar. And 1000-02-29 doesn't exist in it. - .plusDays(t.getDate - 1) // Returns the next valid date after `date.getDate - 1` days -instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant) +val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS Review comment: This seems to be the root cause.
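The arithmetic in the new `fromJavaTimestamp` body quoted above can be illustrated standalone. This is a hedged sketch, not the Spark source: `getTime` already carries the milliseconds, so only the sub-millisecond remainder of `getNanos` needs to be added on top. The constants and helper name are assumptions mirroring the usual Spark definitions.

```scala
// Illustrative sketch of the micros computation from the diff above;
// constants and helper names are assumed, not copied from Spark.
object TimestampMicrosSketch {
  val NANOS_PER_MICROS  = 1000L
  val MICROS_PER_MILLIS = 1000L

  def millisToMicros(millis: Long): Long = millis * MICROS_PER_MILLIS

  // Convert millis to micros, then add the sub-millisecond part of the
  // nano field. The % MICROS_PER_MILLIS drops the microseconds that are
  // already counted inside getTime's millisecond value.
  def toMicros(timeMillis: Long, nanos: Int): Long =
    millisToMicros(timeMillis) + (nanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
}
```

For example, a timestamp with `getTime = 1234` ms and `getNanos = 234567890` yields `1234000 + 567 = 1234567` microseconds.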
[GitHub] [spark] dongjoon-hyun commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613232736 Hi, @cloud-fan . This seems to break `branch-3.0`. Could you take a look at it once more? ``` [ERROR] [Error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala:181: not found: value millisToMicros 700 [ERROR] one error found ```
[GitHub] [spark] maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support
maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613232614 No further comments from me beyond the existing ones.
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407873456 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala ## @@ -89,4 +90,29 @@ class MiscExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(outputCodegen.contains(s"Result of $inputExpr is 1")) assert(outputEval.contains(s"Result of $inputExpr is 1")) } + + test("try expression") { +intercept[RuntimeException] { Review comment: nit: could you check the error message, too, just in case?
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407872264 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ## @@ -198,3 +201,85 @@ case class TypeOf(child: Expression) extends UnaryExpression { defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""") } } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Evaluate an expression and handle certain types of runtime exceptions by returning NULL. +In cases where it is preferable that queries produce NULL instead of failing when corrupt or invalid data is encountered, the TRY function may be useful, especially when ANSI mode is on and the users need null-tolerant on certain columns or outputs. Review comment: nit: "the TRY function" -> "the `_FUNC_` function"?
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407872797 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ## @@ -198,3 +201,85 @@ case class TypeOf(child: Expression) extends UnaryExpression { defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""") } } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Evaluate an expression and handle certain types of runtime exceptions by returning NULL. +In cases where it is preferable that queries produce NULL instead of failing when corrupt or invalid data is encountered, the TRY function may be useful, especially when ANSI mode is on and the users need null-tolerant on certain columns or outputs. +AnalysisExceptions will not be handled by this, typically runtime exceptions handled by TRY function are: + + * ArightmeticException - e.g. division by zero, numeric value out of range, + * NumberFormatException - e.g. invalid casting, + * IllegalArgumentException - e.g. invalid datetime pattern, missing format argument for string formatting, + * DateTimeException - e.g. invalid datetime values + * UnsupportedEncodingException - e.g. encode or decode string with invalid charset Review comment: Thanks for the update.
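The null-tolerant behaviour described in the docstring quoted above can be sketched on plain Scala values. This is only an illustration of the semantics under discussion; the actual `Try` expression in the PR wraps Catalyst expressions, and the docstring's exception list is longer than the three cases shown here.

```scala
// Semantics sketch only: evaluate a by-name argument and turn the kinds of
// runtime exceptions named in the docstring into an absent (null-like) result.
object TrySketch {
  def tryEval[T](thunk: => T): Option[T] =
    try Some(thunk)
    catch {
      case _: NumberFormatException    => None // e.g. invalid casting
      case _: ArithmeticException      => None // e.g. division by zero
      case _: IllegalArgumentException => None // e.g. bad datetime pattern
    }
}
```

Note that `NumberFormatException` extends `IllegalArgumentException`, so the more specific case is listed first; `AnalysisException`-style errors would deliberately not be caught.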
[GitHub] [spark] AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25941/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231464 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands
cloud-fan commented on issue #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands URL: https://github.com/apache/spark/pull/27701#issuecomment-613231273 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands
cloud-fan closed pull request #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands URL: https://github.com/apache/spark/pull/27701
[GitHub] [spark] SparkQA commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
SparkQA commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231147 **[Test build #121254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121254/testReport)** for PR 28199 at commit [`bd34313`](https://github.com/apache/spark/commit/bd34313b4b45a5a0998cc6075a4b2596086d2560).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229646 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25940/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
cloud-fan commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613229841 retest this please
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229646 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121250/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25940/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229154 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121250/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613221968 **[Test build #121250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934).
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229154 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support
SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229275 **[Test build #121253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121253/testReport)** for PR 28106 at commit [`bc59b6c`](https://github.com/apache/spark/commit/bc59b6cfd505e8cff98ecdbf4070e9072fead3db).
[GitHub] [spark] SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228943 **[Test build #121250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228942 Should we backport this and SPARK-31186, @viirya and @ueshin?
[GitHub] [spark] cloud-fan closed pull request #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
cloud-fan closed pull request #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198
[GitHub] [spark] HyukjinKwon closed pull request #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon closed pull request #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210
[GitHub] [spark] maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support
maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229093 retest this please
[GitHub] [spark] HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228761 The last comment was comment only. Merged to master and branch-3.0.
[GitHub] [spark] cloud-fan commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
cloud-fan commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198#issuecomment-613228680 thanks, merging to 3.0!
[GitHub] [spark] maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in SQLMetricsSuite run with both codegen on and off
maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#discussion_r407869788 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala ## @@ -214,14 +214,16 @@ trait SQLMetricsTestUtils extends SQLTestUtils { protected def testSparkPlanMetrics( df: DataFrame, expectedNumOfJobs: Int, - expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = { + expectedMetrics: Map[Long, (String, Map[String, Any])], + enableWholeStage: Boolean = false): Unit = { val expectedMetricsPredicates = expectedMetrics.mapValues { case (nodeName, nodeMetrics) => (nodeName, nodeMetrics.mapValues(expectedMetricValue => (actualMetricValue: Any) => { actualMetricValue.toString.matches(expectedMetricValue.toString) })) } -testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates) +testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates, Review comment: Ah, the same comment applies here: https://github.com/apache/spark/pull/28173#discussion_r406877479 Could you check this, @sririshindra?
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407868316 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketInJoin extends Rule[LogicalPlan] { + val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets" + + private val sqlConf = SQLConf.get + + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) + if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if ((large % small == 0) && + (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) { + Some(small) +} else { + None +} + } + + private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) => +l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession)) Review comment: ~How about using `PlanLater` there?~ Ah, I see. We need to think more about this... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
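The divisibility check in `mayCoalesce` above relies on a modular-arithmetic fact: when the larger bucket count is a multiple of the smaller one, a row's bucket id under the larger count, re-modded by the smaller count, equals its bucket id computed directly with the smaller count. A small standalone check of this property (plain Scala, independent of Spark; the names here are illustrative):

```scala
// Verifies the property behind mayCoalesce's `large % small == 0` test:
// if `large` is a multiple of `small`, then (hash % large) % small == hash % small,
// so the larger table's buckets can be safely merged into `small` buckets.
object CoalesceCheck {
  // Non-negative bucket id, as bucket ids are computed by modding the hash.
  def bucketId(hash: Int, numBuckets: Int): Int =
    ((hash % numBuckets) + numBuckets) % numBuckets

  def coalescible(small: Int, large: Int): Boolean =
    large % small == 0 &&
      (0 until 1000).forall { h =>
        bucketId(bucketId(h, large), small) == bucketId(h, small)
      }
}
```

For instance, `coalescible(4, 8)` holds, while `coalescible(4, 10)` does not (10 is not a multiple of 4, and e.g. hash 6 lands in bucket 6 of 10 but bucket 2 of 4).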
[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613226194 > IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully? Got it. It seems that we need to judge whether the commit operation was performed in SparkHadoopMapRedUtil.commitTask. Then we need to revert the performed commit status if an exception happens when renaming dynamicStagingTaskFiles.
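The recovery idea sketched in the comment above (commit, then undo the commit status if renaming the staged files fails) can be illustrated in plain Scala. This is a hypothetical sketch, not Spark's actual commit protocol; the function names and structure are invented for illustration only.

```scala
// Illustrative sketch (names are hypothetical, not Spark's API) of the idea
// discussed above: track whether the task commit was performed, and revert
// it if renaming the dynamic staging files subsequently fails.
object CommitSketch {
  def commitWithRollback(
      commit: () => Unit,             // e.g. the commitTask step
      renameStagingFiles: () => Unit, // e.g. renaming staged partition files
      revertCommit: () => Unit        // undo the recorded commit status
  ): Boolean = {
    var committed = false
    try {
      commit()
      committed = true
      renameStagingFiles()
      true
    } catch {
      case _: Exception =>
        if (committed) revertCommit() // only revert if commit already happened
        false
    }
  }
}
```

On a rename failure after a successful commit, the rollback callback runs and the operation reports failure, which matches the "revert the performed commit status" suggestion.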
[GitHub] [spark] cloud-fan commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate
cloud-fan commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate URL: https://github.com/apache/spark/pull/28205#issuecomment-613227589 can we fix the conflicts?
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407868316 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketInJoin extends Rule[LogicalPlan] { + val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets" + + private val sqlConf = SQLConf.get + + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) + if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if ((large % small == 0) && + (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) { + Some(small) +} else { + None +} + } + + private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) => +l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession)) Review comment: How about using `PlanLater` there? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407867656

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (new file, @@ -0,0 +1,110 @@):

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.internal.SQLConf

/**
 * This rule injects a hint if one side of two bucketed tables can be coalesced
 * when the two bucketed tables are inner-joined and they differ in the number of buckets.
 */
object CoalesceBucketInJoin extends Rule[LogicalPlan] {
  val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets"

  private val sqlConf = SQLConf.get

  private def isPlanEligible(plan: LogicalPlan): Boolean = {
    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
      p(plan) && plan.children.forall(forall(_)(p))
    }

    forall(plan) {
      case _: Filter | _: Project | _: LogicalRelation => true
      case _ => false
    }
  }

  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
    if (isPlanEligible(plan)) {
      plan.collectFirst {
        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _)
            if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) =>
          r.bucketSpec.get
      }
    } else {
      None
    }
  }

  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = {
    assert(numBuckets1 != numBuckets2)
    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
    // number of buckets because bucket id is calculated by modding the total number of buckets.
    if ((large % small == 0) &&
        (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) {
      Some(small)
    } else {
      None
    }
  }

  private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
    plan.transformUp {
      case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
        l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
    }
  }

  def apply(plan: LogicalPlan): LogicalPlan = {
    if (!sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_ENABLED)) {
      return plan
    }

    plan transform {
      case join: Join if join.joinType == Inner =>
```

Review comment: Yea, I also think we can support all join types except for full outer joins.
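The divisibility check in `mayCoalesce` above is worth a concrete illustration. Below is a small standalone sketch (Python, not Spark code) of why coalescing is only safe when the larger bucket count is a multiple of the smaller one: a bucket id is `hash(key) % numBuckets`, and re-mapping a bucket of the larger layout with `% small` agrees with hashing directly into the smaller layout only under divisibility.

```python
# Standalone sketch of the divisibility rule in mayCoalesce (Python, not
# Spark code). Bucket id = hash(key) % numBuckets, so every key landing in
# bucket b of the larger table lands in bucket b % small of the coalesced
# layout -- consistently only when large % small == 0.

def bucket_id(key_hash: int, num_buckets: int) -> int:
    return key_hash % num_buckets

def coalesced_bucket(b: int, small: int) -> int:
    # Map a bucket of the larger table onto the smaller bucket layout.
    return b % small

small, large = 4, 8  # large % small == 0, so coalescing is safe
for key_hash in range(100):
    b_large = bucket_id(key_hash, large)
    b_small = bucket_id(key_hash, small)
    # Re-bucketing the large side agrees with hashing directly into the
    # small layout, so matching join keys still meet in the same bucket.
    assert coalesced_bucket(b_large, small) == b_small
```

With a non-divisible pair such as 8 and 3 buckets the two mappings disagree for some hashes, which is exactly why `mayCoalesce` rejects those pairs.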
[GitHub] [spark] turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613226194

> IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully?

Got it. It seems we need to check whether the commit operation was performed in SparkHadoopMapRedUtil.commitTask, and then revert the commit status if an exception happens while renaming the dynamicStagingTaskFiles.
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407866696

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (same new-file diff context as maropu's review comment above; this one is on `addBucketHint`):

```scala
private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
  plan.transformUp {
    case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
      l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
  }
}
```

Review comment:

> You cannot extract a join plan pattern, e.g., `Join(LogicalRelation(HadoopFsRelation), LogicalRelation(HadoopFsRelation))`, just like `ScanOperation`?

If I match a `Join` in `FileSourceStrategy`, I need to return a `SparkPlan` for the `Join`, no?
[GitHub] [spark] maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#discussion_r407866066

File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala (@@ -214,14 +214,16 @@):

```scala
protected def testSparkPlanMetrics(
    df: DataFrame,
    expectedNumOfJobs: Int,
    expectedMetrics: Map[Long, (String, Map[String, Any])],
    enableWholeStage: Boolean = false): Unit = {
  val expectedMetricsPredicates = expectedMetrics.mapValues { case (nodeName, nodeMetrics) =>
    (nodeName, nodeMetrics.mapValues(expectedMetricValue =>
      (actualMetricValue: Any) => {
        actualMetricValue.toString.matches(expectedMetricValue.toString)
      }))
  }
  testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates,
    enableWholeStage)
}
```

Review comment: Is it not enough just to run the tests here with both modes, codegen and non-codegen?

```scala
Seq(true, false).foreach { codegenEnabled =>
  testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates,
    codegenEnabled)
}
```
[GitHub] [spark] cloud-fan closed pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan closed pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189
[GitHub] [spark] cloud-fan commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613224495 thanks, merging to master/3.0!
[GitHub] [spark] AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25939/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224019 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224019 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25939/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
SparkQA commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-613223702 **[Test build #121251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121251/testReport)** for PR 28188 at commit [`e22caed`](https://github.com/apache/spark/commit/e22caed2136b21a3694b94cb68d713d37aa52fa5).
[GitHub] [spark] SparkQA commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
SparkQA commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613223712 **[Test build #121252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121252/testReport)** for PR 28173 at commit [`8ac7fe5`](https://github.com/apache/spark/commit/8ac7fe54a56ea06035dff901a81e93eb8b5ded6f).
[GitHub] [spark] bmarcott commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
bmarcott commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-613223634 Meanwhile, I added a test case which matches the scenario you described in the other PR. I changed the code so that launching a task is still required to reset the timer, but not launching a task no longer prevents resetting on a follow-up offer. This passes the new test and the previous ones. It seems better to remove the requirement to launch a task, but I'm OK with this since it isn't worse than the legacy locality wait.
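The timer semantics bmarcott describes can be made concrete with a toy model (hypothetical Python sketch, not Spark's actual TaskSetManager API): an offer that launches a task resets the locality wait timer, and an offer that launches nothing neither resets it nor blocks a later launching offer from resetting it.

```python
# Toy locality-wait timer with an injectable clock so the behavior is easy
# to demonstrate deterministically.

class LocalityWaitTimer:
    def __init__(self, wait_seconds, clock):
        self.wait_seconds = wait_seconds
        self.clock = clock              # time source, injected for clarity
        self.last_reset = clock()

    def on_resource_offer(self, launched_task: bool) -> None:
        if launched_task:
            self.last_reset = self.clock()  # reset only when a task launches

    def expired(self) -> bool:
        return self.clock() - self.last_reset >= self.wait_seconds

now = [0.0]
timer = LocalityWaitTimer(wait_seconds=3.0, clock=lambda: now[0])

now[0] = 2.0
timer.on_resource_offer(launched_task=False)  # no reset here...
now[0] = 2.5
timer.on_resource_offer(launched_task=True)   # ...and it does not block this one
now[0] = 5.0
assert not timer.expired()  # 5.0 - 2.5 < 3.0: the 2.5s launch reset the timer
now[0] = 6.0
assert timer.expired()      # 6.0 - 2.5 >= 3.0: wait elapsed, fall back a level
```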
[GitHub] [spark] SparkQA removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
SparkQA removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613176864 **[Test build #121241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121241/testReport)** for PR 28200 at commit [`a128f01`](https://github.com/apache/spark/commit/a128f0100a23a7a9ad7f6133b1c7704c824270ba).
[GitHub] [spark] SparkQA commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
SparkQA commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613223580 **[Test build #121241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121241/testReport)** for PR 28200 at commit [`a128f01`](https://github.com/apache/spark/commit/a128f0100a23a7a9ad7f6133b1c7704c824270ba).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] maropu commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
maropu commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613223510 retest this please
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407864011

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (same new-file diff context as the earlier review comments; this one is on the join-type check in `apply`):

```scala
plan transform {
  case join: Join if join.joinType == Inner =>
```

Review comment: Thanks for pointing this out. Actually, this should apply to any join type; we don't check the join types when buckets are used. This optimization would help in equi-join cases, but I don't think it will negatively impact the non-equi-join cases (please correct me if I am wrong here).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222587 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121230/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25938/ Test PASSed.
[GitHub] [spark] manuzhang commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
manuzhang commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613222514 IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully?
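The single-committer protocol manuzhang refers to can be modeled in a few lines. Below is a toy first-asker-wins authorizer in the spirit of Spark's OutputCommitCoordinator (a hedged sketch, not its actual API): for each partition only the first attempt that asks is allowed to commit, so a speculative duplicate is denied.

```python
# Toy model of first-asker-wins commit authorization per partition.

class CommitCoordinator:
    def __init__(self):
        self._authorized = {}  # partition id -> winning attempt id

    def can_commit(self, partition: int, attempt: int) -> bool:
        # setdefault records the first asker and leaves later askers out.
        winner = self._authorized.setdefault(partition, attempt)
        return winner == attempt

coord = CommitCoordinator()
assert coord.can_commit(partition=0, attempt=1) is True   # first asker wins
assert coord.can_commit(partition=0, attempt=2) is False  # speculative attempt denied
assert coord.can_commit(partition=0, attempt=1) is True   # winner may ask again
```

The PR discussion is about what happens after authorization: if the authorized attempt's rename fails partway, its partially performed commit must be reverted, which this toy model deliberately does not cover.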
[GitHub] [spark] AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61349 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121230/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222587 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25938/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61349 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613136567 **[Test build #121230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121230/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
[GitHub] [spark] MaxGekk commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate
MaxGekk commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate URL: https://github.com/apache/spark/pull/28205#issuecomment-613222161 @cloud-fan @HyukjinKwon Please review the PR.
[GitHub] [spark] SparkQA commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222015 **[Test build #121230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121230/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613221968 **[Test build #121250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934).
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407862565

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala

## @@ -0,0 +1,110 @@
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.internal.SQLConf

/**
 * This rule injects a hint if one side of two bucketed tables can be coalesced
 * when the two bucketed tables are inner-joined and they differ in the number of buckets.
 */
object CoalesceBucketInJoin extends Rule[LogicalPlan] {
  val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets"

  private val sqlConf = SQLConf.get

  private def isPlanEligible(plan: LogicalPlan): Boolean = {
    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
      p(plan) && plan.children.forall(forall(_)(p))
    }

    forall(plan) {
      case _: Filter | _: Project | _: LogicalRelation => true
      case _ => false
    }
  }

  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
    if (isPlanEligible(plan)) {
      plan.collectFirst {
        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _)
            if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) =>
          r.bucketSpec.get
      }
    } else {
      None
    }
  }

  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = {
    assert(numBuckets1 != numBuckets2)
    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
    // number of buckets because bucket id is calculated by modding the total number of buckets.
    if ((large % small == 0) &&
        (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) {
      Some(small)
    } else {
      None
    }
  }

  private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
    plan.transformUp {
      case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
        l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
```

Review comment:
> for example, if I see a Join in a strategy, I need to act on it, and if I see a scan node, I cannot traverse up the TreeNode

You cannot extract a join plan pattern, e.g., `Join(LogicalRelation(HadoopFsRelation), LogicalRelation(HadoopFsRelation))`, just like `ScanOperation`?

> Modifying BucketSpec is not desirable either, right?

Yea, `BucketSpec` is an item in the catalog, so I think we cannot put optimization hints in it.
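The divisibility requirement in `mayCoalesce` can be checked with a small standalone sketch (illustrative only, not Spark code): a bucket id is the row's hash modulo the number of buckets, so re-modding from `large` down to `small` buckets agrees with bucketing directly into `small` buckets only when `large % small == 0`:

```scala
// Standalone illustration of why the larger bucket count must be divisible
// by the smaller one when coalescing bucketed tables for a join.
object BucketCoalesceCheck {
  // Non-negative modulo, since bucket ids must lie in [0, numBuckets).
  def bucketId(hash: Int, numBuckets: Int): Int =
    ((hash % numBuckets) + numBuckets) % numBuckets

  // Bucket a row into `large` buckets, then coalesce that bucket into `small` buckets.
  def coalesced(hash: Int, large: Int, small: Int): Int =
    bucketId(bucketId(hash, large), small)
}
```

For example, coalescing 8 buckets down to 4 is consistent: `coalesced(h, 8, 4) == bucketId(h, 4)` for every hash. Coalescing 6 down to 4 is not: a row with hash 10 falls in bucket 4 of 6, which re-mods to bucket 0, while bucketing directly into 4 gives bucket 2.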
[GitHub] [spark] huaxingao commented on issue #28209: [SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF
huaxingao commented on issue #28209: [SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF URL: https://github.com/apache/spark/pull/28209#issuecomment-613221208 Thanks! @HyukjinKwon @maropu
[GitHub] [spark] yaooqinn commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
yaooqinn commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198#issuecomment-613220853 cc @dongjoon-hyun @cloud-fan @HyukjinKwon, the backport PR is ready for review, thanks
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25937/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-613220348 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25937/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220581 Merged build finished. Test PASSed.
[GitHub] [spark] yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support
yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613220617 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-613220356 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121235/ Test PASSed.