[GitHub] [spark] HyukjinKwon commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax
HyukjinKwon commented on a change in pull request #28026:
URL: https://github.com/apache/spark/pull/28026#discussion_r513197469

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala ##

@@ -295,18 +295,61 @@ private[sql] object CatalogV2Util {
       catalog.name().equalsIgnoreCase(CatalogManager.SESSION_CATALOG_NAME)
   }
 
-  def convertTableProperties(
+  def convertTableProperties(c: CreateTableStatement): Map[String, String] = {
+    convertTableProperties(
+      c.properties, c.options, c.serde, c.location, c.comment, c.provider, c.external)
+  }
+
+  def convertTableProperties(c: CreateTableAsSelectStatement): Map[String, String] = {
+    convertTableProperties(
+      c.properties, c.options, c.serde, c.location, c.comment, c.provider, c.external)
+  }
+
+  def convertTableProperties(r: ReplaceTableStatement): Map[String, String] = {
+    convertTableProperties(r.properties, r.options, r.serde, r.location, r.comment, r.provider)
+  }
+
+  def convertTableProperties(r: ReplaceTableAsSelectStatement): Map[String, String] = {
+    convertTableProperties(r.properties, r.options, r.serde, r.location, r.comment, r.provider)
+  }
+
+  private def convertTableProperties(
       properties: Map[String, String],
       options: Map[String, String],
+      serdeInfo: Option[SerdeInfo],
       location: Option[String],
       comment: Option[String],
-      provider: Option[String]): Map[String, String] = {
-    properties ++ options ++
+      provider: Option[String],
+      external: Boolean = false): Map[String, String] = {
+    properties ++
+      options ++ // to make the transition to the "option." prefix easier, add both
+      options.map { case (key, value) => TableCatalog.OPTION_PREFIX + key -> value } ++
+      convertToProperties(serdeInfo) ++
+      (if (external) Map(TableCatalog.PROP_EXTERNAL -> "true") else Map.empty) ++

Review comment:
Making separate JIRAs and PRs:
- makes it easier to port/revert/manage, e.g., reverting a specific change.
- makes it easier to track discussions specific to a smaller scope, e.g., the current discussion thread already looks very long to read.
- A +1653 and −1125 change is large. It makes it easy for people to miss key things during a review.
- This `CREATE TABLE` syntax is pretty much an entry point for users. This is where we need more calls for review. I would encourage making it easier to review so that many people can review closely, to prevent any surprise caused by simply overlooking something.

If I am not mistaken, it is usually considered good practice to split a change if it can logically be split. I also encourage splitting PRs in general. If we should not split, I think the argument should be about why it should not be split (instead of why it should be split).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
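For readers following the diff in this comment: the new private `convertTableProperties` adds every option twice, once under its raw key and once under the `option.` prefix, and appends an external marker when requested. A minimal Java sketch of that merge order follows. The class and method names (`TablePropertiesSketch`, `merge`) are hypothetical, the value `"external"` for the external property key is an assumption, and serde, location, comment and provider handling are left out; this is an illustration, not Spark's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TablePropertiesSketch {
  // Assumed constant values, for illustration only.
  static final String OPTION_PREFIX = "option.";
  static final String PROP_EXTERNAL = "external";

  static Map<String, String> merge(
      Map<String, String> properties, Map<String, String> options, boolean external) {
    // Start from the plain table properties.
    Map<String, String> merged = new LinkedHashMap<>(properties);
    // Later puts overwrite earlier ones, like Scala's `++` on maps.
    merged.putAll(options);
    // Add each option a second time under the prefixed key.
    for (Map.Entry<String, String> e : options.entrySet()) {
      merged.put(OPTION_PREFIX + e.getKey(), e.getValue());
    }
    // Mark the table as external only when requested.
    if (external) {
      merged.put(PROP_EXTERNAL, "true");
    }
    return merged;
  }
}
```

With a single option `path`, the result holds both `path` and `option.path`, which is the "add both" transition behaviour described in the diff comment.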
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR edited a comment on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717716130 Wasn't it declared "incorrect" behavior in the discussion thread on the dev mailing list? If it's not a bug, what exactly are we fixing in the master branch? I think it's a fairly serious bug, as Spark "ignores" the end user's intention and silently falls back. Suppose the custom session catalog performs auditing: Spark then silently skips the auditing and continues with the operation.
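The difference between the two behaviours discussed in this comment can be sketched as follows. This is a hedged illustration only: `SessionCatalogLoaderSketch`, `loadWithFallback` and `loadFailFast` are hypothetical names, and the catalog is modelled as a generic value rather than Spark's actual catalog types.

```java
import java.util.function.Supplier;

final class SessionCatalogLoaderSketch {
  /** Silent fallback: an instantiation failure is swallowed and the default is used. */
  static <T> T loadWithFallback(Supplier<T> custom, T defaultCatalog) {
    try {
      return custom.get();
    } catch (RuntimeException e) {
      // The caller never learns that the configured catalog was not used.
      return defaultCatalog;
    }
  }

  /** Fail fast: the failure surfaces to the caller instead of being hidden. */
  static <T> T loadFailFast(Supplier<T> custom) {
    try {
      return custom.get();
    } catch (RuntimeException e) {
      throw new IllegalStateException("Cannot instantiate configured session catalog", e);
    }
  }
}
```

In the fallback variant, a custom catalog that was meant to audit operations is replaced without notice, which is the concern raised in the comment; the fail-fast variant stops the operation instead.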
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
AmplabJenkins removed a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717714917 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34953/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
AmplabJenkins removed a comment on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717714930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34955/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717716376 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34957/
[GitHub] [spark] HeartSaVioR commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717716130 Wasn't it declared "incorrect" behavior in the discussion thread on the dev mailing list? If it's not a bug, what exactly are we fixing in the master branch? I think it's a fairly serious bug, as Spark "ignores" the end user's intention and silently falls back. Suppose the custom session catalog performs auditing: Spark then silently skips the auditing.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
AmplabJenkins removed a comment on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717714926 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
AmplabJenkins removed a comment on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717708872
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
AmplabJenkins removed a comment on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717714910 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
SparkQA removed a comment on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717631159 **[Test build #130343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130343/testReport)** for PR 30147 at commit [`d50c691`](https://github.com/apache/spark/commit/d50c6911c76ff62286c3bef571d5a488b1f7b5ec).
[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
cloud-fan commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-717715392 @viirya @maropu what do you think about backporting? Shall we backport without the new rule or with it?
[GitHub] [spark] cloud-fan closed pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
cloud-fan closed pull request #30093: URL: https://github.com/apache/spark/pull/30093
[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
SparkQA commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717714903 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
SparkQA commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717714913 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34955/
[GitHub] [spark] AmplabJenkins commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
AmplabJenkins commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717714910
[GitHub] [spark] AmplabJenkins commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
AmplabJenkins commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717714926
[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
cloud-fan commented on pull request #30093: URL: https://github.com/apache/spark/pull/30093#issuecomment-717715051 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717713671 **[Test build #130355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130355/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).
[GitHub] [spark] cloud-fan closed pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier
cloud-fan closed pull request #30079: URL: https://github.com/apache/spark/pull/30079
[GitHub] [spark] cloud-fan commented on pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on pull request #30079: URL: https://github.com/apache/spark/pull/30079#issuecomment-717712906 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on a change in pull request #30079: [SPARK-33174][SQL] Migrate DROP TABLE to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #30079:
URL: https://github.com/apache/spark/pull/30079#discussion_r513193419

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ##

@@ -367,9 +367,17 @@ class ResolveSessionCatalog(
         orCreate = c.orCreate)
     }
 
+    case DropTable(
+        r @ ResolvedTable(_, _, _: V1Table), ifExists, purge) if isSessionCatalog(r.catalog) =>
+      DropTableCommand(r.identifier.asTableIdentifier, ifExists, isView = false, purge = purge)
+
     // v1 DROP TABLE supports temp view.
-    case DropTableStatement(TempViewOrV1Table(name), ifExists, purge) =>
-      DropTableCommand(name.asTableIdentifier, ifExists, isView = false, purge = purge)
+    case DropTable(r: ResolvedView, ifExists, purge) =>
+      if (!r.isTemp) {

Review comment:
Currently, it's duplicated with the check in `DropTableCommand`. I think it's OK as we want to move to v2 commands eventually.
[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
SparkQA commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717711223 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34954/
[GitHub] [spark] HyukjinKwon commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
HyukjinKwon commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717711256 retest this please
[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs
SparkQA commented on pull request #29247: URL: https://github.com/apache/spark/pull/29247#issuecomment-717710315 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34956/
[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
SparkQA commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717709003 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34955/
[GitHub] [spark] AmplabJenkins commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
AmplabJenkins commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717708872
[GitHub] [spark] SparkQA commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
SparkQA commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717708073 **[Test build #130343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130343/testReport)** for PR 30147 at commit [`d50c691`](https://github.com/apache/spark/commit/d50c6911c76ff62286c3bef571d5a488b1f7b5ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
HyukjinKwon edited a comment on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717706506 Hm, is this something we should port back? I think it's hard to call it a bug. Maintenance releases generally shouldn't include such behaviour changes according to semver. I get that DSv2 is still experimental, but this doesn't look like something that needs to be ported back.
[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
SparkQA commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717706418 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34953/
[GitHub] [spark] HyukjinKwon commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
HyukjinKwon commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717706506 Hm, is this something we should port back? I think it's hard to call it a bug. Maintenance releases generally shouldn't include such behaviour changes according to semver, though I get that DSv2 is still experimental.
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513185051

## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ##

@@ -94,6 +95,9 @@
   static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure";
   private static final boolean DEFAULT_STOP_ON_FAILURE = false;
 
+  // Used by shuffle merge manager to create merged shuffle files.
+  protected static final String APP_BASE_RELATIVE_PATH = "usercache/%s/appcache/%s/";

Review comment:
Also, I think we don't need a separate `registerApplication` API anymore. It just creates an empty `appsPathInfo` for the appId and adds it to the `appsPathInfo`. This could be done in `registerExecutor`. Even `ExternalShuffleBlockResolver` doesn't have a registerApplication API. My next commit will remove this as well. cc. @Victsm
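The idea of folding application registration into `registerExecutor` can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the pull request: `MergeManagerSketch`, `isRegistered` and `localDirs` are hypothetical names, and the per-application state is reduced to a list of local directories standing in for the `appsPathInfo` entry mentioned in the comment.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class MergeManagerSketch {
  // appId -> local dirs registered so far (hypothetical stand-in for `appsPathInfo`).
  private final Map<String, CopyOnWriteArrayList<String>> appsPathInfo =
      new ConcurrentHashMap<>();

  void registerExecutor(String appId, String[] localDirs) {
    // The per-application entry is created on first use, so no separate
    // registerApplication call is needed beforehand.
    CopyOnWriteArrayList<String> dirs =
        appsPathInfo.computeIfAbsent(appId, id -> new CopyOnWriteArrayList<>());
    for (String dir : localDirs) {
      dirs.addIfAbsent(dir);
    }
  }

  boolean isRegistered(String appId) {
    return appsPathInfo.containsKey(appId);
  }

  List<String> localDirs(String appId) {
    List<String> dirs = appsPathInfo.get(appId);
    if (dirs == null) {
      return Collections.emptyList();
    }
    return dirs;
  }
}
```

`computeIfAbsent` keeps the create-if-missing step atomic per application, so two executors of the same application registering concurrently share one entry.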
[GitHub] [spark] SparkQA commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
SparkQA commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717702143 **[Test build #130354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130354/testReport)** for PR 26935 at commit [`95028e2`](https://github.com/apache/spark/commit/95028e2a70a3a32f6b6c10cef432dbb2eb5bed58).
[GitHub] [spark] HeartSaVioR commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
HeartSaVioR commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717701730 retest this, please
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513182393

## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ##

@@ -279,6 +287,7 @@ public void initializeApplication(ApplicationInitializationContext context) {
     } catch (Exception e) {
       logger.error("Exception when initializing application {}", appId, e);
     }
+    shuffleMergeManager.registerApplication(appId, context.getUser());

Review comment:
This is a mistake. My next commit will have the fix for this.
[GitHub] [spark] otterc commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r513180854

## File path: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java ##

@@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Throwables;
+import com.google.common.collect.ImmutableMap;
+
+import org.apache.commons.io.FileUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.junit.Assert.*;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.MapConfigProvider;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * Tests for {@link RemoteBlockPushResolver}.
+ */
+public class RemoteBlockPushResolverSuite {
+
+  private static final Logger log = LoggerFactory.getLogger(RemoteBlockPushResolverSuite.class);
+  private final String MERGE_DIR_RELATIVE_PATH = "usercache/%s/appcache/%s/";
+  private final String TEST_USER = "testUser";
+  private final String TEST_APP = "testApp";
+  private final String BLOCK_MANAGER_DIR = "blockmgr-193d8401";
+
+  private TransportConf conf;
+  private RemoteBlockPushResolver pushResolver;
+  private String[] localDirs;
+
+  @Before
+  public void before() throws IOException {
+    localDirs = new String[]{Paths.get("target/l1").toAbsolutePath().toString(),
+      Paths.get("target/l2").toAbsolutePath().toString()};
+    cleanupLocalDirs();
+    MapConfigProvider provider = new MapConfigProvider(
+      ImmutableMap.of("spark.shuffle.server.minChunkSizeInMergedShuffleFile", "4"));
+    conf = new TransportConf("shuffle", provider);
+    pushResolver = new RemoteBlockPushResolver(conf, MERGE_DIR_RELATIVE_PATH);
+    registerApplication(TEST_APP, TEST_USER);
+    registerExecutor(TEST_APP, prepareBlockManagerLocalDirs(TEST_APP, TEST_USER, localDirs));
+  }
+
+  @After
+  public void after() {
+    try {
+      cleanupLocalDirs();
+      removeApplication(TEST_APP);
+    } catch (Exception e) {
+      // don't fail if clean up doesn't succeed.
+      log.debug("Error while tearing down", e);
+    }
+  }
+
+  private void cleanupLocalDirs() throws IOException {
+    for (String local : localDirs) {
+      FileUtils.deleteDirectory(new File(local));
+    }
+  }
+
+  @Test(expected = RuntimeException.class)
+  public void testNoIndexFile() {
+    try {
+      pushResolver.getMergedBlockMeta(TEST_APP, 0, 0);
+    } catch (Throwable t) {
+      assertTrue(t.getMessage().startsWith("Merged shuffle index file"));
+      Throwables.propagate(t);
+    }
+  }
+
+  @Test
+  public void testBasicBlockMerge() throws IOException {
+    PushBlockStream[] pushBlocks = new PushBlockStream[] {
+      new PushBlockStream(TEST_APP, "shuffle_0_0_0", 0),
+      new PushBlockStream(TEST_APP, "shuffle_0_1_0", 0),
+    };
+    ByteBuffer[] blocks = new ByteBuffer[]{
+      ByteBuffer.wrap(new byte[4]),
+      ByteBuffer.wrap(new byte[5])
+    };
+    pushBlockHelper(TEST_APP, pushBlocks, blocks);
+    MergedBlockMeta blockMeta = pushResolver.getMergedBlockMeta(TEST_APP, 0, 0);
+    validateChunks(TEST_APP, 0, 0, blockMeta, new int[]{4, 5}, new int[][]{{0}, {1}});
+  }
+
+  @Test
+  public void testDividingMergedBlocksIntoChunks() throws IOException {
+    PushBlockStream[] pushBlocks = new PushBlockStream[] {
+      new PushBlockStream(TEST_APP, "shuffle_0_0_0", 0),
+      new PushBlockStream(TEST_APP, "shuffle_0_1_0", 0),
+      new PushBlockStream(TEST_APP, "shuffle_0_2_0", 0),
+      new PushBlockStream(TEST_APP, "shuffle_0_3_0", 0),
+    };
+
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
HeartSaVioR commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513180738

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala ##

@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */
   private val relevantSQLConfs = Seq(
     SHUFFLE_PARTITIONS, STATE_STORE_PROVIDER_CLASS, STREAMING_MULTIPLE_WATERMARK_POLICY,
     FLATMAPGROUPSWITHSTATE_STATE_FORMAT_VERSION, STREAMING_AGGREGATION_STATE_FORMAT_VERSION,
-    STREAMING_JOIN_STATE_FORMAT_VERSION)
+    STREAMING_JOIN_STATE_FORMAT_VERSION, STATE_STORE_COMPRESSION_CODEC)

Review comment:
I'd also recommend adding the default value ("lz4") to `relevantSQLConfDefaultValues` to guard against possible mistakes. (That's a sort of defensive programming, though.)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs
SparkQA commented on pull request #29247: URL: https://github.com/apache/spark/pull/29247#issuecomment-717694392 **[Test build #130353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130353/testReport)** for PR 29247 at commit [`1854e74`](https://github.com/apache/spark/commit/1854e7465d5309a382e118bfe3fe7ada5023ef52).
[GitHub] [spark] SparkQA commented on pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
SparkQA commented on pull request #30167: URL: https://github.com/apache/spark/pull/30167#issuecomment-717694405 **[Test build #130352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130352/testReport)** for PR 30167 at commit [`f6550d0`](https://github.com/apache/spark/commit/f6550d07f201e4f23cdc40e049aa7f550eabdf2b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs
AmplabJenkins removed a comment on pull request #29247: URL: https://github.com/apache/spark/pull/29247#issuecomment-664032815 Can one of the admins verify this patch?
[GitHub] [spark] gengliangwang commented on pull request #29247: [SPARK-32446][SHS] Add new executor metrics summary REST APIs
gengliangwang commented on pull request #29247: URL: https://github.com/apache/spark/pull/29247#issuecomment-717693646
[GitHub] [spark] HeartSaVioR opened a new pull request #30167: [SPARK-33240][SQL][3.0] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR opened a new pull request #30167:
URL: https://github.com/apache/spark/pull/30167

### What changes were proposed in this pull request?

This patch proposes to make Spark fail fast when it fails to instantiate the configured v2 session catalog.

### Why are the changes needed?

The current Spark behavior goes against the intention of end users: if they configure a session catalog that Spark fails to initialize, Spark swallows the error, only logs the error message, and silently uses the default catalog implementation.

This follows the opinions voiced in the [discussion thread](https://lists.apache.org/thread.html/rdfa22a5ebdc4ac66e2c5c8ff0cd9d750e8a1690cd6fb456d119c2400%40%3Cdev.spark.apache.org%3E) on the dev mailing list.

### Does this PR introduce _any_ user-facing change?

Yes. After this PR, Spark fails immediately if it cannot instantiate the configured session catalog.

### How was this patch tested?

New UT added.
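The difference between the old and the proposed behavior can be illustrated with a minimal Java sketch. This is not Spark's implementation: the `Catalog` interface, `DefaultCatalog`, and the two loader methods are hypothetical names introduced here only to contrast "log and silently fall back" with "fail fast".

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are Spark classes.
final class SessionCatalogLoadingSketch {

  /** Stand-in for a catalog plugin. */
  interface Catalog {
    String name();
  }

  /** Stand-in for the built-in default catalog implementation. */
  static final class DefaultCatalog implements Catalog {
    @Override
    public String name() {
      return "default";
    }
  }

  /** Lenient behavior: swallow the error, log it, and use the default. */
  static Catalog loadLenient(Supplier<Catalog> configured) {
    try {
      return configured.get();
    } catch (RuntimeException e) {
      System.err.println("Cannot instantiate configured catalog, using default: " + e.getMessage());
      return new DefaultCatalog();
    }
  }

  /** Fail-fast behavior: surface the error to the caller immediately. */
  static Catalog loadFailFast(Supplier<Catalog> configured) {
    try {
      return configured.get();
    } catch (RuntimeException e) {
      throw new IllegalStateException("Cannot instantiate configured session catalog", e);
    }
  }

  private SessionCatalogLoadingSketch() {}
}
```

With a supplier that throws, `loadLenient` hands back the default catalog and the misconfiguration goes unnoticed unless someone reads the log, whereas `loadFailFast` raises an exception that carries the original cause.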
[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717692702 I'll submit a PR for the 3.0 branch as well.
[GitHub] [spark] SparkQA commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
SparkQA commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717692390 **[Test build #130351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130351/testReport)** for PR 30162 at commit [`86f0924`](https://github.com/apache/spark/commit/86f09243816112112addb8cf18c97118a983f434).
[GitHub] [spark] viirya commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
viirya commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513177486

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala ##

@@ -89,10 +89,15 @@ case class OffsetSeqMetadata(
 object OffsetSeqMetadata extends Logging {
   private implicit val format = Serialization.formats(NoTypeHints)
+  /**
+   * These configs are related to streaming query execution and should not be changed across
+   * batches of a streaming query. The values of these configs are persisted into the offset
+   * log in the checkpoint position.
+   */

Review comment:
Added some comments to notify readers.
[GitHub] [spark] HeartSaVioR commented on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table
HeartSaVioR commented on pull request #30148: URL: https://github.com/apache/spark/pull/30148#issuecomment-717690813 Let's fix it to remove any confusion, then: let's ensure that neither `spark.read.table` nor `spark.table` can handle a streaming table (even if it comes from a temp view), so end users have to use `spark.readStream.table`.
[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
viirya commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717689524 @HeartSaVioR Makes sense. Looks like we need to put the config into `relevantSQLConfs`.
[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
HeartSaVioR commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717687791 @dongjoon-hyun @viirya The problem isn't about changing the config during a single run. The problem is about changing the config in a new run that restarts from the checkpoint. That's why we should record the config on the first run, then always read the value from the checkpoint and ignore any later modification.
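The restart semantics described in this comment can be sketched in plain Java. This is an illustration under stated assumptions, not the code in `OffsetSeqMetadata`: the class `CheckpointConfSketch`, the method `resolve`, and the map `DEFAULTS_FOR_OLD_CHECKPOINTS` are hypothetical, and the lookup order is an assumption. Only the config key and the "lz4" default come from the thread.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of "record on first run, then always read from the checkpoint".
final class CheckpointConfSketch {

  static final String CODEC_KEY = "spark.sql.streaming.stateStore.compression.codec";

  // Assumed fallback for checkpoints written before the key was tracked.
  static final Map<String, String> DEFAULTS_FOR_OLD_CHECKPOINTS =
      Collections.singletonMap(CODEC_KEY, "lz4");

  /**
   * Returns the value a run should use for the given key.
   *
   * @param checkpointConf configs persisted in the checkpoint, or null on the first run
   * @param sessionConf    configs set by the user for the current run
   */
  static String resolve(
      String key, Map<String, String> checkpointConf, Map<String, String> sessionConf) {
    if (checkpointConf == null) {
      // First run: nothing persisted yet, so the session value is used (and would be recorded).
      return sessionConf.get(key);
    }
    String persisted = checkpointConf.get(key);
    if (persisted != null) {
      // Restart: the persisted value wins over any later modification.
      return persisted;
    }
    String fallback = DEFAULTS_FOR_OLD_CHECKPOINTS.get(key);
    if (fallback != null) {
      // Old checkpoint without the key: assume it was written with the default.
      return fallback;
    }
    return sessionConf.get(key);
  }

  private CheckpointConfSketch() {}
}
```

Under these assumptions, a query started with "zstd" and restarted with "snappy" keeps using "zstd", and a checkpoint that predates the config resolves to "lz4" regardless of the session value, which is the case the suggested default value is meant to cover.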
[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717685222

> https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115
>
> Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.

Hmm, since the default value is lz4, do we still need to add the new config here?
[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717684847

> @viirya and @HeartSaVioR .
> Shall we put the new config into `StaticSQLConf.scala` instead of `SQLConf.scala`? I think that is enough.

Yes, that is also what I think. Will update later.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717682965 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130346/ Test FAILed.
[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
HeartSaVioR commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717683158 https://github.com/apache/spark/blob/fcf8aa59b5025dde9b4af36953146894659967e2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L92-L115 Please look into how `relevantSQLConfs` is handled in OffsetSeqMetadata.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
dongjoon-hyun commented on a change in pull request #30162:
URL: https://github.com/apache/spark/pull/30162#discussion_r513168650

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -1324,6 +1324,16 @@ object SQLConf {
       .intConf
       .createWithDefault(2)

+  val STATE_STORE_COMPRESSION_CODEC =
+    buildConf("spark.sql.streaming.stateStore.compression.codec")
+      .internal()
+      .doc("The codec used to compress delta and snapshot files generated by StateStore. " +
+        "By default, Spark provides four codecs: lz4, lzf, snappy, and zstd. You can also " +
+        "use fully qualified class names to specify the codec. Default codec is lz4.")
+      .version("3.1.0")
+      .stringConf
+      .createWithDefault("lz4")

Review comment:
This doesn't change the default value. I believe we can add the benchmark result as a follow-up, @HeartSaVioR .
[GitHub] [spark] AmplabJenkins commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717682955
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717682955 Merged build finished. Test FAILed.
[GitHub] [spark] dongjoon-hyun commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
dongjoon-hyun commented on pull request #30162: URL: https://github.com/apache/spark/pull/30162#issuecomment-717682587 @viirya and @HeartSaVioR . Shall we put the new config into `StaticSQLConf.scala` instead of `SQLConf.scala`? I think that is enough.
[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717682590 **[Test build #130346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130346/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717645869 **[Test build #130346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130346/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).
[GitHub] [spark] HeartSaVioR commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
HeartSaVioR commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717681783 Thanks for reviewing and merging!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th
AmplabJenkins removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717681498
[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem
AmplabJenkins commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717681498
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717681485 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34951/
[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem
AmplabJenkins commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717680723
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th
AmplabJenkins removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717680723
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717674157 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717674160 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34950/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717680708 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34952/
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717675078 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34952/
[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717674157
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717674144 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34950/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
cloud-fan closed pull request #30147: URL: https://github.com/apache/spark/pull/30147 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #30147: [SPARK-33240][SQL] Fail fast when fails to instantiate configured v2 session catalog
cloud-fan commented on pull request #30147: URL: https://github.com/apache/spark/pull/30147#issuecomment-717673091 thanks, merging to master! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717672685 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34951/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
AmplabJenkins removed a comment on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717668190 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34949/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
AmplabJenkins removed a comment on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717668186 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
AmplabJenkins commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717668186 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
SparkQA commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717668172 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r513153232

## File path: sql/core/src/test/resources/sql-tests/results/window.sql.out ##
@@ -479,6 +479,38 @@
 Anthony Bow  6627  Gerard Bondur
 Leslie Thompson  5186  Gerard Bondur
+-- !query
+SELECT
+employee_name,
+salary,
+nth_value(employee_name, 2) OVER (
+  ORDER BY salary DESC
+  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) second_highest_salary
+FROM
+basic_pays
+ORDER BY salary DESC
+-- !query schema
+struct
+-- !query output
+Larry Bott  11798  NULL
+Gerard Bondur  11472  Gerard Bondur
+Pamela Castillo  11303  Gerard Bondur

Review comment:
The output comes from `hiveResultString`
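The expected output quoted above (NULL for the first row, then `Gerard Bondur` for every later row) follows from the frame `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`: the frame for the first row holds only one row, so there is no second value yet. A minimal sketch of that frame semantics in plain Scala, without Spark; `NthValueSketch` and `nthValueRunning` are hypothetical names used only for illustration, not part of the PR:

```scala
object NthValueSketch {
  // Input rows are assumed to be already sorted (here: by salary, descending).
  // For row i the frame is rows 0..i, i.e. UNBOUNDED PRECEDING to CURRENT ROW.
  // The result is the n-th row (1-based) of that frame, or None (SQL NULL)
  // while the frame still has fewer than n rows.
  def nthValueRunning[A](sortedRows: Seq[A], n: Int): Seq[Option[A]] = {
    require(n >= 1, "n is 1-based")
    sortedRows.indices.map { i =>
      if (i + 1 >= n) Some(sortedRows(n - 1)) else None
    }
  }
}
```

Calling `NthValueSketch.nthValueRunning(Seq("Larry Bott", "Gerard Bondur", "Pamela Castillo"), 2)` yields `None` for the first row and `Some("Gerard Bondur")` for the other two, matching the three output rows shown in the diff.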
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-71702 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34950/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec
viirya commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717666167

> either 1) make it as a configuration but prevent the value to be changed after the query starts (like we do in state store formats)

Btw, out of curiosity, I checked the two state format configs `spark.sql.streaming.join.stateFormatVersion` and `spark.sql.streaming.aggregation.stateFormatVersion`. Their docs only state that `state format version shouldn't be modified after running.`, and I couldn't find any related code that prevents the values from being changed after the query starts. Maybe I missed it.
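For illustration, one way to "prevent the value to be changed after the query starts" is to record the value when the query first runs and let that recorded value override the session value on restart. The plain-Scala sketch below shows only that idea. It makes no claim about how Spark treats these two configs, and `PinnedConfSketch`, `effectiveConf`, and `pinnedKeys` are hypothetical names:

```scala
object PinnedConfSketch {
  // Hypothetical helper: for keys listed in pinnedKeys, the value recorded when
  // the query first started (checkpointed) wins over the current session value.
  // All other session settings pass through unchanged.
  def effectiveConf(
      checkpointed: Map[String, String],
      session: Map[String, String],
      pinnedKeys: Set[String]): Map[String, String] =
    session ++ checkpointed.filter { case (k, _) => pinnedKeys.contains(k) }
}
```

With this shape, a user who edits a pinned key between runs sees the edit silently ignored; a stricter variant could throw instead when the two values differ.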
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717664437 **[Test build #130350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130350/testReport)** for PR 30156 at commit [`19f`](https://github.com/apache/spark/commit/19f6af09fa9d612592ecb86b442efcd6f738). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717662046 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34948/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717662042 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717662099 **[Test build #130349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130349/testReport)** for PR 30156 at commit [`db7d53a`](https://github.com/apache/spark/commit/db7d53a096149c91cb4376ae890bad1eaa2db679). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717662031 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34948/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less
AngersZhuuuu commented on a change in pull request #30156:
URL: https://github.com/apache/spark/pull/30156#discussion_r513148800

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -2748,6 +2748,16 @@ object SQLConf {
       .checkValue(_ > 0, "The timeout value must be positive")
       .createWithDefault(10L)

+  val LEGACY_SCRIPT_TRANSFORM_PAD_NULL =
+    buildConf("spark.sql.legacy.transformationPadNullWhenValueLessThenSchema")
+      .internal()
+      .doc("Whether pad null value when transformation output value size less then schema size." +
+        "When true, we pad NULL value to keep same behavior with hive." +
+        "When false, we keep original behavior to throw `ArrayIndexOutOfBoundsException`")

Review comment:
> `we` -> `Spark`

Done
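The two behaviors described in the config doc above can be sketched in plain Scala, without Spark. This is only an illustration of the described semantics, not the code in the PR; `PadNullSketch`, `toRow`, and the `padNull` parameter are hypothetical names standing in for the script output fields and the config flag:

```scala
object PadNullSketch {
  // fields: the values produced by the script for one output line.
  // schemaSize: the number of columns the declared output schema expects.
  // padNull = true: missing trailing columns become None (SQL NULL).
  // padNull = false: reading a missing column indexes past the end of the
  // array, which throws ArrayIndexOutOfBoundsException on the JVM.
  def toRow(fields: Array[String], schemaSize: Int, padNull: Boolean): Seq[Option[String]] =
    (0 until schemaSize).map { i =>
      if (padNull && i >= fields.length) None
      else Some(fields(i))
    }
}
```

For example, `PadNullSketch.toRow(Array("a", "b"), 3, padNull = true)` returns two values followed by `None`, while the same call with `padNull = false` throws.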
[GitHub] [spark] AmplabJenkins commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717662042 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less
AngersZhuuuu commented on a change in pull request #30156:
URL: https://github.com/apache/spark/pull/30156#discussion_r513148844

## File path: docs/sql-migration-guide.md ##
@@ -49,6 +49,8 @@ license: |
   - In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate your custom SerDes to Hive 2.3. See [HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.

   - In Spark 3.1, loading and saving of timestamps from/to parquet files fails if the timestamps are before 1900-01-01 00:00:00Z, and loaded (saved) as the INT96 type. In Spark 3.0, the actions don't fail but might lead to shifting of the input timestamps due to rebasing from/to Julian to/from Proleptic Gregorian calendar. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.parquet.int96RebaseModeInRead` or/and `spark.sql.legacy.parquet.int96RebaseModeInWrite` to `LEGACY`.
+
+  - In Spark 3.1, when `spark.sql.legacy.transformationPadNullWhenValueLessThenSchema` is true, Spark will pad NULL value when scrip transformation's output value size less then schema size in default-serde mode. If false, we will keep original behavior to throw `ArrayIndexOutOfBoundsException`.

Review comment:
> `scrip` -> `script`. Could we a bit more elaborate about "default-serde mode"?
>
> `we will keep original behavior to throw ...` -> `Spark will keep original behavior to throw ...`

Done
[GitHub] [spark] SparkQA commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
SparkQA commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-717661324 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34949/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-717659086 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34948/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717658158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717658158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717606710 **[Test build #130342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130342/testReport)** for PR 30139 at commit [`53b3c0b`](https://github.com/apache/spark/commit/53b3c0bb90d4b22e4994ea44a659026c4ba0c93c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717657234 **[Test build #130342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130342/testReport)** for PR 30139 at commit [`53b3c0b`](https://github.com/apache/spark/commit/53b3c0bb90d4b22e4994ea44a659026c4ba0c93c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717652952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34947/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents
maropu commented on pull request #30165: URL: https://github.com/apache/spark/pull/30165#issuecomment-717653139 Thanks! Merged to master/3.0. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins commented on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717652945 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717652916 **[Test build #130348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130348/testReport)** for PR 29800 at commit [`1c0e82b`](https://github.com/apache/spark/commit/1c0e82ba0ab95af7871b4b16f8be1e0662d49c78). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu closed pull request #30165: [SPARK-33264][SQL][DOCS] Add a dedicated page for SQL-on-file in SQL documents
maropu closed pull request #30165: URL: https://github.com/apache/spark/pull/30165 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
AmplabJenkins removed a comment on pull request #26935: URL: https://github.com/apache/spark/pull/26935#issuecomment-717652945 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org