[GitHub] [spark] beliefer commented on pull request #34554: [SPARK-37286][SQL] Move compileFilter and compileAggregates from JDBCRDD to JdbcDialect
beliefer commented on pull request #34554: URL: https://github.com/apache/spark/pull/34554#issuecomment-984376002 Based on discussion offline, reopen this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer opened a new pull request #34554: [SPARK-37286][SQL] Move compileFilter and compileAggregates from JDBCRDD to JdbcDialect
beliefer opened a new pull request #34554: URL: https://github.com/apache/spark/pull/34554

### What changes were proposed in this pull request?
Currently, the methods `compileFilter` and `compileAggregates` are members of `JDBCRDD`. That is not a reasonable place for them, because each JDBC source knows best how to compile filter and aggregate expressions into its own dialect. This PR moves both methods to `JdbcDialect`.

### Why are the changes needed?
Each JDBC source knows best how to compile filter and aggregate expressions into its own dialect. After this PR, we can extend the pushdown (e.g. filter, aggregate) per dialect to account for differences between JDBC databases. There are two situations: first, databases A and B may implement different numbers of the aggregate functions defined by the SQL standard; second, some databases implement aggregate functions that are not part of the SQL standard.

### Does this PR introduce _any_ user-facing change?
No. It only changes the internal implementation.

### How was this patch tested?
Jenkins tests.
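To illustrate the motivation in the PR description above, here is a minimal sketch of per-dialect filter compilation. It assumes simplified stand-in types: `Filter`, `EqualTo`, `StringStartsWith`, `And`, `Dialect`, `GenericDialect` and `BacktickDialect` are all hypothetical names for this example and are not Spark's actual classes or signatures.

```scala
// Hypothetical, simplified stand-ins; not Spark's real Filter or JdbcDialect types.
sealed trait Filter
final case class EqualTo(column: String, value: Any) extends Filter
final case class StringStartsWith(column: String, prefix: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

trait Dialect {
  // Default identifier quoting: double quotes.
  def quoteIdentifier(name: String): String = "\"" + name + "\""

  // Render a literal; strings are single-quoted with embedded quotes doubled.
  def compileValue(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other => other.toString
  }

  // Returns None when the filter cannot be pushed down by this dialect.
  def compileFilter(f: Filter): Option[String] = f match {
    case EqualTo(col, v) =>
      Some(quoteIdentifier(col) + " = " + compileValue(v))
    case And(l, r) =>
      // Both sides must compile, otherwise the conjunction is not pushed down.
      for (a <- compileFilter(l); b <- compileFilter(r)) yield s"($a) AND ($b)"
    case _ => None
  }
}

object GenericDialect extends Dialect

// A hypothetical dialect that quotes with backticks and also pushes down prefix matches.
// For brevity this does not escape LIKE wildcards inside the prefix.
object BacktickDialect extends Dialect {
  override def quoteIdentifier(name: String): String = "`" + name + "`"

  override def compileFilter(f: Filter): Option[String] = f match {
    case StringStartsWith(col, prefix) =>
      Some(quoteIdentifier(col) + " LIKE " + compileValue(prefix + "%"))
    case other => super.compileFilter(other)
  }
}
```

Because `compileFilter` lives on the dialect, the generic dialect declines the prefix filter while the specialized one compiles it, and nested filters pick up the override through ordinary dynamic dispatch.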
[GitHub] [spark] SparkQA commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
SparkQA commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984366546 **[Test build #145849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145849/testReport)** for PR 34701 at commit [`e479e5c`](https://github.com/apache/spark/commit/e479e5c2088510f88f62e6baa89ebbd18ff135af).
[GitHub] [spark] viirya commented on a change in pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
viirya commented on a change in pull request #34701: URL: https://github.com/apache/spark/pull/34701#discussion_r760827034

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ##
@@ -2162,6 +2163,48 @@ object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] {
   }
 }
+/**
+ * Prunes unnecessary fields from a [[Generate]] if it is under a project which does not refer
+ * any generated attributes, .e.g., count-like aggregation on an exploded array.
+ */
+object GenerateOptimization extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+    _.containsAllPatterns(PROJECT, GENERATE), ruleId) {
+    case p @ Project(_, g: Generate) if p.references.isEmpty
+      && g.generator.isInstanceOf[ExplodeBase] =>
+      g.generator.children.head.dataType match {
+        case ArrayType(StructType(fields), _) =>
+          val atomicFields = fields.collect {
+            case f: StructField if f.dataType.isInstanceOf[AtomicType] => f
+          }
+          val extractor = if (atomicFields.size > 0) {
+            // Pick an arbitrary atomic field, if any
+            ExtractValue(g.generator.children.head,

Review comment: okay, updated to use `GetArrayStructFields`.
[GitHub] [spark] viirya commented on a change in pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
viirya commented on a change in pull request #34701: URL: https://github.com/apache/spark/pull/34701#discussion_r760826896

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ##
@@ -2162,6 +2163,48 @@ object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] {
   }
 }
+/**
+ * Prunes unnecessary fields from a [[Generate]] if it is under a project which does not refer
+ * any generated attributes, .e.g., count-like aggregation on an exploded array.
+ */
+object GenerateOptimization extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning(
+    _.containsAllPatterns(PROJECT, GENERATE), ruleId) {
+    case p @ Project(_, g: Generate) if p.references.isEmpty
+      && g.generator.isInstanceOf[ExplodeBase] =>
+      g.generator.children.head.dataType match {
+        case ArrayType(StructType(fields), _) =>
+          val atomicFields = fields.collect {
+            case f: StructField if f.dataType.isInstanceOf[AtomicType] => f
+          }
+          val extractor = if (atomicFields.size > 0) {
+            // Pick an arbitrary atomic field, if any

Review comment: good idea. updated.
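The idea behind the rule quoted in this review thread can be sketched with plain Scala collections: when nothing downstream reads the exploded values, only the number of array elements matters, so each struct can be narrowed to a single cheap field without changing a count. This is only an illustration under that assumption; `Item`, `ExplodeCountSketch` and the sample rows are hypothetical and do not use Spark's Catalyst API.

```scala
// Hypothetical element type standing in for a struct with several fields.
final case class Item(id: Int, name: String, tags: Seq[String])

object ExplodeCountSketch {
  // Each outer element is one input row holding an array of structs.
  val rows: Seq[Seq[Item]] = Seq(
    Seq(Item(1, "a", Seq("x")), Item(2, "b", Nil)),
    Seq.empty,
    Seq(Item(3, "c", Seq("y", "z")))
  )

  // Count over the exploded full structs.
  val fullCount: Int = rows.flatten.size

  // Narrow every array to one atomic field before exploding; the array
  // lengths are preserved, so the exploded row count is the same.
  val narrowed: Seq[Seq[Int]] = rows.map(_.map(_.id))
  val narrowedCount: Int = narrowed.flatten.size
}
```

Both counts agree, while the narrowed arrays carry far less data than the original structs.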
[GitHub] [spark] SparkQA commented on pull request #34764: [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command
SparkQA commented on pull request #34764: URL: https://github.com/apache/spark/pull/34764#issuecomment-984361799 **[Test build #145848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145848/testReport)** for PR 34764 at commit [`ec966bf`](https://github.com/apache/spark/commit/ec966bfc901290d129ab7f6f076b6708eb71463f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
AmplabJenkins removed a comment on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984360337 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50320/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
AmplabJenkins removed a comment on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984360338 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50319/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
AmplabJenkins removed a comment on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984360344 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50317/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
AmplabJenkins removed a comment on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984360339 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50315/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
AmplabJenkins removed a comment on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984360335 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50318/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
AmplabJenkins removed a comment on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984360334
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
AmplabJenkins removed a comment on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984360340 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145846/
[GitHub] [spark] AmplabJenkins commented on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
AmplabJenkins commented on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984360340 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145846/
[GitHub] [spark] AmplabJenkins commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
AmplabJenkins commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984360336
[GitHub] [spark] AmplabJenkins commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
AmplabJenkins commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984360344 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50317/
[GitHub] [spark] AmplabJenkins commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
AmplabJenkins commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984360337 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50320/
[GitHub] [spark] AmplabJenkins commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
AmplabJenkins commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984360338 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50319/
[GitHub] [spark] AmplabJenkins commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
AmplabJenkins commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984360335 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50318/
[GitHub] [spark] AmplabJenkins commented on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
AmplabJenkins commented on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984360339 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50315/
[GitHub] [spark] SparkQA commented on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
SparkQA commented on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984358043 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50315/
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984354840 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50322/
[GitHub] [spark] SparkQA commented on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
SparkQA commented on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984353579 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50321/
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984353353 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50316/
[GitHub] [spark] SparkQA commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
SparkQA commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984351389 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50318/
[GitHub] [spark] itholic commented on pull request #34772: [SPARK-37514][PYTHON] Remove workarounds due to older pandas
itholic commented on pull request #34772: URL: https://github.com/apache/spark/pull/34772#issuecomment-984350857 Clean!
[GitHub] [spark] SparkQA commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
SparkQA commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984350247 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50317/
[GitHub] [spark] SparkQA removed a comment on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
SparkQA removed a comment on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984332768 **[Test build #145847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145847/testReport)** for PR 34776 at commit [`acb7c55`](https://github.com/apache/spark/commit/acb7c55efbccab5bc3225d83b7a65c6d45ef8926).
[GitHub] [spark] SparkQA removed a comment on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
SparkQA removed a comment on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984332719 **[Test build #145846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145846/testReport)** for PR 34778 at commit [`73a84fc`](https://github.com/apache/spark/commit/73a84fccdcfcc84cf68e8a73ee830dd0c4522627).
[GitHub] [spark] SparkQA commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
SparkQA commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984348042 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50320/
[GitHub] [spark] SparkQA commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
SparkQA commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984347565 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50319/
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984345366 **[Test build #145847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145847/testReport)** for PR 34776 at commit [`acb7c55`](https://github.com/apache/spark/commit/acb7c55efbccab5bc3225d83b7a65c6d45ef8926). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
SparkQA commented on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984345256 **[Test build #145846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145846/testReport)** for PR 34778 at commit [`73a84fc`](https://github.com/apache/spark/commit/73a84fccdcfcc84cf68e8a73ee830dd0c4522627). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dchvn commented on a change in pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
dchvn commented on a change in pull request #34667: URL: https://github.com/apache/spark/pull/34667#discussion_r760805798

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ##
@@ -323,18 +323,25 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       provider match {
         case supportsExtract: SupportsCatalogOptions =>
           val ident = supportsExtract.extractIdentifier(dsOptions)
-          val catalog = CatalogV2Util.getTableProviderCatalog(

Review comment: `CatalogV2Util.getTableProviderCatalog.name` returns `spark_catalog` when we save with the default catalog, which is the cause of the failing test in `SupportsCatalogOptionsSuite`:

```scala
test(s"save works with ErrorIfExists - no table, no partitioning, session catalog") {
  testCreateAndRead(SaveMode.ErrorIfExists, None, Nil)
}
```

What we need here is the `default` namespace obtained from `ident.namespaces`, which satisfies the test.
[GitHub] [spark] HyukjinKwon commented on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
HyukjinKwon commented on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984333910 Thanks @dchvn !
[GitHub] [spark] dchvn commented on a change in pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
dchvn commented on a change in pull request #34778: URL: https://github.com/apache/spark/pull/34778#discussion_r760795723

## File path: python/pyspark/pandas/tests/test_dataframe.py ##

@@ -5870,8 +5870,12 @@ def test_cov(self):
         self.assert_eq(pdf.cov(min_periods=5), psdf.cov(min_periods=5))

         # extension dtype
-        numeric_dtypes = ["Int8", "Int16", "Int32", "Int64", "Float32", "Float64", "float"]
-        boolean_dtypes = ["boolean", "bool"]
+        if LooseVersion(pd.__version__) >= LooseVersion("1.2"):

Review comment:
With pandas < 1.2, `pd.DataFrame.cov` does not work with extension dtypes whose missing values are `NAType`, so I am fixing only the test.

```python
>>> pd.__version__
'1.0.5'
>>> pdf = pd.DataFrame([[1,2],[None, 3]], dtype="Int64")
>>> pdf
      0  1
0     1  2
1  <NA>  3
>>> pdf.cov()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u02/venv/python3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 7608, in cov
    baseCov = libalgos.nancorr(ensure_float64(mat), cov=True, minp=min_periods)
  File "pandas/_libs/algos_common_helper.pxi", line 41, in pandas._libs.algos.ensure_float64
TypeError: float() argument must be a string or a number, not 'NAType'
```
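The version gate in the diff quoted above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `version_tuple` and `dtypes_for_cov_test` are hypothetical, the version string is assumed to be a simple release string such as `1.0.5`, and the `else` branch (keeping only the plain NumPy dtypes on older pandas) is a guess, since the diff shows only the `if` line.

```python
from typing import List, Tuple


def version_tuple(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a simple release string such as '1.0.5'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def dtypes_for_cov_test(pandas_version: str) -> Tuple[List[str], List[str]]:
    """Pick the dtypes a cov test would exercise for a given pandas version."""
    if version_tuple(pandas_version) >= (1, 2):
        # Lists taken from the lines removed in the diff.
        numeric_dtypes = ["Int8", "Int16", "Int32", "Int64", "Float32", "Float64", "float"]
        boolean_dtypes = ["boolean", "bool"]
    else:
        # Assumption: on older pandas only the plain NumPy dtypes are kept,
        # because cov() fails on extension dtypes there.
        numeric_dtypes = ["float"]
        boolean_dtypes = ["bool"]
    return numeric_dtypes, boolean_dtypes
```

Comparing integer tuples rather than raw strings avoids the pitfall where `"1.10"` sorts before `"1.2"` lexicographically.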
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation (from Series/Index) and TimedeltaIndex.astype
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984332768 **[Test build #145847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145847/testReport)** for PR 34776 at commit [`acb7c55`](https://github.com/apache/spark/commit/acb7c55efbccab5bc3225d83b7a65c6d45ef8926).
[GitHub] [spark] SparkQA commented on pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
SparkQA commented on pull request #34778: URL: https://github.com/apache/spark/pull/34778#issuecomment-984332719 **[Test build #145846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145846/testReport)** for PR 34778 at commit [`73a84fc`](https://github.com/apache/spark/commit/73a84fccdcfcc84cf68e8a73ee830dd0c4522627).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
AmplabJenkins removed a comment on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984331468 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145843/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)
AmplabJenkins removed a comment on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-984331466 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50314/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34753: [SPARK-37494][SQL] Unify v1 and v2 options output of `SHOW CREATE TABLE` command
AmplabJenkins removed a comment on pull request #34753: URL: https://github.com/apache/spark/pull/34753#issuecomment-984331465 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145833/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
AmplabJenkins removed a comment on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984331467 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145842/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution
AmplabJenkins removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984331423 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145834/
[GitHub] [spark] AmplabJenkins commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution
AmplabJenkins commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984331423 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145834/
[GitHub] [spark] AmplabJenkins commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
AmplabJenkins commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984331467 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145842/
[GitHub] [spark] AmplabJenkins commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
AmplabJenkins commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984331468 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145843/
[GitHub] [spark] AmplabJenkins commented on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)
AmplabJenkins commented on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-984331466 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50314/
[GitHub] [spark] AmplabJenkins commented on pull request #34753: [SPARK-37494][SQL] Unify v1 and v2 options output of `SHOW CREATE TABLE` command
AmplabJenkins commented on pull request #34753: URL: https://github.com/apache/spark/pull/34753#issuecomment-984331465 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145833/
[GitHub] [spark] SparkQA removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution
SparkQA removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984210804 **[Test build #145834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145834/testReport)** for PR 32875 at commit [`0345612`](https://github.com/apache/spark/commit/0345612b5ab9347e8b14eec243ec332f98765ab5).
[GitHub] [spark] SparkQA commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution
SparkQA commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984330466 **[Test build #145834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145834/testReport)** for PR 32875 at commit [`0345612`](https://github.com/apache/spark/commit/0345612b5ab9347e8b14eec243ec332f98765ab5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)
SparkQA commented on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-984329538 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50314/
[GitHub] [spark] SparkQA commented on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
SparkQA commented on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984329506 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50315/
[GitHub] [spark] SparkQA commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
SparkQA commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984328332 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50319/
[GitHub] [spark] SparkQA commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
SparkQA commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984328016 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50320/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
HyukjinKwon commented on a change in pull request #34778: URL: https://github.com/apache/spark/pull/34778#discussion_r760789523

## File path: python/pyspark/pandas/tests/test_dataframe.py ##

@@ -5870,8 +5870,12 @@ def test_cov(self):
         self.assert_eq(pdf.cov(min_periods=5), psdf.cov(min_periods=5))

         # extension dtype
-        numeric_dtypes = ["Int8", "Int16", "Int32", "Int64", "Float32", "Float64", "float"]
-        boolean_dtypes = ["boolean", "bool"]
+        if LooseVersion(pd.__version__) >= LooseVersion("1.2"):

Review comment:
@dchvn just to clarify: does the cov API only work correctly with pandas 1.2+, or are you fixing only the tests?
[GitHub] [spark] SparkQA commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
SparkQA commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984325655 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50317/
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984325276 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50316/
[GitHub] [spark] SparkQA commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
SparkQA commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984324189 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50318/
[GitHub] [spark] SparkQA removed a comment on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
SparkQA removed a comment on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984307383 **[Test build #145843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145843/testReport)** for PR 34774 at commit [`b14db3d`](https://github.com/apache/spark/commit/b14db3d31491cdb85401046371613912b99b84dd).
[GitHub] [spark] SparkQA removed a comment on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
SparkQA removed a comment on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984307321 **[Test build #145842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145842/testReport)** for PR 34775 at commit [`4682604`](https://github.com/apache/spark/commit/4682604d7628afc5ab855a1cefb8d5bb8e64004d).
[GitHub] [spark] SparkQA removed a comment on pull request #34753: [SPARK-37494][SQL] Unify v1 and v2 options output of `SHOW CREATE TABLE` command
SparkQA removed a comment on pull request #34753: URL: https://github.com/apache/spark/pull/34753#issuecomment-984210101 **[Test build #145833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145833/testReport)** for PR 34753 at commit [`7771b14`](https://github.com/apache/spark/commit/7771b14e20a374bf638dd2abd3dc8ba74e14c3c2).
[GitHub] [spark] SparkQA commented on pull request #34753: [SPARK-37494][SQL] Unify v1 and v2 options output of `SHOW CREATE TABLE` command
SparkQA commented on pull request #34753: URL: https://github.com/apache/spark/pull/34753#issuecomment-984321198 **[Test build #145833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145833/testReport)** for PR 34753 at commit [`7771b14`](https://github.com/apache/spark/commit/7771b14e20a374bf638dd2abd3dc8ba74e14c3c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
SparkQA commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984320572 **[Test build #145843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145843/testReport)** for PR 34774 at commit [`b14db3d`](https://github.com/apache/spark/commit/b14db3d31491cdb85401046371613912b99b84dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class PandasSQLStringFormatter(string.Formatter):` * `class SQLStringFormatter(string.Formatter):`
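The build report above lists two new subclasses of Python's standard `string.Formatter`. As a rough illustration of that general technique only, and not of the PySpark implementation, the sketch below overrides `format_field` so that substituted values are rendered as SQL literals. The class name `SqlLiteralFormatter`, the table `t`, and the quoting rules are all assumptions made for this example.

```python
import string


class SqlLiteralFormatter(string.Formatter):
    """Hypothetical formatter that renders Python values as SQL literals."""

    def format_field(self, value, format_spec):
        if value is None:
            return "NULL"
        # bool is checked before the numeric fallback because bool is an int subclass.
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        if isinstance(value, str):
            # Double embedded single quotes, then wrap the text in quotes.
            return "'" + value.replace("'", "''") + "'"
        # Everything else (ints, floats, ...) uses the default formatting.
        return super().format_field(value, format_spec)


query = SqlLiteralFormatter().format(
    "SELECT * FROM t WHERE name = {name} AND age > {age}",
    name="O'Neil",
    age=30,
)
```

Hooking into `format_field` keeps the familiar `{name}` placeholder syntax while letting the formatter decide how each value is written into the query text.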
[GitHub] [spark] dchvn opened a new pull request #34778: [SPARK-36396][PYTHON][FOLLOWUP] Fix test with extensions dtype when pandas version < 1.2
dchvn opened a new pull request #34778: URL: https://github.com/apache/spark/pull/34778 ### What changes were proposed in this pull request? Fix the test of `pd.DataFrame.cov` with extension dtypes when the pandas version is < 1.2. ### Why are the changes needed? So that the test of `pd.DataFrame.cov` passes with pandas versions < 1.2. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests
[GitHub] [spark] SparkQA commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
SparkQA commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984320063 **[Test build #145842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145842/testReport)** for PR 34775 at commit [`4682604`](https://github.com/apache/spark/commit/4682604d7628afc5ab855a1cefb8d5bb8e64004d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
AmplabJenkins removed a comment on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984312341 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145837/
[GitHub] [spark] SparkQA removed a comment on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
SparkQA removed a comment on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984260258 **[Test build #145837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145837/testReport)** for PR 34667 at commit [`729cc22`](https://github.com/apache/spark/commit/729cc2272f77216eb825e6ec81e0254560972e01).
[GitHub] [spark] AmplabJenkins commented on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
AmplabJenkins commented on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984312341 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145837/
[GitHub] [spark] SparkQA commented on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
SparkQA commented on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984312176 **[Test build #145837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145837/testReport)** for PR 34667 at commit [`729cc22`](https://github.com/apache/spark/commit/729cc2272f77216eb825e6ec81e0254560972e01). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34760: [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'
AmplabJenkins removed a comment on pull request #34760: URL: https://github.com/apache/spark/pull/34760#issuecomment-984310479 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145835/
[GitHub] [spark] AmplabJenkins commented on pull request #34760: [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'
AmplabJenkins commented on pull request #34760: URL: https://github.com/apache/spark/pull/34760#issuecomment-984310479 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145835/
[GitHub] [spark] cloud-fan commented on a change in pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
cloud-fan commented on a change in pull request #34701: URL: https://github.com/apache/spark/pull/34701#discussion_r760776756 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -2162,6 +2163,48 @@ object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] { } } +/** + * Prunes unnecessary fields from a [[Generate]] if it is under a project which does not refer + * any generated attributes, .e.g., count-like aggregation on an exploded array. + */ +object GenerateOptimization extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning( + _.containsAllPatterns(PROJECT, GENERATE), ruleId) { + case p @ Project(_, g: Generate) if p.references.isEmpty + && g.generator.isInstanceOf[ExplodeBase] => +g.generator.children.head.dataType match { + case ArrayType(StructType(fields), _) => +val atomicFields = fields.collect { + case f: StructField if f.dataType.isInstanceOf[AtomicType] => f +} +val extractor = if (atomicFields.size > 0) { + // Pick an arbitrary atomic field, if any Review comment: shall we pick the smallest one? e.g. prefer int over string
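The suggestion above (prefer the smallest atomic field, e.g. int over string) could be sketched roughly as follows. This is only an illustration under stated assumptions: `FieldType`, `Field`, and `smallestField` are hypothetical stand-ins rather than Spark's `DataType`/`StructField` classes, and the byte sizes are illustrative estimates, not values taken from the PR.

```scala
object SmallestFieldSketch {
  // Hypothetical stand-ins for atomic data types; each carries an
  // illustrative size estimate in bytes (assumed values, not from the PR).
  sealed trait FieldType { def defaultSize: Int }
  case object IntField extends FieldType { val defaultSize = 4 }
  case object LongField extends FieldType { val defaultSize = 8 }
  case object StringField extends FieldType { val defaultSize = 20 }

  // Hypothetical stand-in for a struct field.
  final case class Field(name: String, dataType: FieldType)

  // Instead of taking an arbitrary atomic field, pick the one with the
  // smallest estimated size. Ties keep the earlier field; an empty input
  // yields None.
  def smallestField(fields: Seq[Field]): Option[Field] =
    fields.reduceOption { (a, b) =>
      if (b.dataType.defaultSize < a.dataType.defaultSize) b else a
    }
}
```

With fields `name: string`, `id: int`, `ts: long`, this sketch would select `id`, so the pruned `Generate` would only need to carry the narrowest column.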
[GitHub] [spark] cloud-fan commented on a change in pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
cloud-fan commented on a change in pull request #34701: URL: https://github.com/apache/spark/pull/34701#discussion_r760776370 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -2162,6 +2163,48 @@ object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan] { } } +/** + * Prunes unnecessary fields from a [[Generate]] if it is under a project which does not refer + * any generated attributes, .e.g., count-like aggregation on an exploded array. + */ +object GenerateOptimization extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan.transformDownWithPruning( + _.containsAllPatterns(PROJECT, GENERATE), ruleId) { + case p @ Project(_, g: Generate) if p.references.isEmpty + && g.generator.isInstanceOf[ExplodeBase] => +g.generator.children.head.dataType match { + case ArrayType(StructType(fields), _) => +val atomicFields = fields.collect { + case f: StructField if f.dataType.isInstanceOf[AtomicType] => f +} +val extractor = if (atomicFields.size > 0) { + // Pick an arbitrary atomic field, if any + ExtractValue(g.generator.children.head, Review comment: nit: I feel it's safer to create `GetStructField` instead of doing name lookup again. It's possible that some dataframe-generated query plan has name conflicts in the struct, and `GetStructField` allows us to put the ordinal directly to avoid a name lookup.
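To illustrate why carrying the ordinal is safer than looking the field up by name again, here is a small self-contained sketch. It does not use Spark's classes: `Field`, `ordinalByName`, and `firstAtomicOrdinal` are hypothetical names introduced for illustration, and the "atomic" predicate is an assumption of the example.

```scala
object OrdinalLookupSketch {
  // Hypothetical stand-in for a struct field (name plus a type label).
  final case class Field(name: String, typeName: String)

  // Name-based lookup: resolves a name to its ordinal, but cannot give a
  // unique answer when two fields share the same name.
  def ordinalByName(fields: Seq[Field], name: String): Either[String, Int] = {
    val matches = fields.zipWithIndex.collect { case (f, i) if f.name == name => i }
    matches match {
      case Seq(i) => Right(i)
      case Seq()  => Left(s"no field named '$name'")
      case _      => Left(s"ambiguous field name '$name' at ordinals ${matches.mkString(", ")}")
    }
  }

  // Ordinal-based selection: remember the index while filtering, so no
  // second lookup by name is needed afterwards.
  def firstAtomicOrdinal(fields: Seq[Field], isAtomic: Field => Boolean): Option[Int] =
    fields.zipWithIndex.collectFirst { case (f, i) if isAtomic(f) => i }
}
```

For a struct with two fields both named `a` (one `int`, one `array`), the name lookup in this sketch reports an ambiguity, while the ordinal path still identifies position 0 unambiguously. That is the situation the review comment describes for dataframe-generated plans.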
[GitHub] [spark] SparkQA removed a comment on pull request #34764: [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command
SparkQA removed a comment on pull request #34764: URL: https://github.com/apache/spark/pull/34764#issuecomment-984260141 **[Test build #145836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145836/testReport)** for PR 34764 at commit [`66a2aaf`](https://github.com/apache/spark/commit/66a2aafc18c653e30c4c8d0442da810307bf9376).
[GitHub] [spark] AmplabJenkins commented on pull request #34764: [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command
AmplabJenkins commented on pull request #34764: URL: https://github.com/apache/spark/pull/34764#issuecomment-984309930 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145836/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34764: [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command
AmplabJenkins removed a comment on pull request #34764: URL: https://github.com/apache/spark/pull/34764#issuecomment-984309930 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145836/
[GitHub] [spark] SparkQA commented on pull request #34764: [SPARK-37330][SQL] Migrate ReplaceTableStatement to v2 command
SparkQA commented on pull request #34764: URL: https://github.com/apache/spark/pull/34764#issuecomment-984309768 **[Test build #145836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145836/testReport)** for PR 34764 at commit [`66a2aaf`](https://github.com/apache/spark/commit/66a2aafc18c653e30c4c8d0442da810307bf9376). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #34760: [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'
SparkQA removed a comment on pull request #34760: URL: https://github.com/apache/spark/pull/34760#issuecomment-984238599 **[Test build #145835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145835/testReport)** for PR 34760 at commit [`7f57e4a`](https://github.com/apache/spark/commit/7f57e4aa01f57c0cf9bb91c353b5c46d51e8128a).
[GitHub] [spark] SparkQA commented on pull request #34760: [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'
SparkQA commented on pull request #34760: URL: https://github.com/apache/spark/pull/34760#issuecomment-984309442 **[Test build #145835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145835/testReport)** for PR 34760 at commit [`7f57e4a`](https://github.com/apache/spark/commit/7f57e4aa01f57c0cf9bb91c353b5c46d51e8128a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] venkata91 commented on a change in pull request #33896: [SPARK-33701][SHUFFLE] Adaptive shuffle merge finalization for push-based shuffle
venkata91 commented on a change in pull request #33896: URL: https://github.com/apache/spark/pull/33896#discussion_r760774868 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -3847,6 +3887,76 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti // Job successful ended. assert(results === Map(0 -> 11, 1 -> 12)) + } + + test("SPARK-33701: shuffle adaptive merge finalization") { +initPushBasedShuffleConfs(conf) +conf.set(config.PUSH_BASED_SHUFFLE_SIZE_MIN_SHUFFLE_SIZE_TO_WAIT, 10L) +conf.set(config.SHUFFLE_MERGER_LOCATIONS_MIN_STATIC_THRESHOLD, 3) +DAGSchedulerSuite.clearMergerLocs +DAGSchedulerSuite.addMergerLocs(Seq("host1", "host2", "host3", "host4", "host5")) +val parts = 2 + +val shuffleMapRdd1 = new MyRDD(sc, parts, Nil) +val shuffleDep1 = new ShuffleDependency(shuffleMapRdd1, new HashPartitioner(parts)) +val shuffleMapRdd2 = new MyRDD(sc, parts, Nil) +val shuffleDep2 = new ShuffleDependency(shuffleMapRdd2, new HashPartitioner(parts)) +val reduceRdd = new MyRDD(sc, parts, List(shuffleDep1, shuffleDep2), + tracker = mapOutputTracker) + +// Submit a reduce job that depends which will create a map stage +submit(reduceRdd, (0 until parts).toArray) + +val taskResults = taskSets(0).tasks.zipWithIndex.map { + case (_, idx) => +(Success, makeMapStatus("host" + ('A' + idx).toChar, parts)) +}.toSeq +// Remove MapStatus on one of the host before the stage ends to trigger +// a scenario where stage 0 needs to be resubmitted upon finishing all tasks. +// Merge finalization should not be scheduled in this case. +for ((result, i) <- taskResults.zipWithIndex) { + if (i == taskSets(0).tasks.size - 1) { +mapOutputTracker.removeOutputsOnHost("hostA") + } + if (i < taskSets(0).tasks.size) { +runEvent(makeCompletionEvent(taskSets(0).tasks(i), result._1, result._2)) + } +} +val shuffleStage1 = scheduler.stageIdToStage(0).asInstanceOf[ShuffleMapStage] +// Successfully completing the retry of stage 0. 
Merge finalization should be +// disabled + complete(taskSets(2), taskSets(2).tasks.zipWithIndex.map { + case (_, idx) => +(Success, makeMapStatus("host" + ('A' + idx).toChar, parts)) +}.toSeq) +assert(!shuffleStage1.shuffleDep.shuffleMergeEnabled) +// Verify finalize task is set with 0 delay and merge results not marked +// for registration due to shuffle size smaller than threshold +assert(shuffleStage1.shuffleDep.getFinalizeTask.nonEmpty) +val finalizeTask1 = shuffleStage1.shuffleDep.getFinalizeTask.get + .asInstanceOf[DummyScheduledFuture] +assert(finalizeTask1.delay == 0 && !finalizeTask1.registerMergeResults) + +complete(taskSets(1), taskSets(1).tasks.zipWithIndex.map { + case (_, idx) => +(Success, makeMapStatus("host" + ('A' + idx).toChar, parts, 10)) +}.toSeq) +val shuffleStage2 = scheduler.stageIdToStage(1).asInstanceOf[ShuffleMapStage] +// Verify finalize task is set with default delay of 10s and merge results are marked +// for registration +assert(shuffleStage2.shuffleDep.getFinalizeTask.nonEmpty) +val finalizeTask2 = shuffleStage2.shuffleDep.getFinalizeTask.get + .asInstanceOf[DummyScheduledFuture] +assert(finalizeTask2.delay == 10 && finalizeTask2.registerMergeResults) + +pushComplete(shuffleStage2.shuffleDep.shuffleId, 0, 0) +pushComplete(shuffleStage2.shuffleDep.shuffleId, 0, 1) + +assert(mapOutputTracker.getNumAvailableMergeResults(shuffleDep1.shuffleId) == parts) +assert(mapOutputTracker.getNumAvailableMergeResults(shuffleDep2.shuffleId) == parts) Review comment: @mridulm I just realized we cannot set `PUSH_BASED_SHUFFLE_MIN_PUSH_RATIO` < 1.0: some tasks can still be running when the minimum number of pushes has completed, and that would trigger stage completion (`processShuffleMapStageCompletion`). This would cause a retry of the stage, because not all map statuses are available while some tasks are still running. Even though enough pushes have completed, we still need to wait until all the tasks finish successfully. 
We could add a check for `mapStage.isAvailable` in `handleShufflePushCompleted` to prevent it from scheduling shuffle merge finalization, but then the stage eventually completes and shuffle merge finalization is scheduled at that point anyway. In that case, though, at least we don't need to wait for 10 secs (the default timeout), since the minimum number of pushes has already completed. Thoughts? ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -3847,6 +3887,76 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti // Job successful ended. assert(results === Map(0 -> 11, 1 -> 12)) + } + + test("SPARK-33701: shuffle adaptive merge
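The behaviour proposed in the review comment above can be modelled with a small standalone sketch. Everything here is hypothetical and only mirrors the reasoning in the comment: `MapStageState`, `Action`, `onPushCompleted`, and `onStageCompleted` are illustrative names, not DAGScheduler APIs, and the 10-second value is passed in as a parameter rather than read from any configuration.

```scala
object MergeFinalizeSketch {
  // Hypothetical snapshot of a shuffle map stage.
  final case class MapStageState(numTasks: Int, finishedTasks: Int, finishedPushes: Int) {
    // All map outputs are registered only once every task has finished.
    def isAvailable: Boolean = finishedTasks == numTasks
    def pushRatio: Double =
      if (numTasks == 0) 1.0 else finishedPushes.toDouble / numTasks
  }

  sealed trait Action
  case object KeepWaiting extends Action
  final case class ScheduleFinalize(delaySeconds: Long) extends Action

  // On a push-completed event: do not finalize while tasks are still
  // running, even if the minimum push ratio has been reached.
  def onPushCompleted(s: MapStageState, minPushRatio: Double): Action =
    if (s.pushRatio >= minPushRatio && s.isAvailable) ScheduleFinalize(0L)
    else KeepWaiting

  // On stage completion: finalization is scheduled in any case, but the
  // default wait can be skipped when enough pushes have already completed.
  def onStageCompleted(s: MapStageState, minPushRatio: Double, defaultDelaySeconds: Long): Action =
    if (s.pushRatio >= minPushRatio) ScheduleFinalize(0L)
    else ScheduleFinalize(defaultDelaySeconds)
}
```

In this model, a stage with 4 tasks, 3 finished, and 3 pushes done keeps waiting at push completion despite a 0.5 ratio, and finalizes with zero delay once the last task completes.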
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
AmplabJenkins removed a comment on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984307640 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145841/
[GitHub] [spark] SparkQA removed a comment on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
SparkQA removed a comment on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984307312 **[Test build #145841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145841/testReport)** for PR 34776 at commit [`ab9a2d6`](https://github.com/apache/spark/commit/ab9a2d636294dea1f4e19ae4057d19a156269b6d).
[GitHub] [spark] AmplabJenkins commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
AmplabJenkins commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984307640 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145841/
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984307627 **[Test build #145841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145841/testReport)** for PR 34776 at commit [`ab9a2d6`](https://github.com/apache/spark/commit/ab9a2d636294dea1f4e19ae4057d19a156269b6d). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34774: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark
SparkQA commented on pull request #34774: URL: https://github.com/apache/spark/pull/34774#issuecomment-984307383 **[Test build #145843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145843/testReport)** for PR 34774 at commit [`b14db3d`](https://github.com/apache/spark/commit/b14db3d31491cdb85401046371613912b99b84dd).
[GitHub] [spark] SparkQA commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate
SparkQA commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984307445 **[Test build #145845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145845/testReport)** for PR 34701 at commit [`099d3a8`](https://github.com/apache/spark/commit/099d3a8919189fa0b6f0d10079e327d114b657b8).
[GitHub] [spark] SparkQA commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf
SparkQA commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984307362 **[Test build #145844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145844/testReport)** for PR 34757 at commit [`f6f06cc`](https://github.com/apache/spark/commit/f6f06cc45413e1633975c09de20f9011608b841a).
[GitHub] [spark] SparkQA commented on pull request #34775: [SPARK-37511][DOCS][FOLLOW-UP] Fix documentation build warning from TimedeltaIndex
SparkQA commented on pull request #34775: URL: https://github.com/apache/spark/pull/34775#issuecomment-984307321 **[Test build #145842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145842/testReport)** for PR 34775 at commit [`4682604`](https://github.com/apache/spark/commit/4682604d7628afc5ab855a1cefb8d5bb8e64004d).
[GitHub] [spark] SparkQA commented on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
SparkQA commented on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984307292 **[Test build #145840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145840/testReport)** for PR 34777 at commit [`65cdaa5`](https://github.com/apache/spark/commit/65cdaa56167da11389701880c949386c6902e826).
[GitHub] [spark] SparkQA commented on pull request #34776: [SPARK-37512][PYTHON] Support TimedeltaIndex creation given a timedelta Series/Index
SparkQA commented on pull request #34776: URL: https://github.com/apache/spark/pull/34776#issuecomment-984307312 **[Test build #145841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145841/testReport)** for PR 34776 at commit [`ab9a2d6`](https://github.com/apache/spark/commit/ab9a2d636294dea1f4e19ae4057d19a156269b6d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
AmplabJenkins removed a comment on pull request #34659: URL: https://github.com/apache/spark/pull/34659#issuecomment-984306042 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145831/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
AmplabJenkins removed a comment on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984306043 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50312/
[GitHub] [spark] AmplabJenkins commented on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
AmplabJenkins commented on pull request #34659: URL: https://github.com/apache/spark/pull/34659#issuecomment-984306042 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145831/
[GitHub] [spark] AmplabJenkins commented on pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command
AmplabJenkins commented on pull request #34667: URL: https://github.com/apache/spark/pull/34667#issuecomment-984306043 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50312/
[GitHub] [spark] sadikovi commented on pull request #34777: [SPARK-37326][SQL][FOLLOW-UP] Update code and tests for TimestampNTZ support in CSV data source
sadikovi commented on pull request #34777: URL: https://github.com/apache/spark/pull/34777#issuecomment-984304694 cc @MaxGekk @cloud-fan @gengliangwang for review. Thanks!