[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r647164105

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##
@@ -791,4 +791,28 @@ class AnalysisErrorSuite extends AnalysisTest {
       assertAnalysisError(plan, s"Correlated column is not allowed in predicate ($msg)" :: Nil)
     }
   }
+
+  test("SPARK-35618: Resolve star expressions in subquery") {

Review comment:
Yes, currently only `Filter` can host outer references for correlated subqueries, and star expansion only happens when the node is a `Project` or an `Aggregate` (`buildExpandedProjectList`). A lateral subquery example makes this clearer:

```sql
-- t: [a, b]
SELECT * FROM t, LATERAL (SELECT t.*)  -- t.* should be resolved as t.a, t.b
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
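The resolution order discussed in this comment can be illustrated with a small toy model. It is a sketch under stated assumptions, not Spark's analyzer. `Attr` and `expandQualifiedStar` are hypothetical names. The assumed rule is that a qualified star is looked up in the subquery's own plan first and only then in the enclosing (outer) plans. In the `LATERAL (SELECT t.*)` example the subquery has no relation of its own, so `t.*` can only expand to the outer columns `t.a, t.b`.

```scala
// Hypothetical toy model, not Spark's Analyzer: an attribute is just (qualifier, name).
final case class Attr(qualifier: String, name: String)

// Assumption for illustration: try the local plan's output first, then each outer
// plan's output in order, and expand `qualifier.*` to the first non-empty match.
def expandQualifiedStar(
    qualifier: String,
    local: Seq[Attr],
    outers: Seq[Seq[Attr]]): Either[String, Seq[Attr]] =
  (local +: outers).iterator
    .map(scope => scope.filter(_.qualifier == qualifier))
    .find(_.nonEmpty)
    .toRight(s"cannot resolve '$qualifier.*'")

// t: [a, b]; the LATERAL subquery has no relation of its own.
val t = Seq(Attr("t", "a"), Attr("t", "b"))
val resolved = expandQualifiedStar("t", local = Seq.empty, outers = Seq(t))
// resolved == Right(Seq(Attr("t", "a"), Attr("t", "b")))
```

Returning an `Either` keeps the "cannot resolve" case explicit. A real analyzer would raise an analysis error at that point.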
[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-856505385 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43975/
[GitHub] [spark] SparkQA commented on pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
SparkQA commented on pull request #32769: URL: https://github.com/apache/spark/pull/32769#issuecomment-856505160 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43974/
[GitHub] [spark] HeartSaVioR commented on pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on pull request #32653: URL: https://github.com/apache/spark/pull/32653#issuecomment-856504425 retest this, please
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32653: [SPARK-35312][SS] Introduce new Option in Kafka source to specify minimum number of records to read per trigger
HeartSaVioR commented on a change in pull request #32653:
URL: https://github.com/apache/spark/pull/32653#discussion_r647159789

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala
##
@@ -95,15 +114,65 @@ private[kafka010] class KafkaMicroBatchStream(
   override def latestOffset(start: Offset, readLimit: ReadLimit): Offset = {
     val startPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
     latestPartitionOffsets = kafkaOffsetReader.fetchLatestOffsets(Some(startPartitionOffsets))
-    endPartitionOffsets = KafkaSourceOffset(readLimit match {
-      case rows: ReadMaxRows =>
-        rateLimit(rows.maxRows(), startPartitionOffsets, latestPartitionOffsets)
-      case _: ReadAllAvailable =>
-        latestPartitionOffsets
-    })
+
+    val limits: Seq[ReadLimit] = readLimit match {
+      case rows: CompositeReadLimit => rows.getReadLimits
+      case rows => Seq(rows)
+    }
+
+    val offsets = if (limits.exists(_.isInstanceOf[ReadAllAvailable])) {
+      // ReadAllAvailable has the highest priority
+      latestPartitionOffsets
+    } else {
+      val lowerLimit = limits.find(_.isInstanceOf[ReadMinRows]).map(_.asInstanceOf[ReadMinRows])
+      val upperLimit = limits.find(_.isInstanceOf[ReadMaxRows]).map(_.asInstanceOf[ReadMaxRows])
+
+      lowerLimit.flatMap { limit =>
+        // checking if we need to skip batch based on minOffsetPerTrigger criteria
+        val skipBatch = delayBatch(
+          limit.minRows, latestPartitionOffsets, startPartitionOffsets, limit.maxTriggerDelayMs)
+        if (skipBatch) {
+          logDebug(
+            s"Delaying batch as number of records available is less than minOffsetsPerTrigger")
+          Some(startPartitionOffsets)
+        } else {
+          None
+        }
+      }.orElse {
+        // checking if we need to adjust a range of offsets based on maxOffsetPerTrigger criteria
+        upperLimit.map { limit =>
+          rateLimit(limit.maxRows(), startPartitionOffsets, latestPartitionOffsets)
+        }
+      }.getOrElse(latestPartitionOffsets)
+    }
+
+    endPartitionOffsets = KafkaSourceOffset(offsets)
     endPartitionOffsets
   }
 
+  /** Checks if we need to skip this trigger based on minOffsetsPerTrigger & maxTriggerDelay */
+  private def delayBatch(
+      minLimit: Long,
+      latestOffsets: Map[TopicPartition, Long],
+      currentOffsets: Map[TopicPartition, Long],
+      maxTriggerDelayMs: Long): Boolean = {
+    // Checking first if the maxbatchDelay time has passed

Review comment:
nit: It won't hurt if we only call `System.currentTimeMillis()` once and reuse it.
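The limit-selection logic in this hunk can be sketched as a self-contained Scala model. The sketch also applies the reviewer's nit about reading the clock once. Everything in it is hypothetical:

- `Limit`, `AllAvailable`, `MinRows` and `MaxRows` stand in for Spark's `ReadLimit` classes.
- `TriggerGate`, the injected clock and the string partition keys stand in for the stream's state and `TopicPartition`.
- The body of `delayBatch` is an assumption, because the hunk ends at its first comment.
- `rateLimit` is a simplified proportional split, not Spark's implementation.

```scala
// Hypothetical stand-ins for ReadAllAvailable / ReadMinRows / ReadMaxRows (not Spark classes).
sealed trait Limit
case object AllAvailable extends Limit
final case class MinRows(minRows: Long, maxTriggerDelayMs: Long) extends Limit
final case class MaxRows(maxRows: Long) extends Limit

type Offsets = Map[String, Long] // partition name -> offset

// Assumed delayBatch semantics: trigger once maxTriggerDelayMs has elapsed since the
// last batch; otherwise wait until at least minLimit new records are available.
final class TriggerGate(clock: () => Long, initialMillis: Long) {
  private var lastTriggerMillis: Long = initialMillis

  def delayBatch(minLimit: Long, latest: Offsets, current: Offsets, maxTriggerDelayMs: Long): Boolean = {
    val now = clock() // read the clock once and reuse the value (the reviewer's nit)
    if (now - lastTriggerMillis >= maxTriggerDelayMs) {
      lastTriggerMillis = now // waited long enough: run the batch even if it is small
      false
    } else {
      val newRecords = latest.map { case (tp, end) => end - current.getOrElse(tp, 0L) }.sum
      if (newRecords < minLimit) true
      else {
        lastTriggerMillis = now
        false
      }
    }
  }
}

// Simplified proportional rate limit; Spark's rateLimit also handles new partitions,
// rounding of tiny shares and overflow, which this sketch ignores.
def rateLimit(maxRows: Long, from: Offsets, until: Offsets): Offsets = {
  val sizes = until.map { case (tp, end) => tp -> math.max(0L, end - from.getOrElse(tp, 0L)) }
  val total = sizes.values.sum.toDouble
  if (total <= maxRows) until
  else until.map { case (tp, end) =>
    val share = (maxRows * (sizes(tp) / total)).toLong // floor of this partition's share
    tp -> math.min(end, from.getOrElse(tp, 0L) + share)
  }
}

// Same priority order as the diff: AllAvailable, then MinRows (delay), then MaxRows.
def endOffsets(limits: Seq[Limit], start: Offsets, latest: Offsets, gate: TriggerGate): Offsets =
  if (limits.contains(AllAvailable)) latest
  else {
    val lower = limits.collectFirst { case m: MinRows => m }
    val upper = limits.collectFirst { case m: MaxRows => m }
    lower
      .flatMap { l =>
        if (gate.delayBatch(l.minRows, latest, start, l.maxTriggerDelayMs)) Some(start) else None
      }
      .orElse(upper.map(u => rateLimit(u.maxRows, start, latest)))
      .getOrElse(latest)
  }
```

Injecting the clock keeps the sketch testable. Reading it into a single `now` means the elapsed-time check and the updated `lastTriggerMillis` refer to the same instant. `collectFirst` is a stylistic alternative to `find(...).map(_.asInstanceOf[...])` with the same result.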
[GitHub] [spark] HyukjinKwon commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
HyukjinKwon commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856501314 retest this please
[GitHub] [spark] SparkQA commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
SparkQA commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856498062 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43973/
[GitHub] [spark] SparkQA commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856496277 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43971/
[GitHub] [spark] SparkQA commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
SparkQA commented on pull request #32815: URL: https://github.com/apache/spark/pull/32815#issuecomment-856495276 **[Test build #139457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139457/testReport)** for PR 32815 at commit [`fa56cf7`](https://github.com/apache/spark/commit/fa56cf7223f1ea7f4342e9335e91b80182097be9).
[GitHub] [spark] maropu commented on a change in pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
maropu commented on a change in pull request #32769:
URL: https://github.com/apache/spark/pull/32769#discussion_r647153040

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala
##
@@ -42,9 +42,27 @@ case class ExpandExec(
   override lazy val metrics = Map(
     "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
 
-  // The GroupExpressions can output data with arbitrary partitioning, so set it
-  // as UNKNOWN partitioning
-  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
+  /**
+   * The Expand is commonly introduced by the RewriteDistinctAggregates optimizer rule.
+   * In that case there can be several attributes that are kept as they are by the Expand.
+   * If the child's output is partitioned by those attributes, then so will be
+   * the output of the Expand.
+   * In general case the Expand can output data with arbitrary partitioning, so set it
+   * as UNKNOWN partitioning.
+   */
+  override def outputPartitioning: Partitioning = {
+    val stableAttrs = ExpressionSet(output.zipWithIndex.filter {
+      case (attr, i) => projections.forall(_(i).semanticEquals(attr))
+    }.map(_._1))
+
+    child.outputPartitioning match {

Review comment:
Ah, I see.
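The new doc comment says that an Expand keeps its child's hash partitioning when the partitioning keys pass through every projection unchanged. That idea can be sketched with a toy model under stated assumptions:

- `stableAttrs` and `outputHashKeys` are hypothetical names.
- Expressions are plain strings, and string equality stands in for `semanticEquals`.
- The projections below are only roughly shaped like what `RewriteDistinctAggregates` produces.

The diff is cut off at `child.outputPartitioning match {`, so how the real code treats each partitioning type is not shown here.

```scala
// Toy model: expressions are strings; string equality stands in for semanticEquals.
def stableAttrs(output: Seq[String], projections: Seq[Seq[String]]): Set[String] =
  output.zipWithIndex.collect {
    // keep column i only if every projection passes the attribute through unchanged
    case (attr, i) if projections.forall(p => p(i) == attr) => attr
  }.toSet

// Assumption for illustration: a child hash partitioning survives the Expand
// only if all of its keys are stable.
def outputHashKeys(childKeys: Seq[String], stable: Set[String]): Option[Seq[String]] =
  if (childKeys.forall(stable.contains)) Some(childKeys) else None

// Two DISTINCT aggregates grouped by `a`: `a` passes through every projection,
// while `b`, `c` and the group id `gid` differ between projections.
val output = Seq("a", "b", "c", "gid")
val projections = Seq(
  Seq("a", "b", "null", "1"),
  Seq("a", "null", "c", "2"))
val stable = stableAttrs(output, projections)
```

Here only `a` is stable. A child hash-partitioned by `a` keeps that partitioning through the Expand, so no extra exchange is needed for it. A child partitioned by `a` and `b` does not.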
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32788: [SPARK-35602][SS] Update state schema to be able to accept long length JSON
AmplabJenkins removed a comment on pull request #32788: URL: https://github.com/apache/spark/pull/32788#issuecomment-856493899 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43972/
[GitHub] [spark] SparkQA commented on pull request #32788: [SPARK-35602][SS] Update state schema to be able to accept long length JSON
SparkQA commented on pull request #32788: URL: https://github.com/apache/spark/pull/32788#issuecomment-856493864 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43972/
[GitHub] [spark] AmplabJenkins commented on pull request #32788: [SPARK-35602][SS] Update state schema to be able to accept long length JSON
AmplabJenkins commented on pull request #32788: URL: https://github.com/apache/spark/pull/32788#issuecomment-856493899 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43972/
[GitHub] [spark] ulysses-you opened a new pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
ulysses-you opened a new pull request #32815:
URL: https://github.com/apache/spark/pull/32815

### What changes were proposed in this pull request?
Respect `PartitioningCollection` in `EnsureRequirements` when removing a redundant shuffle.

### Why are the changes needed?
Currently `EnsureRequirements` only checks whether the child has a semantically equal `HashPartitioning` before removing a redundant shuffle. This check can be extended to cover `PartitioningCollection`.

### Does this PR introduce _any_ user-facing change?
Yes, the plan might change.

### How was this patch tested?
Added a test.
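The check this PR extends can be sketched as a toy Scala model under stated assumptions. `Part`, `Hash`, `Collection` and `Unknown` are hypothetical stand-ins for Spark's `Partitioning` classes, and case-class equality replaces `semanticEquals`. The model compares two checks:

- one that only accepts a direct hash-partitioning match, as described under "Why are the changes needed?";
- one that also looks inside a collection.

```scala
// Toy model of the shuffle-removal check; not Spark's Partitioning classes.
// Case-class equality stands in for HashPartitioning.semanticEquals.
sealed trait Part
final case class Hash(keys: Seq[String], numPartitions: Int) extends Part
final case class Collection(parts: Seq[Part]) extends Part
case object Unknown extends Part

// Only a direct hash-partitioning match lets the shuffle be dropped.
def redundantBefore(child: Part, required: Hash): Boolean = child match {
  case h: Hash => h == required
  case _       => false
}

// Also look inside a collection: any member that matches makes the shuffle redundant.
def redundantAfter(child: Part, required: Hash): Boolean = child match {
  case h: Hash           => h == required
  case Collection(parts) => parts.exists(p => redundantAfter(p, required))
  case Unknown           => false
}

// e.g. an inner join on a = b whose output is hash-partitioned by both `a` and `b`
val joined = Collection(Seq(Hash(Seq("a"), 200), Hash(Seq("b"), 200)))
```

For example, an inner sort-merge join reports a `PartitioningCollection` of both sides' partitionings. With the extended check, a later shuffle on either join key could be recognized as redundant.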
[GitHub] [spark] tanelk commented on a change in pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
tanelk commented on a change in pull request #32769:
URL: https://github.com/apache/spark/pull/32769#discussion_r647151531

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala
##
@@ -42,9 +42,27 @@ case class ExpandExec(
   override lazy val metrics = Map(
     "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
 
-  // The GroupExpressions can output data with arbitrary partitioning, so set it
-  // as UNKNOWN partitioning
-  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
+  /**
+   * The Expand is commonly introduced by the RewriteDistinctAggregates optimizer rule.
+   * In that case there can be several attributes that are kept as they are by the Expand.
+   * If the child's output is partitioned by those attributes, then so will be
+   * the output of the Expand.
+   * In general case the Expand can output data with arbitrary partitioning, so set it
+   * as UNKNOWN partitioning.
+   */
+  override def outputPartitioning: Partitioning = {
+    val stableAttrs = ExpressionSet(output.zipWithIndex.filter {
+      case (attr, i) => projections.forall(_(i).semanticEquals(attr))
+    }.map(_._1))
+
+    child.outputPartitioning match {

Review comment:
The `ProjectExec` that was inserted after the join managed to simplify it to `HashPartitioning` in the cases I could think of.
[GitHub] [spark] eejbyfeldt commented on a change in pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
eejbyfeldt commented on a change in pull request #32783:
URL: https://github.com/apache/spark/pull/32783#discussion_r647149051

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -1181,21 +1176,25 @@ case class CatalystToExternalMap private(
     newMapBuilderMethod.invoke(moduleField).asInstanceOf[Builder[AnyRef, AnyRef]]
   }
 
+  private def keyValueIterator(md: MapData): Iterator[AnyRef] = {
+    val keyArray = md.keyArray()
+    val valueArray = md.valueArray()
+    val row = new GenericInternalRow(1)
+    0.until(md.numElements()).iterator.map { i =>
+      row.update(0, keyArray.get(i, inputMapType.keyType))
+      val key = keyLambdaFunction.eval(row)
+      row.update(0, valueArray.get(i, inputMapType.valueType))
+      val value = valueLambdaFunction.eval(row)
+      Tuple2(key, value)
+    }
+  }
+
   override def eval(input: InternalRow): Any = {
     val result = inputData.eval(input).asInstanceOf[MapData]
     if (result != null) {
       val builder = newMapBuilder()
       builder.sizeHint(result.numElements())
-      val keyArray = result.keyArray()
-      val valueArray = result.valueArray()
-      var i = 0
-      while (i < result.numElements()) {

Review comment:
I'm not sure whether it does or not, but I updated the PR to use a `while` loop instead, to be sure. The style is now also closer to what was there before.
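The `while`-loop shape mentioned here can be sketched outside Catalyst. `toExternalMap`, its parameters and `Point` are hypothetical. Plain `IndexedSeq`s stand in for `MapData.keyArray()`/`valueArray()`, and ordinary functions stand in for `keyLambdaFunction`/`valueLambdaFunction`. The sketch only shows the loop structure; it makes no claim about whether the iterator version was measurably slower.

```scala
// Hypothetical sketch outside Catalyst: `keys`/`values` stand in for the MapData
// key/value arrays, and keyFn/valueFn for the key/value lambda functions.
def toExternalMap[K, V](
    keys: IndexedSeq[Any],
    values: IndexedSeq[Any],
    keyFn: Any => K,
    valueFn: Any => V): Map[K, V] = {
  require(keys.length == values.length, "key and value arrays must have the same length")
  val builder = Map.newBuilder[K, V]
  builder.sizeHint(keys.length)
  var i = 0
  // Plain index-based loop: no Range iterator is created for each input map.
  while (i < keys.length) {
    builder += (keyFn(keys(i)) -> valueFn(values(i)))
    i += 1
  }
  builder.result()
}

// Usage: keys are converted into a case class, the situation the PR title describes.
final case class Point(x: Int, y: Int)
val m = toExternalMap[Point, String](
  Vector(Vector(1, 2), Vector(3, 4)),
  Vector("a", "b"),
  k => { val xy = k.asInstanceOf[Vector[Int]]; Point(xy(0), xy(1)) },
  v => v.asInstanceOf[String])
```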
[GitHub] [spark] maropu commented on a change in pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
maropu commented on a change in pull request #32783:
URL: https://github.com/apache/spark/pull/32783#discussion_r647150106

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -1181,21 +1176,25 @@ case class CatalystToExternalMap private(
     newMapBuilderMethod.invoke(moduleField).asInstanceOf[Builder[AnyRef, AnyRef]]
   }
 
+  private def keyValueIterator(md: MapData): Iterator[AnyRef] = {
+    val keyArray = md.keyArray()
+    val valueArray = md.valueArray()
+    val row = new GenericInternalRow(1)
+    0.until(md.numElements()).iterator.map { i =>
+      row.update(0, keyArray.get(i, inputMapType.keyType))
+      val key = keyLambdaFunction.eval(row)
+      row.update(0, valueArray.get(i, inputMapType.valueType))
+      val value = valueLambdaFunction.eval(row)
+      Tuple2(key, value)
+    }
+  }
+
   override def eval(input: InternalRow): Any = {
     val result = inputData.eval(input).asInstanceOf[MapData]
     if (result != null) {
       val builder = newMapBuilder()
       builder.sizeHint(result.numElements())
-      val keyArray = result.keyArray()
-      val valueArray = result.valueArray()
-      var i = 0
-      while (i < result.numElements()) {

Review comment:
The latest one looks fine. Thanks ;)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
AmplabJenkins removed a comment on pull request #32783: URL: https://github.com/apache/spark/pull/32783#issuecomment-855167204 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
SparkQA commented on pull request #32783: URL: https://github.com/apache/spark/pull/32783#issuecomment-856490236 **[Test build #139456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139456/testReport)** for PR 32783 at commit [`9cc2484`](https://github.com/apache/spark/commit/9cc2484600374022bb76a95039e22a8c232a4700).
[GitHub] [spark] SparkQA commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
SparkQA commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856485101 **[Test build #139455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139455/testReport)** for PR 32805 at commit [`70b86fc`](https://github.com/apache/spark/commit/70b86fc2277ca2246e526b57db091853be166d98).
[GitHub] [spark] SparkQA commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856485015 **[Test build #139454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139454/testReport)** for PR 32807 at commit [`8e53b88`](https://github.com/apache/spark/commit/8e53b88de26915038a5aa10dddece692eb33efad).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
AmplabJenkins removed a comment on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856483830 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139443/
[GitHub] [spark] AmplabJenkins commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
AmplabJenkins commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856483830 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139443/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
AmplabJenkins removed a comment on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856483016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139453/
[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-856483304 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43975/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
AmplabJenkins removed a comment on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856483018 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43966/
[GitHub] [spark] SparkQA removed a comment on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
SparkQA removed a comment on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856427751 **[Test build #139443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139443/testReport)** for PR 32812 at commit [`7d30f36`](https://github.com/apache/spark/commit/7d30f368a9c0543b7dad698b13f481902a83534e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute commands in QueryExecution instead of caller sides
AmplabJenkins removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-856483017 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139440/
[GitHub] [spark] AmplabJenkins commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
AmplabJenkins commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856483018 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43966/
[GitHub] [spark] SparkQA commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
SparkQA commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856482909 **[Test build #139443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139443/testReport)** for PR 32812 at commit [`7d30f36`](https://github.com/apache/spark/commit/7d30f368a9c0543b7dad698b13f481902a83534e). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
AmplabJenkins commented on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856483016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139453/
[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute commands in QueryExecution instead of caller sides
AmplabJenkins commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-856483017 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139440/
[GitHub] [spark] SparkQA commented on pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
SparkQA commented on pull request #32769: URL: https://github.com/apache/spark/pull/32769#issuecomment-856482578 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43974/
[GitHub] [spark] 137alpha commented on pull request #32813: [SPARK-34591][MLLIB] Disable decision tree pruning
137alpha commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-856481729 @srowen @asolimando @sethah Tagging you because you were major contributors to the original pull request that this bugfix undoes (https://github.com/apache/spark/pull/20632). Deeply grateful for your input and attention here.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
allisonwang-db commented on a change in pull request #32303: URL: https://github.com/apache/spark/pull/32303#discussion_r647142502 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -871,7 +871,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) { val from = ctx.relation.asScala.foldLeft(null: LogicalPlan) { (left, relation) => val right = plan(relation.relationPrimary) - val join = right.optionalMap(left)(Join(_, _, Inner, None, JoinHint.NONE)) + val join = right.optionalMap(left) { (left, right) => +if (relation.LATERAL != null) { + LateralJoin(left, LateralSubquery(right), Inner, None) Review comment: Actually, does it make sense to have join hints for lateral joins? A lateral join is essentially a nested loop join. Ideally, the evaluation logic should be: for each row on the left side, plug the outer query's attribute values into the outer references and evaluate the subquery. So it should only be planned as a (correlated) nested loop join. But since Spark doesn't support such execution, it first decorrelates the subquery and then rewrites the lateral join as a normal join. This seems to be an implementation detail, so it doesn't make sense to add join hints here.
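The per-row evaluation described in the review comment (for each left row, bind the outer references and evaluate the subquery) can be sketched as a correlated nested loop. This is a toy model over in-memory Scala collections, not Spark code: it assumes rows are `Map[String, Int]` and that the subquery is an ordinary function of the outer row; `LateralJoinSketch` and `lateralInnerJoin` are hypothetical names.

```scala
// Toy model of an inner lateral join as a correlated nested loop.
// Hypothetical names; this is NOT Spark's LateralJoin / LateralSubquery implementation.
object LateralJoinSketch {
  // Assumption: a row is a map from column name to an Int value.
  type Row = Map[String, Int]

  /** For each left row, evaluate the subquery with that row bound as its
    * outer reference, then emit the left row concatenated with each result row.
    * Left rows whose subquery returns nothing are dropped (inner join semantics). */
  def lateralInnerJoin(left: Seq[Row])(subquery: Row => Seq[Row]): Seq[Row] =
    for {
      outer <- left
      inner <- subquery(outer)
    } yield outer ++ inner

  def main(args: Array[String]): Unit = {
    val t: Seq[Row] = Seq(Map("a" -> 1, "b" -> 2), Map("a" -> 3, "b" -> 4))
    // Roughly models: SELECT * FROM t, LATERAL (SELECT t.a + t.b AS c WHERE t.a > 1)
    val result = lateralInnerJoin(t) { outer =>
      if (outer("a") > 1) Seq(Map("c" -> (outer("a") + outer("b")))) else Seq.empty
    }
    result.foreach(println) // only the row with a = 3 survives
  }
}
```

Spark does not execute lateral joins this way: as the comment explains, it decorrelates the subquery and plans an ordinary join, which is why a join hint would be tied to an implementation detail rather than to the query's semantics.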
[GitHub] [spark] sarutak commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
sarutak commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856476375 retest this please.
[GitHub] [spark] SparkQA commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
SparkQA commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856475690 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43973/
[GitHub] [spark] SparkQA commented on pull request #32814: [SPARK-35664][SQL] Support java.time.LocalDateTime as an external type of TimestampWithoutTZ type
SparkQA commented on pull request #32814: URL: https://github.com/apache/spark/pull/32814#issuecomment-856474722 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43970/
[GitHub] [spark] SparkQA commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856474653 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43971/
[GitHub] [spark] SparkQA commented on pull request #32788: [SPARK-35602][SS] Update state schema to be able to accept long length JSON
SparkQA commented on pull request #32788: URL: https://github.com/apache/spark/pull/32788#issuecomment-856472535 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43972/
[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute commands in QueryExecution instead of caller sides
SparkQA removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-856389959 **[Test build #139440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139440/testReport)** for PR 32513 at commit [`83d2710`](https://github.com/apache/spark/commit/83d27106a3e286547550c77f274ccdc5c4226391).
[GitHub] [spark] SparkQA removed a comment on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
SparkQA removed a comment on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856455317 **[Test build #139453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139453/testReport)** for PR 32795 at commit [`0bea115`](https://github.com/apache/spark/commit/0bea115293fa128ea2a36889e4721438181a0465).
[GitHub] [spark] SparkQA commented on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
SparkQA commented on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856469089 **[Test build #139453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139453/testReport)** for PR 32795 at commit [`0bea115`](https://github.com/apache/spark/commit/0bea115293fa128ea2a36889e4721438181a0465). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute commands in QueryExecution instead of caller sides
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-856469129 **[Test build #139440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139440/testReport)** for PR 32513 at commit [`83d2710`](https://github.com/apache/spark/commit/83d27106a3e286547550c77f274ccdc5c4226391). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on a change in pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
maropu commented on a change in pull request #32769: URL: https://github.com/apache/spark/pull/32769#discussion_r647131837 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala ## @@ -42,9 +42,27 @@ case class ExpandExec( override lazy val metrics = Map( "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")) - // The GroupExpressions can output data with arbitrary partitioning, so set it - // as UNKNOWN partitioning - override def outputPartitioning: Partitioning = UnknownPartitioning(0) + /** + * The Expand is commonly introduced by the RewriteDistinctAggregates optimizer rule. + * In that case there can be several attributes that are kept as they are by the Expand. + * If the child's output is partitioned by those attributes, then so will be + * the output of the Expand. + * In general case the Expand can output data with arbitrary partitioning, so set it + * as UNKNOWN partitioning. + */ + override def outputPartitioning: Partitioning = { +val stableAttrs = ExpressionSet(output.zipWithIndex.filter { + case (attr, i) => projections.forall(_(i).semanticEquals(attr)) +}.map(_._1)) + +child.outputPartitioning match { Review comment: Can't we use the shuffled join path for tests? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledJoin.scala#L48-L49
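The stable-attribute idea in the `ExpandExec` doc comment can be illustrated with a toy model: an output column is "stable" when every projection passes the same attribute through at that position, and a child hash partitioning survives the `Expand` only if all of its keys are stable. The string-based types and the `ExpandPartitioningSketch` names below are hypothetical, and the match arms are an assumption, since the diff shown stops at `child.outputPartitioning match {`.

```scala
// Toy model of the stable-attribute check; names and types are hypothetical, not Spark classes.
object ExpandPartitioningSketch {
  sealed trait Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  case object UnknownPartitioning extends Partitioning

  /** Output attributes that every projection passes through unchanged at the same position. */
  def stableAttrs(output: Seq[String], projections: Seq[Seq[String]]): Set[String] =
    output.zipWithIndex.collect {
      case (attr, i) if projections.forall(_(i) == attr) => attr
    }.toSet

  /** Keep the child's hash partitioning only if all of its keys are stable;
    * otherwise fall back to unknown partitioning (assumed behavior). */
  def outputPartitioning(
      child: Partitioning,
      output: Seq[String],
      projections: Seq[Seq[String]]): Partitioning = {
    val stable = stableAttrs(output, projections)
    child match {
      case h @ HashPartitioning(keys, _) if keys.forall(stable.contains) => h
      case _ => UnknownPartitioning
    }
  }
}
```

With output `(k, v, gid)` and projections `(k, v, 0)` and `(k, null, 1)`, only `k` is stable, so hash partitioning on `k` is kept, while partitioning on `v` falls back to unknown.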
[GitHub] [spark] maropu commented on a change in pull request #32659: [SPARK-22639][SQL] Support aggregate cbo stats estimation if the group by clause involves substring
maropu commented on a change in pull request #32659: URL: https://github.com/apache/spark/pull/32659#discussion_r647121080 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ## @@ -80,6 +80,54 @@ object EstimationUtils { expressions.collect { case alias @ Alias(attr: Attribute, _) if attributeStats.contains(attr) => alias.toAttribute -> attributeStats(attr) + case alias @ Alias(expn: Expression, _) if isExpressionStatsExist(expn, attributeStats) => +getExpressionStats(alias.toAttribute, expn, attributeStats) +} + } + + // Support for substring expressions. + // TODO: Support for more expressions like Multiply. + private def isExpressionStatsExist( + expn: Expression, Review comment: `expn` -> `expr` ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ## @@ -80,6 +80,54 @@ object EstimationUtils { expressions.collect { case alias @ Alias(attr: Attribute, _) if attributeStats.contains(attr) => alias.toAttribute -> attributeStats(attr) + case alias @ Alias(expn: Expression, _) if isExpressionStatsExist(expn, attributeStats) => +getExpressionStats(alias.toAttribute, expn, attributeStats) +} + } + + // Support for substring expressions. + // TODO: Support for more expressions like Multiply. Review comment: Why do we need to handle individual exprs here? For aggregate stat estimation, can't we just use upper-bound stat values from the child plan in `AggregateEstimation`?
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ## @@ -80,6 +80,54 @@ object EstimationUtils { expressions.collect { case alias @ Alias(attr: Attribute, _) if attributeStats.contains(attr) => alias.toAttribute -> attributeStats(attr) + case alias @ Alias(expn: Expression, _) if isExpressionStatsExist(expn, attributeStats) => Review comment: Why did you update this method instead of `AggregateEstimation`? `Project` uses this method, though. Is this related to projections?
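One reading of the reviewer's suggestion is to bound the aggregate's output using the child's column statistics instead of deriving stats per expression. A deterministic expression such as `substring(col, pos, len)` cannot produce more distinct values than `col` itself, so `col`'s distinct count is an upper bound. The sketch below multiplies those per-key bounds and caps the result at the child's row count. All types and names (`AggregateEstimateSketch`, `GroupKey`, `estimateOutputRows`) are hypothetical simplifications, not Spark's `ColumnStat` or `AggregateEstimation` API.

```scala
// Hypothetical, simplified model; NOT Spark's ColumnStat / AggregateEstimation API.
object AggregateEstimateSketch {
  /** A grouping key is either a bare column or a deterministic function of one column. */
  sealed trait GroupKey { def sourceColumn: String }
  final case class Column(name: String) extends GroupKey {
    def sourceColumn: String = name
  }
  final case class Substring(child: String, pos: Int, len: Int) extends GroupKey {
    def sourceColumn: String = child
  }

  /** Upper bound on distinct values of a key: a deterministic function of a column
    * cannot yield more distinct values than the column itself. */
  def distinctUpperBound(key: GroupKey, distinctCounts: Map[String, BigInt]): Option[BigInt] =
    distinctCounts.get(key.sourceColumn)

  /** Estimated output rows: product of per-key upper bounds, capped by the child's row count.
    * Returns None when any key's source column has no distinct-count statistic. */
  def estimateOutputRows(
      keys: Seq[GroupKey],
      childRowCount: BigInt,
      distinctCounts: Map[String, BigInt]): Option[BigInt] = {
    if (keys.isEmpty) Some(BigInt(1)) // a global aggregate produces one row
    else {
      val bounds = keys.map(distinctUpperBound(_, distinctCounts))
      if (bounds.exists(_.isEmpty)) None
      else Some(bounds.flatten.product.min(childRowCount))
    }
  }
}
```

For example, grouping by `substring(name, 1, 3)` with 1,000 distinct `name` values and 10,000 input rows gives an estimate of at most 1,000 groups, without any substring-specific statistics.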
[GitHub] [spark] SparkQA commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
SparkQA commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856463079 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43966/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
AmplabJenkins removed a comment on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856458336 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43967/
[GitHub] [spark] SparkQA commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
SparkQA commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856458323 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43967/
[GitHub] [spark] AmplabJenkins commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
AmplabJenkins commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856458336 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43967/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
AmplabJenkins removed a comment on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856456977 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43968/
[GitHub] [spark] SparkQA commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
SparkQA commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856456954 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43968/
[GitHub] [spark] AmplabJenkins commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
AmplabJenkins commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856456977 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43968/
[GitHub] [spark] SparkQA commented on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
SparkQA commented on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856455317 **[Test build #139453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139453/testReport)** for PR 32795 at commit [`0bea115`](https://github.com/apache/spark/commit/0bea115293fa128ea2a36889e4721438181a0465).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
AmplabJenkins removed a comment on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856454817 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139444/
[GitHub] [spark] AmplabJenkins commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
AmplabJenkins commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856454817 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139444/
[GitHub] [spark] SparkQA removed a comment on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
SparkQA removed a comment on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856427834 **[Test build #139444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139444/testReport)** for PR 32805 at commit [`8fedbd5`](https://github.com/apache/spark/commit/8fedbd5ebe7807a6bd9b774da5d93d6e3924f67a).
[GitHub] [spark] SparkQA commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
SparkQA commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856454449 **[Test build #139444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139444/testReport)** for PR 32805 at commit [`8fedbd5`](https://github.com/apache/spark/commit/8fedbd5ebe7807a6bd9b774da5d93d6e3924f67a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
AmplabJenkins removed a comment on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856454137 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139448/
[GitHub] [spark] SparkQA removed a comment on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA removed a comment on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856451502 **[Test build #139448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139448/testReport)** for PR 32807 at commit [`8e53b88`](https://github.com/apache/spark/commit/8e53b88de26915038a5aa10dddece692eb33efad).
[GitHub] [spark] SparkQA commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856454109 **[Test build #139448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139448/testReport)** for PR 32807 at commit [`8e53b88`](https://github.com/apache/spark/commit/8e53b88de26915038a5aa10dddece692eb33efad). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
AmplabJenkins commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856454137 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139448/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32795: [SPARK-35588][PYTHON][DOCS] Update quickstart.ipynb to use pyspark.pandas
AmplabJenkins removed a comment on pull request #32795: URL: https://github.com/apache/spark/pull/32795#issuecomment-856433282 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43965/
[GitHub] [spark] yaooqinn commented on pull request #32791: [SPARK-34290][SQL][FOLLOWUP] Cleanup truncate table not supported for V2Table error
yaooqinn commented on pull request #32791: URL: https://github.com/apache/spark/pull/32791#issuecomment-856452387 thanks, merged to master
[GitHub] [spark] yaooqinn closed pull request #32791: [SPARK-34290][SQL][FOLLOWUP] Cleanup truncate table not supported for V2Table error
yaooqinn closed pull request #32791: URL: https://github.com/apache/spark/pull/32791
[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-856451925 **[Test build #139452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139452/testReport)** for PR 32303 at commit [`d646720`](https://github.com/apache/spark/commit/d646720edf977ea50ac8f273eea75b045915038e).
[GitHub] [spark] SparkQA commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
SparkQA commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856451650 **[Test build #139450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139450/testReport)** for PR 32786 at commit [`ee147ec`](https://github.com/apache/spark/commit/ee147ec2657a1229ea235d71b7d508cae5a83a08).
[GitHub] [spark] SparkQA commented on pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
SparkQA commented on pull request #32769: URL: https://github.com/apache/spark/pull/32769#issuecomment-856451723 **[Test build #139451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139451/testReport)** for PR 32769 at commit [`b3475f4`](https://github.com/apache/spark/commit/b3475f4f42cbe5a513fa6496e2d5c5d8b156b350).
[GitHub] [spark] SparkQA commented on pull request #32788: [SPARK-35602][SS] Update state schema to be able to accept long length JSON
SparkQA commented on pull request #32788: URL: https://github.com/apache/spark/pull/32788#issuecomment-856451620 **[Test build #139449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139449/testReport)** for PR 32788 at commit [`d660573`](https://github.com/apache/spark/commit/d66057348750553e438cc48faa962f545bfe2ca9).
[GitHub] [spark] SparkQA commented on pull request #32814: [SPARK-35664][SQL] Support java.time.LocalDateTime as an external type of TimestampWithoutTZ type
SparkQA commented on pull request #32814: URL: https://github.com/apache/spark/pull/32814#issuecomment-856451528 **[Test build #139447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139447/testReport)** for PR 32814 at commit [`1101f55`](https://github.com/apache/spark/commit/1101f5550f7dd032fec39a211cb7e4bc345e565b).
[GitHub] [spark] SparkQA commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
SparkQA commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856451502 **[Test build #139448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139448/testReport)** for PR 32807 at commit [`8e53b88`](https://github.com/apache/spark/commit/8e53b88de26915038a5aa10dddece692eb33efad).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32785: [SPARK-35601][PYTHON] Support arithmetic operations against bool literals
AmplabJenkins removed a comment on pull request #32785: URL: https://github.com/apache/spark/pull/32785#issuecomment-856450035 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43963/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
AmplabJenkins removed a comment on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856450037 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139445/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
AmplabJenkins removed a comment on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856450034 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43969/
[GitHub] [spark] AmplabJenkins commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
AmplabJenkins commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856450037 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139445/
[GitHub] [spark] AmplabJenkins commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
AmplabJenkins commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856450034 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43969/
[GitHub] [spark] AmplabJenkins commented on pull request #32785: [SPARK-35601][PYTHON] Support arithmetic operations against bool literals
AmplabJenkins commented on pull request #32785: URL: https://github.com/apache/spark/pull/32785#issuecomment-856450035 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43963/
[GitHub] [spark] maropu commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
maropu commented on a change in pull request #32787: URL: https://github.com/apache/spark/pull/32787#discussion_r647115051 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ## @@ -791,4 +791,28 @@ class AnalysisErrorSuite extends AnalysisTest { assertAnalysisError(plan, s"Correlated column is not allowed in predicate ($msg)" :: Nil) } } + + test("SPARK-35618: Resolve star expressions in subquery") { Review comment: I read the PR description and thought this PR was meant to accept new query patterns, but it only has negative test cases?
[GitHub] [spark] SparkQA commented on pull request #32812: [SPARK-35636][PYTHON][DOCS][FOLLOW-UP] Restructure reference API files according to the layout
SparkQA commented on pull request #32812: URL: https://github.com/apache/spark/pull/32812#issuecomment-856445369 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43966/
[GitHub] [spark] q2w commented on a change in pull request #32766: [SPARK-35627][CORE] Decommission executors in batches to not overload network bandwidth
q2w commented on a change in pull request #32766: URL: https://github.com/apache/spark/pull/32766#discussion_r647114128 ## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ## @@ -519,10 +558,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp scheduler.sc.env.blockManager.master.decommissionBlockManagers(executorsToDecommission) if (!triggeredByExecutor) { - executorsToDecommission.foreach { executorId => -logInfo(s"Notify executor $executorId to decommissioning.") -executorDataMap(executorId).executorEndpoint.send(DecommissionExecutor) - } Review comment: No, I haven't seen this in a public cloud. We have run into this issue in a private cloud that had a longer timeout for forceful node removal, and that was the motivation for this PR: to give users some control over the decommissioning process.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
allisonwang-db commented on a change in pull request #32303: URL: https://github.com/apache/spark/pull/32303#discussion_r647113231 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -871,7 +871,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg override def visitFromClause(ctx: FromClauseContext): LogicalPlan = withOrigin(ctx) { val from = ctx.relation.asScala.foldLeft(null: LogicalPlan) { (left, relation) => val right = plan(relation.relationPrimary) - val join = right.optionalMap(left)(Join(_, _, Inner, None, JoinHint.NONE)) + val join = right.optionalMap(left) { (left, right) => +if (relation.LATERAL != null) { + LateralJoin(left, LateralSubquery(right), Inner, None) Review comment: Good point. I will add it.
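The `visitFromClause` change above folds the comma-separated FROM items from left to right. A relation marked `LATERAL` is wrapped in a `LateralJoin` rather than a plain inner `Join`. The sketch below models that fold outside Catalyst. It assumes a toy plan ADT in which `Rel`, `InnerJoin`, `LateralJoinNode` and `FromItem` are hypothetical stand-ins, not Spark classes:

```scala
// Toy logical-plan ADT; hypothetical stand-ins, not Catalyst classes.
sealed trait Plan
final case class Rel(name: String) extends Plan
final case class InnerJoin(left: Plan, right: Plan) extends Plan
final case class LateralJoinNode(left: Plan, right: Plan) extends Plan

// One comma-separated FROM item, e.g. `t` or `LATERAL (SELECT ...)`.
final case class FromItem(plan: Plan, lateral: Boolean)

object FromClauseSketch {
  // The first item becomes the initial plan; every later item is joined onto
  // everything to its left, as a lateral join if it was marked LATERAL.
  def buildFrom(items: Seq[FromItem]): Option[Plan] =
    items.foldLeft(Option.empty[Plan]) { (acc, item) =>
      acc match {
        case None => Some(item.plan)
        case Some(left) if item.lateral => Some(LateralJoinNode(left, item.plan))
        case Some(left) => Some(InnerJoin(left, item.plan))
      }
    }
}
```

Because the fold is left-deep, `FROM a, b, LATERAL (...)` produces a lateral join whose left side is the inner join of `a` and `b`. That left side is what the lateral subquery can reference.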
[GitHub] [spark] SparkQA commented on pull request #32805: [SPARK-35666][ML] gemv skip array shape checking
SparkQA commented on pull request #32805: URL: https://github.com/apache/spark/pull/32805#issuecomment-856442577 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43967/
[GitHub] [spark] allisonwang-db commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries
allisonwang-db commented on a change in pull request #32303: URL: https://github.com/apache/spark/pull/32303#discussion_r647112288 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ## @@ -315,19 +314,28 @@ object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper case ListQuery(sub, children, exprId, childOutputs, conditions) if children.nonEmpty => val (newPlan, newCond) = pullOutCorrelatedPredicates(sub, outerPlans) ListQuery(newPlan, children, exprId, childOutputs, getJoinCondition(newCond, conditions)) + case LateralSubquery(sub, children, exprId, conditions) if children.nonEmpty => +val (newPlan, newCond) = decorrelate(sub, outerPlans) +LateralSubquery(newPlan, children, exprId, getJoinCondition(newCond, conditions)) } } /** * Pull up the correlated predicates and rewrite all subqueries in an operator tree.. */ def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning( -_.containsAnyPattern(SCALAR_SUBQUERY, EXISTS_SUBQUERY, LIST_SUBQUERY)) { +_.containsPattern(PLAN_EXPRESSION)) { case f @ Filter(_, a: Aggregate) => rewriteSubQueries(f, Seq(a, a.child)) -// Only a few unary nodes (Project/Filter/Aggregate) can contain subqueries. +// Only a few unary nodes (Project/Filter/Aggregate/LateralJoin) can contain subqueries. case q: UnaryNode => - rewriteSubQueries(q, q.children) + val newPlan = rewriteSubQueries(q, q.children) + // Preserve the original output of the node. + if (newPlan.output != q.output) { Review comment: Nice!
[GitHub] [spark] SparkQA commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
SparkQA commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856441509 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43968/
[GitHub] [spark] 137alpha commented on pull request #32813: [SPARK-34591][MLLIB] Disable decision tree pruning
137alpha commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-856441296 Hello, I am the author of the Jira ticket https://issues.apache.org/jira/browse/SPARK-34591. In my view, the behaviour described in the ticket is a serious problem: it makes the DecisionTreeClassifier and the RandomForestClassifier seriously unreliable for probability estimation problems in Spark 2.4.0 and all later versions. Additionally, the original implementation of the feature did not update the Spark ML documentation to describe this non-standard modification to the tree algorithm. The only way I could trace the behaviour (given that it conflicted with the Spark documentation) was to examine every Jira ticket referenced in the release notes after Spark 2.3.0 (where I knew this problem did not exist) to identify the ones that might be responsible. In my own experience, three of my clients have been directly affected by this issue. The Jira ticket gives a minimal example with "maximally worst" behaviour: a tree that is pruned (outside the user's control) so that there are no splits at all.
[GitHub] [spark] cloud-fan commented on a change in pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
cloud-fan commented on a change in pull request #32807: URL: https://github.com/apache/spark/pull/32807#discussion_r647111792 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ## @@ -699,20 +699,25 @@ abstract class PushableColumnBase { def unapply(e: Expression): Option[String] = { import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper -def helper(e: Expression): Option[Seq[String]] = e match { - case a: Attribute => -// Attribute that contains dot "." in name is supported only when -// nested predicate pushdown is enabled. -if (nestedPredicatePushdownEnabled || !a.name.contains(".")) { - Some(Seq(a.name)) -} else { - None -} - case s: GetStructField if nestedPredicatePushdownEnabled => -helper(s.child).map(_ :+ s.childSchema(s.ordinal).name) - case _ => None +if (nestedPredicatePushdownEnabled) { Review comment: cc @dbtsai @viirya
[GitHub] [spark] cloud-fan commented on a change in pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
cloud-fan commented on a change in pull request #32807: URL: https://github.com/apache/spark/pull/32807#discussion_r647111703 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ## @@ -699,20 +699,25 @@ abstract class PushableColumnBase { def unapply(e: Expression): Option[String] = { import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper -def helper(e: Expression): Option[Seq[String]] = e match { - case a: Attribute => -// Attribute that contains dot "." in name is supported only when -// nested predicate pushdown is enabled. -if (nestedPredicatePushdownEnabled || !a.name.contains(".")) { - Some(Seq(a.name)) -} else { - None -} - case s: GetStructField if nestedPredicatePushdownEnabled => -helper(s.child).map(_ :+ s.childSchema(s.ordinal).name) - case _ => None +if (nestedPredicatePushdownEnabled) { Review comment: Note that: 1. nestedPredicatePushdownEnabled is always enabled for DS v2 (by default) 2. nestedPredicatePushdownEnabled is never enabled for DS v1 3. nestedPredicatePushdownEnabled is only enabled for the parquet and orc file sources (by default) After changing the quoting logic: 1. DS v1 is not affected 2. the file source is built in, so we are fine 3. DS v2 will be affected if the column name contains special chars. Personally, I think the new quoting behavior is better (closer to ANSI SQL), and most v2 implementations won't be affected as they already need to deal with quoted names.
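With nested predicate pushdown enabled, the pushed column reference is a multi-part name. Any part that contains special characters such as `.` must be quoted. Otherwise a single column literally named `a.b` cannot be told apart from field `b` of struct `a`. The sketch below shows ANSI-style backtick quoting under an assumed rule: a part is left bare only if it consists of letters, digits and underscores, and embedded backticks are doubled. `QuotingSketch` and its methods are hypothetical illustrations, not Spark's `MultipartIdentifierHelper`:

```scala
object QuotingSketch {
  // Assumed rule: parts made only of letters, digits and underscores stay bare;
  // anything else is wrapped in backticks, with embedded backticks doubled.
  def quoteIfNeeded(part: String): String =
    if (part.nonEmpty && part.forall(c => c.isLetterOrDigit || c == '_')) part
    else "`" + part.replace("`", "``") + "`"

  // Render a multi-part column name by quoting each part and joining with dots.
  def quoted(nameParts: Seq[String]): String =
    nameParts.map(quoteIfNeeded).mkString(".")
}
```

Under this rule, `Seq("a", "b")` renders as a.b and `Seq("a.b")` renders as the backtick-quoted `a.b`. A v2 source that parses quoted names can therefore tell the two cases apart.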
[GitHub] [spark] tanelk commented on a change in pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
tanelk commented on a change in pull request #32769: URL: https://github.com/apache/spark/pull/32769#discussion_r647111719 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala ## @@ -42,9 +42,27 @@ case class ExpandExec( override lazy val metrics = Map( "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")) - // The GroupExpressions can output data with arbitrary partitioning, so set it - // as UNKNOWN partitioning - override def outputPartitioning: Partitioning = UnknownPartitioning(0) + /** + * The Expand is commonly introduced by the RewriteDistinctAggregates optimizer rule. + * In that case there can be several attributes that are kept as they are by the Expand. + * If the child's output is partitioned by those attributes, then so will be + * the output of the Expand. + * In general case the Expand can output data with arbitrary partitioning, so set it + * as UNKNOWN partitioning. + */ + override def outputPartitioning: Partitioning = { +val stableAttrs = ExpressionSet(output.zipWithIndex.filter { + case (attr, i) => projections.forall(_(i).semanticEquals(attr)) +}.map(_._1)) + +child.outputPartitioning match { Review comment: Added that case, but I was not able to construct a test case for it
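The rule in the new `outputPartitioning` doc comment can be modelled without Catalyst. An output column is "stable" when every projection passes the same attribute through at that position. A child hash partitioning survives the Expand only if all of its keys are stable. The sketch below assumes toy `Expr`/`Attr`/`Lit` types and represents a hash partitioning by its key list alone, with `None` standing in for unknown partitioning. All of these names are hypothetical, not Spark classes:

```scala
// Toy expression model; hypothetical stand-ins for Catalyst expressions.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr

object ExpandPartitioningSketch {
  // Output attributes that every projection passes through unchanged at the same index.
  def stableAttrs(output: Seq[Attr], projections: Seq[Seq[Expr]]): Set[Attr] =
    output.zipWithIndex.collect {
      case (attr, i) if projections.forall(_(i) == attr) => attr
    }.toSet

  // Some(keys) if the child's hash partitioning on `childKeys` is preserved,
  // None if the Expand output must be treated as unknown partitioning.
  def outputHashKeys(
      childKeys: Seq[Attr],
      output: Seq[Attr],
      projections: Seq[Seq[Expr]]): Option[Seq[Attr]] = {
    val stable = stableAttrs(output, projections)
    if (childKeys.nonEmpty && childKeys.forall(stable.contains)) Some(childKeys) else None
  }
}
```

For a distinct-aggregate style Expand, assume output `(group1, a, b, gid)` and projections `(group1, a, null, 1)` and `(group1, null, b, 2)`. Only `group1` is stable, so a child hash-partitioned on `group1` keeps that partitioning. A child partitioned on `a` does not.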
[GitHub] [spark] maropu commented on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
maropu commented on pull request #32783: URL: https://github.com/apache/spark/pull/32783#issuecomment-856440404 cc: @viirya
[GitHub] [spark] maropu commented on pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
maropu commented on pull request #32783: URL: https://github.com/apache/spark/pull/32783#issuecomment-856440290 Nice catch, @eejbyfeldt and thank you for your contribution.
[GitHub] [spark] allisonwang-db commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-856440257 cc @cloud-fan
[GitHub] [spark] maropu commented on a change in pull request #32783: [SPARK-35653][SQL] Fix CatalystToExternalMap interpreted path fails for Map with case classes as keys or values
maropu commented on a change in pull request #32783: URL: https://github.com/apache/spark/pull/32783#discussion_r647110803 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -1181,21 +1176,25 @@ case class CatalystToExternalMap private( newMapBuilderMethod.invoke(moduleField).asInstanceOf[Builder[AnyRef, AnyRef]] } + private def keyValueIterator(md: MapData): Iterator[AnyRef] = { +val keyArray = md.keyArray() +val valueArray = md.valueArray() +val row = new GenericInternalRow(1) +0.until(md.numElements()).iterator.map { i => + row.update(0, keyArray.get(i, inputMapType.keyType)) + val key = keyLambdaFunction.eval(row) + row.update(0, valueArray.get(i, inputMapType.valueType)) + val value = valueLambdaFunction.eval(row) + Tuple2(key, value) +} + } + override def eval(input: InternalRow): Any = { val result = inputData.eval(input).asInstanceOf[MapData] if (result != null) { val builder = newMapBuilder() builder.sizeHint(result.numElements()) - val keyArray = result.keyArray() - val valueArray = result.valueArray() - var i = 0 - while (i < result.numElements()) { Review comment: We tend to use `while` loops in performance-critical code. Does the proposed code introduce any performance overhead?
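For reference, here are the two loop shapes under discussion side by side. The sketch works over plain arrays, and `MapBuildSketch` with its two methods are hypothetical helpers, not Spark code. Both methods build the same map. Whether the iterator version costs anything measurable on the interpreted path is an empirical question, better settled with a benchmark than assumed either way:

```scala
object MapBuildSketch {
  // Index-based while loop: the style Spark tends to use on hot paths.
  def buildWithWhile(keys: Array[Int], values: Array[String]): Map[Int, String] = {
    val builder = Map.newBuilder[Int, String]
    builder.sizeHint(keys.length)
    var i = 0
    while (i < keys.length) {
      builder += (keys(i) -> values(i))
      i += 1
    }
    builder.result()
  }

  // Iterator-based version with the same shape as the proposed keyValueIterator:
  // it adds an Iterator object and one closure call per element on top of the loop body.
  def buildWithIterator(keys: Array[Int], values: Array[String]): Map[Int, String] = {
    val builder = Map.newBuilder[Int, String]
    builder.sizeHint(keys.length)
    builder ++= (0 until keys.length).iterator.map(i => (keys(i), values(i)))
    builder.result()
  }
}
```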
[GitHub] [spark] gengliangwang opened a new pull request #32814: [SPARK-35664][SQL] Support java.time.LocalDateTime as an external type of TimestampWithoutTZ type
gengliangwang opened a new pull request #32814: URL: https://github.com/apache/spark/pull/32814 ### What changes were proposed in this pull request? In the PR, I propose to extend the Spark SQL API to accept java.time.LocalDateTime as an external type of the recently added Catalyst type TimestampWithoutTZType. The Java class java.time.LocalDateTime has semantics similar to the ANSI SQL timestamp without time zone type, and it is the most suitable external type for TimestampWithoutTZType. In more detail: * Added TimestampWithoutTZConverter, which converts java.time.LocalDateTime instances to/from the internal representation of the Catalyst type TimestampWithoutTZType (a Long). The TimestampWithoutTZConverter object uses new methods of DateTimeUtils: * localDateTimeToMicros() converts the input date-time to the total number of microseconds. * microsToLocalDateTime() obtains a java.time.LocalDateTime from a number of microseconds. * Support the new type TimestampWithoutTZType in RowEncoder via the methods createDeserializerForLocalDateTime() and createSerializerForLocalDateTime(). * Extended the Literal API to construct literals from java.time.LocalDateTime instances. ### Why are the changes needed? To allow users to parallelize collections of java.time.LocalDateTime and construct timestamp without time zone columns, and also to collect such columns back to the driver side. ### Does this PR introduce _any_ user-facing change? The PR extends existing functionality, so users can parallelize instances of the java.time.LocalDateTime class and collect them back. ### How was this patch tested? New unit tests
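Because a timestamp without time zone is stored as a Long, the conversion described above can be pictured as counting microseconds between a local epoch of `1970-01-01T00:00` and the given `java.time.LocalDateTime`, with no time zone involved. The sketch below uses only `java.time`. The object name is hypothetical and the method names merely echo the PR description, so this is not the actual `DateTimeUtils` code. Precision finer than a microsecond is dropped:

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

object LocalDateTimeMicrosSketch {
  // Local "epoch" used as the zero point; no time zone is involved.
  private val LocalEpoch: LocalDateTime = LocalDateTime.of(1970, 1, 1, 0, 0)

  // Microseconds from the local epoch to `ldt` (negative for dates before 1970).
  def localDateTimeToMicros(ldt: LocalDateTime): Long =
    ChronoUnit.MICROS.between(LocalEpoch, ldt)

  // Inverse conversion: rebuild a LocalDateTime from a microsecond count.
  def microsToLocalDateTime(micros: Long): LocalDateTime =
    LocalEpoch.plus(micros, ChronoUnit.MICROS)
}
```

A value with microsecond precision, such as `2021-06-08T12:30:15.123456`, survives a round trip through both methods unchanged.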
[GitHub] [spark] SparkQA commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
SparkQA commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856439833 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43969/
[GitHub] [spark] SparkQA removed a comment on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
SparkQA removed a comment on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856427821 **[Test build #139445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139445/testReport)** for PR 32804 at commit [`ec9e127`](https://github.com/apache/spark/commit/ec9e127caa335ae1512714b61f2a1a8e5e67392a).
[GitHub] [spark] SparkQA commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint
SparkQA commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-856436747 **[Test build #139445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139445/testReport)** for PR 32804 at commit [`ec9e127`](https://github.com/apache/spark/commit/ec9e127caa335ae1512714b61f2a1a8e5e67392a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] tanelk commented on a change in pull request #32769: [SPARK-35630][SQL] ExpandExec should not introduce unnecessary exchanges
tanelk commented on a change in pull request #32769:
URL: https://github.com/apache/spark/pull/32769#discussion_r647107606

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -4003,6 +4003,56 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
     checkAnswer(sql(s"select /*+ REPARTITION(3, a) */ a b from values('123') t(a)"), Row("123"))
   }
+
+  test("SPARK-35630: ExpandExec should not introduce unnecessary exchanges") {
+    withTable("test_table") {
+      spark.range(11)
+        .withColumn("group1", $"id" % 2)
+        .withColumn("group2", $"id" % 4)
+        .withColumn("a", $"id" % 3)
+        .withColumn("b", $"id" % 6)
+        .write.saveAsTable("test_table")

Review comment:
I simplified it even a bit further
[GitHub] [spark] sarutak commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.
sarutak commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-856436191 retest this please.
[GitHub] [spark] sarutak edited a comment on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown
sarutak edited a comment on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-856411972 #31964 mostly concerns the display format of queries, so we can safely revert it, and I don't mind doing so. But even if we revert it, the potential problem is still present, isn't it? This problem happens with ```col("`a``b`")``` even before #31964.
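The `col("`a``b`")` case in the comment above relies on backtick quoting. As we understand Spark's identifier rules, dots outside backticks separate name parts, and a doubled backtick inside a quoted name stands for one literal backtick, so that reference names a single column called a`b. The sketch below models only that splitting rule with a hypothetical `parseName` helper; it is not Spark's actual parser, which handles more cases and raises its own error types.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not Spark's parser: a simplified model of splitting a
// possibly backtick-quoted column reference into its name parts.
object QuotedNameSketch {
  def parseName(name: String): List[String] = {
    val parts = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (inQuotes) {
        if (c == '`') {
          if (i + 1 < name.length && name.charAt(i + 1) == '`') {
            current.append('`') // a doubled backtick is one literal backtick
            i += 1
          } else {
            inQuotes = false // closing backtick
          }
        } else {
          current.append(c) // dots and other chars are literal inside quotes
        }
      } else if (c == '`') {
        inQuotes = true // opening backtick
      } else if (c == '.') {
        parts += current.toString // an unquoted dot ends the current part
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    require(!inQuotes, s"Unclosed backtick in: $name")
    parts += current.toString
    parts.toList
  }

  def main(args: Array[String]): Unit = {
    println(parseName("`a``b`"))  // a single part holding a`b
    println(parseName("t.`x.y`")) // two parts: t and x.y
  }
}
```

A filter pushed down to the CSV source has to carry the unescaped name a`b through to the header match, which is why the quoting rule matters for the bug discussed in this PR.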