[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
AmplabJenkins commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751986676 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38057/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AmplabJenkins commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-751986674 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38061/
[GitHub] [spark] AmplabJenkins commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AmplabJenkins commented on pull request #30958: URL: https://github.com/apache/spark/pull/30958#issuecomment-751986675 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38058/
[GitHub] [spark] SparkQA commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
SparkQA commented on pull request #30955: URL: https://github.com/apache/spark/pull/30955#issuecomment-751985919 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38064/
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751985845 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38063/
[GitHub] [spark] cloud-fan commented on a change in pull request #30905: [SPARK-33890][SQL] Improve the implement of trim/trimleft/trimright
cloud-fan commented on a change in pull request #30905: URL: https://github.com/apache/spark/pull/30905#discussion_r549605431

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

```scala
@@ -756,6 +756,54 @@ trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
   override def nullable: Boolean = children.exists(_.nullable)
   override def foldable: Boolean = children.forall(_.foldable)

+  protected def doEval(srcString: UTF8String): UTF8String
+  protected def doEval(srcString: UTF8String, trimString: UTF8String): UTF8String
+
+  override def eval(input: InternalRow): Any = {
+    val srcString = srcStr.eval(input).asInstanceOf[UTF8String]
+    if (srcString == null) {
+      null
+    } else if (trimStr.isDefined) {
+      doEval(srcString, trimStr.get.eval(input).asInstanceOf[UTF8String])
+    } else {
+      doEval(srcString)
+    }
+  }
+
+  protected val trimMethod: String
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evals = children.map(_.genCode(ctx))
+    val srcString = evals(0)
+
+    if (evals.length == 1) {
+      ev.copy(code = evals.map(_.code) :+
+        code"""
+          |boolean ${ev.isNull} = false;
+          |UTF8String ${ev.value} = null;
+          |if (${srcString.isNull}) {
+          |  ${ev.isNull} = true;
+          |} else {
+          |  ${ev.value} = ${srcString.value}.$trimMethod();
+          |}
+        """)
+    } else {
+      val trimString = evals(1)
+      ev.copy(code = evals.map(_.code) :+
```

Review comment: We can skip evaluating the trim string if possible:

```scala
ev.copy(code = code"""
  |${evals.head.code}
  |if (${srcString.isNull}) {
  |  ...
  |} else {
  |  ${trimString.code}
  |  if (${trimString.isNull}) ...
```
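The interpreted `eval` in the diff above short-circuits on a null source string and only evaluates the trim string when one was supplied. A minimal Python sketch of that dispatch (illustrative only, not Spark's actual implementation; `eval_trim` is a hypothetical name):

```python
def eval_trim(src, trim_str=None):
    """Mimic the null-propagating trim dispatch: null source wins,
    an explicit trim character set is used when provided, otherwise
    default whitespace trimming applies."""
    if src is None:
        return None  # null source short-circuits; trim_str is never evaluated
    if trim_str is not None:
        return src.strip(trim_str)  # trim using the caller-supplied characters
    return src.strip()  # default: trim whitespace

print(eval_trim("  hi  "))       # 'hi'
print(eval_trim(None))           # None
print(eval_trim("xxhixx", "x"))  # 'hi'
```

The same ordering is what the review comment asks the generated code to follow: check the source's null flag before emitting the trim-string evaluation.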
[GitHub] [spark] SparkQA commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
SparkQA commented on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-751985404 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38065/
[GitHub] [spark] cloud-fan commented on a change in pull request #30905: [SPARK-33890][SQL] Improve the implement of trim/trimleft/trimright
cloud-fan commented on a change in pull request #30905: URL: https://github.com/apache/spark/pull/30905#discussion_r549604882

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

```scala
@@ -756,6 +756,54 @@ trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
   override def nullable: Boolean = children.exists(_.nullable)
   override def foldable: Boolean = children.forall(_.foldable)

+  protected def doEval(srcString: UTF8String): UTF8String
+  protected def doEval(srcString: UTF8String, trimString: UTF8String): UTF8String
+
+  override def eval(input: InternalRow): Any = {
+    val srcString = srcStr.eval(input).asInstanceOf[UTF8String]
+    if (srcString == null) {
+      null
+    } else if (trimStr.isDefined) {
+      doEval(srcString, trimStr.get.eval(input).asInstanceOf[UTF8String])
+    } else {
+      doEval(srcString)
+    }
+  }
+
+  protected val trimMethod: String
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evals = children.map(_.genCode(ctx))
+    val srcString = evals(0)
+
+    if (evals.length == 1) {
+      ev.copy(code = evals.map(_.code) :+
```

Review comment: nit:

```scala
ev.copy(code = code"""
  |${evals.head.code}
  |...""".stripMargin
```
[GitHub] [spark] wangyum commented on pull request #30960: [SPARK-33847][SQL][FOLLOWUP] Remove the CaseWhen should consider deterministic
wangyum commented on pull request #30960: URL: https://github.com/apache/spark/pull/30960#issuecomment-751984460 cc @cloud-fan
[GitHub] [spark] wangyum opened a new pull request #30960: [SPARK-33847][SQL][FOLLOWUP] Remove the CaseWhen should consider deterministic
wangyum opened a new pull request #30960: URL: https://github.com/apache/spark/pull/30960

### What changes were proposed in this pull request?
This PR fixes the rule that removes a `CaseWhen` when its `elseValue` is empty and all other outputs are null: the rule must also take the determinism of the branch conditions into account.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.
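The determinism condition in the PR description can be sketched outside Spark. This is a hedged illustration (the `Cond` class and `simplify_case_when` helper are hypothetical, not Spark's optimizer code): an all-null `CASE WHEN` with no `ELSE` may only be folded to a null literal when every branch condition is deterministic, since folding would otherwise skip evaluating a non-deterministic condition.

```python
from dataclasses import dataclass

@dataclass
class Cond:
    """Stand-in for a branch condition; tracks only its determinism flag."""
    deterministic: bool

def simplify_case_when(branches, else_value=None):
    """branches: list of (Cond, value) pairs; a value of None models a
    null literal. Returns None (a folded null) only when removal is safe."""
    removable = (
        else_value is None
        and all(value is None for _, value in branches)
        and all(cond.deterministic for cond, _ in branches)
    )
    if removable:
        return None  # safe: fold the whole CaseWhen to a null literal
    return ("CaseWhen", branches, else_value)  # keep the expression as-is

# A deterministic all-null CaseWhen folds away...
assert simplify_case_when([(Cond(True), None)]) is None
# ...but a non-deterministic condition (think rand() > 0.5) blocks removal.
assert simplify_case_when([(Cond(False), None)]) is not None
```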
[GitHub] [spark] cloud-fan commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers
cloud-fan commented on a change in pull request #30947: URL: https://github.com/apache/spark/pull/30947#discussion_r549603691

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala

```scala
@@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
     if (parts.isEmpty) {
-      throw new AnalysisException("multi-part identifier cannot be empty.")
+      throw new AnalysisException("Namespaces in V1 catalog can have only a single name part.")
```

Review comment: BTW, how does `SHOW TABLES IN $catalog` get related to the table identifier?
[GitHub] [spark] cloud-fan commented on pull request #30935: [SPARK-33859][SQL] Support V2 ALTER TABLE .. RENAME PARTITION
cloud-fan commented on pull request #30935: URL: https://github.com/apache/spark/pull/30935#issuecomment-751983972 retest this please
[GitHub] [spark] cloud-fan closed pull request #30935: [SPARK-33859][SQL] Support V2 ALTER TABLE .. RENAME PARTITION
cloud-fan closed pull request #30935: URL: https://github.com/apache/spark/pull/30935
[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
cloud-fan commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549602461

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

```scala
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
+        case a: Attribute =>
+          DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
+        case nested =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")
```

Review comment: How about we strip the `Alias` and then call `toPrettySQL`, which is defined in `org.apache.spark.sql.catalyst.util`?
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AngersZhuuuu commented on a change in pull request #30958: URL: https://github.com/apache/spark/pull/30958#discussion_r549601552

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala

```scala
@@ -440,6 +441,31 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU
       }
     }
   }
+
+  test("SPARK-33930: Script Transform default FIELD DELIMIT should be \u0001 (no serde)") {
+    withTempView("v") {
+      val df = Seq(
+        (1, 2, 3),
+        (2, 3, 4),
+        (3, 4, 5)
+      ).toDF("a", "b", "c") // Note column d's data type is Decimal(38, 18)
```

Review comment:
> where is column d? Remove this unrelated comment.

Copied the code from another UT and forgot to remove this comment.
[GitHub] [spark] AngersZhuuuu commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too
AngersZhuuuu commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-751982256 gentle ping @cloud-fan
[GitHub] [spark] mridulm edited a comment on pull request #30876: [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default
mridulm edited a comment on pull request #30876: URL: https://github.com/apache/spark/pull/30876#issuecomment-751979314

Before answering the specific queries below, I want to set the context.

a) Enabling proactive replication could result in reduced recomputation cost when executors fail.
b) Enabling it will result in increased transfers when executor(s) are lost.

(Ignoring other minor impacts.) I was trying to understand what the impact would be, and what the tradeoffs involved are, when we enable it by default:

1) Are the replication costs (b) lower now? How do we estimate that cost? (There was non-trivial impact when I last ran some experiments.)
2) Are we (the community) running into cases where we benefit from (a) but are not (very) negatively impacted by (b)? Is there any commonality when this happens (application types/characteristics? resource manager? almost all usage?)
3) What is the impact on the application (and cluster) when we have non-trivial executor loss? Executor release in DRA is one example of this; preemption is another.
4) Anything else to watch out for?

As I mentioned earlier, I am fine with collecting data by enabling this flag by default. I am hoping this and other discussions will help us understand what questions to better evaluate before we release 3.2.

> 1. For this question, I answered at the beginning that this is a kind of self-healing feature [here](https://github.com/apache/spark/pull/30876#discussion_r547031257)
>
> > Making it default will impact all applications which have replication > 1: given this PR is proposing to make it the default, I would like to know if there was any motivating reason to make this change?

Spark is self-healing via lineage :-) Having said that, as mentioned above, I want to understand what the tradeoffs for enabling this flag are.

> 2. For the following question, I asked for your evidence first because I'm not aware of it. :)
>
> > If the cost of proactive replication is close to zero now (my experiments were from a while back), of course the discussion is moot - did we have any results for this?

I am not proposing to change the default behavior, you are ... hence my query :-) As I mentioned above, when I had looked at this in the past it was very helpful for some applications, but not others: it depended on the application and its requirements - `replication > 1` itself was not very commonly used then.

> 3. For the following question, it seems that you assume that the current Spark behavior is the best. I don't think this question justifies that the loss of data inside Spark is good.
>
> > What is the ongoing cost when the application holds RDD references, but they are not in active use for the rest of the application (not all references can be cleared by GC) - resulting in replication of blocks for an RDD which is legitimately not going to be used again?

A couple of points here: a) There is no data loss - Spark recomputes when a lost block is required (but at some recomputation cost). b) My query was specifically about the cost of replication, given that what I described is a common pattern in user applications: I was not saying this is a desired code pattern, but it is commonly observed behavior.

> 4. For the following, yes, but `exacerbates` doesn't look like the proper term here because we had better make Spark smarter at handling those cases, as I replied [here](https://github.com/apache/spark/pull/30876#discussion_r547421217) already.
>
> > Note that the above is orthogonal to DRA evicting an executor via the storage timeout configuration. That just exacerbates the problem, since a larger number of executors could be lost.

If we can do better on this, I am definitely very keen on it! Until that happens, we need to continue supporting existing scenarios where DRA impacts use of this flag.

> 5. For the following, I didn't make this PR for that specific use case. I made this PR to improve this feature in various environments in the Apache Spark 3.2.0 timeframe [here](https://github.com/apache/spark/pull/30876#issuecomment-749953223).
>
> > Specifically for this use case, we don't need to make it a Spark default, right? ...

This was in response to the [scenario](https://github.com/apache/spark/pull/30876#issuecomment-750471287) described. Let us decouple discussion of that scenario from our discussion here, and focus on what we need to evaluate for enabling this by default.

> 6. For the following, I replied that the YARN environment can also suffer from disk loss or executor loss [here](https://github.com/apache/spark/pull/30876#issuecomment-751060200) because you insisted that YARN doesn't need this feature from the beginning. I'm still not sure that YARN environment is so
[GitHub] [spark] cloud-fan commented on a change in pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
cloud-fan commented on a change in pull request #30958: URL: https://github.com/apache/spark/pull/30958#discussion_r549600913

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala

```scala
@@ -440,6 +441,31 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU
       }
     }
   }
+
+  test("SPARK-33930: Script Transform default FIELD DELIMIT should be \u0001 (no serde)") {
+    withTempView("v") {
+      val df = Seq(
+        (1, 2, 3),
+        (2, 3, 4),
+        (3, 4, 5)
+      ).toDF("a", "b", "c") // Note column d's data type is Decimal(38, 18)
```

Review comment: where is column d?
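The delimiter under test in the PR above is Hive's Ctrl-A (`'\u0001'`), which no-serde script transforms use to separate row fields by default. A minimal Python sketch of that wire format (the helper names are hypothetical, purely to illustrate the delimiter behavior):

```python
# Default field delimiter for Hive-compatible no-serde script transforms.
DEFAULT_FIELD_DELIM = "\u0001"  # Ctrl-A

def serialize_row(fields):
    """Join a row's fields into the line fed to the transform script."""
    return DEFAULT_FIELD_DELIM.join(str(f) for f in fields)

def deserialize_row(line):
    """Split a script's output line back into string fields."""
    return line.split(DEFAULT_FIELD_DELIM)

row = serialize_row([1, 2, 3])
assert row == "1\u00012\u00013"
assert deserialize_row(row) == ["1", "2", "3"]
```

Using a control character that almost never appears in data is why Hive chose Ctrl-A over, say, a comma or a tab.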
[GitHub] [spark] imback82 commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
imback82 commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549600597

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

```scala
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
+        case a: Attribute =>
+          DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
+        case nested =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")
```

Review comment: For `DESC desc_complex_col_table col.x`, it will be:

```
DESC TABLE COLUMN command does not support nested data types: col.x
```

vs.

```
DESC TABLE COLUMN does not support nested column: spark_catalog.default.desc_complex_col_table.`col`.`x` AS `x`
```
[GitHub] [spark] cloud-fan closed pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle execu
cloud-fan closed pull request #30956: URL: https://github.com/apache/spark/pull/30956
[GitHub] [spark] cloud-fan commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle
cloud-fan commented on pull request #30956: URL: https://github.com/apache/spark/pull/30956#issuecomment-751981627 thanks, merging to master/3.1!
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751980630 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38057/
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-751980374 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38061/
[GitHub] [spark] mridulm commented on pull request #30876: [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default
mridulm commented on pull request #30876: URL: https://github.com/apache/spark/pull/30876#issuecomment-751979314 Before answering specific queries below, I want to set the context. a) Enabling proactive replication could result in reduced recomputation cost when executors fail. b) Enabling it will result in increased transfers when executor(s) are lost. (Ignoring other minor impacts) I was trying to understand what the impact would be, what the tradeoffs involved are, when we enable by default: 1) Are the replication costs (b) lower now ? How do we estimate that cost ? (There was non-trivial impact when I had last done some expt's earlier) 2) Are we (community) running into cases where we benefit from (a) but are not (very) negatively impacted by (b) ? Is there any commonality when this happens ? (application types/characterstics ? resource manager ? almost all usage ?) 3) What is the impact to the application (and cluster) when we have nontrivial executor loss - executor release in DRA is one example of this, preemption is another. 4) Anything else to watch out for ? As I mentioned earlier, I am fine with collecting data by enabling this flag by default. I am hoping this and other discussions will help us understand what questions to better evaluate before we release 3.2. > 1. For this question, I answered at the beginning that this is a kind of self-healing feature [here](https://github.com/apache/spark/pull/30876#discussion_r547031257) > > > Making it default will impact all applications which have replication > 1: given this PR is proposing to make it the default, I would like to know if there was any motivating reason to make this change ? Spark is self-healing via lineage :-) Having said that, as mentioned above, I want to understand what the tradeoff for enabling this flag are. > > 1. For the following question, I asked your evidence first because I'm not aware of. 
:)

> > > If the cost of proactive replication is close to zero now (my experiments were from a while back), of course the discussion is moot - did we have any results for this?

I am not proposing to change the default behavior, you are ... hence my query :-) As I mentioned above, when I had looked at this in the past it was very helpful for some applications, but not others: it depended on the application and its requirements - `replication > 1` itself was not very commonly used then.

> > 1. For the following question, it seems that you assume that Spark's current behavior is the best. I don't think this question justifies that losing data on the Spark side is acceptable.

> > > What is the ongoing cost when an application holds RDD references, but they are not in active use for the rest of the application (not all references can be cleared by gc) - resulting in replication of blocks for an RDD which is legitimately not going to be used again?

A couple of points here: a) There is no data loss - Spark recomputes when a lost block is required (but at some recomputation cost). b) My query was specifically about the cost of replication - given what I described is a common pattern in user applications: I was not saying this is a desired code pattern, but it is commonly observed behavior.

> > 1. For the following, yes, but `exacerbates` doesn't look like the proper term here, because we had better make Spark smarter to handle those cases, as I replied [here](https://github.com/apache/spark/pull/30876#discussion_r547421217) already.

> > > Note that the above is orthogonal to DRA evicting an executor via the storage timeout configuration. That just exacerbates the problem, since a larger number of executors could be lost.

If we can do better on this, I am definitely very keen on it! Until that happens, we need to continue supporting existing scenarios where DRA impacts use of this flag.

> > 1. For the following, I didn't make this PR for that specific use case.
I made this PR to improve this feature in various environments in the Apache Spark 3.2.0 timeframe [here](https://github.com/apache/spark/pull/30876#issuecomment-749953223).

> > > Specifically for this use case, we don't need to make it a Spark default, right? ...

This was in response to the [scenario](https://github.com/apache/spark/pull/30876#issuecomment-750471287) described. Let us decouple discussion of that scenario from our discussion here - and focus on what we need to evaluate for enabling this by default.

> > 1. For the following, I replied that a YARN environment can also suffer from disk loss or executor loss [here](https://github.com/apache/spark/pull/30876#issuecomment-751060200) because you insisted from the beginning that YARN doesn't need this feature. I'm still not sure that YARN environment is so
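The (a)/(b) tradeoff debated in this thread can be made concrete with a toy cost model. Everything below is a hypothetical back-of-envelope sketch (the function names and numbers are invented for illustration, not measured Spark behavior): proactive replication pays extra network transfer when executors are lost, in exchange for avoiding recomputation of the lost cached blocks.

```python
# Toy cost model of enabling spark.storage.replication.proactive.
# All names and numbers here are illustrative assumptions, not Spark APIs.

def transfer_cost_on_loss(lost_executors, cached_blocks_per_executor,
                          avg_block_mb, replication):
    """MB that surviving executors must copy to restore the replication
    factor for blocks hosted on the lost executors (cost (b))."""
    if replication <= 1:
        return 0  # with no replicas, there is nothing to re-replicate
    return lost_executors * cached_blocks_per_executor * avg_block_mb

def recomputation_cost_without_replication(lost_executors,
                                           cached_blocks_per_executor,
                                           avg_recompute_seconds):
    """Seconds of lineage recomputation if the lost blocks had no
    surviving replica and are needed again later (avoided cost (a))."""
    return (lost_executors * cached_blocks_per_executor
            * avg_recompute_seconds)

# Hypothetical scenario: DRA releases 5 executors, each caching
# 40 blocks of ~64 MB, with replication factor 2.
mb_moved = transfer_cost_on_loss(5, 40, 64, 2)
recompute_avoided = recomputation_cost_without_replication(5, 40, 3)
print(mb_moved, recompute_avoided)  # 12800 MB moved vs. 600 s avoided
```

Whether the transfer cost is worth paying depends exactly on the per-application factors mridulm lists: how often blocks are lost, whether they are re-read afterwards, and how expensive recomputation is relative to the network.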
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751979073 **[Test build #133477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133477/testReport)** for PR 30957 at commit [`6a7438b`](https://github.com/apache/spark/commit/6a7438bf6574d35ed841a7301f50003b4fb12341). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins removed a comment on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751978657 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133466/
[GitHub] [spark] AmplabJenkins commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins commented on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751978657 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133466/
[GitHub] [spark] SparkQA removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
SparkQA removed a comment on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751938979 **[Test build #133466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133466/testReport)** for PR 30841 at commit [`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins removed a comment on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751978457 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38059/
[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751978457 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38059/
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751978449 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38059/
[GitHub] [spark] SparkQA commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
SparkQA commented on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751978390 **[Test build #133466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133466/testReport)** for PR 30841 at commit [`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AngersZhuuuu commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751977979 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins removed a comment on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751977685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133467/
[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751977685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133467/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
AmplabJenkins removed a comment on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751977414 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38062/
[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
AmplabJenkins commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751977414 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38062/
[GitHub] [spark] SparkQA removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA removed a comment on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751951003 **[Test build #133467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133467/testReport)** for PR 30957 at commit [`adc9ded`](https://github.com/apache/spark/commit/adc9ded0d8fe957b203c047e433381645fe944e9).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
AmplabJenkins removed a comment on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751977032 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133474/
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751977118 **[Test build #133467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133467/testReport)** for PR 30957 at commit [`adc9ded`](https://github.com/apache/spark/commit/adc9ded0d8fe957b203c047e433381645fe944e9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
AmplabJenkins commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751977032 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133474/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when
AmplabJenkins removed a comment on pull request #30956: URL: https://github.com/apache/spark/pull/30956#issuecomment-751971918 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133464/
[GitHub] [spark] SparkQA removed a comment on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13
SparkQA removed a comment on pull request #30951: URL: https://github.com/apache/spark/pull/30951#issuecomment-751890200 **[Test build #133460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133460/testReport)** for PR 30951 at commit [`0d6ff72`](https://github.com/apache/spark/commit/0d6ff72b2272ccc355d076c8bf6f672d2da3751f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins removed a comment on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751971916 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38056/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)
AmplabJenkins removed a comment on pull request #30898: URL: https://github.com/apache/spark/pull/30898#issuecomment-751972078 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133462/
[GitHub] [spark] SparkQA removed a comment on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)
SparkQA removed a comment on pull request #30898: URL: https://github.com/apache/spark/pull/30898#issuecomment-751920157 **[Test build #133462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133462/testReport)** for PR 30898 at commit [`d3b072e`](https://github.com/apache/spark/commit/d3b072e2d1db3aef0ea4ab80767ab739502f7e81).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30928: [SPARK-33912][SQL] Refactor DependencyUtils ivy property parameter
AmplabJenkins removed a comment on pull request #30928: URL: https://github.com/apache/spark/pull/30928#issuecomment-751971917 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38060/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13
AmplabJenkins removed a comment on pull request #30951: URL: https://github.com/apache/spark/pull/30951#issuecomment-751975143 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133460/
[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AngersZhuuuu edited a comment on pull request #30958: URL: https://github.com/apache/spark/pull/30958#issuecomment-751974605

> Not related to the change. But I notice that some contributors usually use screenshots in the description. I personally don't recommend this approach. The images cannot be indexed and searched. So I suggest that for the problem and fix description, some text is more helpful.

Yea, thanks for your suggestion, I will update the PR description and will pay attention to this problem. Maybe we should send an email to mention this?
[GitHub] [spark] AmplabJenkins commented on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13
AmplabJenkins commented on pull request #30951: URL: https://github.com/apache/spark/pull/30951#issuecomment-751975143 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133460/
[GitHub] [spark] cloud-fan closed pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)
cloud-fan closed pull request #30898: URL: https://github.com/apache/spark/pull/30898
[GitHub] [spark] cloud-fan commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)
cloud-fan commented on pull request #30898: URL: https://github.com/apache/spark/pull/30898#issuecomment-751974714 thanks, merging to master!
[GitHub] [spark] MaxGekk commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers
MaxGekk commented on a change in pull request #30947: URL: https://github.com/apache/spark/pull/30947#discussion_r549594135

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala

## @@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
     if (parts.isEmpty) {
-      throw new AnalysisException("multi-part identifier cannot be empty.")
+      throw new AnalysisException("Namespaces in V1 catalog can have only a single name part.")

Review comment: Actually, `parts` includes a table name. When we say that `parts` cannot be empty, we require at least a table name. Probably, `Namespaces in V1 catalog can have only a single name part` could confuse users too. We should say something like: a table identifier must contain either a table name, or a database name plus a table name. Specifically in the check, we should say **"Table identification must have at least a table name"**
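The check MaxGekk is discussing can be sketched in plain Python (this is an illustrative stand-in, not Spark's actual Scala helper; the function name is hypothetical): `parts` holds every name part of a table identifier, so an empty sequence means not even a table name was supplied, and the error message should say so.

```python
# Hypothetical sketch of the MultipartIdentifierHelper precondition,
# using the error wording suggested in the review comment.

def validate_multipart_identifier(parts):
    """Reject an empty identifier with a message that names the real
    requirement: at least a table name must be present."""
    if not parts:
        raise ValueError("Table identification must have at least a table name")
    return parts

validate_multipart_identifier(["db", "tbl"])  # ok: database + table name
validate_multipart_identifier(["tbl"])        # ok: bare table name
```

The point of the reworded message is that it describes what the caller must supply, rather than restating an internal invariant ("multi-part identifier cannot be empty") that users never see in those terms.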
[GitHub] [spark] AngersZhuuuu commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AngersZhuuuu commented on pull request #30958: URL: https://github.com/apache/spark/pull/30958#issuecomment-751974605

> Not related to the change. But I notice that some contributors usually use screenshots in the description. I personally don't recommend this approach. The images cannot be indexed and searched. So I suggest that for the problem and fix description, some text is more helpful.

Yea, thanks for your suggestion, I will update the PR description.
[GitHub] [spark] SparkQA commented on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13
SparkQA commented on pull request #30951: URL: https://github.com/apache/spark/pull/30951#issuecomment-751974120 **[Test build #133460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133460/testReport)** for PR 30951 at commit [`0d6ff72`](https://github.com/apache/spark/commit/0d6ff72b2272ccc355d076c8bf6f672d2da3751f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751973873 **[Test build #133474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133474/testReport)** for PR 30959 at commit [`65f97de`](https://github.com/apache/spark/commit/65f97dee09fde2bd77bf3514ed855278f15de974).
[GitHub] [spark] SparkQA commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate
SparkQA commented on pull request #30315: URL: https://github.com/apache/spark/pull/30315#issuecomment-751973304 **[Test build #133476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133476/testReport)** for PR 30315 at commit [`ed3530a`](https://github.com/apache/spark/commit/ed3530a560927fbbf78a142aa7aec98237b7a77c).
[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
cloud-fan commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549592550

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala

## @@ -272,8 +273,13 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
       }
       DescribeTableExec(desc.output, r.table, isExtended) :: Nil
-    case DescribeColumn(_: ResolvedTable, _, _) =>
-      throw new AnalysisException("Describing columns is not supported for v2 tables.")
+    case desc @ DescribeColumn(_: ResolvedTable, column, isExtended) =>
+      column match {
+        case c: Attribute =>
+          DescribeColumnExec(desc.output, c, isExtended) :: Nil
+        case _ =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment: ditto
[GitHub] [spark] SparkQA commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
SparkQA commented on pull request #30955: URL: https://github.com/apache/spark/pull/30955#issuecomment-751972899 **[Test build #133475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133475/testReport)** for PR 30955 at commit [`82a343c`](https://github.com/apache/spark/commit/82a343c8b2b1c2258d49cc5799a590d7ba0d7651).
[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
cloud-fan commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549592416

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

```diff
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
+        case a: Attribute =>
+          DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
+        case nested =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")
```

Review comment:

> Construct the original name from GetStructField, GetArrayStructFields, etc.

Is it simply `nested.sql`?
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-751972303 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38061/
[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
cloud-fan commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549591975

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

```diff
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
```

Review comment: It's possible when the column name doesn't exist in the table, and we should give a clear error message: `Column $colName does not exist`
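[Editor's sketch] The column dispatch discussed in this review thread can be illustrated in isolation. This is a hypothetical simplification: `UnresolvedAttribute`, `Attribute`, and `GetStructField` below are toy stand-in case classes, not the actual Spark catalyst classes.

```scala
// Toy stand-ins for the catalyst expression types mentioned in the review.
sealed trait ColumnExpr
case class UnresolvedAttribute(nameParts: Seq[String]) extends ColumnExpr
case class Attribute(qualifier: Seq[String], name: String) extends ColumnExpr
case class GetStructField(child: ColumnExpr, field: String) extends ColumnExpr

def columnNameParts(column: ColumnExpr): Seq[String] = column match {
  // For views the column stays unresolved, so use the parsed name parts directly.
  case u: UnresolvedAttribute => u.nameParts
  // Resolved top-level columns carry their qualifier.
  case a: Attribute => a.qualifier :+ a.name
  // Nested fields are rejected, mirroring commandNotSupportNestedColumnError.
  case _ => throw new IllegalArgumentException(
    "DESC TABLE COLUMN does not support nested columns")
}
```

The point is the three-way split: unresolved names pass through as-is, resolved attributes reconstruct their qualified name, and anything else (a nested-field extraction) is rejected with a dedicated error.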
[GitHub] [spark] AmplabJenkins commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhen clauses with (true and false) and (false and true)
AmplabJenkins commented on pull request #30898: URL: https://github.com/apache/spark/pull/30898#issuecomment-751972078 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133462/
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751972015 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38057/
[GitHub] [spark] AmplabJenkins commented on pull request #30928: [SPARK-33912][SQL] Refactor DependencyUtils ivy property parameter
AmplabJenkins commented on pull request #30928: URL: https://github.com/apache/spark/pull/30928#issuecomment-751971917 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38060/
[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AmplabJenkins commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751971916 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38056/
[GitHub] [spark] AmplabJenkins commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing
AmplabJenkins commented on pull request #30956: URL: https://github.com/apache/spark/pull/30956#issuecomment-751971918 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133464/
[GitHub] [spark] SparkQA commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhen clauses with (true and false) and (false and true)
SparkQA commented on pull request #30898: URL: https://github.com/apache/spark/pull/30898#issuecomment-751971552 **[Test build #133462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133462/testReport)** for PR 30898 at commit [`d3b072e`](https://github.com/apache/spark/commit/d3b072e2d1db3aef0ea4ab80767ab739502f7e81).

* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
cloud-fan commented on a change in pull request #30955: URL: https://github.com/apache/spark/pull/30955#discussion_r549590506

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
     foldables.nonEmpty && others.length < 2
   }

+  // Not all UnaryExpression support push into (if / case) branches, e.g. Alias.
+  private def supportedUnaryExpression(e: UnaryExpression): Boolean = e match {
+    case _: IsNull | _: IsNotNull => true
+    case _: UnaryMathExpression | _: Abs | _: Bin | _: Factorial | _: Hex => true
+    case _: String2StringExpression | _: Ascii | _: Base64 | _: BitLength | _: Chr | _: Length =>
+      true
+    case _: CastBase => true
+    case _: GetDateField | _: LastDay => true
+    case _: ExtractIntervalPart => true
+    case _: ArraySetLike => true
+    case _ => false
+  }
+
+  private def supportedBinaryExpression(e: BinaryExpression): Boolean = e match {
```

Review comment: let's add comments as well.
[GitHub] [spark] AngersZhuuuu commented on pull request #30948: [SPARK-33637][SQL] alter table drop partition equals alter table drop if ex…
AngersZhuuuu commented on pull request #30948: URL: https://github.com/apache/spark/pull/30948#issuecomment-751970421 Seems the PR title is not complete?
[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
cloud-fan commented on a change in pull request #30955: URL: https://github.com/apache/spark/pull/30955#discussion_r549590192

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
     foldables.nonEmpty && others.length < 2
   }

+  // Not all UnaryExpression support push into (if / case) branches, e.g. Alias.
```

Review comment: `Not all UnaryExpression can be pushed into (if / case) branches, e.g. Alias.`
[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
cloud-fan commented on a change in pull request #30955: URL: https://github.com/apache/spark/pull/30955#discussion_r549590121

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
     foldables.nonEmpty && others.length < 2
   }

+  // Not all UnaryExpression support push into (if / case) branches, e.g. Alias.
+  private def supportedUnaryExpression(e: UnaryExpression): Boolean = e match {
+    case _: IsNull | _: IsNotNull => true
+    case _: UnaryMathExpression | _: Abs | _: Bin | _: Factorial | _: Hex => true
+    case _: String2StringExpression | _: Ascii | _: Base64 | _: BitLength | _: Chr | _: Length =>
+      true
+    case _: CastBase => true
+    case _: GetDateField | _: LastDay => true
+    case _: ExtractIntervalPart => true
+    case _: ArraySetLike => true
+    case _ => false
```

Review comment: let's include `ExtractValue` as well, which is common with nested fields.
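[Editor's sketch] The allowList pattern under review can be shown with a self-contained toy version. The expression types below are hypothetical stand-ins, not Spark's catalyst classes; the point is the shape of the match-based allowList, which deliberately excludes `Alias` because pushing an alias into each branch would duplicate it.

```scala
// Toy expression hierarchy standing in for catalyst's UnaryExpression subtree.
sealed trait Expr
case class Lit(v: Int) extends Expr
case class IsNull(child: Expr) extends Expr
case class Abs(child: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr

object PushIntoBranches {
  // Allowlist: only explicitly listed unary expressions may be pushed into
  // if/case branches. Everything else (notably Alias) falls through to false,
  // so new expression types are safe-by-default.
  def supported(e: Expr): Boolean = e match {
    case _: IsNull | _: Abs => true
    case _ => false
  }
}
```

The design choice being discussed is exactly this default: an allowList errs on the side of not optimizing, whereas a denyList would silently mis-handle any expression nobody thought to exclude.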
[GitHub] [spark] cloud-fan closed pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog
cloud-fan closed pull request #30952: URL: https://github.com/apache/spark/pull/30952
[GitHub] [spark] cloud-fan commented on pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog
cloud-fan commented on pull request #30952: URL: https://github.com/apache/spark/pull/30952#issuecomment-751969553 thanks, merging to master!
[GitHub] [spark] SparkQA removed a comment on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killin
SparkQA removed a comment on pull request #30956: URL: https://github.com/apache/spark/pull/30956#issuecomment-751938899 **[Test build #133464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133464/testReport)** for PR 30956 at commit [`ba0a4bc`](https://github.com/apache/spark/commit/ba0a4bca5a21417c78bac6626f4e1f6646c68a7b).
[GitHub] [spark] SparkQA commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle e
SparkQA commented on pull request #30956: URL: https://github.com/apache/spark/pull/30956#issuecomment-751969260 **[Test build #133464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133464/testReport)** for PR 30956 at commit [`ba0a4bc`](https://github.com/apache/spark/commit/ba0a4bca5a21417c78bac6626f4e1f6646c68a7b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751969171 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38059/
[GitHub] [spark] cloud-fan commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers
cloud-fan commented on a change in pull request #30947: URL: https://github.com/apache/spark/pull/30947#discussion_r549589001

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala

```diff
@@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
     if (parts.isEmpty) {
-      throw new AnalysisException("multi-part identifier cannot be empty.")
+      throw new AnalysisException("Namespaces in V1 catalog can have only a single name part.")
```

Review comment: to be more precise: `Namespaces in V1 catalog cannot be empty`?
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751964598 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38056/
[GitHub] [spark] MaxGekk commented on pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers
MaxGekk commented on pull request #30947: URL: https://github.com/apache/spark/pull/30947#issuecomment-751963956 @cloud-fan Please, take a look at this.
[GitHub] [spark] MaxGekk commented on pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog
MaxGekk commented on pull request #30952: URL: https://github.com/apache/spark/pull/30952#issuecomment-751963841 @cloud-fan @HyukjinKwon Please, review this fix.
[GitHub] [spark] wangyum commented on a change in pull request #30853: [SPARK-33848][SQL] Push the UnaryExpression into (if / case) branches
wangyum commented on a change in pull request #30853: URL: https://github.com/apache/spark/pull/30853#discussion_r549583380

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -542,29 +542,42 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] with PredicateHelper {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
+      case a: Alias => a // Skip an alias.
```

Review comment: https://github.com/apache/spark/pull/30955
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751963050 **[Test build #133473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133473/testReport)** for PR 30959 at commit [`3f9b69d`](https://github.com/apache/spark/commit/3f9b69dfe634e3de390a787b84cc195206ffb440).
[GitHub] [spark] viirya commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
viirya commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549582590

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala

```diff
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
```

Review comment: Is it possible there is unresolved attribute but the `relation` of `DescribeColumn` is a v1 table?
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-751961455 **[Test build #133472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133472/testReport)** for PR 30212 at commit [`927f21e`](https://github.com/apache/spark/commit/927f21e3ebf9a3e71d2467fabe492d2b306a8037).
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751959287 **[Test build #133471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133471/testReport)** for PR 30957 at commit [`6a7438b`](https://github.com/apache/spark/commit/6a7438bf6574d35ed841a7301f50003b4fb12341).
[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job
SparkQA commented on pull request #30959: URL: https://github.com/apache/spark/pull/30959#issuecomment-751958605 **[Test build #133469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133469/testReport)** for PR 30959 at commit [`edc4994`](https://github.com/apache/spark/commit/edc4994ae348a5c4c258143c57e015aeaf9d673f).
[GitHub] [spark] SparkQA commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
SparkQA commented on pull request #30958: URL: https://github.com/apache/spark/pull/30958#issuecomment-751958635 **[Test build #133470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133470/testReport)** for PR 30958 at commit [`1812826`](https://github.com/apache/spark/commit/1812826f67cc41ed8efb961e793c74a975e27d5d).
[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751958171 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38056/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins removed a comment on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751957782 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38054/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
AmplabJenkins removed a comment on pull request #30955: URL: https://github.com/apache/spark/pull/30955#issuecomment-751957781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38053/
[GitHub] [spark] AmplabJenkins commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
AmplabJenkins commented on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751957782 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38054/
[GitHub] [spark] AmplabJenkins commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches
AmplabJenkins commented on pull request #30955: URL: https://github.com/apache/spark/pull/30955#issuecomment-751957781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38053/
[GitHub] [spark] dongjoon-hyun opened a new pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action
dongjoon-hyun opened a new pull request #30959: URL: https://github.com/apache/spark/pull/30959

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
[GitHub] [spark] imback82 commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables
imback82 commented on a change in pull request #30881: URL: https://github.com/apache/spark/pull/30881#discussion_r549576786

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala

```diff
@@ -97,9 +98,26 @@ case class ResolvedNamespace(catalog: CatalogPlugin, namespace: Seq[String])
 /**
  * A plan containing resolved table.
  */
-case class ResolvedTable(catalog: TableCatalog, identifier: Identifier, table: Table)
+case class ResolvedTable(
+    catalog: TableCatalog,
+    identifier: Identifier,
+    table: Table,
+    outputAttributes: Seq[Attribute])
   extends LeafNode {
-  override def output: Seq[Attribute] = Nil
+  override def output: Seq[Attribute] = {
+    val qualifier = catalog.name +: identifier.namespace :+ identifier.name
+    outputAttributes.map(_.withQualifier(qualifier))
+  }
```

Review comment: Or we can wrap this with `SubqueryAlias` similar to how `DataSourceV2Relation` is wrapped, but we need to update everywhere `ResolvedTable` is matched.
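The qualifier logic discussed in the review above (prepend the catalog name to the table's namespace and name, then re-qualify each output attribute) can be sketched as follows. This is a simplified, language-neutral illustration: `Attribute` and `qualified_output` here are stand-ins invented for this sketch, not Spark's actual Catalyst types.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Attribute:
    """Minimal stand-in for a Catalyst output attribute."""
    name: str
    qualifier: Tuple[str, ...] = ()

    def with_qualifier(self, qualifier):
        # Mirrors the idea of Attribute.withQualifier: return a copy
        # of the attribute carrying the new qualifier parts.
        return replace(self, qualifier=tuple(qualifier))

def qualified_output(catalog_name: str, namespace: List[str],
                     table_name: str, attrs: List[Attribute]):
    # qualifier = catalog.name +: identifier.namespace :+ identifier.name
    qualifier = [catalog_name] + list(namespace) + [table_name]
    return [a.with_qualifier(qualifier) for a in attrs]

out = qualified_output("testcat", ["ns1"], "tbl",
                       [Attribute("id"), Attribute("data")])
print(out[0].qualifier)  # ('testcat', 'ns1', 'tbl')
```

The alternative mentioned in the comment (wrapping in `SubqueryAlias`) would attach the qualifier one node higher in the plan instead of rewriting each attribute here.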
[GitHub] [spark] AngersZhuuuu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform
AngersZh commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-751953363 FYI @cloud-fan @maropu @alfozan
[GitHub] [spark] AngersZhuuuu commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AngersZh commented on pull request #30958: URL: https://github.com/apache/spark/pull/30958#issuecomment-751953177 FYI @maropu @cloud-fan
[GitHub] [spark] AngersZhuuuu opened a new pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'
AngersZh opened a new pull request #30958: URL: https://github.com/apache/spark/pull/30958

### What changes were proposed in this pull request?
For the same SQL
```
SELECT TRANSFORM(a, b, c, null)
ROW FORMAT DELIMITED
USING 'cat'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '&'
FROM (select 1 as a, 2 as b, 3 as c) t
```
In Hive:
![image](https://user-images.githubusercontent.com/46485123/103260903-5c968a80-49da-11eb-9675-7c66b2ee35fb.png)
In Spark:
![image](https://user-images.githubusercontent.com/46485123/103260912-67511f80-49da-11eb-93df-663543c8e91f.png)
The behavior should be the same, so this change makes the default ROW FORMAT field delimiter `\u0001`.
### Why are the changes needed?
To keep the same behavior as Hive.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UT
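For context on the PR above: `\u0001` (Ctrl-A) is Hive's default field delimiter for delimited row formats, and Hive writes SQL NULL as the literal string `\N` by default. The snippet below (Python, as a language-neutral illustration of the wire format, not Spark code) shows how a transformed row serialized with that default delimiter splits back into fields.

```python
# A row like (1, 2, 3, NULL) piped through a script transform with
# Hive's default delimited row format would look like this on the
# wire: fields separated by '\u0001' (Ctrl-A), NULL written as '\N'.
row = "1\u00012\u00013\u0001\\N"

# Splitting on the default delimiter recovers the individual fields.
fields = row.split("\u0001")
print(fields)  # ['1', '2', '3', '\\N']
```

This is why the PR argues the default should be `\u0001` rather than another delimiter: a consumer splitting on Ctrl-A, as Hive-compatible tools do, would otherwise see a single unsplit field.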
[GitHub] [spark] SparkQA commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part
SparkQA commented on pull request #30841: URL: https://github.com/apache/spark/pull/30841#issuecomment-751952088 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38054/