[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751986676


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38057/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-751986674


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38061/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30958:
URL: https://github.com/apache/spark/pull/30958#issuecomment-751986675


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38058/
   






[GitHub] [spark] SparkQA commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


SparkQA commented on pull request #30955:
URL: https://github.com/apache/spark/pull/30955#issuecomment-751985919


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38064/
   






[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751985845


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38063/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30905: [SPARK-33890][SQL] Improve the implement of trim/trimleft/trimright

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30905:
URL: https://github.com/apache/spark/pull/30905#discussion_r549605431



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##
@@ -756,6 +756,54 @@ trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
   override def nullable: Boolean = children.exists(_.nullable)
   override def foldable: Boolean = children.forall(_.foldable)
 
+  protected def doEval(srcString: UTF8String): UTF8String
+  protected def doEval(srcString: UTF8String, trimString: UTF8String): UTF8String
+
+  override def eval(input: InternalRow): Any = {
+    val srcString = srcStr.eval(input).asInstanceOf[UTF8String]
+    if (srcString == null) {
+      null
+    } else if (trimStr.isDefined) {
+      doEval(srcString, trimStr.get.eval(input).asInstanceOf[UTF8String])
+    } else {
+      doEval(srcString)
+    }
+  }
+
+  protected val trimMethod: String
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evals = children.map(_.genCode(ctx))
+    val srcString = evals(0)
+
+    if (evals.length == 1) {
+      ev.copy(code = evals.map(_.code) :+
+        code"""
+          |boolean ${ev.isNull} = false;
+          |UTF8String ${ev.value} = null;
+          |if (${srcString.isNull}) {
+          |  ${ev.isNull} = true;
+          |} else {
+          |  ${ev.value} = ${srcString.value}.$trimMethod();
+          |}
+        """)
+    } else {
+      val trimString = evals(1)
+      ev.copy(code = evals.map(_.code) :+

Review comment:
   We can skip evaluating trim string if possible
   ```
   ev.copy(code = code"""
 |${evals.head.code}
 |if (${srcString.isNull}) {
 | ...
 |} else {
 |  ${trimString.code}
 |  if (${trimString.isNull}) ...
   ```
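   To spell out the point, here is a minimal plain-Scala sketch of the two layouts being compared. It deliberately avoids Spark's `CodegenContext`/`ExprCode`/`code` interpolator machinery; `Fragment` and the two method names are hypothetical stand-ins, not the PR's code. In the second layout the trim-string fragment is only emitted inside the non-null branch, so it never runs when the source string is null:
   ```
   object TrimCodegenLayouts {
     // Hypothetical stand-in for the (code, isNull, value) triple carried around during codegen.
     final case class Fragment(code: String, isNull: String, value: String)

     // Layout 1: both child fragments are emitted up front, so the trim string is always evaluated.
     def eagerLayout(src: Fragment, trim: Fragment, trimMethod: String): String =
       s"""${src.code}
          |${trim.code}
          |boolean isNull = ${src.isNull};
          |UTF8String value = isNull ? null : ${src.value}.$trimMethod(${trim.value});
          |""".stripMargin

     // Layout 2 (what the review suggests): the trim fragment is nested inside the else
     // branch, so its evaluation is skipped entirely whenever the source string is null.
     def lazyTrimLayout(src: Fragment, trim: Fragment, trimMethod: String): String =
       s"""${src.code}
          |boolean isNull = false;
          |UTF8String value = null;
          |if (${src.isNull}) {
          |  isNull = true;
          |} else {
          |  ${trim.code}
          |  if (${trim.isNull}) {
          |    isNull = true;
          |  } else {
          |    value = ${src.value}.$trimMethod(${trim.value});
          |  }
          |}
          |""".stripMargin
   }
   ```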








[GitHub] [spark] SparkQA commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate

2020-12-28 Thread GitBox


SparkQA commented on pull request #30315:
URL: https://github.com/apache/spark/pull/30315#issuecomment-751985404


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38065/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30905: [SPARK-33890][SQL] Improve the implement of trim/trimleft/trimright

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30905:
URL: https://github.com/apache/spark/pull/30905#discussion_r549604882



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##
@@ -756,6 +756,54 @@ trait String2TrimExpression extends Expression with ImplicitCastInputTypes {
   override def nullable: Boolean = children.exists(_.nullable)
   override def foldable: Boolean = children.forall(_.foldable)
 
+  protected def doEval(srcString: UTF8String): UTF8String
+  protected def doEval(srcString: UTF8String, trimString: UTF8String): UTF8String
+
+  override def eval(input: InternalRow): Any = {
+    val srcString = srcStr.eval(input).asInstanceOf[UTF8String]
+    if (srcString == null) {
+      null
+    } else if (trimStr.isDefined) {
+      doEval(srcString, trimStr.get.eval(input).asInstanceOf[UTF8String])
+    } else {
+      doEval(srcString)
+    }
+  }
+
+  protected val trimMethod: String
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evals = children.map(_.genCode(ctx))
+    val srcString = evals(0)
+
+    if (evals.length == 1) {
+      ev.copy(code = evals.map(_.code) :+

Review comment:
   nit:
   ```
   ev.copy(code = code"""
 |${evals.head.code}
 |...""".stripMargin 
   ```








[GitHub] [spark] wangyum commented on pull request #30960: [SPARK-33847][SQL][FOLLOWUP] Remove the CaseWhen should consider deterministic

2020-12-28 Thread GitBox


wangyum commented on pull request #30960:
URL: https://github.com/apache/spark/pull/30960#issuecomment-751984460


   cc @cloud-fan 






[GitHub] [spark] wangyum opened a new pull request #30960: [SPARK-33847][SQL][FOLLOWUP] Remove the CaseWhen should consider deterministic

2020-12-28 Thread GitBox


wangyum opened a new pull request #30960:
URL: https://github.com/apache/spark/pull/30960


   ### What changes were proposed in this pull request?
   
   This PR fixes the removal of `CaseWhen` when `elseValue` is empty and all other outputs are null: the removal should only be applied when the expressions involved are deterministic.
   
   ### Why are the changes needed?
   
   Fix bug.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
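   To make that concrete, below is a small, self-contained sketch (an illustration only, not this PR's code or tests; the local-mode session, the column name `a`, and the use of `rand()` are assumptions) of the shape of expression involved: a `CASE WHEN` whose branch values are all null and whose `ELSE` is absent. Per the description, folding such an expression away should only happen when it is deterministic, so the second query below is expected to keep its `CaseWhen`:
   ```
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{lit, rand, when}

   object CaseWhenDeterminismSketch extends App {
     val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
     import spark.implicits._

     val df = Seq(1, 2, 3).toDF("a")

     // CASE WHEN a > 1 THEN NULL END: no ELSE, all outputs null, deterministic condition,
     // so the optimizer may fold the whole expression to a null literal.
     df.select(when($"a" > 1, lit(null)).as("c")).explain(true)

     // Same shape, but the condition is non-deterministic; per this follow-up the
     // CaseWhen should be kept instead of being removed.
     df.select(when(rand() > 0.5, lit(null)).as("c")).explain(true)

     spark.stop()
   }
   ```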
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30947:
URL: https://github.com/apache/spark/pull/30947#discussion_r549603691



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala
##
@@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
 
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
     if (parts.isEmpty) {
-      throw new AnalysisException("multi-part identifier cannot be empty.")
+      throw new AnalysisException("Namespaces in V1 catalog can have only a single name part.")

Review comment:
   BTW, how does `SHOW TABLES IN $catalog` get related to table identifier?








[GitHub] [spark] cloud-fan commented on pull request #30935: [SPARK-33859][SQL] Support V2 ALTER TABLE .. RENAME PARTITION

2020-12-28 Thread GitBox


cloud-fan commented on pull request #30935:
URL: https://github.com/apache/spark/pull/30935#issuecomment-751983972


   retest this please






[GitHub] [spark] cloud-fan closed pull request #30935: [SPARK-33859][SQL] Support V2 ALTER TABLE .. RENAME PARTITION

2020-12-28 Thread GitBox


cloud-fan closed pull request #30935:
URL: https://github.com/apache/spark/pull/30935


   






[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549602461



##
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
+        case a: Attribute =>
+          DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
+        case nested =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   How about we strip the `Alias` and then call `toPrettySQL` which is 
defined in `org.apache.spark.sql.catalyst.util`? 
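   A rough sketch of what that could look like (an assumption for illustration only, not the actual change in this PR; `columnText` is a hypothetical helper name):
   ```
   import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
   import org.apache.spark.sql.catalyst.util.toPrettySQL

   // Unwrap a top-level Alias before pretty-printing, so the message shows the
   // underlying column expression rather than "<expr> AS <name>".
   def columnText(column: Expression): String = column match {
     case Alias(child, _) => toPrettySQL(child)
     case other => toPrettySQL(other)
   }
   ```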








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AngersZhuuuu commented on a change in pull request #30958:
URL: https://github.com/apache/spark/pull/30958#discussion_r549601552



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala
##
@@ -440,6 +441,31 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestUtils
       }
     }
   }
+
+  test("SPARK-33930: Script Transform default FIELD DELIMIT should be \u0001 (no serde)") {
+    withTempView("v") {
+      val df = Seq(
+        (1, 2, 3),
+        (2, 3, 4),
+        (3, 4, 5)
+      ).toDF("a", "b", "c") // Note column d's data type is Decimal(38, 18)

Review comment:
   > where is column d?
   
   Removed this unrelated comment. I copied the code from another UT and forgot to remove this comment.








[GitHub] [spark] AngersZhuuuu commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30869:
URL: https://github.com/apache/spark/pull/30869#issuecomment-751982256


   gentle ping @cloud-fan 






[GitHub] [spark] mridulm edited a comment on pull request #30876: [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default

2020-12-28 Thread GitBox


mridulm edited a comment on pull request #30876:
URL: https://github.com/apache/spark/pull/30876#issuecomment-751979314


   Before answering specific queries below, I want to set the context.
   a) Enabling proactive replication could result in reduced recomputation cost 
when executors fail.
   b) Enabling it will result in increased transfers when executor(s) are lost.
   (Ignoring other minor impacts) 
   
   I was trying to understand what the impact would be, what the tradeoffs 
involved are, when we enable by default:
   
   1) Are the replication costs (b) lower now ? How do we estimate that cost ?
   (There was non-trivial impact when I had last done some expt's earlier)
   
   2) Are we (community) running into cases where we benefit from (a) but are 
not (very) negatively impacted by (b) ?
   Is there any commonality when this happens ?
   (application types/characteristics ? resource manager ? almost all usage ?)
   
   3) What is the impact to the application (and cluster) when we have 
nontrivial executor loss - executor release in DRA is one example of this, 
preemption is another.
   
   4) Anything else to watch out for ?
   
   As I mentioned earlier, I am fine with collecting data by enabling this flag 
by default.
   I am hoping this and other discussions will help us understand what 
questions to better evaluate before we release 3.2.
   
   
   > 1. For this question, I answered at the beginning that this is a kind of 
self-healing feature 
[here](https://github.com/apache/spark/pull/30876#discussion_r547031257)
   > 
   > > Making it default will impact all applications which have replication > 
1: given this PR is proposing to make it the default, I would like to know if 
there was any motivating reason to make this change ?
   
   Spark is self-healing via lineage :-)
   Having said that, as mentioned above, I want to understand what the tradeoffs for enabling this flag are.
   
   > 
   > 1. For the following question, I asked your evidence first because I'm not 
aware of. :)
   > 
   > > If the cost of proactive replication is close to zero now (my 
experiments were from a while back), ofcourse the discussion is moot - did we 
have any results for this ?
   
   I am not proposing to change the default behavior, you are ... hence my 
query :-)
   As I mentioned above, when I had looked at this in the past - it was very 
helpful for some applications, but not others : it depended on the application 
and their requirements - `replication > 1` itself was not very commonly used 
then.
   
   > 
   > 1. For the following question, it seems that you assume that the current 
Spark's behavior is the best. I don't think this question justifies that the 
loss of data inside Spark side is good.
   > 
   > > What is the ongoing cost when application holds RDD references, but they 
are not in active use for rest of the application (not all references can be 
cleared by gc) - resulting in replication of blocks for an RDD which is 
legitimately not going to be used again ?
   
   Couple of points here:
   a) There is no data loss - spark recomputes when a lost block is required 
(but at some recomputation cost).
   b) My query was specifically about the cost for replication - given what I 
described is a common pattern in user applications : I was not saying this is 
desired code pattern, but it is a commonly observed behavior.
   
   
   > 
   > 1. For the following, yes, but `exacerbates` doesn't look like a proper 
term here because we had better make Spark smarter to handle those cases as I 
replied at 
[here](https://github.com/apache/spark/pull/30876#discussion_r547421217) 
already.
   > 
   > > Note that the above is orthogonal to DRA evicting an executor via 
storage timeout configuration. That just exacerbates the problem : since a 
larger number of executors could be lost.
   
   If we can do better on this, I am definitely very keen on it !
   Until that happens, we need to continue supporting existing scenarios where 
DRA impacts use of this flag.
   
   
   > 
   > 1. For the following, I didn't make this PR for that specific use case. I 
made this PR to improve this feature in various environment in Apache Spark 
3.2.0 timeframe 
[here](https://github.com/apache/spark/pull/30876#issuecomment-749953223).
   > 
   > > Specifically for this usecase, we dont need to make it a spark default 
right ? ...
   
   This was in response to the 
[scenario](https://github.com/apache/spark/pull/30876#issuecomment-750471287) 
described.
   Let us decouple discussion of that scenario from our discussion here - and 
focus on what we need to evaluate for enabling this by default.
   
   
   > 
   > 1. For the following, I replied that YARN environment also can suffer from 
disk loss or executor loss 
[here](https://github.com/apache/spark/pull/30876#issuecomment-751060200) 
because you insisted that YARN doesn't need this feature from the beginning. 
I'm still not sure that YARN environment is so 

[GitHub] [spark] cloud-fan commented on a change in pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30958:
URL: https://github.com/apache/spark/pull/30958#discussion_r549600913



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala
##
@@ -440,6 +441,31 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestUtils
       }
     }
   }
+
+  test("SPARK-33930: Script Transform default FIELD DELIMIT should be \u0001 (no serde)") {
+    withTempView("v") {
+      val df = Seq(
+        (1, 2, 3),
+        (2, 3, 4),
+        (3, 4, 5)
+      ).toDF("a", "b", "c") // Note column d's data type is Decimal(38, 18)

Review comment:
   where is column d?








[GitHub] [spark] imback82 commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


imback82 commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549600597



##
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
     case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), partitionSpec, isExtended) =>
       DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, isExtended) =>
-      DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+    case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, isExtended) =>
+      column match {
+        case u: UnresolvedAttribute =>
+          // For views, the column will not be resolved by `ResolveReferences` because
+          // `ResolvedView` stores only the identifier.
+          DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, isExtended)
+        case a: Attribute =>
+          DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
+        case nested =>
+          throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   For `DESC desc_complex_col_table col.x`, it will be:
   ```
   DESC TABLE COLUMN command does not support nested data types: col.x
   ```
   vs.
   ```
   DESC TABLE COLUMN does not support nested column: spark_catalog.default.desc_complex_col_table.`col`.`x` AS `x`
   ```








[GitHub] [spark] cloud-fan closed pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle execu

2020-12-28 Thread GitBox


cloud-fan closed pull request #30956:
URL: https://github.com/apache/spark/pull/30956


   






[GitHub] [spark] mridulm edited a comment on pull request #30876: [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default

2020-12-28 Thread GitBox


mridulm edited a comment on pull request #30876:
URL: https://github.com/apache/spark/pull/30876#issuecomment-751979314


   Before answering specific queries below, I want to set the context.
   a) Enabling proactive replication could result in reduced recomputation cost 
when executors fail.
   b) Enabling it will result in increased transfers when executor(s) are lost.
   (Ignoring other minor impacts) 
   
   I was trying to understand what the impact would be, what the tradeoffs 
involved are, when we enable by default:
   
   1) Are the replication costs (b) lower now ? How do we estimate that cost ?
   (There was non-trivial impact when I had last done some expt's earlier)
   
   2) Are we (community) running into cases where we benefit from (a) but are 
not (very) negatively impacted by (b) ?
   Is there any commonality when this happens ?
   (application types/characteristics ? resource manager ? almost all usage ?)
   
   3) What is the impact to the application (and cluster) when we have 
nontrivial executor loss - executor release in DRA is one example of this, 
preemption is another.
   
   4) Anything else to watch out for ?
   
   As I mentioned earlier, I am fine with collecting data by enabling this flag 
by default.
   I am hoping this and other discussions will help us understand what 
questions to better evaluate before we release 3.2.
   
   
   > 1. For this question, I answered at the beginning that this is a kind of 
self-healing feature 
[here](https://github.com/apache/spark/pull/30876#discussion_r547031257)
   > 
   > > Making it default will impact all applications which have replication > 
1: given this PR is proposing to make it the default, I would like to know if 
there was any motivating reason to make this change ?
   
   Spark is self-healing via lineage :-)
   Having said that, as mentioned above, I want to understand what the tradeoffs for enabling this flag are.
   
   > 
   > 1. For the following question, I asked your evidence first because I'm not 
aware of. :)
   > 
   > > If the cost of proactive replication is close to zero now (my 
experiments were from a while back), ofcourse the discussion is moot - did we 
have any results for this ?
   
   I am not proposing to change the default behavior, you are ... hence my 
query :-)
   As I mentioned above, when I had looked at this in the past - it was very 
helpful for some applications, but not others : it depended on the application 
and their requirements - `replication > 1` itself was not very commonly used 
then.
   
   > 
   > 1. For the following question, it seems that you assume that the current 
Spark's behavior is the best. I don't think this question justifies that the 
loss of data inside Spark side is good.
   > 
   > > What is the ongoing cost when application holds RDD references, but they 
are not in active use for rest of the application (not all references can be 
cleared by gc) - resulting in replication of blocks for an RDD which is 
legitimately not going to be used again ?
   
   Couple of points here:
   a) There is no data loss - spark recomputes when a lost block is required 
(but at some recomputation cost).
   b) My query was specifically about the cost for replication - given what I 
described is a common pattern in user applications : I was not saying this is 
desired code pattern, but it is a commonly observed behavior.
   
   
   > 
   > 1. For the following, yes, but `exacerbates` doesn't look like a proper 
term here because we had better make Spark smarter to handle those cases as I 
replied at 
[here](https://github.com/apache/spark/pull/30876#discussion_r547421217) 
already.
   > 
   > > Note that the above is orthogonal to DRA evicting an executor via 
storage timeout configuration. That just exacerbates the problem : since a 
larger number of executors could be lost.
   
   If we can do better on this, I am definitely very keen on it !
   Until that happens, we need to continue supporting existing scenarios where 
DRA impacts use of this flag.
   
   
   > 
   > 1. For the following, I didn't make this PR for that specific use case. I 
made this PR to improve this feature in various environment in Apache Spark 
3.2.0 timeframe 
[here](https://github.com/apache/spark/pull/30876#issuecomment-749953223).
   > 
   > > Specifically for this usecase, we dont need to make it a spark default 
right ? ...
   
   This was in response to the 
[scenario](https://github.com/apache/spark/pull/30876#issuecomment-750471287) 
described.
   Let us decouple discussion of that scenario from our discussion here - and 
focus on what we need to evaluate for enabling this by default.
   
   
   > 
   > 1. For the following, I replied that YARN environment also can suffer from 
disk loss or executor loss 
[here](https://github.com/apache/spark/pull/30876#issuecomment-751060200) 
because you insisted that YARN doesn't need this feature from the beginning. 
I'm still not sure that YARN environment is so 

[GitHub] [spark] cloud-fan commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle

2020-12-28 Thread GitBox


cloud-fan commented on pull request #30956:
URL: https://github.com/apache/spark/pull/30956#issuecomment-751981627


   thanks, merging to master/3.1!






[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751980630


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38057/
   






[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2020-12-28 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-751980374


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38061/
   






[GitHub] [spark] mridulm commented on pull request #30876: [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default

2020-12-28 Thread GitBox


mridulm commented on pull request #30876:
URL: https://github.com/apache/spark/pull/30876#issuecomment-751979314


   
   Before answering specific queries below, I want to set the context.
   a) Enabling proactive replication could result in reduced recomputation cost 
when executors fail.
   b) Enabling it will result in increased transfers when executor(s) are lost.
   (Ignoring other minor impacts) 
   
   I was trying to understand what the impact would be, what the tradeoffs 
involved are, when we enable by default:
   
   1) Are the replication costs (b) lower now ? How do we estimate that cost ?
   (There was non-trivial impact when I had last done some expt's earlier)
   
   2) Are we (community) running into cases where we benefit from (a) but are 
not (very) negatively impacted by (b) ?
   Is there any commonality when this happens ?
   (application types/characteristics ? resource manager ? almost all usage ?)
   
   3) What is the impact to the application (and cluster) when we have 
nontrivial executor loss - executor release in DRA is one example of this, 
preemption is another.
   
   4) Anything else to watch out for ?
   
   As I mentioned earlier, I am fine with collecting data by enabling this flag 
by default.
   I am hoping this and other discussions will help us understand what 
questions to better evaluate before we release 3.2.
   
   
   > 1. For this question, I answered at the beginning that this is a kind of 
self-healing feature 
[here](https://github.com/apache/spark/pull/30876#discussion_r547031257)
   > 
   > > Making it default will impact all applications which have replication > 
1: given this PR is proposing to make it the default, I would like to know if 
there was any motivating reason to make this change ?
   
   Spark is self-healing via lineage :-)
   Having said that, as mentioned above, I want to understand what the tradeoffs for enabling this flag are.
   
   > 
   > 1. For the following question, I asked your evidence first because I'm not 
aware of. :)
   > 
   > > If the cost of proactive replication is close to zero now (my 
experiments were from a while back), ofcourse the discussion is moot - did we 
have any results for this ?
   
   I am not proposing to change the default behavior, you are ... hence my 
query :-)
   As I mentioned above, when I had looked at this in the past - it was very 
helpful for some applications, but not others : it depended on the application 
and their requirements - `replication > 1` itself was not very commonly used 
then.
   
   > 
   > 1. For the following question, it seems that you assume that the current 
Spark's behavior is the best. I don't think this question justifies that the 
loss of data inside Spark side is good.
   > 
   > > What is the ongoing cost when application holds RDD references, but they 
are not in active use for rest of the application (not all references can be 
cleared by gc) - resulting in replication of blocks for an RDD which is 
legitimately not going to be used again ?
   
   Couple of points here:
   a) There is no data loss - spark recomputes when a lost block is required 
(but at some recomputation cost).
   b) My query was specifically about the cost for replication - given what I 
described is a common pattern in user applications : I was not saying this is 
desired code pattern, but it is a commonly observed behavior.
   
   
   > 
   > 1. For the following, yes, but `exacerbates` doesn't look like a proper 
term here because we had better make Spark smarter to handle those cases as I 
replied at 
[here](https://github.com/apache/spark/pull/30876#discussion_r547421217) 
already.
   > 
   > > Note that the above is orthogonal to DRA evicting an executor via 
storage timeout configuration. That just exacerbates the problem : since a 
larger number of executors could be lost.
   
   If we can do better on this, I am definitely very keen on it !
   Until that happens, we need to continue supporting existing scenarios where 
DRA impacts use of this flag.
   
   
   > 
   > 1. For the following, I didn't make this PR for that specific use case. I 
made this PR to improve this feature in various environment in Apache Spark 
3.2.0 timeframe 
[here](https://github.com/apache/spark/pull/30876#issuecomment-749953223).
   > 
   > > Specifically for this usecase, we dont need to make it a spark default 
right ? ...
   
   This was in response to the 
[scenario](https://github.com/apache/spark/pull/30876#issuecomment-750471287) 
described.
   Let us decouple discussion of that scenario from our discussion here - and 
focus on what we need to evaluate for enabling this by default.
   
   
   > 
   > 1. For the following, I replied that YARN environment also can suffer from 
disk loss or executor loss 
[here](https://github.com/apache/spark/pull/30876#issuecomment-751060200) 
because you insisted that YARN doesn't need this feature from the beginning. 
I'm still not sure that YARN environment is so 

[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751979073


   **[Test build #133477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133477/testReport)**
 for PR 30957 at commit 
[`6a7438b`](https://github.com/apache/spark/commit/6a7438bf6574d35ed841a7301f50003b4fb12341).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751978657


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133466/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751978657


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133466/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


SparkQA removed a comment on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751938979


   **[Test build #133466 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133466/testReport)**
 for PR 30841 at commit 
[`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751978457


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38059/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751978457


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38059/
   






[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751978449


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38059/
   






[GitHub] [spark] SparkQA commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


SparkQA commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751978390


   **[Test build #133466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133466/testReport)**
 for PR 30841 at commit 
[`a495f6d`](https://github.com/apache/spark/commit/a495f6d56411f2f3bb1e271babe9efad008b3959).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AngersZhuuuu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751977979


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751977685


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133467/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751977685


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133467/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751977414


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38062/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751977414


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38062/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA removed a comment on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751951003


   **[Test build #133467 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133467/testReport)**
 for PR 30957 at commit 
[`adc9ded`](https://github.com/apache/spark/commit/adc9ded0d8fe957b203c047e433381645fe944e9).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751977032


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133474/
   






[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751977118


   **[Test build #133467 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133467/testReport)**
 for PR 30957 at commit 
[`adc9ded`](https://github.com/apache/spark/commit/adc9ded0d8fe957b203c047e433381645fe944e9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751977032


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133474/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30956:
URL: https://github.com/apache/spark/pull/30956#issuecomment-751971918


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133464/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13

2020-12-28 Thread GitBox


SparkQA removed a comment on pull request #30951:
URL: https://github.com/apache/spark/pull/30951#issuecomment-751890200


   **[Test build #133460 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133460/testReport)**
 for PR 30951 at commit 
[`0d6ff72`](https://github.com/apache/spark/commit/0d6ff72b2272ccc355d076c8bf6f672d2da3751f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751971916


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38056/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhen clauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30898:
URL: https://github.com/apache/spark/pull/30898#issuecomment-751972078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133462/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


SparkQA removed a comment on pull request #30898:
URL: https://github.com/apache/spark/pull/30898#issuecomment-751920157


   **[Test build #133462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133462/testReport)** for PR 30898 at commit [`d3b072e`](https://github.com/apache/spark/commit/d3b072e2d1db3aef0ea4ab80767ab739502f7e81).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30928: [SPARK-33912][SQL] Refactor DependencyUtils ivy property parameter

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30928:
URL: https://github.com/apache/spark/pull/30928#issuecomment-751971917


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38060/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30951:
URL: https://github.com/apache/spark/pull/30951#issuecomment-751975143


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133460/
   






[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AngersZhuuuu edited a comment on pull request #30958:
URL: https://github.com/apache/spark/pull/30958#issuecomment-751974605


   > Not related to the change. But I notice that some contributors usually use 
screenshots in the description. I personally don't recommend this approach. The 
images cannot be indexed and searched. So I suggest that for problem and fix 
description, some text are more helpful.
   
   Yea, thanks for your suggestion. I will update the PR description and will pay attention to this in the future.
   Maybe we should send an email to mention this?






[GitHub] [spark] AmplabJenkins commented on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30951:
URL: https://github.com/apache/spark/pull/30951#issuecomment-751975143


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133460/
   






[GitHub] [spark] cloud-fan closed pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


cloud-fan closed pull request #30898:
URL: https://github.com/apache/spark/pull/30898


   






[GitHub] [spark] cloud-fan commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


cloud-fan commented on pull request #30898:
URL: https://github.com/apache/spark/pull/30898#issuecomment-751974714


   thanks, merging to master!






[GitHub] [spark] MaxGekk commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread GitBox


MaxGekk commented on a change in pull request #30947:
URL: https://github.com/apache/spark/pull/30947#discussion_r549594135



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala
##
@@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
 
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
 if (parts.isEmpty) {
-  throw new AnalysisException("multi-part identifier cannot be empty.")
+  throw new AnalysisException("Namespaces in V1 catalog can have only a 
single name part.")

Review comment:
   Actually, `parts` includes a table name. When we say that `parts` cannot 
be empty, we require at least a table name.
   
   Probably, `Namespaces in V1 catalog can have only a single name part` could 
confuse users too.
   
   We should say something like: a table identifier must contain either a table name, or a database name plus a table name.
   
   Specifically, in this check we should say **"A table identifier must have at least a table name"**.
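   
   To make the suggestion concrete, here is a minimal, standalone sketch (plain Scala with placeholder names and exception type, not the actual `MultipartIdentifierHelper` code) of the check being discussed: `parts` always ends with the table name, so an empty sequence means not even a table name was supplied.
   
   ```scala
   // Illustration only; names and exception type are placeholders.
   object IdentifierCheckSketch {
     // `parts` is a multi-part identifier whose last element is the table name.
     def requireTableName(parts: Seq[String]): Unit = {
       if (parts.isEmpty) {
         throw new IllegalArgumentException(
           "A table identifier must have at least a table name.")
       }
     }
   
     def main(args: Array[String]): Unit = {
       requireTableName(Seq("db", "tbl")) // ok: database + table name
       requireTableName(Seq("tbl"))       // ok: table name only
       requireTableName(Seq.empty)        // fails with the clearer message
     }
   }
   ```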








[GitHub] [spark] AngersZhuuuu commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30958:
URL: https://github.com/apache/spark/pull/30958#issuecomment-751974605


   > Not related to the change. But I notice that some contributors usually use 
screenshots in the description. I personally don't recommend this approach. The 
images cannot be indexed and searched. So I suggest that for problem and fix 
description, some text are more helpful.
   
   Yea, thanks for your suggestion. I will update the PR description.






[GitHub] [spark] SparkQA commented on pull request #30951: [SPARK-33775][FOLLOWUP][test-maven][BUILD] Suppress maven compilation warnings in Scala 2.13

2020-12-28 Thread GitBox


SparkQA commented on pull request #30951:
URL: https://github.com/apache/spark/pull/30951#issuecomment-751974120


   **[Test build #133460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133460/testReport)** for PR 30951 at commit [`0d6ff72`](https://github.com/apache/spark/commit/0d6ff72b2272ccc355d076c8bf6f672d2da3751f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751973873


   **[Test build #133474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133474/testReport)** for PR 30959 at commit [`65f97de`](https://github.com/apache/spark/commit/65f97dee09fde2bd77bf3514ed855278f15de974).






[GitHub] [spark] SparkQA commented on pull request #30315: [SPARK-33388][SQL] Merge In and InSet predicate

2020-12-28 Thread GitBox


SparkQA commented on pull request #30315:
URL: https://github.com/apache/spark/pull/30315#issuecomment-751973304


   **[Test build #133476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133476/testReport)** for PR 30315 at commit [`ed3530a`](https://github.com/apache/spark/commit/ed3530a560927fbbf78a142aa7aec98237b7a77c).






[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549592550



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -272,8 +273,13 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
   }
   DescribeTableExec(desc.output, r.table, isExtended) :: Nil
 
-case DescribeColumn(_: ResolvedTable, _, _) =>
-  throw new AnalysisException("Describing columns is not supported for v2 
tables.")
+case desc @ DescribeColumn(_: ResolvedTable, column, isExtended) =>
+  column match {
+case c: Attribute =>
+  DescribeColumnExec(desc.output, c, isExtended) :: Nil
+case _ =>
+  throw 
QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   ditto








[GitHub] [spark] SparkQA commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


SparkQA commented on pull request #30955:
URL: https://github.com/apache/spark/pull/30955#issuecomment-751972899


   **[Test build #133475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133475/testReport)** for PR 30955 at commit [`82a343c`](https://github.com/apache/spark/commit/82a343c8b2b1c2258d49cc5799a590d7ba0d7651).






[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549592416



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)
+case nested =>
+  throw 
QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   > Construct the original name from GetStructField, GetArrayStructFields, 
etc.
   
   Is it simply `nested.sql`?








[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2020-12-28 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-751972303


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38061/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549591975



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)

Review comment:
   It's possible when the column name doesn't exist in the table, and we 
should give a clear error message: `Column $colName does not exist`
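   
   For illustration, a tiny standalone sketch (plain Scala, invented helper names, not the Spark implementation) of the kind of lookup and error message being suggested:
   
   ```scala
   object ColumnCheckSketch {
     // Look up the requested column in the table's field names and fail with a
     // clear message when it is missing.
     def resolveColumn(fieldNames: Seq[String], nameParts: Seq[String]): String = {
       val colName = nameParts.mkString(".")
       fieldNames.find(_.equalsIgnoreCase(colName)).getOrElse {
         throw new IllegalArgumentException(s"Column $colName does not exist")
       }
     }
   
     def main(args: Array[String]): Unit = {
       val fields = Seq("id", "data")
       println(resolveColumn(fields, Seq("data"))) // data
       resolveColumn(fields, Seq("dataa"))         // throws: Column dataa does not exist
     }
   }
   ```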








[GitHub] [spark] AmplabJenkins commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30898:
URL: https://github.com/apache/spark/pull/30898#issuecomment-751972078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133462/
   






[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751972015


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38057/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30928: [SPARK-33912][SQL] Refactor DependencyUtils ivy property parameter

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30928:
URL: https://github.com/apache/spark/pull/30928#issuecomment-751971917


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38060/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751971916


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38056/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30956:
URL: https://github.com/apache/spark/pull/30956#issuecomment-751971918


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133464/
   






[GitHub] [spark] SparkQA commented on pull request #30898: [SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true)

2020-12-28 Thread GitBox


SparkQA commented on pull request #30898:
URL: https://github.com/apache/spark/pull/30898#issuecomment-751971552


   **[Test build #133462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133462/testReport)** for PR 30898 at commit [`d3b072e`](https://github.com/apache/spark/commit/d3b072e2d1db3aef0ea4ab80767ab739502f7e81).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30955:
URL: https://github.com/apache/spark/pull/30955#discussion_r549590506



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] 
with PredicateHelper {
 foldables.nonEmpty && others.length < 2
   }
 
+  // Not all UnaryExpression support push into (if / case) branches, e.g. 
Alias.
+  private def supportedUnaryExpression(e: UnaryExpression): Boolean = e match {
+case _: IsNull | _: IsNotNull => true
+case _: UnaryMathExpression | _: Abs | _: Bin | _: Factorial | _: Hex => 
true
+case _: String2StringExpression | _: Ascii | _: Base64 | _: BitLength | _: 
Chr | _: Length =>
+  true
+case _: CastBase => true
+case _: GetDateField | _: LastDay => true
+case _: ExtractIntervalPart => true
+case _: ArraySetLike => true
+case _ => false
+  }
+
+  private def supportedBinaryExpression(e: BinaryExpression): Boolean = e 
match {

Review comment:
   let's add comments as well.
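   
   As a rough illustration of what this rule and the allowList guard, here is a toy sketch (plain Scala with a made-up expression ADT, not Spark's optimizer) of pushing a foldable unary expression into `If` branches so that each branch can then be constant-folded:
   
   ```scala
   object PushIntoBranchesSketch {
     sealed trait Expr
     case class Lit(value: Any) extends Expr
     case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr
     case class Length(child: Expr) extends Expr
   
     // Push Length into the branches when its child is an If. A real rule only
     // does this for expressions on an allowList, since e.g. an Alias must not
     // be duplicated into both branches.
     def pushIntoBranches(e: Expr): Expr = e match {
       case Length(If(c, t, f)) => If(c, Length(t), Length(f))
       case other => other
     }
   
     // Fold Length over string literals to show why the rewrite helps.
     def constantFold(e: Expr): Expr = e match {
       case Length(Lit(s: String)) => Lit(s.length)
       case If(c, t, f) => If(constantFold(c), constantFold(t), constantFold(f))
       case other => other
     }
   
     def main(args: Array[String]): Unit = {
       val before = Length(If(Lit(true), Lit("ab"), Lit("xyz")))
       println(constantFold(pushIntoBranches(before))) // If(Lit(true),Lit(2),Lit(3))
     }
   }
   ```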








[GitHub] [spark] AngersZhuuuu commented on pull request #30948: [SPARK-33637][SQL] alter table drop partition equals alter table drop if ex…

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30948:
URL: https://github.com/apache/spark/pull/30948#issuecomment-751970421


   Seems the PR title is incomplete?






[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30955:
URL: https://github.com/apache/spark/pull/30955#discussion_r549590192



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] 
with PredicateHelper {
 foldables.nonEmpty && others.length < 2
   }
 
+  // Not all UnaryExpression support push into (if / case) branches, e.g. 
Alias.

Review comment:
   `Not all UnaryExpression can be pushed into (if / case) branches, e.g. 
Alias.`








[GitHub] [spark] cloud-fan commented on a change in pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30955:
URL: https://github.com/apache/spark/pull/30955#discussion_r549590121



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -548,41 +548,66 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] 
with PredicateHelper {
 foldables.nonEmpty && others.length < 2
   }
 
+  // Not all UnaryExpression support push into (if / case) branches, e.g. 
Alias.
+  private def supportedUnaryExpression(e: UnaryExpression): Boolean = e match {
+case _: IsNull | _: IsNotNull => true
+case _: UnaryMathExpression | _: Abs | _: Bin | _: Factorial | _: Hex => 
true
+case _: String2StringExpression | _: Ascii | _: Base64 | _: BitLength | _: 
Chr | _: Length =>
+  true
+case _: CastBase => true
+case _: GetDateField | _: LastDay => true
+case _: ExtractIntervalPart => true
+case _: ArraySetLike => true
+case _ => false

Review comment:
   let's include `ExtractValue` as well, which is common with nested fields.








[GitHub] [spark] cloud-fan closed pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog

2020-12-28 Thread GitBox


cloud-fan closed pull request #30952:
URL: https://github.com/apache/spark/pull/30952


   






[GitHub] [spark] cloud-fan commented on pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog

2020-12-28 Thread GitBox


cloud-fan commented on pull request #30952:
URL: https://github.com/apache/spark/pull/30952#issuecomment-751969553


   thanks, merging to master!






[GitHub] [spark] SparkQA removed a comment on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killin

2020-12-28 Thread GitBox


SparkQA removed a comment on pull request #30956:
URL: https://github.com/apache/spark/pull/30956#issuecomment-751938899


   **[Test build #133464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133464/testReport)** for PR 30956 at commit [`ba0a4bc`](https://github.com/apache/spark/commit/ba0a4bca5a21417c78bac6626f4e1f6646c68a7b).






[GitHub] [spark] SparkQA commented on pull request #30956: [SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle e

2020-12-28 Thread GitBox


SparkQA commented on pull request #30956:
URL: https://github.com/apache/spark/pull/30956#issuecomment-751969260


   **[Test build #133464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133464/testReport)** for PR 30956 at commit [`ba0a4bc`](https://github.com/apache/spark/commit/ba0a4bca5a21417c78bac6626f4e1f6646c68a7b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751969171


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38059/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30947:
URL: https://github.com/apache/spark/pull/30947#discussion_r549589001



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala
##
@@ -118,7 +118,7 @@ private[sql] object CatalogV2Implicits {
 
   implicit class MultipartIdentifierHelper(parts: Seq[String]) {
 if (parts.isEmpty) {
-  throw new AnalysisException("multi-part identifier cannot be empty.")
+  throw new AnalysisException("Namespaces in V1 catalog can have only a 
single name part.")

Review comment:
   to be more precise: `Namespaces in V1 catalog cannot be empty`?








[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751964598


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38056/
   






[GitHub] [spark] MaxGekk commented on pull request #30947: [SPARK-33926][SQL] Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread GitBox


MaxGekk commented on pull request #30947:
URL: https://github.com/apache/spark/pull/30947#issuecomment-751963956


   @cloud-fan Please, take a look at this.






[GitHub] [spark] MaxGekk commented on pull request #30952: [SPARK-33924][SQL][TESTS] Preserve partition metadata by INSERT INTO in v2 table catalog

2020-12-28 Thread GitBox


MaxGekk commented on pull request #30952:
URL: https://github.com/apache/spark/pull/30952#issuecomment-751963841


   @cloud-fan @HyukjinKwon Please, review this fix.






[GitHub] [spark] wangyum commented on a change in pull request #30853: [SPARK-33848][SQL] Push the UnaryExpression into (if / case) branches

2020-12-28 Thread GitBox


wangyum commented on a change in pull request #30853:
URL: https://github.com/apache/spark/pull/30853#discussion_r549583380



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -542,29 +542,42 @@ object PushFoldableIntoBranches extends Rule[LogicalPlan] 
with PredicateHelper {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsUp {
+  case a: Alias => a // Skip an alias.

Review comment:
   https://github.com/apache/spark/pull/30955








[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751963050


   **[Test build #133473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133473/testReport)** for PR 30959 at commit [`3f9b69d`](https://github.com/apache/spark/commit/3f9b69dfe634e3de390a787b84cc195206ffb440).






[GitHub] [spark] viirya commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


viirya commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549582590



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)

Review comment:
   Is it possible that there is an unresolved attribute while the `relation` of `DescribeColumn` is a v1 table?








[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2020-12-28 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-751961455


   **[Test build #133472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133472/testReport)** for PR 30212 at commit [`927f21e`](https://github.com/apache/spark/commit/927f21e3ebf9a3e71d2467fabe492d2b306a8037).






[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751959287


   **[Test build #133471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133471/testReport)** for PR 30957 at commit [`6a7438b`](https://github.com/apache/spark/commit/6a7438bf6574d35ed841a7301f50003b4fb12341).






[GitHub] [spark] SparkQA commented on pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


SparkQA commented on pull request #30959:
URL: https://github.com/apache/spark/pull/30959#issuecomment-751958605


   **[Test build #133469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133469/testReport)** for PR 30959 at commit [`edc4994`](https://github.com/apache/spark/commit/edc4994ae348a5c4c258143c57e015aeaf9d673f).






[GitHub] [spark] SparkQA commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


SparkQA commented on pull request #30958:
URL: https://github.com/apache/spark/pull/30958#issuecomment-751958635


   **[Test build #133470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133470/testReport)** for PR 30958 at commit [`1812826`](https://github.com/apache/spark/commit/1812826f67cc41ed8efb961e793c74a975e27d5d).






[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


SparkQA commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751958171


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38056/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751957782


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38054/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


AmplabJenkins removed a comment on pull request #30955:
URL: https://github.com/apache/spark/pull/30955#issuecomment-751957781


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38053/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751957782


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38054/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30955: [SPARK-33848][SQL][FOLLOWUP] Introduce allowList for push into (if / case) branches

2020-12-28 Thread GitBox


AmplabJenkins commented on pull request #30955:
URL: https://github.com/apache/spark/pull/30955#issuecomment-751957781


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38053/
   






[GitHub] [spark] dongjoon-hyun opened a new pull request #30959: [SPARK-33931][INFRA] Recover GitHub Action `build_and_test` job

2020-12-28 Thread GitBox


dongjoon-hyun opened a new pull request #30959:
URL: https://github.com/apache/spark/pull/30959


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] imback82 commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


imback82 commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549576786



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala
##
@@ -97,9 +98,26 @@ case class ResolvedNamespace(catalog: CatalogPlugin, 
namespace: Seq[String])
 /**
  * A plan containing resolved table.
  */
-case class ResolvedTable(catalog: TableCatalog, identifier: Identifier, table: 
Table)
+case class ResolvedTable(
+catalog: TableCatalog,
+identifier: Identifier,
+table: Table,
+outputAttributes: Seq[Attribute])
   extends LeafNode {
-  override def output: Seq[Attribute] = Nil
+  override def output: Seq[Attribute] = {
+val qualifier = catalog.name +: identifier.namespace :+ identifier.name
+outputAttributes.map(_.withQualifier(qualifier))

Review comment:
   Or we can wrap this with `SubqueryAlias` similar to how 
`DataSourceV2Relation` is wrapped, but we need to update everywhere 
`ResolvedTable` is matched.
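   
   A toy sketch (plain Scala, no Spark dependency, invented names) of the qualification being discussed above, i.e. prefixing every output attribute with catalog, namespace and table name:
   
   ```scala
   object QualifierSketch {
     final case class Attr(name: String, qualifier: Seq[String] = Nil) {
       def withQualifier(q: Seq[String]): Attr = copy(qualifier = q)
       override def toString: String = (qualifier :+ name).mkString(".")
     }
   
     def main(args: Array[String]): Unit = {
       // catalog.name +: identifier.namespace :+ identifier.name
       val qualifier = Seq("testcat", "ns", "tbl")
       val output = Seq(Attr("id"), Attr("data")).map(_.withQualifier(qualifier))
       println(output.mkString(", ")) // testcat.ns.tbl.id, testcat.ns.tbl.data
     }
   }
   ```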








[GitHub] [spark] AngersZhuuuu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30957:
URL: https://github.com/apache/spark/pull/30957#issuecomment-751953363


   FYI @cloud-fan @maropu @alfozan






[GitHub] [spark] AngersZhuuuu commented on pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AngersZhuuuu commented on pull request #30958:
URL: https://github.com/apache/spark/pull/30958#issuecomment-751953177


   FYI @maropu @cloud-fan 






[GitHub] [spark] AngersZhuuuu opened a new pull request #30958: [SPARK-33930][SQL] Spark SQL no serde row format field delimit default value is '\u0001'

2020-12-28 Thread GitBox


AngersZhuuuu opened a new pull request #30958:
URL: https://github.com/apache/spark/pull/30958


   ### What changes were proposed in this pull request?
   For the same SQL:
   ```
   SELECT TRANSFORM(a, b, c, null)
   ROW FORMAT DELIMITED
   USING 'cat' 
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '&'
   FROM (select 1 as a, 2 as b, 3  as c) t
   ```
   In Hive:
   ![image](https://user-images.githubusercontent.com/46485123/103260903-5c968a80-49da-11eb-9675-7c66b2ee35fb.png)
   
   In Spark:
   ![image](https://user-images.githubusercontent.com/46485123/103260912-67511f80-49da-11eb-93df-663543c8e91f.png)
   
   We should keep these consistent: change the default ROW FORMAT field delimiter to `\u0001` (see the short sketch at the end of this description).
   
   ### Why are the changes needed?
   Keep the same behavior as Hive.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added UT
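   
   For readers who cannot view the screenshots, a small standalone sketch (illustrative column values only, not taken from the images) of how the same columns look when joined with the user-specified `&` versus the `\u0001` (SOH, byte value 1) default proposed here:
   
   ```scala
   object DelimiterSketch {
     def main(args: Array[String]): Unit = {
       val cols = Seq("1", "2", "3")
   
       // Delimiter from FIELDS TERMINATED BY '&'
       println(cols.mkString("&")) // 1&2&3
   
       // '\u0001' is invisible in most terminals, so print the raw bytes
       // instead: 49 1 50 1 51
       println(cols.mkString("\u0001").getBytes("UTF-8").map(_.toInt).mkString(" "))
     }
   }
   ```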
   






[GitHub] [spark] SparkQA commented on pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2020-12-28 Thread GitBox


SparkQA commented on pull request #30841:
URL: https://github.com/apache/spark/pull/30841#issuecomment-751952088


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38054/
   





