subject:"\[GitHub\] \[spark\] cloud\-fan commented on a change in pull request #25626\: \[SPARK\-28892\]\[SQL\] Add UPDATE support for DataSource V2"

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-17 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r325485701
 
 

 ##
 File path: docs/sql-keywords.md
 ##
 @@ -280,6 +280,7 @@ Below is a list of all the keywords in Spark SQL.
   
UNKNOWNreservednon-reservedreserved
   
UNLOCKnon-reservednon-reservednon-reserved
   
UNSETnon-reservednon-reservednon-reserved
+  
UPDATEnon-reservednon-reservedreserved
 
 Review comment:
   I checked other keywords and seems like the general rule here is to follow 
Postgres, we can revisit it later. cc @gengliangwang 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-17 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r325242615
 
 

 ##
 File path: docs/sql-keywords.md
 ##
 @@ -280,6 +280,7 @@ Below is a list of all the keywords in Spark SQL.
   
UNKNOWNreservednon-reservedreserved
   
UNLOCKnon-reservednon-reservednon-reserved
   
UNSETnon-reservednon-reservednon-reserved
+  
UPDATEnon-reservednon-reservedreserved
 
 Review comment:
   cc @maropu 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-17 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r325242527
 
 

 ##
 File path: docs/sql-keywords.md
 ##
 @@ -280,6 +280,7 @@ Below is a list of all the keywords in Spark SQL.
   
UNKNOWNreservednon-reservedreserved
   
UNLOCKnon-reservednon-reservednon-reserved
   
UNSETnon-reservednon-reservednon-reserved
+  
UPDATEnon-reservednon-reservedreserved
 
 Review comment:
   if UPDATE is reserved in SQL spec, we should make it reserved when ansi 
parser is enabled.
   
   We should put UPDATE in `ansiReserved`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-11 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r323098286
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/UpdateTableExec.scala
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.SparkException
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalog.v2.expressions.Expression
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.execution.LeafExecNode
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.SupportsUpdate
+
+
+case class UpdateTableExec(
+table: SupportsUpdate,
+attrs: Seq[String],
+values: Seq[Expression],
+condition: Array[Filter]) extends LeafExecNode {
+
+  override protected def doExecute(): RDD[InternalRow] = {
+try {
+  table.updateWhere(attrs.zip(values).toMap.asJava, condition)
+} catch {
+  case e: IllegalArgumentException =>
 
 Review comment:
   why catch this but not other exceptions?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-11 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r323096593
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,21 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+/**
+ * Update the data of table that specified with the condition with the given 
updated values.
+ * NOTE: Considering nested fields, we let the type of `attrs` to be 
`Seq[Expression]` becuase
+ * nested fields may be resolved to objects of type like `GetStructField`. 
However, currently
+ * we only support top-level fields, in which case `attrs` is actually of type 
`Seq[Attribute]`.
+ */
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Expression],
 
 Review comment:
   Note that, since we don't forbid nested fields at parser level, analyzer can 
resolve the column name to either `Attribute` or nested field `GetStructField`. 
We fail the query at planning time if nested fields are found. Hopefully we can 
get rid of this limitation in the future.
   
   If we change this to `Seq[Attribute]`, we must rely on type hack to hold 
`GetStructField` in `Seq[Attribute]`, which is too tricky.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-11 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r323096593
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,21 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+/**
+ * Update the data of table that specified with the condition with the given 
updated values.
+ * NOTE: Considering nested fields, we let the type of `attrs` to be 
`Seq[Expression]` becuase
+ * nested fields may be resolved to objects of type like `GetStructField`. 
However, currently
+ * we only support top-level fields, in which case `attrs` is actually of type 
`Seq[Attribute]`.
+ */
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Expression],
 
 Review comment:
   Note that, since we don't forbid nested fields at parser level, analyzer can 
resolve the column name to either `Attribute` or nested field `GetStructField`. 
We fail the query at planning time if nested fields are found. Hopefully we can 
get rid of this limitation in the future.
   
   If we change this to `Seq[Attribute]`, we must rely on type hack to hold 
`GetStructField` in `Seq[Attribute]`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-11 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r323094608
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/SupportsUpdate.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.Map;
+
+import org.apache.spark.annotation.Experimental;
+import org.apache.spark.sql.catalog.v2.expressions.Expression;
+import org.apache.spark.sql.sources.Filter;
+
+/**
+ * A mix-in interface for {@link Table} update support. Data sources can 
implement this
+ * interface to provide the ability to update data that matches filter 
expressions
+ * with the given sets.
+ */
+@Experimental
+public interface SupportsUpdate {
+
+  /**
+   * Update data that matches filter expressions with the given sets for a 
data source table.
+   * 
+   * Rows will be updated with the given values iff all of the filter 
expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are 
ANDed together.
+   * 
+   * Implementations may reject a update operation if the update isn't 
possible without significant
+   * effort or it cannot deal with the sets expression. For example, 
partitioned data sources may
+   * reject updates that do not filter by partition columns because the filter 
may require
+   * rewriting files. The update may also be rejected if the update requires a 
complex computation
+   * that the data source does not support.
+   * To reject a delete implementations should throw {@link 
IllegalArgumentException} with a clear
+   * error message that identifies which expression was rejected.
+   *
+   * @param sets the fields to be updated and the corresponding updated values 
in form of
+   * key-value pairs in a map. The value can be a literal, or a 
simple expression
+   * like {{{ originalValue + 1 }}}.
+   * @param filters filter expressions, used to select rows to delete when all 
expressions match
 
 Review comment:
   `select rows to delete` -> `select rows to udate`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-11 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r323094608
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/SupportsUpdate.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.Map;
+
+import org.apache.spark.annotation.Experimental;
+import org.apache.spark.sql.catalog.v2.expressions.Expression;
+import org.apache.spark.sql.sources.Filter;
+
+/**
+ * A mix-in interface for {@link Table} update support. Data sources can 
implement this
+ * interface to provide the ability to update data that matches filter 
expressions
+ * with the given sets.
+ */
+@Experimental
+public interface SupportsUpdate {
+
+  /**
+   * Update data that matches filter expressions with the given sets for a 
data source table.
+   * 
+   * Rows will be updated with the given values iff all of the filter 
expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are 
ANDed together.
+   * 
+   * Implementations may reject a update operation if the update isn't 
possible without significant
+   * effort or it cannot deal with the sets expression. For example, 
partitioned data sources may
+   * reject updates that do not filter by partition columns because the filter 
may require
+   * rewriting files. The update may also be rejected if the update requires a 
complex computation
+   * that the data source does not support.
+   * To reject a delete implementations should throw {@link 
IllegalArgumentException} with a clear
+   * error message that identifies which expression was rejected.
+   *
+   * @param sets the fields to be updated and the corresponding updated values 
in form of
+   * key-value pairs in a map. The value can be a literal, or a 
simple expression
+   * like {{{ originalValue + 1 }}}.
+   * @param filters filter expressions, used to select rows to delete when all 
expressions match
 
 Review comment:
   `select rows to delete` -> `select rows to update`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-05 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r321246693
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -422,12 +422,30 @@ case class DataSourceStrategy(conf: SQLConf) extends 
Strategy with Logging with
 }
 
 object DataSourceStrategy {
+
+  /**
+   * Tries to translate a Catalyst [[Expression]] into data source 
[[Expression]].
+   *
+   * @return a `Some[catalog.v2.expressions.Expression]` if the input 
[[Expression]]
+   * is convertible, otherwise a `None`.
+   */
+  protected[sql] def translateExpression(
 
 Review comment:
   how about the `LogicalExpressions` object?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r321062531
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -422,12 +422,30 @@ case class DataSourceStrategy(conf: SQLConf) extends 
Strategy with Logging with
 }
 
 object DataSourceStrategy {
+
+  /**
+   * Tries to translate a Catalyst [[Expression]] into data source 
[[Expression]].
+   *
+   * @return a `Some[catalog.v2.expressions.Expression]` if the input 
[[Expression]]
+   * is convertible, otherwise a `None`.
+   */
+  protected[sql] def translateExpression(
 
 Review comment:
   can we create a new object in `spark.sql.connector.expression` to put this 
method?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r321061823
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +245,31 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
 
 Review comment:
   In general we should always use table capability, but for delete/upadte, 
it's really weird to add capability as table must implement 
`SupportDelete/Update`, which is redundant to the capability.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r321060224
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -422,12 +422,30 @@ case class DataSourceStrategy(conf: SQLConf) extends 
Strategy with Logging with
 }
 
 object DataSourceStrategy {
+
+  /**
+   * Tries to translate a Catalyst [[Expression]] into data source 
[[Expression]].
+   *
+   * @return a `Some[catalog.v2.expressions.Expression]` if the input 
[[Expression]]
+   * is convertible, otherwise a `None`.
+   */
+  protected[sql] def translateExpression(
+  value: Expression): Option[catalog.v2.expressions.Expression] = {
+// TODO: currently we only support literal. Will add more public 
expression in future.
+value match {
+  case l: Literal =>
+Some(catalog.v2.expressions.LiteralValue(l.value, l.dataType))
+  case _ =>
+None
+}
+  }
+
   /**
* The attribute name of predicate could be different than the one in schema 
in case of
* case insensitive, we should change them to match the one in schema, so we 
do not need to
* worry about case sensitivity anymore.
*/
-  protected[sql] def normalizeFilters(
+  protected[sql] def normalizeAttrNames(
 
 Review comment:
   If we don't, then it's weird to use `normalizeFilters` to normalize update 
expressions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320738922
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +245,35 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  if (condition.exists(SubqueryExpression.hasSubquery)) {
+throw new AnalysisException(
+s"Update by condition with subquery is not supported: $condition")
+  }
+  val nested = attrs.filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
 
 Review comment:
   We don't need this now. The check is already done below.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320692778
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2WriteSupportCheck.scala
 ##
 @@ -56,6 +56,11 @@ object V2WriteSupportCheck extends (LogicalPlan => Unit) {
 failAnalysis(s"Delete by condition with subquery is not supported: 
$condition")
   }
 
+case UpdateTable(_, _, _, condition) =>
 
 Review comment:
   can we move it to `DataSourceV2Strategy`? This is not a sanity check, this 
is just current limitation. We may support subquery in the future and it's 
better to put the current limitation near the implementation.
   
   Let's also move the delete check there too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320691685
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +245,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = attrs.filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+  val attrsNames = DataSourceStrategy.normalizeAttrNames(attrs, r.output)
+  .asInstanceOf[Seq[AttributeReference]].map(_.name)
 
 Review comment:
   nit:
   ```
   val attrsNames = DataSourceStrategy.normalizeAttrNames(attrs, r.output).map {
 case a: AttributeReference => a.name
 case other => throw new AnalysisException(s"Update only support non-nested 
fields. Nested: $other")
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320689949
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -352,6 +352,36 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 DeleteFromStatement(tableId, tableAlias, 
expression(ctx.whereClause().booleanExpression()))
   }
 
+  override def visitUpdateTable(ctx: UpdateTableContext): LogicalPlan = 
withOrigin(ctx) {
+val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+val tableAlias = if (ctx.tableAlias() != null) {
+  val ident = ctx.tableAlias().strictIdentifier()
+  // We do not allow columns aliases after table alias.
+  if (ctx.tableAlias().identifierList() != null) {
+throw new ParseException("Columns aliases is not allowed in UPDATE.",
+  ctx.tableAlias().identifierList())
+  }
+  if (ident != null) { Some(ident.getText) } else { None }
 
 Review comment:
   nit: `{` and `}` are not needed here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320687617
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
 ##
 @@ -158,6 +158,42 @@ class InMemoryTable(
   override def deleteWhere(filters: Array[Filter]): Unit = 
dataMap.synchronized {
 dataMap --= InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, 
filters)
   }
+
+
+  // NOTE: this only support updating data with a literal, i.e., it cannot 
update
+  // a date `d` to `date_add(d, 1)`, which is an function.
+  override def updateWhere(sets: util.Map[String, Expression],
+  filters: Array[Filter]): Unit = dataMap.synchronized {
+val keys = InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, 
filters)
+
+val newData = keys.map {
+  key =>
+val newRows = new BufferedRows
+dataMap.get(key) match {
+  case Some(rows) =>
+rows.rows.foreach {
+  row =>
+val values = mutable.Buffer[Any]()
+row.toSeq(schema).copyToBuffer(values)
 
 Review comment:
   isn't array mutable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320646035
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
 ##
 @@ -158,6 +158,42 @@ class InMemoryTable(
   override def deleteWhere(filters: Array[Filter]): Unit = 
dataMap.synchronized {
 dataMap --= InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, 
filters)
   }
+
+
+  // NOTE: this only support updating data with a literal, i.e., it cannot 
update
+  // a date `d` to `date_add(d, 1)`, which is an function.
+  override def updateWhere(sets: util.Map[String, Expression],
+  filters: Array[Filter]): Unit = dataMap.synchronized {
+val keys = InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, 
filters)
+
+val newData = keys.map {
+  key =>
+val newRows = new BufferedRows
+dataMap.get(key) match {
+  case Some(rows) =>
+rows.rows.foreach {
+  row =>
+val values = mutable.Buffer[Any]()
+row.toSeq(schema).copyToBuffer(values)
 
 Review comment:
   I think we can simply call `toSeq(...).toArray`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320645329
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
 ##
 @@ -764,6 +764,19 @@ class DDLParserSuite extends AnalysisTest {
 assert(exc.getMessage.contains("INSERT INTO ... IF NOT EXISTS"))
   }
 
+  test("update table: columns aliases is not allowed") {
 
 Review comment:
   let's add some simple positive tests as well. You can take a look at how 
other commands are tested in this suite.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320645329
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
 ##
 @@ -764,6 +764,19 @@ class DDLParserSuite extends AnalysisTest {
 assert(exc.getMessage.contains("INSERT INTO ... IF NOT EXISTS"))
   }
 
+  test("update table: columns aliases is not allowed") {
 
 Review comment:
   let's add some simple positive tests as well. You can take a look at how 
other commands are tested in this suite?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320638891
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1767,6 +1769,240 @@ class DataSourceV2SQLSuite
 }
   }
 
+  test("Update: basic") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   |CREATE TABLE $t (id bigint, name string, age int, p int)
+   |USING foo
+   |PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   |INSERT INTO $t
+   |VALUES (1L, 'Herry', 26, 1),
+   |(2L, 'Jack', 31, 2),
+   |(3L, 'Lisa', 28, 3),
+   |(4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET name='Robert', age=32")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Robert", 32, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Robert", 32, 3),
+  Row(4, "Robert", 32, 3)))
+}
+  }
+
+  test("Update: update with where clause") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   |CREATE TABLE $t (id bigint, name string, age int, p int)
+   |USING foo
+   |PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   |INSERT INTO $t
+   |VALUES (1L, 'Herry', 26, 1),
+   |(2L, 'Jack', 31, 2),
+   |(3L, 'Lisa', 28, 3),
+   |(4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET name='Robert', age=32 where p=2")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 3)))
+}
+  }
+
+  test("Update: update the partition key") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   |CREATE TABLE $t (id bigint, name string, age int, p int)
+   |USING foo
+   |PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   |INSERT INTO $t
+   |VALUES (1L, 'Herry', 26, 1),
+   |(2L, 'Jack', 31, 2),
+   |(3L, 'Lisa', 28, 3),
+   |(4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET p=4 where id=4")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Jack", 31, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 4)))
+}
+  }
+
+  test("Update: update with aliased target table") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   |CREATE TABLE $t (id bigint, name string, age int, p int)
+   |USING foo
+   |PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   |INSERT INTO $t
+   |VALUES (1L, 'Herry', 26, 1),
+   |(2L, 'Jack', 31, 2),
+   |(3L, 'Lisa', 28, 3),
+   |(4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t AS tbl SET tbl.name='Robert', tbl.age=32 where p=2")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 3)))
+}
+  }
+
+  test("Update: normalize attribute names") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   |CREATE TABLE $t (id bigint, name string, age int, p int)
+   |USING foo
+   |PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   |INSERT INTO $t
+   |VALUES (1L, 'Herry', 26, 1),
+   |(2L, 'Jack', 31, 2),
+   |(3L, 'Lisa', 28, 3),
+   |(4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t AS tbl SET tbl.NAME='Robert', tbl.AGE=32 where P=2")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 3)))
+}
+  }
+
+  test("Update: columns aliases is not allowed") {
 
 Review comment:
   We may need to rename `DDLParserSuite`. In fact it contains parser tests for 
all commands, including INSERT. I think DELETE/UPDATE should be tested there as 
well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320637903
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+  val attrsNames = DataSourceStrategy.normalizeFilters(attrs, r.output)
+  .asInstanceOf[Seq[NamedExpression]].map(_.name)
+  // fail if any updated value cannot be converted.
+  val updatedValues = values.map {
+v => DataSourceStrategy.translateExpression(v).getOrElse(
 
 Review comment:
   we should normalize attr name here too, in case we support more expressions 
in the future.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320637238
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+  val attrsNames = DataSourceStrategy.normalizeFilters(attrs, r.output)
 
 Review comment:
   I think we should rename `normalizeFilters` to `normalizeAttrNames`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320637238
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+  val attrsNames = DataSourceStrategy.normalizeFilters(attrs, r.output)
 
 Review comment:
   I think we should rename `normalizeFilters` to `normalizeAttributesNames`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320636432
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,56 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(maybeRelation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+
+  val (relation, attrsNames, newValues, newCond) = maybeRelation match {
+case d: DataSourceV2Relation =>
+  (d, attrs.map(_.name), values, condition)
+case Project(aliasList: Seq[Alias], r: DataSourceV2Relation) =>
 
 Review comment:
   Yea the latter one provides better error message.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320610391
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,15 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Attribute],
 
 Review comment:
   Or we can change the analyzer to only resolve column names in `UpdateTable` 
as attributes, which I don't think is worthwhile.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320610002
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,15 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Attribute],
 
 Review comment:
   Then I think we have no choice but use `Seq[Expression]`. We should add 
comments to explain that `attrs` here holds the expressions that maybe 
attributes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320608058
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1767,6 +1769,213 @@ class DataSourceV2SQLSuite
 }
   }
 
+  test("Update: basic") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   | CREATE TABLE $t (id bigint, name string, age int, p int)
+   | USING foo
+   | PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   | INSERT INTO $t
+   | VALUES (1L, 'Herry', 26, 1),
+   | (2L, 'Jack', 31, 2),
+   | (3L, 'Lisa', 28, 3),
+   | (4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET name='Robert', age=32")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Robert", 32, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Robert", 32, 3),
+  Row(4, "Robert", 32, 3)))
+}
+  }
+
+  test("Update: update with where clause") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   | CREATE TABLE $t (id bigint, name string, age int, p int)
+   | USING foo
+   | PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   | INSERT INTO $t
+   | VALUES (1L, 'Herry', 26, 1),
+   | (2L, 'Jack', 31, 2),
+   | (3L, 'Lisa', 28, 3),
+   | (4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET name='Robert', age=32 where p=2")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 3)))
+}
+  }
+
+  test("Update: update the partition key") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   | CREATE TABLE $t (id bigint, name string, age int, p int)
+   | USING foo
+   | PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   | INSERT INTO $t
+   | VALUES (1L, 'Herry', 26, 1),
+   | (2L, 'Jack', 31, 2),
+   | (3L, 'Lisa', 28, 3),
+   | (4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t SET p=4 where id=4")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Jack", 31, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 4)))
+}
+  }
+
+  test("Update: update with aliased target table") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   | CREATE TABLE $t (id bigint, name string, age int, p int)
+   | USING foo
+   | PARTITIONED BY (id, p)
+ """.stripMargin)
+  sql(
+s"""
+   | INSERT INTO $t
+   | VALUES (1L, 'Herry', 26, 1),
+   | (2L, 'Jack', 31, 2),
+   | (3L, 'Lisa', 28, 3),
+   | (4L, 'Frank', 33, 3)
+ """.stripMargin)
+  sql(s"UPDATE $t AS tbl SET tbl.name='Robert', tbl.age=32 where p=2")
+
+  checkAnswer(spark.table(t),
+Seq(Row(1, "Herry", 26, 1),
+  Row(2, "Robert", 32, 2),
+  Row(3, "Lisa", 28, 3),
+  Row(4, "Frank", 33, 3)))
+}
+  }
+
+  test("Update: columns aliases is not allowed.") {
 
 Review comment:
   let's test it in the `DDLParserSuite`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320607765
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1767,6 +1769,213 @@ class DataSourceV2SQLSuite
 }
   }
 
+  test("Update: basic") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(
+s"""
+   | CREATE TABLE $t (id bigint, name string, age int, p int)
 
 Review comment:
   why put a space after `|`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320607427
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+
+  val attrsNames = attrs.map(_.name)
+  // fail if any updated value cannot be converted.
+  val updatedValues = values.map {
+v => DataSourceStrategy.translateExpression(v).getOrElse(
+  throw new AnalysisException(s"Exec update failed:" +
+  s" cannot translate update set to source expression: $v"))
+  }
+  // fail if any filter cannot be converted. correctness depends on 
removing all matching data.
+  val filters = condition.map(
+splitConjunctivePredicates(_).map {
+  f => DataSourceStrategy.translateFilter(f).getOrElse(
+throw new AnalysisException(s"Exec update failed:" +
+s" cannot translate expression to source filter: $f"))
+}.toArray).getOrElse(Array.empty[Filter])
+  UpdateTableExec(r.table.asUpdatable, attrsNames, updatedValues, 
filters)::Nil
 
 Review comment:
   We should normalize the attribute name. Let's say a table has a column `id`, 
and users write `UPDATE ... ID=1`, then the `AttributeReference` will be named 
`ID` instead of `id`.
   
   The same thing should also apply to filters. You can take a look at 
`DataSourceStrategy.normalizeFilters` and see where we call it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320605613
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
 
 Review comment:
   Can we remove `.asInstanceOf[Seq[Any]]` now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320604850
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -352,6 +352,36 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 DeleteFromStatement(tableId, tableAlias, 
expression(ctx.whereClause().booleanExpression()))
   }
 
+  override def visitUpdateTable(ctx: UpdateTableContext): LogicalPlan = 
withOrigin(ctx) {
+val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+val tableAlias = if (ctx.tableAlias() != null) {
+  val ident = ctx.tableAlias().strictIdentifier()
+  // We do not allow columns aliases after table alias.
+  val identList = ctx.tableAlias().identifierList()
+  if (identList != null) {
 
 Review comment:
   we can just write `ctx.tableAlias().identifierList() != null`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320604716
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -352,6 +352,36 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 DeleteFromStatement(tableId, tableAlias, 
expression(ctx.whereClause().booleanExpression()))
   }
 
+  override def visitUpdateTable(ctx: UpdateTableContext): LogicalPlan = 
withOrigin(ctx) {
+val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+val tableAlias = if (ctx.tableAlias() != null) {
+  val ident = ctx.tableAlias().strictIdentifier()
+  // We do not allow columns aliases after table alias.
+  val identList = ctx.tableAlias().identifierList()
+  if (identList != null) {
+throw new ParseException("Columns aliases is not allowed.", identList)
 
 Review comment:
   let's make the error message more specific: `Columns aliases is not allowed 
in UPDATE.`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320604116
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -215,6 +215,7 @@ statement
 | SET .*?  
#setConfiguration
 | RESET
#resetConfiguration
 | DELETE FROM multipartIdentifier tableAlias whereClause   
#deleteFromTable
+| UPDATE multipartIdentifier tableAlias setClause (whereClause)?   
#updateTable
 
 Review comment:
   let's change `tableAlias` to `(AS? strictIdentifier)?`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320604116
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -215,6 +215,7 @@ statement
 | SET .*?  
#setConfiguration
 | RESET
#resetConfiguration
 | DELETE FROM multipartIdentifier tableAlias whereClause   
#deleteFromTable
+| UPDATE multipartIdentifier tableAlias setClause (whereClause)?   
#updateTable
 
 Review comment:
   let's change `tableAlias` to `(AS? strictIdentifier)?`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320604116
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -215,6 +215,7 @@ statement
 | SET .*?  
#setConfiguration
 | RESET
#resetConfiguration
 | DELETE FROM multipartIdentifier tableAlias whereClause   
#deleteFromTable
+| UPDATE multipartIdentifier tableAlias setClause (whereClause)?   
#updateTable
 
 Review comment:
   let's change `tableAlias` to `(AS? strictIdentifier identifierList?)?`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-04 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320603803
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,56 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(maybeRelation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+
+  val (relation, attrsNames, newValues, newCond) = maybeRelation match {
+case d: DataSourceV2Relation =>
+  (d, attrs.map(_.name), values, condition)
+case Project(aliasList: Seq[Alias], r: DataSourceV2Relation) =>
 
 Review comment:
   Let's only keep table alias. We can change the parser, replace `tableAlias` 
with `(AS? strictIdentifier)?`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-03 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320535543
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,56 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(maybeRelation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new AnalysisException(s"Update only support non-nested fields. 
Nested: $nested")
+  }
+
+  val (relation, attrsNames, newValues, newCond) = maybeRelation match {
+case d: DataSourceV2Relation =>
+  (d, attrs.map(_.name), values, condition)
+case Project(aliasList: Seq[Alias], r: DataSourceV2Relation) =>
 
 Review comment:
   This seems like too much code. Can you check with other databases and see if 
they support column alias as well? If most of them don't support it, we can 
throw exception in the parser if column alias is used.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-03 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320533863
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,15 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Attribute],
 
 Review comment:
   how about `Seq[NamedExpression]`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-02 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320095937
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ##
 @@ -574,6 +574,15 @@ case class DeleteFromTable(
   override def children: Seq[LogicalPlan] = child :: Nil
 }
 
+case class UpdateTable(
+child: LogicalPlan,
+attrs: Seq[Attribute],
 
 Review comment:
   can we really use `Seq[Attribute]`? When Spark resolves it to nested field, 
it will be `Alias` which is not an `Attribute`, and we will get weird errors.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-02 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320095359
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1767,6 +1767,125 @@ class DataSourceV2SQLSuite
 }
   }
 
+  test("Update: basic") {
+val t = "testcat.ns1.ns2.tbl"
+withTable(t) {
+  sql(s"CREATE TABLE $t (id bigint, name string, age int, p int)" +
 
 Review comment:
   nit: we can use multiline string, e.g.
   ```
   sql(
 s"""
  |xxx
 """.stripMargin)
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-02 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320095086
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
 
 Review comment:
   why do we need the `.asInstanceOf[Seq[Any]]`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-02 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320095143
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +246,28 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val nested = 
attrs.asInstanceOf[Seq[Any]].filterNot(_.isInstanceOf[AttributeReference])
+  if (nested.nonEmpty) {
+throw new RuntimeException(s"Update only support non-nested fields. 
Nested: $nested")
 
 Review comment:
   I'd prefer AnalysisException


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-09-02 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r320093967
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -352,6 +352,31 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 DeleteFromStatement(tableId, tableAlias, 
expression(ctx.whereClause().booleanExpression()))
   }
 
+  override def visitUpdateTable(ctx: UpdateTableContext): LogicalPlan = 
withOrigin(ctx) {
+val tableId = visitMultipartIdentifier(ctx.multipartIdentifier)
+val tableAlias = if (ctx.tableAlias() != null) {
+  val ident = ctx.tableAlias().strictIdentifier()
+  if (ident != null) { Some(ident.getText) } else { None }
+} else {
+  None
+}
+val sets = ctx.setClause().assign().asScala.map {
+  kv => visitMultipartIdentifier(kv.key) -> expression(kv.value)
+}.toMap
 
 Review comment:
   instead of `toMap` here and get keys/values later, how about
   ```
   val (attrs, values) = ctx.setClause().unzip()
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add UPDATE support for DataSource V2

2019-08-30 Thread GitBox

cloud-fan commented on a change in pull request #25626: [SPARK-28892][SQL] Add 
UPDATE support for DataSource V2
URL: https://github.com/apache/spark/pull/25626#discussion_r319385359
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ##
 @@ -245,6 +245,22 @@ object DataSourceV2Strategy extends Strategy with 
PredicateHelper {
   }.toArray
   DeleteFromTableExec(r.table.asDeletable, filters) :: Nil
 
+case UpdateTable(r: DataSourceV2Relation, attrs, values, condition) =>
+  val attrsNames = attrs.map(_.name)
 
 Review comment:
   shall we fail if some `attrs` are resolved to a nested field?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

45 matches

Mail list logo