[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465523230



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,

Review comment:
   I don't know which PR causes this regression problem since there is a 
big gap between 2.4 and 3.0 about CTE. I am not familiar with this part.
   I fixed this by debuging. In 3.0, I found only the `child of CTE` could 
enter into the rule of `ResolveCoalesceHints`
   For the above test case:
   ```
   CTE [cte]
   :  +- 'SubqueryAlias `cte`
   : +- 'UnresolvedHint REPARTITION, [3]
   :+- 'Project [*]
   :   +- 'UnresolvedRelation `t`
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```
   Only this child `Project` branch could enter into the rule of 
`ResolveCoalesceHints`
   ```
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465523230



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,

Review comment:
   I don't know which PR causes this regression problem since there is a 
big gap between 2.4 and 3.0 about CTE. I am not familiar with this part.
   I fixed this by debuging. I found only the `child of CTE` could enter into 
the rule of `ResolveCoalesceHints`
   For the above test case:
   ```
   CTE [cte]
   :  +- 'SubqueryAlias `cte`
   : +- 'UnresolvedHint REPARTITION, [3]
   :+- 'Project [*]
   :   +- 'UnresolvedRelation `t`
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```
   Only this child `Project` branch could enter into the rule of 
`ResolveCoalesceHints`
   ```
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465523230



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,

Review comment:
   I don't know which PR causes this regression problem since there is a 
big gap between 2.4 and 3.0 about CTE. I am not familiar with this part.
   I fixed this by debuging. In 3.0, I found only the `child of CTE` could 
enter into the rule of `ResolveCoalesceHints`. For the above test case:
   ```
   CTE [cte]
   :  +- 'SubqueryAlias `cte`
   : +- 'UnresolvedHint REPARTITION, [3]
   :+- 'Project [*]
   :   +- 'UnresolvedRelation `t`
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```
   Only this child `Project` branch could enter into the rule of 
`ResolveCoalesceHints`
   ```
   +- 'Project [*]
  +- 'UnresolvedRelation `cte`
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465516673



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   }
 }
   }
+
+  test("SPARK-32237: Hint in CTE") {
+withTable("t") {
+  sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id")
+  checkAnswer(
+sql(s"""
+  |WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t)

Review comment:
   Yes. The same issue reported by multiple times in Jira with different 
use cases, like SPARK-32237, SPARK-32347, SPARK-32535, I will file a new PR to 
add more unit tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465516673



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   }
 }
   }
+
+  test("SPARK-32237: Hint in CTE") {
+withTable("t") {
+  sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id")
+  checkAnswer(
+sql(s"""
+  |WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t)

Review comment:
   Yes. I will file a new PR to add more unit tests.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-08-05 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r465516846



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   }
 }
   }
+
+  test("SPARK-32237: Hint in CTE") {

Review comment:
   Sure. I will add a new suite.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-22 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r458529424



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,
+  CTESubstitution,
+  WindowsSubstitution,
+  EliminateUnions,
+  new SubstituteUnresolvedOrdinals(conf)),

Review comment:
   I found the problem also exists in branch-2.4. See 
https://github.com/apache/spark/pull/29062#issuecomment-662233804





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-21 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r458529424



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,
+  CTESubstitution,
+  WindowsSubstitution,
+  EliminateUnions,
+  new SubstituteUnresolvedOrdinals(conf)),

Review comment:
   I found the problem also exists in branch-2.4. See 
https://github.com/apache/spark/pull/29062#issuecomment-662233804





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-12 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r453409787



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -200,18 +200,18 @@ class Analyzer(
   val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil
 
   lazy val batches: Seq[Batch] = Seq(
+Batch("Substitution", fixedPoint,
+  CTESubstitution,
+  WindowsSubstitution,
+  EliminateUnions,
+  new SubstituteUnresolvedOrdinals(conf)),

Review comment:
   It's hard to find which change causing it for me since 2.4 and 3.0 have 
many differences that I don't know.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-09 Thread GitBox


LantaoJin commented on a change in pull request #29062:
URL: https://github.com/apache/spark/pull/29062#discussion_r452626883



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
##
@@ -756,6 +756,19 @@ class DataSourceV2SQLSuite
 }
   }
 
+  test("SPARK-32237: Hint in CTE") {

Review comment:
   Oh, it’s not only for v2. I just use a similar test for modification. I 
will move the test to another suite





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org