[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r552672436 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +670,23 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } -// Create the transform. +val plan = visitCommonSelectQueryClausePlan( + relation, + lateralView, + transformClause.namedExpressionSeq, + whereClause, + aggregationClause, + havingClause, + windowClause, + isDistinct = false) + +// Create the transform. Here we pass UnresolvedStar as ScriptTransformation's input. +// In the analyzer, after the child plan is resolved, we resolve UnresolvedStar to the child's output. ScriptTransformation( - expressions, + Seq(UnresolvedStar(None)), Review comment: > hm, on second thought, can't we remove this param `input` from `ScriptTransformation` in this PR? Since the input exprs of the current `ScriptTransformation` implementation always come from the child's output, IIUC we don't need this param anymore? It looks like this. We can replace `input` with `child.output` directly. That's a really nice suggestion, since the current way (converting the LogicalPlan) looks a little weird. If we remove this `input` parameter and add a correct comment, the whole process looks more natural. I have tried it locally, and it produces a big new diff. How about a new ticket to refactor this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
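The parsing change under discussion can be pictured with a stripped-down model (the case classes below are illustrative stand-ins, not Spark's real logical plan nodes): the common SELECT clauses (WHERE / GROUP BY / HAVING / WINDOW / LATERAL VIEW) are folded into a child plan first, and the transform is then wrapped around it with a single star as its input, to be resolved against the child's output later.

```scala
// Illustrative-only sketch (these case classes are simplified stand-ins,
// not Spark's actual Catalyst plan nodes).
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class Filter(cond: String, child: Plan) extends Plan
final case class Aggregate(groupBy: Seq[String], exprs: Seq[String], child: Plan) extends Plan
final case class ScriptTransformation(input: Seq[String], script: String, child: Plan) extends Plan

// Step 1: fold the common SELECT clauses into a child plan first,
// as visitCommonSelectQueryClausePlan does.
val childPlan: Plan =
  Aggregate(Seq("a"), Seq("a", "sum(b)"), Filter("a <= 4", Relation("script_trans")))

// Step 2: wrap the transform around that child, with a single star as its
// input; the analyzer later resolves the star against the child's output.
val transformed = ScriptTransformation(Seq("*"), "cat", childPlan)
assert(transformed.child == childPlan)
```

Building the child plan first is what lets TRANSFORM reuse the same clause handling as an ordinary SELECT, which is the point of this PR.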
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551973037 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -301,6 +302,185 @@ class SparkSqlParserSuite extends AnalysisTest { List.empty, List.empty, None, None, false))) } + test("SPARK-28227: script transform with row format delimit with aggregation") { +assertEqual( + """ +|SELECT TRANSFORM(a, sum(b), max(c)) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'null' +| USING 'cat' AS (a, b, c) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'NULL' +|FROM testData +|GROUP BY a +|HAVING sum(b) > 10 + """.stripMargin, + ScriptTransformation( +Seq(UnresolvedStar(None)), +"cat", +Seq(AttributeReference("a", StringType)(), + AttributeReference("b", StringType)(), + AttributeReference("c", StringType)()), +UnresolvedHaving( + GreaterThan( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), +Literal(10)), + Aggregate( +Seq('a), +Seq( + 'a, + UnresolvedAlias( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), None), + UnresolvedAlias( +UnresolvedFunction("max", Seq(UnresolvedAttribute("c")), isDistinct = false), None) +), +UnresolvedRelation(TableIdentifier("testData", +ScriptInputOutputSchema( + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "null"), +("TOK_TABLEROWFORMATLINES", "\n")), + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "NULL"), +("TOK_TABLEROWFORMATLINES", "\n")), None, None, Review comment: > I feel this part is 
redundant, so could you pull out this part as a shared variable? How about the current change? I merged the two UTs.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551959690 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,126 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM(b, a, CAST(c AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2, 3) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) + USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM( + b AS d5, a, + CASE +WHEN c > 100 THEN 1 +WHEN c < 100 THEN 2 + ELSE 3 END) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b, a, c + 1) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(*) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; Review comment: > I just want to know why these tests are added in this PR... That's because these tests seem to be unrelated to aggregation/window/lateralView. Yea, here I want to show that after this PR's change, each kind of expression works well, such as `alias`, `case when`, binary computation, etc.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551771493 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -692,13 +712,41 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg havingClause: HavingClauseContext, windowClause: WindowClauseContext, relation: LogicalPlan): LogicalPlan = withOrigin(ctx) { +val isDistinct = selectClause.setQuantifier() != null && + selectClause.setQuantifier().DISTINCT() != null + +// Visit common project Review comment: Just remove this
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551771170 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +670,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } +val plan = visitCommonSelectQueryClausePlan(relation, + lateralView, + transformClause.namedExpressionSeq, + whereClause, + aggregationClause, + havingClause, + windowClause, + isDistinct = false) + // Create the transform. ScriptTransformation( - expressions, + Seq(UnresolvedStar(None)), Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551771073 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,126 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM(b, a, CAST(c AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2, 3) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) + USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM( + b AS d5, a, + CASE +WHEN c > 100 THEN 1 +WHEN c < 100 THEN 2 + ELSE 3 END) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b, a, c + 1) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(*) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(sum(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max(a) > 1; + +SELECT TRANSFORM(b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +WINDOW w AS (PARTITION BY b ORDER BY a); + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING), myCol, myCol2) + USING 'cat' AS (a, b, c, d, e) +FROM 
script_trans +LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol +LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 +WHERE a <= 4 +GROUP BY b, myCol, myCol2 +HAVING max(a) > 1; + + Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551770899 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,126 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM(b, a, CAST(c AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2, 3) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) + USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM( + b AS d5, a, + CASE +WHEN c > 100 THEN 1 +WHEN c < 100 THEN 2 + ELSE 3 END) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b, a, c + 1) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(*) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(sum(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max(a) > 1; + +SELECT TRANSFORM(b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +WINDOW w AS (PARTITION BY b ORDER BY a); + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING), myCol, myCol2) + USING 'cat' AS (a, b, c, d, e) +FROM 
script_trans +LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol +LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 +WHERE a <= 4 +GROUP BY b, myCol, myCol2 +HAVING max(a) > 1; + + +FROM( + FROM script_trans + SELECT TRANSFORM(a, b) +USING 'cat' AS (`a` INT, b STRING) +) t +SELECT a + 1; + +FROM( + SELECT TRANSFORM(a, SUM(b) b) +USING 'cat' AS (`a` INT, b STRING) + FROM script_trans + GROUP BY a +) t +SELECT (b + 1) AS result +ORDER BY result; + +MAP k / 10 USING 'cat' AS (one) FROM (SELECT 10 AS k); + +FROM (SELECT 1 AS key, 100 AS value) src +MAP src.*, src.key, CAST(src.key / 10 AS INT), CAST(src.key % 10 AS INT), src.value + USING 'cat' AS (k, v, tkey, ten, one, tvalue); + +SELECT TRANSFORM(1) + USING 'cat' AS (a) +FROM script_trans +HAVING true; + +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true; + +SELECT TRANSFORM(1) + USING 'cat' AS (a) +FROM script_trans +HAVING true; + +SET spark.sql.parser.quotedRegexColumnNames=true; + +SELECT TRANSFORM(`(a|b)?+.+`) + USING 'cat' AS (c) +FROM script_trans; Review comment: Done ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,126 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM(b, a, CAST(c AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2, 3) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) + USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM( + b AS d5, a, + CASE +WHEN c > 100 THEN 1 +WHEN c < 100 THEN 2 + ELSE 3 END) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b, a, c + 1) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(*) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans
+WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(sum(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max(a) > 1; + +SELECT TRANSFORM(b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +WINDOW w AS (PARTITION BY b ORDER BY a); + +SELECT TRANSFORM(b, MAX(a) as max_a,
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551770515 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -301,6 +302,187 @@ class SparkSqlParserSuite extends AnalysisTest { List.empty, List.empty, None, None, false))) } + test("SPARK-28227: script transform with row format delimit with aggregation") { +assertEqual( + """ +|SELECT TRANSFORM(a, sum(b), max(c)) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'null' +| USING 'cat' AS (a, b, c) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'NULL' +|FROM testData +|GROUP BY a +|HAVING sum(b) > 10 + """.stripMargin, + ScriptTransformation( +Seq(UnresolvedStar(None)), +"cat", +Seq(AttributeReference("a", StringType)(), + AttributeReference("b", StringType)(), + AttributeReference("c", StringType)()), +UnresolvedHaving( + GreaterThan( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), +Literal(10)), + Aggregate( +Seq('a), +Seq( + 'a, + UnresolvedAlias( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), None), + UnresolvedAlias( +UnresolvedFunction("max", Seq(UnresolvedAttribute("c")), isDistinct = false), None) +), +UnresolvedRelation(TableIdentifier("testData", +ScriptInputOutputSchema( + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "null"), +("TOK_TABLEROWFORMATLINES", "\n")), + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "NULL"), +("TOK_TABLEROWFORMATLINES", "\n")), None, None, + List.empty, List.empty, None, None, 
false))) + + Review comment: Done ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -301,6 +302,187 @@ class SparkSqlParserSuite extends AnalysisTest { List.empty, List.empty, None, None, false))) } + test("SPARK-28227: script transform with row format delimit with aggregation") { +assertEqual( + """ +|SELECT TRANSFORM(a, sum(b), max(c)) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'null' +| USING 'cat' AS (a, b, c) +| ROW FORMAT DELIMITED +| FIELDS TERMINATED BY ',' +| COLLECTION ITEMS TERMINATED BY '#' +| MAP KEYS TERMINATED BY '@' +| LINES TERMINATED BY '\n' +| NULL DEFINED AS 'NULL' +|FROM testData +|GROUP BY a +|HAVING sum(b) > 10 + """.stripMargin, + ScriptTransformation( +Seq(UnresolvedStar(None)), +"cat", +Seq(AttributeReference("a", StringType)(), + AttributeReference("b", StringType)(), + AttributeReference("c", StringType)()), +UnresolvedHaving( + GreaterThan( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), +Literal(10)), + Aggregate( +Seq('a), +Seq( + 'a, + UnresolvedAlias( +UnresolvedFunction("sum", Seq(UnresolvedAttribute("b")), isDistinct = false), None), + UnresolvedAlias( +UnresolvedFunction("max", Seq(UnresolvedAttribute("c")), isDistinct = false), None) +), +UnresolvedRelation(TableIdentifier("testData", +ScriptInputOutputSchema( + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "null"), +("TOK_TABLEROWFORMATLINES", "\n")), + Seq(("TOK_TABLEROWFORMATFIELD", ","), +("TOK_TABLEROWFORMATCOLLITEMS", "#"), +("TOK_TABLEROWFORMATMAPKEYS", "@"), +("TOK_TABLEROWFORMATNULL", "NULL"), +("TOK_TABLEROWFORMATLINES", "\n")), None, None, + List.empty,
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551769365 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,126 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM(b, a, CAST(c AS STRING)) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2, 3) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) + USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM( + b AS d5, a, + CASE +WHEN c > 100 THEN 1 +WHEN c < 100 THEN 2 + ELSE 3 END) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(b, a, c + 1) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(*) + USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; Review comment: > These queries above are not supported in the current master? Supported; what's wrong? It has a correct answer.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551711403 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -301,6 +302,61 @@ class SparkSqlParserSuite extends AnalysisTest { List.empty, List.empty, None, None, false))) } + test("SPARK-28227: script transform with row format delimit with aggregation") { Review comment: > Please add tests for window/lateral view cases. Added ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +670,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } +val plan = visitCommonSelectQueryClausePlan(relation, + lateralView, + transformClause.namedExpressionSeq, + whereClause, + aggregationClause, + havingClause, + windowClause, + isDistinct = false) Review comment: > I think we don't need to support it in this PR and it's okay to just follow the Hive behaviour. Yea
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551677043 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +670,21 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } +val plan = visitCommonSelectQueryClausePlan(relation, + lateralView, + transformClause.namedExpressionSeq, + whereClause, + aggregationClause, + havingClause, + windowClause, + isDistinct = false) Review comment: > In Hive, one cannot use `distinct` for TRANSFORM? e.g., `SELECT TRANSFORM(distinct a, b)`? Not supported. Do we need to support this? ``` hive> > > select transform(distinct a) using 'cat' from (select 1 as a) temp; NoViableAltException(96@[104:1: selectExpression : ( ( tableAllColumns )=> tableAllColumns | expression );]) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser$DFA19.specialStateTransition(HiveParser_SelectClauseParser.java:5380) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectExpression(HiveParser_SelectClauseParser.java:2215) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectExpressionList(HiveParser_SelectClauseParser.java:2297) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectTrfmClause(HiveParser_SelectClauseParser.java:1299) at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:968) at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:41964) at org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:36720) at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:36987) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36633) at
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77) at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) FAILED: ParseException line 1:17 cannot recognize input near 'distinct' 'a' ')' in select expression ```
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551675638 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1519,12 +1519,7 @@ class Analyzer(override val catalogManager: CatalogManager) } // If the script transformation input contains Stars, expand it. Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551672696 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1519,12 +1519,7 @@ class Analyzer(override val catalogManager: CatalogManager) } // If the script transformation input contains Stars, expand it. case t: ScriptTransformation if containsStar(t.input) => -t.copy( - input = t.input.flatMap { -case s: Star => s.expand(t.child, resolver) -case o => o :: Nil - } -) +t.copy(input = t.child.output) Review comment: Done, can be supported.
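The simplification applied in this Analyzer diff can be sketched with a toy model (the types and helpers below, e.g. `ScriptTransform`, `expandStars`, are hypothetical stand-ins, not Spark's actual API): since the parser now always emits a single star as the transform's input, expanding stars against the child and taking `child.output` directly are equivalent.

```scala
// Minimal model of the analyzer rule shown in the diff above
// (illustrative names only, not Spark's actual classes).
sealed trait Expr
case object Star extends Expr                    // plays the role of UnresolvedStar
final case class Attr(name: String) extends Expr // plays the role of AttributeReference

final case class Child(output: Seq[Attr])
final case class ScriptTransform(input: Seq[Expr], child: Child)

// Old rule: expand each star into the child's output, keep other exprs.
def expandStars(t: ScriptTransform): ScriptTransform =
  t.copy(input = t.input.flatMap {
    case Star => t.child.output
    case e    => e :: Nil
  })

// New rule after this change: the input is always the whole child output.
def useChildOutput(t: ScriptTransform): ScriptTransform =
  t.copy(input = t.child.output)

val t = ScriptTransform(Seq(Star), Child(Seq(Attr("a"), Attr("b"))))
// When the input is a single star (as the parser now always emits),
// the two rules coincide.
assert(expandStars(t) == useChildOutput(t))
```

This equivalence is also why the reviewers suggest the `input` parameter could be dropped entirely in a follow-up refactor.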
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551670573 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,114 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1, 2, 3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) +USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b, a, c + 1) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (*) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(sum(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max(a) > 1; + +SELECT TRANSFORM (b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +WINDOW w AS (PARTITION BY b ORDER BY a); + +set spark.sql.legacy.bucketedTableScan.outputOrdering=true; + +SELECT TRANSFORM (MAX(a) as max_a, CAST(SUM(c) AS STRING)) 
+USING 'cat' AS (a, b) +FROM script_trans +WHERE a <= 4 +HAVING max(a) > 1; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING), myCol, myCol2) +USING 'cat' AS (a, b, c, d, e) +FROM script_trans +LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol +LATERAL VIEW explode(myTable.myCol) myTable2 AS myCol2 +WHERE a <= 4 +GROUP BY b, myCol, myCol2 +HAVING max(a) > 1; + + +FROM +(FROM script_trans SELECT TRANSFORM(a, b) USING 'cat' AS (`a` INT, b STRING)) t +SELECT a + 1; + +FROM +(SELECT TRANSFORM(a, SUM(b) b) +USING 'cat' AS (`a` INT, b STRING) Review comment: Done ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,114 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING)) Review comment: Done ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,114 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1, 2, 3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) +USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, Review comment: Done ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,114 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1, 2, 3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) +USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a, b, 
c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b, a, c + 1) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (*) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(sum(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM (b, MAX(a) as max_a,
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551668776 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,114 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1, 2, 3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1, 2) +USING 'cat' AS (a INT, b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b, a, c + 1) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (*) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(sum(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 2 +GROUP BY b; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max_a > 0; + +SELECT TRANSFORM (b, MAX(a) as max_a, CAST(SUM(c) AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +GROUP BY b +HAVING max(a) > 1; + +SELECT TRANSFORM (b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING)) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4 +WINDOW w AS (PARTITION BY b ORDER BY a); + +set spark.sql.legacy.bucketedTableScan.outputOrdering=true; Review comment: test for this 
https://github.com/apache/spark/pull/29087#discussion_r551342009 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551668632 ## File path: sql/core/src/test/resources/sql-tests/inputs/pivot.sql ## @@ -46,7 +46,7 @@ SELECT * FROM ( SELECT course, earnings FROM courseSales ) PIVOT ( - sum(earnings) + sum(earnings)LATERAL VIEW Review comment: Mistake, reverted.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation/window/lateralView
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551668237 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +669,39 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } +val namedExpressions = expressions.map { + case e: NamedExpression => e + case e: Expression => UnresolvedAlias(e) +} + +def createProject() = if (namedExpressions.nonEmpty) { + Project(namedExpressions, withFilter) +} else { + withFilter +} + +val withProject = if (aggregationClause == null && havingClause != null) { + if (conf.getConf(SQLConf.LEGACY_HAVING_WITHOUT_GROUP_BY_AS_WHERE)) { +// If the legacy conf is set, treat HAVING without GROUP BY as WHERE. +withHavingClause(havingClause, createProject()) + } else { +// According to SQL standard, HAVING without GROUP BY means global aggregate. +withHavingClause(havingClause, Aggregate(Nil, namedExpressions, withFilter)) + } +} else if (aggregationClause != null) { + val aggregate = withAggregationClause(aggregationClause, namedExpressions, withFilter) + aggregate.optionalMap(havingClause)(withHavingClause) +} else { + // When hitting this branch, `having` must be null. + createProject() +} Review comment: > Is this part just copied from `withSelectQuerySpecification`? If so, could you share it between `withSelectQuerySpecification` and `withTransformQuerySpecification`? Done
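The `namedExpressions` mapping in the quoted diff can be illustrated with a small model. This is a minimal Python sketch with illustrative stand-in classes, not Spark's actual Catalyst types: expressions that are already named pass through unchanged, while anything else is wrapped so the analyzer can assign it a name later.

```python
from dataclasses import dataclass

# Illustrative stand-ins for Catalyst expression types (assumed simplified names,
# not Spark's real implementation).
@dataclass(frozen=True)
class Literal:
    value: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class AttributeReference:  # an already-named expression
    name: str

@dataclass(frozen=True)
class UnresolvedAlias:
    child: object

NAMED_TYPES = (AttributeReference,)

def to_named(expressions):
    # Mirror of the `namedExpressions` mapping: named expressions pass through,
    # bare expressions are wrapped in UnresolvedAlias for the analyzer to name.
    return [e if isinstance(e, NAMED_TYPES) else UnresolvedAlias(e)
            for e in expressions]

exprs = [AttributeReference("b"), Add(Literal(1), Literal(2))]
named = to_named(exprs)
# named == [AttributeReference("b"), UnresolvedAlias(Add(Literal(1), Literal(2)))]
```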
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551391778 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1,2) +USING 'cat' AS (a INT,b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b, a, c + 1 ) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (*) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)) Review comment: > In hive, one can use an aggregate filter, e.g., `SELECT TRANSFORM (b AS d, MAX(a) FILTER (WHERE a > 3) as max_a`? Could you add tests for it? Added a test case.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551342009 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -659,12 +669,39 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg AttributeReference("value", StringType)()), true) } +val namedExpressions = expressions.map { + case e: NamedExpression => e + case e: Expression => UnresolvedAlias(e) +} + +def createProject() = if (namedExpressions.nonEmpty) { + Project(namedExpressions, withFilter) +} else { + withFilter +} + +val withProject = if (aggregationClause == null && havingClause != null) { + if (conf.getConf(SQLConf.LEGACY_HAVING_WITHOUT_GROUP_BY_AS_WHERE)) { Review comment: > Since this is a new feature, we don't need to follow this config here? We can follow this, why not? But I need to add a test case for it.
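The two semantics behind the legacy conf discussed above can be summarized with a small model. This is a minimal Python sketch with assumed simplified plan nodes, not Spark's real classes: with the legacy flag on, HAVING without GROUP BY behaves like WHERE; otherwise it implies a global aggregate, per the SQL standard.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed simplified logical-plan nodes for illustration only.
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]  # empty tuple == global aggregate
    child: object

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

def plan_having(having_cond, legacy_having_as_where, child):
    if legacy_having_as_where:
        # Legacy conf on: HAVING without GROUP BY is treated as WHERE.
        return Filter(having_cond, child)
    # SQL standard: HAVING without GROUP BY means a global aggregate.
    return Filter(having_cond, Aggregate((), child))

legacy = plan_having("max_a > 1", True, Relation("script_trans"))
standard = plan_having("max_a > 1", False, Relation("script_trans"))
# legacy   == Filter("max_a > 1", Relation("script_trans"))
# standard == Filter("max_a > 1", Aggregate((), Relation("script_trans")))
```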
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551327184 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1,2) +USING 'cat' AS (a INT,b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a,b,c) Review comment: Done ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1,2) Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551326940 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ## @@ -1041,11 +1041,11 @@ class PlanParserSuite extends AnalysisTest { |FROM testData """.stripMargin, ScriptTransformation( -Seq('a, 'b, 'c), +Seq(UnresolvedStar(None)), Review comment: > Why do we need to update this? At the parser level we now pass an `UnresolvedStar` in place of the child's output, and resolve it into the child's output at the analyzer level; this is what makes TRANSFORM work with aggregation.
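The parse-then-resolve flow described above can be illustrated with a toy model. The class names echo Catalyst's, but this is a simplified Python stand-in, not Spark's actual implementation: the parser emits `UnresolvedStar` as the transform's input, and an analyzer-style rule later expands it into the resolved child's output.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Illustrative stand-ins only; names echo Catalyst's but are not Spark's real API.
@dataclass(frozen=True)
class AttributeReference:
    name: str

class UnresolvedStar:
    pass

UNRESOLVED_STAR = UnresolvedStar()

@dataclass(frozen=True)
class Project:
    output: Tuple[AttributeReference, ...]

@dataclass(frozen=True)
class ScriptTransformation:
    input: Tuple[object, ...]
    script: str
    child: Project

def resolve_star_input(st):
    # Analyzer-style rule: once the child plan is resolved, expand the star
    # into the child's output attributes.
    resolved = []
    for e in st.input:
        if isinstance(e, UnresolvedStar):
            resolved.extend(st.child.output)
        else:
            resolved.append(e)
    return replace(st, input=tuple(resolved))

child = Project((AttributeReference("a"), AttributeReference("b")))
parsed = ScriptTransformation((UNRESOLVED_STAR,), "cat", child)
resolved = resolve_star_input(parsed)
# resolved.input == (AttributeReference("a"), AttributeReference("b"))
```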
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551326208 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1,2) +USING 'cat' AS (a INT,b INT) +FROM script_trans +LIMIT 1; + +SELECT TRANSFORM ( +b AS d5, a, +CASE + WHEN c > 100 THEN 1 + WHEN c < 100 THEN 2 +ELSE 3 END) +USING 'cat' AS (a,b,c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (b, a, c + 1 ) Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551326110 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM(1,2) +USING 'cat' AS (a INT,b INT) Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551325918 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) +USING 'cat' AS (a, b, c) +FROM script_trans +WHERE a <= 4; + +SELECT TRANSFORM (1,2,3) Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551325708 ## File path: sql/core/src/test/resources/sql-tests/inputs/transform.sql ## @@ -183,3 +189,84 @@ SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM ( NULL DEFINED AS 'NULL' FROM t ) tmp; + +SELECT TRANSFORM (b, a, CAST(c AS STRING) ) Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551325000 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -582,7 +584,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg val from = OneRowRelation().optional(ctx.fromClause) { visitFromClause(ctx.fromClause) } -withTransformQuerySpecification(ctx, ctx.transformClause, ctx.whereClause, from) +withTransformQuerySpecification( + ctx, + ctx.transformClause, + ctx.whereClause, + ctx.aggregationClause, + ctx.havingClause, from Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r551323515 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -506,7 +506,9 @@ fromStatementBody querySpecification : transformClause fromClause? - whereClause? #transformQuerySpecification + whereClause? + aggregationClause? + havingClause? #transformQuerySpecification Review comment: Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r547288935 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ## @@ -2558,6 +2558,131 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } } } + + test("SPARK-28227: test script transform with aggregation") { Review comment: @maropu The UTs have been moved to `transform.sql`.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r454102278 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -496,7 +496,9 @@ fromStatementBody querySpecification : transformClause fromClause? - whereClause? #transformQuerySpecification + whereClause? + aggregationClause? + havingClause? #transformQuerySpecification Review comment: > Could you update the SQL doc, too? Can we add this after everything else is done? We would need to add a new page like the one for the `WHERE` clause, and add a lot of cross-references.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r454045113 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -496,7 +496,9 @@ fromStatementBody querySpecification : transformClause fromClause? - whereClause? #transformQuerySpecification + whereClause? + aggregationClause? + havingClause? #transformQuerySpecification Review comment: > Could you update the SQL doc, too? Yea.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation
AngersZh commented on a change in pull request #29087: URL: https://github.com/apache/spark/pull/29087#discussion_r454101882 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ## @@ -2558,6 +2558,131 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } } } + + test("SPARK-28227: test script transform with aggregation") { Review comment: > Could you move the tests into `SQLQueryTestSuite`? This should wait for https://github.com/apache/spark/pull/29085, since currently we can't use script transform in sql/core.