[PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


viirya opened a new pull request, #47546:
URL: https://github.com/apache/spark/pull/47546

   
   
   ### What changes were proposed in this pull request?
   
   
   This patch fixes `TransformWithStateExec` so when its `hasInitialState` is 
false, the `initialState` won't be rewritten by planner incorrectly.
   
   ### Why are the changes needed?
   
   
   [SPARK-47363](https://issues.apache.org/jira/browse/SPARK-47363) added the 
support for users to provide initial state for streaming query. Such query 
operators like `TransformWithStateExec` might have `hasInitialState` as false 
which means the initial state related parameters are not used. But when query 
planner applies rules on the query, it will still apply on the initial state 
query plan. When `hasInitialState` is false, some related parameters like 
`initialStateGroupingAttrs` are invalid and some rules will use these invalid 
parameters to transform the initial state query plan.
   For example, `EnsureRequirements` may apply invalid Sort and Exchange on the 
initial query plan. We encountered these invalid query plan in our extension 
rules.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


JeonDaehong commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2259766374

   Hello, I am Mr. Jeon, a developer in Korea.
   I would like to become a Committer for Apache Spark. Could you please give 
me some advice on how to achieve this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


dongjoon-hyun commented on code in PR #47546:
URL: https://github.com/apache/spark/pull/47546#discussion_r1697967206


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala:
##
@@ -1231,6 +1232,46 @@ class TransformWithStateSuite extends 
StateStoreMetricsTest
   }
 }
   }
+
+  test("SPARK-49070: transformWithState - valid initial state plan") {

Review Comment:
   Thank you for adding a test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


dongjoon-hyun commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2259772917

   > Hello, I am Mr. Jeon, a developer in Korea. I would like to become a 
Committer for Apache Spark. Could you please give me some advice on how to 
achieve this?
   
   Hi, @JeonDaehong . Here is the community guide. However, this kind of Q&A is 
irrelevant to this PR. So, please use the official mailing list for the general 
questions, 
[d...@spark.apache.org](https://lists.apache.org/list.html?d...@spark.apache.org)
  .
   - https://spark.apache.org/contributing.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


dongjoon-hyun commented on code in PR #47546:
URL: https://github.com/apache/spark/pull/47546#discussion_r1697967206


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala:
##
@@ -1231,6 +1232,46 @@ class TransformWithStateSuite extends 
StateStoreMetricsTest
   }
 }
   }
+
+  test("SPARK-49070: transformWithState - valid initial state plan") {

Review Comment:
   Thank you for adding a test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


JeonDaehong commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2259780351

   > > 안녕하세요, 저는 한국에 있는 개발자 전씨입니다. Apache Spark의 커미터가 되고 싶습니다. 이것을 달성하는 방법에 대한 
조언을 주시겠습니까?
   > 
   > 안녕하세요. 다음은 커뮤니티 가이드입니다. 그러나 이러한 종류의 Q&A는 이 PR과 관련이 없습니다. 따라서 일반적인 질문은 공식 
메일링 리스트를 사용하십시오 
[d...@spark.apache.org](https://lists.apache.org/list.html?d...@spark.apache.org).
   > 
   > * https://spark.apache.org/contributing.html
   
   Thank you for your help.
   I am very interested in Spark and want to become a committer, but I didn't 
know how, so I sought advice here. I will refer to the links you provided and 
work hard on it.
   Thank you so much. :D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub


viirya commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2259798746

   Thank you @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-31 Thread via GitHub


anishshri-db commented on code in PR #47546:
URL: https://github.com/apache/spark/pull/47546#discussion_r1699048614


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala:
##
@@ -415,8 +415,13 @@ case class FlatMapGroupsWithStateExec(
   override def right: SparkPlan = initialState
 
   override protected def withNewChildrenInternal(
-  newLeft: SparkPlan, newRight: SparkPlan): FlatMapGroupsWithStateExec =
-copy(child = newLeft, initialState = newRight)
+  newLeft: SparkPlan, newRight: SparkPlan): FlatMapGroupsWithStateExec = {
+if (hasInitialState) {

Review Comment:
   Oh nice - so it seems FMGWS was also broken for this case ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-31 Thread via GitHub


viirya commented on code in PR #47546:
URL: https://github.com/apache/spark/pull/47546#discussion_r1699060705


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala:
##
@@ -415,8 +415,13 @@ case class FlatMapGroupsWithStateExec(
   override def right: SparkPlan = initialState
 
   override protected def withNewChildrenInternal(
-  newLeft: SparkPlan, newRight: SparkPlan): FlatMapGroupsWithStateExec =
-copy(child = newLeft, initialState = newRight)
+  newLeft: SparkPlan, newRight: SparkPlan): FlatMapGroupsWithStateExec = {
+if (hasInitialState) {

Review Comment:
   Yea, it is same as TransformWithStateExec.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-08-01 Thread via GitHub


huaxingao closed pull request #47546: [SPARK-49070][SS][SQL] 
TransformWithStateExec.initialState is rewritten incorrectly to produce invalid 
query plan
URL: https://github.com/apache/spark/pull/47546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-08-01 Thread via GitHub


huaxingao commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2263810714

   Merged to master. Thanks @viirya!
   Also thanks @dongjoon-hyun @anishshri-db @jingz-db for reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-08-01 Thread via GitHub


viirya commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2263813007

   Thank you all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-08-01 Thread via GitHub


dongjoon-hyun commented on PR #47546:
URL: https://github.com/apache/spark/pull/47546#issuecomment-2263819434

   Thank you all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org