[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752870#comment-17752870
 ] 

Franck Tago commented on SPARK-44759:
-

Spark Dag for the use case . The failure is from the execution of 
WholeStageCodeGen(2)

!image-2023-08-10-09-33-47-788.png!

> Do not combine multiple Generate nodes in the same WholeStageCodeGen 
> nodebecause it can  easily cause OOM failures if arrays are relatively large
> -
>
> Key: SPARK-44759
> URL: https://issues.apache.org/jira/browse/SPARK-44759
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Franck Tago
>Priority: Major
> Attachments: image-2023-08-10-09-27-24-124.png, 
> image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, 
> image-2023-08-10-09-33-47-788.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> The generate node used to flatten array generally  produces an amount of 
> output rows that is significantly higher than the input rows to the node. 
> the number of output rows generated is even drastically higher when 
> flattening a nested array .
> When we combine more that 1 generate node in the same WholeStageCodeGen  
> node, we run  a high risk of running out of memory for multiple reasons. 
> 1- As you can see from the attachment ,  the rows created in the nested loop 
> are saved in writer buffer.  In my case because the rows were big , I hit an 
> Out Of Memory Exception error .2_ The generated WholeStageCodeGen result in a 
> nested loop that for each row  , will explode the parent array and then 
> explode the inner array.  This is prone to OutOfmerry errors
>  
> Please view the attached Spark Gui and Spark Dag 
> In my case the wholestagecodegen includes 2 explode nodes. 
> Because the array elements are large , we end up with an Out Of Memory error. 
>  
> I recommend that we do not merge  multiple explode nodes in the same whoe 
> stage code gen node . Doing so leads to potential memory issues.
>  
> !image-2023-08-10-02-18-24-758.png!
>  
> !image-2023-08-10-09-19-23-973.png!
>  
> !image-2023-08-10-09-21-32-371.png!
>  
> WSCG method for first Generate node
> !image-2023-08-10-09-22-29-949.png!
>  
> WSCG for second Generate node 
> As you can see the execution of generate_doConsume_1 and generate_doConsume_0 
>  triggers a nested loop.
> !image-2023-08-10-09-24-02-755.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752868#comment-17752868
 ] 

Franck Tago commented on SPARK-44759:
-

WSCG  generated code for second Generate node 

!image-2023-08-10-09-32-46-163.png!

> Do not combine multiple Generate nodes in the same WholeStageCodeGen 
> nodebecause it can  easily cause OOM failures if arrays are relatively large
> -
>
> Key: SPARK-44759
> URL: https://issues.apache.org/jira/browse/SPARK-44759
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Franck Tago
>Priority: Major
> Attachments: image-2023-08-10-09-27-24-124.png, 
> image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> The generate node used to flatten array generally  produces an amount of 
> output rows that is significantly higher than the input rows to the node. 
> the number of output rows generated is even drastically higher when 
> flattening a nested array .
> When we combine more that 1 generate node in the same WholeStageCodeGen  
> node, we run  a high risk of running out of memory for multiple reasons. 
> 1- As you can see from the attachment ,  the rows created in the nested loop 
> are saved in writer buffer.  In my case because the rows were big , I hit an 
> Out Of Memory Exception error .2_ The generated WholeStageCodeGen result in a 
> nested loop that for each row  , will explode the parent array and then 
> explode the inner array.  This is prone to OutOfmerry errors
>  
> Please view the attached Spark Gui and Spark Dag 
> In my case the wholestagecodegen includes 2 explode nodes. 
> Because the array elements are large , we end up with an Out Of Memory error. 
>  
> I recommend that we do not merge  multiple explode nodes in the same whoe 
> stage code gen node . Doing so leads to potential memory issues.
>  
> !image-2023-08-10-02-18-24-758.png!
>  
> !image-2023-08-10-09-19-23-973.png!
>  
> !image-2023-08-10-09-21-32-371.png!
>  
> WSCG method for first Generate node
> !image-2023-08-10-09-22-29-949.png!
>  
> WSCG for second Generate node 
> As you can see the execution of generate_doConsume_1 and generate_doConsume_0 
>  triggers a nested loop.
> !image-2023-08-10-09-24-02-755.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752866#comment-17752866
 ] 

Franck Tago commented on SPARK-44759:
-

WSCG  generated code for first Generate node 

!image-2023-08-10-09-29-24-804.png!

> Do not combine multiple Generate nodes in the same WholeStageCodeGen 
> nodebecause it can  easily cause OOM failures if arrays are relatively large
> -
>
> Key: SPARK-44759
> URL: https://issues.apache.org/jira/browse/SPARK-44759
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Franck Tago
>Priority: Major
> Attachments: image-2023-08-10-09-27-24-124.png, 
> image-2023-08-10-09-29-24-804.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> The generate node used to flatten array generally  produces an amount of 
> output rows that is significantly higher than the input rows to the node. 
> the number of output rows generated is even drastically higher when 
> flattening a nested array .
> When we combine more that 1 generate node in the same WholeStageCodeGen  
> node, we run  a high risk of running out of memory for multiple reasons. 
> 1- As you can see from the attachment ,  the rows created in the nested loop 
> are saved in writer buffer.  In my case because the rows were big , I hit an 
> Out Of Memory Exception error .2_ The generated WholeStageCodeGen result in a 
> nested loop that for each row  , will explode the parent array and then 
> explode the inner array.  This is prone to OutOfmerry errors
>  
> Please view the attached Spark Gui and Spark Dag 
> In my case the wholestagecodegen includes 2 explode nodes. 
> Because the array elements are large , we end up with an Out Of Memory error. 
>  
> I recommend that we do not merge  multiple explode nodes in the same whoe 
> stage code gen node . Doing so leads to potential memory issues.
>  
> !image-2023-08-10-02-18-24-758.png!
>  
> !image-2023-08-10-09-19-23-973.png!
>  
> !image-2023-08-10-09-21-32-371.png!
>  
> WSCG method for first Generate node
> !image-2023-08-10-09-22-29-949.png!
>  
> WSCG for second Generate node 
> As you can see the execution of generate_doConsume_1 and generate_doConsume_0 
>  triggers a nested loop.
> !image-2023-08-10-09-24-02-755.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752864#comment-17752864
 ] 

Franck Tago commented on SPARK-44759:
-

!image-2023-08-10-09-27-24-124.png!

> Do not combine multiple Generate nodes in the same WholeStageCodeGen 
> nodebecause it can  easily cause OOM failures if arrays are relatively large
> -
>
> Key: SPARK-44759
> URL: https://issues.apache.org/jira/browse/SPARK-44759
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Franck Tago
>Priority: Major
> Attachments: image-2023-08-10-09-27-24-124.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> The generate node used to flatten array generally  produces an amount of 
> output rows that is significantly higher than the input rows to the node. 
> the number of output rows generated is even drastically higher when 
> flattening a nested array .
> When we combine more that 1 generate node in the same WholeStageCodeGen  
> node, we run  a high risk of running out of memory for multiple reasons. 
> 1- As you can see from the attachment ,  the rows created in the nested loop 
> are saved in writer buffer.  In my case because the rows were big , I hit an 
> Out Of Memory Exception error .2_ The generated WholeStageCodeGen result in a 
> nested loop that for each row  , will explode the parent array and then 
> explode the inner array.  This is prone to OutOfmerry errors
>  
> Please view the attached Spark Gui and Spark Dag 
> In my case the wholestagecodegen includes 2 explode nodes. 
> Because the array elements are large , we end up with an Out Of Memory error. 
>  
> I recommend that we do not merge  multiple explode nodes in the same whoe 
> stage code gen node . Doing so leads to potential memory issues.
>  
> !image-2023-08-10-02-18-24-758.png!
>  
> !image-2023-08-10-09-19-23-973.png!
>  
> !image-2023-08-10-09-21-32-371.png!
>  
> WSCG method for first Generate node
> !image-2023-08-10-09-22-29-949.png!
>  
> WSCG for second Generate node 
> As you can see the execution of generate_doConsume_1 and generate_doConsume_0 
>  triggers a nested loop.
> !image-2023-08-10-09-24-02-755.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org