[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-12-06 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-739698601


   I recently generalized subexpression elimination feature to interpreted 
project and predicate. So now both whole-stage codegen and interpreted 
execution support subexpression elimination that could avoid the performance 
issue caused by embedding common expressions from collapsing projects.
   
   That's said, I think this patch is less useful for now. I'm closing it now.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-19 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-712378966


   @maropu @dongjoon-hyun Any more comments?
   
   cc @cloud-fan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-19 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-712378514


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-14 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-708683946


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-13 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-70782


   cc @cloud-fan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-13 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-707851438


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-06 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-704566080


   cc @cloud-fan @dongjoon-hyun too



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-06 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-704561415


   > Perhaps the max number of common expressions is not the best metric here?
   > 
   > Lets compare two cases:
   > 
   > 1. On the lower project you have a `JsonToStructs` and on upper 
Project you get 3 fields from that struct. This would mean 2 redundant 
computations and the "metric" you are looking at is 3.
   > 
   > 2. On the lower project you have two `JsonToStructs` and on upper 
Project you get 2 fields from both stucts. This would also mean 2 redundant 
computations and the "metric" you are looking at is 2.
   > 
   > 
   > Adding more `JsonToStructs` to the lower level would increase the number 
redundant computations without increasing the max value.
   > So as an alternative I would propose "the number of redundant 
computations" (sum of values in the `exprMap` minus its size) as a metric to 
use.
   > 
   > Although I must admit, that in that case we might cache more values for 
the number of extra computations we save.
   > So both of them have their benefits.
   
   Yes, in the case you add the number of redundant computations each time you 
add one more `JsonToStructs`. But overall it should not cause noticeable 
performance issue because you simply three times the running cost of 
`JsonToStructs` (assume each `JsonToStructs` has 2 redundant computations).
   
   The number of redundant computations is misleading. If we have 100 
`JsonToStructs` in lower project, each of them has 2 redundant computations in 
upper project, it doesn't mean we 100 times the running cost of 
`JsonToStructs`. In other words, it is hard to tell the performance difference 
between 10 and 20 redundant computations. If the redundant computations come 
from the same expression, then we have 10 times v.s. 20 times running cost, but 
if they are from 10 expressions? We might just have 2 to 3 times running cost.
   
   
   
   
   
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs

2020-10-05 Thread GitBox


viirya commented on pull request #29950:
URL: https://github.com/apache/spark/pull/29950#issuecomment-704038432


   > Related to #29094 ?
   
   No, after did a quick scan of that PR. That PR targets driver OOM caused by 
too many leaf expressions in collapsed Project. Here this diff cares about 
duplicated common expressions in collapsed Project. Different problems, I think.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org