[GitHub] [spark] viirya commented on pull request #29950: [SPARK-32945][SQL] Avoid collapsing projects if reaching max allowed common exprs
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-739698601 I recently generalized the subexpression elimination feature to interpreted Project and Predicate. So now both whole-stage codegen and interpreted execution support subexpression elimination, which avoids the performance issue caused by embedding common expressions when collapsing projects. That said, I think this patch is less useful for now. I'm closing it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-712378966 @maropu @dongjoon-hyun Any more comments? cc @cloud-fan
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-712378514 retest this please
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-708683946 retest this please
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-70782 cc @cloud-fan
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-707851438 retest this please
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-704566080 cc @cloud-fan @dongjoon-hyun too
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-704561415

> Perhaps the max number of common expressions is not the best metric here?
>
> Let's compare two cases:
>
> 1. On the lower Project you have a `JsonToStructs` and on the upper Project you get 3 fields from that struct. This would mean 2 redundant computations, and the "metric" you are looking at is 3.
>
> 2. On the lower Project you have two `JsonToStructs` and on the upper Project you get 2 fields from both structs. This would also mean 2 redundant computations, and the "metric" you are looking at is 2.
>
> Adding more `JsonToStructs` to the lower level would increase the number of redundant computations without increasing the max value. So as an alternative I would propose "the number of redundant computations" (sum of values in the `exprMap` minus its size) as the metric to use.
>
> Although I must admit that in that case we might cache more values for the number of extra computations we save. So both of them have their benefits.

Yes, in that case you add more redundant computations each time you add one more `JsonToStructs`. But overall it should not cause a noticeable performance issue, because you merely triple the running cost of each `JsonToStructs` (assuming each `JsonToStructs` has 2 redundant computations). The number of redundant computations is misleading: if we have 100 `JsonToStructs` in the lower Project, each with 2 redundant computations in the upper Project, it doesn't mean we incur 100 times the running cost of a `JsonToStructs`. In other words, it is hard to tell the performance difference between 10 and 20 redundant computations. If the redundant computations come from the same expression, then we have 10 times vs. 20 times the running cost; but if they come from 10 different expressions, we might have only 2 to 3 times the running cost.
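The difference between the two candidate metrics in the exchange above can be sketched schematically. This is not Spark code; it just models the lower Project's expressions as a hypothetical map from expression name to the number of times the upper Project references it, and computes both the "max common expressions" metric and the "redundant computations" metric (sum of the map's values minus its size) on the two cases quoted above.

```python
# Schematic illustration (not Spark/Catalyst code) of the two metrics
# discussed in the thread. `ref_counts` is a hypothetical stand-in for
# the `exprMap`: expression -> number of references from the upper Project.

def max_common_refs(ref_counts):
    # Metric in the patch: the highest reference count of any one expression.
    return max(ref_counts.values(), default=0)

def redundant_computations(ref_counts):
    # Alternative metric: sum of values minus map size, i.e. the total
    # extra evaluations beyond the first evaluation of each expression.
    return sum(ref_counts.values()) - len(ref_counts)

# Case 1: one JsonToStructs, three fields extracted from it upstream.
case1 = {"json_to_structs_a": 3}

# Case 2: two JsonToStructs, two fields extracted from each.
case2 = {"json_to_structs_a": 2, "json_to_structs_b": 2}

print(max_common_refs(case1), redundant_computations(case1))  # 3 2
print(max_common_refs(case2), redundant_computations(case2))  # 2 2
```

Both cases have 2 redundant computations, but only the max-based metric distinguishes them, which is the crux of the disagreement: the redundant-computations count grows with every added `JsonToStructs` even when no single expression is re-evaluated more often.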
viirya commented on pull request #29950: URL: https://github.com/apache/spark/pull/29950#issuecomment-704038432 > Related to #29094 ? No, after doing a quick scan of that PR, I don't think so. That PR targets driver OOM caused by too many leaf expressions in a collapsed Project, while this diff addresses duplicated common expressions in a collapsed Project. Different problems, I think.