[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641775258 Okay, let me try to just workaround the problem alone .. it's too invasive .. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935 BTW, I skimmed `Aggregator` related analysis and I think we're all good with the current change if I didn't miss anything. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs
HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641674305 > will this cause perf regression? e.g. if grouping expr is expensive, with the Project we only need to evaluate it once. I would say this is kind of a design choice. In that why, we should add the projection to all grouping expressions. It will send less data on the other hand. This PR matches the implementation - it shouldn't be matched with object expressions because grouped and cogrouped UDFs actually should pass a key separately to UDF which object expressions don't. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org