[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-10 Thread GitBox


HyukjinKwon commented on pull request #28745:
URL: https://github.com/apache/spark/pull/28745#issuecomment-641775258


   Okay, let me try to just workaround the problem alone .. it's too invasive ..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox


HyukjinKwon commented on pull request #28745:
URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935


   BTW, I skimmed `Aggregator` related analysis and I think we're all good with 
the current change if I didn't miss anything.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox


HyukjinKwon commented on pull request #28745:
URL: https://github.com/apache/spark/pull/28745#issuecomment-641674305


   > will this cause perf regression? e.g. if grouping expr is expensive, with 
the Project we only need to evaluate it once.
   
   I would say this is kind of a design choice. In that why, we should add the 
projection to all grouping expressions. It will send less data on the other 
hand. This PR matches the implementation - it shouldn't be matched with object 
expressions because grouped and cogrouped UDFs actually should pass a key 
separately to UDF which object expressions don't.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org