subject:"\[GitHub\] \[spark\] HyukjinKwon commented on pull request #28745\: \[SPARK\-31915\]\[SQL\]\[PYTHON\] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs"

[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-10 Thread GitBox

HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641775258 Okay, let me try to just workaround the problem alone .. it's too invasive .. This is an automated message

[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox

HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641692935 BTW, I skimmed `Aggregator` related analysis and I think we're all good with the current change if I didn't miss anything.

[GitHub] [spark] HyukjinKwon commented on pull request #28745: [SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs

2020-06-09 Thread GitBox

HyukjinKwon commented on pull request #28745: URL: https://github.com/apache/spark/pull/28745#issuecomment-641674305 > will this cause perf regression? e.g. if grouping expr is expensive, with the Project we only need to evaluate it once. I would say this is kind of a design choice.