[GitHub] [spark] HyukjinKwon commented on a change in pull request #34812: [WIP][SPARK-37553][PYTHON] Fix underscore (`_`) bug in pyspark.pandas.frame.DataFrame.pivot_table

GitBox Sun, 05 Dec 2021 16:11:36 -0800


HyukjinKwon commented on a change in pull request #34812:
URL: https://github.com/apache/spark/pull/34812#discussion_r762637825




##########
File path: python/pyspark/pandas/frame.py
##########
@@ -6054,17 +6056,21 @@ def pivot_table(
                     # E.g. if column is b and values is ['b','e'],
                     # then ['2_b', '2_e', '3_b', '3_e'].
 
-                    # We sort the columns of Spark DataFrame by values.
-                    data_columns.sort(key=lambda x: x.split("_", 1)[1])

Review comment:
Can we simply use the `-1` index instead? Calling `Series.unique` triggers
an aggregation on each column, which is pretty expensive. Alternatively, we
could just subtract the original column names instead of invoking `unique`.
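The comment's two suggestions can be read in more than one way. The sketch below is one interpretation, written in plain Python against the generated column names only (no Spark job involved). It assumes the `{pivot_value}_{value_column}` naming shown in the diff comment (`['2_b', '2_e', '3_b', '3_e']`). The function names are hypothetical and are not pyspark.pandas internals.

```python
from typing import List


def sort_by_value_column_naive(data_columns: List[str]) -> List[str]:
    # Mirrors the removed line: key is everything after the FIRST "_".
    # Breaks when the pivot value itself contains "_" ("a_x_b" -> "x_b").
    return sorted(data_columns, key=lambda x: x.split("_", 1)[1])


def sort_by_value_column_last_part(data_columns: List[str]) -> List[str]:
    # One reading of "use the -1 index": key is everything after the LAST "_".
    # Correct only if the value column names themselves contain no "_".
    return sorted(data_columns, key=lambda x: x.rsplit("_", 1)[-1])


def sort_by_value_column_suffix(data_columns: List[str], values: List[str]) -> List[str]:
    # One reading of "subtract original column names": match each generated
    # name against the known value columns, longest first, so "a_b" wins
    # over "b". Only driver-side strings are used; no aggregation is needed.
    candidates = sorted(values, key=len, reverse=True)

    def key(name: str) -> str:
        for value in candidates:
            if name.endswith("_" + value):
                return value
        raise ValueError(f"{name!r} does not end with any value column")

    # sorted() is stable, so pivot order is preserved within each value.
    return sorted(data_columns, key=key)


if __name__ == "__main__":
    # Pivot value "a_x" contains an underscore; values are ["b", "e"].
    cols = ["a_x_b", "a_x_e", "c_b", "c_e"]
    print(sort_by_value_column_naive(cols))
    print(sort_by_value_column_suffix(cols, ["b", "e"]))
```

Matching against the known `values` list only touches the column-name strings, which is why it sidesteps the per-column aggregation that `Series.unique` would trigger. It still cannot disambiguate every case: a name like `x_a_b` is ambiguous when both `b` and `a_b` are plausible value columns.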




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



