oeuf opened a new pull request #34812: URL: https://github.com/apache/spark/pull/34812
### What changes were proposed in this pull request?

- Allows underscores in the elements of the `columns` arg and in the column names passed as the `values` arg.

### Why are the changes needed?

Fixes a bug in `pyspark.pandas.frame.DataFrame.pivot_table` that raises a `KeyError` when one of these names contains an underscore (more details in [SPARK-37553](https://issues.apache.org/jira/browse/SPARK-37553)).

```python
>>> import numpy as np
>>> import pandas as pd
>>> from pyspark import pandas as ps
>>> pdf = pd.DataFrame(
...     {
...         "a": [4, 2, 3, 4, 8, 6],
...         "b_b": [1, 2, 2, 4, 2, 4],
...         "e": [10, 20, 20, 40, 20, 40],
...         "c": [1, 2, 9, 4, 7, 4],
...         "d": [-1, -2, -3, -4, -5, -6],
...     },
...     index=np.random.rand(6),
... )
>>> psdf = ps.from_pandas(pdf)
>>> psdf.pivot_table(index=["c"], columns="a", values=["b_b", "e"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-32d5bb0e1166> in <module>
----> 1 psdf.pivot_table(index=["c"], columns="a", values=["b_b", "e"])

~/.pyenv/versions/3.7.9/envs/venv37/lib/python3.7/site-packages/pyspark/pandas/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value)
   6053             column_labels = [
   6054                 tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
-> 6055                 for name in data_columns
   6056             ]
   6057             column_label_names = (

~/.pyenv/versions/3.7.9/envs/venv37/lib/python3.7/site-packages/pyspark/pandas/frame.py in <listcomp>(.0)
   6053             column_labels = [
   6054                 tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
-> 6055                 for name in data_columns
   6056             ]
   6057             column_label_names = (

KeyError: 'b'
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- [x] Add unit tests for code changes
- [ ] Build package via GitHub Actions
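
The failing expression in the traceback, `name.split("_")[1]`, can be reproduced without Spark. The sketch below assumes (inferred from the traceback, not confirmed by the PR) that the intermediate column names have the form `<pivot value>_<value column>`. The helper `split_by_known_suffix` and the sample names are hypothetical, shown only as one possible way to split such names safely; it is not necessarily the approach taken in this patch.

```python
# Hypothetical stand-ins for the internal state seen in the traceback.
value_names = ["b_b", "e"]
column_name_to_index = {name: (name,) for name in value_names}


def split_naive(name):
    """Mirror the traceback: take the first two pieces of split("_")."""
    parts = name.split("_")
    return parts[0], parts[1]


def split_by_known_suffix(name, value_names):
    """Split on the known value-column name instead of on every underscore.

    Longer value names are tried first so that "b_b" is preferred over a
    shorter name that happens to be one of its suffixes.
    """
    for value in sorted(value_names, key=len, reverse=True):
        suffix = "_" + value
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], value
    raise KeyError(name)


# Naive split: "2_b_b" yields the key "b", which is not a value column.
pivot, key = split_naive("2_b_b")
print(pivot, key, key in column_name_to_index)

# Suffix-based split recovers the full value-column name.
print(split_by_known_suffix("2_b_b", value_names))
print(split_by_known_suffix("x_y_b_b", value_names))
```

Matching on the known value names handles underscores on either side of the separator, but it stays ambiguous when one value name is itself an underscore-delimited suffix of another (for example `b` and `b_b`); the sketch resolves that case arbitrarily in favor of the longer name.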
