Haejoon Lee created SPARK-43282:
-----------------------------------

             Summary: Investigate DataFrame.sort_values with pandas behavior.
                 Key: SPARK-43282
                 URL: https://issues.apache.org/jira/browse/SPARK-43282
             Project: Spark
          Issue Type: Sub-task
          Components: Pandas API on Spark
    Affects Versions: 3.5.0
            Reporter: Haejoon Lee


{code:python}
import pandas as pd
pdf = pd.DataFrame(
    {
        "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
        "b": pd.Categorical(
            ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
        ),
    },
)
pdf.groupby("a").apply(lambda x: x).sort_values(["a"])

Traceback (most recent call last):
...
ValueError: 'a' is both an index level and a column label, which is ambiguous. 
{code}
We should investigate whether this is intended behavior or a bug in pandas.
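
The error message suggests that the result of groupby(...).apply(...) carries the group key "a" as an index level while also keeping "a" as a column. That reading is an assumption, and it may depend on the pandas version. The sketch below does not use groupby at all. It builds a small hypothetical frame (the names df, raised and sorted_df are illustrative only) in which an index level and a column share the name "a", shows that sort_values rejects the ambiguous label, and then shows one possible workaround: dropping the index before sorting.

```python
import pandas as pd

# Hypothetical minimal frame: the index and a column are both named "a".
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})
df.index = pd.Index([30, 10, 20], name="a")

# sort_values cannot tell whether "a" means the index level or the column.
try:
    df.sort_values(["a"])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Dropping the index removes the name collision, so "a" can only
# refer to the column and the sort goes through.
sorted_df = df.reset_index(drop=True).sort_values(["a"])
print(sorted_df)
```

If the same collision is what happens in the reproduction above, then the ValueError would come from the shape of the groupby result rather than from sort_values itself. That still needs to be confirmed.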



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
