[GitHub] [spark] itholic opened a new pull request, #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

via GitHub Mon, 04 Sep 2023 01:58:50 -0700


itholic opened a new pull request, #42798:
URL: https://github.com/apache/spark/pull/42798


   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to support string type columns for `DataFrameGroupBy.sum`.
   
   
   ### Why are the changes needed?
   
   To match the behavior of the latest pandas.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. From now on, `DataFrameGroupBy.sum` follows the behavior of the latest pandas, as shown below:
   
   **Test DataFrame**
   ```python
   >>> psdf
      A    B  C      D
   0  1  3.1  a   True
   1  2  4.1  b  False
   2  1  4.1  b  False
   3  2  3.1  a   True
   ```
   
   **Before**
   ```python
   >>> psdf.groupby("A").sum().sort_index()
        B  D
   A
   1  7.2  1
   2  7.2  1
   ```
   
   **After**
   ```python
   >>> psdf.groupby("A").sum().sort_index()
        B   C  D
   A
   1  7.2  ab  1
   2  7.2  ba  1
   ```
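
The semantics shown in the "After" output can be illustrated with a small plain-Python sketch. This is not the pandas-on-Spark implementation; `group_sum` and `ROWS` are hypothetical names introduced only for illustration, and the sketch assumes that summing a string column means concatenating its values in row order within each group, as the `ab` / `ba` results suggest.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Rows of the test DataFrame from the description, as (A, B, C, D) tuples.
ROWS = [
    (1, 3.1, "a", True),
    (2, 4.1, "b", False),
    (1, 4.1, "b", False),
    (2, 3.1, "a", True),
]


def group_sum(rows):
    """Hypothetical helper: sum columns B, C, D for each distinct value of A."""
    groups = defaultdict(list)
    for key, *values in rows:
        groups[key].append(values)

    result = {}
    for key in sorted(groups):  # mirrors the .sort_index() call
        columns = zip(*groups[key])  # transpose the group's rows into columns
        # `+` adds numbers and concatenates strings in row order;
        # booleans are cast to int first, so the sum counts True values.
        result[key] = tuple(
            reduce(add, (int(v) if isinstance(v, bool) else v for v in col))
            for col in columns
        )
    return result


sums = group_sum(ROWS)
for key, (b, c, d) in sums.items():
    # Round the float column for display only.
    print(key, round(b, 1), c, d)
```

Under these assumptions the loop prints `1 7.2 ab 1` and `2 7.2 ba 1`, matching the rows of the "After" table.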
   
   ### How was this patch tested?
   
   Updated the existing unit tests to cover string type columns.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


