[GitHub] [spark] HyukjinKwon opened a new pull request #32884: [SPARK-35738][PYTHON] Support x and y properly in DataFrame with non-numeric columns with plots

GitBox Mon, 14 Jun 2021 00:44:49 -0700


HyukjinKwon opened a new pull request #32884:
URL: https://github.com/apache/spark/pull/32884



   ### What changes were proposed in this pull request?
   
   This PR proposes to port the fix https://github.com/databricks/koalas/pull/2172.
   
   ```python
   import databricks.koalas as ks  # assumed import; `ks` is the conventional Koalas alias

   ks.DataFrame({'a': [1, 2, 3], 'b': ["a", "b", "c"], 'c': [4, 5, 6]}).plot(kind='hist', x='a', y='c', bins=200)
   ```
   
   **Before:**
   
   ```
   pyspark.sql.utils.AnalysisException: cannot resolve 'least(min(a), min(b), min(c))' due to data type mismatch: The expressions should all have the same type, got LEAST(bigint, string, bigint).;
   'Aggregate [unresolvedalias(least(min(a#1L), min(b#2), min(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1)), unresolvedalias(greatest(max(a#1L), max(b#2), max(c#3L)), Some(org.apache.spark.sql.Column$$Lambda$1556/0x0000000800d94840@42fb0cc1))]
   +- Project [a#1L, b#2, c#3L]
      +- Project [__index_level_0__#0L, a#1L, b#2, c#3L, monotonically_increasing_id() AS __natural_order__#8L]
         +- LogicalRDD [__index_level_0__#0L, a#1L, b#2, c#3L], false
   ```
   
   **After:**
   
   ```python
   Figure({
       'data': [{'hovertemplate': 'variable=a<br>value=%{text}<br>count=%{y}',
                 'name': 'a',
   ...
   ```
   
   ### Why are the changes needed?
   
   To match pandas' behaviour and allow users to set `x` and `y` on a DataFrame that also contains non-numeric columns.
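
The trace above shows the failure coming from `least`/`greatest` being taken across a string column together with the bigint columns. As a rough illustration of one way to avoid that, the plain-Python sketch below restricts the range computation to the requested `y` column and to numeric columns. The helper `histogram_range` and its `columns` argument are hypothetical names used only for this sketch. It is not the code in this PR or in the Koalas fix, and it uses plain lists rather than Spark columns.

```python
from numbers import Number


def histogram_range(columns, y=None):
    """Return (lowest, highest) over the columns a histogram would bin.

    `columns` maps a column name to a list of values. This is a hypothetical
    stand-in for illustration, not the implementation in the PR.
    """
    # If a y column is named, only that column is considered.
    selected = {y: columns[y]} if y is not None else dict(columns)

    # Keep only non-empty, purely numeric columns so that min/max are never
    # taken across mixed types (the cause of the LEAST(bigint, string, bigint)
    # error in the trace).
    numeric = {
        name: values
        for name, values in selected.items()
        if values
        and all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    }
    if not numeric:
        raise ValueError("no numeric data to plot")

    lowest = min(min(values) for values in numeric.values())
    highest = max(max(values) for values in numeric.values())
    return lowest, highest


data = {"a": [1, 2, 3], "b": ["a", "b", "c"], "c": [4, 5, 6]}
print(histogram_range(data, y="c"))  # (4, 6)
print(histogram_range(data))         # (1, 6)
```

With `y="c"` only column `c` contributes to the range, and without `y` the string column `b` is skipped instead of raising a type error.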
   
   ### Does this PR introduce _any_ user-facing change?
   
   No to end users, since the change is not released yet. Yes to developers, as described above.
   
   ### How was this patch tested?
   
   Manually tested, added a test and tested in notebooks:
   
   
   ![Screen Shot 2021-06-11 at 9 11 25 PM](https://user-images.githubusercontent.com/6477701/121686038-a47a1b80-cafb-11eb-8f8e-8d968db7ebef.png)
   
   ![Screen Shot 2021-06-11 at 9 48 58 PM](https://user-images.githubusercontent.com/6477701/121688858-e22c7380-cafe-11eb-9d0a-adcbe560030f.png)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



