[jira] [Updated] (SPARK-38833) PySpark applyInPandas should allow to return empty DataFrame without columns

Enrico Minack (Jira) Fri, 08 Apr 2022 05:37:04 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-38833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Enrico Minack updated SPARK-38833:
----------------------------------
    Summary: PySpark applyInPandas should allow to return empty DataFrame 
without columns  (was: PySpark allows applyInPandas return empty DataFrame 
without columns)

> PySpark applyInPandas should allow to return empty DataFrame without columns
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-38833
>                 URL: https://issues.apache.org/jira/browse/SPARK-38833
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> Currently, returning an empty pandas DataFrame without columns from 
> {{applyInPandas}} raises an error:
> {noformat}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't 
> match specified schema. Expected: 2 Actual: 0
> {noformat}
> Here is an example:
> {code}
> import pandas as pd
>
> # Assumes an active SparkSession named `spark`.
> df = spark.createDataFrame(
>     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
>     ("id", "v"))
>
> def mean_func(key, pdf):
>     if key == (1,):
>         # Empty DataFrame without columns: this triggers the error above.
>         return pd.DataFrame([])
>     else:
>         # One row: the grouping key followed by the group mean of v.
>         return pd.DataFrame([key + (pdf.v.mean(),)])
>
> df.groupby('id').applyInPandas(mean_func, schema="id long, v double").show()
> {code}
> Since the schema is already defined when calling {{applyInPandas()}}, it is 
> redundant to define the columns again when returning an empty 
> {{pd.DataFrame}}. Returning a non-empty DataFrame does not require defining 
> columns, so returning an empty DataFrame shouldn't require that either.
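
A possible workaround until this is addressed, sketched here with pandas only so it runs without a Spark session: give the empty DataFrame the column names from the schema. The name {{SCHEMA_COLUMNS}} is an assumption introduced for illustration, and the sample group data is taken from the example above. The shapes show why the error reports "Expected: 2 Actual: 0" for {{pd.DataFrame([])}}; that the named variant passes Spark's column-count check is expected but not verified here.

```python
import pandas as pd

# Hypothetical helper constant: column names matching the schema
# string "id long, v double" used in the applyInPandas call.
SCHEMA_COLUMNS = ["id", "v"]

def mean_func(key, pdf):
    if key == (1,):
        # Empty result, but with the two schema columns named explicitly.
        return pd.DataFrame(columns=SCHEMA_COLUMNS)
    # One row: the grouping key followed by the group mean of v.
    return pd.DataFrame([key + (pdf.v.mean(),)], columns=SCHEMA_COLUMNS)

# Plain pandas check of the column counts involved.
bare_empty = pd.DataFrame([])
named_empty = mean_func((1,), pd.DataFrame({"id": [1, 1], "v": [1.0, 2.0]}))
group_two = mean_func((2,), pd.DataFrame({"id": [2, 2, 2], "v": [3.0, 5.0, 10.0]}))

print(bare_empty.shape)   # (0, 0): zero columns, as in the error message
print(named_empty.shape)  # (0, 2): zero rows, two columns
```

With Spark available, the same {{mean_func}} would be passed to {{df.groupby('id').applyInPandas(mean_func, schema="id long, v double")}} as in the example above.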



