[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
----------------------------------
Description: 
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

{code}



This issue occurs because `createDataFrame` does not handle None/NaN properly:

1. The conversion from local data to pd.DataFrame automatically converts both None and NaN to NaN.
2. The conversion from pd.DataFrame to pa.Table then always converts NaN to null.
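
The two steps above can be modelled in plain Python. This is only a minimal sketch, not the actual pandas or pyarrow code path: the helper names `to_float_column` and `nan_to_null` are hypothetical, and they merely mimic the behaviour described here, namely that a float column has no slot for None and that NaN is afterwards read as a missing value.

```python
import math
from typing import List, Optional


def to_float_column(values: List[Optional[float]]) -> List[float]:
    """Hypothetical model of step 1: a float column cannot hold None, so None becomes NaN."""
    return [float("nan") if v is None else float(v) for v in values]


def nan_to_null(column: List[float]) -> List[Optional[float]]:
    """Hypothetical model of step 2: every NaN is treated as a missing value (null)."""
    return [None if math.isnan(v) else v for v in column]


# Same values as the `value` column in the report: NaN, 42.0, None.
original = [float("nan"), 42.0, None]

after_step1 = to_float_column(original)
after_step2 = nan_to_null(after_step1)

print(after_step1)  # [nan, 42.0, nan]
print(after_step2)  # [None, 42.0, None]
```

Once both inputs have collapsed to NaN in the first step, the second step has no way to tell which rows were originally None, which is consistent with the first output above showing null for id=1.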

  was:
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

{code}



This issue occurs because `createDataFrame` does not handle None/NaN properly:

1. The conversion from local data to pd.DataFrame automatically converts None to NaN.
2. The conversion from pd.DataFrame to pa.Table then always converts NaN to null.


> `createDataFrame` doesn't handle None/NaN properly
> --------------------------------------------------
>
>                 Key: SPARK-41855
>                 URL: https://issues.apache.org/jira/browse/SPARK-41855
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
----------------------------------
Description: 
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

{code}



This issue occurs because `createDataFrame` does not handle None/NaN properly:

1. The conversion from local data to pd.DataFrame automatically converts None to NaN.
2. The conversion from pd.DataFrame to pa.Table then always converts NaN to null.

  was:

{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', DoubleType(), True)])

{code}



This issue occurs because `createDataFrame` does not handle None properly:

1. The conversion from local data to pd.DataFrame automatically converts None to NaN.
2. The conversion from pd.DataFrame to pa.Table then always converts NaN to null.








[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
----------------------------------
Summary: `createDataFrame` doesn't handle None/NaN properly  (was: 
`createDataFrame` doesn't handle None properly)



