[jira] [Commented] (SPARK-21163) DataFrame.toPandas should respect the data type

2018-04-14 Thread Ed Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438535#comment-16438535
 ] 

Ed Lee commented on SPARK-21163:


I have a question: in Spark 2.2.1, calling .toPandas() on a Spark DataFrame
with an integer-typed column gives a pandas dtype of int64, whereas in Spark
2.3.0 the ints are converted to int32. I ran the code below in both Spark
2.2.1 and 2.3.0:

```
from pyspark.sql import functions as sf  # the `sf` alias used below

rdd = spark.sparkContext.parallelize([(i,) for i in [1, 2, 3]])
df = rdd.toDF(["a"]).select(sf.col("a").cast("int")).toPandas()
df.dtypes
```
Is this intended? We ran into this because a project of ours has unit tests
that passed in Spark 2.2.1 and fail in Spark 2.3.0.
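
For tests that have to pass on both versions, one option is to normalise the
dtype on the pandas side before asserting. The following is only a minimal
sketch, assuming pandas and NumPy are installed; `widen_int32` is a
hypothetical helper name, and the frame is built directly in pandas as a
stand-in for the result of .toPandas():

```python
import numpy as np
import pandas as pd


def widen_int32(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of pdf with every int32 column cast to int64.

    Hypothetical helper for illustration; not part of Spark or pandas.
    """
    int32_cols = pdf.select_dtypes(include=["int32"]).columns
    return pdf.astype({col: "int64" for col in int32_cols})


# Stand-in for what .toPandas() returned for the snippet above in 2.3.0:
# a single column "a" holding 1, 2, 3 as int32.
pdf = pd.DataFrame({"a": np.array([1, 2, 3], dtype="int32")})

widened = widen_int32(pdf)
print(widened.dtypes)
```

Casting the column to 'long' on the Spark side instead of 'int' should also
yield int64 after .toPandas(), though that changes the Spark schema as well.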

I also left a comment on GitHub:

[https://github.com/apache/spark/pull/18378/files/d8ba5452539c5fd5b650b7f5e51e467aabc33739#diff-6fc344560230bf0ef711bb9b5573f1faR1775]

 

> DataFrame.toPandas should respect the data type
> ---
>
> Key: SPARK-21163
> URL: https://issues.apache.org/jira/browse/SPARK-21163
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.3.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21163) DataFrame.toPandas should respect the data type

2017-06-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057729#comment-16057729
 ] 

Apache Spark commented on SPARK-21163:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/18378




