[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727763#comment-15727763 ]

Dongjoon Hyun commented on SPARK-18709:
---------------------------------------

@srowen. The type verification was introduced by SPARK-14945 (https://issues.apache.org/jira/browse/SPARK-14945), when `session.py` was created for 2.0.0.

> Automatic null conversion bug (instead of throwing error) when creating a
> Spark DataFrame with incompatible types for fields.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-18709
>                 URL: https://issues.apache.org/jira/browse/SPARK-18709
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>            Reporter: Amogh Param
>              Labels: bug
>             Fix For: 2.0.0
>
> When converting an RDD with a `float` type field to a Spark dataframe with an
> `IntegerType` / `LongType` schema field, Spark 1.6.2 and 1.6.3 silently
> convert the field values to nulls instead of throwing an error like `LongType
> can not accept object ___ in type <type 'float'>`. However, this seems to be
> fixed in Spark 2.0.2.
>
> The following example should make the problem clear:
> {code}
> from pyspark.sql.types import StructField, StructType, LongType, DoubleType
>
> schema = StructType([
>     StructField("0", LongType(), True),
>     StructField("1", DoubleType(), True),
> ])
> data = [[1.0, 1.0], [float("nan"), 2.0]]
> spark_df = sqlContext.createDataFrame(sc.parallelize(data), schema)
> spark_df.show()
> {code}
> Instead of throwing an error like:
> {code}
> LongType can not accept object 1.0 in type <type 'float'>
> {code}
> Spark converts all the values in the first column to nulls. Running
> `spark_df.show()` gives:
> {code}
> +----+---+
> |   0|  1|
> +----+---+
> |null|1.0|
> |null|2.0|
> +----+---+
> {code}
> For the purposes of my computation, I'm running `mapPartitions` on a Spark
> data frame. For each partition, I convert it into a pandas data frame, do a
> few computations on it, and return a list of lists, which `mapPartitions`
> turns back into an RDD. This RDD is then converted into a Spark dataframe
> much like the example above, using `sqlContext.createDataFrame(rdd, schema)`.
> The RDD has a column that should become a `LongType` in the Spark data frame,
> but since it has missing values, it is a `float` type. When Spark tries to
> create the data frame, it converts all the values in that column to nulls
> instead of throwing an error that there is a type mismatch.
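For anyone stuck on 1.6.x, a practical workaround is to normalize such a column before applying the schema, so that missing values become explicit nulls rather than relying on version-specific coercion. The following is a minimal sketch under the repro's assumptions (`sc` and `sqlContext` already exist); the helper `to_nullable_long` is made up for illustration:

{code}
import math

from pyspark.sql.types import StructField, StructType, LongType, DoubleType

schema = StructType([
    StructField("0", LongType(), True),
    StructField("1", DoubleType(), True),
])

def to_nullable_long(x):
    # Hypothetical helper: map NaN (and None) to None so the LongType
    # column gets a real SQL null; cast clean floats to int so that
    # LongType accepts them.
    if x is None or math.isnan(x):
        return None
    return int(x)

data = [[1.0, 1.0], [float("nan"), 2.0]]
rows = sc.parallelize(data).map(lambda r: (to_nullable_long(r[0]), r[1]))
spark_df = sqlContext.createDataFrame(rows, schema)
spark_df.show()
# +----+---+
# |   0|  1|
# +----+---+
# |   1|1.0|
# |null|2.0|
# +----+---+
{code}

The same map can run inside the reporter's `mapPartitions` step, before the rows are handed back to `createDataFrame`.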
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724144#comment-15724144 ]

Dongjoon Hyun commented on SPARK-18709:
---------------------------------------

I'll check which commit added the guard condition.
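The guard condition in question is a per-field type check applied while rows are converted. A simplified Python 2 sketch of such a check (the era of Spark 1.6/2.0) — illustrative only, not the actual `session.py` code — might look like:

{code}
def verify_long(field_name, obj):
    # Simplified sketch of a per-field type guard: reject values that
    # LongType cannot accept instead of silently nulling them.
    if obj is None:
        return  # nullable field: null is always acceptable
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(obj, bool) or not isinstance(obj, (int, long)):
        raise TypeError("LongType can not accept object %r in type %s"
                        % (obj, type(obj)))

verify_long("0", 1)     # passes
verify_long("0", None)  # passes (nullable)
try:
    verify_long("0", 1.0)
except TypeError as e:
    print(e)  # LongType can not accept object 1.0 in type <type 'float'>
{code}

If 1.6 skipped a check like this on the RDD path, incompatible floats would be nullified by the row converter instead of rejected, which would match the behavior reported here.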
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724101#comment-15724101 ]

Sean Owen commented on SPARK-18709:
-----------------------------------

BTW, do you know what change fixed this, by any chance?
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723236#comment-15723236 ]

Amogh Param commented on SPARK-18709:
-------------------------------------

Thanks, I'll close the ticket.
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723201#comment-15723201 ]

Dongjoon Hyun commented on SPARK-18709:
---------------------------------------

Yes. It will not be in 1.6.4 (if that release ever exists).
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723189#comment-15723189 ]

Amogh Param commented on SPARK-18709:
-------------------------------------

[~dongjoon] Thanks for the fix. Just to clarify, does this mean the fix will only be in 2.0.0 and not in 1.6.4 (assuming there will be a 1.6.4 update)?
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723100#comment-15723100 ]

Shixiong Zhu commented on SPARK-18709:
--------------------------------------

[~dongjoon] Is it already resolved? If so, could you close this ticket?
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720691#comment-15720691 ]

Dongjoon Hyun commented on SPARK-18709:
---------------------------------------

Also, I updated the fix version to 2.0.0 after testing on 2.0.0.
[jira] [Commented] (SPARK-18709) Automatic null conversion bug (instead of throwing error) when creating a Spark DataFrame with incompatible types for fields.
[ https://issues.apache.org/jira/browse/SPARK-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720687#comment-15720687 ]

Dongjoon Hyun commented on SPARK-18709:
---------------------------------------

Hi, [~amogh.91]. I removed the target version since 1.6.3 is already released. In addition, only committers decide the target version.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org