[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069044#comment-15069044 ]

Cheng Lian commented on SPARK-12478:
------------------------------------

I'm leaving this ticket open since we also need to backport this to branch-1.6 
after the release.

> Dataset fields of product types can't be null
> ---------------------------------------------
>
> Key: SPARK-12478
> URL: https://issues.apache.org/jira/browse/SPARK-12478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>  Labels: backport-needed
>
> Spark shell snippet for reproduction:
> {code}
> import sqlContext.implicits._
> case class Inner(f: Int)
> case class Outer(i: Inner)
> Seq(Outer(null)).toDS().toDF().show()
> Seq(Outer(null)).toDS().show()
> {code}
> Expected output should be:
> {noformat}
> +----+
> |   i|
> +----+
> |null|
> +----+
> +----+
> |   i|
> +----+
> |null|
> +----+
> {noformat}
> Actual output:
> {noformat}
> +------+
> |     i|
> +------+
> |[null]|
> +------+
> java.lang.RuntimeException: Error while decoding: java.lang.RuntimeException: 
> Null value appeared in non-nullable field Inner.f of type scala.Int. If the 
> schema is inferred from a Scala tuple/case class, or a Java bean, please try 
> to use scala.Option[_] or other nullable types (e.g. java.lang.Integer 
> instead of int/scala.Int).
> newinstance(class $iwC$$iwC$Outer,if (isnull(input[0, 
> StructType(StructField(f,IntegerType,false))])) null else newinstance(class 
> $iwC$$iwC$Inner,assertnotnull(input[0, 
> StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class
>  $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0)),false,ObjectType(class 
> $iwC$$iwC$Outer),Some($iwC$$iwC@6ab35ce3))
> +- if (isnull(input[0, StructType(StructField(f,IntegerType,false))])) null 
> else newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, 
> StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class
>  $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
>:- isnull(input[0, StructType(StructField(f,IntegerType,false))])
>:  +- input[0, StructType(StructField(f,IntegerType,false))]
>:- null
>+- newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, 
> StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class
>  $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
>   +- assertnotnull(input[0, 
> StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int)
>  +- input[0, StructType(StructField(f,IntegerType,false))].f
> +- input[0, StructType(StructField(f,IntegerType,false))]
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:224)
> at 
> org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
> at 
> org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
> at org.apache.spark.sql.Dataset.collect(Dataset.scala:704)
> at org.apache.spark.sql.Dataset.take(Dataset.scala:725)
> at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:230)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:193)
> at org.apache.spark.sql.Dataset.show(Dataset.scala:201)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
> at $iwC$$iwC$$iwC.<init>(<console>:50)
> at $iwC$$iwC.<init>(<console>:52)
> at $iwC.<init>(<console>:54)
> at <init>(<console>:56)
> at .<init>(<console>:60)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> ...
> {noformat}
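
For reference, a minimal sketch of the workaround the error message suggests: wrapping the primitive field in scala.Option so the inferred schema marks it nullable and the decoder generates no assertnotnull check for it. The names Inner2 and Outer2 are hypothetical, and the snippet assumes the same 1.6.0 spark-shell session with sqlContext in scope:

{code}
import sqlContext.implicits._

// Hypothetical variants of the reported case classes: Option[Int]
// (or java.lang.Integer for a Java bean) makes the nested field
// nullable in the inferred schema.
case class Inner2(f: Option[Int])
case class Outer2(i: Inner2)

// With the nullable field, these calls should avoid the decode-time
// exception shown above, since no assertnotnull is generated for f.
Seq(Outer2(Inner2(None))).toDS().show()
Seq(Outer2(null)).toDS().show()
{code}

Once the fix tracked here lands, the original Int-typed snippet should produce the expected null row without any wrapper.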

[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068052#comment-15068052 ]

Apache Spark commented on SPARK-12478:
--------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/10431


[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null

2015-12-22 Thread Cheng Lian (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068043#comment-15068043 ]

Cheng Lian commented on SPARK-12478:
------------------------------------

[~marmbrus] I guess this issue doesn't block 1.6, since Dataset is still 
experimental?
