[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069044#comment-15069044 ] Cheng Lian commented on SPARK-12478:

I'm leaving this ticket open since we also need to backport this to branch-1.6 after the release.

> Dataset fields of product types can't be null
> ---------------------------------------------
>
>                 Key: SPARK-12478
>                 URL: https://issues.apache.org/jira/browse/SPARK-12478
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0, 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Apache Spark
>              Labels: backport-needed
>
> Spark shell snippet for reproduction:
> {code}
> import sqlContext.implicits._
> case class Inner(f: Int)
> case class Outer(i: Inner)
> Seq(Outer(null)).toDS().toDF().show()
> Seq(Outer(null)).toDS().show()
> {code}
> Expected output should be:
> {noformat}
> +----+
> |   i|
> +----+
> |null|
> +----+
>
> +----+
> |   i|
> +----+
> |null|
> +----+
> {noformat}
> Actual output:
> {noformat}
> +------+
> |     i|
> +------+
> |[null]|
> +------+
>
> java.lang.RuntimeException: Error while decoding: java.lang.RuntimeException:
> Null value appeared in non-nullable field Inner.f of type scala.Int. If the
> schema is inferred from a Scala tuple/case class, or a Java bean, please try
> to use scala.Option[_] or other nullable types (e.g. java.lang.Integer
> instead of int/scala.Int).
> newinstance(class $iwC$$iwC$Outer,if (isnull(input[0, StructType(StructField(f,IntegerType,false))])) null else newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0)),false,ObjectType(class $iwC$$iwC$Outer),Some($iwC$$iwC@6ab35ce3))
> +- if (isnull(input[0, StructType(StructField(f,IntegerType,false))])) null else newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
>    :- isnull(input[0, StructType(StructField(f,IntegerType,false))])
>    :  +- input[0, StructType(StructField(f,IntegerType,false))]
>    :- null
>    +- newinstance(class $iwC$$iwC$Inner,assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int),false,ObjectType(class $iwC$$iwC$Inner),Some($iwC$$iwC@6616b9e0))
>       +- assertnotnull(input[0, StructType(StructField(f,IntegerType,false))].f,Inner,f,scala.Int)
>          +- input[0, StructType(StructField(f,IntegerType,false))].f
>             +- input[0, StructType(StructField(f,IntegerType,false))]
>
> 	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:224)
> 	at org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
> 	at org.apache.spark.sql.Dataset$$anonfun$collect$2.apply(Dataset.scala:704)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> 	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
> 	at org.apache.spark.sql.Dataset.collect(Dataset.scala:704)
> 	at org.apache.spark.sql.Dataset.take(Dataset.scala:725)
> 	at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:230)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:193)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:201)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
> 	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
> 	at $iwC$$iwC$$iwC.<init>(<console>:50)
> 	at $iwC$$iwC.<init>(<console>:52)
> 	at $iwC.<init>(<console>:54)
> 	at <init>(<console>:56)
> 	at .<init>(<console>:60)
> 	at .<clinit>(<console>)
> 	at .<init>(<console>:7)
> 	at .<clinit>(<console>)
> 	at $print(<console>)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:483)
> 	at org.apache.spark.repl.SparkIMain$Read
> {noformat}
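The workaround suggested by the error message itself is to declare nullable fields with `scala.Option[_]` (or boxed types like `java.lang.Integer`) instead of primitives and bare case-class references. A minimal sketch of the repro rewritten that way, assuming a Spark 1.6+ shell where `sqlContext` is in scope (this sidesteps the bug rather than fixing it; the ticket is about making the `Inner`-typed field itself nullable):

```scala
import sqlContext.implicits._

// Option[Int] instead of Int: the field is explicitly nullable,
// so the decoder never hits the non-nullable assertion.
case class Inner(f: Option[Int])
// Option[Inner] instead of Inner: absence is modeled as None
// rather than a null reference inside a product type.
case class Outer(i: Option[Inner])

// Decodes without throwing; the row shows as null/None.
Seq(Outer(None)).toDS().show()
Seq(Outer(Some(Inner(None)))).toDS().show()
```

The trade-off is that downstream code must pattern-match or `map` over the `Option`s instead of relying on nullable case-class fields.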
[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068052#comment-15068052 ] Apache Spark commented on SPARK-12478:

User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/10431
[jira] [Commented] (SPARK-12478) Dataset fields of product types can't be null
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068043#comment-15068043 ] Cheng Lian commented on SPARK-12478:

[~marmbrus] I guess this issue probably doesn't block 1.6 since Dataset is still experimental?