[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551582#comment-16551582 ]

Antonio Murgia commented on SPARK-24862:

We can check if {{y}} is also synthesized as a field and if it is we can access it through reflection.

About the inconsistency I actually don’t know. Maybe you are right, the inconsistency may cause issues. If it is the case, we might throw an exception earlier (when generating the encoder) instead of throwing it when the first action is called.

I am just sketching down ideas anyway.

> Spark Encoder is not consistent to scala case class semantic for multiple argument lists
>
>                 Key: SPARK-24862
>                 URL: https://issues.apache.org/jira/browse/SPARK-24862
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Antonio Murgia
>            Priority: Major
>
> Spark Encoder is not consistent to scala case class semantic for multiple argument lists.
> For example if I create a case class with multiple constructor argument lists:
> {code:java}
> case class Multi(x: String)(y: Int){code}
> Scala creates a product with arity 1, while if I apply
> {code:java}
> Encoders.product[Multi].schema.printTreeString{code}
> I get
> {code:java}
> root
>  |-- x: string (nullable = true)
>  |-- y: integer (nullable = false){code}
> That is not consistent and leads to:
> {code:java}
> Error while encoding: java.lang.RuntimeException: Couldn't find y on class it.enel.next.platform.service.events.common.massive.immutable.Multi
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, it.enel.next.platform.service.events.common.massive.immutable.Multi, true])).x, true) AS x#0
> assertnotnull(assertnotnull(input[0, it.enel.next.platform.service.events.common.massive.immutable.Multi, true])).y AS y#1
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: Couldn't find y on class it.enel.next.platform.service.events.common.massive.immutable.Multi
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, it.enel.next.platform.service.events.common.massive.immutable.Multi, true])).x, true) AS x#0
> assertnotnull(assertnotnull(input[0, it.enel.next.platform.service.events.common.massive.immutable.Multi, true])).y AS y#1
> at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
> at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:464)
> at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:464)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:464)
> at it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply$mcV$sp(ParquetQueueSuite.scala:48)
> at it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply(ParquetQueueSuite.scala:46)
> at it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply(ParquetQueueSuite.scala:46)
> at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1682)
> at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
> at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1685)
> at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1679)
> at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1692)
> at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1692)
> at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1692)
> at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1685)
> at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1750)
> at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1750)
> at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> at org.scalatest.SuperEngine$$anonf
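The idea in the comment above (check whether {{y}} was synthesized as a field, then read it reflectively) can be tried outside Spark. The following is only a sketch using plain Java reflection from Scala: {{MultiUsed}}, {{FieldCheck}}, {{hasField}} and {{readField}} are hypothetical names introduced for illustration and are not part of Spark. It assumes the usual Scala 2 behaviour that a second-list parameter is kept as a field only when the class body references it.

```scala
// The case class from the issue: only x belongs to the Product.
case class Multi(x: String)(y: Int)

// Hypothetical variant: referencing y in the body should make the
// compiler retain it as a private field.
case class MultiUsed(x: String)(y: Int) {
  def yValue: Int = y
}

object FieldCheck {
  /** True if the class declares a JVM field with the given name. */
  def hasField(cls: Class[_], name: String): Boolean =
    cls.getDeclaredFields.exists(_.getName == name)

  /** Reads a declared field reflectively, if such a field exists. */
  def readField(obj: AnyRef, name: String): Option[Any] =
    obj.getClass.getDeclaredFields.find(_.getName == name).map { f =>
      f.setAccessible(true)
      f.get(obj)
    }

  def main(args: Array[String]): Unit = {
    val m = Multi("a")(1)
    println(m.productArity)       // 1: only the first parameter list counts
    println(m == Multi("a")(2))   // true: y takes no part in equality
    println(hasField(classOf[Multi], "y"))
    println(readField(MultiUsed("a")(42), "y"))
  }
}
```

A check like {{hasField}} could also drive the early failure mentioned above: if a constructor parameter has no backing field, the encoder could be rejected at creation time rather than at the first action.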
[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551416#comment-16551416 ]

Liang-Chi Hsieh commented on SPARK-24862:

Wouldn't that be inconsistent between the schema and the ser/de? And for the serializer, for example, how can we get {{y}} from {{Multi}} objects?
[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551312#comment-16551312 ]

Antonio Murgia commented on SPARK-24862:

Yeah, they are definitely not supported. Therefore I think the encoder generator should generate the schema based on the first parameter list and the ser/de based on all the parameter lists. I can think of a PR if you’d like.
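To make the "schema from the first parameter list, ser/de from all of them" proposal concrete, here is a rough sketch using Scala 2 runtime reflection (it needs {{scala-reflect}} on the classpath). {{ParamLists}}, {{schemaFields}} and {{constructorArgs}} are hypothetical helper names for illustration only; this is not how Spark's {{ScalaReflection}} is written.

```scala
import scala.reflect.runtime.universe._

case class Multi(x: String)(y: Int)

object ParamLists {
  /** Primary-constructor parameter names, one inner list per parameter list. */
  def paramLists[T: TypeTag]: List[List[String]] =
    typeOf[T].typeSymbol.asClass.primaryConstructor.asMethod
      .paramLists.map(_.map(_.name.decodedName.toString))

  /** Candidate schema fields: the first parameter list only, matching the Product arity. */
  def schemaFields[T: TypeTag]: List[String] =
    paramLists[T].headOption.getOrElse(Nil)

  /** Everything the constructor needs: all parameter lists, flattened. */
  def constructorArgs[T: TypeTag]: List[String] =
    paramLists[T].flatten

  def main(args: Array[String]): Unit = {
    println(paramLists[Multi])
    println(schemaFields[Multi])
    println(constructorArgs[Multi])
  }
}
```

The gap between {{schemaFields}} and {{constructorArgs}} is exactly the set of parameters that a schema derived from the first list would not cover.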
[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551174#comment-16551174 ]

Liang-Chi Hsieh commented on SPARK-24862:

Even if we only retrieve the first parameter list at {{getConstructorParameters}}, when we need to deserialize {{Multi}} we don't have {{y}} in the input columns, because we only serialize {{x}}. I think case classes with multiple parameter lists are not supported by Encoder.
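The deserialization problem described above can be seen from the JVM side: Scala compiles multiple parameter lists into a single constructor that takes all the arguments, so a row holding only {{x}} cannot rebuild the object. A small sketch with plain Java reflection follows; {{CtorArity}} and {{jvmCtorArity}} are hypothetical names, and it assumes {{Multi}} is a top-level class so that no outer-instance argument is added to the constructor.

```scala
case class Multi(x: String)(y: Int)

object CtorArity {
  /** Number of arguments expected by the class's first public JVM constructor. */
  def jvmCtorArity(cls: Class[_]): Int =
    cls.getConstructors.head.getParameterTypes.length

  def main(args: Array[String]): Unit = {
    // The Product exposes one element...
    println(Multi("a")(1).productArity)
    // ...while the constructor still needs a value for y as well.
    println(jvmCtorArity(classOf[Multi]))
  }
}
```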