[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-21 Thread Antonio Murgia (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551582#comment-16551582
 ] 

Antonio Murgia commented on SPARK-24862:


We can check whether {{y}} is also synthesized as a field and, if it is, access 
it through reflection. As for the inconsistency, I actually don't know. Maybe you 
are right and the inconsistency may cause issues. If that is the case, we might 
throw an exception earlier (when generating the encoder) instead of throwing it 
when the first action is called. I am just sketching out ideas anyway.
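
The two ideas in this comment (probing for a synthesized field and failing at encoder-generation time) can be sketched with plain Java reflection. This is only an illustration, not Spark code: {{MultiVal}}, {{EncoderChecks}}, {{hasField}}, {{readField}} and {{requireFields}} are hypothetical names, and whether a field exists for a plain second-list parameter depends on how the compiler treats it.

```scala
import scala.util.Try

// Hypothetical variants of the class from the issue description.
case class Multi(x: String)(y: Int)        // y is only a constructor parameter
case class MultiVal(x: String)(val y: Int) // y is a member backed by a field

object EncoderChecks {
  // True if the class itself declares a JVM field with this name.
  def hasField(cls: Class[_], name: String): Boolean =
    cls.getDeclaredFields.exists(_.getName == name)

  // Reads a declared field reflectively; None if the field does not exist.
  def readField(obj: AnyRef, name: String): Option[Any] =
    Try {
      val field = obj.getClass.getDeclaredField(name)
      field.setAccessible(true)
      field.get(obj)
    }.toOption

  // Fail fast: reject column names that have no backing field, instead of
  // discovering the problem when the first row is encoded.
  def requireFields(cls: Class[_], columns: Seq[String]): Unit = {
    val missing = columns.filterNot(hasField(cls, _))
    require(missing.isEmpty,
      s"${cls.getName} has no field for: ${missing.mkString(", ")}")
  }
}
```

For example, {{EncoderChecks.requireFields(classOf[MultiVal], Seq("x", "y"))}} passes, while a column with no backing field makes {{require}} throw an {{IllegalArgumentException}} before any data is touched.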

> Spark Encoder is not consistent to scala case class semantic for multiple 
> argument lists
> 
>
> Key: SPARK-24862
> URL: https://issues.apache.org/jira/browse/SPARK-24862
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.1
> Reporter: Antonio Murgia
> Priority: Major
>
> Spark Encoder is not consistent with Scala case class semantics for multiple 
> argument lists.
> For example, if I create a case class with multiple constructor argument lists:
> {code:java}
> case class Multi(x: String)(y: Int){code}
> Scala creates a product with arity 1, while if I apply 
> {code:java}
> Encoders.product[Multi].schema.printTreeString{code}
> I get
> {code:java}
> root
> |-- x: string (nullable = true)
> |-- y: integer (nullable = false){code}
> That is not consistent and leads to:
> {code:java}
> Error while encoding: java.lang.RuntimeException: Couldn't find y on class 
> it.enel.next.platform.service.events.common.massive.immutable.Multi
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, assertnotnull(assertnotnull(input[0, 
> it.enel.next.platform.service.events.common.massive.immutable.Multi, 
> true])).x, true) AS x#0
> assertnotnull(assertnotnull(input[0, 
> it.enel.next.platform.service.events.common.massive.immutable.Multi, 
> true])).y AS y#1
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> Couldn't find y on class 
> it.enel.next.platform.service.events.common.massive.immutable.Multi
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, assertnotnull(assertnotnull(input[0, 
> it.enel.next.platform.service.events.common.massive.immutable.Multi, 
> true])).x, true) AS x#0
> assertnotnull(assertnotnull(input[0, 
> it.enel.next.platform.service.events.common.massive.immutable.Multi, 
> true])).y AS y#1
> at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
> at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:464)
> at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:464)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:392)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:296)
> at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:464)
> at 
> it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply$mcV$sp(ParquetQueueSuite.scala:48)
> at 
> it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply(ParquetQueueSuite.scala:46)
> at 
> it.enel.next.platform.service.events.common.massive.immutable.ParquetQueueSuite$$anonfun$1.apply(ParquetQueueSuite.scala:46)
> at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1682)
> at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
> at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1685)
> at 
> org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1679)
> at 
> org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1692)
> at 
> org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1692)
> at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1692)
> at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1685)
> at 
> org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1750)
> at 
> org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1750)
> at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> at 
> org.scalatest.SuperEngine$$anonf

[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-20 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551416#comment-16551416
 ] 

Liang-Chi Hsieh commented on SPARK-24862:

Wouldn't the schema and the ser/de then be inconsistent? And for the serializer, 
for example, how can we get {{y}} from {{Multi}} objects?
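
A small sketch of why this question is hard to answer: the members Scala generates for a case class only consider the first parameter list, so nothing in the product view of {{Multi}} exposes {{y}}. {{ProductView}} and {{visibleValues}} are hypothetical names used only for illustration.

```scala
// The class from the issue description.
case class Multi(x: String)(y: Int)

object ProductView {
  // Everything a Product exposes about an instance: first parameter list only.
  def visibleValues(p: Product): List[Any] = p.productIterator.toList

  def main(args: Array[String]): Unit = {
    val a = Multi("a")(1)
    val b = Multi("a")(2)
    println(a.productArity)    // 1
    println(visibleValues(a))  // List(a)
    println(a == b)            // true: equality ignores y
    println(a)                 // Multi(a)
  }
}
```

Since {{y}} is neither a product element nor an accessor, a serializer that looks up {{y}} on the object has nothing to call.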


[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-20 Thread Antonio Murgia (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551312#comment-16551312
 ] 

Antonio Murgia commented on SPARK-24862:


Yeah, they are definitely not supported. Therefore I think the encoder 
generator should generate the schema based on the first parameter list and the 
ser/de based on all the parameter lists. I can put together a PR if you’d like.
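
As a rough sketch of the distinction proposed here, Scala runtime reflection can report the constructor parameters grouped by parameter list, so the first list can be separated from the rest. This assumes {{scala-reflect}} is on the classpath; {{ParamLists}}, {{paramListNames}} and {{firstListNames}} are hypothetical helpers, not Spark's {{ScalaReflection}} API.

```scala
import scala.reflect.runtime.universe._

// The class from the issue description.
case class Multi(x: String)(y: Int)

object ParamLists {
  // Names of the primary constructor's parameters, grouped by parameter list.
  def paramListNames[T: TypeTag]: List[List[String]] = {
    val ctor = typeOf[T].typeSymbol.asClass.primaryConstructor.asMethod
    ctor.paramLists.map(_.map(_.name.toString))
  }

  // Candidate schema fields: the first parameter list only.
  def firstListNames[T: TypeTag]: List[String] =
    paramListNames[T].headOption.getOrElse(Nil)
}
```

Under these assumptions, {{ParamLists.firstListNames[Multi]}} would contain only {{x}}, which matches the arity of the product, while {{paramListNames[Multi]}} still shows every argument the constructor needs.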


[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-20 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551174#comment-16551174
 ] 

Liang-Chi Hsieh commented on SPARK-24862:

Even if we only retrieve the first parameter list in {{getConstructorParameters}}, 
when we need to deserialize {{Multi}}, we don't have {{y}} in the input columns 
because we only serialize {{x}}. I think case classes with multiple parameter 
lists are not supported by Encoder.
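
The round-trip problem described above can be illustrated without Spark. In this sketch a row is modelled as a plain {{Map}}, which is an assumption for illustration and not Spark's row format; {{RowRoundTrip}}, {{serialize}} and {{deserialize}} are hypothetical names, and {{y}} is declared as a {{val}} only so the result can be inspected.

```scala
// Same shape as the class in the issue; y is a val only so it can be read back.
case class Multi(x: String)(val y: Int)

object RowRoundTrip {
  // Stand-in for an encoded row: column name -> value.
  type Row = Map[String, Any]

  // Serializer restricted to the first parameter list, so only x is written.
  def serialize(m: Multi): Row = Map("x" -> m.x)

  // The constructor needs both parameter lists, so both columns must exist.
  def deserialize(row: Row): Either[String, Multi] =
    (row.get("x"), row.get("y")) match {
      case (Some(x: String), Some(y: Int)) => Right(Multi(x)(y))
      case (None, _)                       => Left("missing column x")
      case (_, None)                       => Left("missing column y")
      case _                               => Left("unexpected column type")
    }
}
```

Here {{RowRoundTrip.deserialize(RowRoundTrip.serialize(Multi("a")(1)))}} yields {{Left("missing column y")}}: once only {{x}} is serialized, there is no value left to pass to the second parameter list.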
