[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-10-17 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583873#comment-15583873
 ] 

Aris Vlasakakis commented on SPARK-17368:
------------------------------------------

That is great, thank you for the help with this.

> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>            Assignee: Jakob Odersky
>             Fix For: 2.1.0
>
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-09-22 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514085#comment-15514085
 ] 

Aris Vlasakakis commented on SPARK-17131:
------------------------------------------

Hi there,

I discovered a bug that also pertains to code generation with many columns 
-- although in my case the failures in Catalyst's Janino code generation 
start after several hundred columns. Are these somehow related?

My bug report was merged into this one: 
[https://issues.apache.org/jira/browse/SPARK-16845]
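
For anyone who wants to poke at this without a 1776-column CSV, here is a 
minimal sketch (my own hypothetical repro, untested against this exact 
issue; the column count, names, and failure threshold are assumptions) that 
builds a wide DataFrame programmatically and then triggers code generation 
via {{describe()}}:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object WideDatasetRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    val base = spark.range(100).toDF("id")
    // Add ~2000 integer literal columns; each literal lands in the
    // generated class's constant pool, which the JVM caps at 0xFFFF entries.
    val wide = (1 to 2000).foldLeft(base) { (df, i) =>
      df.withColumn(s"c$i", lit(i))
    }
    // describe() aggregates every numeric column at once, so it produces a
    // much larger generated class than a plain select does.
    wide.describe().show()
  }
}
{code}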

> Code generation fails when running SQL expressions against a wide dataset 
> (thousands of columns)
> --------------------------------------------------------------------------
>
>                 Key: SPARK-17131
>                 URL: https://issues.apache.org/jira/browse/SPARK-17131
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Iaroslav Zeigerman
>
> When reading a CSV file that contains 1776 columns, Spark and Janino fail 
> to generate the code, with the message:
> {noformat}
> Constant pool has grown past JVM limit of 0xFFFF
> {noformat}
> Running a plain select with all columns works fine:
> {code}
>   val allCols = df.columns.map(c => col(c).as(c + "_alias"))
>   val newDf = df.select(allCols: _*)
>   newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
>   newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>   at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has 
> grown past JVM limit of 0xFFFF
>   at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
>   at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
>   at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
>   at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
>   at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
>   at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>   at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
>   at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
>   at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>   at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
>   at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> {noformat}






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-06 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469564#comment-15469564
 ] 

Aris Vlasakakis commented on SPARK-17368:
------------------------------------------

It goes from inconvenient to actually prohibitive in a practical sense. I 
have a Dataset[Something], and inside case class Something I have various 
other case classes, and somewhere inside them there is a particular value 
class. Manually unwrapping and rewrapping everything is so painful that at 
this point I just decided to eat the performance cost and use a regular case 
class instead of a value class (I removed the 'extends AnyVal').

More generally, specifically accommodating value classes is *really hard* in 
a practical setting: if I have a whole bunch of ADTs and other case classes 
I'm working with, how do I know whether anywhere in my domain I used a 
*value class*, so that I suddenly have to jump through a bunch of hoops just 
so Spark doesn't blow up? If I just had a Dataset[ThisIsAValueClass], with 
the top-level class being a value class, what you're saying is easy, but in 
practice the value class is one of many things nested somewhere deeper.
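
To make that concrete, a minimal sketch of the shape I mean (all names 
hypothetical):

{code}
// Hypothetical domain model: the value class is buried two levels deep.
case class FeatureId(v: Int) extends AnyVal   // the offending value class
case class Feature(id: FeatureId, weight: Double)
case class Something(name: String, features: Seq[Feature])

// spark.createDataset(Seq(Something("a", Seq(Feature(FeatureId(1), 0.5)))))
// breaks at runtime on 1.6.x/2.0. The workaround I settled on is simply
// dropping 'extends AnyVal' from FeatureId and eating the allocation cost.
{code}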

> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-02 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459649#comment-15459649
 ] 

Aris Vlasakakis commented on SPARK-17368:
------------------------------------------

I actually had an identical first thought, from my experience of value 
classes and how they disappear in the JVM bytecode. It would be very helpful 
if the documentation stated somewhere that Scala value classes are 
explicitly not supported by Spark Datasets; the error messages are extremely 
cryptic and confusing. Even better would be some kind of macro support or 
similar from Spark that would find and call out value classes in your code, 
but that's wishful thinking.
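
To illustrate what I mean by "disappear in the JVM bytecode" (a sketch; the 
exact compiled signatures vary by Scala version):

{code}
case class FeatureId(v: Int) extends AnyVal

object Example {
  // A method written against the wrapper type:
  def lookup(id: FeatureId): String = id.v.toString
  // After erasure this takes the underlying primitive -- roughly
  // lookup(int): String. The FeatureId wrapper is gone at the bytecode
  // level, which is why the encoder ends up looking at a bare int and
  // fails with "Couldn't find v on int".
}
{code}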

> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Commented] (SPARK-17367) Cannot define value classes in REPL

2016-09-02 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457816#comment-15457816
 ] 

Aris Vlasakakis commented on SPARK-17367:
------------------------------------------

No, value classes work in the regular Scala REPL, just not in {{spark-shell}}.
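
For example (a sketch from memory; the exact error text may vary by Scala 
version), the same definition that the plain Scala REPL accepts fails in 
{{spark-shell}} because every line is wrapped in an enclosing class:

{noformat}
scala> case class FeatureId(v: Int) extends AnyVal
<console>:11: error: value class may not be a member of another class
{noformat}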

> Cannot define value classes in REPL
> -----------------------------------
>
>                 Key: SPARK-17367
>                 URL: https://issues.apache.org/jira/browse/SPARK-17367
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>            Reporter: Jakob Odersky
>
> It is currently not possible to define a class extending {{AnyVal}} in the 
> REPL. The underlying reason is the {{-Yrepl-class-based}} option used by 
> Spark Shell.
> This report is more of an FYI for anyone stumbling upon the problem; see 
> the upstream issue [https://issues.scala-lang.org/browse/SI-9910] for 
> progress.






[jira] [Commented] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456979#comment-15456979
 ] 

Aris Vlasakakis commented on SPARK-17368:
------------------------------------------

Yes, agreed that this code cannot be pasted into {{spark-shell}}. My 
assumption was that it would be compiled into a JAR and then passed to 
{{spark-submit}} -- a normal Spark application. When you do so, the code 
compiles but Spark breaks at runtime.
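
Concretely, something like this (the JAR path and build layout are 
hypothetical; only the standard {{--class}} and {{--master}} flags matter 
here):

{noformat}
sbt package
spark-submit --class BreakSpark --master "local[*]" \
  target/scala-2.11/breakspark_2.11-0.1.jar
{noformat}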

> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Updated] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Aris Vlasakakis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aris Vlasakakis updated SPARK-17368:

Description: 
Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
and 1.6.X.

This simple Spark 2 application demonstrates that the code compiles but 
breaks at runtime with the error below. The value class is of course 
*FeatureId*, as it extends AnyVal.

{noformat}
Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
java.lang.RuntimeException: Couldn't find v on int
assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
+- assertnotnull(input[0, int, true], top level non-flat input object).v
   +- assertnotnull(input[0, int, true], top level non-flat input object)
      +- input[0, int, true]".
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
{noformat}

Test code for Spark 2.0.0:

{noformat}
import org.apache.spark.sql.{Dataset, SparkSession}

object BreakSpark {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._
    spark.sparkContext.setLogLevel("warn")
    val ds: Dataset[FeatureId] = spark.createDataset(seq)
    println(s"BREAK HERE: ${ds.count}")
  }
}
{noformat}


  was:
Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
and 1.6.X.

This simple Spark 2 application demonstrates that the code compiles but 
breaks at runtime with the error below. The value class is of course 
*FeatureId*, as it extends AnyVal.

{noformat}
Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
java.lang.RuntimeException: Couldn't find v on int
assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
+- assertnotnull(input[0, int, true], top level non-flat input object).v
   +- assertnotnull(input[0, int, true], top level non-flat input object)
      +- input[0, int, true]".
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
{noformat}

Test code:

{noformat}
import org.apache.spark.sql.{Dataset, SparkSession}

object BreakSpark {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._
    spark.sparkContext.setLogLevel("warn")
    val ds: Dataset[FeatureId] = spark.createDataset(seq)
    println(s"BREAK HERE: ${ds.count}")
  }
}
{noformat}



> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code for Spark 2.0.0:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}

[jira] [Updated] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Aris Vlasakakis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aris Vlasakakis updated SPARK-17368:

Environment: 
JDK 8 on MacOS
Scala 2.11.8
Spark 2.0.0

  was:Java 8 on MacOS


> Scala value classes create encoder problems and break at runtime
> -----------------------------------------------------------------
>
>                 Key: SPARK-17368
>                 URL: https://issues.apache.org/jira/browse/SPARK-17368
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2, 2.0.0
>         Environment: JDK 8 on MacOS
> Scala 2.11.8
> Spark 2.0.0
>            Reporter: Aris Vlasakakis
>
> Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
> and 1.6.X.
> This simple Spark 2 application demonstrates that the code compiles but 
> breaks at runtime with the error below. The value class is of course 
> *FeatureId*, as it extends AnyVal.
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
> java.lang.RuntimeException: Couldn't find v on int
> assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
> +- assertnotnull(input[0, int, true], top level non-flat input object).v
>    +- assertnotnull(input[0, int, true], top level non-flat input object)
>       +- input[0, int, true]".
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
>   at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
> {noformat}
> Test code:
> {noformat}
> import org.apache.spark.sql.{Dataset, SparkSession}
>
> object BreakSpark {
>   case class FeatureId(v: Int) extends AnyVal
>
>   def main(args: Array[String]): Unit = {
>     val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
>     val spark = SparkSession.builder.getOrCreate()
>     import spark.implicits._
>     spark.sparkContext.setLogLevel("warn")
>     val ds: Dataset[FeatureId] = spark.createDataset(seq)
>     println(s"BREAK HERE: ${ds.count}")
>   }
> }
> {noformat}






[jira] [Created] (SPARK-17368) Scala value classes create encoder problems and break at runtime

2016-09-01 Thread Aris Vlasakakis (JIRA)
Aris Vlasakakis created SPARK-17368:
-------------------------------------

             Summary: Scala value classes create encoder problems and break at runtime
                 Key: SPARK-17368
                 URL: https://issues.apache.org/jira/browse/SPARK-17368
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.0.0, 1.6.2
         Environment: Java 8 on MacOS
            Reporter: Aris Vlasakakis


Using Scala value classes as the inner type for Datasets breaks in Spark 2.0 
and 1.6.X.

This simple Spark 2 application demonstrates that the code compiles but 
breaks at runtime with the error below. The value class is of course 
*FeatureId*, as it extends AnyVal.

{noformat}
Exception in thread "main" java.lang.RuntimeException: Error while encoding: 
java.lang.RuntimeException: Couldn't find v on int
assertnotnull(input[0, int, true], top level non-flat input object).v AS v#0
+- assertnotnull(input[0, int, true], top level non-flat input object).v
   +- assertnotnull(input[0, int, true], top level non-flat input object)
      +- input[0, int, true]".
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:279)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
  at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:421)
{noformat}

Test code:

{noformat}
import org.apache.spark.sql.{Dataset, SparkSession}

object BreakSpark {
  case class FeatureId(v: Int) extends AnyVal

  def main(args: Array[String]): Unit = {
    val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._
    spark.sparkContext.setLogLevel("warn")
    val ds: Dataset[FeatureId] = spark.createDataset(seq)
    println(s"BREAK HERE: ${ds.count}")
  }
}
{noformat}



