Re: SparkSQL where with BigDecimal type gives stacktrace

2014-03-30 Thread smallmonkey...@hotmail.com
Can I get the whole operation? Then I can try to locate the error.




smallmonkey...@hotmail.com

From: Manoj Samel
Date: 2014-03-31 01:16
To: user
Subject: SparkSQL where with BigDecimal type gives stacktrace
Hi,


If I do a where on BigDecimal, I get a stack trace. Changing BigDecimal to 
Double works ...

scala> case class JournalLine(account: String, credit: BigDecimal, debit: 
BigDecimal, date: String, company: String, currency: String, costcenter: 
String, region: String)
defined class JournalLine
...
scala> jl.where('credit  0).foreach(println)
scala.MatchError: scala.BigDecimal (of class 
scala.reflect.internal.Types$TypeRef$$anon$3)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:41)
at 
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:45)
at 
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:45)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:45)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:38)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:32)
at 
org.apache.spark.sql.execution.ExistingRdd$.fromProductRdd(basicOperators.scala:128)
at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:79)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC.<init>(<console>:50)
at $iwC.<init>(<console>:52)
at <init>(<console>:54)
at .<init>(<console>:58)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:777)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:600)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:607)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:610)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:935)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at 
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:883)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:981)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)


Thanks,

Re: SparkSQL where with BigDecimal type gives stacktrace

2014-03-30 Thread Manoj Samel
Hi,

Would the same issue be present for other Java types like Date?

Converting the person/teenager example on Patrick's page reproduces the
problem ...

Thanks,


scala> import scala.math
import scala.math

scala> case class Person(name: String, age: BigDecimal)
defined class Person

scala> val people =
sc.textFile("/data/spark/examples/src/main/resources/people.txt").map(_.split(",")).map(p
=> Person(p(0), BigDecimal(p(1).trim.toInt)))
14/03/31 00:23:40 INFO MemoryStore: ensureFreeSpace(32960) called with
curMem=0, maxMem=308713881
14/03/31 00:23:40 INFO MemoryStore: Block broadcast_0 stored as values to
memory (estimated size 32.2 KB, free 294.4 MB)
people: org.apache.spark.rdd.RDD[Person] = MappedRDD[3] at map at
<console>:20

scala> people take 1
...

scala> val t = people.where('age  12 )
scala.MatchError: scala.BigDecimal (of class
scala.reflect.internal.Types$TypeRef$$anon$3)
at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:41)
at
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:45)
at
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:45)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:45)
at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:38)
at
org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:32)
at
org.apache.spark.sql.execution.ExistingRdd$.fromProductRdd(basicOperators.scala:128)
at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:79)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC.<init>(<console>:33)
at $iwC.<init>(<console>:35)
at <init>(<console>:37)
at .<init>(<console>:41)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:777)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:600)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:607)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:610)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:935)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:883)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:981)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)



On Sun, Mar 30, 2014 at 11:04 AM, Aaron Davidson ilike...@gmail.com wrote:

 Well, the error is coming from this case statement not matching on the
 BigDecimal type:
 https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L41

 This seems to be a bug because there is a corresponding Catalyst DataType
 for BigDecimal, just no way to produce a schema for it. A patch should be
 straightforward enough to match against typeOf[BigDecimal] assuming this
 was not for some reason intentional.
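
To illustrate the failure mode described above, here is a minimal sketch of a type-directed match that has no case for BigDecimal, next to a variant that adds one. It is not the actual Catalyst code: the object SchemaSketch, the SimpleType hierarchy, and the two schemaFor functions are hypothetical names invented for this example, and it assumes scala-reflect is on the classpath.

```scala
import scala.reflect.runtime.universe._

object SchemaSketch {
  // Hypothetical stand-ins for data types; NOT the real Catalyst classes.
  sealed trait SimpleType
  case object StringT  extends SimpleType
  case object IntT     extends SimpleType
  case object DoubleT  extends SimpleType
  case object DecimalT extends SimpleType

  // No case covers BigDecimal, so matching on it falls through
  // every guard and throws scala.MatchError at runtime.
  def schemaForBefore(tpe: Type): SimpleType = tpe match {
    case t if t <:< typeOf[String] => StringT
    case t if t <:< typeOf[Int]    => IntT
    case t if t <:< typeOf[Double] => DoubleT
  }

  // Same match with one extra case for typeOf[BigDecimal],
  // along the lines of the patch suggested above.
  def schemaForAfter(tpe: Type): SimpleType = tpe match {
    case t if t <:< typeOf[String]     => StringT
    case t if t <:< typeOf[Int]        => IntT
    case t if t <:< typeOf[Double]     => DoubleT
    case t if t <:< typeOf[BigDecimal] => DecimalT
  }
}
```

In this sketch, SchemaSketch.schemaForBefore(typeOf[BigDecimal]) throws a scala.MatchError, while SchemaSketch.schemaForAfter(typeOf[BigDecimal]) returns DecimalT. Whether the real fix is this simple depends on how the rest of Spark SQL handles decimal values, which this example does not model.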


 On Sun, Mar 30, 2014 at 10:43 AM, smallmonkey...@hotmail.com 
 smallmonkey...@hotmail.com wrote:

  can I get the whole operation? then i can try to locate  the error

 --
  smallmonkey...@hotmail.com

  *From:* Manoj Samel manojsamelt...@gmail.com
 *Date:* 2014-03-31 01:16
 *To:* user user@spark.apache.org
 *Subject:* SparkSQL where with BigDecimal type gives stacktrace
  Hi,

 If I do a where on BigDecimal, I get a stack trace. Changing