Re: scala.Double vs java.lang.Double in RDD

2015-03-04 Thread Imran Rashid
This doesn't involve Spark at all; I think this is entirely an issue with
how Scala deals with primitives and boxing.  Often it can hide the details
for you, but IMO it just leads to far more confusing errors when things
don't work out.  The issue here is that your map has value type Any, which
leads Scala to leave each value as a boxed java.lang.Double.

scala> val x = 1.5
x: Double = 1.5
scala> x.getClass()
res0: Class[Double] = double
scala> x.getClass() == classOf[java.lang.Double]
res1: Boolean = false
scala> x.getClass() == classOf[Double]
res2: Boolean = true
scala> val arr = Array(1.5,2.5)
arr: Array[Double] = Array(1.5, 2.5)
scala> arr.getClass().getComponentType() == x.getClass()
res5: Boolean = true
scala> arr.getClass().getComponentType() == classOf[java.lang.Double]
res6: Boolean = false

// this map has java.lang.Double
scala> val map: Map[String, Any] = arr.map{x => x.toString -> x}.toMap
map: Map[String,Any] = Map(1.5 -> 1.5, 2.5 -> 2.5)
scala> map("1.5").getClass()
res15: Class[_] = class java.lang.Double
scala> map("1.5").getClass() == x.getClass()
res10: Boolean = false
scala> map("1.5").getClass() == classOf[java.lang.Double]
res11: Boolean = true
// this one has Double
scala> val map2: Map[String, Double] = arr.map{x => x.toString -> x}.toMap
map2: Map[String,Double] = Map(1.5 -> 1.5, 2.5 -> 2.5)
scala> map2("1.5").getClass()
res12: Class[Double] = double
scala> map2("1.5").getClass() == x.getClass()
res13: Boolean = true
scala> map2("1.5").getClass() == classOf[java.lang.Double]
res14: Boolean = false
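
For a compiled (non-REPL) version of the same point, here is a minimal sketch. The object and method names (BoxingDemo, isScalaDouble, describe) are made up for illustration and are not from the original code. It shows that a value held as Any is boxed at runtime, and that a type test or pattern match against Double still accepts the boxed value, so those are safer checks than comparing getClass results.

```scala
object BoxingDemo {
  // Static type Double: getClass reports the primitive class `double`.
  val primitive: Double = 1.5

  // Static type Any: the same value, but boxed as java.lang.Double at runtime.
  val asAny: Any = primitive

  // A type test against scala.Double accepts the boxed representation.
  def isScalaDouble(v: Any): Boolean = v.isInstanceOf[Double]

  // A pattern match against Double also accepts it and unboxes the value.
  def describe(v: Any): String = v match {
    case d: Double => s"Double($d)"
    case other     => s"other(${other.getClass.getName})"
  }

  def main(args: Array[String]): Unit = {
    println(primitive.getClass)    // double
    println(asAny.getClass)        // class java.lang.Double
    println(isScalaDouble(asAny))  // true
    println(describe(asAny))       // Double(1.5)
  }
}
```

If the values really are always doubles, declaring the result as Map[Long, Double] instead of Map[Long, Any] avoids the question entirely.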


On Wed, Mar 4, 2015 at 3:17 AM, Tobias Pfeiffer t...@preferred.jp wrote:

> Hi,
>
> I have a function with signature
>
>   def aggFun1(rdd: RDD[(Long, (Long, Double))]): RDD[(Long, Any)]
>
> and one with
>
>   def aggFun2[_Key: ClassTag, _Index](rdd: RDD[(_Key, (_Index, Double))]): RDD[(_Key, Double)]
>
> where all Double classes involved are scala.Double classes (according
> to IDEA) and my implementation of aggFun1 is just calling aggFun2 (type
> parameters _Key and _Index are inferred by the Scala compiler).
>
> Now I am writing a test as follows:
>
>   val result: Map[Long, Any] = aggFun1(input).collect().toMap
>   result.values.foreach(v => println(v.getClass))
>   result.values.foreach(_ shouldBe a[Double])
>
> and I get the following output:
>
>   class java.lang.Double
>   class java.lang.Double
>   [info] avg
>   [info] - should compute the average *** FAILED ***
>   [info]   1.75 was not an instance of double, but an instance of java.lang.Double
>
> So I am wondering about what magic is going on here. Are scala.Double
> values in RDDs automatically converted to java.lang.Doubles or am I just
> missing the implicit back-conversion etc.?
>
> Any help appreciated,
> Tobias




Re: scala.Double vs java.lang.Double in RDD

2015-03-04 Thread Tobias Pfeiffer
Hi,

On Thu, Mar 5, 2015 at 12:20 AM, Imran Rashid iras...@cloudera.com wrote:

> This doesn't involve Spark at all; I think this is entirely an issue with
> how Scala deals with primitives and boxing.  Often it can hide the details
> for you, but IMO it just leads to far more confusing errors when things
> don't work out.  The issue here is that your map has value type Any, which
> leads Scala to leave each value as a boxed java.lang.Double.


I see, thank you very much for your explanation and the code examples!
Helps very much!

Thanks
Tobias


scala.Double vs java.lang.Double in RDD

2015-03-04 Thread Tobias Pfeiffer
Hi,

I have a function with signature

  def aggFun1(rdd: RDD[(Long, (Long, Double))]): RDD[(Long, Any)]

and one with

  def aggFun2[_Key: ClassTag, _Index](rdd: RDD[(_Key, (_Index, Double))]): RDD[(_Key, Double)]

where all Double classes involved are scala.Double classes (according
to IDEA) and my implementation of aggFun1 is just calling aggFun2 (type
parameters _Key and _Index are inferred by the Scala compiler).

Now I am writing a test as follows:

  val result: Map[Long, Any] = aggFun1(input).collect().toMap
  result.values.foreach(v => println(v.getClass))
  result.values.foreach(_ shouldBe a[Double])

and I get the following output:

  class java.lang.Double
  class java.lang.Double
  [info] avg
  [info] - should compute the average *** FAILED ***
  [info]   1.75 was not an instance of double, but an instance of java.lang.Double

So I am wondering about what magic is going on here. Are scala.Double
values in RDDs automatically converted to java.lang.Doubles or am I just
missing the implicit back-conversion etc.?

Any help appreciated,
Tobias
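
The wording of the failure ("not an instance of double") suggests the assertion compared the value against the primitive class double. The sketch below shows why such a comparison rejects a boxed value. It assumes the expected class is taken from a ClassTag, which is a guess about how the matcher works and is not confirmed anywhere in this thread; naiveIsA is a hypothetical helper, not a ScalaTest function.

```scala
import scala.reflect.{ClassTag, classTag}

object ClassTagDemo {
  // Hypothetical helper: checks a value against the runtime class
  // carried by the ClassTag for T, with no special handling of primitives.
  def naiveIsA[T: ClassTag](v: Any): Boolean =
    classTag[T].runtimeClass.isInstance(v)

  // Held as Any, so the value is a boxed java.lang.Double at runtime.
  val boxed: Any = 1.75

  def main(args: Array[String]): Unit = {
    // The ClassTag for scala.Double carries the primitive class.
    println(classTag[Double].runtimeClass)      // double
    // No object is ever an instance of a primitive class.
    println(naiveIsA[Double](boxed))            // false
    // Asking for the boxed class matches the runtime value.
    println(naiveIsA[java.lang.Double](boxed))  // true
  }
}
```

Under that assumption, asserting against java.lang.Double, or typing the result as Map[Long, Double] and dropping the class check, should sidestep the failure.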