Here's a way to debug something like this:

scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) => {
           println("v1: " + v1)
           println("v2: " + v2)
           (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
       }).collect

You get:
v1: 1 2 3 4 5
v2: 1 2 3 4 5
v1: 4
v2: 1 2 3 4 5
java.lang.ArrayIndexOutOfBoundsException: 1

reduceByKey() works much like regular Scala reduce(): it calls the function on
the first two values, then on the result of that and the next value, then on
the result of that and the next value, and so on. So first it adds 2 + 2 (the
second fields of the first two lines) and gets "4". Then your function is
called with v1 = "4" and v2 = the third line, and "4".split(" ")(1) no longer
exists, which is exactly the ArrayIndexOutOfBoundsException: 1 you see.
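
You can reproduce the same call order with plain Scala reduce() outside Spark.
A minimal sketch, just to show the same failure on the three lines of 5.txt:

scala> Seq("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5").reduce((v1, v2) =>
           (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString)

The first call returns "4"; the second call then fails in the same way.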

What you could do instead:

scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect


On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:

> Hi,
>
> I am new to Spark. When trying to write some simple tests in the Spark
> shell, I met the following problem.
>
> I created a very small text file named 5.txt:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> and experimented in the Spark shell:
>
> scala> val d5 = sc.textFile("5.txt").cache()
> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at
> <console>:12
>
> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) =>
>            (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>
> then this error occurs:
> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
> java.lang.ArrayIndexOutOfBoundsException: 1
> at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>  at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
> at
> org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>
> When I delete one line from the file, making it 2 lines, the result is
> correct. I don't understand what the problem is. Please help me, thanks.
>
>
