Hi Cheng, thank you for letting me know. So what do you think is a better way to debug?
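For example, would pulling a small sample back to the driver before printing be a reasonable pattern? A rough sketch of what I have in mind (standard RDD calls; the fraction 0.1 and seed 42 are just made-up values):

// take() returns a plain Array to the driver, so println here runs in the
// driver console and stays visible even in cluster mode, unlike println
// inside an RDD closure, which ends up in executor stdout.
scala> d5.take(3).foreach(println)

// For a large RDD, sample a fraction instead of taking the first rows:
scala> d5.sample(false, 0.1, 42).collect().foreach(println)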
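Also, re-reading Daniel's explanation of reduceByKey() below, I replayed the same steps with plain Scala reduce() to convince myself why v1 becomes "4" (a local sketch over the three lines of 5.txt, no Spark involved):

scala> val lines = Seq("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5")
scala> lines.reduce((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString)
// 1st call: v1 = "1 2 3 4 5", v2 = "1 2 3 4 5"  =>  "4"
// 2nd call: v1 = "4", v2 = "1 2 3 4 5"
//           "4".split(" ") has a single element, so index (1) throws
//           java.lang.ArrayIndexOutOfBoundsException: 1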
On Fri, Apr 18, 2014 at 9:27 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> A tip: using println is only convenient when you are working with local
> mode. When running Spark in cluster mode (standalone/YARN/Mesos), the
> output of println goes to executor stdout.
>
>
> On Fri, Apr 18, 2014 at 6:53 AM, 诺铁 <noty...@gmail.com> wrote:
>
>> Yeah, I got it! Using println to debug is great for exploring Spark.
>> Thank you very much for your kind help.
>>
>>
>> On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos <
>> daniel.dara...@lynxanalytics.com> wrote:
>>
>>> Here's a way to debug something like this:
>>>
>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => {
>>>          println("v1: " + v1)
>>>          println("v2: " + v2)
>>>          (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
>>>        }).collect
>>>
>>> You get:
>>> v1: 1 2 3 4 5
>>> v2: 1 2 3 4 5
>>> v1: 4
>>> v2: 1 2 3 4 5
>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>
>>> reduceByKey() works kind of like regular Scala reduce(): it calls the
>>> function on the first two values, then on the result of that and the
>>> next value, and so on. First you add 2 + 2 and get 4. Then your
>>> function is called with v1 = "4" and v2 = the third line, and
>>> "4".split(" ")(1) is out of bounds.
>>>
>>> What you could do instead:
>>>
>>> scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect
>>>
>>>
>>> On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new to Spark. When trying to write some simple tests in the
>>>> Spark shell, I ran into the following problem.
>>>>
>>>> I created a very small text file named 5.txt:
>>>>
>>>> 1 2 3 4 5
>>>> 1 2 3 4 5
>>>> 1 2 3 4 5
>>>>
>>>> and experimented in the Spark shell:
>>>>
>>>> scala> val d5 = sc.textFile("5.txt").cache()
>>>> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at
>>>> <console>:12
>>>>
>>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>>>>
>>>> Then this error occurs:
>>>>
>>>> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
>>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>         at org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>>>>
>>>> When I delete one line from the file, making it two lines, the result
>>>> is correct. I don't understand what the problem is. Please help me.
>>>> Thanks.