Hi Cheng,

Thank you for your response. I tried your solution:

    .mapValues { positions =>
      for {
        a <- positions.iterator
        b <- positions.iterator
        if lessThan(a, b) && distance(a, b) < 100
      } yield {
        (a, b)
      }
    }
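(In case the definitions matter: lessThan and distance are just small helpers of mine over (Double, Double) positions, roughly the simplified sketch below.)

    // Positions are (Double, Double) pairs in my data (simplified here).
    type Pos = (Double, Double)

    // Arbitrary total ordering on pairs, so each unordered pair is emitted only once.
    def lessThan(a: Pos, b: Pos): Boolean =
      a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

    // Plain Euclidean distance between two positions (units are my own).
    def distance(a: Pos, b: Pos): Double =
      math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2))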
I got the result:

    res29: org.apache.spark.rdd.RDD[(String, Iterator[((Double, Double), (Double, Double))])] = MappedValuesRDD[30] at mapValues at <console>:33

But when I try to print the first element of the result with res29.first, I get the following exception:

    java.io.NotSerializableException: scala.collection.Iterator$$anon$13
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1181)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1375)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1171)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    14/06/05 07:09:53 WARN TaskSetManager: Lost TID 15 (task 26.0:0)
    14/06/05 07:09:53 ERROR TaskSetManager: Task 26.0:0 had a not serializable result: java.io.NotSerializableException: scala.collection.Iterator$$anon$13; not retrying

Can you please let me know how I can get past this problem?
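My guess is that the for-comprehension over positions.iterator yields a scala.collection.Iterator, which is not serializable, so the task fails when Spark tries to ship the result of first back to the driver. Would materializing the pairs into a concrete collection, roughly like the sketch below (just my attempt at a fix, not sure it is the idiomatic way), be the right way around this?

    .mapValues { positions =>
      (for {
        a <- positions.iterator
        b <- positions.iterator
        if lessThan(a, b) && distance(a, b) < 100
      } yield (a, b)).toVector  // materialize so the per-key result is serializable
    }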