For a class project, I am trying to make two Spark applications communicate with each other by passing an RDD object created in one application to the other over the network. The first application is written in Scala; it creates an RDD and sends it to the second application as follows:
    val logFile = "../../spark-1.3.0/README.md" // Should be some file on your system
    val conf = new SparkConf()
    conf.setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val nums = sc.parallelize(1 to 100, 2).toJavaRDD()
    val s = new Socket("127.0.0.1", 8000)
    val objectOutput = new ObjectOutputStream(s.getOutputStream())
    objectOutput.writeObject(nums)
    s.close()

The second application is written in Java; it tries to receive the RDD object and then perform some operations on it. At the moment, I am just trying to verify that I have received the object properly:

    ServerSocket listener = null;
    Socket client;
    try {
        listener = new ServerSocket(8000);
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Listening");
    try {
        client = listener.accept();
        ObjectInputStream objectInput = new ObjectInputStream(client.getInputStream());
        JavaRDD tmp = (JavaRDD) objectInput.readObject();
        if (tmp != null) {
            System.out.println(tmp.getStorageLevel().toString());
            List<Partition> p = tmp.partitions();
        } else {
            System.out.println("variable is null");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

The output I get is:

    StorageLevel(false, false, false, false, 1)
    java.lang.NullPointerException
        at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:154)
        at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:56)
        at org.apache.spark.api.java.JavaRDD.partitions(JavaRDD.scala:32)
        at SimpleApp.main(SimpleApp.java:35)

So System.out.println(tmp.getStorageLevel().toString()); prints properly.
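I suspect the storage level survives because it is plain metadata, while fields of the RDD that are marked transient (such as the backing collection of a ParallelCollectionRDD) come back as null after Java serialization. A minimal plain-Java sketch of that behavior, independent of Spark (the class and field names here are mine, chosen only for illustration):

```java
import java.io.*;
import java.util.*;

public class TransientDemo {
    // Illustrative stand-in for an RDD: plain metadata serializes,
    // but the transient payload is dropped and deserializes as null.
    static class Holder implements Serializable {
        String name;                  // survives serialization
        transient List<Integer> data; // dropped by Java serialization

        Holder(String name, List<Integer> data) {
            this.name = name;
            this.data = data;
        }
    }

    public static void main(String[] args) throws Exception {
        Holder original = new Holder("nums", Arrays.asList(1, 2, 3));

        // Round-trip through Java serialization, as the socket code does.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(original);
        out.flush();
        Holder copy = (Holder) new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())).readObject();

        System.out.println(copy.name);         // prints "nums"
        System.out.println(copy.data == null); // prints "true"
    }
}
```

Any method on the deserialized object that only touches the non-transient metadata works, while anything that needs the transient payload hits a null.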
But List<Partition> p = tmp.partitions(); throws the NullPointerException, and I can't figure out why. In a nutshell, I am trying to create an RDD object in one Spark application and send it to another; after receiving it, I try to make sure the transfer worked by calling its methods. Invoking partitions() in the original Spark application does not throw any errors. I would greatly appreciate any suggestions on how to solve this, or an alternative approach for what I am trying to accomplish.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Sending-RDD-object-over-the-network-tp22382.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
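P.S. One alternative I have been considering is to ship the data instead of the RDD handle: call collect() on the sender, write the resulting list over the socket, and rebuild the RDD on the receiving side with that application's own SparkContext via parallelize. A self-contained sketch of the socket part (both ends run in one JVM here purely for illustration; the port and class names are made up, and the SparkContext call is shown only as a comment):

```java
import java.io.*;
import java.net.*;
import java.util.*;

public class SendDataNotRdd {
    public static void main(String[] args) throws Exception {
        // Receiver: accepts one connection and reads a plain List.
        ServerSocket listener = new ServerSocket(0); // ephemeral port
        int port = listener.getLocalPort();

        Thread receiver = new Thread(() -> {
            try (Socket client = listener.accept();
                 ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                @SuppressWarnings("unchecked")
                List<Integer> data = (List<Integer>) in.readObject();
                // In the real application, rebuild the RDD locally:
                // JavaRDD<Integer> rdd = sc.parallelize(data);
                System.out.println("received " + data.size() + " elements");
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        receiver.start();

        // Sender: in the Scala app this would be nums.collect() first,
        // then writing the resulting list instead of the RDD itself.
        List<Integer> nums = new ArrayList<>();
        for (int i = 1; i <= 100; i++) nums.add(i);
        try (Socket s = new Socket("127.0.0.1", port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.writeObject(nums);
        }
        receiver.join();
        listener.close();
    }
}
```

Since ArrayList is fully serializable (no transient payload), the data arrives intact, at the cost of materializing the whole collection on the sender.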