For a class project, I am trying to have two Spark applications communicate
with each other by passing an RDD created in one application to the other
over the network. The first application is written in Scala; it creates an
RDD and sends it to the second application as follows:

    import java.io.ObjectOutputStream
    import java.net.Socket
    import org.apache.spark.{SparkConf, SparkContext}

    val logFile = "../../spark-1.3.0/README.md" // Should be some file on your system
    val conf = new SparkConf()
    conf.setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val nums = sc.parallelize(1 to 100, 2).toJavaRDD()
    // Serialize the RDD object itself and send it over a socket
    val s = new Socket("127.0.0.1", 8000)
    val objectOutput = new ObjectOutputStream(s.getOutputStream())
    objectOutput.writeObject(nums)
    objectOutput.flush()
    s.close()
The second Spark application is written in Java; it tries to receive the RDD
object and then perform some operations on it. At the moment, I am just
checking whether I have properly obtained the object.

        // Imports needed: java.io.ObjectInputStream, java.net.ServerSocket,
        // java.net.Socket, java.util.List, org.apache.spark.Partition,
        // org.apache.spark.api.java.JavaRDD
        ServerSocket listener = null;
        Socket client;

        try {
            listener = new ServerSocket(8000);
        } catch (Exception e) {
            e.printStackTrace();
            return; // no point in continuing without a listening socket
        }
        System.out.println("Listening");
        try {
            client = listener.accept();
            ObjectInputStream objectInput =
                    new ObjectInputStream(client.getInputStream());
            JavaRDD<Integer> tmp = (JavaRDD<Integer>) objectInput.readObject();

            if (tmp != null) {
                System.out.println(tmp.getStorageLevel().toString());
                List<Partition> p = tmp.partitions();
            } else {
                System.out.println("variable is null");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
The output I get is:

    StorageLevel(false, false, false, false, 1)
    java.lang.NullPointerException
        at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:154)
        at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:56)
        at org.apache.spark.api.java.JavaRDD.partitions(JavaRDD.scala:32)
        at SimpleApp.main(SimpleApp.java:35)
So, System.out.println(tmp.getStorageLevel().toString()); prints out
properly, but List<Partition> p = tmp.partitions(); throws the
NullPointerException. I can't figure out why this is happening.

In a nutshell, I am trying to create an RDD in one Spark application and
then send that object to another application. After receiving it, I check
that it arrived properly by calling its methods. Invoking partitions() on
the RDD in the original application does not throw any errors. I would
greatly appreciate any suggestions on how to solve this problem, or an
alternative approach to what I am trying to accomplish.
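One alternative I have been considering is to send the underlying data
rather than the RDD object itself: call collect() on the sender, serialize
the resulting plain List over the socket, and call parallelize() again in
the receiving application. Below is a minimal sketch of that round trip in
plain Java; the in-memory byte streams stand in for the socket hop, and the
jsc.parallelize(...) step on the receiving side is only indicated in a
comment, since it would need a live JavaSparkContext:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class SendDataNotRdd {

    // Round-trips a list through Java serialization, standing in for
    // writing to and reading from the socket.
    static List<Integer> roundTrip(List<Integer> data) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // Serialize the plain data, not an RDD handle
            out.writeObject(new ArrayList<>(data));
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            List<Integer> received = (List<Integer>) in.readObject();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sender side: this is what nums.collect() would return
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 100; i++) data.add(i);

        // Receiver side: read the list back
        List<Integer> received = roundTrip(data);

        // In the real receiver this would become an RDD again, e.g.:
        // JavaRDD<Integer> rdd = jsc.parallelize(received);
        System.out.println(received.size()); // prints 100
    }
}
```

This avoids serializing the RDD object entirely, which I suspect may be the
safer route, since an RDD seems to be tied to the SparkContext that created
it rather than being a self-contained container of data.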



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Sending-RDD-object-over-the-network-tp22382.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
