I can't seem to get Spark to run the tasks in parallel. My Spark code is the
following:

// Create commands to be piped into a C++ program
List<String> commandList = makeCommandList(Integer.parseInt(step.first()), 100);

// One partition per command, so each command should go to its own task
JavaRDD<String> commandListRDD = ctx.parallelize(commandList, commandList.size());

// Run the C++ application
JavaRDD<String> workerOutput = commandListRDD.pipe("RandomC++Application");
workerOutput.saveAsTextFile("output");
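
As a sanity check, something like the probe below should tell serial and
parallel execution apart (a minimal sketch, assuming ctx is the same
JavaSparkContext as above; the element count of 14 and the 5-second sleep are
arbitrary). With one element per partition piped through "sleep 5", a parallel
run should take roughly 5 seconds of wall-clock time, a serial run roughly
14 x 5 seconds:

// Probe: one element per partition, each partition piped through "sleep 5"
List<String> dummy = new java.util.ArrayList<String>();
for (int i = 0; i < 14; i++) {
    dummy.add("x");
}
JavaRDD<String> probe = ctx.parallelize(dummy, dummy.size());
long start = System.currentTimeMillis();
probe.pipe("sleep 5").count();  // count() forces the pipe to actually run
System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));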

Running this code appears to make the system run all the tasks in series
rather than in parallel. Any ideas as to what could be wrong? I'm guessing
that there is an issue with the serializer, judging from the sample output
below (a quick way to test that guess is sketched after the log):
14/05/12 17:17:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 14 tasks
14/05/12 17:17:32 INFO TaskSetManager: Starting task 1.0:0 as TID 0 on executor 2: neuro-1-3.local (PROCESS_LOCAL)
14/05/12 17:17:32 INFO TaskSetManager: Serialized task 1.0:0 as 4888 bytes in 9 ms
14/05/12 17:17:32 INFO TaskSetManager: Starting task 1.0:1 as TID 1 on executor 5: neuro-2-0.local (PROCESS_LOCAL)
14/05/12 17:17:32 INFO TaskSetManager: Serialized task 1.0:1 as 4890 bytes in 1 ms
14/05/12 17:17:32 INFO TaskSetManager: Starting task 1.0:2 as TID 2 on executor 12: neuro-1-4.local (PROCESS_LOCAL)
14/05/12 17:17:32 INFO TaskSetManager: Serialized task 1.0:2 as 4890 bytes in 1 ms
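
If the serializer really is the problem, I would expect switching to Kryo when
creating the context to change the behaviour. A minimal sketch of what I mean
(assuming the context is built from a SparkConf; "pipe-parallelism-test" is
just a placeholder application name):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("pipe-parallelism-test")
        // use Kryo instead of the default Java serializer
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
JavaSparkContext ctx = new JavaSparkContext(conf);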




