Re: Spark streaming action running the same work in parallel
I was able to get it working. Instead of using customers.flatMap to return alerts. I had to use the following: customers.foreachRDD(new FunctionJavaPairRDDlt;String, Iterablelt;QueueEvent, Void() { @Override public Void call(final JavaPairRDDString, Iterablelt;QueueEvent rdd) throws Exception { rdd.foreachPartition(new VoidFunctionIteratorlt;Tuple2lt;String, Iterablelt;QueueEvent() { @Override public void call(final IteratorTuple2lt;String, Iterablelt;QueueEvent i) throws Exception { } } } } This made sure that we only sent one alert per event for a customer. My unit test showed that there was one RDD that had both customers with their events as partitions. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-action-running-the-same-work-in-parallel-tp22613p22665.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark streaming action running the same work in parallel
Hi, I'm running a unit test that keeps failing to work with the code I wrote in Spark. Here is the output logs from my test that I ran that gets the customers from incoming events in the JSON called QueueEvent and I am trying to convert the incoming events for each customer to an alert. From the logs you can see that there is one RDD in the stream with 6 events (3 for each customer). Here is the code snippet that gets the customers and gets the alerts for all the customers. Spark is doing the same work but on different threads as you can see in the logs Executor task launch worker-x. This is throwing off the results when comparing against the expected results. Here is some of the code in the test for troubleshooting -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-action-running-the-same-work-in-parallel-tp22613.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org