Re: Spark streaming action running the same work in parallel

2015-04-27 Thread ColinMc
I was able to get it working. Instead of using customers.flatMap to return
alerts. I had to use the following:

customers.foreachRDD(new FunctionJavaPairRDDlt;String,
Iterablelt;QueueEvent, Void() {
@Override
public Void call(final JavaPairRDDString,
Iterablelt;QueueEvent rdd) throws Exception {
rdd.foreachPartition(new
VoidFunctionIteratorlt;Tuple2lt;String, Iterablelt;QueueEvent() {
@Override
public void call(final IteratorTuple2lt;String,
Iterablelt;QueueEvent i)
throws Exception {
}
   }
   }
}

This made sure that we only sent one alert per event for a customer. My unit
test showed that there was one RDD that had both customers with their events
as partitions.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-action-running-the-same-work-in-parallel-tp22613p22665.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark streaming action running the same work in parallel

2015-04-22 Thread ColinMc
Hi,

I'm running a unit test that keeps failing to work with the code I wrote in
Spark.

Here is the output logs from my test that I ran that gets the customers from
incoming events in the JSON called QueueEvent and I am trying to convert the
incoming events for each customer to an alert.



From the logs you can see that there is one RDD in the stream with 6 events
(3 for each customer).

Here is the code snippet that gets the customers and gets the alerts for all
the customers.



Spark is doing the same work but on different threads as you can see in the
logs Executor task launch worker-x. This is throwing off the results when
comparing against the expected results.

Here is some of the code in the test for troubleshooting




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-action-running-the-same-work-in-parallel-tp22613.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org