Also, this is covered in bits and pieces in the streaming programming guide: http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
On Thu, Dec 11, 2014 at 4:55 AM, Ashic Mahtab <as...@live.com> wrote:
> That makes sense. I'll try that.
>
> Thanks :)
>
>> From: tathagata.das1...@gmail.com
>> Date: Thu, 11 Dec 2014 04:53:01 -0800
>> Subject: Re: "Session" for connections?
>> To: as...@live.com
>> CC: user@spark.apache.org
>>
>> You could create a lazily initialized singleton factory and connection
>> pool. Whenever an executor starts running the first task that needs to
>> push out data, it will create the connection pool as a singleton, and
>> subsequent tasks running on that executor will reuse the connection
>> pool. You will also have to shut the connections down intelligently,
>> because there is no obvious point at which to shut them down. You could
>> use a usage timeout: shut a connection down after it has not been used
>> for 10 x the batch interval.
>>
>> TD
>>
>> On Thu, Dec 11, 2014 at 4:28 AM, Ashic Mahtab <as...@live.com> wrote:
>> > Hi,
>> > I was wondering if there's any way of having long-running, session-type
>> > behaviour in Spark. For example, let's say we're using Spark Streaming
>> > to listen to a stream of events. Upon receiving an event, we process
>> > it, and if certain conditions are met, we wish to send a message to
>> > RabbitMQ. Now, Rabbit clients have the concept of a connection factory,
>> > from which you create a connection, from which you create a channel.
>> > You use the channel to get a queue, and finally the queue is what you
>> > publish messages on.
>> >
>> > Currently, what I'm doing can be summarised as:
>> >
>> > dstream.foreachRDD { rdd =>
>> >   rdd.foreachPartition { partition =>
>> >     val factory = ...
>> >     val connection = ...
>> >     val channel = ...
>> >     val queue = channel.declareQueue(...)
>> >
>> >     partition.foreach(msg => Processor.Process(msg, queue))
>> >
>> >     // clean up the queue, channel, and connection
>> >   }
>> > }
>> >
>> > I'm doing the same thing for Cassandra, etc. In these cases, session
>> > initiation is expensive, so doing it per message is not a good idea.
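TD's lazily-initialized-singleton suggestion can be sketched in Scala as follows. This is a minimal, self-contained illustration, not the RabbitMQ client API: `RabbitConnection`, `RabbitPool`, and `publish` are stand-in names, and a real pool would hold the client's `ConnectionFactory`/`Connection`/`Channel` objects instead.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Stand-in for an expensive-to-create connection; a real pool would wrap
// the RabbitMQ client's ConnectionFactory / Connection / Channel here.
class RabbitConnection {
  def publish(msg: String): Unit = () // no-op for the sketch
}

object RabbitPool {
  val initCount = new AtomicInteger(0) // only to demonstrate one-time init

  // A `lazy val` inside a Scala `object` is initialized at most once per
  // JVM (i.e. once per executor process), with thread-safe initialization.
  lazy val connection: RabbitConnection = {
    initCount.incrementAndGet()
    new RabbitConnection() // expensive setup happens here, exactly once
  }
}

// In the streaming job the pool would then be used roughly as:
//
//   dstream.foreachRDD { rdd =>
//     rdd.foreachPartition { partition =>
//       val conn = RabbitPool.connection // first use on an executor creates it
//       partition.foreach(msg => conn.publish(msg))
//     }
//   }

object Demo {
  def main(args: Array[String]): Unit = {
    // Simulate several tasks on the same executor asking for the connection.
    val tasks = (1 to 4).map { _ =>
      new Thread(() => RabbitPool.connection.publish("event"))
    }
    tasks.foreach(_.start())
    tasks.foreach(_.join())
    println(RabbitPool.initCount.get()) // 1 -- initialized exactly once
  }
}
```

Because the `object` lives in the executor JVM rather than on the driver, nothing needs to be serialized into the closure; each executor lazily builds its own connection on first use.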
>> > However, I can't find a way to say "hey... do this per worker, once
>> > and only once".
>> >
>> > Is there a better pattern for this?
>> >
>> > Regards,
>> > Ashic.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
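TD's usage-timeout idea (shut a connection down after roughly 10 x the batch interval of inactivity) can be sketched like this. `Conn` is a stand-in for a real client connection and the millisecond values are illustrative; in a real job `sweep` would run periodically on a daemon thread rather than being called by hand.

```scala
// Illustrative stand-in for a closeable client connection.
class Conn {
  @volatile var closed = false
  def close(): Unit = { closed = true }
}

object TimedPool {
  @volatile private var conn: Conn = null
  private var lastUsed = 0L

  // Return the live connection, creating a new one if none exists or the
  // previous one was reaped; record the access time for the idle check.
  def acquire(): Conn = synchronized {
    if (conn == null || conn.closed) conn = new Conn()
    lastUsed = System.currentTimeMillis()
    conn
  }

  // Close the connection once it has been idle longer than `timeoutMs`
  // (e.g. 10 x the batch interval). A daemon thread would call this
  // periodically in a real job; the sketch calls it directly.
  def sweep(timeoutMs: Long): Unit = synchronized {
    if (conn != null && !conn.closed &&
        System.currentTimeMillis() - lastUsed > timeoutMs) conn.close()
  }
}

object TimeoutDemo {
  def main(args: Array[String]): Unit = {
    val c = TimedPool.acquire()
    Thread.sleep(30)    // connection sits idle
    TimedPool.sweep(10) // idle > 10 ms, so it gets closed
    println(c.closed)                   // true
    println(TimedPool.acquire().closed) // false -- a fresh one is created
  }
}
```

The `acquire`/`sweep` pair also handles the "connection closed underneath a task" case: a task that lands after the reaper simply gets a fresh connection.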