Thanks for the response. The fact that they'll get killed when the sc is closed 
is quite useful in this case. I'm looking at a cluster of four workers trying 
to send messages to rabbitmq, which can have many sessions open without much 
penalty. For other stores (like, say, SQL) and larger clusters, the idle 
connections would be a bigger issue. One last question...if I do leave them 
open till the end of the job, does that mean one per worker or one per rdd 
partition? I'd imagine the former, but wanted to confirm.
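To illustrate what I mean - if the pool is a singleton object along these 
lines (names here are just illustrative), I'd expect one connection per 
executor JVM:

import com.rabbitmq.client.{Connection, ConnectionFactory}

object Session {
  // A Scala object is initialized once per JVM, so every task running on
  // an executor shares this one connection - i.e. one per worker JVM,
  // not one per RDD partition.
  lazy val connection: Connection = new ConnectionFactory().newConnection()
}
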
Regards,
Ashic.

> From: tathagata.das1...@gmail.com
> Date: Sat, 13 Dec 2014 15:16:46 +0800
> Subject: Re: "Session" for connections?
> To: as...@live.com
> CC: user@spark.apache.org
> 
> That is your call. If you think it is not a problem to have a large
> number of open but idle connections to your data store, then it is
> probably okay to let them hang around until the executor is killed
> (when the sparkContext is closed).
> 
> TD
> 
> On Fri, Dec 12, 2014 at 11:51 PM, Ashic Mahtab <as...@live.com> wrote:
> > Looks like the way to go.
> >
> > Quick question regarding the connection pool approach - if I have a
> > connection that gets lazily instantiated, will it automatically die if I
> > kill the driver application? In my scenario, I can keep a connection open
> > for the duration of the app, and am not that concerned about having idle
> > connections as long as the app is running. For this specific scenario, do I
> > still need to think about the timeout, or would it be shut down when the
> > driver stops? (Using a standalone cluster, btw.)
> >
> > Regards,
> > Ashic.
> >
> >> From: tathagata.das1...@gmail.com
> >> Date: Thu, 11 Dec 2014 06:33:49 -0800
> >
> >> Subject: Re: "Session" for connections?
> >> To: as...@live.com
> >> CC: user@spark.apache.org
> >>
> >> Also, this is covered in the streaming programming guide in bits and
> >> pieces.
> >>
> >> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
> >>
> >> On Thu, Dec 11, 2014 at 4:55 AM, Ashic Mahtab <as...@live.com> wrote:
> >> > That makes sense. I'll try that.
> >> >
> >> > Thanks :)
> >> >
> >> >> From: tathagata.das1...@gmail.com
> >> >> Date: Thu, 11 Dec 2014 04:53:01 -0800
> >> >> Subject: Re: "Session" for connections?
> >> >> To: as...@live.com
> >> >> CC: user@spark.apache.org
> >> >
> >> >>
> >> >> You could create a lazily initialized singleton factory and connection
> >> >> pool. Whenever an executor starts running the first task that needs to
> >> >> push out data, it will create the connection pool as a singleton, and
> >> >> subsequent tasks running on that executor will use the connection
> >> >> pool. You will also have to shut the connections down intelligently,
> >> >> because there is no obvious point at which to shut them down. You
> >> >> could use a usage timeout - shut a connection down after it has not
> >> >> been used for, say, 10 x the batch interval.
> >> >>
> >> >> TD
> >> >>
> >> >> On Thu, Dec 11, 2014 at 4:28 AM, Ashic Mahtab <as...@live.com> wrote:
> >> >> > Hi,
> >> >> > I was wondering if there's any way of having long running session
> >> >> > type
> >> >> > behaviour in spark. For example, let's say we're using Spark
> >> >> > Streaming
> >> >> > to
> >> >> > listen to a stream of events. Upon receiving an event, we process it,
> >> >> > and if
> >> >> > certain conditions are met, we wish to send a message to rabbitmq.
> >> >> > Now,
> >> >> > rabbit clients have the concept of a connection factory, from which
> >> >> > you
> >> >> > create a connection, from which you create a channel. You use the
> >> >> > channel to
> >> >> > get a queue, and finally the queue is what you publish messages on.
> >> >> >
> >> >> > Currently, what I'm doing can be summarised as :
> >> >> >
> >> >> > dstream.foreachRDD(rdd => rdd.foreachPartition(partition => {
> >> >> >   val factory = new ConnectionFactory()
> >> >> >   val connection = factory.newConnection()
> >> >> >   val channel = connection.createChannel()
> >> >> >   val queue = channel.queueDeclare(...)
> >> >> >
> >> >> >   partition.foreach(msg => Processor.Process(msg, queue))
> >> >> >
> >> >> >   // clean up the channel and connection
> >> >> >   channel.close()
> >> >> >   connection.close()
> >> >> > }))
> >> >> >
> >> >> > I'm doing the same thing for Cassandra, etc. Now in these cases, the
> >> >> > session initiation is expensive, so doing it per message is not a good
> >> >> > idea. However, I can't find a way to say "hey...do this per worker
> >> >> > once and only once".
> >> >> >
> >> >> > Is there a better pattern to do this?
> >> >> >
> >> >> > Regards,
> >> >> > Ashic.
> >> >>
> >>
> 