That's a good question. My first instinct is a timeout: timing out after tens of seconds should be sufficient. So the singleton should run a timer that checks every second when the connection was last used, and close the connection after the timeout. Any later attempt to use the connection would then create a new one.
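A rough sketch of that scheme might look like the following. This is only an illustration, not tested Spark code: `DbConnection` is a hypothetical stand-in for a real driver connection, and the 10-second idle timeout and 1-second sweep interval are just the values suggested above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConnectionHolder {

    // Hypothetical stand-in for a real DB driver connection.
    static class DbConnection implements AutoCloseable {
        volatile boolean open = true;
        @Override public void close() { open = false; }
    }

    private static final long IDLE_TIMEOUT_MS = 10_000;  // close after ~10s idle
    private static DbConnection conn;
    private static long lastUsed;

    static {
        // Daemon timer thread: checks every second whether the
        // connection has been idle past the timeout, and closes it.
        ScheduledExecutorService sweeper =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "conn-sweeper");
                t.setDaemon(true);
                return t;
            });
        sweeper.scheduleAtFixedRate(
            ConnectionHolder::closeIfIdle, 1, 1, TimeUnit.SECONDS);
    }

    // All tasks in the JVM share this; a closed connection is
    // transparently replaced on the next use.
    public static synchronized DbConnection get() {
        if (conn == null || !conn.open) {
            conn = new DbConnection();  // reconnect on demand
        }
        lastUsed = System.currentTimeMillis();
        return conn;
    }

    private static synchronized void closeIfIdle() {
        if (conn != null && conn.open
                && System.currentTimeMillis() - lastUsed > IDLE_TIMEOUT_MS) {
            conn.close();
        }
    }

    public static void main(String[] args) {
        DbConnection c1 = ConnectionHolder.get();
        DbConnection c2 = ConnectionHolder.get();
        System.out.println(c1 == c2);  // prints "true": same connection reused
    }
}
```

Both `get()` and the sweeper are synchronized so a task can't grab the connection at the exact moment the timer is closing it.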
TD

On Fri, Jul 18, 2014 at 7:59 PM, Gino Bustelo <lbust...@gmail.com> wrote:
> I get TD's recommendation of sharing a connection among tasks. Now, is
> there a good way to determine when to close connections?
>
> Gino B.
>
> On Jul 17, 2014, at 7:05 PM, Yan Fang <yanfang...@gmail.com> wrote:
>
> Hi Sean,
>
> Thank you. I see your point. What I was thinking is that, do computation
> in a distributed fashion and do the storing from a single place. But you
> are right, having multiple DB connections actually is fine.
>
> Thanks for answering my questions. That helps me understand the system.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
>
> On Thu, Jul 17, 2014 at 2:53 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> On Thu, Jul 17, 2014 at 10:39 PM, Yan Fang <yanfang...@gmail.com> wrote:
>> > Thank you for the help. If I use TD's approach, it works and there is
>> > no exception. The only drawback is that it will create many connections
>> > to the DB, which I was trying to avoid.
>>
>> Connection-like objects aren't data that can be serialized. What would
>> it mean to share one connection with N workers? That they all connect
>> back to the driver, and through one DB connection there? This defeats
>> the purpose of distributed computing. You want multiple DB
>> connections. You can limit the number of partitions if needed.
>>
>> > Here is a snapshot of my code. Marked in red for the important code.
>> > What I was thinking is that, if I call the collect() method, Spark
>> > Streaming will bring the data to the driver and then the db object
>> > does not need to be sent
>>
>> The Function you pass to foreachRDD() has a reference to db though.
>> That's what is making it be serialized.
>>
>> > to executors. My observation is that, though exceptions are thrown, the
>> > insert function still works. Any thought about that? Also pasted the
>> > log in case it helps: http://pastebin.com/T1bYvLWB
>>
>> Any executors that run locally might skip the serialization and
>> succeed (?) but I don't think the remote executors can be succeeding.
>>
>