That's a good question. My first instinct is a timeout. Timing out after tens
of seconds should be sufficient. So the singleton should run a timer that
checks every second when the singleton was last used, and closes the
connection after the timeout. Any attempt to use the connection again will
create a new connection.
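A minimal sketch of this scheme, written generically over any closeable resource (a JDBC Connection in practice). The class name, factory interface, and timeout value are illustrative, not from this thread:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Holds a lazily created resource and closes it after a period of disuse.
class IdleClosingHolder<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final long idleTimeoutMs;
    private T resource;
    private long lastUsed;

    IdleClosingHolder(Supplier<T> factory, long idleTimeoutMs) {
        this.factory = factory;
        this.idleTimeoutMs = idleTimeoutMs;
        // Daemon timer thread: check once per second whether the
        // resource has been idle longer than the timeout.
        ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        reaper.scheduleAtFixedRate(this::closeIfIdle, 1, 1, TimeUnit.SECONDS);
    }

    private synchronized void closeIfIdle() {
        if (resource != null && System.currentTimeMillis() - lastUsed > idleTimeoutMs) {
            try {
                resource.close();
            } catch (Exception ignored) {
                // best-effort close
            }
            resource = null;
        }
    }

    // Any use after a timeout transparently creates a fresh resource.
    synchronized T get() {
        if (resource == null) {
            resource = factory.get();
        }
        lastUsed = System.currentTimeMillis();
        return resource;
    }

    synchronized boolean isOpen() {
        return resource != null;
    }
}
```

A task would call `holder.get()` each time instead of keeping a connection field, so a fresh connection is opened transparently after the reaper has closed an idle one.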

TD


On Fri, Jul 18, 2014 at 7:59 PM, Gino Bustelo <lbust...@gmail.com> wrote:

> I get TD's recommendation of sharing a connection among tasks. Now, is
> there a good way to determine when to close connections?
>
> Gino B.
>
> On Jul 17, 2014, at 7:05 PM, Yan Fang <yanfang...@gmail.com> wrote:
>
> Hi Sean,
>
> Thank you. I see your point. What I was thinking was to do the computation
> in a distributed fashion and do the storing from a single place. But you
> are right, having multiple DB connections is actually fine.
>
> Thanks for answering my questions. That helps me understand the system.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
>
> On Thu, Jul 17, 2014 at 2:53 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> On Thu, Jul 17, 2014 at 10:39 PM, Yan Fang <yanfang...@gmail.com> wrote:
>> > Thank you for the help. If I use TD's approach, it works and there is
>> > no exception. The only drawback is that it will create many connections
>> > to the DB, which I was trying to avoid.
>>
>> Connection-like objects aren't data that can be serialized. What would
>> it mean to share one connection with N workers? That they all connect
>> back to the driver, and go through one DB connection there? That defeats
>> the purpose of distributed computing. You want multiple DB
>> connections. You can limit the number of partitions if needed.
>>
>>
>> > Here is a snapshot of my code. The important code is marked in red.
>> > What I was thinking is that, if I call the collect() method, Spark
>> > Streaming will bring the data to the driver and then the db object
>> > does not need to be sent
>>
>> The Function you pass to foreachRDD() has a reference to db, though.
>> That's what causes it to be serialized.
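This capture problem can be shown with plain Java serialization, no Spark needed. The sketch below contrasts a function object that holds a connection-like field (fails to serialize, as Sean describes) with one that holds only the JDBC URL and opens its connection lazily where it runs. Class names (`BadTask`, `GoodTask`, `DbConnection`) are hypothetical stand-ins:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ClosureCapture {

    // Stand-in for java.sql.Connection: stateful and NOT Serializable.
    static class DbConnection {
        void insert(String row) { /* write to the database */ }
    }

    // Captures a live connection as a field, so serializing this function
    // object (as Spark does when shipping it to executors) must fail.
    static class BadTask implements Serializable {
        final DbConnection db = new DbConnection();
        void call(String row) { db.insert(row); }
    }

    // Holds only the URL (a String); the connection is created where the
    // task actually runs, so the task itself serializes cleanly.
    static class GoodTask implements Serializable {
        final String jdbcUrl;
        GoodTask(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }
        void call(String row) {
            DbConnection db = new DbConnection(); // created on the executor
            db.insert(row);
        }
    }

    // Returns true if the object survives Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The same distinction is why moving connection creation inside the function (or using one connection per partition) avoids the exception.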
>>
>> > to executors. My observation is that, though exceptions are thrown, the
>> > insert function still works. Any thoughts about that? I have also pasted
>> > the log in case it helps: http://pastebin.com/T1bYvLWB
>>
>> Any executors that run locally might skip the serialization and
>> succeed (?) but I don't think the remote executors can be succeeding.
>>
>
>
