On Thu, Jul 17, 2014 at 10:39 PM, Yan Fang <yanfang...@gmail.com> wrote:
> Thank you for the help. If I use TD's approach, it works and there is no
> exception. The only drawback is that it will create many connections to the
> DB, which I was trying to avoid.

Connection-like objects aren't data that can be serialized. What would
it mean to share one connection among N workers? That they all connect
back to the driver and funnel through one DB connection there? That
defeats the purpose of distributed computing. You want multiple DB
connections; you can limit the number of partitions if needed.
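A common way to get "one connection per partition" rather than one per record is to open the connection inside the function that processes each partition. The sketch below is not from the thread; `save_partition` is a hypothetical helper, and sqlite3 stands in for whatever real DB driver the job uses:

```python
import sqlite3

def save_partition(rows):
    """Runs on an executor: opens its own short-lived DB connection.

    sqlite3 is just a stand-in for the real driver; the point is that
    the connection is created on the worker and never shipped from
    the driver, so there is nothing connection-like to serialize.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (value TEXT)")
    conn.executemany("INSERT INTO events VALUES (?)", [(r,) for r in rows])
    conn.commit()
    inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return inserted

# In actual Spark code this would look roughly like
#   rdd.coalesce(numPartitions).foreachPartition(save_partition)
# so the number of simultaneous connections equals the (limited)
# number of partitions.
```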


> Here is a snapshot of my code; the important parts are marked in red. What I
> was thinking is that, if I call the collect() method, Spark Streaming will
> bring the data to the driver, and then the db object does not need to be sent

The Function you pass to foreachRDD() has a reference to db, though.
That's what causes it to be serialized.
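To illustrate why that captured reference matters, here is a standalone sketch (not Spark code: `pickle` stands in for Spark's closure serializer, sqlite3 for the real driver, and the `EagerSink`/`LazySink` names are mine). An object holding a live connection cannot be serialized, while one that opens the connection lazily on the worker can:

```python
import pickle
import sqlite3

class EagerSink:
    """Holds a connection created on the 'driver' -- not serializable."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
    def __call__(self, rows):
        for r in rows:
            self.db.execute("INSERT INTO events VALUES (?)", (r,))

class LazySink:
    """Captures no connection; opens one only when invoked on a worker."""
    def __call__(self, rows):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE events (value TEXT)")
        db.executemany("INSERT INTO events VALUES (?)", [(r,) for r in rows])
        db.commit()
        db.close()

# Shipping the function to executors means serializing it, roughly:
try:
    pickle.dumps(EagerSink())      # fails: the connection gets dragged along
except (TypeError, pickle.PicklingError):
    pass                           # analogue of Spark's NotSerializableException

payload = pickle.dumps(LazySink())  # fine: nothing connection-like inside
```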

> to executors. My observation is that, though exceptions are thrown, the
> insert function still works. Any thoughts about that? I also pasted the log
> in case it helps: http://pastebin.com/T1bYvLWB

Any executors that run locally might skip the serialization and
succeed (?), but I don't think the remote executors can be succeeding.
