And if Marcelo's guess is correct, then the right way to do this would be
to lazily / dynamically create the JDBC connection as a singleton in the
workers/executors and use that. Something like this:


dstream.foreachRDD(rdd => {
   rdd.foreachPartition((iterator: Iterator[...]) => {
       // this will lazily create the single JDBC connection in the worker,
       // if it does not already exist
       val driver = JDBCDriver.getSingleton()
       // loop through the iterator to get the records in the partition and
       // use the driver to push them out to the DB
   })
})

This will avoid the JDBC server being serialized as part of the closure /
DStream checkpoint.
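To make the per-JVM laziness concrete, here is a minimal, self-contained sketch of the singleton pattern (the names ConnectionPool and makeConnection are illustrative, not from Spark or any JDBC driver; a real version would call java.sql.DriverManager.getConnection inside makeConnection). The key point is that a Scala `lazy val` on an `object` is initialized at most once per JVM, the first time a task touches it, so the connection is created on the executor and never captured in the serialized closure:

```scala
// Hypothetical sketch of a lazily created, per-JVM connection singleton.
object ConnectionPool {
  var initCount = 0  // only here to demonstrate single initialization

  // Stand-in for the real thing, which would be something like:
  //   Class.forName("com.mysql.jdbc.Driver")
  //   java.sql.DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
  private def makeConnection(): String = {
    initCount += 1
    "jdbc-connection-handle"
  }

  // Initialized at most once per JVM (i.e. once per executor),
  // on first access -- never on the driver.
  lazy val connection: String = makeConnection()
}

object Demo extends App {
  // Simulate two tasks in the same executor JVM asking for the connection:
  val c1 = ConnectionPool.connection
  val c2 = ConnectionPool.connection
  println(ConnectionPool.initCount)  // prints 1: initialized exactly once
}
```

Inside foreachPartition, tasks would just call ConnectionPool.connection; since the object lives on the executor and is looked up by name, nothing connection-related ends up in the closure or the DStream checkpoint.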

TD


On Thu, Jul 17, 2014 at 1:42 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Could you share some code (or pseudo-code)?
>
> Sounds like you're instantiating the JDBC connection in the driver,
> and using it inside a closure that would be run in a remote executor.
> That means that the connection object would need to be serializable.
> If that sounds like what you're doing, it won't work.
>
>
> On Thu, Jul 17, 2014 at 1:37 PM, Yan Fang <yanfang...@gmail.com> wrote:
> > Hi guys,
> >
> > need some help in this problem. In our use case, we need to continuously
> > insert values into the database. So our approach is to create the jdbc
> > object in the main method and then do the inserting operation in the
> DStream
> > foreachRDD operation. Is this approach reasonable?
> >
> > Then the problem comes: since we are using com.mysql.jdbc.java, which is
> > unserializable, we keep seeing the notSerializableException. I think
> that is
> > because Spark Streaming is trying to serialize and then checkpoint the
> whole
> > class which contains the StreamingContext, not only the StreamingContext
> > object, right? Or other reason to trigger the serialize operation? Any
> > workaround for this? (except not using the com.mysql.jdbc.java)
> >
> > Thank you.
> >
> > Cheers,
> > Fang, Yan
> > yanfang...@gmail.com
> > +1 (206) 849-4108
>
>
>
> --
> Marcelo
>
