Hello,

I’m using Spark Streaming to aggregate data from a Kafka topic in sliding
windows.  Usually we want to persist this aggregated data to a MongoDB cluster,
or republish it to a different Kafka topic.  When I use these third-party
drivers in the job, I usually get a NotSerializableException, because the
closure that references them gets serialized when the job is parallelized
across the executors.  To sidestep this, I’ve been holding the clients in
static class variables, which seems to work, i.e., my jobs run.
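
For reference, this is roughly the pattern I’ve ended up with (a simplified
sketch in Java; the class name, collection names, connection string, and the
windowedCounts stream are just placeholders, and I’m assuming the newer
MongoClients API from mongodb-driver-sync):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    // Keeps the Mongo client in a static field so it is created once per
    // executor JVM instead of being captured and serialized with the closure.
    public class MongoConnection {
        private static MongoClient client;

        public static synchronized MongoClient get() {
            if (client == null) {
                client = MongoClients.create("mongodb://mongo-host:27017");
            }
            return client;
        }
    }

    // In the streaming job, the client is only looked up on the executors:
    windowedCounts.foreachRDD(rdd ->
        rdd.foreachPartition(records -> {
            MongoCollection<Document> coll = MongoConnection.get()
                .getDatabase("metrics").getCollection("windows");
            while (records.hasNext()) {
                coll.insertOne(new Document("value", records.next()));
            }
        })
    );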

Is this the proper way to provide third-party libraries to Spark jobs?
Does declaring these drivers as static prevent my job from being parallelized?
Is this even a reasonable way to design jobs?

An alternative, I assume, would be to write my aggregated data to HDFS and have
another process (perhaps non-Spark?) consume it and republish/persist it?
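
Something along the lines of the following, again just a sketch, with the path
and stream name made up:

    // One window's worth of aggregated output per HDFS directory, for a
    // separate downstream process to pick up and republish/persist.
    windowedCounts.foreachRDD((rdd, time) ->
        rdd.saveAsTextFile("hdfs:///aggregates/window-" + time.milliseconds())
    );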

Thanks,
Matt