*Environment:* Spark 2.2.0
*Kafka:* 0.10.0
*Language:* Java
*Use case:* Streaming data from Kafka using JavaDStreams and storing it into a downstream database.
*Issue:* I have a use case where I have to launch a background thread that connects to a DB and caches the retrieved result set every 30 minutes. On its own, this component works fine. When integrated with Spark Streaming, my tasks use the data from this MySQL thread, so I need to run the thread on all the worker nodes (rather than on the driver) before the actual data processing starts. Something like the setup() below should run on every worker node:

    public static void setup() {
        try {
            // This class has the implementation for the thread
            new Util(FSFactory.getFSHandle(null));
        } catch (IOException e) {
            logger.error("IOException: failed to create Util");
        }
    }

*What did I try:* I tried passing the above method as a broadcast variable in Spark. But, from my understanding, a broadcast variable is a read-only value. Since I was relying on the broadcasted variable to run a threaded program in the background, nothing related to my code showed up in the logs and the thread did not run.

I have some experience with other streaming frameworks, where I can set up dependencies in setup() and close them in terminate() for every container. Is there something like that in Spark? Am I missing a concept here? I googled around and looked on SO but couldn't find anything useful. Any help would be appreciated.

Thanks,
Ravi
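For context, a common workaround for per-worker initialization in Spark (not something from this post; the class name CacheRefresher is hypothetical) is a lazy singleton that each executor JVM initializes once, on the first task that touches it. A minimal sketch of the idiom, with the Spark-specific Util/FSFactory calls left as comments:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-JVM lazy singleton. In a Spark job, each executor JVM
// would initialize this exactly once, the first time any task calls get()
// (e.g. from inside mapPartitions or foreachPartition).
public class CacheRefresher {
    // Counts constructor runs, only to demonstrate single initialization.
    static final AtomicInteger initCount = new AtomicInteger(0);

    // Initialization-on-demand holder idiom: the JVM's class-loading lock
    // guarantees INSTANCE is constructed exactly once per JVM.
    private static class Holder {
        static final CacheRefresher INSTANCE = new CacheRefresher();
    }

    private CacheRefresher() {
        initCount.incrementAndGet();
        // In the real job, this is where the background refresh thread
        // would be started, e.g. new Util(FSFactory.getFSHandle(null)),
        // ideally as a daemon thread so it does not block JVM shutdown.
    }

    public static CacheRefresher get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate several concurrent tasks on the same executor JVM
        // all touching the singleton.
        Thread[] tasks = new Thread[4];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(CacheRefresher::get);
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("initializations: " + initCount.get());
    }
}
```

Because the singleton lives in executor-JVM static state rather than in a broadcast variable, every task on that executor sees the same running thread, which matches the "setup once per container" behavior described above.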