Hi,

Reading through the Spark Streaming Programming Guide, I came across this in the "Design Patterns for using foreachRDD" section:
"Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects than can be reused as RDDs of multiple batches are pushed to the external system" I have this connection pool that might be more or less heavy to instantiate. I don't use it as part of a foreachRDD but as part of regular map operations to query some api service. I'd like to understand what "multiple batches" means here. Is this across RDDs on a single DStream? Across multiple DStreams? I'd like to understand what's the context sharability across DStreams over time. Is it expected that the executor initializing my Factory will keep getting batches from my streaming job while using the same singleton connection pool over and over? Or Spark resets executors states after each DStream is completed to allocated executors to other streaming job potentially? Thanks,