Hi,

Reading through the Spark Streaming Programming Guide, I read in the
"Design Patterns for using foreachRDD":

"Finally, this can be further optimized by reusing connection objects
across multiple RDDs/batches.
One can maintain a static pool of connection objects that can be reused as
RDDs of multiple batches are pushed to the external system"
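
For context, the pattern the guide is describing is roughly the following
(a Scala sketch; Connection and ConnectionPool are stand-ins for a real
pool implementation such as a JDBC or HTTP client pool):

import org.apache.spark.streaming.dstream.DStream

// Stand-in connection type; in practice a JDBC/HTTP client, etc.
class Connection {
  def send(record: String): Unit = println(record)
}

// Singleton object: initialized lazily, once per executor JVM, and shared
// by every task that runs on that executor.
object ConnectionPool {
  private val pool = new java.util.concurrent.ConcurrentLinkedQueue[Connection]()
  def getConnection(): Connection = Option(pool.poll()).getOrElse(new Connection())
  def returnConnection(c: Connection): Unit = pool.offer(c)
}

// One connection per partition, returned to the pool afterwards, so
// connections get reused as RDDs of later batches arrive on the same executor.
def writeOut(dstream: DStream[String]): Unit =
  dstream.foreachRDD { rdd =>
    rdd.foreachPartition { partitionOfRecords =>
      val connection = ConnectionPool.getConnection()
      partitionOfRecords.foreach(record => connection.send(record))
      ConnectionPool.returnConnection(connection)
    }
  }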

I have a connection pool that is fairly heavy to instantiate. I don't use it
inside a foreachRDD but in regular map operations, to query an API service.
I'd like to understand what "multiple batches" means here. Is this across the
RDDs of a single DStream? Across multiple DStreams?
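
To make the question concrete, my usage looks roughly like this (names and
types are simplified stand-ins for the actual client):

import org.apache.spark.streaming.dstream.DStream

// Stand-in for the real client; assume construction is expensive.
class ApiClient {
  def lookup(key: String): String = key.reverse  // placeholder for the real API call
}

// One lazily created instance per executor JVM; the first task to touch it
// pays the initialization cost, later tasks (from later batches) reuse it.
object ApiClientHolder {
  lazy val client: ApiClient = new ApiClient()
}

def enrich(dstream: DStream[String]): DStream[String] =
  dstream.mapPartitions { records =>
    val client = ApiClientHolder.client
    records.map(client.lookup)
  }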

I'd like to understand how shareable this state is across DStreams over time.
Is it expected that the executor that initializes my factory will keep
receiving batches from my streaming job and keep using the same singleton
connection pool over and over? Or does Spark reset executor state after each
DStream completes, potentially reallocating the executors to other streaming
jobs?

Thanks,
