What is the difference between foreachPartition and mapPartitions for a DStream? Both work at partition granularity.
One is a transformation and the other is an action, but if I also call an output operation after mapPartitions, which one is more efficient and recommended?

On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <[email protected]> wrote:

> Yeah, creating a new producer at the granularity of partitions may not be
> that costly.
>
> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <[email protected]> wrote:
>
>> Use foreachPartition, and allocate whatever the costly resource is once
>> per partition.
>>
>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <[email protected]>
>> wrote:
>>
>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>> application.
>>>
>>> I am using Spark 1.2 Streaming. Since different executors in Spark are
>>> allocated at each run, instantiating a new Kafka producer at each run
>>> seems a costly operation. Is there a way to reuse objects in processing
>>> executors (not in receivers)?
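
To make the question concrete, here is a minimal sketch in plain Python, not Spark code, that models the two behaviours being compared: a lazy per-partition transformation versus an eager per-partition action, each allocating the costly resource once per partition as suggested in the thread. All names here (FakeProducer, map_partitions, foreach_partition, send_partition, send_and_pass_through) are hypothetical stand-ins, and the model assumes partitions can be represented as simple lists.

```python
from typing import Callable, Iterable, Iterator, List


class FakeProducer:
    """Hypothetical stand-in for a costly resource such as a Kafka producer."""

    instances = 0  # counts how many producers have been constructed

    def __init__(self) -> None:
        FakeProducer.instances += 1
        self.sent: List[str] = []
        self.closed = False

    def send(self, record: str) -> None:
        self.sent.append(record)

    def close(self) -> None:
        self.closed = True


def map_partitions(
    partitions: List[List[str]],
    fn: Callable[[Iterator[str]], Iterable[str]],
) -> List[Iterator[str]]:
    """Models a lazy transformation: nothing runs until the result is consumed."""

    def lazy(part: List[str]) -> Iterator[str]:
        # The body of a generator function only runs on the first next() call.
        yield from fn(iter(part))

    return [lazy(p) for p in partitions]


def foreach_partition(
    partitions: List[List[str]],
    fn: Callable[[Iterator[str]], None],
) -> None:
    """Models an action: runs fn on every partition now and returns nothing."""
    for p in partitions:
        fn(iter(p))


def send_partition(records: Iterator[str]) -> None:
    producer = FakeProducer()  # one producer per partition, not per record
    try:
        for r in records:
            producer.send(r)
    finally:
        producer.close()


def send_and_pass_through(records: Iterator[str]) -> Iterator[str]:
    producer = FakeProducer()  # also one per partition, but created lazily
    try:
        for r in records:
            producer.send(r)
            yield r  # a transformation has to produce output elements
    finally:
        producer.close()


partitions = [["a", "b"], ["c"], ["d", "e", "f"]]

# Action-style: side effects happen immediately, one producer per partition.
foreach_partition(partitions, send_partition)
producers_after_foreach = FakeProducer.instances  # 3

# Transformation-style: no producer is created yet.
lazy_result = map_partitions(partitions, send_and_pass_through)
producers_before_consume = FakeProducer.instances  # still 3

# Only a later consuming step (the "action") triggers the sends.
collected = [r for part in lazy_result for r in part]
producers_after_consume = FakeProducer.instances  # 6
```

In actual Spark Streaming code, the action-style pattern is usually written as foreachRDD on the DStream with rdd.foreachPartition inside it, while mapPartitions on a DStream yields another DStream that still needs an output operation before anything executes. If the only goal is the side effect of writing to Kafka, the sketch suggests the action-style form is the more direct fit, since the transformation-style form has to emit output elements that are not otherwise needed.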
