What's the difference between foreachPartition and mapPartitions for a DStream? Both work at partition granularity.
One is a transformation and the other is an action, but if I also call an action after mapPartitions, which one is more efficient and recommended?

On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <t...@databricks.com> wrote:
> Yeah, creating a new producer at the granularity of partitions may not be
> that costly.
>
> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Use foreachPartition, and allocate whatever the costly resource is once
>> per partition.
>>
>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>> application.
>>>
>>> I am using Spark 1.2 streaming. Since different executors in Spark are
>>> allocated at each run, instantiating a new Kafka producer at each run
>>> seems a costly operation. Is there a way to reuse objects in processing
>>> executors (not in receivers)?
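The pattern Cody describes can be sketched roughly as below. This is a minimal illustration, not the exact code from the thread: the stream variable, topic name, and broker address are assumptions, and it uses the `org.apache.kafka.clients.producer.KafkaProducer` API (available from Kafka 0.8.2 onward; older 0.8.x deployments would use `kafka.javaapi.producer.Producer` instead):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// `stream` is assumed to be a DStream[String] built elsewhere in the app.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Create ONE producer per partition, not one per record.
    // This closure runs on the executor, so the producer is
    // instantiated where the data lives and never serialized.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // assumed broker address
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    partition.foreach { msg =>
      producer.send(new ProducerRecord[String, String]("output-topic", msg))
    }
    producer.close()
  }
}
```

On the mapPartitions vs foreachPartition question: mapPartitions is a lazy transformation that returns a new RDD, so it does nothing until an action forces it; foreachPartition is an action executed for its side effects. For writing out to Kafka there is no result RDD to keep, so foreachPartition avoids the no-op map-then-action dance and is the natural fit.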