What's the difference between foreachPartition and mapPartitions for a
DStream? Both work at partition granularity.

One is a transformation and the other is an action, but if I also call an
action after mapPartitions, which one is more efficient and recommended?
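
For reference, a minimal sketch of the two patterns being compared; `stream`,
`enrich`, and `process` are stand-ins, not code from this thread:

import org.apache.spark.streaming.dstream.DStream

// Hypothetical helpers used only for illustration.
def enrich(s: String): String = s.toUpperCase
def process(s: String): Unit = println(s)

def compare(stream: DStream[String]): Unit = {
  // mapPartitions is a transformation: it is lazy and yields a new DStream,
  // so it still needs an output operation before anything executes.
  val enriched: DStream[String] = stream.mapPartitions { iter =>
    iter.map(enrich)              // iterator in, iterator out
  }
  enriched.print()                // some output operation is still required

  // foreachRDD + foreachPartition is an output operation: it runs for its
  // side effects and returns Unit, so nothing further is needed.
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      iter.foreach(process)       // side effects only, no new DStream produced
    }
  }
}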

On Tue, Jul 7, 2015 at 12:21 AM, Tathagata Das <t...@databricks.com> wrote:

> Yeah, creating a new producer at the granularity of partitions may not be
> that costly.
>
> On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Use foreachPartition, and allocate whatever the costly resource is once
>> per partition.
>>
>> On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora <shushantaror...@gmail.com
>> > wrote:
>>
>>> I have a requirement to write to a Kafka queue from a Spark Streaming
>>> application.
>>>
>>> I am using Spark 1.2 streaming. Since different executors in Spark are
>>> allocated at each run, instantiating a new Kafka producer at each run
>>> seems a costly operation. Is there a way to reuse objects in processing
>>> executors (not in receivers)?
>>>
>>>
>>>
>>
>
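
For completeness, a rough sketch of the pattern Cody suggests above (allocate
the producer once per partition, on the executor). The broker address, topic
name, and serializer settings are placeholders, and the new KafkaProducer
client API is assumed; adapt to whatever Kafka client you actually use:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

def writeToKafka(stream: DStream[String]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // Create the costly resource once per partition on the executor,
      // rather than once per record (and not on the driver, since the
      // producer is not serializable).
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")                 // placeholder
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      try {
        records.foreach { r =>
          producer.send(new ProducerRecord[String, String]("output-topic", r))  // placeholder topic
        }
      } finally {
        producer.close()
      }
    }
  }
}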
