Hi,

It looks like you have already answered some of the questions which I generally ask. Another
thing, can you please let me know the environment? Is it AWS, GCP, Azure,
Databricks, HDP, etc?

Regards,
Gourav

On Sun, Apr 11, 2021 at 8:39 AM András Kolbert <kolbertand...@gmail.com>
wrote:

> Hi,
>
> Sure!
>
> Application:
> - Spark version 2.4
> - Kafka stream (DStream, from Kafka 0.8 brokers)
> - 7 executors, 2 cores, 3700M memory size
>
> Logic:
> - Process initialises a dataframe that contains metrics per
> account/product (e.g. {"account": A, "product": X123, "metric": 51})
> - After initialisation, the dataframe is persisted on HDFS (dataframe is
> around 1GB total size in memory)
> - Streaming:
> - each batch processes incoming data, unions the main dataframe with the
> new account/product/metric interaction dataframe, aggregates the total, and
> then persists it on HDFS again (each batch we save the total dataframe again)
> - The screenshot I sent earlier was taken after this aggregation, and shows
> how all the data seems to have ended up on the same executor. That could
> explain why the executor periodically dies with OOM.
>
> Mich, I hope this provides extra information :)
>
> Thanks
> Andras
>
> On Sat, 10 Apr 2021 at 16:42, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Can you provide a bit more info please?
>>
>> How are you running this job and what is the streaming framework (kafka,
>> files etc)?
>>
>> HTH
>>
>>
>> Mich
>>
>>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Sat, 10 Apr 2021 at 14:28, András Kolbert <kolbertand...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a streaming job and quite often executors die (due to memory
>>> errors, "unable to find location for shuffle", etc.) during the processing. I
>>> started digging and found that some of the tasks are concentrated on one
>>> executor, just as below:
>>> [image: image.png]
>>>
>>> Can this be the reason?
>>> Should I repartition the underlying data before I execute a groupby on
>>> top of it?
>>>
>>> Any advice is welcome
>>>
>>> Thanks
>>> Andras
>>>
>>
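
For reference, the per-batch logic described in the thread (union the running totals with the new batch, then sum per account/product) and the "everything lands on one executor" symptom can be sketched in plain Python. This is only an illustration, not Spark code: the function names `partition_of`, `rows_per_partition` and `merge_batch` are hypothetical, the sample rows are made up, and CRC32 is merely a stand-in for a hash partitioner (Spark uses its own hash function).

```python
import zlib
from collections import Counter, defaultdict


def partition_of(key, num_partitions):
    """Map a key to a partition id with a stable hash.

    CRC32 is an assumption used for illustration; it is NOT Spark's hash.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def rows_per_partition(rows, num_partitions):
    """Count how many rows a hash partitioner keyed on
    (account, product) would send to each partition."""
    counts = Counter(
        partition_of((r["account"], r["product"]), num_partitions) for r in rows
    )
    return [counts.get(p, 0) for p in range(num_partitions)]


def merge_batch(totals, batch):
    """Union the running totals with a new batch and sum the metric
    per (account, product), as described for each streaming batch."""
    merged = defaultdict(int)
    for r in list(totals) + list(batch):
        merged[(r["account"], r["product"])] += r["metric"]
    return [
        {"account": a, "product": p, "metric": m}
        for (a, p), m in sorted(merged.items())
    ]


# Made-up sample data in the shape given in the thread.
totals = [{"account": "A", "product": "X123", "metric": 51}]
batch = [
    {"account": "A", "product": "X123", "metric": 4},
    {"account": "B", "product": "Y9", "metric": 1},
]
new_totals = merge_batch(totals, batch)

# With a single distinct key, every row falls into the same partition.
skewed = [{"account": "A", "product": "X123", "metric": 1}] * 100
skewed_counts = rows_per_partition(skewed, 7)
```

If a count like this shows one partition holding most of the rows, key skew is one candidate explanation (there may be others, such as task locality). In that case repartitioning on the same key before the groupby would probably not change much, since rows with equal grouping keys end up in the same partition either way.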
