Hi All, I am experimenting with the Spark 2.3.0 stream-stream join feature to see whether I can leverage it to replace some of our existing services.
Imagine I have 3 worker nodes, each with 16 GB RAM and a 100 GB SSD. My input dataset in Kafka is about 250 GB per day. I want to do a stream-stream join across 8 DataFrames with a 24-hour watermark on a tumbling window, so I need to hold state for 24 hours, after which all of it can be cleared (a sketch of what I have in mind is below). Questions:

1) What happens if the state doesn't fit in memory while doing the stream-stream join?
2) What StorageLevel should I choose here for near-optimal performance?
3) Any other suggestions?

Thanks!
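For concreteness, here is a minimal sketch of the kind of watermarked stream-stream join I mean, shown for just two of the streams. The broker address, topic names, and join keys are placeholders; the real query would chain further joins for the remaining DataFrames:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("StreamStreamJoin").getOrCreate()

// Left stream: placeholder topic/key names.
val left = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "topicA")
  .load()
  .selectExpr("CAST(key AS STRING) AS joinKey", "timestamp AS leftTime")
  .withWatermark("leftTime", "24 hours")

// Right stream: distinct column names to keep the join condition unambiguous.
val right = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "topicB")
  .load()
  .selectExpr("CAST(key AS STRING) AS joinKey2", "timestamp AS rightTime")
  .withWatermark("rightTime", "24 hours")

// Inner join with a time-range condition, so Spark can drop state
// for rows that fall more than 24 hours outside the watermark.
val joined = left.join(
  right,
  expr("""
    joinKey = joinKey2 AND
    rightTime >= leftTime AND
    rightTime <= leftTime + interval 24 hours
  """))

joined.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/join")
  .start()
  .awaitTermination()
```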