Re: Structured Streaming Dataframe Size

2019-08-29 Thread Tathagata Das
Responses inline.

On Wed, Aug 28, 2019 at 8:42 AM Nick Dawes wrote:
> Thank you, TD. A couple of follow-up questions, please.
>
> 1) "It only keeps around the minimal intermediate state data"
>
> How do you define "minimal" here? Is there a configuration property to
> control the time or size of the Streaming Dataframe?
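There is no single configuration that caps the streaming DataFrame itself, but for aggregations the intermediate state can be bounded with a watermark. A minimal Scala sketch of that pattern, assuming the built-in rate source and made-up window/watermark thresholds and checkpoint path (none of this comes from TD's reply):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("WatermarkSketch").getOrCreate()

    // Built-in test source that emits (timestamp, value) rows at a fixed rate.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()

    // The watermark declares how late data may arrive. State for windows older
    // than (max event time seen - 10 minutes) can be dropped, which bounds the
    // intermediate state instead of letting it grow forever.
    val counts = events
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window(col("timestamp"), "5 minutes"))
      .count()

    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/watermark-sketch") // illustrative path
      .start()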

Re: Structured Streaming Dataframe Size

2019-08-28 Thread Nick Dawes
Thank you, TD. A couple of follow-up questions, please.

1) "It only keeps around the minimal intermediate state data"

How do you define "minimal" here? Is there a configuration property to control the time or size of the Streaming Dataframe?

2) I'm not writing anything out to any database or S3. My req
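On 1), one practical way to see what that state amounts to (rather than a size limit you can set) is the per-micro-batch progress metrics. A sketch, assuming a running StreamingQuery handle named query like the one started in the reply above:

    // lastProgress is null until at least one micro-batch has completed.
    val progress = query.lastProgress
    progress.stateOperators.foreach { op =>
      println(s"state rows: ${op.numRowsTotal}, state memory: ${op.memoryUsedBytes} bytes")
    }

    // Or dump the full report (sources, sink, durations, state) as JSON.
    println(progress.json)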

Re: Structured Streaming Dataframe Size

2019-08-27 Thread Tathagata Das
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts

> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then discards the source data.
> It only keeps around the minimal intermediate state data as required to
> update the result.
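The running example behind that section of the guide is a streaming word count; a condensed version of it, with the socket host/port as placeholder values, shows how the result is a set of running counts rather than the accumulated input:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("StructuredWordCount").getOrCreate()
    import spark.implicits._

    // Conceptually an unbounded input table: one row per line arriving on the socket.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The result table is logical; Spark only keeps the running counts as state,
    // not the full history of input rows.
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()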

Structured Streaming Dataframe Size

2019-08-27 Thread Nick Dawes
I have a quick newbie question. Spark Structured Streaming creates an unbounded DataFrame that it keeps appending rows to. So what's the maximum size of data it can hold? What if the size becomes bigger than the JVM heap? Will it spill to disk? I'm using S3 as storage. So will it write temp data on S3 or
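For anyone reading along: the unbounded DataFrame is a logical view, not a buffer that keeps growing inside the JVM (see the guide excerpt quoted above). What does live on the executors is the aggregation state, and it is checkpointed to a location you choose, which can be an S3 path. A sketch of one way to wire that up, with the bucket and the rate source as placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder
      .appName("CheckpointSketch")
      // Default checkpoint root for streaming queries in this session; state
      // snapshots and the write-ahead log for each query are written under it.
      .config("spark.sql.streaming.checkpointLocation", "s3a://my-bucket/checkpoints")
      .getOrCreate()

    // Rate source stands in for a real stream (Kafka, files, ...); only the
    // running counts are kept as state, not every input row.
    val query = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 100)
      .load()
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()
      .writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()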