Can you use partitioning (by day)? That would make it easier to drop data older than x days outside the streaming job.
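Something like this could work (a minimal sketch in Scala; the Kafka source, all paths, and the event_date column are assumptions to illustrate the idea, not details from your job):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("PartitionedSink").getOrCreate()
import spark.implicits._

// Hypothetical input; substitute your actual source.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

// Derive a day column so the parquet sink writes one directory per day,
// e.g. /data/events/event_date=2018-03-14/
val query = events
  .withColumn("event_date", to_date($"timestamp"))
  .writeStream
  .format("parquet")
  .option("path", "/data/events")
  .option("checkpointLocation", "/data/checkpoints/events")
  .partitionBy("event_date")
  .start()

Then a separate job (or a cron task) can delete expired partition directories without touching the running stream:

import java.time.LocalDate
import org.apache.hadoop.fs.{FileSystem, Path}

// Remove partition directories older than a 90-day retention window.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val cutoff = LocalDate.now().minusDays(90)
fs.listStatus(new Path("/data/events"))
  .filter(_.getPath.getName.startsWith("event_date="))
  .filter { status =>
    val day = LocalDate.parse(status.getPath.getName.stripPrefix("event_date="))
    day.isBefore(cutoff)
  }
  .foreach(status => fs.delete(status.getPath, true))

One caveat: if the output is also registered as a metastore table, you'd want to drop the matching partitions there too (e.g. ALTER TABLE ... DROP PARTITION).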
Sunil Parmar

On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang <jiangok2...@gmail.com> wrote:
> I have a Spark Structured Streaming job which dumps data into a parquet
> file. To keep the parquet file from growing indefinitely, I want to
> discard data older than 3 months. Does Spark streaming support this? Or
> do I need to stop the streaming job, trim the parquet file, and restart
> the streaming job? Thanks for any hints.