It is already partitioned by timestamp. But is the right retention process
to stop the streaming job, trim the parquet files, and restart the
streaming job? Thanks.
On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar wrote:
Can you use partitioning (by day)? That will make it easier to drop
data older than x days outside the streaming job.
Sunil Parmar
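The day-partitioned layout suggested above can be pruned by a small maintenance script that runs outside the streaming job. A minimal sketch, assuming the stream writes with `partitionBy("date")` so the parquet root contains one `date=YYYY-MM-DD` subdirectory per day (the 90-day retention value and path layout are illustrative, not from the thread):

```python
import shutil
from datetime import date, timedelta
from pathlib import Path

def drop_old_partitions(root: str, retention_days: int = 90) -> list:
    """Delete date=YYYY-MM-DD partition directories older than the cutoff.

    Assumes the streaming job writes with partitionBy("date"), so the
    parquet root holds one subdirectory per day. Returns the names of
    the partitions that were removed.
    """
    cutoff = date.today() - timedelta(days=retention_days)
    dropped = []
    for part in Path(root).glob("date=*"):
        try:
            part_date = date.fromisoformat(part.name.split("=", 1)[1])
        except ValueError:
            continue  # skip directories that are not day partitions
        if part_date < cutoff:
            shutil.rmtree(part)  # drop the whole day's parquet files
            dropped.append(part.name)
    return sorted(dropped)
```

Because the stream only ever appends to the current day's partition, old partitions can be deleted while the job keeps running, with no stop/trim/restart cycle. If the data is also registered as a table in a metastore, the equivalent would presumably be `ALTER TABLE ... DROP PARTITION` rather than deleting directories by hand.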
On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang wrote:
I have a Spark structured streaming job which dumps data into parquet
files. To keep the parquet data from growing indefinitely, I want to
discard data older than 3 months. Does Spark streaming support this? Or do
I need to stop the streaming job, trim the parquet files, and restart the
streaming job? Thanks for any help.