Hi,

I’ve noticed something peculiar about the relationship between state size and 
cluster size and was wondering if anyone here knows the reason. I am running 
a job with 1-hour tumbling event-time windows that have an allowed lateness of 
7 days. When I run on a 20-node cluster with FsState I can process 
approximately 1.5 days’ worth of data in an hour, with the most recent 
checkpoint being ~20 GB. Now, if I run the same job with the same configuration 
on a 40-node cluster, I can process 2 days’ worth of data in 20 minutes (as 
expected), but the state size is only ~8 GB. Because the allowed lateness is 
7 days, no windows should have been purged yet, and I would expect the larger 
cluster, which has processed more data, to have the larger state. Is there some 
reason why a slower-running job or a smaller cluster would require more state?
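
For reference, here is a minimal sketch of the kind of pipeline I’m describing;
the source data, key selector, checkpoint path, and aggregation below are
placeholders for illustration, not the actual job:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowStateSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // FsState backend with periodic checkpoints; the path is a placeholder.
            env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
            env.enableCheckpointing(60_000);

            // Placeholder input: (key, eventTimestampMillis) pairs.
            DataStream<Tuple2<String, Long>> events = env
                .fromElements(
                    Tuple2.of("a", 0L),
                    Tuple2.of("a", 3_600_000L),
                    Tuple2.of("b", 7_200_000L))
                .assignTimestampsAndWatermarks(
                    new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                        @Override
                        public long extractAscendingTimestamp(Tuple2<String, Long> e) {
                            return e.f1;
                        }
                    });

            events
                .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Long> value) {
                        return value.f0;
                    }
                })
                // 1-hour tumbling event-time windows...
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                // ...whose state is retained for 7 days of allowed lateness before purging.
                .allowedLateness(Time.days(7))
                // Placeholder aggregation standing in for the real window function.
                .reduce((a, b) -> Tuple2.of(a.f0, Math.max(a.f1, b.f1)))
                .print();

            env.execute("window-state-sketch");
        }
    }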

This is more of a curiosity than an issue. Thanks in advance for any insights 
you may have.

Seth Wiesman
