Hi, I’ve noticed something peculiar about the relationship between state size and cluster size and was wondering if anyone here knows of the reason. I am running a job with 1 hour tumbling event time windows which have an allowed lateness of 7 days. When I run on a 20-node cluster with FsState I can process approximately 1.5 days’ worth of data in an hour with the most recent checkpoint being ~20gb. Now if I run the same job with the same configurations on a 40-node cluster I can process 2 days’ worth of data in 20 min (expected) but the state size is only ~8gb. Because allowed lateness is 7 days no windows should be purged yet and I would expect the larger cluster which has processed more data to have a larger state. Is there some why a slower running job or a smaller cluster would require more state?
This is more of a curiosity than an issue. Thanks’ in advance for any insights you may have. Seth Wiesman