Hi,
How are you generating your watermarks? Could it be that they advance
faster when the job is processing more data?
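
To illustrate why I am asking, here is a rough sketch in plain Python (not the
Flink API) of how the number of windows kept in state depends on the watermark.
It assumes a window's state is dropped once the watermark reaches the window's
last timestamp plus the allowed lateness. The function name and the example
watermark values are made up for illustration; they are not taken from your job.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def retained_windows(window_starts, watermark, size=HOUR_MS,
                     allowed_lateness=7 * DAY_MS):
    """Return the start times of windows whose state is still kept.

    Hypothetical helper: models only the cleanup rule, nothing else.
    """
    kept = []
    for start in window_starts:
        max_timestamp = start + size - 1  # last timestamp inside the window
        cleanup_time = max_timestamp + allowed_lateness
        # Assumption: state is dropped once the watermark reaches cleanup_time.
        if watermark < cleanup_time:
            kept.append(start)
    return kept


# Two days of 1-hour tumbling windows, starting at t = 0.
starts = [h * HOUR_MS for h in range(2 * 24)]

# Watermark tracks the data closely: nothing is old enough to be purged.
close = retained_windows(starts, watermark=2 * DAY_MS)

# Watermark has run ahead to day 8: the first day of windows is already gone.
ahead = retained_windows(starts, watermark=8 * DAY_MS)

print(len(close), len(ahead))
```

If the watermark runs ahead of the data like in the second call, fewer windows
stay in state even though the same amount of data has been processed.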

Cheers,
Aljoscha

On Fri, 16 Dec 2016 at 21:01 Seth Wiesman <swies...@mediamath.com> wrote:

> Hi,
>
> I’ve noticed something peculiar about the relationship between state size
> and cluster size and was wondering if anyone here knows of the reason. I am
> running a job with 1 hour tumbling event time windows which have an allowed
> lateness of 7 days. When I run on a 20-node cluster with FsState I can
> process approximately 1.5 days’ worth of data in an hour with the most
> recent checkpoint being ~20 GB. Now if I run the same job with the same
> configuration on a 40-node cluster I can process 2 days’ worth of data in
> 20 min (expected) but the state size is only ~8 GB. Because allowed lateness
> is 7 days no windows should be purged yet and I would expect the larger
> cluster which has processed more data to have a larger state. Is there some
> reason why a slower-running job or a smaller cluster would require more state?
>
> This is more of a curiosity than an issue. Thanks in advance for any
> insights you may have.
>
> Seth Wiesman
>
