Re: state size in relation to cluster size and processing speed

Aljoscha Krettek Fri, 23 Dec 2016 10:44:24 -0800

Hi,
How are you generating your watermarks? Could it be that they advance
faster when the job is processing more data?
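
To illustrate why the watermark matters here, below is a minimal sketch in
plain Python (not the Flink API). It assumes a simplified cleanup rule: the
state of an event-time window is dropped once the watermark reaches the
window's last timestamp plus the allowed lateness. The function name
retained_windows and the sample numbers are hypothetical, chosen only to
mirror the 1 hour windows and 7 days of allowed lateness described in the
quoted message.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def retained_windows(window_starts, watermark,
                     size=HOUR_MS, allowed_lateness=7 * DAY_MS):
    """Return the start times of windows whose state would still be kept.

    Simplified model (an assumption, not Flink code): a window
    [start, start + size) is cleaned up once the watermark reaches
    (start + size - 1) + allowed_lateness.
    """
    return [
        start for start in window_starts
        if watermark < start + size - 1 + allowed_lateness
    ]


# Two days of hourly tumbling windows, starting at timestamp 0.
starts = [h * HOUR_MS for h in range(48)]

# Watermark tracks the data closely: nothing has been purged yet.
kept_slow = retained_windows(starts, watermark=2 * DAY_MS)

# Watermark has run ahead to day 8: the first day of windows is gone.
kept_fast = retained_windows(starts, watermark=8 * DAY_MS)

print(len(kept_slow), len(kept_fast))  # prints "48 24"
```

Under this model the amount of retained state depends on how far the
watermark has advanced, not only on how much data has been processed.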


Cheers,
Aljoscha

On Fri, 16 Dec 2016 at 21:01 Seth Wiesman <swies...@mediamath.com> wrote:

> Hi,
>
>
>
> I’ve noticed something peculiar about the relationship between state size
> and cluster size and was wondering if anyone here knows of the reason. I am
> running a job with 1 hour tumbling event time windows which have an allowed
> lateness of 7 days. When I run on a 20-node cluster with FsState I can
> process approximately 1.5 days’ worth of data in an hour with the most
> recent checkpoint being ~20 GB. Now if I run the same job with the same
> configurations on a 40-node cluster I can process 2 days’ worth of data in
> 20 minutes (expected) but the state size is only ~8 GB. Because allowed lateness
> is 7 days no windows should be purged yet and I would expect the larger
> cluster which has processed more data to have a larger state. Is there some
> reason why a slower running job or a smaller cluster would require more state?
>
>
>
> This is more of a curiosity than an issue. Thanks in advance for any
> insights you may have.
>
>
>
> Seth Wiesman
>
