Hi,

I don't think this will immediately degrade performance. State is essentially stored in a HashMap (for the FsStateBackend) or in RocksDB (for the RocksDB backend). If these data structures don't degrade with size, then your performance shouldn't degrade either.

There are of course some secondary effects that could come into play. For example, if the data grows so much that it negatively affects GC, this would have an effect, but that doesn't come from the data structures themselves.
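
To illustrate the point about the heap-based backend, here is a toy model in Java. It is not Flink's WindowOperator or state backend code, and every name in it is made up for illustration. It only shows that with hash-map-style storage, adding a record touches a single entry and never visits the idle windows sitting next to it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, not Flink code): per-window counts
// kept in a hash map keyed by window start timestamp.
final class ToyWindowState {
    private final Map<Long, Long> countsByWindowStart = new HashMap<>();

    // Per-record work is one hash lookup/update for the record's own window;
    // other (idle) windows are never iterated.
    void add(long windowStart) {
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    // Number of records seen so far for the given window (0 if none).
    long count(long windowStart) {
        return countsByWindowStart.getOrDefault(windowStart, 0L);
    }

    // Number of windows currently holding state.
    int openWindows() {
        return countsByWindowStart.size();
    }
}
```

The idle entries still occupy memory and are still written on checkpoints, which is where the GC and checkpoint-time effects would show up.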

I hope that helps!

Best,
Aljoscha

On 22.05.20 06:23, Tzu-Li (Gordon) Tai wrote:
Hi Joe,

The main effect this should have is more state to be kept until the windows
can be fired (and state purged).
This would of course increase the time it takes to checkpoint the operator.

I'm not sure whether there will be a significant per-record runtime impact
from how windows are tracked in the WindowOperator's data structures;
maybe Aljoscha (cc'ed) can chime in here.
If it is certain that these windows will never fire (until far into the
future) because the event timestamps are corrupted in the first place, then
it might make sense to have a way to drop windows based on some criteria.
I'm not sure if that is supported in any way without triggers (since you
mentioned that those windows might not receive any data), again Aljoscha
might be able to provide more info here.
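
One possible criterion, sketched below in plain Java, is to reject records
whose event timestamp lies implausibly far ahead of the current wall-clock
time, so that they never reach the window operator. This is only a sketch
under assumptions: FutureTimestampGuard, isPlausible and the skew parameter
are hypothetical names and not part of Flink, and comparing against
processing time may not suit jobs that replay historical data.

```java
// Hypothetical helper, not part of Flink: decides whether an event timestamp
// is plausible relative to a reference "now" (both in epoch milliseconds).
final class FutureTimestampGuard {
    private FutureTimestampGuard() {}

    // Returns true if eventTimeMs is at most maxFutureSkewMs ahead of nowMs.
    // Assumes nowMs is non-negative, so the subtraction below cannot overflow
    // once we know eventTimeMs > nowMs.
    static boolean isPlausible(long eventTimeMs, long nowMs, long maxFutureSkewMs) {
        if (eventTimeMs <= nowMs) {
            return true; // past or present timestamps are always accepted here
        }
        return eventTimeMs - nowMs <= maxFutureSkewMs;
    }
}
```

In a job, a check like this could be called from a filter() placed before
the keyBy/window step, passing System.currentTimeMillis() as nowMs. How the
timestamp is read from the record depends on your event type.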

Cheers,
Gordon

On Thu, May 21, 2020 at 7:02 PM Joe Malt <jm...@yelp.com> wrote:

Hi all,

I'm looking into what happens when messages are ingested with timestamps
far into the future (e.g. due to corruption or a wrong clock at the sender).

I'm aware of the effect on watermarking, but another thing I'm concerned
about is the performance impact of the extra windows this will create.

If a Flink operator has many windows (perhaps hundreds or thousands)
open but not receiving any data (and never firing), will this degrade
performance?

Thanks,
Joe


