Re: Flink not outputting windows before all data is seen

2020-09-01 Thread David Anderson
Teodor, I've concluded this is a bug, and have reported it: https://issues.apache.org/jira/browse/FLINK-19109 Best regards, David On Sun, Aug 30, 2020 at 3:01 PM Teodor Spæren wrote: > Hey again David! > > I tried your proposed change of setting the parallelism higher. This > worked, but why

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey again David! I tried your proposed change of setting the parallelism higher. This worked, but why does this fix the behavior? I don't understand why this would fix it. The only thing that happens to the query plan is that a "remapping" node is added. Thanks for the fix, and for any

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey David! I tried what you said, but it did not solve the problem. The job still has to wait until the very end before outputting anything. I mentioned in my original email that I had set the parallelism to 1 job-wide, but when I reran the task, I added your line. Are there any

Re: Flink not outputting windows before all data is seen

2020-08-29 Thread David Anderson
Teodor, This is happening because of how readTextFile works when it executes in parallel: it divides the input file into a number of splits, which are consumed in parallel. As a result, the watermark can't move forward until much or perhaps all of the file
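
The effect described in this message can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not Flink code: the function names `split_ranges` and `records_until_first_window` are hypothetical, each split gets its own reader, readers take turns emitting one record, and the downstream watermark is taken as the minimum over all readers (which is how Flink combines watermarks from parallel inputs). It does not model the bug reported later in this thread.

```python
import math


def split_ranges(n_events, n_splits):
    """Divide event timestamps 0..n_events-1 into contiguous splits."""
    size = math.ceil(n_events / n_splits)
    return [list(range(start, min(start + size, n_events)))
            for start in range(0, n_events, size)]


def records_until_first_window(n_events, n_splits, window_size):
    """Count records read before the window [0, window_size) can fire.

    Toy assumptions (not Flink's implementation): one reader per split,
    readers emit one record per turn, a reader's watermark is the last
    timestamp it emitted, and the downstream watermark is the minimum
    over all readers.
    """
    splits = split_ranges(n_events, n_splits)
    positions = [0] * len(splits)
    watermarks = [-math.inf] * len(splits)
    records_read = 0
    while any(pos < len(s) for pos, s in zip(positions, splits)):
        for i, split in enumerate(splits):
            if positions[i] < len(split):
                watermarks[i] = split[positions[i]]
                positions[i] += 1
                records_read += 1
                # The first window fires once the combined watermark
                # reaches the window's last timestamp.
                if min(watermarks) >= window_size - 1:
                    return records_read
    return records_read


if __name__ == "__main__":
    for n_splits in (1, 4):
        print(n_splits, records_until_first_window(1000, n_splits, 10))
```

In this model, with 1000 events and a window of 10, a single split lets the first window fire after 10 records, while four splits need 37 records, because the reader holding the earliest timestamps holds back the minimum. How far a real job is held back depends on how the splits are actually scheduled and read.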

Flink not outputting windows before all data is seen

2020-08-29 Thread Teodor Spæren
Hey! Second time posting to a mailing list, let's hope I'm doing this correctly :) My use case is to take data from the MediaWiki dumps and stream it into Flink via the `readTextFile` method. The dumps are TSV files with an event per line; each event has a timestamp and a type. I want to