Hi,
I’m not sure how we can help you here. To my eye, your code looks OK, and what
you figured out about pushing the keyBy in front of the ContinuousFileReader is
also valid and makes sense, provided you can indeed perform the keyBy correctly
based on the input splits. The problem should be somewhere in your
I'm running a massive DataStream job that sifts files from S3 by timestamp.
The basic job is:
FileMonitor -> ContinuousFileReader -> MultipleFileOutputSink
The MultipleFileOutputSink sifts data by timestamp into date-hour
directories.
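As a rough illustration of the date-hour sifting, here is a minimal Java sketch
of how an epoch-millisecond timestamp could be mapped to a date-hour directory.
The class name DateHourBucket, the method bucketPath, the yyyy-MM-dd/HH layout,
and the use of UTC are all assumptions made for illustration; the actual
MultipleFileOutputSink is not shown in this thread and may work differently.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink or of the job described in this thread.
final class DateHourBucket {

    // Assumed directory layout: <date>/<hour>, e.g. 2019-03-07/14, computed in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HH").withZone(ZoneOffset.UTC);

    private DateHourBucket() {
    }

    // Maps a record timestamp (milliseconds since the epoch) to its relative
    // date-hour directory.
    static String bucketPath(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

A sink could call DateHourBucket.bucketPath(recordTimestamp) and append the
result to its base output path, so that all records from the same hour land in
the same directory.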
It's a lot of data, so I'm using high parallelism, but I want