Re: Flink 1.9.1 KafkaConnector missing data (1M+ records)

2019-12-12 Thread Kostas Kloudas
Hi Harrison. Really sorry for the late reply. Do you have any insight into whether the missing records were read by the consumer and only the StreamingFileSink failed to write their offsets, or whether the Kafka consumer never read them or dropped them for some reason? I am asking this in order to …
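One way to answer that question is to compare the consumer group's committed offsets with the log-end offsets of each partition. A minimal sketch, assuming the job commits offsets back to Kafka for monitoring (the broker address, group id, and topic name here are hypothetical; note that Flink tracks its authoritative offsets in checkpoints, so Kafka-side commits are informational only):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class OffsetCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // hypothetical
        props.put("group.id", "my-flink-job");         // hypothetical
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            for (PartitionInfo p : consumer.partitionsFor("my-topic")) { // hypothetical topic
                TopicPartition tp = new TopicPartition(p.topic(), p.partition());
                // Offset the group last committed to Kafka (null if never committed).
                OffsetAndMetadata committed = consumer.committed(tp);
                // Current end of the partition's log.
                long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
                System.out.printf("%s committed=%s end=%d%n",
                        tp, committed == null ? "none" : committed.offset(), end);
            }
        }
    }
}

If the committed offsets sit well past the missing records, the consumer read them and the problem lies on the sink side; if not, the gap is on the read path.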

Re: Flink 1.9.1 KafkaConnector missing data (1M+ records)

2019-12-02 Thread Harrison Xu
Thank you for your reply. Some clarification: we have configured the BucketAssigner to use the *Kafka record timestamp*. The exact bucketing behavior is as follows: private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH"); @Override public String …
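For reference, a minimal sketch of such an assigner against the Flink 1.9 BucketAssigner interface (the class name is hypothetical; the Kafka record timestamp arrives via the assigner's Context, since the Kafka 0.10+ connectors attach it to each StreamRecord):

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

public class KafkaTimestampBucketAssigner<T> implements BucketAssigner<T, String> {

    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH").withZone(ZoneOffset.UTC);

    @Override
    public String getBucketId(T element, Context context) {
        // context.timestamp() carries the Kafka record timestamp when the
        // source attaches it; fall back to processing time if it is absent.
        Long ts = context.timestamp();
        long millis = (ts != null) ? ts : context.currentProcessingTime();
        return FORMATTER.format(Instant.ofEpochMilli(millis));
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}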

Re: Flink 1.9.1 KafkaConnector missing data (1M+ records)

2019-11-28 Thread Kostas Kloudas
Hi Harrison. One thing to keep in mind is that Flink will only write files if there is data to write. If, for example, your partition is not active for a period of time, then no files will be written. Could this explain the fact that dt 2019-11-24T01 and 2019-11-24T02 are entirely skipped? In …
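To check whether those hours ever carried data, one could log each record's would-be bucket in a pass-through operator just before the sink. A hedged diagnostic sketch under that assumption (class name and log messages are illustrative, not from the thread):

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BucketProbe extends ProcessFunction<String, String> {

    private static final Logger LOG = LoggerFactory.getLogger(BucketProbe.class);
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH").withZone(ZoneOffset.UTC);

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        Long ts = ctx.timestamp(); // Kafka record timestamp, if attached
        if (ts != null) {
            LOG.info("record falls in bucket {}", FMT.format(Instant.ofEpochMilli(ts)));
        } else {
            LOG.warn("record carries no timestamp");
        }
        out.collect(value); // pass the record through unchanged
    }
}

If no records are ever logged for 2019-11-24T01 or 2019-11-24T02, the skipped buckets simply reflect inactive hours rather than lost data.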

Flink 1.9.1 KafkaConnector missing data (1M+ records)

2019-11-25 Thread Harrison Xu
Hello, we're seeing some strange behavior with Flink's KafkaConnector010 (Kafka 0.10.1.1): it arbitrarily skips data. *Context*: KafkaConnector010 is used as the source, and StreamingFileSink/BulkPartWriter (S3) as the sink, with no intermediate operators. Recently, we noticed that millions of Kafka …
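For context, a minimal sketch of that topology (broker, topic, group id, and S3 path are hypothetical; the thread uses a bulk writer via forBulkFormat, while this sketch substitutes a row-format encoder to stay self-contained):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class KafkaToS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // StreamingFileSink finalizes in-progress part files on checkpoints,
        // so checkpointing must be enabled for files to be committed.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // hypothetical
        props.setProperty("group.id", "my-flink-job");         // hypothetical

        FlinkKafkaConsumer010<String> source =
                new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://bucket/prefix"), new SimpleStringEncoder<String>())
                .build();

        env.addSource(source).addSink(sink).name("s3-sink");
        env.execute("kafka-to-s3");
    }
}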