Re: Batch writing from Flink streaming job

2018-05-14 Thread Fabian Hueske
Hi, Avro provides a schema for the data and can be used to serialize individual records in a binary format. It does not compress the data (although compression can be layered on top), but it is more space efficient due to the binary serialization. I think you can implement a Writer for the BucketingSink that writes records in Avro format.
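
A minimal sketch of that approach, assuming Flink's flink-connector-filesystem module and a stream of Tuple2<String, GenericRecord>; the output path, schemas, and batch size below are hypothetical, and it reuses the AvroKeyValueSinkWriter bundled with the connector rather than a hand-written Writer. Note the compression settings, which add block compression on top of Avro's binary encoding, as mentioned above:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

    public class AvroBucketingExample {

        public static void attachSink(DataStream<Tuple2<String, GenericRecord>> stream,
                                      Schema valueSchema) {
            // Writer configuration: key/value schemas plus optional block
            // compression, which Avro supports on top of its binary encoding.
            Map<String, String> props = new HashMap<>();
            props.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA,
                    Schema.create(Schema.Type.STRING).toString());
            props.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA,
                    valueSchema.toString());
            props.put(AvroKeyValueSinkWriter.CONF_COMPRESS, Boolean.toString(true));
            props.put(AvroKeyValueSinkWriter.CONF_COMPRESS_CODEC, "snappy");

            BucketingSink<Tuple2<String, GenericRecord>> sink =
                    new BucketingSink<>("hdfs:///data/avro-out"); // hypothetical base path
            sink.setWriter(new AvroKeyValueSinkWriter<String, GenericRecord>(props));
            sink.setBatchSize(128 * 1024 * 1024); // roll part files at ~128 MB

            stream.addSink(sink);
        }
    }

If the bundled writer does not fit the data model, a custom Writer can follow the same pattern: serialize each record with an Avro DatumWriter in write() and let the BucketingSink handle bucketing and file rolling.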

Re: Batch writing from Flink streaming job

2018-05-13 Thread Jörn Franke
If you want to write in batches from a streaming source, you will always need some state, i.e. a state database (here a NoSQL database such as a key-value store makes sense). Then you can grab the data at certain points in time and convert it to Avro. You need to make sure that the state is logically consistent.
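
One way to keep that state inside Flink itself, instead of an external key-value store, is managed keyed state plus a processing-time timer: buffer events per key, and when the timer fires, emit the whole batch downstream where it can be converted to Avro and written. A minimal sketch, with the class name, element type, and flush interval chosen for illustration:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Buffers events per key in managed state and emits the batch once per
    // interval; downstream operators can then convert it to Avro and write it.
    public class PeriodicBatcher extends ProcessFunction<String, List<String>> {

        private static final long INTERVAL_MS = 60_000L; // flush once a minute

        private transient ListState<String> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                    new ListStateDescriptor<>("buffer", String.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<List<String>> out)
                throws Exception {
            buffer.add(value);
            // Register a timer at the next interval boundary; registering the
            // same timestamp twice is deduplicated, so per-element calls are safe.
            long now = ctx.timerService().currentProcessingTime();
            long nextFlush = now - (now % INTERVAL_MS) + INTERVAL_MS;
            ctx.timerService().registerProcessingTimeTimer(nextFlush);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<List<String>> out)
                throws Exception {
            List<String> batch = new ArrayList<>();
            for (String s : buffer.get()) {
                batch.add(s);
            }
            if (!batch.isEmpty()) {
                out.collect(batch); // downstream: convert to Avro and write
            }
            buffer.clear();
        }
    }

Applied after a keyBy(), e.g. events.keyBy(e -> e).process(new PeriodicBatcher()), each key's buffer is checkpointed with the job, so the buffered batches stay logically consistent across failures.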