Hi, 

Without knowing too much about Flink serialization internals, I know that Flink states 
that it serializes POJO types much faster than even the fast Kryo for Java. I 
further know that it supports schema evolution in a similar way to Avro. 

In our project, we have a star architecture, where one Flink job produces 
results into a Kafka topic and multiple downstream consumers read from that 
topic (mostly other Flink jobs). 
For fast development cycles, we currently use JSON as the output format for the 
Kafka topic, because it is easy to debug and easy to migrate. However, when 
scaling up, we need to switch to a more efficient format. Most often, Avro is 
mentioned in combination with a schema registry, as it is much more efficient 
than JSON, where essentially each message contains the schema as well. However, 
in most benchmarks, Avro turns out to be rather slow in terms of CPU cycles 
(e.g. [1]). 
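
For reference, our current JSON path is roughly the following. This is only a 
minimal sketch assuming Jackson; the class name and the generic POJO are 
hypothetical, and the Kafka sink wiring is omitted:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.flink.api.common.serialization.SerializationSchema;

    public class JsonPojoSerializationSchema<T> implements SerializationSchema<T> {

        // ObjectMapper is not serializable, so create it lazily per task instance.
        private transient ObjectMapper mapper;

        @Override
        public byte[] serialize(T element) {
            if (mapper == null) {
                mapper = new ObjectMapper();
            }
            try {
                // Human-readable bytes: easy to inspect with kafka-console-consumer.
                return mapper.writeValueAsBytes(element);
            } catch (Exception e) {
                throw new RuntimeException("Failed to serialize record to JSON", e);
            }
        }
    }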

My question(s) now: 
1. Is it reasonable to use Flink's serializers to produce the message format 
for Kafka? (See the sketch below this list.) 
2. Are there any downsides to using Flink's serialization result as the output 
format for Kafka? 
3. Can downstream consumers written in Java, but which are not Flink 
components, also easily deserialize Flink-serialized POJOs? Or do they need a 
dependency on at least flink-core? 
4. Do you have benchmarks comparing Flink's (de-)serialization performance to 
e.g. Kryo and Avro? 
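
To make questions 1 and 3 concrete, here is roughly what I imagine, as a 
minimal round-trip sketch. It assumes flink-core is on the classpath of both 
producer and consumer; the class names and the MyEvent POJO are hypothetical 
placeholders for our shared jar:

    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.core.memory.DataInputDeserializer;
    import org.apache.flink.core.memory.DataOutputSerializer;

    public class FlinkPojoRoundTrip {

        // Hypothetical shared POJO (public fields + public no-arg constructor,
        // so Flink's analyzer treats it as a POJO type, not a Kryo fallback).
        public static class MyEvent {
            public String id;
            public long timestamp;
            public MyEvent() {}
            @Override public String toString() { return id + "@" + timestamp; }
        }

        public static void main(String[] args) throws Exception {
            TypeSerializer<MyEvent> serializer =
                    TypeInformation.of(MyEvent.class).createSerializer(new ExecutionConfig());

            // Producer side: POJO -> byte[] (this would be the Kafka record value).
            MyEvent event = new MyEvent();
            event.id = "a";
            event.timestamp = 42L;
            DataOutputSerializer out = new DataOutputSerializer(64);
            serializer.serialize(event, out);
            byte[] kafkaValue = out.getCopyOfBuffer();

            // Consumer side: a plain Java application, no Flink job, but it
            // still needs flink-core for the serializer classes.
            DataInputDeserializer in = new DataInputDeserializer(kafkaValue);
            MyEvent restored = serializer.deserialize(in);
            System.out.println(restored);
        }
    }

If that is the intended usage, question 3 essentially becomes: is flink-core a 
reasonable dependency for non-Flink consumers, or is there a slimmer artifact? 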

The only reason I can come up with for not using Flink serialization is that we 
wouldn't have a schema registry. But in our case, we share all our POJOs in a 
jar that is used by all components, so that is kind of a schema registry 
already. And if we only make Avro-compatible changes, which Flink also handles 
well, that shouldn't be any limitation compared to Avro plus a registry, should 
it? 
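
To illustrate the kind of "Avro-compatible" change I mean (the POJO and field 
names are hypothetical): adding a new optional field with a default, as Avro's 
compatibility rules would require. Whether Flink's serializer tolerates this 
for raw bytes read back from Kafka, rather than only for savepoint state, is 
part of what I'm asking:

    // v1 had only id and timestamp; v2 adds one field with a default value.
    public class MyEvent {
        public String id;
        public long timestamp;

        // v2: newly added field. Avro would demand a default here ("" or null)
        // so that old readers and writers stay compatible.
        public String source = "";

        public MyEvent() {}
    }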

Best regards 
Theo 

[1] https://github.com/eishay/jvm-serializers/wiki 
