Hi all,

We are trying to move our pipelines from Beam and recreate them in Hop. We have a few simple working Beam pipelines running on Flink with the following basic flow:
1. Read Avro messages from a Kafka topic into Java objects using KafkaIO.read and the appropriate serializers/deserializers
2. Extract some data and create a different object
3. Write the object to a Kafka topic in Avro format

We are using Apicurio as a schema registry. (A rough sketch of one of these pipelines is in the P.S. below.)

So far, we were able to build a simple pipeline in Hop which writes to and reads from Kafka using the "Beam Kafka Produce" and "Beam Kafka Consume" transforms. However, we need some help to figure out:

1. How to integrate the same logic and read/write Avro messages
2. Is it possible to use Apicurio as a schema registry, or do we need to develop a new plugin for that? Is a different schema registry supported (like the Confluent Schema Registry)?
3. After reading from Kafka, which transform could be used to process the message and extract some of the data before writing it back to Kafka?

Another thing we were wondering about is dependency management. To use "Beam Kafka Consume" I had to change multiple pom.xml files and exclude an older org.apache.avro dependency (1.7.7) that is pulled in by the Hadoop dependency. Before excluding those jars, the pipeline failed trying to load the class org.apache.avro.Conversions. Is this a bug, or am I building the project wrong?

Also, is there a recommended procedure to package and run the pipeline, keeping only the dependencies we need? For example, if we create a jar to run on a Flink cluster, is there a way to package only the dependencies our pipeline requires (and not include Spark, Hadoop, etc.)?

Sorry for the long email, and thanks for your help,
Noa
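
P.S. For context, here is roughly what one of the current Beam pipelines looks like. It is only a simplified sketch, not our production code: the broker address, topic names, registry URL, subject name and the field "someField" are placeholders; it reads through a Confluent-compatible registry endpoint (Apicurio exposes one under /apis/ccompat, the exact path depends on the version) using Beam's ConfluentSchemaRegistryDeserializerProvider, which needs the Confluent Avro serde on the classpath; and the write side uses plain strings instead of the registry Avro serializer just to keep the example short.

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroKafkaPipelineSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // 1. Read Avro messages as GenericRecords, resolving schemas through the
    //    registry's Confluent-compatible API (placeholder URL and subject).
    p.apply("ReadAvro", KafkaIO.<String, GenericRecord>read()
            .withBootstrapServers("kafka:9092")                      // placeholder broker
            .withTopic("input-topic")                                // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    "http://apicurio:8080/apis/ccompat/v6",          // placeholder registry URL
                    "input-topic-value"))                            // placeholder subject
            .withoutMetadata())

        // 2. Extract some data and build the outgoing value; shown here as a plain
        //    string, whereas the real pipeline creates a different Avro object.
        .apply("Extract", ParDo.of(new DoFn<KV<String, GenericRecord>, KV<String, String>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            GenericRecord in = c.element().getValue();
            c.output(KV.of(c.element().getKey(), String.valueOf(in.get("someField"))));
          }
        }))

        // 3. Write the derived records back to another Kafka topic.
        .apply("WriteBack", KafkaIO.<String, String>write()
            .withBootstrapServers("kafka:9092")
            .withTopic("output-topic")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}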
