Hi all,

We are trying to move our pipelines from Beam and recreate them in Hop. We have a few simple working Beam pipelines running on Flink with the following basic flow:
1. Read Avro messages from a Kafka topic into Java objects using KafkaIO.read and the appropriate serializers/deserializers
2. Extract some data and create a different object
3. Write the object to a Kafka topic in Avro format

We are using Apicurio as a schema registry. (A rough sketch of one of these pipelines is in the P.S. below.)

So far, we were able to build a simple pipeline in Hop which writes to and reads from Kafka using the "Beam Kafka Produce" and "Beam Kafka Consume" transforms. However, we need some help to figure out:

1. How to integrate the same logic and read/write Avro messages
2. Is it possible to use Apicurio as a schema registry, or do we need to develop a new plugin for that? Is a different schema registry supported (like the Confluent Schema Registry)?
3. After reading from Kafka, which transform could be used to process the message and extract some of the data before writing it back to Kafka?

Another thing we were wondering about is dependency management. To use "Beam Kafka Consume" I had to change multiple pom.xml files and exclude an older org.apache.avro dependency (1.7.7) that is pulled in by the Hadoop dependency. Before excluding those jars, the pipeline failed trying to load the class org.apache.avro.Conversions. Is this a bug, or am I building the project wrong?

Also, is there a recommended procedure to package and run the pipeline, keeping only the dependencies we need? For example, if we create a jar to run on a Flink cluster, is there a way to package only the dependencies our pipeline requires (and not include Spark, Hadoop, etc.)?

Sorry for the long email, and thanks for your help,
Noa
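
P.S. For context, here is roughly what one of the current Beam pipelines looks like. It is only a simplified sketch, not our production code: the broker address, topic names, registry URL, subject name and the field "someField" are placeholders; it reads through a Confluent-compatible registry endpoint (Apicurio exposes one under /apis/ccompat, the exact path depends on the version) using Beam's ConfluentSchemaRegistryDeserializerProvider, which needs the Confluent Avro serde on the classpath; and the write side uses plain strings instead of the registry Avro serializer just to keep the example short.

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroKafkaPipelineSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // 1. Read Avro messages as GenericRecords, resolving schemas through the
    //    registry's Confluent-compatible API (placeholder URL and subject).
    p.apply("ReadAvro", KafkaIO.<String, GenericRecord>read()
            .withBootstrapServers("kafka:9092")                      // placeholder broker
            .withTopic("input-topic")                                // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(
                ConfluentSchemaRegistryDeserializerProvider.of(
                    "http://apicurio:8080/apis/ccompat/v6",          // placeholder registry URL
                    "input-topic-value"))                            // placeholder subject
            .withoutMetadata())

        // 2. Extract some data and build the outgoing value; shown here as a plain
        //    string, whereas the real pipeline creates a different Avro object.
        .apply("Extract", ParDo.of(new DoFn<KV<String, GenericRecord>, KV<String, String>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            GenericRecord in = c.element().getValue();
            c.output(KV.of(c.element().getKey(), String.valueOf(in.get("someField"))));
          }
        }))

        // 3. Write the derived records back to another Kafka topic.
        .apply("WriteBack", KafkaIO.<String, String>write()
            .withBootstrapServers("kafka:9092")
            .withTopic("output-topic")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}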
