Hi Yi,

one option is to use Avro, where you define one global Avro schema as the
source of truth. Then you add aliases [1] to this schema for each source
where the fields are named differently. You use the same schema to read the
Avro messages from Kafka and Avro automatically converts the data with its
write schema (stored in some schema registry) to your global schema.

If you pay close attention to the requirements of schema evolution of Avro
[2], you could easily add and remove some fields in the different sources
without changing anything programmatically in your Flink ingestion job.

[1] https://avro.apache.org/docs/1.8.1/spec.html#Aliases
[2] https://avro.apache.org/docs/1.8.1/spec.html#Schema+Resolution

On Tue, Jun 2, 2020 at 8:35 AM <uuu...@protonmail.com> wrote:

> Hi all,
>
> I have a user case where I want to merge several upstream data source
> (Kafka topics). The data are essential the same,
> but they have different field names.
>
> I guess I can say my problem is not so much about flink itself. It is
> about how to deserialize data and merge different data effectively with
> flink.
> I can define different schemas and then deserialize data and merge them
> manually. I wonder if there is any dynamical way to do such thing, that is,
> I want to changing field names works like changing pandas dataframe column
> names. I see there is already
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame
> but resorting to pandas implies I need to work with python, which is
> something I prefer not to do.
>
> What is your practice on dynamically changing sources and merging them?
> I'd love to here your opinion.
>
> Bests,
> Yi
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Reply via email to