Hi Patrick,

You could check this thread:

https://mail-archives.apache.org/mod_mbox/beam-user/202107.mbox/%3CCABgQGCAZGoZWDW74RHeqtGpthS-DZRP8BFhwmVBNyXaGAkDSOA%40mail.gmail.com%3E

It looks like you are figuring out how to deal with a similar issue.

Disclaimer: I just got started with Beam on the FlinkRunner, so I cannot
claim this approach is a "best practice".
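
In case it helps, below is a rough, untested sketch of the kind of coder
that pattern implies: a Beam CustomCoder wrapping Avro's single-object
encoding (org.apache.avro.message.BinaryMessageEncoder /
BinaryMessageDecoder). The class name SingleObjectAvroCoder is made up,
and the in-memory SchemaStore.Cache is only a stand-in for a lookup
against a real schema registry:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.SchemaStore;
import org.apache.beam.sdk.coders.CustomCoder;

// Sketch: a Beam Coder using Avro's single-object encoding, so every
// payload carries the 10-byte header (magic bytes + CRC-64 fingerprint)
// that the decoder resolves against a SchemaStore.
public class SingleObjectAvroCoder extends CustomCoder<GenericRecord> {

  // Coders must be Serializable; keep the schema as JSON and rebuild
  // the (non-serializable) encoder/decoder lazily on each worker.
  private final String readerSchemaJson;
  private transient BinaryMessageEncoder<GenericRecord> encoder;
  private transient BinaryMessageDecoder<GenericRecord> decoder;

  public SingleObjectAvroCoder(Schema readerSchema) {
    this.readerSchemaJson = readerSchema.toString();
  }

  private BinaryMessageEncoder<GenericRecord> encoder() {
    if (encoder == null) {
      Schema schema = new Schema.Parser().parse(readerSchemaJson);
      encoder = new BinaryMessageEncoder<>(GenericData.get(), schema);
    }
    return encoder;
  }

  private BinaryMessageDecoder<GenericRecord> decoder() {
    if (decoder == null) {
      Schema schema = new Schema.Parser().parse(readerSchemaJson);
      // In a real pipeline this cache would be populated from your
      // schema registry as new writer fingerprints are encountered.
      SchemaStore.Cache schemaCache = new SchemaStore.Cache();
      schemaCache.addSchema(schema);
      decoder = new BinaryMessageDecoder<>(GenericData.get(), schema, schemaCache);
    }
    return decoder;
  }

  @Override
  public void encode(GenericRecord value, OutputStream outStream) throws IOException {
    // Writes the single-object header followed by the Avro binary body.
    encoder().encode(value, outStream);
  }

  @Override
  public GenericRecord decode(InputStream inStream) throws IOException {
    // Reads the fingerprint, finds the writer schema in the store, and
    // resolves it to the reader schema.
    return decoder().decode(inStream);
  }
}

You could either set such a coder on a PCollection, or reuse the same
encoder/decoder pair directly in a ParDo over the payload bytes from
PubsubIO.readMessages().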

I will be following this thread, since I am interested in alternative
approaches to evolving state schemas as well.

Cheers!

Pavel

On Thursday, 15 July 2021, Patrick Lucas <[email protected]> wrote:
> Hello,
> I'm building a streaming data platform using Beam on Dataflow, Pub/Sub,
> and Avro for message schemas and serialization.
> My plan is to maintain full schema compatibility for messages on the same
> topic, but because Avro requires the writer schema in order to deserialize
> and convert between compatible schema versions, the encoded input/output
> messages need to use Avro's single object encoding (or an equivalent
> mechanism) which includes a schema fingerprint that can be dereferenced in
> a schema registry or cache at runtime to deserialize and convert the
> payload.
>
> Does Beam have any built-in support for this pattern? PubsubIO, for
> example, uses AvroCoder, which appears to have no support for this (though
> it may have in the past, based on some Git archaeology), using Avro binary
> encoding which does not include the header.
> If not, how do other users handle schema changes in their input data
> streams? Do you just avoid them altogether, migrating consumers on each
> schema change, or do you solve it manually with the above pattern?
> Tangentially: what about state schema evolution? I've had trouble finding
> any documentation about how to tackle this, whether for AvroCoder-coded
> state or when using a Beam row schema.
>
> Thanks,
> Patrick Lucas
>

-- 
Best Regards,
Pavel Solomin

Tel: +351 962 950 692 | Skype: pavel_solomin | LinkedIn
<https://www.linkedin.com/in/pavelsolomin>
