Hi everyone,

I'm still trying to figure out the best path forward on my project. While doing some research, I came across the Confluent interoperability page [1]. Beam currently uses version 5.3.2 of the Confluent libs, which reach end of support on July 19, 2022. It's the last version that supports Avro 1.8 (actually 1.8.1). The 5.4.x Confluent libs move to Avro 1.9.1, and 6.2.x (the latest) moves to Avro 1.10.1 (Avro's latest being 1.10.2).

What's the plan in cases like this, where dependencies reach their end of life? Does Beam plan on staying on 5.3.2 even once it's no longer supported?

Thanks,
Cristian

[1] https://docs.confluent.io/platform/current/installation/versions-interoperability.html

On Mon, Oct 18, 2021 at 2:42 PM Brian Hulette <[email protected]> wrote:

> Note there was some work done to make Beam work with Avro 1.9 [1] (presumably this only works with Joda time though? Certainly the ByteBuddy-generated code would.) Avro 1.9 compatibility is not verified continuously though; there was an effort a while ago to test Beam against multiple versions of Avro [2], but I don't think it went anywhere. There's also some discussion about API compatibility concerns in [2].
>
> Brian
>
> [1] https://issues.apache.org/jira/browse/BEAM-9144
> [2] https://lists.apache.org/thread.html/r2739eb540ea2b8d8c50db71850361b8b1347df66e4357001b334987b%40%3Cdev.beam.apache.org%3E
>
> On Mon, Oct 18, 2021 at 11:19 AM Daniel Collins <[email protected]> wrote:
>
>> > I see a lot of static calls
>>
>> A lot of these calls bottom out in usage of ServiceLoader <http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html>, effectively acting as an ergonomic API on top of it, with AutoService <https://github.com/google/auto/tree/master/service> used to register new handlers. If there's some extension point which would be helpful to you and it isn't exposed this way currently, it's quite likely that you could get buy-in for adding another such extension point.
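>>
>> For anyone unfamiliar with the pattern, here's a minimal self-contained sketch of how ServiceLoader and AutoService fit together (hypothetical names; CoderProvider is not an existing Beam extension point, and this assumes the AutoService annotation processor is on the build path):
>>
>> import com.google.auto.service.AutoService;
>> import java.util.ServiceLoader;
>>
>> public final class ExtensionPointSketch {
>>   // The extension point: implementations are discovered off the classpath.
>>   public interface CoderProvider {
>>     boolean handles(String typeName);
>>   }
>>
>>   // AutoService writes the META-INF/services registration file at compile
>>   // time, so this implementation is picked up just by being on the classpath.
>>   @AutoService(CoderProvider.class)
>>   public static class JavaTimeCoderProvider implements CoderProvider {
>>     @Override
>>     public boolean handles(String typeName) {
>>       return "JAVA_DATE".equals(typeName);
>>     }
>>   }
>>
>>   // The "static call" side: walk all registered providers to find a match.
>>   public static CoderProvider find(String typeName) {
>>     for (CoderProvider p : ServiceLoader.load(CoderProvider.class)) {
>>       if (p.handles(typeName)) {
>>         return p;
>>       }
>>     }
>>     throw new IllegalArgumentException("No provider registered for " + typeName);
>>   }
>> }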
>>
>> > after I find a workaround
>>
>> I think if the issue is that AvroUtils currently uses an old version of Avro, forking it into your own package and version-bumping it might be a good place to start for a workaround. Any changes committed to the Beam repo will take 1-2 months to make it into a non-snapshot build, even if you do find a long-term solution acceptable to all interested parties.
>>
>> -Daniel
>>
>> On Mon, Oct 18, 2021 at 1:46 PM Cristian Constantinescu <[email protected]> wrote:
>>
>>> I will have a look after I find a workaround, as I really need to deliver some things and using Avro 1.8 isn't really an option.
>>>
>>> But once that's done, I'd love to find ways to make Beam less dependent on Avro 1.8, considering it was released in 2017.
>>>
>>> On Mon, Oct 18, 2021 at 12:34 PM Reuven Lax <[email protected]> wrote:
>>>
>>>> Do you know if it's easy to detect which version of Avro is being used?
>>>>
>>>> On Sun, Oct 17, 2021 at 10:20 PM Cristian Constantinescu <[email protected]> wrote:
>>>>
>>>>> If I had to change things, I would:
>>>>>
>>>>> 1. When deriving the schema, add a few new types (JAVA_TIME, JAVA_DATE, or something along those lines).
>>>>> 2. RowCoderGenerator, around line 159, calls "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));", which eventually gets to SchemaCoderHelpers.coderForFieldType. There, CODER_MAP has a hard reference to InstantCoder for DATETIME. Maybe that map can be augmented (possibly dynamically) with new fieldtype-coder combinations to take care of the new types from #1.
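>>>>>
>>>>> To make #2 concrete, here's a rough standalone sketch of what a pluggable map could look like (simplified stand-ins; CoderMapSketch and its string keys are made up, not Beam's actual CODER_MAP or Coder types):
>>>>>
>>>>> import java.time.LocalDate;
>>>>> import java.util.Map;
>>>>> import java.util.concurrent.ConcurrentHashMap;
>>>>> import java.util.function.Function;
>>>>>
>>>>> public final class CoderMapSketch {
>>>>>   // Keyed by type name and mutable, so new fieldtype-coder pairs can be
>>>>>   // registered dynamically instead of being hardcoded.
>>>>>   private static final Map<String, Function<Object, byte[]>> CODER_MAP = new ConcurrentHashMap<>();
>>>>>
>>>>>   static {
>>>>>     // Roughly today's behavior: DATETIME is hardwired to a Joda-based coder.
>>>>>     CODER_MAP.put("DATETIME", v -> new byte[0] /* InstantCoder.encode(...) elided */);
>>>>>   }
>>>>>
>>>>>   // The extension point proposed in #2: plug in coders for the new types from #1.
>>>>>   public static void register(String typeName, Function<Object, byte[]> encoder) {
>>>>>     CODER_MAP.put(typeName, encoder);
>>>>>   }
>>>>>
>>>>>   public static byte[] encode(String typeName, Object value) {
>>>>>     return CODER_MAP.get(typeName).apply(value);
>>>>>   }
>>>>>
>>>>>   public static void main(String[] args) {
>>>>>     // A JAVA_DATE type from #1, encoding java.time.LocalDate as its epoch day.
>>>>>     register("JAVA_DATE", v -> java.nio.ByteBuffer.allocate(8).putLong(((LocalDate) v).toEpochDay()).array());
>>>>>     System.out.println(encode("JAVA_DATE", LocalDate.of(2021, 10, 17)).length); // 8
>>>>>   }
>>>>> }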
>>>>>
>>>>> I would also like to ask: looking through the Beam code, I see a lot of static calls. Just wondering why it's done this way. I'm used to projects having some form of dependency injection involved, with static calls being frowned upon (lack of mockability, hidden dependencies, tight coupling, etc.). The only reason I can think of is serializability, given Beam's multi-node processing?
>>>>>
>>>>> On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax <[email protected]> wrote:
>>>>>
>>>>>> Is the schema inference the only reason we can't upgrade Avro, or are there other blockers? Is there any way we can tell at runtime which version of Avro is running? Since we generate the conversion code at runtime with ByteBuddy, we could potentially just generate different conversions depending on the Avro version.
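>>>>>>
>>>>>> One possible way to tell at runtime, sketched below: read the Implementation-Version from the Avro jar's manifest, with a class-probe fallback. This assumes the manifest entry is populated (it usually is for release jars) and that org.apache.avro.data.JodaTimeConversions exists only in the 1.9.x line; I haven't verified this against every release, so treat it as a starting point:
>>>>>>
>>>>>> import org.apache.avro.Schema;
>>>>>>
>>>>>> public final class AvroVersionSketch {
>>>>>>   static String avroVersion() {
>>>>>>     // Usually populated from the manifest of whatever Avro jar is on the classpath.
>>>>>>     String v = Schema.class.getPackage().getImplementationVersion();
>>>>>>     if (v != null) {
>>>>>>       return v;
>>>>>>     }
>>>>>>     // Fallback probe: JodaTimeConversions was (I believe) added deprecated in
>>>>>>     // 1.9 and removed in 1.10, while 1.8 had Joda baked into TimeConversions.
>>>>>>     try {
>>>>>>       Class.forName("org.apache.avro.data.JodaTimeConversions");
>>>>>>       return "1.9.x (probed)";
>>>>>>     } catch (ClassNotFoundException e) {
>>>>>>       return "unknown";
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>>   public static void main(String[] args) {
>>>>>>     System.out.println("Avro version: " + avroVersion());
>>>>>>   }
>>>>>> }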
>>>>>>
>>>>>> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <[email protected]> wrote:
>>>>>>
>>>>>>> Those are fair points. However, please consider that there might be new users who will decide that Beam isn't suitable because of things like requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>>>>>>>
>>>>>>> I guess what I'm saying is that there's definitely a non-negligible cost associated with old third-party libs in Beam's code (even if efforts are put in to minimize them).
>>>>>>>
>>>>>>> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> To my knowledge, and reading through Avro's Jira [1], it does not support Joda time anymore.
>>>>>>>>>
>>>>>>>>> It seems everything related to this Avro 1.8 dependency is tricky. If you recall, it also prevents us from upgrading to the latest Confluent libs, which would enable Beam to use protobuf schemas with the schema registry. (I was also the one who brought that issue up, and also made an exploratory PR to move Avro outside of Beam core.)
>>>>>>>>>
>>>>>>>>> I understand that Beam tries to keep public APIs stable, but I'd like to put forward two points:
>>>>>>>>>
>>>>>>>>> 1) Schemas are experimental, hence there shouldn't be any API stability guarantees there.
>>>>>>>>
>>>>>>>> Unfortunately, at this point they aren't really. As a community we've been bad about removing the Experimental label; many core, core parts of Beam are still labeled experimental (sources, triggering, state, timers). Realistically, they are no longer experimental.
>>>>>>>>
>>>>>>>>> 2) Maybe this is the perfect opportunity for the Beam community to think about the long-term effects of old dependencies within Beam's codebase, and especially how to deal with them. Perhaps starting/maintaining an "experimental" branch/maven-published-artifacts where Beam does not guarantee backwards compatibility (or maintains it for a shorter period of time) is something to think about.
>>>>>>>>
>>>>>>>> This is why we usually try to prevent third-party libraries from being in our public API. However, in this case that's tricky.
>>>>>>>>
>>>>>>>> The Beam community can of course decide to break backwards compatibility; however, as things stand today, it is maintained. The last time we broke backwards compatibility was when the old Dataflow API transitioned to Beam, and it was very painful. It took multiple years to get some users onto the Beam API due to the code changes required to migrate (and those required code changes weren't terribly invasive).
>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>>>>>>>
>>>>>>>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Does this mean more recent versions of Avro aren't backwards compatible with Avro 1.8? If so, this might be tricky to fix, since Beam maintains backwards compatibility on its public API.
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I've created a small demo project to show the issue [1].
>>>>>>>>>>>
>>>>>>>>>>> I've looked at the Beam code, and all the Avro tests use Avro 1.8, which is hardcoded to Joda, hence why all the tests pass. Avro changed to java.time (as most other recent Java projects have) after 1.8. Is there a plan for Beam to do the same?
>>>>>>>>>>>
>>>>>>>>>>> Does anyone use Avro with java.time instead of joda.time who could give me ideas on how to make it work?
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Cristian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +dev <[email protected]>
>>>>>>>>>>>>
>>>>>>>>>>>> This sounds like a bug in the Avro schema mapping. I *think* the intention is that we generate logic for converting Date to/from Instant when making the getters for a RowWithGetters backed by Avro.
>>>>>>>>>>>>
>>>>>>>>>>>> Brian
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> A little bit more color on this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That field is nested inside an Avro record like so:
>>>>>>>>>>>>>
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   "type": "record",
>>>>>>>>>>>>>   "name": "Outer",
>>>>>>>>>>>>>   "fields": [
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name": "whatever",
>>>>>>>>>>>>>       "type": {
>>>>>>>>>>>>>         "type": "record",
>>>>>>>>>>>>>         "name": "Inner",
>>>>>>>>>>>>>         "fields": [
>>>>>>>>>>>>>           {"name": "somedate", "type": {"type": "int", "logicalType": "date"}}
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   ]
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> The outer record uses SchemaCoderHelpers.coderForFieldType on all its fields to create the coder. However, when that method is called for the inner record field, it just returns SchemaCoder.of(fieldType.getRowSchema()), which doesn't take logical types into account.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that's where the problem is. If anyone who knows that code could have a look and let me know their thoughts, I can try to fix the issue if we agree that there is one.
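>>>>>>>>>>>>>
>>>>>>>>>>>>> In case it helps the discussion, here's the shape of the fix I have in mind as a standalone sketch (simplified stand-ins for FieldType and the coders, not the real SchemaCoderHelpers code):
>>>>>>>>>>>>>
>>>>>>>>>>>>> import java.util.List;
>>>>>>>>>>>>>
>>>>>>>>>>>>> final class FieldType {
>>>>>>>>>>>>>   final String typeName;           // "ROW", "INT32", ...
>>>>>>>>>>>>>   final String logicalType;        // e.g. "date", or null
>>>>>>>>>>>>>   final List<FieldType> rowFields; // set when typeName is "ROW"
>>>>>>>>>>>>>
>>>>>>>>>>>>>   FieldType(String typeName, String logicalType, List<FieldType> rowFields) {
>>>>>>>>>>>>>     this.typeName = typeName;
>>>>>>>>>>>>>     this.logicalType = logicalType;
>>>>>>>>>>>>>     this.rowFields = rowFields;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> final class CoderForFieldTypeSketch {
>>>>>>>>>>>>>   // Check logical types first, and recurse into nested rows instead of
>>>>>>>>>>>>>   // returning a plain row coder, so nested logical types aren't dropped.
>>>>>>>>>>>>>   static String coderFor(FieldType ft) {
>>>>>>>>>>>>>     if (ft.logicalType != null) {
>>>>>>>>>>>>>       return "LogicalTypeCoder(" + ft.logicalType + ")";
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     if ("ROW".equals(ft.typeName)) {
>>>>>>>>>>>>>       StringBuilder row = new StringBuilder("RowCoder[");
>>>>>>>>>>>>>       for (FieldType field : ft.rowFields) {
>>>>>>>>>>>>>         row.append(coderFor(field)).append(' ');
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>       return row.append(']').toString();
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     return "Coder(" + ft.typeName + ")";
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   public static void main(String[] args) {
>>>>>>>>>>>>>     FieldType somedate = new FieldType("INT32", "date", null);
>>>>>>>>>>>>>     FieldType inner = new FieldType("ROW", null, List.of(somedate));
>>>>>>>>>>>>>     FieldType outer = new FieldType("ROW", null, List.of(inner));
>>>>>>>>>>>>>     // Prints RowCoder[RowCoder[LogicalTypeCoder(date) ] ]
>>>>>>>>>>>>>     System.out.println(coderFor(outer));
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }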
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the following field in one of my Avro schemas:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "name": "somedate",
>>>>>>>>>>>>>>   "type": {"type": "int", "logicalType": "date"}
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This generates a java.time.LocalDate field in the corresponding Java class (call it Foo).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as DATETIME in the Beam schema, because of AvroUtils.toFieldType (around line 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use InstantCoder, which expects a Joda Instant as input, not a LocalDate. So when my PTransform returns Foo objects, it crashes with "class java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when trying to encode that field using InstantCoder.encode().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a workaround for this issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> Cristian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PS: I did search the mailing list and Google, but didn't find anything related except a thread on AvroCoder.JodaTimestampConversion, which I don't think applies here.
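>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PPS: For reference, the mismatch boils down to the conversion below. A standalone sketch of a shim I've been experimenting with (assumes mapping the date to UTC midnight is acceptable; DateShim is just a made-up name):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> import java.time.LocalDate;
>>>>>>>>>>>>>> import java.time.ZoneOffset;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> public final class DateShim {
>>>>>>>>>>>>>>   // Turn the java.time.LocalDate that Avro generates into the
>>>>>>>>>>>>>>   // org.joda.time.Instant that InstantCoder expects.
>>>>>>>>>>>>>>   static org.joda.time.Instant toJoda(LocalDate date) {
>>>>>>>>>>>>>>     long millis = date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
>>>>>>>>>>>>>>     return new org.joda.time.Instant(millis);
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   public static void main(String[] args) {
>>>>>>>>>>>>>>     System.out.println(toJoda(LocalDate.of(2021, 10, 14))); // 2021-10-14T00:00:00.000Z
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }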
