I will have a look after I find a workaround, as I really need to deliver some things and using Avro 1.8 isn't really an option.

But once that's done, I'd love to find ways to make Beam less dependent on Avro 1.8, considering it was released in 2017.
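For the version-detection question quoted below, here is a rough, untested sketch of what I'd try first. The assumption that TimestampMillisConversion exists only in Avro 1.9+ (where TimeConversions moved from Joda time to java.time) is mine, from skimming the Avro changelogs:

    import org.apache.avro.Schema;

    public class AvroVersionCheck {
      public static String avroVersion() {
        // Heuristic 1: the jar manifest usually carries the implementation version.
        String v = Schema.class.getPackage().getImplementationVersion();
        if (v != null) {
          return v;
        }
        // Heuristic 2 (assumption): TimestampMillisConversion was introduced when
        // TimeConversions switched from Joda time to java.time in Avro 1.9.
        try {
          Class.forName("org.apache.avro.data.TimeConversions$TimestampMillisConversion");
          return "1.9+";
        } catch (ClassNotFoundException e) {
          return "1.8.x";
        }
      }
    }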
On Mon, Oct 18, 2021 at 12:34 PM Reuven Lax <[email protected]> wrote:

> Do you know if it's easy to detect which version of Avro is being used?
>
> On Sun, Oct 17, 2021 at 10:20 PM Cristian Constantinescu <[email protected]> wrote:
>
>> If I had to change things, I would:
>>
>> 1. When deriving the schema, add a few new types (JAVA_TIME, JAVA_DATE, or something along those lines).
>> 2. RowCoderGenerator around line 159 calls "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));", which eventually gets to SchemaCoderHelpers.coderForFieldType. There, CODER_MAP has a hard reference on InstantCoder for DATETIME. Maybe that map can be augmented (possibly dynamically) with new FieldType-to-Coder combinations to take care of the new types from #1 (see the sketch at the end of this message).
>>
>> I would also like to ask: looking through the Beam code, I see a lot of static calls, and I'm wondering why it's done this way. I'm used to projects having some form of dependency injection, with static calls being frowned upon (lack of mockability, hidden dependencies, tight coupling, etc.). The only reason I can think of is serializability, given Beam's multi-node processing?
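>> Something along these lines is what I have in mind. A sketch only: CODER_MAP paraphrases SchemaCoderHelpers, and the registerCoder hook is hypothetical, not existing Beam API:
>>
>>     import java.util.Map;
>>     import java.util.concurrent.ConcurrentHashMap;
>>     import org.apache.beam.sdk.coders.Coder;
>>     import org.apache.beam.sdk.coders.InstantCoder;
>>     import org.apache.beam.sdk.schemas.Schema.TypeName;
>>
>>     public class CoderMapSketch {
>>       // Today's hard-wired mapping, paraphrased: DATETIME always gets InstantCoder.
>>       private static final Map<TypeName, Coder<?>> CODER_MAP = new ConcurrentHashMap<>();
>>
>>       static {
>>         CODER_MAP.put(TypeName.DATETIME, InstantCoder.of());
>>       }
>>
>>       // Hypothetical hook: register a coder for a new type, e.g. a future
>>       // JAVA_DATE type backed by a java.time-aware coder.
>>       public static void registerCoder(TypeName typeName, Coder<?> coder) {
>>         CODER_MAP.put(typeName, coder);
>>       }
>>     }
>>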
>> On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax <[email protected]> wrote:
>>
>>> Is the Schema inference the only reason we can't upgrade Avro, or are there other blockers? Is there any way we can tell at runtime which version of Avro is running? Since we generate the conversion code at runtime with ByteBuddy, we could potentially just generate different conversions depending on the Avro version.
>>>
>>> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <[email protected]> wrote:
>>>
>>>> Those are fair points. However, please consider that there might be new users who decide that Beam isn't suitable because of things like requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>>>>
>>>> I guess what I'm saying is that there's definitely a non-negligible cost associated with old third-party libs in Beam's code (even if efforts are put in to minimize them).
>>>>
>>>> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax <[email protected]> wrote:
>>>>
>>>>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu <[email protected]> wrote:
>>>>>
>>>>>> To my knowledge, and reading through Avro's Jira [1], it does not support Joda time anymore.
>>>>>>
>>>>>> It seems everything related to this Avro 1.8 dependency is tricky. If you recall, it also prevents us from upgrading to the latest Confluent libs... for enabling Beam to use protobuf schemas with the schema registry. (I was also the one who brought that issue up, and I also made an exploratory PR to move Avro outside of Beam core.)
>>>>>>
>>>>>> I understand that Beam tries to keep its public APIs stable, but I'd like to put forward two points:
>>>>>>
>>>>>> 1) Schemas are experimental, hence there shouldn't be any API stability guarantees there.
>>>>>
>>>>> Unfortunately, at this point they aren't really. As a community we've been bad about removing the Experimental label - many core parts of Beam are still labeled experimental (sources, triggering, state, timers). Realistically, they are no longer experimental.
>>>>>
>>>>>> 2) Maybe this is the perfect opportunity for the Beam community to think about the long-term effects of old dependencies within Beam's codebase, and especially how to deal with them. Perhaps starting/maintaining an "experimental" branch or maven-published artifacts where Beam does not guarantee backwards compatibility (or maintains it for a shorter period of time) is something to think about.
>>>>>
>>>>> This is why we usually try to keep third-party libraries out of our public API. However, in this case that's tricky.
>>>>>
>>>>> The Beam community can of course decide to break backwards compatibility. However, as things stand today, it is maintained. The last time we broke backwards compatibility was when the old Dataflow API transitioned to Beam, and it was very painful. It took multiple years to get some users onto the Beam API due to the code changes required to migrate (and those required code changes weren't terribly invasive).
>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/AVRO-2335
>>>>>>
>>>>>> On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax <[email protected]> wrote:
>>>>>>
>>>>>>> Does this mean more recent versions of Avro aren't backwards compatible with Avro 1.8? If so, this might be tricky to fix, since Beam maintains backwards compatibility on its public API.
>>>>>>>
>>>>>>> On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've created a small demo project to show the issue [1].
>>>>>>>>
>>>>>>>> I've looked at the Beam code, and all the Avro tests use Avro 1.8, which is hardcoded to Joda time; that's why all the tests pass. Avro changed to java.time (as most other recent Java projects have) after 1.8. Is there a plan for Beam to do the same?
>>>>>>>>
>>>>>>>> Does anyone use Avro with java.time instead of joda.time who could give me ideas on how to make it work?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Cristian
>>>>>>>>
>>>>>>>> [1] https://github.com/zeidoo/beamjavadates-poc
>>>>>>>>
>>>>>>>> On Thu, Oct 14, 2021 at 5:35 PM Brian Hulette <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> +dev <[email protected]>
>>>>>>>>>
>>>>>>>>> This sounds like a bug in the Avro schema mapping. I *think* the intention is that we generate logic for converting Date to/from Instant when making the getters for a RowWithGetters backed by Avro.
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> On Thu, Oct 14, 2021 at 4:43 AM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> A little bit more color on this.
>>>>>>>>>>
>>>>>>>>>> That field is nested inside an Avro record like so (simplified; record names added for clarity):
>>>>>>>>>>
>>>>>>>>>>     {
>>>>>>>>>>       "type": "record",
>>>>>>>>>>       "name": "Outer",
>>>>>>>>>>       "fields": [
>>>>>>>>>>         {
>>>>>>>>>>           "name": "whatever",
>>>>>>>>>>           "type": {
>>>>>>>>>>             "type": "record",
>>>>>>>>>>             "name": "Inner",
>>>>>>>>>>             "fields": [
>>>>>>>>>>               {"name": "somedate", "type": {"type": "int", "logicalType": "date"}}
>>>>>>>>>>             ]
>>>>>>>>>>           }
>>>>>>>>>>         }
>>>>>>>>>>       ]
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> The outer record uses SchemaCoderHelpers.coderForFieldType on all its fields to create the coder. However, when that method is called for the inner record field, it just returns SchemaCoder.of(fieldType.getRowSchema()), which doesn't take LogicalTypes into account; a rough sketch of that path follows.
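>>>>>>>>>> Paraphrasing from memory, the relevant path looks roughly like this (a sketch, not the actual Beam source):
>>>>>>>>>>
>>>>>>>>>>     import java.util.Map;
>>>>>>>>>>     import org.apache.beam.sdk.coders.Coder;
>>>>>>>>>>     import org.apache.beam.sdk.coders.InstantCoder;
>>>>>>>>>>     import org.apache.beam.sdk.schemas.Schema;
>>>>>>>>>>     import org.apache.beam.sdk.schemas.SchemaCoder;
>>>>>>>>>>
>>>>>>>>>>     // Paraphrased sketch of SchemaCoderHelpers.coderForFieldType, not the real code.
>>>>>>>>>>     public class CoderPathSketch {
>>>>>>>>>>       private static final Map<Schema.TypeName, Coder<?>> CODER_MAP =
>>>>>>>>>>           Map.of(Schema.TypeName.DATETIME, InstantCoder.of()); // joda hard reference
>>>>>>>>>>
>>>>>>>>>>       static Coder<?> coderForFieldType(Schema.FieldType fieldType) {
>>>>>>>>>>         if (fieldType.getTypeName() == Schema.TypeName.ROW) {
>>>>>>>>>>           // Nested rows get a plain SchemaCoder built from the row schema alone,
>>>>>>>>>>           // so the logical-type information on the nested fields is dropped here.
>>>>>>>>>>           return SchemaCoder.of(fieldType.getRowSchema());
>>>>>>>>>>         }
>>>>>>>>>>         return CODER_MAP.get(fieldType.getTypeName());
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>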
>>>>>>>>>> I think that's where the problem is. If anyone who knows that code could have a look and let me know their thoughts, I can try to fix the issue if we agree that there is one.
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 14, 2021 at 7:12 AM Cristian Constantinescu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have the following field in one of my Avro schemas:
>>>>>>>>>>>
>>>>>>>>>>>     {
>>>>>>>>>>>       "name": "somedate",
>>>>>>>>>>>       "type": {"type": "int", "logicalType": "date"}
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> This generates a java.time.LocalDate field in the corresponding Java class (call it Foo).
>>>>>>>>>>>
>>>>>>>>>>> AvroUtils.toBeamSchema(FooClass.getSchema()) will return that field as DATETIME in the Beam schema, because of AvroUtils.toFieldType (around line 275, where TimestampMillis and Date are both stored as DATETIME).
>>>>>>>>>>>
>>>>>>>>>>> My problem is that in SchemaCoderHelpers, DATETIME is set to use InstantCoder, which expects a joda Instant as input, not a LocalDate. So when my PTransform returns Foo objects, it crashes with "class java.time.LocalDate cannot be cast to class org.joda.time.Instant..." when trying to encode that field using InstantCoder.encode().
>>>>>>>>>>>
>>>>>>>>>>> Is there a workaround for this issue?
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Cristian
>>>>>>>>>>>
>>>>>>>>>>> PS: I did search the mailing list and Google, but didn't find anything related except a thread on AvroCoder.JodaTimestampConversion, which I don't think applies here.
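>>>>>>>>>>>
>>>>>>>>>>> PPS: One stopgap I might try in the meantime, untested: build the Rows by hand and convert the LocalDate into the joda Instant that InstantCoder expects, along these lines:
>>>>>>>>>>>
>>>>>>>>>>>     import java.time.LocalDate;
>>>>>>>>>>>     import java.time.ZoneOffset;
>>>>>>>>>>>     import org.joda.time.Instant;
>>>>>>>>>>>
>>>>>>>>>>>     public class JodaBridge {
>>>>>>>>>>>       // Untested: map the java.time.LocalDate from the generated Avro class to
>>>>>>>>>>>       // the joda Instant Beam's DATETIME fields are encoded with (midnight UTC).
>>>>>>>>>>>       public static Instant toJodaInstant(LocalDate date) {
>>>>>>>>>>>         return new Instant(date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli());
>>>>>>>>>>>       }
>>>>>>>>>>>     }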
