Thanks a lot, Ryan!

That's exactly what I was looking for.

I had not yet gone through the SchemaCompatibility APIs, and I now discover
that they're very powerful and can also return a list of incompatibilities.
Great!
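
For reference, here is a minimal sketch of how those incompatibilities can
be listed (this assumes the Avro 1.10 accessors getResult() and
getIncompatibilities() on SchemaPairCompatibility, and the v1Schema /
v2Schema from my test below):

SchemaCompatibility.SchemaPairCompatibility compatibility =
      SchemaCompatibility.checkReaderWriterCompatibility(
            v2Schema, v1Schema);  // reader schema first, then writer schema

// Each Incompatibility carries a type, a human-readable message and a
// JSON-pointer location into the schema.
for (SchemaCompatibility.Incompatibility incompatibility
      : compatibility.getResult().getIncompatibilities()) {
   System.out.println(incompatibility.getType()
         + " at " + incompatibility.getLocation()
         + ": " + incompatibility.getMessage());
}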

Kind regards,

Laurent

On Tue, Jan 5, 2021 at 6:17 PM, Ryan Skraba <[email protected]> wrote:

> Hello!
>
> As you noticed, the validate method deliberately ignores the actual schema
> of a record datum and validates the field values by position.  It's
> answering a slightly different question: whether the datum (and its
> contents) could fit in the given schema.
>
> For your use case, you might want to use the rules for schema
> compatibility:
>
> SchemaCompatibility.SchemaPairCompatibility compatibility =
>       SchemaCompatibility.checkReaderWriterCompatibility(
>             userv1.getSchema(), v2Schema);
> assertThat(compatibility.getType(),
>       is(SchemaCompatibility.SchemaCompatibilityType.INCOMPATIBLE));
>
> In your test, the built-in Avro schema resolution can't be used to convert
> the userv1 datum to the v2Schema, so it reports INCOMPATIBLE.
>
> If the V2 change were non-breaking (like adding a field with a default),
> then the schemas would still be reported COMPATIBLE with that method.
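>
> For example, a quick sketch (the v2CompatibleSchema name below is just an
> illustration of adding a new field with a default value):
>
> Schema v2CompatibleSchema = SchemaBuilder.record("User").fields()
>       .requiredString("name")
>       .requiredInt("age")
>       // New field with a default value: a non-breaking change.
>       .name("fullName").type().stringType().stringDefault("")
>       .endRecord();
>
> SchemaCompatibility.SchemaPairCompatibility ok =
>       SchemaCompatibility.checkReaderWriterCompatibility(
>             v2CompatibleSchema, userv1.getSchema());
> assertThat(ok.getType(),
>       is(SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE));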
>
> Of course, if you just want to enforce that incoming records use exactly
> the reference schema, you can simply check the two schemas for equality:
>
> user.getSchema().equals(v2Schema)
>
> Is this what you're looking for?  I'm not familiar enough with records
> produced using the Confluent Schema Registry!  I'm surprised this isn't
> available in the Kafka message metadata; you might want to look into their
> implementation.
>
> All my best, Ryan
>
>
>
> On Tue, Jan 5, 2021 at 2:44 PM laurent broudoux <
> [email protected]> wrote:
>
>> Hello,
>>
>> I need to validate that a GenericRecord (read from a Kafka topic) is
>> valid against an Avro schema. This reference schema is not necessarily
>> the one used for Kafka message deserialization, since that one was
>> acquired through a Schema Registry.
>>
>> I had a look at GenericData.get().validate(schema, datum), but it does
>> not behave as I expected: it does not seem to validate record field
>> names, only their positions.
>>
>> Below is a test case that reproduces the odd behaviour I am observing.
>> I have tried Avro 1.10.0 and 1.10.1, and both versions behave the same:
>>
>> @Test
>> public void testGenericDataValidate() {
>>    Schema v1Schema = SchemaBuilder.record("User").fields()
>>          .requiredString("name")
>>          .requiredInt("age")
>>          .endRecord();
>>    Schema v2Schema = SchemaBuilder.record("User").fields()
>>          .requiredString("fullName")
>>          .requiredInt("age")
>>          .endRecord();
>>
>>    GenericRecord userv1 = new GenericData.Record(v1Schema);
>>    userv1.put("name", "Laurent");
>>    userv1.put("age", 42);
>>
>>    // The validate method succeeds because it does not check field
>>    // names, only positions... so this assertion fails.
>>    assertFalse(GenericData.get().validate(v2Schema, userv1));
>> }
>>
>> This test corresponds to a real-life scenario I want to detect: a Kafka
>> producer is still sending messages using the v1 schema, but we expect
>> records following the v2 schema, which introduced a breaking change (a
>> field rename).
>>
>> Is this a known / intended limitation of the validate() method of
>> GenericData? Is there another way to achieve what I want to check?
>>
>> Thanks!
>>