This does look like a bug in GenericData.Record#equals(). It should
return false when the schemas are not equal. It currently checks only
the schema names, as a performance optimization, but that optimization
is unsafe: two different schemas can share a name, which makes equals()
asymmetric. Can you please file a bug report in Jira?
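A minimal sketch of the problem and the suggested fix, using a simplified Avro-free model rather than Avro itself. It assumes Java 16+, and RecordEqualsSketch, SchemaDef, FieldDef, Rec, nameOnlyEquals and fixedEquals are hypothetical names, not Avro API. nameOnlyEquals mimics the reported behaviour: it compares only schema names and lets the receiver's field orders decide which fields to skip. fixedEquals first requires the full schemas, including each field's sort order, to be equal.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, Avro-free model of GenericData.Record#equals(); SchemaDef,
// FieldDef and Rec are NOT Avro classes, they only mimic the relevant parts.
public class RecordEqualsSketch {

    enum Order { ASCENDING, DESCENDING, IGNORE }

    // Java records give structural equals(): two fields are equal only if
    // both the name and the sort order match.
    record FieldDef(String name, Order order) {}

    // Full schema equality covers the name AND every field, including order.
    record SchemaDef(String fullName, List<FieldDef> fields) {}

    record Rec(SchemaDef schema, List<Object> values) {}

    // Mimics the reported behaviour: only schema names are compared, then the
    // receiver's own field orders decide which fields are skipped.
    static boolean nameOnlyEquals(Rec self, Rec other) {
        if (!self.schema().fullName().equals(other.schema().fullName())) {
            return false;
        }
        return nonIgnoredFieldsEqual(self, other);
    }

    // Suggested fix: unequal schemas mean unequal records, in both directions.
    static boolean fixedEquals(Rec self, Rec other) {
        if (!self.schema().equals(other.schema())) {
            return false;
        }
        return nonIgnoredFieldsEqual(self, other);
    }

    // Compares values by position, skipping fields the receiver's schema marks IGNORE.
    private static boolean nonIgnoredFieldsEqual(Rec self, Rec other) {
        List<FieldDef> fields = self.schema().fields();
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).order() == Order.IGNORE) {
                continue;
            }
            if (i >= other.values().size()
                    || !Objects.equals(self.values().get(i), other.values().get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same full name, but attribute1 is IGNORE in s1 and ASCENDING in s2.
        SchemaDef s1 = new SchemaDef("my.namespace.test_record",
                List.of(new FieldDef("attribute1", Order.IGNORE)));
        SchemaDef s2 = new SchemaDef("my.namespace.test_record",
                List.of(new FieldDef("attribute1", Order.ASCENDING)));
        Rec r1 = new Rec(s1, List.of("1"));
        Rec r2 = new Rec(s2, List.of("2"));

        System.out.println(nameOnlyEquals(r1, r2)); // true
        System.out.println(nameOnlyEquals(r2, r1)); // false: asymmetric
        System.out.println(fixedEquals(r1, r2));    // false
        System.out.println(fixedEquals(r2, r1));    // false: symmetric
    }
}
```

With the fixed check, records whose schemas differ in any way (here only in one field's sort order) are unequal in both directions, which restores symmetry while keeping the "order":"ignore" behaviour for records that share a schema.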
Thanks,
Doug
On 02/10/2012 04:26 AM, Andrew Kenworthy wrote:
> Hello Doug,
>
> Thank you for your feedback. I agree that implicitly using Order.IGNORE
> to ignore differences between records makes sense, as that is the
> criterion used to distinguish records when sorting. But it looks as
> though only the schema name is checked when deciding whether to examine
> each field. As the test below shows, this can break the symmetry of
> equals() if one is not careful. The example is admittedly a "bad" one,
> since it's not a good idea to have two schemas with the same name and
> namespace but different contents, but it shows how one might
> inadvertently make a wrong assumption about equality:
>
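> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.Schema.Field.Order;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericRecord;
> import org.junit.Test;
>
> public class RecordEqualsTest {
>
>     @Test
>     public void test() {
>         // Two record schemas with the same full name (my.namespace.test_record)
>         // that differ only in the sort order of attribute1.
>         Schema schema1 = Schema.createRecord("test_record", null,
>                 "my.namespace", false);
>         List<Field> fields1 = new ArrayList<Field>();
>         fields1.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>                 null, null, Order.IGNORE));
>         schema1.setFields(fields1);
>
>         Schema schema2 = Schema.createRecord("test_record", null,
>                 "my.namespace", false);
>         List<Field> fields2 = new ArrayList<Field>();
>         fields2.add(new Field("attribute1", Schema.create(Schema.Type.STRING),
>                 null, null, Order.ASCENDING));
>         schema2.setFields(fields2);
>
>         GenericRecord record1 = new GenericData.Record(schema1);
>         record1.put("attribute1", "1");
>         GenericRecord record2 = new GenericData.Record(schema2);
>         record2.put("attribute1", "2");
>
>         System.out.println(record1.equals(record2)); // prints true
>         System.out.println(record2.equals(record1)); // prints false
>     }
> }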
>
> Andrew
>
> ------------------------------------------------------------------------
> *From:* Doug Cutting
> *Sent:* Thursday, February 9, 2012 8:49 PM
> *Subject:* Re: Does Avro GenericData.Record violate the .equals contract?
>
> On 02/09/2012 07:02 AM, Andrew Kenworthy wrote:
> > This means that if I have no sorting defined in my schema, then all
> > records are treated as being equal to one another.
>
> If you specify "order":"ignore" for all fields in a record, then, yes,
> all instances of that record would be equal. I cannot imagine a case
> where this would be useful, but I also don't see how this would violate
> the equals() contract.
>
> The default for fields is to behave as if "order":"ascending" is
> specified. Records are equal if all of their fields that are not
> specified as "order":"ignore" are equal.
>
> Doug
>
>
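Doug's earlier point about "order":"ignore" can be checked with a small model. This is a sketch under stated assumptions, not Avro code: it assumes Java 16+, and IgnoreOrderSketch, FieldDef, recordsEqual and isEquivalence are hypothetical names. A field declared without an order is treated as ASCENDING, and IGNORE fields are skipped when comparing two records that share one schema. When every field is ignored, all records compare equal, yet the relation is still reflexive, symmetric and transitive, so on its own it does not break the equals() contract.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, Avro-free model of the "order" rule: a field takes part in
// equality unless its order is IGNORE; a field with no explicit order
// behaves as ASCENDING. None of these names are Avro API.
public class IgnoreOrderSketch {

    enum Order { ASCENDING, DESCENDING, IGNORE }

    record FieldDef(String name, Order order) {
        // A field declared without an order defaults to ASCENDING.
        FieldDef(String name) {
            this(name, Order.ASCENDING);
        }
    }

    // Equality of two value lists that share one schema (the same field list).
    static boolean recordsEqual(List<FieldDef> fields, List<?> a, List<?> b) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).order() == Order.IGNORE) {
                continue;
            }
            if (!Objects.equals(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Brute-force check that recordsEqual is reflexive, symmetric and
    // transitive over a sample of records.
    static boolean isEquivalence(List<FieldDef> fields, List<List<?>> sample) {
        for (List<?> x : sample) {
            if (!recordsEqual(fields, x, x)) {
                return false;
            }
            for (List<?> y : sample) {
                if (recordsEqual(fields, x, y) != recordsEqual(fields, y, x)) {
                    return false;
                }
                for (List<?> z : sample) {
                    if (recordsEqual(fields, x, y) && recordsEqual(fields, y, z)
                            && !recordsEqual(fields, x, z)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<FieldDef> allIgnored = List.of(
                new FieldDef("a", Order.IGNORE), new FieldDef("b", Order.IGNORE));
        List<FieldDef> defaults = List.of(new FieldDef("a"), new FieldDef("b"));
        List<List<?>> sample = List.of(List.of("x", 1), List.of("y", 2), List.of("x", 2));

        System.out.println(recordsEqual(allIgnored, sample.get(0), sample.get(1))); // true
        System.out.println(recordsEqual(defaults, sample.get(0), sample.get(2)));   // false
        System.out.println(isEquivalence(allIgnored, sample)); // true
        System.out.println(isEquivalence(defaults, sample));   // true
    }
}
```

The asymmetry in this thread only appears once the two records being compared have different schemas, which is the case the full schema check in the fix addresses.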