Ordering in collections

Bryan Duxbury Tue, 09 Sep 2008 10:21:32 -0700

We use Thrift structures within Hadoop Map/Reduce. Occasionally, a Thrift object will be our grouping or join key. Usually, this works great, but occasionally, there are some issues. In particular, we have trouble with maps and sets. The problem is that the ordering of the map/set internally is arbitrary, and we serialize in that arbitrary order. The result is that two 'equal' objects might not serialize into the same byte array, and therefore fail equality checks based only on the serialized data.
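To make the failure concrete, here is a minimal sketch in plain Java. It does not use Thrift: `serialize` is a hypothetical stand-in for a generated write method, and `LinkedHashSet` is chosen only so that the differing iteration order is reproducible (a `HashSet` can show the same effect, but less predictably).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class ArbitraryOrderDemo {
    // Hypothetical stand-in for a generated write(): emits the element
    // count, then each element in whatever order the set iterates.
    static byte[] serialize(Set<String> values) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same elements, different insertion (and therefore iteration) order.
        Set<String> first = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> second = new LinkedHashSet<>(Arrays.asList("c", "b", "a"));

        System.out.println(first.equals(second)); // prints true
        System.out.println(Arrays.equals(serialize(first), serialize(second))); // prints false
    }
}
```

The two sets are equal as objects, yet a byte-level comparison of the kind used for grouping and join keys treats them as different.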

I was wondering if it would make sense to enforce some sort of ordering scheme for collections where order might be arbitrary, at least during serialization. This would necessitate implementing a decent compareTo on generated Thrift structs so we could sort before writing, and obviously, it would include sorting overhead.
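A rough sketch of the sort-before-write idea, again in plain Java rather than real generated code. `Pair` is a hypothetical stand-in for a generated struct, its field-by-field `compareTo` is only one possible ordering, and `serializeSorted` is an invented name, not an existing Thrift method.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class SortedWriteSketch {
    // Hypothetical stand-in for a generated struct with two fields.
    static final class Pair implements Comparable<Pair> {
        final int id;
        final String name;

        Pair(int id, String name) {
            this.id = id;
            this.name = name;
        }

        // Compare field by field, in declaration order.
        @Override
        public int compareTo(Pair other) {
            int byId = Integer.compare(id, other.id);
            return byId != 0 ? byId : name.compareTo(other.name);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Pair && compareTo((Pair) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Copy the set into a list and sort it, so the bytes written no longer
    // depend on the set's internal iteration order.
    static byte[] serializeSorted(Set<Pair> values) {
        List<Pair> ordered = new ArrayList<>(values);
        Collections.sort(ordered); // the sorting overhead: O(n log n) per write
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(ordered.size());
            for (Pair p : ordered) {
                out.writeInt(p.id);
                out.writeUTF(p.name);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<Pair> first = new LinkedHashSet<>(Arrays.asList(new Pair(1, "a"), new Pair(2, "b")));
        Set<Pair> second = new LinkedHashSet<>(Arrays.asList(new Pair(2, "b"), new Pair(1, "a")));
        System.out.println(Arrays.equals(serializeSorted(first), serializeSorted(second))); // prints true
    }
}
```

Maps could presumably be handled the same way by sorting entries on their keys, which requires the key type to be comparable as well.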


Are other people interested in making this use case work acceptably?

-Bryan
