Thanks for the information, Harsh. Further comments inline below:

On Thu, Mar 28, 2013 at 4:01 AM, Harsh J <[email protected]> wrote:
> On Thu, Mar 28, 2013 at 5:15 AM, Jeremy Kahn <[email protected]> wrote:
> > I can read "ordered lexicographically by field" in two ways:
> >
> > 1. the names of the fields are sorted lexicographically, and the
> > field that goes lexicographically first (not marked as
> > "order": "ignore") dominates.
> >
> > 2. the records are sorted by the sort order of each field, with the
> > first fields (not marked "order": "ignore") taking sort priority.
>
> The second one is correct. The field's order in the defined schema is
> not changed but only walked through.
>
> [...] that's true from my use of it in Hadoop MR as well.
>

Okay, this is very helpful to know: it's working the way I had hoped.

> > Behavior (2) -- relative to behavior (1) -- offers the ability to
> > adjust the order of the schema to express a different sort order,
> > but might present problems for schema negotiation.
>
> What kind of problems are you describing here? Sorry if I'm not
> getting it by the words "schema negotiation" alone.
>

Suppose I sort a sequence of ZooInventory objects by the sort order
implied by this schema, and I send them to you in sorted order over a
protocol with an IDL type specification of array<ZooInventory>.

You *read* the sequence with a different ZooInventory schema with the
same fields, but which contains a different ordering. The objects in the
array will not (necessarily) appear to be sorted *to you*.

This isn't necessarily a problem -- it might actually be a feature. It
is worth noting that two schemas may be compatible under schema
negotiation but have different sort order for reader and writer.

--jeremy
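
P.S. A minimal sketch of the scenario above, in plain Python rather than
the Avro library. The field names "species" and "count" are made up for
illustration (the real ZooInventory fields aren't shown in this thread),
and every field is assumed to use the default ascending order, with
nothing marked "order": "ignore". It only models behavior (2): compare
records field by field, walking the schema's declared field order.

```python
# Hypothetical ZooInventory field lists: same fields, different declared order.
WRITER_FIELDS = ["species", "count"]
READER_FIELDS = ["count", "species"]


def sort_key(record, field_order):
    """Build a comparison key by walking the fields in declared order."""
    return tuple(record[name] for name in field_order)


def is_sorted(records, field_order):
    """True if the records are non-decreasing under the given field order."""
    keys = [sort_key(r, field_order) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


records = [
    {"species": "zebra", "count": 3},
    {"species": "aardvark", "count": 12},
    {"species": "lemur", "count": 7},
]

# The writer sorts by its own schema's field order before sending.
sent = sorted(records, key=lambda r: sort_key(r, WRITER_FIELDS))

print(is_sorted(sent, WRITER_FIELDS))  # True: sorted from the writer's view
print(is_sorted(sent, READER_FIELDS))  # False: not sorted from the reader's view
```

The two field lists match name for name, yet the same array is sorted
under one and unsorted under the other.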
