Hi David,

I take it you're referring to the ordering of the two Collections returned within the value Pair of a cogroup result?

As you probably know, there isn't any guaranteed ordering on these Collections. Given the same input and the same cluster layout, you may well get the same iteration order on the results each time, but nothing promises it.

There are also quite a few underlying factors that could change the iteration order. For example, a different number of partitions used by the reducers, or different settings that affect when spills happen during the shuffle phase (assuming we're talking about MR-based Crunch here), could both change the order in which the Collections are iterated.

Note that these are things that affect the overall ordering of output in MapReduce itself, and are nothing specific to Crunch.

- Gabriel

On Wed, Aug 3, 2016 at 4:40 PM, David Ortiz <[email protected]> wrote:
> Hey everyone,
>
> Just curious based on something I'm seeing as we move a job around
> between different ec2 cluster types. Does the underlying architecture of
> the system have an effect on the sort order in a cogroup? It's looking like
> moving from the cc2 architecture we were using to an m4 based system, that
> our job output changes. The changes I am seeing line up with the order in
> which the iterator returns records being different, so was curious.
>
> Thanks,
> Dave
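
If the job output has to stay the same across cluster types, one option is to stop depending on iteration order at all, for instance by sorting each side of the cogroup value before iterating over it. Below is a minimal sketch of that idea in plain Java, with no Crunch types involved. The class name StableCogroup and the helper stabilize are hypothetical names made up for illustration, and the sketch assumes the records are Comparable and that each group is small enough to copy into memory. In a real job the same step would presumably live inside whatever function processes the cogroup output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class StableCogroup {
    // Hypothetical helper: copy one side of a cogroup value and sort it,
    // so that later iteration does not depend on shuffle or spill order.
    // Assumes the whole group fits in memory.
    static <T extends Comparable<? super T>> List<T> stabilize(Collection<T> side) {
        List<T> copy = new ArrayList<>(side);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The same records, in the order two different cluster layouts
        // might happen to deliver them.
        Collection<String> runA = Arrays.asList("b", "c", "a");
        Collection<String> runB = Arrays.asList("c", "a", "b");

        // After sorting, both runs iterate identically.
        System.out.println(stabilize(runA).equals(stabilize(runB)));
    }
}
```

The trade-off is that every group gets materialized and sorted in the reducer, so for very large groups something closer to a secondary sort during the shuffle is probably a better fit.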
