Maybe I'm mistaken about what is meant by map-only. Does a map-only job still go through the standard shuffle/sort, or does that get cut short?
Hmm, I think I see what you mean. I guess a map-only sort is possible as long as you use a custom partitioner and let the shuffle/sort run to completion. I think the shuffle/sort still runs, in spite of the fact that you're not running reducers, if you use a partitioner that splits the keys in sorted order (i.e. part-0 is all lines starting with "a", part-1 is all lines starting with "b", etc.).

On Fri, Feb 15, 2013 at 3:09 PM, Michael Segel <[email protected]> wrote:

> Why do you need a 1TB block?
>
> On Feb 15, 2013, at 1:29 PM, Jay Vyas <[email protected]> wrote:
>
> Well, OK. I guess you could have a 1TB block, do an in-place sort on the
> file, write it to a tmp directory, and then spill the records in order or
> something. At that point you might as well not use Hadoop.
>
>
> Michael Segel <[email protected]> | (m) 312.755.9623
>
> Segel and Associates

--
Jay Vyas
http://jayunit100.blogspot.com
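For reference, the first-letter partitioning described above could be sketched roughly like this. It is a standalone illustration, not tested against a cluster: the class name `FirstLetterPartitioner` and the static `getPartition(line, numPartitions)` helper are made up for this example and only mimic the shape of Hadoop's `Partitioner.getPartition(key, value, numPartitions)`. Note that it only decides which part file a line lands in; whether the lines inside each part come out sorted still depends on the sort step actually running, which is the open question in this thread.

```java
public class FirstLetterPartitioner {

    // Hypothetical helper: maps a line to a partition index by its first letter,
    // so that partition order follows alphabetical order of the first character.
    public static int getPartition(String line, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (line.isEmpty()) {
            return 0; // empty lines go to the first partition
        }
        char c = Character.toLowerCase(line.charAt(0));
        int bucket;
        if (c < 'a') {
            bucket = 0;        // digits, punctuation, etc. go before "a"
        } else if (c > 'z') {
            bucket = 25;       // anything past "z" goes in the last letter bucket
        } else {
            bucket = c - 'a';  // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        }
        // Scale the 26 letter buckets onto the available partitions.
        // Integer division keeps the mapping monotonic, so order is preserved.
        return bucket * numPartitions / 26;
    }

    public static void main(String[] args) {
        String[] lines = {"banana", "apple", "cherry", "avocado"};
        for (String line : lines) {
            System.out.println("part-" + getPartition(line, 26) + "\t" + line);
        }
    }
}
```

With 26 partitions each letter gets its own part file; with fewer, neighboring letters share a partition but the partitions still stay in alphabetical order. In a real job the same logic would go into a subclass of `org.apache.hadoop.mapreduce.Partitioner`, with the key and value types matching the map output.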
