Re: Sorting huge text files in Hadoop

Jay Vyas Fri, 15 Feb 2013 12:24:24 -0800

Maybe I'm mistaken about what is meant by map-only. Does a map-only job
still result in the standard shuffle/sort? Or does that get cut short?

Hmm, I think I see what you mean. I guess a map-only sort is possible as
long as you use a custom partitioner and let the shuffle/sort run to
completion.

I think the shuffle/sort still runs, in spite of the fact that you're not
running reducers, if you use a partitioner that partitions the sorting in
order (i.e., part-0 is all lines starting with "a", part-1 is all lines
starting with "b", etc.).
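One point worth checking against the question above: in stock Hadoop MapReduce, a job configured with `setNumReduceTasks(0)` sends map output straight to the output format, so no partition, shuffle, or sort step takes place. The partitioner only comes into play when the job has reduce tasks (the default identity `Reducer` is enough). The ordering property the in-order partitioner relies on can be sketched in plain Java, assuming newline-delimited text keys compared with `String.compareTo`. The class and method names are hypothetical and this is not Hadoop's `Partitioner` API; in a real job the same logic would live in a `Partitioner` subclass's `getPartition` method. The key requirement is that the partition number never decreases as the key increases, so sorting each partition independently and concatenating part-0, part-1, ... yields a globally sorted result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FirstLetterPartitionDemo {

    // Hypothetical range partitioner (plain Java, not Hadoop's Partitioner API).
    // The result never decreases as the first character increases, so every key
    // in partition i sorts before every key in partition i + 1 under
    // String.compareTo. Assumes numPartitions >= 1.
    static int getPartition(String line, int numPartitions) {
        if (line.isEmpty()) {
            return 0;                          // "" sorts before every other line
        }
        char c = line.charAt(0);
        if (c < 'a') {
            return 0;                          // digits, uppercase, etc. sort before 'a'
        }
        if (c > 'z') {
            return numPartitions - 1;          // '{', '~', non-ASCII, etc. sort after 'z'
        }
        return (c - 'a') * numPartitions / 26; // spread 'a'..'z' over the partitions
    }

    // Simulates "route each line to a partition, then sort each partition on its own",
    // which is what the reduce side would see with this partitioner.
    static List<List<String>> partitionAndSort(List<String> lines, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String line : lines) {
            parts.get(getPartition(line, numPartitions)).add(line);
        }
        for (List<String> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("pear", "apple", "zebra", "Mango", "banana", "kiwi", "avocado");
        List<List<String>> parts = partitionAndSort(lines, 4);

        // Concatenating the partitions in order should equal a full sort.
        List<String> concatenated = new ArrayList<>();
        for (List<String> part : parts) {
            concatenated.addAll(part);
        }
        List<String> expected = new ArrayList<>(lines);
        Collections.sort(expected);

        System.out.println(concatenated.equals(expected)); // prints true
        System.out.println(parts); // prints [[Mango, apple, avocado, banana], [kiwi], [pear], [zebra]]
    }
}
```

With `numPartitions = 26`, `(c - 'a') * 26 / 26` is just `c - 'a'`, which gives the one-letter-per-partition layout described above. Fixed letter ranges can be badly skewed on real text; Hadoop also ships a `TotalOrderPartitioner` that takes its split points from a sample of the input instead.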

On Fri, Feb 15, 2013 at 3:09 PM, Michael Segel <[email protected]> wrote:

> Why do you need a 1TB block?
>
> On Feb 15, 2013, at 1:29 PM, Jay Vyas <[email protected]> wrote:
>
> Well... OK... I guess you could have a 1TB block, do an in-place sort on the
> file, write it to a tmp directory, and then spill the records in order or
> something. At that point you might as well not use Hadoop.
>
>
> Michael Segel <[email protected]> | (m) 312.755.9623
>
> Segel and Associates
>
>
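The "sort it outside Hadoop, write to a tmp directory, spill the records in order" fallback quoted above is essentially a single-machine external merge sort, the same sort-spill-merge pattern Hadoop's map side applies to its output buffer. A minimal sketch, assuming newline-delimited UTF-8 text ordered by `String.compareTo`; the class name, method names, and the `maxLinesInMemory` parameter are hypothetical and this is not Hadoop code. It sorts memory-sized chunks, writes each as a sorted run into a temp directory, then merges the runs with a min-heap.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalLineSort {

    // Heap entry: the current (smallest unread) line of one sorted run, plus its reader.
    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }

    // Sorts the lines of input into output, holding at most maxLinesInMemory lines
    // per chunk (hypothetical parameter; assumed >= 1) plus one line per run while merging.
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        Path tmpDir = Files.createTempDirectory("extsort");
        List<Path> runs = new ArrayList<>();
        List<BufferedReader> readers = new ArrayList<>();
        try {
            // Phase 1: read fixed-size chunks, sort each in memory, spill it as a sorted run.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == maxLinesInMemory) {
                        runs.add(spill(chunk, tmpDir, runs.size()));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) {
                    runs.add(spill(chunk, tmpDir, runs.size()));
                }
            }

            // Phase 2: k-way merge; the heap always yields the smallest head line across runs.
            PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) {
                    heap.add(new Head(first, r));
                }
            }
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                while (!heap.isEmpty()) {
                    Head h = heap.poll();
                    out.write(h.line);
                    out.newLine();
                    String next = h.reader.readLine();
                    if (next != null) {
                        heap.add(new Head(next, h.reader));
                    }
                }
            }
        } finally {
            // Close the run readers, then remove the temporary runs and directory.
            for (BufferedReader r : readers) {
                r.close();
            }
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
            Files.deleteIfExists(tmpDir);
        }
    }

    // Sorts one chunk in place and writes it to its own run file.
    private static Path spill(List<String> chunk, Path dir, int index) throws IOException {
        Collections.sort(chunk);
        Path run = dir.resolve("run-" + index + ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("unsorted", ".txt");
        Path out = Files.createTempFile("sorted", ".txt");
        Files.write(in, List.of("pear", "apple", "zebra", "kiwi", "banana"), StandardCharsets.UTF_8);
        sort(in, out, 2); // forces three sorted runs of at most 2 lines each
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8)); // prints [apple, banana, kiwi, pear, zebra]
    }
}
```

Only the chunk size and one line per run are ever in memory, which is why this approach scales to files far larger than RAM without needing a 1TB block.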

-- 
Jay Vyas
http://jayunit100.blogspot.com
