Re: Sorting huge text files in Hadoop

Sandy Ryza Fri, 15 Feb 2013 13:07:42 -0800

A map-only job does not result in the standard shuffle-sort.  Map outputs
are written directly to HDFS.
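
To make the difference concrete, here is a toy simulation in plain Python.
It is not Hadoop code, and every name in it is made up for illustration.
It assumes the job is made map-only by setting the reducer count to zero
(Job.setNumReduceTasks(0) in the Java API), and it uses a first-letter range
partitioner like the one described in the quoted message below. Hadoop's own
TotalOrderPartitioner plays a similar role in a real job, but nothing here
depends on it.

```python
# Toy model only: plain Python lists stand in for input splits and part files.

def map_only(splits):
    """Zero reducers: each map task writes its records as-is, nothing is sorted."""
    return [list(split) for split in splits]


def range_partition(line, num_parts):
    """Hypothetical range partitioner: route by first character so that
    everything in part i sorts before everything in part i + 1."""
    if not line:
        return 0
    offset = ord(line[0]) - ord("a")
    offset = min(max(offset, 0), 25)  # clamp characters outside a-z
    return offset * num_parts // 26


def with_reducers(splits, num_parts):
    """Shuffle records to range partitions, then sort within each partition."""
    parts = [[] for _ in range(num_parts)]
    for split in splits:
        for line in split:
            parts[range_partition(line, num_parts)].append(line)
    return [sorted(part) for part in parts]


splits = [["pear", "apple", "zebra"], ["mango", "banana", "cherry"]]
unsorted_parts = map_only(splits)                   # one unsorted file per map task
sorted_parts = with_reducers(splits, num_parts=2)   # one sorted file per reducer
flat = [line for part in sorted_parts for line in part]
```

Reading the sorted parts back in partition order gives a globally sorted
result, which is the step the map-only run never performs.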


-Sandy

On Fri, Feb 15, 2013 at 12:23 PM, Jay Vyas <[email protected]> wrote:

> Maybe I'm mistaken about what is meant by map-only.  Does a map-only job
> still result in the standard shuffle-sort?  Or does that get cut short?
>
> Hmm, I think I see what you mean. I guess a map-only sort is possible as
> long as you use a custom partitioner and you let the shuffle/sort run to
> completion.
>
> I think the shuffle/sort, if you use a partitioner that partitions the
> output in sorted order (i.e. part-0 is all lines starting with "a", part-1
> is all lines starting with "b", etc.), does still run in spite of the fact
> that you're not running reducers.
>
>
>
>
> On Fri, Feb 15, 2013 at 3:09 PM, Michael Segel <[email protected]> wrote:
>
>> Why do you need a 1TB block?
>>
>> On Feb 15, 2013, at 1:29 PM, Jay Vyas <[email protected]> wrote:
>>
>> Well, OK. I guess you could have a 1TB block, do an in-place sort on
>> the file, write it to a tmp directory, and then spill the records in order
>> or something.  At that point you might as well not use Hadoop.
>>
>>
>> Michael Segel <[email protected]> | (m) 312.755.9623
>>
>> Segel and Associates
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
