Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

Sofia Georgiakaki Tue, 23 Apr 2013 00:45:52 -0700

Hello,

Sorting is done by the sort comparator, which orders records based on the 
value of the key. A possible solution would be the following:
You could write a custom key class which implements WritableComparable 
(let's call it MyCompositeFieldWritableComparable), that will store your 
current key and the part of the value that you want your sorting to be based 
on. As I understand from your description, this writable class will have 2 
IntWritable fields, e.g.
(fieldA, fieldB)


(0,4)
(1,1)
(2,0)

Implement the methods equals, compareTo, hashCode, etc. in your custom writable 
to override the defaults. Sorting before the reduce phase will be performed 
based on the compareTo() implementation of your custom writable, so you can 
write it in a way that will compare only fieldB.
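
To make this concrete, here is a minimal sketch of such a key class. It is 
only an illustration under a few assumptions: it uses plain int fields instead 
of IntWritable, and it implements java.lang.Comparable so that it compiles 
without Hadoop on the classpath. In a real job the class would declare 
"implements WritableComparable<MyCompositeFieldWritableComparable>"; write() 
and readFields() below already have the signatures that interface expects. 
The hashCode() formula and the toString() format are my own choices.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of the composite key. Comparable stands in for
// Hadoop's WritableComparable so the example compiles on its own.
class MyCompositeFieldWritableComparable
        implements Comparable<MyCompositeFieldWritableComparable> {

    private int fieldA; // the original key
    private int fieldB; // the part of the value to sort on

    // Hadoop instantiates keys reflectively, so keep a no-arg constructor.
    MyCompositeFieldWritableComparable() { }

    MyCompositeFieldWritableComparable(int fieldA, int fieldB) {
        this.fieldA = fieldA;
        this.fieldB = fieldB;
    }

    int getFieldA() { return fieldA; }
    int getFieldB() { return fieldB; }

    // Same signature as Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeInt(fieldA);
        out.writeInt(fieldB);
    }

    // Same signature as Writable.readFields(DataInput).
    public void readFields(DataInput in) throws IOException {
        fieldA = in.readInt();
        fieldB = in.readInt();
    }

    // Sort order looks at fieldB only, as suggested above.
    @Override
    public int compareTo(MyCompositeFieldWritableComparable other) {
        return Integer.compare(fieldB, other.fieldB);
    }

    // Equality uses both fields.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyCompositeFieldWritableComparable)) return false;
        MyCompositeFieldWritableComparable k =
                (MyCompositeFieldWritableComparable) o;
        return fieldA == k.fieldA && fieldB == k.fieldB;
    }

    @Override
    public int hashCode() { return 31 * fieldA + fieldB; }

    // Hash of fieldB alone, intended for a custom partitioner.
    public int hashCodeBasedOnFieldB() { return Integer.hashCode(fieldB); }

    @Override
    public String toString() { return "(" + fieldA + "," + fieldB + ")"; }
}
```

Sorting the three example keys with Collections.sort() gives (2,0), (1,1), 
(0,4), which is the order asked for. Note that with this compareTo() two keys 
with the same fieldB compare as equal, so by default they would end up in the 
same reduce group even if fieldA differs.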

Be careful in the way you implement 
MyCompositeFieldWritableComparable.compareTo() (unless you set a separate 
grouping comparator, it is also used to group <key, list(values)> in the 
reducer), MyCompositeFieldWritableComparable.equals() and 
MyCompositeFieldWritableComparable.hashCode() (used by the default 
HashPartitioner to choose the reducer).
So your new KEY class will be MyCompositeFieldWritableComparable.

As an alternative and cleaner implementation, write the 
MyCompositeFieldWritableComparable class and also a HashOnOneFieldPartitioner 
class (which extends Partitioner) that will do something like this:

@Override
public int getPartition(K key, V value, int numReduceTasks) {
    if (key instanceof MyCompositeFieldWritableComparable) {
        // Partition on fieldB only.
        int hash = ((MyCompositeFieldWritableComparable) key)
                .hashCodeBasedOnFieldB();
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }
    // Any other key type: fall back to the usual hash partitioning.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
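
The arithmetic in getPartition() can be tried out without Hadoop. The sketch 
below is only an illustration: the class name FieldBPartitioning and the 
helper partitionFor() are made up for this example, and it assumes that 
hashCodeBasedOnFieldB() simply returns the hash of the int stored in fieldB.

```java
// Hypothetical, Hadoop-free sketch of the partition arithmetic.
final class FieldBPartitioning {

    // Mirrors getPartition(): clear the sign bit, then take the remainder,
    // so the result is always in the range [0, numReduceTasks).
    static int partitionFor(int fieldBHash, int numReduceTasks) {
        return (fieldBHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int[][] keys = { {0, 4}, {1, 1}, {2, 0} }; // (fieldA, fieldB)
        for (int[] k : keys) {
            int reducer = partitionFor(Integer.hashCode(k[1]), 2);
            System.out.println("(" + k[0] + "," + k[1] + ") -> reducer " + reducer);
        }
    }
}
```

The mask with Integer.MAX_VALUE matters for negative hash codes: in Java 
-1 % 2 is -1, which is not a valid partition number, while 
(-1 & Integer.MAX_VALUE) % 2 is 1. Also keep in mind that with more than one 
reducer a hash partitioner only gives you sorted input within each reducer, 
not one global order across all of them.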




You can also find related articles on the web, e.g. 
http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/.

Have a nice day,
Sofia




>________________________________
> From: Vikas Jadhav <[email protected]>
>To: [email protected] 
>Sent: Tuesday, April 23, 2013 8:44 AM
>Subject: Sorting Values sent to reducer NOT based on KEY (Depending on part of 
>VALUE)
> 
>
>
>Hi 
> 
>How can I sort values in Hadoop using Hadoop's standard sorting algorithm 
>(i.e. the sorting facility provided by Hadoop)?
> 
>Requirement: 
> 
>1) Values should be sorted depending on some part of the value 
> 
>For example: (KEY,VALUE)
> 
> (0,"BC,4,XY")
> (1,"DC,1,PQ")
> (2,"EF,0,MN")
> 
>The sorted sequence reaching the reducer should be 
> 
>(2,"EF,0,MN")
>(1,"DC,1,PQ")
>(0,"BC,4,XY")
> 
>Here the records are sorted on the attribute in the second position of the 
>value.
> 
>Thanks
> 
>
>-- 
>  Regards,
>   Vikas 
>
>
