Another advantage of relying on the shuffle/sort is that your sorted list can grow beyond the size of memory. If you instead pack this data into a sorted ArrayWritable, you risk the list growing too large to fit in memory.
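A minimal sketch of a map output key that carries the priority, so that the shuffle does the sorting. The class name PriorityKey, the itemId tie-breaker, and the "lower value sorts first" ordering are assumptions made for illustration, not details from this thread. To keep the sketch compilable without Hadoop on the classpath it implements only Comparable; in a real job the class would be declared as implementing org.apache.hadoop.io.WritableComparable<PriorityKey>, which requires methods of the same shape as the write, readFields and compareTo shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key: ordered by priority, then by item id.
// In a Hadoop job this would implement WritableComparable<PriorityKey>.
final class PriorityKey implements Comparable<PriorityKey> {
    private int priority;   // assumption: lower value sorts first
    private String itemId;  // assumption: tie-breaker for equal priorities

    // Hadoop instantiates keys reflectively, so keep a no-arg constructor.
    PriorityKey() {
        this(0, "");
    }

    PriorityKey(int priority, String itemId) {
        this.priority = priority;
        this.itemId = itemId;
    }

    int getPriority() { return priority; }
    String getItemId() { return itemId; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(priority);
        out.writeUTF(itemId);
    }

    // Read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        priority = in.readInt();
        itemId = in.readUTF();
    }

    // The ordering defined here is the ordering the reducer would see.
    @Override
    public int compareTo(PriorityKey other) {
        int byPriority = Integer.compare(priority, other.priority);
        return byPriority != 0 ? byPriority : itemId.compareTo(other.itemId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PriorityKey)) return false;
        PriorityKey k = (PriorityKey) o;
        return priority == k.priority && itemId.equals(k.itemId);
    }

    @Override
    public int hashCode() {
        return 31 * priority + itemId.hashCode();
    }

    // Helper for local testing only: serialize to bytes and read back.
    static PriorityKey roundTrip(PriorityKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            PriorityKey copy = new PriorityKey();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a key like this, the reducer never has to hold a whole sorted list in memory; it just processes keys in the order they arrive. With more than one reduce task, each reducer still sees only its own partition of the keys, sorted within that partition.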
Thanks,
--Chris

On Mon, Oct 15, 2012 at 11:37 AM, Chris Nauroth <cnaur...@hortonworks.com> wrote:

> I think it would work, but I'm wondering if it would be easier for your
> application to restructure the keys emitted from the mapper tasks so that
> you can take advantage of the sorting inherently done during the shuffle.
>
> For each reduce task, your reducer code will receive keys emitted from
> mappers in sorted order. Therefore, if the keys emitted from your mapper
> contain the item's priority, then the shuffle would provide the sort order
> that you need. This might lead you down the path of writing a custom
> WritableComparable to use as the map output key, but this is usually pretty
> trivial.
>
> Also, keep in mind that if you run multiple reduce tasks, then each
> reducer receives a subset of the keys emitted from the mapper. Depending
> on your application logic, this may or may not be a problem.
>
> Thanks,
> --Chris
>
> On Mon, Oct 15, 2012 at 11:07 AM, Aseem Anand <aseem.ii...@gmail.com> wrote:
>
>> Hi Chris,
>> I had a few PriorityQueue's at the mappers which I wished to send to some
>> reducers. After this each reducer(receiving PriorityQueues from each
>> mapper) would perform some operations on these by removing the top and
>> hence accessing the elements in sorted order(which is very essential to my
>> application). Even I thought of pushing them in an ArrayWritable but was
>> wondering if there would be an existing implementation of PriorityQueue.
>> Would it be advisable to insert elements into ArrayWritable in sorted
>> order and reconstruction of merged PriorityQueues at the other end now ?
>>
>> Thanks,
>> Aseem
>>
>> On Mon, Oct 15, 2012 at 11:07 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>
>>> Hello Aseem,
>>>
>>> I'm aware of nothing in Hadoop or related projects that provides a
>>> PriorityQueueWritable. You could achieve this by taking some existing
>>> priority queue class and subclassing it or wrapping it to implement the
>>> Writable.write and Writable.readFields methods.
>>>
>>> If you could give us some additional context around what you want to
>>> solve, then we might be able to offer some other suggestions. For example,
>>> depending on the problem, maybe you could sort values and wrap them in
>>> ArrayWritable (which already exists), which would save you the trouble of
>>> coding your own custom Writable.
>>>
>>> Thank you,
>>> --Chris
>>>
>>> On Mon, Oct 15, 2012 at 9:56 AM, Aseem Anand <aseem.ii...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Is anyone familiar with a PriorityQueueWritable to be used to pass data
>>>> from mapper to reducers ?
>>>>
>>>> Regards,
>>>> Aseem
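For reference, the wrapping approach mentioned in the first reply (a priority queue class that implements Writable.write and Writable.readFields) could look roughly like the sketch below. The class name LongPriorityQueueHolder and the choice of long elements are assumptions made for illustration; nothing like this ships with Hadoop as far as the thread knows. The sketch uses only java.io and java.util so that it compiles on its own; in a real job the class would additionally be declared as implementing org.apache.hadoop.io.Writable, whose two methods have the same shape as the ones shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.PriorityQueue;

// Hypothetical wrapper around java.util.PriorityQueue<Long>.
// In a Hadoop job this would implement org.apache.hadoop.io.Writable.
final class LongPriorityQueueHolder {
    private final PriorityQueue<Long> queue = new PriorityQueue<>();

    void add(long value) { queue.add(value); }

    // Removes and returns the smallest element, or null if empty.
    Long poll() { return queue.poll(); }

    int size() { return queue.size(); }

    // Write the element count followed by each element. The iteration
    // order of a PriorityQueue is unspecified, which is fine here because
    // readFields rebuilds the heap on the receiving side.
    public void write(DataOutput out) throws IOException {
        out.writeInt(queue.size());
        for (long value : queue) {
            out.writeLong(value);
        }
    }

    // Clear any previous contents first, since a deserializing framework
    // may reuse the same instance for several records.
    public void readFields(DataInput in) throws IOException {
        queue.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            queue.add(in.readLong());
        }
    }

    // Helper for local testing only: serialize to bytes and read back.
    static LongPriorityQueueHolder copyOf(LongPriorityQueueHolder source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            LongPriorityQueueHolder copy = new LongPriorityQueueHolder();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the whole queue has to fit in memory on both the mapper and the reducer side, which is the limitation raised in the most recent reply above.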