This is possible, but not especially easy.
1) You would probably need to use the range partitioner to partition
the vertices (or else they will be interspersed across partitions).
2) You would need to add a partition store implementation that keeps the
vertices sorted (e.g. backed by a TreeMap).
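
To illustrate why the two steps above work together, here is a minimal
standalone sketch in plain Java. It does not use Giraph's partitioner or
partition store interfaces; the class SortedRangePartitions, its methods,
and the String vertex values are all hypothetical stand-ins. It assumes the
minimum and maximum vertex ids are known up front and ignores overflow for
extreme id ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Giraph.
class SortedRangePartitions {
    private final long minId;
    private final long width; // number of consecutive ids per partition
    private final List<TreeMap<Long, String>> partitions = new ArrayList<>();

    public SortedRangePartitions(long minId, long maxId, int partitionCount) {
        if (partitionCount <= 0 || maxId < minId) {
            throw new IllegalArgumentException("bad id range or partition count");
        }
        this.minId = minId;
        long span = maxId - minId + 1;
        // Ceiling division so every id in [minId, maxId] maps to a valid partition.
        this.width = (span + partitionCount - 1) / partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            // Step 2: each partition keeps its vertices sorted by id.
            partitions.add(new TreeMap<>());
        }
    }

    // Step 1: range partitioning. Contiguous id ranges share a partition,
    // so ids are not interspersed across partitions.
    public int partitionFor(long vertexId) {
        if (vertexId < minId) {
            throw new IllegalArgumentException("id below range: " + vertexId);
        }
        long index = (vertexId - minId) / width;
        if (index >= partitions.size()) {
            throw new IllegalArgumentException("id above range: " + vertexId);
        }
        return (int) index;
    }

    public void put(long vertexId, String value) {
        partitions.get(partitionFor(vertexId)).put(vertexId, value);
    }

    // Visits partitions in range order and each partition in key order,
    // which is the order a writer iterating over this store would see.
    public List<Long> idsInWriteOrder() {
        List<Long> out = new ArrayList<>();
        for (TreeMap<Long, String> partition : partitions) {
            out.addAll(partition.keySet());
        }
        return out;
    }
}
```

With hash partitioning, each partition would still be sorted internally
by the TreeMap, but neighbouring ids would land in different partitions.
With range partitioning, the concatenation of partitions in range order is
sorted as well.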
Alternatively, of course, you can write a simple MapReduce job to do the
sort.
On 3/21/13 5:58 PM, Ameet Kini wrote:
Is it possible to save the final output sorted by vertex id? My
vertices have their id of type long, and I am using
SequenceFileOutputFormat, where the key of the sequence file is the
vertex id of type long. If the vertices were somehow written in sorted
order, I could even switch to using Hadoop's MapFileOutputFormat,
which expects sorted keys. I understand that if there are multiple
workers, there won't be a total order on the keys, and that's fine, as
long as each worker writes its output sorted by vertex id.
I was looking at the code, and it looks like the call to writeVertex is
made in BspServiceWorker.saveVertices. There seems to be no way to
control the order of vertices, but I may be missing something. Any
pointers or examples would help.
Thanks,
Ameet