I upgraded my client to 0.90.1 per the suggestion (although the
server is still 0.89). I no longer get a NullPointerException when I try
to use TotalOrderPartitioner. However, I cannot get the
TotalOrderPartitioner to actually create the partition file, even though
the message "hadoopbackport.InputSampler: Using 64 samples" is printed,
which indicates my custom sampler is running and generating partition
points. Can someone take a quick look at my code below and tell me what
I'm missing? I use a fully qualified path name, and I even set
"total.order.partitioner.path". All my println statements indicate the
partition file is created, but no such file actually exists. Even more strangely,
when the mapreduce job starts, it complains that "File _partition.lst
does not exist" (even though I've explicitly told it to use a file named
"partitions-file" as opposed to the default "_partition.lst").

    Path input = FileInputFormat.getInputPaths(job)[0];
    input = input.makeQualified(input.getFileSystem(config));
    Path partitionFilePath = new Path(input, "partitions-file");
    TotalOrderPartitioner.setPartitionFile(config, partitionFilePath);
    job.getConfiguration().set("total.order.partitioner.path",
        partitionFilePath.toString());
    job.setPartitionerClass(TotalOrderPartitioner.class);
    System.out.println("TotalOrderPartitioner thinks its partition file is: "
        + TotalOrderPartitioner.getPartitionFile(config));
    job.setNumReduceTasks(100);
    InputSampler.Sampler randomSampler =
        new RandomKeySampler<Text, HitList>(100);
    InputSampler.writePartitionFile(job, randomSampler);
    System.out.println("wrote partition file: "
        + TotalOrderPartitioner.getPartitionFile(config));

Any help is greatly appreciated, since I spent the day looking through
TotalOrderPartitioner and can't find what I'm doing wrong. Thanks!
-geoff