Hi all, I am running a very simple map job that assigns an integer value (1-64000) to each input record. The reducer does nothing, but I then use the output formatter below to write the data to one file per key.
public class CellBasedOutputFormat
    extends MultipleTextOutputFormat<WritableComparable, Writable> {

  @Override
  protected String generateFileNameForKeyValue(WritableComparable key,
      Writable value, String name) {
    return "cell_" + key.toString();
  }
}

I get an out of memory error:

java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:633)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:198)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:211)
        at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressor(ZlibFactory.java:83)
        at org.apache.hadoop.io.compress.DefaultCodec.createCompressor(DefaultCodec.java:59)
        at org.apache.hadoop.io.compress.DefaultCodec.createOutputStream(DefaultCodec.java:43)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:131)
        at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44)
        at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

I will keep this cluster alive for about 24 hours, so you can see the errors here:
http://ec2-67-202-42-36.compute-1.amazonaws.com:50030/jobtasks.jsp?jobid=job_200811250345_0001&type=reduce&pagenum=1

Can you offer some advice, please? Are my tuning parameters (map tasks, reduce tasks) perhaps wrong?
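If I read the trace right, every distinct file name opens a new compressed record writer, and each ZlibCompressor allocates a direct buffer. A rough back-of-envelope calculation (assuming a 64 KB direct buffer per compressor, which is a guess on my part rather than something I have verified against ZlibCompressor.java) suggests one open file per key could need a huge amount of direct memory:

```java
// Back-of-envelope estimate only. The 64 KB per-compressor direct
// buffer size is an assumption, not a figure taken from the Hadoop
// source; the key count is the worst case for my 1-64000 key range.
public class DirectBufferEstimate {
  public static void main(String[] args) {
    long bufferBytes = 64L * 1024;  // assumed direct buffer per compressor
    long distinctKeys = 64000;      // worst case: one open writer per key
    long totalBytes = bufferBytes * distinctKeys;
    System.out.println(totalBytes / (1024 * 1024) + " MB of direct memory");
    // prints: 4000 MB of direct memory
  }
}
```

If that estimate is anywhere near right, it would explain running out of direct buffer memory regardless of heap settings.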
My configuration is:

JobConf conf = new JobConf();
conf.setJobName("OccurrenceByCellSplitter");
conf.setNumMapTasks(10);
conf.setNumReduceTasks(5);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(OccurrenceBy1DegCellMapper.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(CellBasedOutputFormat.class);
FileInputFormat.setInputPaths(conf, inputFile);
FileOutputFormat.setOutputPath(conf, outputDirectory);
long time = System.currentTimeMillis();
conf.setJarByClass(OccurrenceBy1DegCellMapper.class);
JobClient.runJob(conf);

Many thanks for any advice,
Tim
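Since the allocation happens inside DefaultCodec, I also wondered whether simply turning off output compression would sidestep the ZlibCompressor entirely. If I understand FileOutputFormat correctly, that would be one extra line in the configuration above (untested on my side, so please correct me if this is wrong):

```java
// Guess: with compressed output disabled, TextOutputFormat should
// never construct a ZlibCompressor (and its direct buffer) per file.
FileOutputFormat.setCompressOutput(conf, false);
```

Of course that only treats the symptom; if the real problem is holding 64000 record writers open at once, I would still be interested in advice on structuring the output differently.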