Hi all,

I am running a very simple map that assigns an integer value (1-64000) to
each input record.
The reduce is an identity (it does nothing), and I then use this output
format to write the data to one file per key:

public class CellBasedOutputFormat
        extends MultipleTextOutputFormat<WritableComparable, Writable> {

    @Override
    protected String generateFileNameForKeyValue(WritableComparable key,
                                                 Writable value, String name) {
        return "cell_" + key.toString();
    }
}
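For reference, the cell computation in the mapper is roughly of this shape — a simplified, hypothetical sketch of a 1-degree grid over the globe (the class and method names here are illustrative, and the real OccurrenceBy1DegCellMapper also does the Hadoop wiring):

```java
// Hypothetical sketch: map a (lat, lon) point to a 1-degree grid cell.
// A 1-degree grid gives 360 * 180 = 64800 cells, numbered 1..64800,
// which is roughly the 1-64000 key range mentioned above.
public class CellIndex {

    // Expects lat in [-90, 90) and lon in [-180, 180).
    static int cellFor(double lat, double lon) {
        int row = (int) Math.floor(lat + 90);   // 0..179
        int col = (int) Math.floor(lon + 180);  // 0..359
        return row * 360 + col + 1;             // 1..64800
    }

    public static void main(String[] args) {
        // South-west corner of the grid is cell 1.
        System.out.println(cellFor(-90.0, -180.0));
        System.out.println(cellFor(0.0, 0.0));
    }
}
```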

I get an out of memory error:
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:633)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:198)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:211)
        at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressor(ZlibFactory.java:83)
        at org.apache.hadoop.io.compress.DefaultCodec.createCompressor(DefaultCodec.java:59)
        at org.apache.hadoop.io.compress.DefaultCodec.createOutputStream(DefaultCodec.java:43)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:131)
        at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44)
        at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

I will keep the cluster alive for about 24 hours, so you can see the
errors here:
http://ec2-67-202-42-36.compute-1.amazonaws.com:50030/jobtasks.jsp?jobid=job_200811250345_0001&type=reduce&pagenum=1

Could you offer some advice?
Are my tuning parameters (number of map and reduce tasks) perhaps wrong?

My configuration is:
        JobConf conf = new JobConf();
        conf.setJobName("OccurrenceByCellSplitter");
        conf.setNumMapTasks(10);
        conf.setNumReduceTasks(5);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(OccurrenceBy1DegCellMapper.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(CellBasedOutputFormat.class);

        FileInputFormat.setInputPaths(conf, inputFile);
        FileOutputFormat.setOutputPath(conf, outputDirectory);

        long time = System.currentTimeMillis();
        conf.setJarByClass(OccurrenceBy1DegCellMapper.class);
        JobClient.runJob(conf);
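
Looking at the trace, each new per-cell output file appears to allocate a
ZlibCompressor with a direct buffer. One thing I could try (a guess on my
part, not verified) is switching output compression off via the old mapred
API, so no compressor is created per file:

        // Possible workaround (untested): write the per-cell files
        // uncompressed so no zlib direct buffers are allocated.
        FileOutputFormat.setCompressOutput(conf, false);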


Many thanks for any advice,

Tim
