Hi Jens,

Please read this old thread at http://search-hadoop.com/m/WHvZDCfVsD, which covers the issue, the solution and more.
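In short: the usual cause of this symptom is that the job's input format was never set. FileInputFormat.addInputPath() (which SequenceFileAsBinaryInputFormat merely inherits) only registers the input path; it does not select the input format, so the job falls back to the default TextInputFormat, which hands the mapper LongWritable byte offsets and Text lines, one record per line. That matches both the ClassCastException and the unexpectedly large number of map() calls. Below is a rough, untested sketch of how the driver and mapper could be wired up for the file you wrote (IntWritable keys, BytesWritable values); the class names and the map() body are placeholders, and it assumes the Hadoop 2 "mapreduce" API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NoOfMovesDriver {

  public static class NoOfMovesMapper
      extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {
    @Override
    protected void map(IntWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException {
      // Placeholder logic: emit the record index and the payload length.
      // value.getBytes() returns the backing array; only the first
      // value.getLength() bytes belong to this record.
      context.write(key, new IntWritable(value.getLength()));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "no-of-moves");
    job.setJarByClass(NoOfMovesDriver.class);

    // The crucial line: read the input as a SequenceFile so the mapper
    // receives the (IntWritable, BytesWritable) pairs written earlier,
    // instead of TextInputFormat's (LongWritable, Text) line records.
    job.setInputFormatClass(SequenceFileInputFormat.class);

    job.setMapperClass(NoOfMovesMapper.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);

    SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that SequenceFileAsBinaryInputFormat itself would not help here even if set explicitly: it presents every key and value as raw BytesWritable, so a Mapper<IntWritable, BytesWritable, ...> still would not match. SequenceFileInputFormat deserializes the stored IntWritable/BytesWritable types for you.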
On Fri, May 31, 2013 at 1:39 AM, Jens Scheidtmann <[email protected]> wrote:
> Dear list,
>
> I have created a sequence file like this:
>
> seqWriter = SequenceFile.createWriter(fs, getConf(), new Path(hdfsPath),
>     IntWritable.class, BytesWritable.class, SequenceFile.CompressionType.NONE);
> seqWriter.append(new IntWritable(index++), new BytesWritable(buf));
>
> (with buf a byte array.)
>
> Now, when reading the same sequence file in a MapReduce job, I specify the
> mapper like this:
>
> public static class NoOfMovesMapper
>     extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable>
> {
>
> and configure the SequenceFile as:
>
> SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
>
> This job fails with:
>
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>     be cast to org.apache.hadoop.io.IntWritable
>     at org.gostats.hadoop.NoOfMoves$NoOfMovesMapper.map(NoOfMoves.java:1)
>
> I have to specify the mapper as
>
> extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
>
> to read the sequence file. But then the number of records and the number of
> invocations of map() are much larger than I would expect. I thought I would
> get as many invocations of map() as there are records in the sequence file.
>
> What am I doing wrong? Where am I wrong?
>
> Thanks in advance,
>
> Jens

--
Harsh J
