Dear list,
I have created a sequence file like this:
seqWriter = SequenceFile.createWriter(fs, getConf(), new Path(hdfsPath),
        IntWritable.class, BytesWritable.class,
        SequenceFile.CompressionType.NONE);
seqWriter.append(new IntWritable(index++), new BytesWritable(buf));
(where buf is a byte array.)
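For reference, the complete writer code, slightly simplified, looks roughly like this (the class and variable names here are not my real ones, it is just a sketch of what I do):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileWriterSketch {
    // Writes each byte[] record as (running index, raw bytes) into one SequenceFile.
    public static void write(Configuration conf, String hdfsPath, byte[][] records)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer seqWriter = SequenceFile.createWriter(fs, conf,
                new Path(hdfsPath), IntWritable.class, BytesWritable.class,
                SequenceFile.CompressionType.NONE);
        try {
            int index = 0;
            for (byte[] buf : records) {
                seqWriter.append(new IntWritable(index++), new BytesWritable(buf));
            }
        } finally {
            seqWriter.close();
        }
    }
}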
Now, when reading the same sequence file in a MapReduce job, I declare the
mapper like this:
public static class NoOfMovesMapper
        extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {
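    // Sketch of the map method (the real move-counting logic is omitted and the
    // value below is just a compilable placeholder); the only point here is that
    // I expect to receive IntWritable keys and BytesWritable values.
    @Override
    protected void map(IntWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // value.getBytes() / value.getLength() hold the raw bytes written above
        int noOfMoves = value.getLength();   // placeholder for the real computation
        context.write(key, new IntWritable(noOfMoves));
    }
}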
and add the SequenceFile as input like this:
SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
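For completeness, the rest of the job setup in the run() method looks roughly like this (reconstructed from memory, so it is a sketch rather than the exact code; I do not set anything else explicitly):

Job jobConf = new Job(getConf(), "no-of-moves");
jobConf.setJarByClass(NoOfMoves.class);
jobConf.setMapperClass(NoOfMovesMapper.class);
jobConf.setOutputKeyClass(IntWritable.class);
jobConf.setOutputValueClass(IntWritable.class);
for (int i = 0; i < args.length - 1; i++) {
    SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
}
FileOutputFormat.setOutputPath(jobConf, new Path(args[args.length - 1]));
return jobConf.waitForCompletion(true) ? 0 : 1;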
This job fails with:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
        at org.gostats.hadoop.NoOfMoves$NoOfMovesMapper.map(NoOfMoves.java:1)
I have to specify the mapper as
extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
to read the sequence file. But then the number of records, and therefore the
number of map invocations, is much larger than I would expect. I thought I
would get exactly one map invocation per record in the sequence file.
What am I doing wrong? Where is my mistake?
Thanks in advance,
Jens