Hi,
I am using Hadoop 1.0.2 and have written a MapReduce job. I have a requirement
to process each file as a whole, without splitting. So I wrote a new input
format that processes the file as a whole by overriding the isSplitable()
method, and I also created a new RecordReader implementation that reads the
entire file. I followed the sample in Chapter 7 of the book "Hadoop: The
Definitive Guide".
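For reference, my two classes look roughly like the sketch below. It follows
the book's example closely (the class names WholeFileInputFormat and
WholeFileRecordReader are from the book), and I have trimmed it down, so treat
it as a simplified version of what I actually have:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Treats every file as a single record so it is never split.
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // one mapper reads the whole file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    WholeFileRecordReader reader = new WholeFileRecordReader();
    reader.initialize(split, context);
    return reader;
  }
}

// Reads the entire file into one BytesWritable value.
class WholeFileRecordReader
    extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit fileSplit;
  private Configuration conf;
  private BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (!processed) {
      // Allocate exactly the file's length and read it all in.
      byte[] contents = new byte[(int) fileSplit.getLength()];
      Path file = fileSplit.getPath();
      FSDataInputStream in = null;
      try {
        in = file.getFileSystem(conf).open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }
    return false;
  }

  @Override
  public NullWritable getCurrentKey() { return NullWritable.get(); }

  @Override
  public BytesWritable getCurrentValue() { return value; }

  @Override
  public float getProgress() { return processed ? 1.0f : 0.0f; }

  @Override
  public void close() { }
}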
In my MapReduce job, the mapper emits a BytesWritable as the value. I want to
take the bytes and read some specific information out of them, so I wrap them
in a ByteArrayInputStream and do further processing.
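Simplified, the relevant part of my mapper looks roughly like this (the class
name, the output types, and the readInt() call are just placeholders; my real
parsing is more involved):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Simplified mapper: receives each whole file as one BytesWritable.
public class MyFileMapper
    extends Mapper<NullWritable, BytesWritable, Text, Text> {

  @Override
  protected void map(NullWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    byte[] bytes = value.getBytes();
    // Wrap the raw bytes to read specific fields out of the file.
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    int firstField = in.readInt(); // e.g. a value from the file header
    // ... further processing and context.write(...) based on the bytes ...
  }
}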
But strangely, the following code prints two different lengths, and because of
this I am getting errors:
// value -> BytesWritable
System.out.println("Bytes length " + value.getLength());   // prints: Bytes length 1931650
byte[] bytes = value.getBytes();
System.out.println("Bytes array length " + bytes.length);  // prints: Bytes array length 2897340
My file is 1931650 bytes, so I don't understand why the byte array is bigger
than the original file.
Any idea what is going wrong? Please help. Thanks in advance.
Regards,
Anand.C