Hi,

I am using Hadoop 1.0.2 and have written a MapReduce job. I need to process 
each file as a whole, without splitting, so I wrote a new input format that 
overrides the isSplitable() method, along with a custom RecordReader 
implementation that reads the entire file. I followed the sample in Chapter 7 
of "Hadoop: The Definitive Guide". In my job, the mapper emits a BytesWritable 
as its value. I want to take those bytes and extract some specific information 
from them, so I wrap them in a ByteArrayInputStream for further processing. 
But strangely, the following code prints two different lengths, and this 
mismatch is causing errors.

// value -> BytesWritable
System.out.println("Bytes length " + value.getLength()); // prints: Bytes length 1931650
byte[] bytes = value.getBytes();
System.out.println("Bytes array length " + bytes.length); // prints: Bytes array length 2897340
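To show the kind of bounded read I am attempting, here is a minimal stand-alone sketch (no Hadoop dependency; the small padded buffer and the hard-coded valid length are made-up stand-ins for value.getBytes() and value.getLength()):

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

public class BoundedReadSketch {
    public static void main(String[] args) {
        // Stand-in for value.getBytes(): a backing buffer padded beyond the valid data.
        byte[] buffer = Arrays.copyOf(new byte[]{1, 2, 3}, 7); // 3 valid bytes, capacity 7
        int validLength = 3; // stand-in for value.getLength()

        // Bound the stream to the valid length so the padding bytes are never read.
        ByteArrayInputStream in = new ByteArrayInputStream(buffer, 0, validLength);
        System.out.println("available = " + in.available()); // prints: available = 3
    }
}
```

The three-argument ByteArrayInputStream constructor limits the stream to the given offset and length, so only the valid region of the buffer is consumed.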

My file is 1931650 bytes, so I don't understand why the byte array is larger 
than the original file.

Any idea what is going wrong? Please help. Thanks in advance.

Regards,
Anand.C