Folks, I think this might be due to the default TextInputFormat in Hadoop. Any pointers to solutions would be much appreciated.

>> More powerfully, you can define your own *InputFormat* implementations to format the input to your programs however you want. For example, the default TextInputFormat reads lines of text files. The key it emits for each record is the byte offset of the line read (as a LongWritable), and the value is the contents of the line up to the terminating '\n' character (as a Text object). If you have multi-line records each separated by a '$' character, you could write your own *InputFormat* that parses files into records split on this character instead. >>
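
To make the quoted idea concrete, here is a minimal sketch of the record-splitting logic only, in plain Java with no Hadoop dependencies. The class name DollarRecordSplitter and its split method are hypothetical, made up for illustration. It assumes the whole input fits in memory, whereas a real job would put similar logic in a RecordReader returned by a custom InputFormat and would also have to handle records that straddle split boundaries. It mirrors the key/value convention described above: the key is the byte offset where the record starts, and the value is the record text up to (not including) the '$' delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DollarRecordSplitter {

    // Hypothetical helper, not part of Hadoop: maps each record's starting
    // byte offset to the record text, splitting on a single-byte delimiter.
    public static Map<Long, String> split(byte[] data, byte delimiter) {
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == delimiter) {
                // Emit everything since the previous delimiter as one record.
                records.put((long) start,
                        new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // Emit a trailing record if the input does not end with the delimiter.
        if (start < data.length) {
            records.put((long) start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two multi-line records, each terminated by '$'.
        byte[] input = "id=1\nname=a$id=2\nname=b$".getBytes(StandardCharsets.UTF_8);
        split(input, (byte) '$').forEach((offset, record) ->
                System.out.println(offset + " -> " + record.replace("\n", "\\n")));
    }
}
```

It may also be worth checking whether your Hadoop version honours the textinputformat.record.delimiter configuration property, which, where supported, lets TextInputFormat split on a custom delimiter without a custom InputFormat.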
Thanks, Mohit