Hi, I have a 20 GB gzip-compressed log file on HDFS. Because the log format is complex, I wrote a custom SerDe to parse it. However, a parsing exception occurs while the log file is being parsed. The parser reads a record that ends in *^D^H* instead of a complete line:
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDE *^D^H*

A small file (about 40 MB) does not produce the parsing error. I read that Hadoop does not split gzip-compressed files, but this file seems to be split. Am I doing anything wrong? Please help.