Hi,

Look at the code for the class ParseStatusCodes. These codes simply indicate that the parsing failed; they are not the cause of the failure itself. Do you get the entire text of the document, or just what the parser managed to process before it failed? Did you set the content limit (e.g. http.content.limit or file.content.limit in nutch-site.xml) to -1?
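To make the codes readable while stepping through in the debugger, a small lookup along these lines may help. This is only a sketch: the class ParseCodeNames and the method describe are made up for illustration, and the numeric values are copied by hand from my reading of ParseStatusCodes, so please check them against the source of the Nutch version you are running.

```java
import java.util.Map;

// Standalone sketch: deliberately does not depend on the Nutch jars.
// ParseCodeNames and describe() are hypothetical helper names.
final class ParseCodeNames {
    // Assumed values; verify against ParseStatusCodes in your Nutch version.
    static final Map<Integer, String> MAJOR = Map.of(
        0, "NOTPARSED",
        1, "SUCCESS",
        2, "FAILED");

    // Assumed values; verify against ParseStatusCodes in your Nutch version.
    static final Map<Integer, String> MINOR = Map.of(
        0, "SUCCESS_OK",
        100, "SUCCESS_REDIRECT",
        200, "FAILED_EXCEPTION",
        202, "FAILED_TRUNCATED",
        203, "FAILED_INVALID_FORMAT",
        204, "FAILED_MISSING_PARTS",
        205, "FAILED_MISSING_CONTENT");

    // Turns a (majorCode, minorCode) pair into a readable label.
    static String describe(int major, int minor) {
        return MAJOR.getOrDefault(major, "UNKNOWN(" + major + ")")
            + " / "
            + MINOR.getOrDefault(minor, "UNKNOWN(" + minor + ")");
    }

    public static void main(String[] args) {
        System.out.println(describe(2, 200)); // the failing PDFs
        System.out.println(describe(1, 0));   // the successful parses
    }
}
```

If those values hold for your version, 2/200 would mean the parser threw an exception, so the stack trace in the logs is the place to look for the actual cause.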
Thanks

Julien

On 29 October 2012 19:17, kiran chitturi <[email protected]> wrote:
> Hi!
>
> I am debugging Nutch with Eclipse and I have found out that some PDF files
> which are not successfully parsed have majorCode 2 and minorCode 200,
> and files which are successfully parsed have majorCode 1 and minorCode 0.
>
> Can someone please explain or point me to what these codes mean?
>
> Actually, the title, text and everything else is parsed in the failed parses, but
> somehow because of the codes it is not saving the fields and is returning
> a failed parse.
>
> Thanks for your help.
>
> Regards,
> --
> Kiran Chitturi

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

