Hi,

I did not set the content limit to -1, but I have set it high enough to get through the documents that I am parsing. I can see some of the title and text, but I am not sure how much of each document it manages to process. I am going to try running Tika separately on the documents. If they all go through tika-1.2 on its own, then I will have to debug where the error is coming from here.
Many Thanks,
Kiran.

On Tue, Oct 30, 2012 at 4:37 AM, Julien Nioche <[email protected]> wrote:

> Hi
>
> Look at the code for the class ParseStatusCodes. This simply indicates that
> the parsing failed and is not the cause of the failure itself. Do you get
> the entire text for the document or just what the parser managed to process
> until it failed? Did you set the content limit to -1?
>
> Thanks
>
> Julien
>
>
> On 29 October 2012 19:17, kiran chitturi <[email protected]> wrote:
>
> > Hi!
> >
> > I am debugging Nutch with Eclipse and I have found that some PDF files
> > which are not successfully parsed have majorCode 2 and minorCode 200,
> > and files which are successfully parsed have majorCode 1 and minorCode 0.
> >
> > Can someone please explain or point me to what these codes mean?
> >
> > Actually, the title, text and everything else is parsed in the failed
> > parses, but somehow because of the codes it is not saving the fields and
> > is returning a failed parse.
> >
> > Thanks for your help.
> >
> > Regards,
> > --
> > Kiran Chitturi
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--
Kiran Chitturi

