Re: Nutch 2.x parse MajorCode, MinorCode

kiran chitturi Tue, 30 Oct 2012 05:58:40 -0700

Hi

I did not set the content limit to -1, but I have set it high enough to be
able to go through the documents that I am parsing. I could see some title
and text, but I am not sure how much of each document it is able to process.
I am going to try using Tika separately to process the documents. If all of
them go through tika-1.2 on its own, then I will have to debug where I am
getting the error here.
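
For reference, the content limit mentioned above is normally overridden in
conf/nutch-site.xml. A minimal sketch, assuming the PDFs are fetched over
HTTP (in which case the relevant property is http.content.limit; other
protocols have their own limit properties, so check nutch-default.xml for
the one matching your setup):

```xml
<!-- conf/nutch-site.xml (sketch; place inside the existing <configuration> element) -->
<property>
  <name>http.content.limit</name>
  <!-- A negative value disables truncation; the shipped default is 65536 bytes. -->
  <value>-1</value>
</property>
```

A truncated PDF is a common reason for a parser to give up partway through,
which is why the limit is worth ruling out before debugging the parser itself.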


Many Thanks,
Kiran.

On Tue, Oct 30, 2012 at 4:37 AM, Julien Nioche <
[email protected]> wrote:

> Hi
>
> Look at the code for the class ParseStatusCodes. This simply indicates that
> the parsing failed and is not the cause of the failure itself. Do you get
> the entire text for the document or just what the parser managed to process
> until it failed? Did you set the content limit to -1?
>
> Thanks
>
> Julien
>
>
> On 29 October 2012 19:17, kiran chitturi <[email protected]>
> wrote:
>
> > Hi!
> >
> > I am debugging Nutch with Eclipse and I have found that some PDF files
> > which are not successfully parsed have majorCode 2 and minorCode 200,
> > and files which are successfully parsed have majorCode 1 and minorCode 0.
> >
> > Can someone please explain, or point me to, what these codes mean?
> >
> > Actually, the title, text and everything else are parsed in the failed
> > parses, but somehow because of the codes it is not saving the fields and
> > is returning the parse as failed.
> >
> > Thanks for your help.
> >
> > Regards,
> > --
> > Kiran Chitturi
> >
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 
Kiran Chitturi
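
The major/minor codes discussed in this thread can be read off the
ParseStatusCodes class that Julien points to. The sketch below is a
standalone illustration, not the Nutch class itself: ParseStatusDecoder and
describe are hypothetical names, and the constant values are local copies
that are assumed to match ParseStatusCodes (they agree with the 1/0 and
2/200 pairs reported above, but should be checked against the source of the
Nutch version in use).

```java
// Hypothetical helper; mirrors a few constants assumed to match
// org.apache.nutch.parse.ParseStatusCodes. Verify against your Nutch source.
final class ParseStatusDecoder {
    // Major codes (assumed values)
    static final int NOTPARSED = 0;
    static final int SUCCESS = 1;
    static final int FAILED = 2;

    // Minor codes (assumed values; only the two seen in this thread)
    static final int SUCCESS_OK = 0;
    static final int FAILED_EXCEPTION = 200;

    /** Returns a readable label for a (majorCode, minorCode) pair. */
    static String describe(int majorCode, int minorCode) {
        String major;
        switch (majorCode) {
            case NOTPARSED: major = "NOTPARSED"; break;
            case SUCCESS:   major = "SUCCESS";   break;
            case FAILED:    major = "FAILED";    break;
            default:        major = "UNKNOWN(" + majorCode + ")";
        }

        String minor;
        if (majorCode == SUCCESS && minorCode == SUCCESS_OK) {
            minor = "SUCCESS_OK";
        } else if (majorCode == FAILED && minorCode == FAILED_EXCEPTION) {
            minor = "FAILED_EXCEPTION";
        } else {
            // Other minor codes exist; look them up in ParseStatusCodes.
            minor = "minor=" + minorCode;
        }
        return major + "/" + minor;
    }

    public static void main(String[] args) {
        System.out.println(describe(1, 0));   // pair reported for parsed files
        System.out.println(describe(2, 200)); // pair reported for failed PDFs
    }
}
```

Under these assumptions, 2/200 only says that the parser threw an exception;
the exception itself (usually visible in the logs or in the debugger) is
what identifies the actual cause.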
