Hi Kiran,

> Does anyone know why I am having this conflict? I feel that's because of
> the Tika parser's parse codes (major code and minor code), but I have not
> been able to figure out why this happened.


As explained earlier, you are confusing cause and consequence here. The
parsing does not fail because of the codes; the codes indicate that it
failed.
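
To make the point concrete, here is a small illustrative sketch (not Nutch code) that turns a major/minor pair into readable names. The numeric values come from my recollection of Nutch's `ParseStatus` codes, so treat them as an assumption and check them against the source of the Nutch version you run; `describe_parse_status` is a hypothetical helper. A pair such as 2/200 simply reports that parsing failed with an exception. It is an outcome, not the reason.

```python
# Parse status codes as recalled from Nutch's ParseStatus / ParseStatusCodes.
# ASSUMPTION: verify these values against your Nutch version's source.
MAJOR_CODES = {0: "notparsed", 1: "success", 2: "failed"}

MINOR_CODES = {
    0: "SUCCESS_OK",
    100: "SUCCESS_REDIRECT",
    200: "FAILED_EXCEPTION",
    202: "FAILED_TRUNCATED",
    203: "FAILED_INVALID_FORMAT",
    204: "FAILED_MISSING_PARTS",
    205: "FAILED_MISSING_CONTENT",
}


def describe_parse_status(major: int, minor: int) -> str:
    """Hypothetical helper: render a (major, minor) pair as 'major/minor' names."""
    major_name = MAJOR_CODES.get(major, f"unknown({major})")
    minor_name = MINOR_CODES.get(minor, f"unknown({minor})")
    return f"{major_name}/{minor_name}"


if __name__ == "__main__":
    # The codes only describe what happened; the cause is in the logs.
    print(describe_parse_status(2, 200))
```
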

There is no point in bothering people on the Tika list, as the codes are
not related to Tika but are 100% Nutch.

Please give more info about your setup: local? pseudo-distributed?
Running from the command line? Have you checked that the content limit is
really taken into account? What messages are you getting in the logs?
Etc.
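
On the content limit: one possible reason for a PDF to fail in the parse step is that the fetched bytes were truncated at `http.content.limit`, leaving an incomplete file. The sketch below (Python, illustration only) reads that property from a `nutch-site.xml` string and flags content whose stored length has reached the limit. The 65536-byte fallback is the default I recall from `nutch-default.xml` (an assumption, check yours), and `read_content_limit` / `may_be_truncated` are hypothetical helpers, not part of Nutch.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: default of http.content.limit in nutch-default.xml, in bytes.
DEFAULT_HTTP_CONTENT_LIMIT = 65536


def read_content_limit(nutch_site_xml: str) -> int:
    """Return http.content.limit from a nutch-site.xml string, else the assumed default."""
    root = ET.fromstring(nutch_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == "http.content.limit":
            value = prop.findtext("value", default="").strip()
            return int(value) if value else DEFAULT_HTTP_CONTENT_LIMIT
    return DEFAULT_HTTP_CONTENT_LIMIT


def may_be_truncated(content_length: int, limit: int) -> bool:
    """A negative limit disables truncation; otherwise content that
    reached the limit was probably cut off during fetching."""
    return limit >= 0 and content_length >= limit


if __name__ == "__main__":
    site = ("<configuration><property><name>http.content.limit</name>"
            "<value>-1</value></property></configuration>")
    print(read_content_limit(site))
```

A negative value such as -1 disables truncation, so with that setting (actually picked up by the job) truncation can be ruled out.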

Thanks

Julien


On 31 October 2012 13:53, kiran chitturi <chitturikira...@gmail.com> wrote:

> Hi,
>
> I have mailed the list previously about Tika parse codes (major code and
> minor code). As Julien pointed out here
> http://www.mail-archive.com/user%40nutch.apache.org/msg07950.html, 'sh
> bin/nutch parsechecker -dumpText
> http://scholar.lib.vt.edu/ejournals/ALAN/v29n3/pdf/watson.pdf' works, but
> when I run 'sh bin/nutch parse <crawlId>' on a crawl that includes the
> above PDF file, I see this message in the logs:
>
> 'WARN  parse.ParseUtil - Unable to successfully parse content
> http://scholar.lib.vt.edu/ejournals/ALAN/v29n3/pdf/watson.pdf of type
> application/pdf'
>
> Does anyone know why I am having this conflict? I feel that's because of
> the Tika parser's parse codes (major code and minor code), but I have not
> been able to figure out why this happened.
>
> Has anyone encountered this problem before? I am also going to post on the
> Tika mailing list to ask what the codes mean.
>
>
> Regards,
>
> --
> Kiran Chitturi
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
