The log you sent earlier indicated that Tika had no parser for that MIME
type, which means Tika is not being used for it.
The parse might still be hanging, but that would be on a different document
and possibly a different MIME type.

Try setting this in log4j.properties:
log4j.logger.org.apache.nutch=DEBUG

and check the logs again.
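
For reference, a minimal sketch of how that might look in the file. The
conf/ location and the commented-out, narrower parse-only logger are
assumptions on my side, so adjust them to your install:

```
# conf/log4j.properties (location assumed)
# DEBUG output for all Nutch classes, as suggested above
log4j.logger.org.apache.nutch=DEBUG
# Assumed alternative if that is too noisy: only the parse packages,
# which include parse.Parser, parse.ParserFactory and the Tika plugin
#log4j.logger.org.apache.nutch.parse=DEBUG
```

With DEBUG on, the last URL logged before the output stops should point to
the document the parser is actually stuck on.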

On 12 July 2010 23:57, AJ Chen <[email protected]> wrote:

> I set mime.type.magic=false and parsed the segment again. The parser hung
> at the same place. Maybe Tika is trapped in an endless loop after seeing
> mime-type application/x-sh. Is there a way to configure Tika to skip
> mime-type application/x-sh?
> thanks,
> -aj
>
> On Mon, Jul 12, 2010 at 3:36 PM, AJ Chen <[email protected]> wrote:
>
> > There is another thread reporting hanging during Tika parsing. I'm seeing
> > a similar problem now. I'm not sure whether the cause is the same, but I
> > want to show the message at the point of hanging.
> > 2010-07-12 14:36:33,645 ERROR tika.TikaParser - Can't retrieve Tika parser
> > for mime-type application/x-sh
> > 2010-07-12 14:36:33,645 WARN  parse.Parser - Error parsing:
> > http://rsb.info.nih.gov/ij/download/linux/unix-script.txt: failed(2,0):
> > Can't retrieve Tika parser for mime-type application/x-sh
> > 2010-07-12 14:36:33,650 INFO  parse.ParserFactory - The parsing plugins:
> > [org.apache.nutch.parse.tika.Parser -
> > org.apache.nutch.parse.text.TextParser] are enabled via the plugin.includes
> > system property, and all claim to support the content type text/plain, but
> > they are not mapped to it  in the parse-plugins.xml file
> >
> > my setting:
> > mime.type.magic=true
> > plugin.includes=...parse-(text|html|js|tika)...
> >
> > any idea?
> > thanks,
> > --
> > AJ Chen, PhD
> > Chair, Semantic Web SIG, sdforum.org
> > http://web2express.org
> > twitter @web2express
> > Palo Alto, CA, USA
> >
>
>
>
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA
>



-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
