--- On Tue, 13/7/10, AJ Chen <[email protected]> wrote:
From: AJ Chen <[email protected]>
Subject: Re: parse step hangs
To: "nutch-user" <[email protected]>
Date: Tuesday, 13 July, 2010, 4:27 AM

I set mime.type.magic=false and parsed the segment again. The parser hung at
the same place. Maybe Tika is trapped in an endless loop after seeing
mime-type application/x-sh. Is there a way to configure Tika to skip
mime-type application/x-sh?

thanks,
-aj

On Mon, Jul 12, 2010 at 3:36 PM, AJ Chen <[email protected]> wrote:

> There is another thread reporting a hang during Tika parsing. I'm seeing a
> similar problem now. I'm not sure whether the cause is the same, but I want
> to show the messages logged at the point of hanging.
>
> 2010-07-12 14:36:33,645 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/x-sh
> 2010-07-12 14:36:33,645 WARN parse.Parser - Error parsing: http://rsb.info.nih.gov/ij/download/linux/unix-script.txt: failed(2,0): Can't retrieve Tika parser for mime-type application/x-sh
> 2010-07-12 14:36:33,650 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.Parser - org.apache.nutch.parse.text.TextParser] are enabled via the plugin.includes system property, and all claim to support the content type text/plain, but they are not mapped to it in the parse-plugins.xml file
>
> My settings:
> mime.type.magic=true
> plugin.includes=...parse-(text|html|js|tika)...
>
> Any idea?
>
> thanks,
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA
>

--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA

