There is another thread reporting a hang during Tika parsing, and I'm now seeing a similar problem. I'm not sure whether the cause is the same, but I want to show the log messages at the point where it hangs:

2010-07-12 14:36:33,645 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/x-sh
2010-07-12 14:36:33,645 WARN parse.Parser - Error parsing: http://rsb.info.nih.gov/ij/download/linux/unix-script.txt: failed(2,0): Can't retrieve Tika parser for mime-type application/x-sh
2010-07-12 14:36:33,650 INFO parse.ParserFactory - The parsing plugins: [org.apache.nutch.parse.tika.Parser - org.apache.nutch.parse.text.TextParser] are enabled via the plugin.includes system property, and all claim to support the content type text/plain, but they are not mapped to it in the parse-plugins.xml file

My settings:

mime.type.magic=true
plugin.includes=...parse-(text|html|js|tika)...

Any ideas?

Thanks,
--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA

