I tried configuring my instance to fetch and parse your page, with the following result:

lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local/bin$ ./nutch parsechecker http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
fetching: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
parsing: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
contentType: application/javascript
signature: 4bf7aa15c0e79cb2330bc80c417f0a55
---------
Url
---------------
http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
---------
ParseData
---------
Version: 5
Status: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'
Title:
Outlinks: 0
Content Metadata:
Parse Metadata:

So I tried a small experiment to see if I could hack a solution, but
unfortunately the furthest I got was finding that, beginning on line 152 of
the JSParserFilter class, we see:

    public ParseResult getParse(Content c) {
      String type = c.getContentType();
      if (type != null && !type.trim().equals("")
          && !type.toLowerCase().startsWith("application/x-javascript"))
        return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
            "Content not JavaScript: '" + type + "'")
            .getEmptyParseResult(c.getUrl(), getConf());

It appears from the ParserChecker output that this ParseStatus is returning
the FAILED_INVALID_FORMAT message we get.

If you are going to focus on getting the plugin to actually parse your files,
I would begin there. However, I wouldn't expect miracles from the parser if it
is geared specifically for the MIME type application/x-javascript.

hth

Lewis

On Fri, May 18, 2012 at 6:12 AM, forwardswing <[email protected]> wrote:
> First of all, thank you very much for your reply.
>
> I have followed your suggestion and made the following modification:
>
> <mimeType name="application/javascript">
>   <plugin id="parse-js" />
> </mimeType>
> <mimeType name="text/javascript">
>   <plugin id="parse-js" />
> </mimeType>
>
> <alias name="parse-js" extension-id="org.apache.nutch.parse.js.JSParser" />
>
> There is still an error:
> dtree.js: failed(2,0): Can't retrieve Tika parser for mime-type
> text/javascript
>
> Here is the JS file to be parsed. Could you please try it in your
> environment?
>
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js dtree.js
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-t-retrieve-Tika-parser-for-mime-type-text-javascript-tp3983599p3984604.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Lewis
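
For illustration only, here is a rough sketch of how the content-type check quoted above might be relaxed so that application/javascript and text/javascript are accepted alongside application/x-javascript. The class JsMimeCheck and its methods are hypothetical names, not part of Nutch, and the list of accepted types is my assumption. I have not tried this inside the plugin itself.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Nutch: a standalone sketch of a relaxed
// version of the content-type test in the getParse() excerpt.
final class JsMimeCheck {

    // MIME types treated as JavaScript in this sketch (an assumption).
    private static final Set<String> JS_TYPES = Set.of(
        "application/x-javascript",
        "application/javascript",
        "text/javascript");

    // True if the content type, ignoring case and any parameters such as
    // "; charset=utf-8", is one of the accepted JavaScript types.
    static boolean isJavaScript(String contentType) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        int semi = type.indexOf(';');
        if (semi >= 0) {
            type = type.substring(0, semi).trim();
        }
        return JS_TYPES.contains(type);
    }

    // Mirrors the shape of the original condition: a null or blank type is
    // let through, anything else must be a known JavaScript type.
    static boolean shouldReject(String contentType) {
        return contentType != null
            && !contentType.trim().isEmpty()
            && !isJavaScript(contentType);
    }
}
```

With something like this, the condition in getParse() would reduce to a single shouldReject(type) call. Whether the parser then does anything useful with the file is a separate question.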

