One final point which I forgot. The point of the parse-js plugin is to extract outlinks from JS pages. The page you supplied contained only one outlink, to a page which no longer exists, so depending on your purposes you may not find the parse-js plugin of much help.
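
If you do want to pursue it, here is a minimal sketch of how the content-type check quoted below could be loosened so that application/javascript and text/javascript are accepted as well. This is a standalone illustration, not a patch: JsMimeCheck and isJavaScript are hypothetical names that do not exist in Nutch, the list of accepted types is my assumption, and I have not tried it inside the plugin itself.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: a standalone version of the
// content-type test with a wider list of accepted JavaScript MIME types.
class JsMimeCheck {

    // Assumed list of accepted types; adjust to what your servers send.
    private static final String[] JS_TYPES = {
        "application/x-javascript",
        "application/javascript",
        "text/javascript"
    };

    // Mirrors the quoted check: a null or blank type is let through,
    // anything else must start with one of the accepted types.
    static boolean isJavaScript(String type) {
        if (type == null || type.trim().isEmpty()) {
            return true;
        }
        String normalized = type.trim().toLowerCase(Locale.ROOT);
        for (String jsType : JS_TYPES) {
            if (normalized.startsWith(jsType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The type reported by parsechecker for dtree.js.
        System.out.println(isJavaScript("application/javascript"));
        // A type that should still be rejected.
        System.out.println(isJavaScript("text/html"));
    }
}
```

In the plugin, the equivalent change would be to replace the single startsWith("application/x-javascript") test with a check like isJavaScript(type). Whether the parser then finds any useful outlinks is a separate question.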

Lewis

On Fri, May 18, 2012 at 11:09 AM, Lewis John Mcgibbney <[email protected]> wrote:
> I tried configuring my instance to fetch and parse your page with the
> following result
>
> lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local/bin$
> ./nutch parsechecker
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> fetching: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> parsing: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> contentType: application/javascript
> signature: 4bf7aa15c0e79cb2330bc80c417f0a55
> ---------
> Url
> ---------------
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> ---------
> ParseData
> ---------
> Version: 5
> Status: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
>
> So I tried a small experiment to see if I could hack a solution but
> unfortunately as far as I got was to find that beginning on line 152
> of the JSParserFilter class we see
>
>   public ParseResult getParse(Content c) {
>     String type = c.getContentType();
>     if (type != null && !type.trim().equals("") &&
>         !type.toLowerCase().startsWith("application/x-javascript"))
>       return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
>           "Content not JavaScript: '" + type +
>           "'").getEmptyParseResult(c.getUrl(), getConf());
>
> It appears from the ParserChecker that ParseStatus is returning the
> FAILED_INVALID_FORMAT message which we get. If you are going to focus
> on getting the plugin to actually parse your files, I would begin
> there, however I wouldn't expect miracles from the Parser if it is
> geared specifically for mimeType application/x-javascript
>
> hth
>
> Lewis
>
> On Fri, May 18, 2012 at 6:12 AM, forwardswing <[email protected]> wrote:
>> First of all,thank you very much for your reply.
>>
>> I have followed your suggestion and did the following modification:
>>
>> <mimeType name="application/javascript">
>>   <plugin id="parse-js" />
>> </mimeType>
>> <mimeType name="text/javascript">
>>   <plugin id="parse-js" />
>> </mimeType>
>>
>> <alias name="parse-js" extension-id="org.apache.nutch.parse.js.JSParser" />
>>
>> There is still an error:
>> dtree.js: failed(2,0): Can't retrieve Tika parser for mime-type
>> text/javascript
>> here is the js file to be parse,could you please have a try in your
>> environment ?
>>
>> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js dtree.js
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Can-t-retrieve-Tika-parser-for-mime-type-text-javascript-tp3983599p3984604.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
> --
> Lewis

--
Lewis

