I tried configuring my instance to fetch and parse your page, with the
following result:

lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local/bin$ ./nutch parsechecker http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
fetching: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
parsing: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
contentType: application/javascript
signature: 4bf7aa15c0e79cb2330bc80c417f0a55
---------
Url
---------------
http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
---------
ParseData
---------
Version: 5
Status: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'
Title:
Outlinks: 0
Content Metadata:
Parse Metadata:

So I tried a small experiment to see if I could hack a solution.
Unfortunately, the furthest I got was finding that, beginning on line 152
of the JSParseFilter class, we see:

public ParseResult getParse(Content c) {
    String type = c.getContentType();
    if (type != null && !type.trim().equals("")
        && !type.toLowerCase().startsWith("application/x-javascript"))
      return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
              "Content not JavaScript: '" + type + "'")
              .getEmptyParseResult(c.getUrl(), getConf());

The ParserChecker output shows that this check is what fails: the type
application/javascript does not start with application/x-javascript, so
getParse returns the FAILED_INVALID_FORMAT status we see above. If you are
going to focus on getting the plugin to actually parse your files, I would
begin there. However, I wouldn't expect miracles from the parser if it is
geared specifically towards the mimeType application/x-javascript.
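
To make "begin there" a bit more concrete, here is a rough sketch of what a
relaxed version of that check might look like. It is only an illustration:
JsMimeCheck and isAcceptedJavaScriptType are names I made up, not anything
in Nutch, and the list of accepted types is my own assumption. I have not
tried wiring this into the plugin.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: a relaxed version of the
// content-type guard quoted above.
final class JsMimeCheck {

    // Assumed list of MIME types to treat as JavaScript.
    private static final String[] JS_TYPES = {
        "application/x-javascript",
        "application/javascript",
        "text/javascript"
    };

    private JsMimeCheck() {}

    // Like the quoted guard, a null or blank type is let through;
    // anything else must start with one of the accepted prefixes.
    // Unlike the quoted guard, the type is trimmed before comparing.
    static boolean isAcceptedJavaScriptType(String type) {
        if (type == null || type.trim().isEmpty()) {
            return true;
        }
        String normalized = type.trim().toLowerCase(Locale.ROOT);
        for (String js : JS_TYPES) {
            if (normalized.startsWith(js)) {
                return true;
            }
        }
        return false;
    }
}
```

With something like this, the guard in getParse would bail out only when
isAcceptedJavaScriptType(type) returns false. Whether the parser then does
anything useful with your file is a separate question.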

hth

Lewis

On Fri, May 18, 2012 at 6:12 AM, forwardswing <[email protected]> wrote:
> First of all, thank you very much for your reply.
>
> I have followed your suggestion and made the following modification:
>
> <mimeType name="application/javascript">
>         <plugin id="parse-js" />
> </mimeType>
> <mimeType name="text/javascript">
>         <plugin id="parse-js" />
> </mimeType>
>
> <alias name="parse-js" extension-id="org.apache.nutch.parse.js.JSParser" />
>
> There is still an error:
> dtree.js: failed(2,0): Can't retrieve Tika parser for mime-type
> text/javascript
> Here is the JS file to be parsed. Could you please try it in your
> environment?
>
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-t-retrieve-Tika-parser-for-mime-type-text-javascript-tp3983599p3984604.html
> Sent from the Nutch - User mailing list archive at Nabble.com.



-- 
Lewis
