One final point here which I forgot.
The point of the parse-js plugin is to extract outlinks from JS pages.
The page you supplied contained only one outlink to a page which no
longer exists, so depending on what your purposes are you may not find
the parse-js plugin of much help.
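
Regarding the content-type check quoted further down: here is a minimal
sketch, in plain Java, of how such a check could be relaxed to also accept
application/javascript and text/javascript. The class name JsMimeCheck, the
method isJavaScript and the list of accepted types are all hypothetical and
are not part of the parse-js plugin; treat it as an illustration of the
idea, not as a tested patch against Nutch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class JsMimeCheck {

    // Hypothetical list of accepted types; an assumption, not taken from the Nutch source.
    private static final List<String> JS_TYPES = Arrays.asList(
            "application/x-javascript",
            "application/javascript",
            "text/javascript");

    /**
     * Returns true if the content type should be handed to a JavaScript parser.
     * Like the quoted check, a null or blank type is let through.
     */
    public static boolean isJavaScript(String type) {
        if (type == null || type.trim().isEmpty()) {
            return true;
        }
        String normalized = type.trim().toLowerCase(Locale.ROOT);
        for (String js : JS_TYPES) {
            // startsWith also tolerates parameters such as "; charset=utf-8"
            if (normalized.startsWith(js)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isJavaScript("application/javascript"));
        System.out.println(isJavaScript("text/html"));
    }
}
```

With something along these lines the type reported by parsechecker would
no longer be rejected up front, although that says nothing about how well
the parser then copes with the file itself.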

Lewis

On Fri, May 18, 2012 at 11:09 AM, Lewis John Mcgibbney
<[email protected]> wrote:
> I tried configuring my instance to fetch and parse your page with the
> following result
>
> lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local/bin$
> ./nutch parsechecker
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> fetching: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> parsing: http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> contentType: application/javascript
> signature: 4bf7aa15c0e79cb2330bc80c417f0a55
> ---------
> Url
> ---------------
> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js
> ---------
> ParseData
> ---------
> Version: 5
> Status: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
>
> So I tried a small experiment to see if I could hack a solution, but
> unfortunately I only got as far as finding that, beginning on line 152
> of the JSParserFilter class, we see
>
> public ParseResult getParse(Content c) {
>    String type = c.getContentType();
>    if (type != null && !type.trim().equals("")
>        && !type.toLowerCase().startsWith("application/x-javascript"))
>      return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
>              "Content not JavaScript: '" + type + "'")
>              .getEmptyParseResult(c.getUrl(), getConf());
>
> This appears to be where the ParserChecker output above comes from: the
> type application/javascript does not start with application/x-javascript,
> so the FAILED_INVALID_FORMAT status is returned. If you are going to focus
> on getting the plugin to actually parse your files, I would begin
> there; however, I wouldn't expect miracles from the parser if it is
> geared specifically for mimeType application/x-javascript.
>
> hth
>
> Lewis
>
> On Fri, May 18, 2012 at 6:12 AM, forwardswing <[email protected]> wrote:
>> First of all, thank you very much for your reply.
>>
>> I have followed your suggestion and made the following modification:
>>
>> <mimeType name="application/javascript">
>>         <plugin id="parse-js" />
>> </mimeType>
>> <mimeType name="text/javascript">
>>         <plugin id="parse-js" />
>> </mimeType>
>>
>> <alias name="parse-js" extension-id="org.apache.nutch.parse.js.JSParser" />
>>
>> There is still an error:
>> dtree.js: failed(2,0): Can't retrieve Tika parser for mime-type
>> text/javascript
>> Here is the JS file to be parsed; could you please try it in your
>> environment?
>>
>> http://lucene.472066.n3.nabble.com/file/n3984604/dtree.js dtree.js
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Can-t-retrieve-Tika-parser-for-mime-type-text-javascript-tp3983599p3984604.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>
> --
> Lewis



-- 
Lewis
