Hey there,
  Hope all has been going well for you. I noticed a small issue with the 
parse-oo plugin. It parses the documents correctly; however, when an 
OpenOffice document appears in the search results and you click "cached", 
the page fails with a NullPointerException. I looked into it, and the line 
in cached.jsp that throws the NPE is below:

String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
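
For reference, here is a minimal sketch of a null-safe version of that lookup. It is not the actual cached.jsp code: a plain java.util.Map stands in for the Nutch metadata object, and the class name, the "Content-Type" key string, and the fallback value are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the metadata object used in cached.jsp.
final class CachedContentType {
    // Assumed key name and fallback, chosen only for this sketch.
    static final String CONTENT_TYPE = "Content-Type";
    static final String FALLBACK = "application/octet-stream";

    // Returns the stored content type, or FALLBACK when the metadata
    // object itself is missing or holds no usable content-type entry.
    static String contentTypeOf(Map<String, String> metaData) {
        if (metaData == null) {
            return FALLBACK;
        }
        String value = metaData.get(CONTENT_TYPE);
        return (value == null || value.isEmpty()) ? FALLBACK : value;
    }

    public static void main(String[] args) {
        Map<String, String> metaData = new HashMap<>();
        metaData.put(CONTENT_TYPE, "text/html");
        System.out.println(contentTypeOf(metaData)); // stored value is used
        System.out.println(contentTypeOf(null));     // no metadata: fallback is used
    }
}
```

A guard like this would only hide the symptom in the JSP; the plugin would still need to store the content type.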

So apparently the parse-oo plugin does not store the CONTENT_TYPE of the 
document. I looked at the plugin source and, around line 100, changed:

    Outlink[] links = (Outlink[])outlinks.toArray(new Outlink[outlinks.size()]);
    ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS, title, links, metadata);
    return new ParseImpl(text, parseData);

to:

    Outlink[] links = (Outlink[])outlinks.toArray(new Outlink[outlinks.size()]);
    ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS, title, links, content.getMetadata(), metadata);
    parseData.setConf(this.conf);
    return new ParseImpl(text, parseData);

This fixes the exception in cached.jsp, but now every document's type is 
displayed as either [octet-stream] or [oleobject].

So it seems the MIME types are not being interpreted correctly. Do you 
know how to fix both the cached.jsp issue and the MIME-type issue?
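
To make the MIME-type half of the question concrete, here is a rough sketch of one possible workaround: falling back to the file extension when the reported type is a generic one. This is not how Nutch or the parse-oo plugin resolves types; the class, the method name, and the idea of keying on the URL extension are assumptions for illustration. The MIME strings themselves are the registered OpenDocument and older OpenOffice.org types.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: guesses an OpenOffice MIME type from a URL.
final class OpenOfficeMimeGuess {
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        // OpenDocument formats
        BY_EXTENSION.put("odt", "application/vnd.oasis.opendocument.text");
        BY_EXTENSION.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
        BY_EXTENSION.put("odp", "application/vnd.oasis.opendocument.presentation");
        // Older OpenOffice.org 1.x formats
        BY_EXTENSION.put("sxw", "application/vnd.sun.xml.writer");
        BY_EXTENSION.put("sxc", "application/vnd.sun.xml.calc");
        BY_EXTENSION.put("sxi", "application/vnd.sun.xml.impress");
    }

    // Returns a type based on the extension when it is a known OpenOffice one,
    // otherwise returns reportedType unchanged. Query strings are not handled.
    static String guess(String url, String reportedType) {
        int dot = url.lastIndexOf('.');
        if (dot >= 0 && dot < url.length() - 1) {
            String ext = url.substring(dot + 1).toLowerCase(Locale.ROOT);
            String known = BY_EXTENSION.get(ext);
            if (known != null) {
                return known;
            }
        }
        return reportedType;
    }

    public static void main(String[] args) {
        System.out.println(guess("http://example.com/report.odt", "application/octet-stream"));
        System.out.println(guess("http://example.com/data.bin", "application/octet-stream"));
    }
}
```

An extension lookup like this is only a fallback; it would not explain why the detected type comes back as [octet-stream] or [oleobject] in the first place.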
 Thanks,
  Matt

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
