Hey there,
Hope all has been going well for you. I noticed a small issue with the
parse-oo plugin. It parses the documents correctly; however, when you
find an OpenOffice document in the search results and click "cached", it
fails with a NullPointerException. I looked into it, and the line in
cached.jsp that throws the NPE is:
String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
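A defensive version of that lookup could fall back to a default instead of throwing when either the metadata object or its content-type entry is missing. The following is a minimal sketch only, not the actual cached.jsp code. A plain java.util.Map stands in for Nutch's Metadata class, and the key string "Content-Type", the class name CachedContentType and the helper contentTypeOrDefault are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CachedContentType {
    // Assumed key; stands in for Metadata.CONTENT_TYPE in this sketch.
    static final String CONTENT_TYPE = "Content-Type";

    // Hypothetical helper: returns the stored content type, or the fallback
    // when the metadata object is null or has no content-type entry.
    static String contentTypeOrDefault(Map<String, String> metaData, String fallback) {
        if (metaData == null) {
            return fallback;
        }
        String contentType = metaData.get(CONTENT_TYPE);
        return contentType != null ? contentType : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> withType = new HashMap<>();
        withType.put(CONTENT_TYPE, "text/html");
        // prints text/html
        System.out.println(contentTypeOrDefault(withType, "application/octet-stream"));
        // prints application/octet-stream (entry missing)
        System.out.println(contentTypeOrDefault(new HashMap<>(), "application/octet-stream"));
        // prints application/octet-stream (no metadata at all)
        System.out.println(contentTypeOrDefault(null, "application/octet-stream"));
    }
}
```

This guard only hides the symptom in the JSP. The parser still needs to store the content type for the cached page to show something meaningful.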
So apparently the parse-oo plugin does not store the CONTENT_TYPE of the
document. I looked at the plugin's parser around line 100 and changed:
Outlink[] links = (Outlink[]) outlinks.toArray(new Outlink[outlinks.size()]);
ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS, title, links, metadata);
return new ParseImpl(text, parseData);
to:
Outlink[] links = (Outlink[]) outlinks.toArray(new Outlink[outlinks.size()]);
ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS, title, links,
    content.getMetadata(), metadata);
parseData.setConf(this.conf);
return new ParseImpl(text, parseData);
This stops cached.jsp from throwing the exception, but now every
document type is displayed as either [octet-stream] or [oleobject].
So it seems the MIME types are not being interpreted correctly. Do you
know how to fix both the cached.jsp issue and the MIME-type issue at the
same time?
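One possible workaround, sketched here under stated assumptions and not as the way Nutch's own MIME detection works, is to treat a generic reported type as "unknown" and guess from the URL's file extension instead. The extension table uses the registered OpenDocument and legacy OpenOffice.org MIME types. Treating application/x-oleobject as generic is an assumption based on the [oleobject] label, and the class name OpenOfficeMimeGuess and the method resolve are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class OpenOfficeMimeGuess {
    // Reported types treated as carrying no real information
    // (the x-oleobject entry is an assumption).
    static final Set<String> GENERIC = new HashSet<>(Arrays.asList(
            "application/octet-stream", "application/x-oleobject"));

    // Extension -> MIME type for OpenDocument and legacy OpenOffice.org files.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("odt", "application/vnd.oasis.opendocument.text");
        BY_EXTENSION.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
        BY_EXTENSION.put("odp", "application/vnd.oasis.opendocument.presentation");
        BY_EXTENSION.put("sxw", "application/vnd.sun.xml.writer");
        BY_EXTENSION.put("sxc", "application/vnd.sun.xml.calc");
        BY_EXTENSION.put("sxi", "application/vnd.sun.xml.impress");
    }

    // Keeps a specific reported type; replaces a missing or generic one
    // with a guess from the URL's extension when the extension is known.
    static String resolve(String reportedType, String url) {
        String base = null;
        if (reportedType != null) {
            // Compare on the bare type, ignoring parameters such as "; charset=...".
            int semi = reportedType.indexOf(';');
            base = (semi >= 0 ? reportedType.substring(0, semi) : reportedType)
                    .trim().toLowerCase(Locale.ROOT);
        }
        if (base != null && !GENERIC.contains(base)) {
            return reportedType;
        }
        // Strip any query string or fragment before looking for an extension.
        int cut = url.length();
        int q = url.indexOf('?');
        if (q >= 0) cut = q;
        int h = url.indexOf('#');
        if (h >= 0 && h < cut) cut = h;
        String path = url.substring(0, cut);
        int dot = path.lastIndexOf('.');
        if (dot > path.lastIndexOf('/') && dot < path.length() - 1) {
            String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            String guess = BY_EXTENSION.get(ext);
            if (guess != null) {
                return guess;
            }
        }
        // Unknown extension: keep what was reported, or a generic default.
        return reportedType != null ? reportedType : "application/octet-stream";
    }

    public static void main(String[] args) {
        // prints application/vnd.oasis.opendocument.text
        System.out.println(resolve("application/octet-stream", "http://example.com/docs/report.odt"));
        // prints text/html
        System.out.println(resolve("text/html", "http://example.com/index.html"));
        // prints application/octet-stream
        System.out.println(resolve(null, "http://example.com/file.unknown"));
    }
}
```

Extension guessing is only a fallback. A server that sends a correct Content-Type, or content-based (magic) detection, would be more reliable.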
Thanks,
Matt
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general