[ http://issues.apache.org/jira/browse/NUTCH-125?page=comments#action_12366063 ]
ilango gurusamy commented on NUTCH-125:
---------------------------------------

Hi Andrzej,

I want to try out this plugin. In which areas do you think this parser needs improvement or further additions, if any? I would be glad to help.

Thanks,
ilango

> OpenOffice Parser plugin
> ------------------------
>
>          Key: NUTCH-125
>          URL: http://issues.apache.org/jira/browse/NUTCH-125
>      Project: Nutch
>         Type: New Feature
>     Reporter: Andrzej Bialecki
>     Assignee: Andrzej Bialecki
>  Attachments: parse-oo.zip
>
> A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin
> does not use the UNO bridge in OpenOffice, but rather uses standard
> ZipInputStream, and parses content.xml and meta.xml inside these files to
> extract metadata and plain text.
> This plugin uses dom4j, because of easy XPath node selection, but this
> dependency could be removed.

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
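
For anyone who wants to try the approach from the issue description without the attachment, here is a minimal sketch of the technique it describes: open the file with ZipInputStream, pick out content.xml and meta.xml, and pull plain text and a title from them. This is not the code in parse-oo.zip. The class and method names (OdtTextSketch, readEntries, extractText, extractTitle) are hypothetical, and the sketch assumes the JDK's built-in DOM parser instead of dom4j, in line with the remark that the dom4j dependency could be removed. It matches paragraph elements by local name "p" in any namespace, which is a simplification meant to cover both SXW and ODT.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the plugin from parse-oo.zip.
public class OdtTextSketch {

    /** Reads content.xml and meta.xml from the ZIP container into memory. */
    static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new HashMap<>();
        byte[] buf = new byte[4096];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.equals("content.xml") && !name.equals("meta.xml")) {
                    continue; // skip styles, images, manifest, etc.
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                entries.put(name, out.toByteArray());
            }
        }
        return entries;
    }

    /** Parses XML with the JDK DOM parser (assumption: used here instead of dom4j). */
    static Document parseXml(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve any external DTD reference to an empty document so that
        // parsing never tries to load a DTD from disk or the network.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml));
    }

    /** Joins the text of every paragraph element (local name "p") with newlines. */
    static String extractText(byte[] contentXml) throws Exception {
        NodeList paragraphs = parseXml(contentXml).getElementsByTagNameNS("*", "p");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            lines.add(paragraphs.item(i).getTextContent());
        }
        return String.join("\n", lines);
    }

    /** Returns the Dublin Core title from meta.xml, or null if there is none. */
    static String extractTitle(byte[] metaXml) throws Exception {
        NodeList titles = parseXml(metaXml)
            .getElementsByTagNameNS("http://purl.org/dc/elements/1.1/", "title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
    }
}
```

A caller would pass a FileInputStream for the .odt or .sxw file to readEntries, then hand the two byte arrays to extractText and extractTitle. Headings, tables, and other metadata fields are left out to keep the sketch short.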
