OpenOffice Parser plugin
------------------------
Key: NUTCH-125
URL: http://issues.apache.org/jira/browse/NUTCH-125
Project: Nutch
Type: New Feature
Reporter: Andrzej Bialecki
Assigned to: Andrzej Bialecki
A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin does
not use the UNO bridge in OpenOffice; instead it reads each file with the standard
java.util.zip.ZipInputStream and parses the content.xml and meta.xml entries inside
it to extract plain text and metadata.
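The ZIP-reading step can be sketched as below. This is a minimal illustration
assuming only the JDK (Java 9 or later for Set.of and readAllBytes); the class
OdfZipReader and its method readEntries are hypothetical names, not the
plugin's actual code. It streams through the archive once and keeps the raw
bytes of the two entries the parser needs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper illustrating the approach; not the NUTCH-125 plugin source.
class OdfZipReader {
    // The two package members the parser is described as reading.
    private static final Set<String> WANTED = Set.of("content.xml", "meta.xml");

    /** Reads the archive sequentially and returns the bytes of content.xml and meta.xml, if present. */
    static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> result = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && WANTED.contains(entry.getName())) {
                    // ZipInputStream signals end-of-entry with -1, so this reads only the current entry.
                    result.put(entry.getName(), zip.readAllBytes());
                }
                zip.closeEntry();
            }
        }
        return result;
    }
}
```

Reading sequentially with ZipInputStream (rather than java.util.zip.ZipFile)
means the document can come straight from a fetched byte stream without first
being written to disk.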
This plugin uses dom4j because it makes XPath node selection easy, but this
dependency could be removed.
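As an illustration of how the dom4j dependency might be dropped, the sketch
below does similar XPath selection with only the JDK's javax.xml.xpath API. It
is an assumption-laden example, not the plugin's code: OdfXPathExtractor,
paragraphs and title are hypothetical names, and it maps only the OpenDocument
1.0 namespace URIs used by ODT files. SXW files use different namespace URIs,
which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical JDK-only extractor; assumes ODT (OpenDocument 1.0) namespaces.
class OdfXPathExtractor {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Binds the prefixes used in the XPath expressions below to their namespace URIs.
    private static final NamespaceContext NS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            switch (prefix) {
                case "text": return TEXT_NS;
                case "dc": return DC_NS;
                default: return XMLConstants.NULL_NS_URI;
            }
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
    };

    private static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // prefixed XPath steps only match namespace-aware DOMs
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    private static XPath newXPath() {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(NS);
        return xp;
    }

    /** Returns the text content of every heading and paragraph in content.xml. */
    static List<String> paragraphs(byte[] contentXml) throws Exception {
        NodeList nodes = (NodeList) newXPath().evaluate(
                "//text:h | //text:p", parse(contentXml), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent()); // includes text of nested spans
        }
        return out;
    }

    /** Returns dc:title from meta.xml, or an empty string if it is absent. */
    static String title(byte[] metaXml) throws Exception {
        return newXPath().evaluate("//dc:title", parse(metaXml));
    }
}
```

The same pattern extends to other Dublin Core fields in meta.xml (for example
dc:creator) by adding further XPath expressions.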
--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers