OpenOffice Parser plugin
------------------------

         Key: NUTCH-125
         URL: http://issues.apache.org/jira/browse/NUTCH-125
     Project: Nutch
        Type: New Feature
    Reporter: Andrzej Bialecki 
 Assigned to: Andrzej Bialecki  


A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin does
not use the OpenOffice UNO bridge; instead it reads each file with the standard
ZipInputStream and parses the content.xml and meta.xml entries inside it to
extract metadata and plain text.
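A minimal sketch of the ZIP step described above, assuming only the JDK's
java.util.zip classes. The class name OdfEntryReader and its method are
hypothetical illustrations, not the plugin's actual code. It walks the package
entries and keeps the raw bytes of content.xml and meta.xml for later XML
parsing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: extracts content.xml and meta.xml from an SXW/ODT package. */
public class OdfEntryReader {

  /** Returns the raw bytes of content.xml and meta.xml, keyed by entry name. */
  public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
    Map<String, byte[]> entries = new HashMap<>();
    try (ZipInputStream zip = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        String name = entry.getName();
        if (name.equals("content.xml") || name.equals("meta.xml")) {
          // read() returns -1 at the end of the *current* entry, not the whole file
          ByteArrayOutputStream buf = new ByteArrayOutputStream();
          byte[] chunk = new byte[8192];
          int n;
          while ((n = zip.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
          }
          entries.put(name, buf.toByteArray());
        }
        zip.closeEntry(); // skip any unread data and move to the next entry
      }
    }
    return entries;
  }
}
```

Other entries (mimetype, styles.xml, embedded pictures) are skipped, so the
whole package never has to be held in memory at once.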

This plugin uses dom4j because it makes XPath node selection easy, but this
dependency could be removed.
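One way the dom4j dependency might be dropped is to use the JDK's built-in
DOM and javax.xml.xpath APIs instead. The sketch below is an assumption about
how that could look, not the plugin's code; OdfXPathText is a hypothetical
name. It matches elements by local-name() so the same expressions work for both
the SXW and ODT namespaces, and it stubs out external DTDs because SXW XML
files may declare one (office.dtd) that is not available at parse time.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical dom4j-free text and metadata extraction using JAXP XPath. */
public class OdfXPathText {

  private static Document parse(byte[] xml) throws Exception {
    DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    f.setNamespaceAware(true);
    DocumentBuilder builder = f.newDocumentBuilder();
    // Resolve any external DTD to an empty document instead of fetching it
    builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
    return builder.parse(new ByteArrayInputStream(xml));
  }

  /** Joins the text of every paragraph (p) and heading (h) element with newlines. */
  public static String plainText(byte[] contentXml) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(
        "//*[local-name()='p' or local-name()='h']",
        parse(contentXml), XPathConstants.NODESET);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < nodes.getLength(); i++) {
      if (i > 0) sb.append('\n');
      // getTextContent() concatenates nested spans; text:s / text:tab spacing is ignored
      sb.append(nodes.item(i).getTextContent());
    }
    return sb.toString();
  }

  /** Returns the Dublin Core title from meta.xml, or "" if there is none. */
  public static String title(byte[] metaXml) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    return xpath.evaluate("string(//*[local-name()='title'])", parse(metaXml));
  }
}
```

Matching on local-name() trades a little precision (any element named p or h
is matched) for not having to register a NamespaceContext per file format.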

-- 
This message is automatically generated by JIRA.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
