[ 
http://issues.apache.org/jira/browse/NUTCH-125?page=comments#action_12366063 ] 

ilango gurusamy commented on NUTCH-125:
---------------------------------------

Hi Andrzej,
I want to try out this plugin. In which areas do you think this parser needs 
improvement or further additions, if any?
I would be glad to help.

thanks
ilango

> OpenOffice Parser plugin
> ------------------------
>
>          Key: NUTCH-125
>          URL: http://issues.apache.org/jira/browse/NUTCH-125
>      Project: Nutch
>         Type: New Feature
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: parse-oo.zip
>
> A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin 
> does not use the UNO bridge in OpenOffice, but rather uses standard 
> ZipInputStream, and parses content.xml and meta.xml inside these files to 
> extract metadata and plain text.
> This plugin uses dom4j, because of easy XPath node selection, but this 
> dependency could be removed.
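
For readers trying the idea out, here is a rough sketch of the approach the
description outlines: read the document as a zip stream, find content.xml,
and pull the paragraph text out of it. It is not the code in parse-oo.zip.
The class name OdtTextSketch and the method extractText are made up for
illustration, and it uses the JDK's built-in DOM parser instead of dom4j.
It also assumes that paragraphs and headings are elements with local names
"p" and "h", which holds for typical ODT and SXW files but is not checked
against either schema here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

/** Hypothetical sketch: plain-text extraction from an ODT/SXW stream. */
public class OdtTextSketch {

    /** Returns the text of content.xml, or null if the entry is absent. */
    public static String extractText(InputStream in) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue; // skip meta.xml, styles.xml, images, ...
                }
                // Copy the entry into memory so the XML parser cannot
                // close the underlying zip stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return textOf(buf.toByteArray());
            }
        }
        return null;
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getLocalName()
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Older SXW files may declare a DOCTYPE; resolve any external
        // entity to an empty document instead of fetching it.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        StringBuilder out = new StringBuilder();
        collect(doc, out);
        return out.toString();
    }

    /** Depth-first walk: append text nodes, newline after each p or h. */
    private static void collect(Node node, StringBuilder out) {
        if (node instanceof Text) {
            out.append(node.getNodeValue());
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
        String name = node.getLocalName();
        if ("p".equals(name) || "h".equals(name)) {
            out.append('\n');
        }
    }
}
```

Calling OdtTextSketch.extractText(new FileInputStream("some.odt")) would
return the body text with one line per paragraph or heading. The same loop
could be extended to read meta.xml for the metadata fields.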

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
