Dear Nutch developers,
 
Over the last few weeks I have been looking into Nutch and found that MS PowerPoint slides are currently not supported. I am not sure, but I have also found no hint on the mailing lists that anyone has already implemented a parser plugin for this document type.
 
I have created such a plugin based on POI, and it works fine for text in Latin characters. There are currently some problems with slides containing other scripts, such as Chinese or Cyrillic, but I intend to solve these in the next few days.
 
After running some more tests and improving the Javadocs, I would be glad to contribute the sources to the Nutch project on behalf of Sybit GmbH.
 
Kind regards,
 
Stephan Strittmatter
Senior Developer
-----
Sybit GmbH, Waldstraße 28, D-78315 Radolfzell
Fon: +49 (7732) 9508-00           Fax: -29
mailto:[EMAIL PROTECTED]
http://www.sybit.de
Sybit - a bit better
