Dear Nutch developers,
Over the last few weeks I have been looking into Nutch
and found that MS PowerPoint slides are currently not supported. I am not
sure, but I also have not found any hint on the mailing lists that someone
has already implemented a parser plugin for this document type.
I created such a plugin based on POI, which
works fine for text in Latin characters. There are currently some problems with
slides containing other scripts such as Chinese or Cyrillic, but I intend to solve
these in the next few days as well.
After doing some more tests and improving the Javadocs,
I would be glad to contribute the sources to the Nutch project on behalf of Sybit GmbH.
Kind regards,
Stephan Strittmatter
Senior Developer
-----
Sybit GmbH, Waldstraße 28, D-78315 Radolfzell
Phone: +49 (7732) 9508-00 Fax: -29
mailto:[EMAIL PROTECTED]
http://www.sybit.de
Sybit - a bit better
