Hi,

I have been experimenting with the 3.0-alpha1 version of Jakarta POI
(POI is used by parse-msword and parse-mspowerpoint).
Because this version includes the hwpf package, it can parse msword
documents too (the version currently in the lib-jakarta-poi plugin doesn't
contain this package).
The benefit is that we can remove the poi-2.1 jars bundled with parse-msword
and simply add a dependency on the lib-jakarta-poi plugin (as for
parse-mspowerpoint), so that only one version of the POI libs is bundled in
Nutch.
I ran tests on a large number of zipped doc files from the 3GPP site (a
handy way to test two plugins at the same time), and everything works fine.
I have not tested powerpoint files much, but the unit tests pass.

If there are no objections, I will commit the changes by the end of the week.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
