[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12320763 ]
Stephan Strittmatter commented on NUTCH-21:
---
I will verify the unit tests by next week.
parser plugin for MS PowerPoint slides
[ http://issues.apache.org/jira/browse/NUTCH-21?page=all ]
Stephan Strittmatter updated NUTCH-21:
--
Attachment: parse-mspowerpoint.zip
Updated plugin sources to match the changed Nutch interface.
parser plugin for MS PowerPoint slides
[ http://issues.apache.org/jira/browse/NUTCH-20?page=all ]
Stephan Strittmatter updated NUTCH-20:
--
Description:
Some parsers return no Outlinks, e.g. the Word parser.
This class is able to extract (absolute) hyperlinks from a plain String
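
A minimal sketch of how absolute hyperlinks could be pulled out of a plain String is given here for illustration. It is not the class attached to this issue: the class name PlainTextLinkExtractor, the method extractLinks, and the regular expression are all assumptions, and the sketch returns plain Strings rather than Nutch Outlink objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: finds absolute URLs in plain text.
public class PlainTextLinkExtractor {

    // Assumption: only absolute http, https and ftp URLs count as hyperlinks.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\b(?:https?|ftp)://[^\\s<>\"']+", Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        if (text == null) {
            return links;
        }
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            // Drop sentence punctuation that happens to follow the URL.
            String url = m.group().replaceAll("[.,;:!?)\\]]+$", "");
            // Keep each link once, in order of first appearance.
            if (!links.contains(url)) {
                links.add(url);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See http://lucene.apache.org/nutch/ and https://example.org/a.ppt.";
        System.out.println(extractLinks(text));
    }
}
```

A parser that currently returns no Outlinks could pass its extracted text through such a helper and wrap each result in whatever outlink type the parser interface expects. A regular expression is only a heuristic here; URLs that contain quotes or end in punctuation will be cut short.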