[ http://issues.apache.org/jira/browse/NUTCH-20?page=history ]
Stephan Strittmatter updated NUTCH-20:
--------------------------------------
Attachment: OutlinkExtractor.java
TestOutlink.java
I am not sure whether the patch is in the correct format, so I am also
attaching the Java file and the associated JUnit test file.
> Extract urls from plain texts
> ------------------------------
>
> Key: NUTCH-20
> URL: http://issues.apache.org/jira/browse/NUTCH-20
> Project: Nutch
> Type: Improvement
> Components: fetcher
> Reporter: Stefan Groschupf
> Priority: Trivial
> Attachments: OutlinkExtractor.java, TestOutlink.java, patch.txt
>
> transferred from:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1109328&group_id=59548&atid=491356
> submitted by:
> Stephan Strittmatter
> Some parsers return no Outlinks, e.g. the
> Word-Parser.
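
For readers without the attachments, the idea in the issue (pulling URLs out of plain text when a parser returns no Outlinks) might look roughly like the sketch below. This is not the attached OutlinkExtractor.java; the class name PlainTextUrlSketch, the method extractUrls, and the regular expression are all assumptions made for illustration, and the pattern is deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code attached to NUTCH-20.
class PlainTextUrlSketch {

    // Assumed pattern: an http, https, or ftp scheme followed by any run of
    // characters that are not whitespace, angle brackets, or double quotes.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(?:https?|ftp)://[^\\s<>\"]+", Pattern.CASE_INSENSITIVE);

    // Returns every URL-like substring found in the text, in order of appearance.
    static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls;
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            // Drop sentence punctuation that the greedy match picked up at the end.
            String url = matcher.group().replaceAll("[.,;:!?)]+$", "");
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://issues.apache.org/jira/browse/NUTCH-20, or the list at "
                + "https://lists.sourceforge.net/lists/listinfo/nutch-developers.";
        for (String url : extractUrls(text)) {
            System.out.println(url);
        }
    }
}
```

A parser that produces only text (such as the Word parser mentioned above) could pass its extracted text through a helper like this to obtain link candidates. A real implementation would need stricter URL validation than this pattern provides.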
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
http://www.atlassian.com/software/jira
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers