[Nutch-dev] collecting outgoing links from a given page

Jungshik Shin Wed, 05 May 2004 22:53:11 -0700

Hi,

I'm trying to write a 'local' crawler over a small snapshot of the
whole web: about 10 million pages gathered by Nutch and stored in
the Nutch WebDB. I haven't figured out how to extract the list of
outgoing links from a given HTML page. I can't say I have looked
very hard, but an hour-long search of the API documentation
(Javadoc) didn't lead me anywhere, so I'm resorting to this mailing
list for help. I'd appreciate any pointers on extracting outgoing
links (URLs) from a given HTML page using the Nutch APIs.


Thanks in advance for your help,

Jungshik


