Hi, I'm trying to write a 'local' crawler over a small snapshot of the web: about 10 million pages gathered by Nutch and stored in the Nutch WebDB. I haven't managed to figure out how to extract the list of outgoing links from a given HTML page. I can't claim to have looked very hard, but about an hour of searching the API documentation (Javadoc) didn't lead me anywhere, so I'm turning to this mailing list for help. I'd appreciate any pointers on extracting outgoing links (URLs) from a given HTML page using the Nutch APIs.
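To make the question concrete, here is a minimal sketch of the behaviour I'm after, written with plain JDK classes only (java.util.regex and java.net.URI) and no Nutch APIs, since those are exactly what I haven't found. The class name LinkSketch and the method extractLinks are hypothetical names of my own, and the regular expression is only a crude stand-in for a real HTML parser: it handles quoted href attributes on <a> tags and nothing else.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: illustrates the desired input/output only.
final class LinkSketch {
    // Matches a quoted href attribute inside an <a ...> tag.
    // Group 1 is the quote character, group 2 is the raw attribute value.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the outgoing links of `html`, resolved against `baseUrl`,
    // in document order. Duplicates are kept.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(2).trim();
            try {
                // Relative references are resolved against the page URL;
                // absolute URLs are returned as given.
                links.add(base.resolve(raw).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not syntactically valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a\">A</a> <A HREF='b.html'>B</a></p>";
        for (String link : extractLinks(html, "http://example.com/dir/page.html")) {
            System.out.println(link);
        }
    }
}
```

What I'm hoping for is the Nutch equivalent of extractLinks, ideally one that works directly on pages already stored in the WebDB.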
Thanks in advance for your help,
Jungshik

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
