Thanks, Suhail.
I don't think the email below was very clear, which is why I sent the follow-up email. (I sent it at 9am, but I didn't get a copy until almost 4pm...)
I don't want the hyperlinks that point to a given URL; I want the URL of the page those hyperlinks are on. Doug said this is not possible yet...
But I will play with WebDBReader.getLinks and see what I get.
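To illustrate the distinction: if the outlinks found on each page are kept together with that page's URL, inverting them gives, for any target URL, the pages that link to it along with their anchor text. The following is a minimal in-memory sketch in plain Java. It is not Nutch's WebDB or `WebDBReader` API, and the `InlinkIndex`, `Outlink` and `Inlink` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Nutch's API: an in-memory inlink index that
// remembers which page each anchor was found on.
public class InlinkIndex {

    // An anchor as found on a page: where it points and its text.
    public record Outlink(String targetUrl, String anchorText) {}

    // An inlink: the URL of the page the anchor is on, plus its text.
    public record Inlink(String sourceUrl, String anchorText) {}

    // Inverts "page URL -> outlinks on that page" into
    // "target URL -> (source page URL, anchor text)" entries.
    public static Map<String, List<Inlink>> invert(Map<String, List<Outlink>> outlinksByPage) {
        Map<String, List<Inlink>> inlinksByTarget = new LinkedHashMap<>();
        for (Map.Entry<String, List<Outlink>> page : outlinksByPage.entrySet()) {
            for (Outlink link : page.getValue()) {
                inlinksByTarget
                    .computeIfAbsent(link.targetUrl(), k -> new ArrayList<>())
                    .add(new Inlink(page.getKey(), link.anchorText()));
            }
        }
        return inlinksByTarget;
    }

    public static void main(String[] args) {
        Map<String, List<Outlink>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/", List.of(
            new Outlink("http://c.example/", "see C")));
        outlinks.put("http://b.example/", List.of(
            new Outlink("http://c.example/", "C's home page"),
            new Outlink("http://a.example/", "back to A")));

        // Print every page that links to C, with the anchor text it uses.
        for (Inlink in : invert(outlinks).get("http://c.example/")) {
            System.out.println(in.sourceUrl() + " -> \"" + in.anchorText() + "\"");
        }
    }
}
```

The key design point is that the source page URL is recorded at the moment the outlinks are collected; once only the target and anchor text are stored, the source page cannot be recovered from the link alone.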
Again, thanks.
-lucas
P.S. I am responding to this email now because I just got it at 4:10pm...
On May 17, 2005, at 9:46 AM, Suhail Ahmed wrote:
Hi,
Take a look at WebDBReader.java. It shows you how to do what you want. You should also look at HtmlParser.java if you want to get hold of the outlinks from a page while Nutch is parsing the document.
Suhail
On May 17, 2005, at 4:46 AM, Lucas Rockwell wrote:
Hi all,
I am fairly new to Nutch (but I have been wading through the code, docs and mailing lists), and I am wondering whether there is a way to get the URL of an anchor as well as its text. I have a feeling there is, but I have not pulled things apart enough to know for sure.
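Outside of Nutch, getting each anchor's URL and text from a page can be sketched with the JDK alone. This is a minimal sketch, assuming a regular expression is acceptable for illustration: it only handles quoted `href` values and strips tags inside the anchor body crudely, so a real HTML parser would be more robust. It is not Nutch's HtmlParser, and `AnchorExtractor` and `Anchor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Nutch code: regex-based anchor extraction.
public class AnchorExtractor {

    // The href attribute and the visible text of one <a> element.
    public record Anchor(String href, String text) {}

    // Matches <a ... href="..." ...>body</a>, case-insensitively and
    // across line breaks; group 1 is the href, group 2 the anchor body.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<Anchor> extract(String html) {
        List<Anchor> anchors = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Drop any tags inside the anchor body and collapse whitespace.
            String text = m.group(2)
                .replaceAll("<[^>]*>", "")
                .replaceAll("\\s+", " ")
                .trim();
            anchors.add(new Anchor(m.group(1), text));
        }
        return anchors;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://x.example/\">Hello <b>world</b></a> and "
            + "<A HREF='b.html' class=\"nav\">Page B</A></p>";
        for (Anchor a : extract(html)) {
            System.out.println(a.href() + " | " + a.text());
        }
    }
}
```

Relative hrefs such as `b.html` come back unresolved; resolving them needs the URL of the page being parsed, for example via `java.net.URI.resolve`.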
Any help would be much appreciated.
Thanks.
-lucas
P.S. Nutch is a first-rate piece of software. Thanks to all who have labored over this amazing tool!
