I'm pretty new to Nutch and I'm trying to modify the code so that it stores the words before and after a hyperlink, as well as the anchor text.
I've been looking through the Nutch code for a couple of days and I'm still a little unclear about the layout...
Nutch parses incoming web pages in HTMLParser.java, right? I can't seem to find the code in there for URL processing, though. Where exactly does it parse the anchor text and write it to the database?
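To make the goal concrete, here is a rough standalone sketch of the data I'd like to capture for each link. It does not use any Nutch classes: LinkContextSketch, LinkContext, extract and the window parameter are all hypothetical names I made up for illustration, and the regex only handles simple, well-formed anchors with double-quoted href values. I assume the real change would hook into whatever Nutch already does when it collects anchor text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Nutch code: collects the anchor text of each link
// plus up to `window` words of plain text on either side of it.
public class LinkContextSketch {

    /** Hypothetical holder for one link and its surrounding words. */
    static final class LinkContext {
        final String url, anchor, before, after;

        LinkContext(String url, String anchor, String before, String after) {
            this.url = url;
            this.anchor = anchor;
            this.before = before;
            this.after = after;
        }
    }

    // Assumption: href is double-quoted and anchors are not nested.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Strips tags and collapses whitespace to single spaces. */
    static String plain(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Returns the last n space-separated words of text. */
    static String lastWords(String text, int n) {
        if (text.isEmpty()) return "";
        String[] w = text.split(" ");
        return String.join(" ", Arrays.copyOfRange(w, Math.max(0, w.length - n), w.length));
    }

    /** Returns the first n space-separated words of text. */
    static String firstWords(String text, int n) {
        if (text.isEmpty()) return "";
        String[] w = text.split(" ");
        return String.join(" ", Arrays.copyOfRange(w, 0, Math.min(n, w.length)));
    }

    /** Finds every link in html and records its URL, anchor text and context. */
    static List<LinkContext> extract(String html, int window) {
        List<LinkContext> out = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String before = lastWords(plain(html.substring(0, m.start())), window);
            String after = firstWords(plain(html.substring(m.end())), window);
            out.add(new LinkContext(m.group(1), plain(m.group(2)), before, after));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p>Read the official <a href=\"http://example.com/doc\">"
                + "Nutch <b>guide</b></a> before you start crawling.</p>";
        for (LinkContext c : extract(html, 2)) {
            System.out.println(c.url + " | " + c.before + " | " + c.anchor + " | " + c.after);
        }
    }
}
```

So for each outgoing link I'd want to store something like the before and after fields next to the anchor text that already gets written.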
Any help greatly appreciated!
Brian
_______________________________________________ Nutch-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nutch-developers
