Re: Multiple anchors on same site - what's better than making these unique?

2006-01-05 Thread Doug Cutting
David Wallace wrote: I've been grubbing around with Nutch for a while now, although I'm still working with 0.7 code. I notice that when anchors are collected for a document, they're made unique by domain and by anchor text. Note that this is only done when collecting anchor texts, not when

Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread David Wallace
Hi all, I've been grubbing around with Nutch for a while now, although I'm still working with 0.7 code. I notice that when anchors are collected for a document, they're made unique by domain and by anchor text. I'm using Nutch for an intranet-style search engine, on a single site, so I don't

Re: Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread Stefan Groschupf
Hi, did you try...
<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>
  <description>If true, when adding new links to a page, links from the same host are ignored. This is an effective way to limit the size of the link database, keeping only the highest quality links.</description>
</property>

Re: Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread David Wallace
Thank you Stefan, for your speedy response. I have indeed changed that setting to false. However, that doesn't deal with my problem. The offending method is getAnchors in org.apache.nutch.db.WebDBAnchors, which is called from org.apache.nutch.tools.FetchListTool. This method makes the array
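
For readers following the thread, here is a minimal Java sketch of the behaviour under discussion: collapsing incoming anchors so that each (domain, anchor text) pair is kept only once. It is not the code from org.apache.nutch.db.WebDBAnchors. The class AnchorDedupSketch, the IncomingLink holder, and both methods are hypothetical names made up for illustration, and the URL's host is used as a stand-in for "domain", which is an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: NOT the Nutch implementation. All names here are hypothetical.
final class AnchorDedupSketch {

    /** Hypothetical holder for one incoming link: the linking page's URL and its anchor text. */
    static final class IncomingLink {
        final String fromUrl;
        final String anchorText;

        IncomingLink(String fromUrl, String anchorText) {
            this.fromUrl = fromUrl;
            this.anchorText = anchorText;
        }
    }

    /** Keeps the first anchor seen for each (host, anchor text) pair, in input order. */
    static List<String> uniqueAnchors(List<IncomingLink> links) {
        Set<String> seen = new HashSet<>();
        List<String> anchors = new ArrayList<>();
        for (IncomingLink link : links) {
            // Assumption: the host of the linking URL approximates the "domain".
            String host = URI.create(link.fromUrl).getHost();
            String key = (host == null ? "" : host) + "\n" + link.anchorText;
            if (seen.add(key)) {
                anchors.add(link.anchorText);
            }
        }
        return anchors;
    }

    /** Alternative for comparison: keeps every anchor, repeats included. */
    static List<String> allAnchors(List<IncomingLink> links) {
        List<String> anchors = new ArrayList<>();
        for (IncomingLink link : links) {
            anchors.add(link.anchorText);
        }
        return anchors;
    }

    public static void main(String[] args) {
        List<IncomingLink> links = new ArrayList<>();
        links.add(new IncomingLink("http://intranet.example.com/a", "Home"));
        links.add(new IncomingLink("http://intranet.example.com/b", "Home"));
        links.add(new IncomingLink("http://intranet.example.com/c", "Staff"));
        System.out.println(uniqueAnchors(links));
        System.out.println(allAnchors(links));
    }
}
```

On a single-site crawl every incoming link shares one host, so in this sketch repeated anchor text collapses to a single entry however many pages use it, whereas the second method preserves the repeats.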