Re: Multiple anchors on same site - what's better than making these unique?

2006-01-05 Thread Doug Cutting

David Wallace wrote: I've been grubbing around with Nutch for a while now, although I'm still working with 0.7 code. I notice that when anchors are collected for a document, they're made unique by domain and by anchor text. Note that this is only done when collecting anchor texts, not when

Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread David Wallace

Hi all, I've been grubbing around with Nutch for a while now, although I'm still working with 0.7 code. I notice that when anchors are collected for a document, they're made unique by domain and by anchor text. I'm using Nutch for an intranet style search engine, on a single site, so I don't

Re: Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread Stefan Groschupf

Hi, did you try... <property> <name>db.ignore.internal.links</name> <value>true</value> <description>If true, when adding new links to a page, links from the same host are ignored. This is an effective way to limit the size of the link database, keeping only the highest quality links.</description> </property>

Re: Multiple anchors on same site - what's better than making these unique?

2005-12-19 Thread David Wallace

Thank you Stefan, for your speedy response. I have indeed changed that setting to false. However, that doesn't deal with my problem. The offending method is getAnchors in org.apache.nutch.db.WebDBAnchors, which is called from org.apache.nutch.tools.FetchListTool. This method makes the array
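
To make the behaviour under discussion concrete, here is a minimal sketch of anchor collection that keeps one anchor per (source host, anchor text) pair. It is not the Nutch source: the class AnchorDedup, the Link record, and the method uniqueByHostAndText are hypothetical names, and using the URL host as the "domain" is an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnchorDedup {

    /** Hypothetical inbound-link record: the linking page's URL and its anchor text. */
    public static final class Link {
        final String fromUrl;
        final String anchorText;

        public Link(String fromUrl, String anchorText) {
            this.fromUrl = fromUrl;
            this.anchorText = anchorText;
        }
    }

    /**
     * Returns anchor texts in their original order, keeping only the first
     * anchor seen for each (source host, anchor text) pair.
     * Assumption: "domain" is approximated by the host part of the source URL.
     */
    public static List<String> uniqueByHostAndText(List<Link> links) {
        Set<String> seen = new HashSet<>();
        List<String> anchors = new ArrayList<>();
        for (Link link : links) {
            String host = URI.create(link.fromUrl).getHost();
            // A newline cannot occur in a host name, so it is a safe key separator.
            String key = host + "\n" + link.anchorText;
            if (seen.add(key)) {
                anchors.add(link.anchorText);
            }
        }
        return anchors;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
            new Link("http://intranet.example/a.html", "Holiday policy"),
            new Link("http://intranet.example/b.html", "Holiday policy"),
            new Link("http://intranet.example/c.html", "Leave form"),
            new Link("http://other.example/x.html", "Holiday policy"));
        // Prints: [Holiday policy, Leave form, Holiday policy]
        System.out.println(uniqueByHostAndText(links));
    }
}
```

On a single-site crawl every inbound link shares one host, so under this scheme any anchor text that repeats across pages contributes only once, however many pages use it.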
