update crawldb

2006-12-19 Thread Aïcha
Hello, I use the Prune tool to remove documents from segment indexes, but it does not remove pages and links from WebDB. To prevent the presence of the unwanted URLs when new segments are created, it is advised to use our own net.nutch.net.URLFilter, or PruneDBTool (under

Re: subcollections IT DOESN'T WORK!

2006-12-19 Thread kauu
Hi, I'm new to Nutch. I want to know what the subcollection plugin is used for. Where is the introduction? On 12/19/06, liv [EMAIL PROTECTED] wrote: I may be losing all and every credit ... it's still in the same state - reindex doesn't change the subcollection field! I did a

Re: subcollections IT DOESN'T WORK!

2006-12-19 Thread liv
Look here: http://issues.apache.org/jira/browse/NUTCH-201?page=all Unfortunately it doesn't work as expected... yet. kauu wrote: hi , i'm new to nutch ,i want to know what's the useness of the subcollection plugin? where is the introduction? -- View this message in context:

Re: subcollections

2006-12-19 Thread liv
I checked the patch for subcollections (http://issues.apache.org/jira/browse/NUTCH-201) - although I assumed it is included in the latest public release, 0.8.1. Compared to the current source code, it looks like the patch has an extra file (which doesn't exist in version 0.8.1)

How best to add sponsored link support..??

2006-12-19 Thread RP
Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look into adding sponsored link results and I'm wondering how best to add this. It would be nice to utilize the Nutch engine to come up with the pages

Re: How best to add sponsored link support..??

2006-12-19 Thread RP
Let me qualify this - ad banner rotation is dealt with - I'm looking for something that will use our Nutch engine to serve up relevant links from people who pay for that privilege. We do not want to serve up ads from someone else's system, i.e. the big G or Y, but use our own Nutch search

Re: How best to add sponsored link support..??

2006-12-19 Thread Sami Siren
Are you looking for something like the Google KeyMatch as described in [1], which was then more or less mimicked in the Nutch web2 module [2], and since also, at least as a lookalike, released in Google Code [3]? -- Sami Siren [1] http://www.google.com/enterprise/mini/end_user_features.html [2]

Re: How best to add sponsored link support..??

2006-12-19 Thread RP
Thanks Sami, This is closer from an initial look - does this do anything on the backend (i.e. defining the data flags so we can get a match) as well or do we need to build that..?? Sami Siren wrote: Are you looking for something like the google keymatch as described in [1] which was then

Need help with deleteduplicates

2006-12-19 Thread sdeck
Hello, I am running Nutch 0.8 against Hadoop 0.4, just for reference. I want to add a delete-duplicates step based on a similarity algorithm, as opposed to the hash method that is currently in there. I would have to say I am pretty lost as to how the delete duplicates class is working. I would guess
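
For readers following the deleteduplicates question above, here is a minimal sketch of how hash-based and similarity-based duplicate detection differ. It is not Nutch's DeleteDuplicates code and does not use any Nutch or Hadoop API. The function names, the word-shingle size of 3, and the 0.8 threshold are all assumptions chosen for illustration, and Jaccard similarity over shingles is just one possible similarity measure.

```python
import hashlib


def content_hash(text: str) -> str:
    """Exact-duplicate fingerprint: any change in the text changes the hash."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences (k=3 is an illustrative choice)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_by_similarity(docs: dict, threshold: float = 0.8) -> list:
    """Keep the first document of each near-duplicate group.

    docs maps a URL to its text; the threshold is a hypothetical default.
    This is a simple quadratic comparison, fine only for small inputs.
    """
    kept_urls = []
    kept_shingles = []
    for url, text in docs.items():
        s = shingles(text)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # near-duplicate of a document we already kept
        kept_urls.append(url)
        kept_shingles.append(s)
    return kept_urls


docs = {
    "a": "the quick brown fox jumps over the lazy dog near the river bank today",
    "b": "the quick brown fox jumps over the lazy dog near the river bank now",
    "c": "nutch is an open source web search engine built on lucene and hadoop",
}
kept = dedup_by_similarity(docs)
```

In this example the two near-identical texts get different hashes, so a pure hash comparison would keep both, while the similarity pass keeps only the first of them.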