I do some processing on my master index. "dedup" does not guarantee whether it will delete the newly inserted record or the old record.
I am looking for something that can be done through the Nutch API, at merge time. If there is no ready-to-use way to do a selective merge, is there a way to walk through all the URLs in an index, something that returns an enumeration or array of all URLs? I could then iterate through index A's URL list and decide for each URL whether to insert it or not. Index A will be small enough to iterate through URL by URL, so performance is not an issue.

Regards
Chetan

-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED]
Sent: Monday, April 18, 2005 11:41 AM
To: [email protected]
Subject: Re: Index merging

Hi Chetan

Try "dedup" command in nutch:)

/Jack

On 4/18/05, Chetan Sahasrabudhe <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a small index say A and want to merge it with my main
> index say B.
> How can I perform A minus B so as to ensure that there is no redundancy in my
> main index (B).
>
> Set Theory: " A \ B: A minus B are all elements from A that are not in B. "
>
> Regards
> Chetan
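
For illustration, the "A minus B" filter discussed in this thread could be sketched as below. This is only a sketch under assumptions: the class UrlDifference and its minus method are hypothetical names and not part of the Nutch API, the two URL lists are assumed to have been read out of index A and index B already (how to do that is the open question above), and URLs are compared as exact strings with no normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Nutch API.
class UrlDifference {

    // Returns the URLs of A that do not occur in B (A \ B), in A's order.
    // Assumption: URLs are compared as exact strings.
    static List<String> minus(List<String> urlsInA, List<String> urlsInB) {
        // Start with every URL already present in the main index B.
        Set<String> seen = new HashSet<>(urlsInB);
        List<String> onlyInA = new ArrayList<>();
        for (String url : urlsInA) {
            // add() returns true only when the URL was not seen before,
            // so repeats inside A are dropped as well.
            if (seen.add(url)) {
                onlyInA.add(url);
            }
        }
        return onlyInA;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c");
        List<String> b = Arrays.asList("http://example.com/b");
        // Prints [http://example.com/a, http://example.com/c]
        System.out.println(minus(a, b));
    }
}
```

Only the documents whose URLs come back from minus would then be added to B, so the existing records in B are the ones that survive.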
