On Friday 14 October 2011 15:30:25 Sergey A Volkov wrote:
> Thanks for your quick reply.
> 
> I will try to use scoreupdater next time=)

Keep in mind that it relies on the WebGraph program. Another quick fix would 
be to patch CrawlDbFilter to reset the score based on the presence of some 
configuration setting.
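
A minimal sketch of that second idea, in plain Java so that it runs without
Nutch on the classpath. Everything named here is an assumption for
illustration: the Record class only stands in for a CrawlDb record, the
property name crawldb.reset.score is made up, and resetting to 0 is just one
possible choice of value. A real patch would apply the same check inside the
filter's map step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not Nutch code.
class ScoreResetSketch {
    // Made-up property name, used only for this sketch.
    static final String RESET_KEY = "crawldb.reset.score";

    // Simplified stand-in for a CrawlDb record that carries a score.
    static final class Record {
        float score;
        Record(float score) { this.score = score; }
    }

    // Resets the score only when the setting is present and true;
    // otherwise the record passes through untouched.
    static Record process(Record in, Map<String, String> conf) {
        if (Boolean.parseBoolean(conf.getOrDefault(RESET_KEY, "false"))) {
            in.score = 0.0f; // assumed reset value
        }
        return in;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RESET_KEY, "true");
        System.out.println(process(new Record(3.5f), conf).score);
        System.out.println(process(new Record(3.5f), new HashMap<>()).score);
    }
}
```

Tying the reset to a setting keeps normal runs unaffected: the score is only
touched in runs where the property is explicitly switched on.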

> 
> Unfortunately -addDays would not work for me because I want to refetch
> only specified domains, not the whole db (my first question was not
> correct). Another problem with -addDays and FetchSchedule is that I have
> to use a generate.topN lower than the size of the part to refetch (there
> are some time restrictions for the index update), so I can't determine
> when to stop using -addDays.

If you only want to generate fetch lists for specific domains you can use a 
custom domain URL filter with the generator. 

Be careful when using a filter with the generator while DB updating is 
enabled, as you'll lose all filtered URLs then.
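
A rough sketch of what the matching logic of such a domain filter could look
like. It is written as a plain Java class so it runs on its own; the class
name and the example domains are made up. It follows the convention Nutch URL
filters use, returning the URL when it is accepted and null when it is
rejected. A real filter would implement Nutch's URLFilter interface and be
registered as a plugin, which is left out here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a URL filter plugin; not Nutch code.
class DomainUrlFilterSketch {
    private final Set<String> allowedDomains;

    DomainUrlFilterSketch(Set<String> allowedDomains) {
        this.allowedDomains = allowedDomains;
    }

    // Returns the URL unchanged if its host is an allowed domain or a
    // subdomain of one; returns null to reject it.
    String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (host == null) {
            return null;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String domain : allowedDomains) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DomainUrlFilterSketch f = new DomainUrlFilterSketch(Set.of("example.org"));
        System.out.println(f.filter("http://www.example.org/page"));
        System.out.println(f.filter("http://other.example.com/"));
    }
}
```

Matching on the dot-prefixed suffix rather than a bare endsWith keeps a host
like notexample.org from slipping through as if it were example.org.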

> 
> On Fri 14 Oct 2011 04:52:33 PM MSK, Markus Jelsma wrote:
> > There are no tools for resetting the score but it would not be hard to
> > modify an existing tool for that e.g. WebGraph's scoreupdater tool. You
> > can force refetch by using the -addDays switch with the generator tool.
> > It'll add numDays to the current time to generate records that are not
> > yet due for fetch.
> > 
> > On Friday 14 October 2011 14:48:47 Sergey A Volkov wrote:
> >> Hi!
> >> 
> >> Is there any good way to modify all crawldb records? (e.g. drop score or
> >> force refetch).
> >> 
> >> I'm currently using Nutch 1.2 and, as far as I can see, the only way
> >> to do this is to write my own MapReduce task for every modification,
> >> or to change the CrawlDb updater and write my own extension point.
> >> 
> >> Sergey Volkov.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
