On Friday 14 October 2011 15:30:25 Sergey A Volkov wrote:
> Thanks for your quick reply.
>
> I will try to use scoreupdater next time=)

Keep in mind that it relies on the WebGraph program. Another quick fix would be
to patch CrawlDBFilter to reset the score based on the presence of some
configuration setting.

> Unfortunately -addDays would not work for me because I want to refetch
> only specified domains, not all db (my first question was not correct).
> Another problem with -addDays and FetchSchedule is that I have to use
> generate.topN lower than size of part for refetch (there are some time
> restrictions for index update), so i can't determine when to stop using
> addDays

If you only want to generate fetch lists for specific domains, you can use a
custom domain URL filter with the generator. Be careful not to use that same
filter when updating the DB, because you will lose all the filtered URLs.

> On Fri 14 Oct 2011 04:52:33 PM MSK, Markus Jelsma wrote:
> > There are no tools for resetting the score but it would not be hard to
> > modify an existing tool for that e.g. WebGraph's scoreupdater tool. You
> > can force refetch by using the -addDays switch with the generator tool.
> > It'll add numDays to the current time to generate records that are not
> > yet due for fetch.
> >
> > On Friday 14 October 2011 14:48:47 Sergey A Volkov wrote:
> >> Hi!
> >>
> >> Is there any good way to modify all crawldb records? (e.g. drop score or
> >> force refetch).
> >>
> >> I'm using now nutch 1.2 and as I see the only way to do this is writing
> >> own MapReduce task for every modification or changing CrawlDb updater
> >> and writing own extension point.
> >>
> >> Sergey Volkov.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
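
To illustrate the domain filter idea mentioned above, here is a rough sketch of the accept/reject logic only. It is written in Python for brevity and is not Nutch code: a real filter would be a Java plugin implementing Nutch's URLFilter interface, whose contract is to return the URL when it is accepted and null when it is rejected. The names filter_url and ALLOWED_DOMAINS, and the example domains, are made up for this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the domains you want to generate fetch lists for.
ALLOWED_DOMAINS = {"example.com", "example.org"}


def filter_url(url, allowed=ALLOWED_DOMAINS):
    """Return the URL if its host is an allowed domain or a subdomain of one.

    Returns None otherwise, mirroring the accept/reject contract of a URL filter.
    """
    # hostname is None for strings that do not parse as absolute URLs.
    host = (urlsplit(url).hostname or "").lower()
    for domain in allowed:
        # Match the domain itself or any subdomain, but not look-alike
        # hosts such as "notexample.com".
        if host == domain or host.endswith("." + domain):
            return url
    return None
```

Applied only at generate time, a filter like this narrows the fetch list to the chosen domains. Applied at update time as well, everything it rejects would be dropped from the crawldb, which is the loss warned about above.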

