Hi,

The full pass over all rows is needed for scoring and inlink calculation. There are some tricks to make it faster, though, such as not clearing the previous inlinks map before writing the new one and not deleting any markers (deletes are slow in HBase). For now, you have to modify the code yourself to do that.
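To make the trick concrete, here is a rough sketch of the idea. It is not Nutch code: RowSketch, InlinkUpdateSketch and both method names are made up for illustration, and plain Java maps stand in for the inlinks and markers columns of a row. It only shows where the deletes come from and how overwriting in place avoids them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one row of the datastore; not a Nutch class.
final class RowSketch {
    final Map<String, String> inlinks = new HashMap<>(); // source URL -> anchor text
    final Map<String, String> markers = new HashMap<>(); // marker name -> batch id
}

final class InlinkUpdateSketch {
    // Baseline: clear the old inlinks and markers, then write the new inlinks.
    // Returns how many entries were removed, i.e. how many deletes a store
    // such as HBase would have to process for this row.
    static int rewriteFromScratch(RowSketch row, Map<String, String> fresh) {
        int deletes = row.inlinks.size() + row.markers.size();
        row.inlinks.clear();
        row.markers.clear();
        row.inlinks.putAll(fresh);
        return deletes;
    }

    // The trick: write the new inlinks over the old map and leave the markers
    // alone, so the row only sees puts and no deletes.
    static int overwriteInPlace(RowSketch row, Map<String, String> fresh) {
        row.inlinks.putAll(fresh);
        return 0;
    }
}
```

The price in this sketch is that inlinks which no longer exist, and old markers, stay in the row. Whether that is acceptable depends on how the rest of your crawl cycle uses them.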

Ferdy.

On Fri, Aug 17, 2012 at 9:42 PM, <[email protected]> wrote:
> Hi,
>
> I noticed that updatedb command goes over all urls, even if they have been
> updated in the previous generate, fetch updatedb stages.
> As a result updatedb takes long time depending on the number of rows in
> the datastore.
> I thought maybe this is redundant and it must be restricted to not updated
> urls, only.
>
> Thanks.
> Alex.

