Hi,

The inlinks are populated by the DbUpdaterJob, which also does a couple of other things, such as updating scores and fetch times.
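
For reference, the outlink inversion described in the quoted message can be sketched in plain Python. This is only an in-memory illustration of the map and reduce steps, not Nutch or HBase code: the function name `invert_outlinks`, the sample URLs, and the assumed row layout (each page mapping its outlink URLs to anchor text) are hypothetical.

```python
from collections import defaultdict


def invert_outlinks(outlinks_by_page):
    """Turn {source: {target: anchor}} into {target: {source: anchor}}.

    The nested loop plays the role of the map phase (emit one
    (target, source, anchor) record per outlink); grouping by target
    in the defaultdict plays the role of the reduce phase.
    """
    inlinks = defaultdict(dict)
    for source, outlinks in outlinks_by_page.items():
        for target, anchor in outlinks.items():
            inlinks[target][source] = anchor
    return dict(inlinks)


# Hypothetical sample data standing in for rows read from the 'ol' family.
sample = {
    "http://a.example/": {"http://b.example/": "B", "http://c.example/": "C"},
    "http://b.example/": {"http://c.example/": "see C"},
}

inverted = invert_outlinks(sample)
for target, sources in sorted(inverted.items()):
    print(target, sorted(sources))
```

Pages that nothing links to (here `http://a.example/`) simply do not appear in the inverted result.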
On Mon, Oct 29, 2012 at 4:31 AM, Thilina Gunarathne <[email protected]> wrote:

> Dear all,
> I'm trying to extract the InLinks data from a not-so-large Nutch crawl
> which uses HBase as the data store. First, I tried the 'il' column family,
> but found only one page with inLinks listed in it. Then I used a simple
> MapReduce program to invert the outlinks data in the 'ol' column family and
> found many more pages with inLinks.
> I would like to know when the 'il' family gets populated? Also whether
> using a simple MapReduce program to invert the outlinks data is the correct
> way to extract any inLink information?
>
> thanks a lot in advance,
> Thilina
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
> http://thilina.gunarathne.org

