Hi,

The inlinks are populated by the DbUpdaterJob, which also does a couple of
other things, such as updating scores and fetch times.
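
To illustrate what inverting the outlinks amounts to, here is a minimal
in-memory sketch. It is not Nutch's implementation and does not touch HBase
or MapReduce. It assumes the outlink data has already been read into a
mapping of source URL to {target URL: anchor text}; the function name and
the sample URLs are made up for the example.

```python
from collections import defaultdict


def invert_outlinks(outlinks):
    """Turn {source: {target: anchor}} into {target: {source: anchor}}.

    Hypothetical helper for illustration only; the input shape is an
    assumption about how the outlink data was exported.
    """
    inlinks = defaultdict(dict)
    for source, targets in outlinks.items():
        for target, anchor in targets.items():
            # Each outlink source -> target is an inlink of target from source.
            inlinks[target][source] = anchor
    return dict(inlinks)


# Made-up sample data standing in for rows read from the 'ol' family.
sample = {
    "http://a.example/": {"http://b.example/": "B", "http://c.example/": "C"},
    "http://b.example/": {"http://c.example/": "see C"},
}
inverted = invert_outlinks(sample)
```

In a MapReduce version of the same idea, the mapper would emit
(target, source) pairs and the reducer would collect them per target.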

On Mon, Oct 29, 2012 at 4:31 AM, Thilina Gunarathne <[email protected]> wrote:

> Dear all,
> I'm trying to extract the InLinks data from a not-so-large Nutch crawl
> which uses HBase as the data store. First, I tried the 'il' column family,
> but found only one page with inLinks listed in it. Then I used a simple
> MapReduce program to invert the outlinks data in the 'ol' column family and
> found many more pages with inLinks.
> I would like to know when the 'il' family gets populated? Also whether
> using a simple MapReduce program to invert the outlinks data is the correct
> way to extract any inLink information?
>
> thanks a lot in advance,
> Thilina
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
> http://thilina.gunarathne.org
>
