What do you mean by inlinks? For mysite.com, I call all URLs such as
mysite.com/myhtml1.html, mysite.com/myhtml2.html, etc. inlinks.
Currently they are saved as ol in HBase. From the HBase shell, run

 get 'webpage', 'com.mysite:http/'

and check what the ol family looks like.
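
The row key 'com.mysite:http/' in the get command is the page URL with its
host reversed. As a rough illustration of how such a key can be derived from
a URL, here is a minimal Python sketch. The function name reverse_url is
hypothetical and the layout (reversed host, then scheme, optional port, then
path) is an assumption modelled on the key shown above; it is not Nutch's
own code.

```python
from urllib.parse import urlsplit


def reverse_url(url: str) -> str:
    """Build a reversed-host row key such as 'com.mysite:http/'.

    Hypothetical helper: the key layout is inferred from the example
    key in this thread, not taken from the Nutch source.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Reverse the dot-separated host labels: mysite.com -> com.mysite
    reversed_host = ".".join(reversed(host.split(".")))
    key = f"{reversed_host}:{parts.scheme}"
    # Append the port only when the URL states one explicitly
    if parts.port is not None:
        key += f":{parts.port}"
    path = parts.path
    if parts.query:
        path += "?" + parts.query
    return key + path


if __name__ == "__main__":
    print(reverse_url("http://mysite.com/"))
    print(reverse_url("http://mysite.com/myhtml1.html"))
```

Keys built this way keep all pages of one host next to each other in the
table, so the same scheme gives the row to inspect for any other page.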

I have these config settings:
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>
<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>  
</property>
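
One way to read the two properties above is as a per-link filter: the
internal flag drops links whose source and target share a host, and the
external flag drops links that cross hosts. The Python sketch below models
only that reading. The helper keep_link is hypothetical, and where (or
whether) Nutch 2.0 applies each flag during updatedb is an assumption not
confirmed in this thread.

```python
from urllib.parse import urlsplit


def keep_link(source_url: str, target_url: str,
              ignore_internal: bool, ignore_external: bool) -> bool:
    """Return True if a link survives the two ignore flags.

    Hypothetical model of db.ignore.internal.links and
    db.ignore.external.links; not Nutch's implementation.
    """
    same_host = urlsplit(source_url).hostname == urlsplit(target_url).hostname
    if same_host:
        # Same-host link: kept unless internal links are ignored
        return not ignore_internal
    # Cross-host link: kept unless external links are ignored
    return not ignore_external


if __name__ == "__main__":
    src = "http://mysite.com/"
    print(keep_link(src, "http://mysite.com/myhtml1.html", True, True))
    print(keep_link(src, "http://other.org/page.html", False, True))
```

Under this reading, setting both flags to true would leave no link that
passes the filter, so it may be worth checking which of the two flags is
actually honoured by the job being debugged.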



Alex.

-----Original Message-----
From: kiran chitturi <[email protected]>
To: user <[email protected]>
Sent: Wed, Jan 30, 2013 11:11 am
Subject: Re: Nutch 2.0 updatedb and gora query


I checked the database after the dbupdate job ran, and I could see only the
markers, signature and fetch fields.

The initial seed, which was crawled and parsed, has only outlinks. I noticed
that one of the outlinks is actually the inlink.

Aren't inlinks supposed to be saved during the DbUpdaterJob? When I tried
to debug, I could see in Eclipse, in the DbUpdateReducer job, that the
inlinks are being saved to the page object along with the fetch fields and
markers, but I did not understand where the data goes from there.

Is the data written to HBase during the DbUpdateReducer job?

Thanks,
Kiran.




On Wed, Jan 30, 2013 at 1:43 PM, <[email protected]> wrote:

> I see that inlinks are saved as ol in HBase.
>
> Alex.
>
>
>
>
>
>
>
> -----Original Message-----
> From: kiran chitturi <[email protected]>
> To: user <[email protected]>
> Sent: Wed, Jan 30, 2013 9:31 am
> Subject: Re: Nutch 2.0 updatedb and gora query
>
>
> Link to the reference:
> http://lucene.472066.n3.nabble.com/Inlinks-not-being-saved-in-the-database-td4037067.html
> and JIRA: https://issues.apache.org/jira/browse/NUTCH-1524
>
>
> On Wed, Jan 30, 2013 at 12:25 PM, kiran chitturi
> <[email protected]> wrote:
>
> > Hi,
> >
> > I have posted a similar issue on the dev list [0]. The problem is that
> > inlinks are not being saved to the database even though they are added
> > to the webpage object.
> >
> > I am curious about what happens after the fields are saved in the webpage
> > object. How are they sent to Gora? Which class is used to communicate
> > with Gora?
> >
> > I have seen the StorageUtils class, but I want to know if it's the only
> > class that is used to communicate with the databases.
> >
> > Please let me know your suggestions. I feel the inlinks are not being
> > saved due to a small problem in the code.
> >
> >
> >
> > [0] -
> > http://mail-archives.apache.org/mod_mbox/nutch-dev/201301.mbox/browser
> >
> > Thanks,
> > --
> > Kiran Chitturi
> >
>
>
>
> --
> Kiran Chitturi
>
>
>


-- 
Kiran Chitturi

 
