What do you call inlinks? For mysite.com, I call inlinks all URLs such as mysite.com/myhtml1.html, mysite.com/myhtml2.html, etc. Currently they are saved as ol in HBase. From the hbase shell, run
get 'webpage', 'com.mysite:http/'

and check what the ol family looks like.

I have these config properties:

<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>
<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>
</property>

Alex.

-----Original Message-----
From: kiran chitturi <[email protected]>
To: user <[email protected]>
Sent: Wed, Jan 30, 2013 11:11 am
Subject: Re: Nutch 2.0 updatedb and gora query

I have checked the database after the dbupdate job ran, and I could see only the markers, signature and fetch fields. The initial seed, which was crawled and parsed, has only outlinks. I notice that one of the outlinks is actually the inlink.

Aren't inlinks supposed to be saved during the DbUpdaterJob?

When I tried to debug in Eclipse, I could see in the DbUpdateReducer that the inlinks are being saved to the page object along with the fetch fields and markers, but I did not understand where the data goes from there.

Is the data written to HBase during the DbUpdateReducer job?

Thanks,
Kiran.

On Wed, Jan 30, 2013 at 1:43 PM, <[email protected]> wrote:

> I see that inlinks are saved as ol in hbase.
>
> Alex.
>
> -----Original Message-----
> From: kiran chitturi <[email protected]>
> To: user <[email protected]>
> Sent: Wed, Jan 30, 2013 9:31 am
> Subject: Re: Nutch 2.0 updatedb and gora query
>
> Link to the reference (
> http://lucene.472066.n3.nabble.com/Inlinks-not-being-saved-in-the-database-td4037067.html
> )
> and jira (https://issues.apache.org/jira/browse/NUTCH-1524)
>
> On Wed, Jan 30, 2013 at 12:25 PM, kiran chitturi
> <[email protected]> wrote:
>
> > Hi,
> >
> > I have posted a similar issue on the dev list [0]. The problem is that
> > inlinks are not being saved to the database even though they are added
> > to the webpage object.
> >
> > I am curious about what happens after the fields are saved in the webpage
> > object. How are they sent to Gora? Which class is used to communicate
> > with Gora?
> >
> > I have seen the StorageUtils class, but I want to know if it is the only
> > class that is used to communicate with databases.
> >
> > Please let me know your suggestions. I feel the inlinks are not being
> > saved due to a small problem in the code.
> >
> > [0] -
> > http://mail-archives.apache.org/mod_mbox/nutch-dev/201301.mbox/browser
> >
> > Thanks,
> > --
> > Kiran Chitturi
>
> --
> Kiran Chitturi

--
Kiran Chitturi
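
A note on the row key used in the hbase shell command in this thread: 'com.mysite:http/' looks like the page URL with the host labels reversed, followed by the scheme and the path. Below is a minimal Python sketch of building such a key, assuming that layout holds in general. The function name reversed_url_key is made up for illustration and is not part of Nutch or Gora, and URLs with an explicit port are rejected because the thread does not show how ports are encoded.

```python
from urllib.parse import urlsplit


def reversed_url_key(url):
    """Return a row key shaped like the one in the thread ('com.mysite:http/').

    Assumed layout (inferred from the thread, not taken from Nutch source):
    reversed host labels, then ':' and the scheme, then the path and query.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    if parts.port is not None:
        # The thread gives no example with a port, so this sketch refuses to guess.
        raise ValueError("explicit ports are not handled in this sketch")

    # mysite.com -> com.mysite
    host = ".".join(reversed(parts.hostname.split(".")))

    # An empty path is treated as '/', matching the key shown in the thread.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    return f"{host}:{parts.scheme}{path}"


if __name__ == "__main__":
    key = reversed_url_key("http://mysite.com/")
    # prints: get 'webpage', 'com.mysite:http/'
    print(f"get 'webpage', '{key}'")
```

The printed line can be pasted into the hbase shell to inspect the row for a given page, including its ol family.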

