Hi Kiran,

On Wed, Jan 30, 2013 at 11:10 AM, kiran chitturi
<[email protected]> wrote:

>  I have checked the database after the dbupdate job is ran and i could see
> only markers, signature and fetch fields.
>

Which Gora artifacts are you using?
We recently fixed a bug in gora-cassandra [0]: the state for map values
was not being recorded correctly, which prevented us from writing those
values during the dbUpdaterJob.
I was not aware (and no one flagged it during either the Gora 0.2.1 or
Nutch 2.1 RC testing) that there was a problem with similar fields being
written to HBase.
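
To make the failure mode concrete, here is a minimal sketch of how
unrecorded map state can lead to silently skipped writes. It is not Gora's
actual code: DirtyMapSketch, DirtyTrackingMap, the recordState flag and
flush() are all hypothetical names made up for illustration, and the
"store" is just an in-memory map standing in for the backend.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Gora or Nutch.
public class DirtyMapSketch {

    /** A map that remembers which keys changed since the last flush. */
    public static class DirtyTrackingMap {
        private final Map<String, String> values = new HashMap<>();
        private final Set<String> dirtyKeys = new HashSet<>();
        private final boolean recordState;

        /** recordState=false simulates the bug: changes are never marked dirty. */
        public DirtyTrackingMap(boolean recordState) {
            this.recordState = recordState;
        }

        public void put(String key, String value) {
            values.put(key, value);
            if (recordState) {
                dirtyKeys.add(key);
            }
        }
    }

    /** Writes only the dirty entries to the store; returns how many were written. */
    public static int flush(DirtyTrackingMap map, Map<String, String> store) {
        int written = 0;
        for (String key : map.dirtyKeys) {
            store.put(key, map.values.get(key));
            written++;
        }
        map.dirtyKeys.clear();
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();

        DirtyTrackingMap tracked = new DirtyTrackingMap(true);
        tracked.put("http://example.org/a", "anchor text");
        System.out.println("with state recorded: " + flush(tracked, store));

        DirtyTrackingMap untracked = new DirtyTrackingMap(false);
        untracked.put("http://example.org/b", "anchor text");
        // The value is present on the in-memory object but is never written.
        System.out.println("without state recorded: " + flush(untracked, store));
    }
}
```

In the second case the value is visible on the in-memory object (as you
would see in a debugger) but never reaches the store, which would be
consistent with the symptom you describe, if the same kind of problem
applies to HBase.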


>
> The initial seed which was crawled and parsed, has only outlinks. I notice
> one of the outlink is actually the inlink.
>

Can you reproduce this? Is there any way to get more verbose output here?
This is starting to sound like a bug. Unfortunately, I am not 100% up to
speed on the HBase module either!


>
> Aren't inlinks supposed to be saved during the dbUpdatedJob ?


Yes, specifically in the dbUpdaterReducerJob [1].


> When i tried
> to debug, i could see in eclipse and in the dbUpdateReducer job that the
> inlinks are being saved to the page object along with fetch fields, markers
> but i did not understood where the data is going from there.
>

We need to narrow this down and document it fully then.
I cannot look into this for a couple of hours, Kiran.
Lewis

[0] https://issues.apache.org/jira/browse/GORA-182
[1] http://wiki.apache.org/nutch/Nutch2Crawling#DbUpdate
[2] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/java/org/apache/nutch/util/WebPageWritable.java
