The current code that puts updb_mrk in dbUpdateReducer is as follows:
    Utf8 mark = Mark.PARSE_MARK.removeMarkIfExist(page);
    if (mark != null) {
      Mark.UPDATEDB_MARK.putMark(page, mark);
    }
The mark is always null, regardless of whether PARSE_MARK is present or not.
removeMarkIfExist calls:
    public Utf8 removeFromMarkers(Utf8 key) {
      if (markers == null) { return null; }
      getStateManager().setDirty(this, 20);
      return markers.remove(key);
    }
It seems to me that getStateManager().setDirty(this, 20); removes the marker, and
that is why the last line returns null.
I tried to follow getStateManager().setDirty(this, 20) through the class
hierarchy, but did not find anything useful.
I have fixed the issue by replacing the lines in dbUpdateReducer with:
    Utf8 parse_mark = Mark.PARSE_MARK.checkMark(page);
    if (parse_mark != null) {
      Mark.UPDATEDB_MARK.putMark(page, parse_mark);
      Mark.PARSE_MARK.removeMark(page);
    }
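
To show why the order matters, here is a minimal standalone sketch. It is not Nutch or Gora code: MarkMoveSketch, LossyRemoveMap and the keys "parse_mark" and "updatedb_mark" are hypothetical names I made up, and the lossy remove() is only an assumption about what might be happening, since I could not confirm the real cause. It compares the original pattern (use the value returned by remove) with the fixed one (read first, put, then remove).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Nutch or Gora classes.
class MarkMoveSketch {

    // ASSUMPTION: a marker map whose remove() deletes the entry but does not
    // hand back the previous value. Whether the real markers map behaves
    // like this is not confirmed.
    static class LossyRemoveMap extends HashMap<String, String> {
        @Override
        public String remove(Object key) {
            super.remove(key);
            return null;
        }
    }

    // Original pattern: depends on the value returned by remove().
    static boolean moveUsingRemoveResult(Map<String, String> markers, String from, String to) {
        String mark = markers.remove(from);
        if (mark != null) {
            markers.put(to, mark);
            return true;
        }
        return false;
    }

    // Fixed pattern: read the mark first, put the new mark, then remove the old one.
    static boolean moveUsingCheckFirst(Map<String, String> markers, String from, String to) {
        String mark = markers.get(from);
        if (mark != null) {
            markers.put(to, mark);
            markers.remove(from);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> a = new LossyRemoveMap();
        a.put("parse_mark", "batch-1");
        // The old mark is gone and the new one was never written.
        System.out.println(moveUsingRemoveResult(a, "parse_mark", "updatedb_mark") + " " + a);

        Map<String, String> b = new LossyRemoveMap();
        b.put("parse_mark", "batch-1");
        // The new mark is written before the old one is removed.
        System.out.println(moveUsingCheckFirst(b, "parse_mark", "updatedb_mark") + " " + b);
    }
}
```

With a map like this, the first variant loses the mark entirely, while the second does not depend on what remove() returns.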
Thanks.
Alex.
-----Original Message-----
From: Ferdy Galema <[email protected]>
To: user <[email protected]>
Sent: Thu, Aug 2, 2012 12:16 am
Subject: Re: Nutch 2 solrindex
Hi,
Do you want to open a Jira and attach the patch over there? Or just explain
what causes the problem. I'm curious as to what this might be.
Thanks,
Ferdy.
On Wed, Aug 1, 2012 at 9:27 PM, <[email protected]> wrote:
> This is directly related to the thread I opened yesterday. I think
> this is a bug, since updatedb fails to put the update mark.
> I have fixed it by modifying the code. I have a patch, but I am not sure if I can
> send it as an attachment.
>
> Alex.
>
>
>
> -----Original Message-----
> From: Bai Shen <[email protected]>
> To: user <[email protected]>
> Sent: Wed, Aug 1, 2012 10:37 am
> Subject: Nutch 2 solrindex
>
>
> I'm trying to crawl using Nutch 2. However, I can't seem to get it to
> index to solr without adding -reindex to the command. And at that point it
> indexes everything I've crawled. I've tried both -all and the batch id,
> but neither one results in anything being indexed to solr.
>
> Any suggestions of what to look at?
>
> Thanks.