Hi,
I had already arrived at changes similar to the ones in this patch. My only
suggestion for the patch's code is to move the check for whether the URL
exists in the datastore so that it comes after
if (!additionsAllowed) {
return;
}
and to close the datastore when done.
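To make the ordering concrete, here is a rough sketch of what I mean. It is not the patch's code and does not use the Nutch or Gora APIs: InMemoryStore, UpdateSketch and handleNewPage are hypothetical stand-ins, and only the additionsAllowed check is taken from the existing code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real datastore; not Nutch's or Gora's API.
final class InMemoryStore implements AutoCloseable {
    private final Map<String, String> rows = new HashMap<>();
    int lookups = 0;        // counts existence checks, to show when the store is hit
    boolean closed = false;

    void put(String url, String metadata) { rows.put(url, metadata); }

    boolean contains(String url) {
        lookups++;
        return rows.containsKey(url);
    }

    String get(String url) { return rows.get(url); }

    @Override
    public void close() { closed = true; }
}

final class UpdateSketch {
    // Returns true only if a new row was written for the URL.
    static boolean handleNewPage(InMemoryStore store, String url, boolean additionsAllowed) {
        if (!additionsAllowed) {
            return false; // bail out first, without touching the datastore
        }
        // The existence check comes only after the additionsAllowed check.
        if (store.contains(url)) {
            return false; // already stored: skip it so existing metadata is kept
        }
        store.put(url, ""); // assumption: a new page starts with empty metadata
        return true;
    }

    public static void main(String[] args) {
        // try-with-resources closes the datastore when the work is done.
        try (InMemoryStore store = new InMemoryStore()) {
            store.put("http://example.com/", "existing metadata");
            System.out.println(handleNewPage(store, "http://example.com/", true));
            System.out.println(handleNewPage(store, "http://example.com/new", true));
        }
    }
}
```

With this ordering, no lookup is made at all when additions are not allowed, and a page that is already stored is left untouched.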
Thanks.
Alex.
-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
To: user <[email protected]>
Sent: Tue, Jun 24, 2014 9:07 am
Subject: Re: updatedb deletes all metadata except _csh_
Hi Alex,
I am really sorry for not making the connection here.
On Tue, Jun 24, 2014 at 12:31 AM, <[email protected]> wrote:
>
> So far, this looks like a bug in updatedb when filtering with batchId.
>
> I could find only one solution: check whether new pages are already in the
> datastore and, if they are, skip them.
> Alternatively, updatedb with the -all option also works.
>
https://issues.apache.org/jira/browse/NUTCH-1679
If you can run with this patch, then please post your results here.