Thanks. I will see if I can reproduce and patch this (in case you do not create a Jira issue).
On Thu, Aug 2, 2012 at 7:54 PM, <[email protected]> wrote:

> The current code putting updb_mrk in dbUpdateReducer is as follows
>
> Utf8 mark = Mark.PARSE_MARK.removeMarkIfExist(page);
> if (mark != null) {
>   Mark.UPDATEDB_MARK.putMark(page, mark);
> }
>
> the mark is always null, independent if there is PARSE_MARK or not.
>
> This function calls
>
> public Utf8 removeFromMarkers(Utf8 key) {
>   if (markers == null) { return null; }
>   getStateManager().setDirty(this, 20);
>   return markers.remove(key);
> }
>
> it seems to me that getStateManager().setDirty(this, 20); removes marker
> and that is why the last line returns null.
>
> I tried to follow getStateManager().setDirty(this, 20) in the hierarchy
> of classes, but did not find anything useful.
>
> I have fixed the issue by replacing the above lines with
>
> Utf8 parse_mark = Mark.PARSE_MARK.checkMark(page);
> if (parse_mark != null)
> {
>   Mark.UPDATEDB_MARK.putMark(page, parse_mark);
>   Mark.PARSE_MARK.removeMark(page);
> }
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Ferdy Galema <[email protected]>
> To: user <[email protected]>
> Sent: Thu, Aug 2, 2012 12:16 am
> Subject: Re: Nutch 2 solrindex
>
> Hi,
>
> Do you want to open a Jira and attach the patch over there? Or just explain
> what the problem is caused. I'm curious to what this might be.
>
> Thanks,
> Ferdy.
>
> On Wed, Aug 1, 2012 at 9:27 PM, <[email protected]> wrote:
>
> > This is directly related to the thread I have opened yesterday. I think
> > this is a bug, since updatedb fails to put update mark.
> > I have fixed it by modifying code. I have a patch, but not sure if I can
> > send it as an attachment.
> >
> > Alex.
> >
> > -----Original Message-----
> > From: Bai Shen <[email protected]>
> > To: user <[email protected]>
> > Sent: Wed, Aug 1, 2012 10:37 am
> > Subject: Nutch 2 solrindex
> >
> > I'm trying to crawl using Nutch 2. However, I can't seem to get it to
> > index to solr without adding -reindex to the command. And at that point it
> > indexes everything I've crawled. I've tried both -all and the batch id,
> > but neither one results in anything being indexed to solr.
> >
> > Any suggestions of what to look at?
> >
> > Thanks.
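
For anyone following the thread, the check-then-put-then-remove ordering in Alex's workaround can be sketched in isolation. This is only an illustration under stated assumptions: a plain HashMap stands in for the page's marker map, and the class name, method name and marker keys are hypothetical, not Nutch's or Gora's actual API. It does not reproduce or explain the reported null return.

```java
import java.util.HashMap;
import java.util.Map;

public class MarkTransferSketch {
    // Hypothetical marker keys; the real Nutch marker names may differ.
    static final String PARSE_MARK = "parse_mark";
    static final String UPDATEDB_MARK = "updatedb_mark";

    // Copies the parse mark to the updatedb mark, then removes the parse mark.
    // Returns true if a parse mark was present and moved.
    static boolean transferParseMark(Map<String, String> markers) {
        // Read the value first, without removing it (like checkMark in the thread).
        String parseMark = markers.get(PARSE_MARK);
        if (parseMark == null) {
            return false;
        }
        // Write the update mark while the value is still in hand (like putMark).
        markers.put(UPDATEDB_MARK, parseMark);
        // Only now drop the parse mark (like removeMark).
        markers.remove(PARSE_MARK);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> markers = new HashMap<>();
        markers.put(PARSE_MARK, "batch-1");
        boolean moved = transferParseMark(markers);
        System.out.println("moved=" + moved + " markers=" + markers);
    }
}
```

Reading the mark before removing it means the update mark does not depend on what the remove call returns, which appears to be the point of the workaround.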

