Additionally, why do we log at DEBUG level that there is a
"different batch id (" + mark + ")"? Should we not log what the different
batch id actually is, as opposed to the FETCH_MARK mark? In this case the
mark is null, which is useless to us.

This DEBUG logging is also present in the following classes, and until I
understand it I am not really happy with it being there:

IndexerJob
ParserJob
FetcherJob

On Thu, Apr 25, 2013 at 3:20 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
> Within ParserJob#map, I am keen to see how the situation arises where
> !NutchJob.shouldProcess returns true due to the fact that
> Mark.FETCH_MARK.checkMark(page) returns null.
>
> In what scenarios is it possible to have a page which we attempt to fetch,
> which has a null value for FETCH_MARK?
>
>   @Override
>   public void map(String key, WebPage page, Context context)
>       throws IOException, InterruptedException {
>     Utf8 mark = Mark.FETCH_MARK.checkMark(page);
>     String unreverseKey = TableUtil.unreverseUrl(key);
>     if (batchId.equals(REPARSE)) {
>       LOG.debug("Reparsing " + unreverseKey);
>     } else {
>       if (!NutchJob.shouldProcess(mark, batchId)) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
>               + "; different batch id (" + mark + ")");
>         }
>         return;
>
> Any ideas? Is this a bug?
>
> On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster <[email protected]> wrote:
>
>> Hi Lewis, thank you very much for your answer. I do not know how, but I
>> solved it. "different batch id (null)" no longer appears. In any case, I'm
>> using Nutch 2.1.
>> Good day, Carmine
>>
>> 2013/4/24 Lewis John Mcgibbney <[email protected]>
>>
>>> Hi Carmine,
>>>
>>> CC: [email protected]
>>>
>>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <[email protected]> wrote:
>>>
>>>> I configured Nutch and MySQL following this guide
>>>> (http://nlp.solutions.asia/?p=180). Everything worked fine, but at some
>>>> point I find all elements in the database with baseUrl=null, content=null.
>>>> Nutch is not parsing many URLs.
>>>> I receive this message in the Nutch console:
>>>> Skipping http://myurlForParsing.it; different batch id (null)
>>>>
>>>> How can I fix this?
>>>>
>>> This is actually something which I've wondered about for a while and it
>>> was on my TODO list of things to address!
>>> I want to know how to reproduce "different batch id (null)".
>>> Which version of 2.x are you on? 2.1?
>>> Thanks
>>> Lewis
>
> --
> *Lewis*

--
*Lewis*
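
P.S. To make the logging concern concrete, here is a minimal, self-contained sketch of the kind of message I would prefer. It is not the Nutch code: the class BatchCheckSketch and both methods are hypothetical, plain Strings stand in for Utf8, and the check assumes that a missing mark means "do not process", which is only what the "different batch id (null)" output suggests.

```java
public class BatchCheckSketch {

  // Assumed behaviour (not taken from the Nutch source): a page with no
  // mark is never processed; otherwise the mark must equal the job's batch id.
  static boolean shouldProcess(String mark, String batchId) {
    if (mark == null) {
      return false;
    }
    return mark.equals(batchId);
  }

  // Hypothetical helper: builds a skip message that reports both sides of
  // the comparison and distinguishes "no mark at all" from "other batch".
  static String skipMessage(String url, String mark, String batchId) {
    if (mark == null) {
      return "Skipping " + url + "; no fetch mark set (job batch id: "
          + batchId + ")";
    }
    return "Skipping " + url + "; page batch id (" + mark
        + ") differs from job batch id (" + batchId + ")";
  }

  public static void main(String[] args) {
    String url = "http://myurlForParsing.it";
    String jobBatchId = "batch-2"; // made-up value for illustration

    // Try a page without a mark, then a page from another batch.
    for (String mark : new String[] {null, "batch-1"}) {
      if (!shouldProcess(mark, jobBatchId)) {
        System.out.println(skipMessage(url, mark, jobBatchId));
      }
    }
  }
}
```

With a message like this, a null mark would show up as "no fetch mark set" instead of looking like a batch id mismatch, and the job's own batch id would be visible in the log.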

