Hi All,

I went ahead and added some documentation to the wiki on this topic:
http://s.apache.org/Jb6
Please add to it where you see fit. I still think that the logging is
incorrect on this one.

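To make the logging point concrete, here is a minimal standalone sketch of what a more useful skip message could look like. It does not use any Nutch classes. `SkipLogSketch`, `skipMessage` and the `ALL_BATCH_ID` value are hypothetical names for illustration, marks are plain Strings rather than Utf8, and `shouldProcess` only mirrors what I assume NutchJob.shouldProcess does (a null mark is never processed). Treat it as a sketch under those assumptions, not as the actual Nutch code.

```java
// Standalone sketch: no Nutch classes are used and every name here is hypothetical.
class SkipLogSketch {

  // Assumed stand-in for a "process all batches" id; not taken from the thread.
  static final String ALL_BATCH_ID = "-all";

  // Assumed behaviour of NutchJob.shouldProcess: a page with a null mark is
  // never processed; otherwise the mark must equal the job's batch id, unless
  // the job was started for all batches.
  static boolean shouldProcess(String mark, String batchId) {
    if (mark == null) {
      return false;
    }
    return batchId.equals(ALL_BATCH_ID) || mark.equals(batchId);
  }

  // Hypothetical helper: builds a skip message that always names the job's
  // batch id and states explicitly when the page carries no mark at all.
  static String skipMessage(String url, String mark, String batchId) {
    if (mark == null) {
      return "Skipping " + url + "; no mark set (job batch id: " + batchId + ")";
    }
    return "Skipping " + url + "; mark " + mark
        + " does not match job batch id " + batchId;
  }

  public static void main(String[] args) {
    String batchId = "batch-1";
    // One page without a mark, one from another batch, one from this batch.
    String[] marks = {null, "batch-0", "batch-1"};
    for (String mark : marks) {
      if (!shouldProcess(mark, batchId)) {
        System.out.println(skipMessage("http://example.org/", mark, batchId));
      }
    }
  }
}
```

With a message along these lines, a null mark is reported as a missing mark next to the batch id the job was started with, instead of printing "(null)" on its own.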
On Fri, Apr 26, 2013 at 12:47 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> and the logging for null? it is literally useless as far as I can see.
> although it is DEBUG it should (imo) show batch id for null mark (so one
> can make, if they so require, necessary steps to resolve the null value) as
> oppose to null value for mark... no?
> Thank you Roland.
>
>
> On Friday, April 26, 2013, Roland von Herget <[email protected]>
> wrote:
> > Hi Lewis,
> >
> > as you have to load all backend entries, as there are no filters ("where"
> > clauses in SQL) in gora, you will see a lot of entries with wrong fetchmark.
> > null values are possible, too, think about these steps:
> > inject -> generate -> inject -> fetch
> > The second inject will leave entries in the db without fetchmarks seen by
> > the fetcher later.
> >
> > --Roland
> >
> >
> > On Fri, Apr 26, 2013 at 12:30 AM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> >> Additionally, why do we log.DEBUG that there is a different batch id (" +
> >> mark + ")", should we not log what the different batch id is, as oppose to
> >> the FETCH_MARK mark? ...which in this case is null which is useless to us.
> >>
> >> This DEBUG logging is also present in the following classes and until I
> >> understand it, I am not really happy with it being present.
> >> IndexerJob
> >> ParserJob
> >> FetcherJob
> >>
> >>
> >> On Thu, Apr 25, 2013 at 3:20 PM, Lewis John Mcgibbney <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> > Within ParserJob#map, I am keen to see how the situation arises where the
> >> > !NutchJob.shouldProcess returns true due to the fact that
> >> > Mark.FETCH_MARK.checkMark(page) returns value null.
> >> >
> >> > In what scenarios is it possible to have a page which we attempt to fetch,
> >> > which has a null value for FETCH_MARK?
> >> >
> >> > @Override
> >> > public void map(String key, WebPage page, Context context)
> >> >     throws IOException, InterruptedException {
> >> >   Utf8 mark = Mark.FETCH_MARK.checkMark(page);
> >> >   String unreverseKey = TableUtil.unreverseUrl(key);
> >> >   if (batchId.equals(REPARSE)) {
> >> >     LOG.debug("Reparsing " + unreverseKey);
> >> >   } else {
> >> >     if (!NutchJob.shouldProcess(mark, batchId)) {
> >> >       if (LOG.isDebugEnabled()) {
> >> >         LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
> >> >             + "; different batch id (" + mark + ")");
> >> >       }
> >> >       return;
> >> >
> >> > Any ideas? Is this a bug?
> >> >
> >> >
> >> > On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster <
> >> > [email protected]> wrote:
> >> >
> >> >> Hi Lewis, thank you very much, for your answer. I do not know how, but I
> >> >> solved it. No longer appear "different batch id (null)". In any case, I'm
> >> >> using Nutch 2.1
> >> >> Good day, Carmine
> >> >>
> >> >>
> >> >> 2013/4/24 Lewis John Mcgibbney <[email protected]>
> >> >>
> >> >>> Hi Carmine,
> >> >>>
> >> >>> CC: [email protected]
> >> >>>
> >> >>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> I configured Nutch and mySql following this guide (
> >> >>>> http://nlp.solutions.asia/?p=180). everything worked fine, but at some
> >> >>>> point in the database I find all elements with baseUrl=null, content=null.
> >> >>>> Nutch not parsing, many url. I receive this message in Nutch console:
> >> >>>> Skipping http://myurlForParsing.it; different batch id (null)
> >> >>>>
> >> >>>> How can I fix?
> >> >>>>
> >> >>> This is actually something which I've wondered about for a while and it
> >> >>> was on my TODO list of things to address!!!
> >> >>> I want to know how to reproduce different batch id (null).
> >> >>> Which version of 2.x are you on? 2.1?
> >> >>> Thanks
> >> >>> Leewis
> >> >>>
> >> >>
> >> >
> >> > --
> >> > *Lewis*
> >> >
> >>
> >> --
> >> *Lewis*
> >
>
> --
> *Lewis*

--
*Lewis*

