Hi All,
I went ahead and added some documentation to the wiki on this topic:
http://s.apache.org/Jb6
Please add to it where you see fit.
I still think that the logging is incorrect on this one.
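
To illustrate what more useful logging could look like, here is a minimal sketch. It is not Nutch code: SkipMessages, describeSkip and the simplified shouldProcess are hypothetical stand-ins, plain Strings replace Utf8, and the batch id value is made up. The idea is only that a null mark and a mismatched mark produce different messages, and that the current batch id is always shown.

```java
// Hypothetical helper for illustration only; none of these names exist in Nutch.
final class SkipMessages {

  private SkipMessages() {}

  // Simplified stand-in for NutchJob.shouldProcess (an assumption, not the
  // real implementation): process only when the mark equals the batch id.
  static boolean shouldProcess(String mark, String batchId) {
    return mark != null && mark.equals(batchId);
  }

  // Builds a skip message that separates "no mark at all" from
  // "mark belongs to another batch", and always reports the current batch id.
  static String describeSkip(String url, String mark, String batchId) {
    if (mark == null) {
      return "Skipping " + url + "; no fetch mark set (current batch id: "
          + batchId + ")";
    }
    return "Skipping " + url + "; fetch mark " + mark
        + " does not match current batch id " + batchId;
  }

  public static void main(String[] args) {
    String batchId = "1366931234-42"; // made-up batch id
    String[] marks = {null, "1366900000-7", batchId};
    for (String mark : marks) {
      if (!shouldProcess(mark, batchId)) {
        System.out.println(describeSkip("http://example.org/", mark, batchId));
      }
    }
  }
}
```

With something along these lines, a null mark would no longer be reported as a "different batch id", which is the part of the current message I find misleading.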


On Fri, Apr 26, 2013 at 12:47 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> And the logging for null? It is literally useless as far as I can see.
> Although it is DEBUG, it should (imo) show the batch id for a null mark (so
> one can, if required, take the necessary steps to resolve the null value) as
> opposed to the null value for the mark... no?
> Thank you Roland.
>
>
> On Friday, April 26, 2013, Roland von Herget <[email protected]>
> wrote:
> > Hi Lewis,
> >
> > as you have to load all backend entries, as there are no filters ("where"
> > clauses in SQL) in gora, you will see a lot of entries with wrong fetchmark.
> > null values are possible, too, think about these steps: inject -> generate
> > -> inject -> fetch
> > The second inject will leave entries in the db without fetchmarks seen by
> > the fetcher later.
> >
> > --Roland
> >
> >
> > On Fri, Apr 26, 2013 at 12:30 AM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> >> Additionally, why do we log.DEBUG that there is a different batch id (" +
> >> mark + ")"? Should we not log what the different batch id is, as opposed to
> >> the FETCH_MARK mark? ...which in this case is null, which is useless to us.
> >>
> >> This DEBUG logging is also present in the following classes and until I
> >> understand it, I am not really happy with it being present.
> >> IndexerJob
> >> ParserJob
> >> FetcherJob
> >>
> >>
> >> On Thu, Apr 25, 2013 at 3:20 PM, Lewis John Mcgibbney <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> > Within ParserJob#map, I am keen to see how the situation arises where
> >> > !NutchJob.shouldProcess evaluates to true because
> >> > Mark.FETCH_MARK.checkMark(page) returns null.
> >> >
> >> > In what scenarios is it possible to have a page which we attempt to fetch,
> >> > which has a null value for FETCH_MARK?
> >> >
> >> >     @Override
> >> >     public void map(String key, WebPage page, Context context)
> >> >         throws IOException, InterruptedException {
> >> >       Utf8 mark = Mark.FETCH_MARK.checkMark(page);
> >> >       String unreverseKey = TableUtil.unreverseUrl(key);
> >> >       if (batchId.equals(REPARSE)) {
> >> >         LOG.debug("Reparsing " + unreverseKey);
> >> >       } else {
> >> >         if (!NutchJob.shouldProcess(mark, batchId)) {
> >> >           if (LOG.isDebugEnabled()) {
> >> >             LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
> >> >                 + "; different batch id (" + mark + ")");
> >> >           }
> >> >           return;
> >> >
> >> > Any ideas? Is this a bug?
> >> >
> >> >
> >> >
> >> > On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster <
> >> > [email protected]> wrote:
> >> >
> >> >> Hi Lewis, thank you very much for your answer. I do not know how, but I
> >> >> solved it. The "different batch id (null)" message no longer appears.
> >> >> In any case, I'm using Nutch 2.1.
> >> >> Good day, Carmine
> >> >>
> >> >>
> >> >> 2013/4/24 Lewis John Mcgibbney <[email protected]>
> >> >>
> >> >>>
> >> >>> Hi Carmine,
> >> >>>
> >> >>> CC: [email protected]
> >> >>>
> >> >>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> I configured Nutch and MySQL following this guide (
> >> >>>> http://nlp.solutions.asia/?p=180). Everything worked fine, but at some
> >> >>>> point I found all elements in the database with baseUrl=null,
> >> >>>> content=null. Nutch is not parsing many URLs. I receive this message in
> >> >>>> the Nutch console:
> >> >>>> Skipping http://myurlForParsing.it; different batch id (null)
> >> >>>>
> >> >>>> How can I fix this?
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>> This is actually something which I've wondered about for a while and it
> >> >>> was on my TODO list of things to address!!!
> >> >>> I want to know how to reproduce different batch id (null).
> >> >>> Which version of 2.x are you on? 2.1?
> >> >>> Thanks
> >> >>> Lewis
> >> >>>
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > *Lewis*
> >> >
> >>
> >>
> >>
> >> --
> >> *Lewis*
> >>
> >
>
> --
> *Lewis*
>
>


-- 
*Lewis*
