Additionally, why do we log at DEBUG level that there is a
"different batch id (" + mark + ")"? Should we not log what the different
batch id actually is, as opposed to the FETCH_MARK mark? In this case the
mark is null, which is useless to us.

This DEBUG logging is also present in the following classes and, until I
understand it, I am not really happy with it being present:
IndexerJob
ParserJob
FetcherJob
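
A minimal sketch of what a more informative skip message could look like,
assuming the check treats a null mark as "do not process". This is not the
Nutch source: BatchMarkCheck, skipMessage and the ALL_BATCHES sentinel are
hypothetical names, and plain Strings stand in for the Utf8 mark so the
example runs on its own.

```java
// Hypothetical stand-in for the check discussed in this thread; not Nutch's NutchJob.
final class BatchMarkCheck {

    // Assumed sentinel meaning "process every batch"; the real constant may differ.
    static final String ALL_BATCHES = "-all";

    // Returns true when a page carrying `mark` should be handled by a job run for `batchId`.
    static boolean shouldProcess(String mark, String batchId) {
        if (mark == null) {
            return false; // the previous step never set a mark on this page
        }
        return ALL_BATCHES.equals(batchId) || mark.equals(batchId);
    }

    // Builds a skip message that separates "no mark at all" from "mark of another batch".
    static String skipMessage(String url, String mark, String batchId) {
        if (mark == null) {
            return "Skipping " + url + "; no fetch mark set (job batch id " + batchId + ")";
        }
        return "Skipping " + url + "; different batch id (page " + mark
            + ", job " + batchId + ")";
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        String jobBatchId = "batch-1";
        // Walk through an unmarked page, a page from another batch, and a matching page.
        for (String mark : new String[] {null, "batch-2", "batch-1"}) {
            if (!shouldProcess(mark, jobBatchId)) {
                System.out.println(skipMessage(url, mark, jobBatchId));
            } else {
                System.out.println("Processing " + url);
            }
        }
    }
}
```

Logging both the page's mark and the job's batch id would make it clear
whether the page was skipped because it belongs to another batch or because
it was never marked in the first place.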


On Thu, Apr 25, 2013 at 3:20 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi,
> Within ParserJob#map, I am keen to see how the situation arises where
> !NutchJob.shouldProcess(mark, batchId) evaluates to true because
> Mark.FETCH_MARK.checkMark(page) returns null.
>
> In what scenarios is it possible to have a page which we attempt to fetch,
> which has a null value for FETCH_MARK?
>
>     @Override
>     public void map(String key, WebPage page, Context context)
>         throws IOException, InterruptedException {
>       Utf8 mark = Mark.FETCH_MARK.checkMark(page);
>       String unreverseKey = TableUtil.unreverseUrl(key);
>       if (batchId.equals(REPARSE)) {
>         LOG.debug("Reparsing " + unreverseKey);
>       } else {
>         if (!NutchJob.shouldProcess(mark, batchId)) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
>                 + "; different batch id (" + mark + ")");
>           }
>           return;
>
> Any ideas? Is this a bug?
>
>
>
> On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster <[email protected]
> > wrote:
>
>> Hi Lewis, thank you very much for your answer. I do not know how, but I
>> solved it. The "different batch id (null)" message no longer appears. In
>> any case, I'm using Nutch 2.1.
>> Good day, Carmine
>>
>>
>> 2013/4/24 Lewis John Mcgibbney <[email protected]>
>>
>>>
>>> Hi Carmine,
>>>
>>> CC: [email protected]
>>>
>>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <
>>> [email protected]> wrote:
>>>
>>>> I configured Nutch and MySQL following this guide (
>>>> http://nlp.solutions.asia/?p=180). Everything worked fine, but at some
>>>> point I found that all entries in the database have baseUrl=null and
>>>> content=null. Nutch is not parsing many URLs. I receive this message in
>>>> the Nutch console:
>>>> Skipping http://myurlForParsing.it; different batch id (null)
>>>>
>>>> How can I fix this?
>>>>
>>>>
>>>>
>>> This is actually something which I've wondered about for a while and it
>>> was on my TODO list of things to address!!!
>>> I want to know how to reproduce different batch id (null).
>>> Which version of 2.x are you on? 2.1?
>>> Thanks
>>> Lewis
>>>
>>
>>
>
>
> --
> *Lewis*
>



-- 
*Lewis*
