Hi,
Within ParserJob#map, I would like to understand how we reach the case where
!NutchJob.shouldProcess(mark, batchId) evaluates to true because
Mark.FETCH_MARK.checkMark(page) returns null.

In what scenarios can a page that we have attempted to fetch end up with a
null value for FETCH_MARK?

    @Override
    public void map(String key, WebPage page, Context context)
        throws IOException, InterruptedException {
      Utf8 mark = Mark.FETCH_MARK.checkMark(page);
      String unreverseKey = TableUtil.unreverseUrl(key);
      if (batchId.equals(REPARSE)) {
        LOG.debug("Reparsing " + unreverseKey);
      } else {
        if (!NutchJob.shouldProcess(mark, batchId)) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
                + "; different batch id (" + mark + ")");
          }
          return;

Any ideas? Is this a bug?
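
For reference, here is a minimal, self-contained sketch of how I understand the
check to behave. It is an assumption on my part, not a copy of the Nutch source:
it uses String in place of Utf8, and the class name ShouldProcessSketch and the
ALL constant (with its "-all" value) are hypothetical stand-ins. The point is
only that a null mark is rejected before any batch id comparison happens, which
would explain the "different batch id (null)" message.

```java
public class ShouldProcessSketch {
  // Hypothetical stand-in for the "process every batch" id; the real
  // Nutch constant and its value may differ.
  static final String ALL = "-all";

  // Simplified model of the check: String is used instead of Utf8.
  static boolean shouldProcess(String mark, String batchId) {
    if (mark == null) {
      // The page carries no fetch mark at all, so it is skipped
      // regardless of which batch id was requested.
      return false;
    }
    // Skip pages marked with a different batch, unless all batches
    // were requested.
    if (!batchId.equals(ALL) && !mark.equals(batchId)) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(shouldProcess(null, "batch-1"));      // false
    System.out.println(shouldProcess("batch-1", "batch-1")); // true
    System.out.println(shouldProcess("batch-0", "batch-1")); // false
    System.out.println(shouldProcess("batch-0", ALL));       // true
  }
}
```

If this reading is right, the log line is misleading when the mark is null: the
page is skipped because it was never marked as fetched, not because its batch id
differs.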



On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster
<[email protected]> wrote:

> Hi Lewis, thank you very much for your answer. I do not know how, but I
> solved it. The "different batch id (null)" message no longer appears. In any
> case, I'm using Nutch 2.1.
> Good day, Carmine
>
>
> 2013/4/24 Lewis John Mcgibbney <[email protected]>
>
>>
>> Hi Carmine,
>>
>> CC: [email protected]
>>
>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <
>> [email protected]> wrote:
>>
>>> I configured Nutch and MySQL following this guide (
>>> http://nlp.solutions.asia/?p=180). Everything worked fine, but at some
>>> point I found all elements in the database with baseUrl=null, content=null.
>>> Nutch is not parsing many URLs. I receive this message in the Nutch console:
>>> Skipping http://myurlForParsing.it; different batch id (null)
>>>
>>> How can I fix this?
>>>
>>>
>>>
>> This is actually something which I've wondered about for a while and it
>> was on my TODO list of things to address!!!
>> I want to know how to reproduce different batch id (null).
>> Which version of 2.x are you on? 2.1?
>> Thanks
>> Lewis
>>
>
>


-- 
*Lewis*
