And the logging for null? It is literally useless as far as I can see.
Although it is DEBUG, it should (IMO) show the batch id for a null mark (so one
can take, if required, the necessary steps to resolve the null value), as
opposed to a null value for the mark... no?
Thank you Roland.
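
To make the suggestion concrete, here is a minimal sketch of what a more informative skip message could look like. It is not the Nutch code: the class name, the helper skipMessage, and the simplified shouldProcess (plain String instead of Utf8, null mark or mismatch means skip, no handling of any special batch ids the real method may support) are all assumptions for illustration only.

```java
// Hypothetical sketch, not the actual Nutch implementation.
class SkipMessageSketch {

    // Assumed stand-in for NutchJob.shouldProcess: a page is processed only
    // when it carries a mark equal to the batch id of the running job.
    static boolean shouldProcess(String mark, String batchId) {
        return mark != null && mark.equals(batchId);
    }

    // Builds a DEBUG message that always names the job's batch id and
    // distinguishes "no mark at all" from "mark of another batch".
    static String skipMessage(String url, String mark, String batchId) {
        if (mark == null) {
            return "Skipping " + url + "; no fetch mark set (job batch id: "
                + batchId + ")";
        }
        return "Skipping " + url + "; page batch id (" + mark
            + ") differs from job batch id (" + batchId + ")";
    }

    public static void main(String[] args) {
        String jobBatchId = "batch-2";            // made-up batch ids
        String[] marks = {null, "batch-1", "batch-2"};
        for (String mark : marks) {
            if (!shouldProcess(mark, jobBatchId)) {
                System.out.println(
                    skipMessage("http://example.com/", mark, jobBatchId));
            }
        }
    }
}
```

With a message along these lines, a null mark would at least tell us which batch the job was looking for, rather than only printing "(null)".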

On Friday, April 26, 2013, Roland von Herget <[email protected]>
wrote:
> Hi Lewis,
>
> As you have to load all backend entries, since there are no filters ("where"
> clauses in SQL) in Gora, you will see a lot of entries with the wrong
> fetchmark.
> Null values are possible too; think about these steps: inject -> generate
> -> inject -> fetch
> The second inject will leave entries in the db without fetchmarks, and the
> fetcher will see them later.
>
> --Roland
>
>
> On Fri, Apr 26, 2013 at 12:30 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Additionally, why do we log.DEBUG that there is a different batch id (" +
>> mark + ")"? Should we not log what the different batch id is, as opposed
>> to the FETCH_MARK mark? ...which in this case is null, which is useless to
>> us.
>>
>> This DEBUG logging is also present in the following classes and until I
>> understand it, I am not really happy with it being present.
>> IndexerJob
>> ParserJob
>> FetcherJob
>>
>>
>> On Thu, Apr 25, 2013 at 3:20 PM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>> > Hi,
>> > Within ParserJob#map, I am keen to see how the situation arises where
>> > the !NutchJob.shouldProcess returns true due to the fact that
>> > Mark.FETCH_MARK.checkMark(page) returns value null.
>> >
>> > In what scenarios is it possible to have a page which we attempt to
>> > fetch, which has a null value for FETCH_MARK?
>> >
>> >     @Override
>> >     public void map(String key, WebPage page, Context context)
>> >         throws IOException, InterruptedException {
>> >       Utf8 mark = Mark.FETCH_MARK.checkMark(page);
>> >       String unreverseKey = TableUtil.unreverseUrl(key);
>> >       if (batchId.equals(REPARSE)) {
>> >         LOG.debug("Reparsing " + unreverseKey);
>> >       } else {
>> >         if (!NutchJob.shouldProcess(mark, batchId)) {
>> >           if (LOG.isDebugEnabled()) {
>> >             LOG.debug("Skipping " + TableUtil.unreverseUrl(key)
>> >                 + "; different batch id (" + mark + ")");
>> >           }
>> >           return;
>> >
>> > Any ideas? Is this a bug?
>> >
>> >
>> >
>> > On Thu, Apr 25, 2013 at 7:31 AM, Carmine Paternoster <
>> > [email protected]> wrote:
>> >
>> >> Hi Lewis, thank you very much for your answer. I do not know how, but
>> >> I solved it. "different batch id (null)" no longer appears. In any
>> >> case, I'm using Nutch 2.1.
>> >> Good day, Carmine
>> >>
>> >>
>> >> 2013/4/24 Lewis John Mcgibbney <[email protected]>
>> >>
>> >>>
>> >>> Hi Carmine,
>> >>>
>> >>> CC: [email protected]
>> >>>
>> >>> On Wed, Apr 24, 2013 at 3:13 AM, Carmine Paternoster <
>> >>> [email protected]> wrote:
>> >>>
>> >>>> I configured Nutch and MySQL following this guide (
>> >>>> http://nlp.solutions.asia/?p=180). Everything worked fine, but at
>> >>>> some point I found all elements in the database with baseUrl=null,
>> >>>> content=null. Nutch is not parsing many URLs. I receive this message
>> >>>> in the Nutch console:
>> >>>> Skipping http://myurlForParsing.it; different batch id (null)
>> >>>>
>> >>>> How can I fix this?
>> >>>>
>> >>>>
>> >>>>
>> >>> This is actually something which I've wondered about for a while and
>> >>> it was on my TODO list of things to address!!!
>> >>> I want to know how to reproduce different batch id (null).
>> >>> Which version of 2.x are you on? 2.1?
>> >>> Thanks
>> >>> Lewis
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > *Lewis*
>> >
>>
>>
>>
>> --
>> *Lewis*
>>
>

-- 
*Lewis*
