Re: possibly wrong code in class org.apache.nutch.indexer.IndexerMapReduce , nutch-1.13

Sebastian Nagel Sun, 10 Sep 2017 04:01:22 -0700

Sorry, the right link to open an issue is
   https://issues.apache.org/jira/projects/NUTCH


Thanks,
Sebastian

On 09/10/2017 12:58 PM, Sebastian Nagel wrote:
> Hi Junqiang,
> 
> thanks for the careful code review.
> 
> Well, the answer isn't that trivial. In general, the code is right,
> because if any of these values is null, the indexing or scoring
> filters called later will fail with an NPE.
> However, the comment is wrong or incomplete:
> - "only have inlinks" (if the links have not yet been added to the
>   CrawlDb; this is certainly the most common case)
> - but there are other possible ways the condition may become true:
>   * a URL / CrawlDatum removed from CrawlDb
>     (in combination with a parallelized workflow)
>   * parsing skipped or failed
>     (need to check whether this may happen)
> 
> Feel free to open an issue on http://issues.apache.org/jira/NUTCH
> to make the code better documented / commented.
> 
> Thanks,
> Sebastian
> 
> 
> On 09/09/2017 01:24 PM, Junqiang Zhang wrote:
>> Hello,
>>
>> I am using Nutch version 1.13. The logical operators in lines 259 to
>> 262 of the class org.apache.nutch.indexer.IndexerMapReduce may be
>> wrong.
>>
>> The logical operators used are OR ||. I think the correct logical
>> operators should be AND &&. The comment in line 261 says "only have
>> inlinks", which also indicates the logical operators should be AND.
>>
>> Lines 259 to 262 of org.apache.nutch.indexer.IndexerMapReduce are copied
>> below.
>>
>>     if (fetchDatum == null || dbDatum == null || parseText == null
>>         || parseData == null) {
>>       return; // only have inlinks
>>     }
>>
>> Could the development team please have a look at the code and
>> determine whether it is wrong? Thanks.
>>
>> Best,
>> Junqiang
>>
>
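To make the difference between the two guards concrete, here is a minimal, hypothetical Java sketch (not Nutch code). Plain Objects stand in for CrawlDatum, ParseText and ParseData, and indexDocument() is an invented stand-in for a later indexing or scoring filter that dereferences its inputs. It walks through two of the scenarios named in the reply: the "only inlinks" case, where all four values are null, and a CrawlDatum removed from the CrawlDb, where only dbDatum is null. With ||, both are skipped. With &&, the second one gets through and the later filter fails with a NullPointerException.

```java
/**
 * Hypothetical illustration (not Nutch code): compares the OR guard used in
 * IndexerMapReduce (Nutch 1.13) with the AND guard proposed in the question.
 * Plain Objects stand in for CrawlDatum, ParseText and ParseData.
 */
public class GuardDemo {

  // Guard as written in Nutch 1.13: skip the record if ANY input is missing.
  static boolean skipWithOr(Object fetchDatum, Object dbDatum,
                            Object parseText, Object parseData) {
    return fetchDatum == null || dbDatum == null || parseText == null
        || parseData == null;
  }

  // Proposed alternative: skip the record only if ALL inputs are missing.
  static boolean skipWithAnd(Object fetchDatum, Object dbDatum,
                             Object parseText, Object parseData) {
    return fetchDatum == null && dbDatum == null && parseText == null
        && parseData == null;
  }

  // Hypothetical stand-in for a later indexing/scoring filter that
  // dereferences every input and therefore throws on a null value.
  static String indexDocument(Object fetchDatum, Object dbDatum,
                              Object parseText, Object parseData) {
    return fetchDatum.toString() + dbDatum.toString()
        + parseText.toString() + parseData.toString();
  }

  public static void main(String[] args) {
    // Case 1: "only have inlinks", all four values are null.
    System.out.println(skipWithOr(null, null, null, null));   // true
    System.out.println(skipWithAnd(null, null, null, null));  // true

    // Case 2: CrawlDatum removed from the CrawlDb, everything else present.
    Object fetch = "fetch", text = "text", data = "data";
    System.out.println(skipWithOr(fetch, null, text, data));  // true: skipped
    System.out.println(skipWithAnd(fetch, null, text, data)); // false: proceeds
    try {
      indexDocument(fetch, null, text, data);
    } catch (NullPointerException e) {
      System.out.println("NPE in later filter");
    }
  }
}
```

Running main() prints true, true, true, false and then "NPE in later filter", which is the failure the || guard prevents.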

