Re: possibly wrong code in class org.apache.nutch.indexer.IndexerMapReduce , nutch-1.13

Sebastian Nagel Sun, 10 Sep 2017 03:58:43 -0700

Hi Junqiang,

thanks for the careful code review.


Well, the answer isn't that trivial. In general, the code is right:
if any of the four values is null, the indexing or scoring filters
called later will fail with a NullPointerException (NPE).
However, the comment is wrong or at least incomplete:
- "only have inlinks" is surely the most common case
  (the links have not yet been added to the CrawlDb)
- but there are other ways the condition may become true:
  * a URL / CrawlDatum was removed from the CrawlDb
    (in combination with a parallelized workflow)
  * parsing was skipped or failed
    (need to check whether this may happen)
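
To illustrate why || is the right operator here, a minimal sketch.
This is not the actual Nutch code: the class IndexGuardSketch and both
method names are made up for illustration, and plain Objects stand in
for the CrawlDatum / ParseText / ParseData values.

```java
public class IndexGuardSketch {

    // Mirrors the existing guard: a document is indexable only if
    // all four values are present (the OR variant skips on ANY null).
    static boolean canIndex(Object fetchDatum, Object dbDatum,
                            Object parseText, Object parseData) {
        if (fetchDatum == null || dbDatum == null || parseText == null
                || parseData == null) {
            return false; // at least one required value is missing
        }
        return true;
    }

    // Hypothetical AND variant as proposed: skips only if ALL values are null.
    static boolean canIndexWithAnd(Object fetchDatum, Object dbDatum,
                                   Object parseText, Object parseData) {
        if (fetchDatum == null && dbDatum == null && parseText == null
                && parseData == null) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // URL known only from inlinks: both variants skip it.
        System.out.println(canIndex(null, null, null, null));        // false
        System.out.println(canIndexWithAnd(null, null, null, null)); // false

        // Fetched, but parsing skipped or failed: only the OR variant skips it.
        System.out.println(canIndex("fetch", "db", null, null));        // false
        System.out.println(canIndexWithAnd("fetch", "db", null, null)); // true

        // Everything present: both variants let it through.
        System.out.println(canIndex("fetch", "db", "text", "data"));    // true
    }
}
```

With the AND variant, the "parsing failed" record would be passed on
with parseText and parseData still null, which is exactly the situation
where the filters called later would run into the NPE.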

Feel free to open an issue on http://issues.apache.org/jira/NUTCH
to make the code better documented / commented.

Thanks,
Sebastian


On 09/09/2017 01:24 PM, Junqiang Zhang wrote:
> Hello,
> 
> I am using nutch version 1.13. There might be mistakenly used logical
> operators in the code from line 259 to 262 of the class
> org.apache.nutch.indexer.IndexerMapReduce.
> 
> The logical operators used are OR ||. I think the correct logical
> operators should be AND &&. The comment in line 261 says "only have
> inlinks", which also indicates the logical operators should be AND.
> 
> Line 259 to 262 of org.apache.nutch.indexer.IndexerMapReduce are copied below.
> 
>     if (fetchDatum == null || dbDatum == null || parseText == null
>         || parseData == null) {
>       return; // only have inlinks
>     }
> 
> Development team please have a look at the code and determine whether
> it is wrong. Thanks.
> 
> Best,
> Junqiang
>
