Sorry, the right link to open an issue is https://issues.apache.org/jira/projects/NUTCH
Thanks,
Sebastian

On 09/10/2017 12:58 PM, Sebastian Nagel wrote:
> Hi Junqiang,
>
> thanks for the careful code review.
>
> Well, the answer isn't that trivial. In general, the code is right
> because any of the values being null will make the indexing or
> scoring filters called later fail with a NPE.
> However, the comment is wrong or incomplete:
> - "only have inlinks" (if links not yet added to CrawlDb,
>   then the most common case for sure)
> - but there are other possible ways the condition may become true:
>   * a URL / CrawlDatum removed from CrawlDb
>     (in combination with a parallelized workflow)
>   * parsing skipped or failed
>     (need to check whether this may happen)
>
> Feel free to open an issue on http://issues.apache.org/jira/NUTCH
> to make the code better documented / commented.
>
> Thanks,
> Sebastian
>
>
> On 09/09/2017 01:24 PM, Junqiang Zhang wrote:
>> Hello,
>>
>> I am using nutch version 1.13. There might be mistakenly used logical
>> operators in the code from line 259 to 262 of the class
>> org.apache.nutch.indexer.IndexerMapReduce.
>>
>> The logical operators used are OR ||. I think the correct logical
>> operators should be AND &&. The comment in line 261 says "only have
>> inlinks", which also indicates the logical operators should be AND.
>>
>> Line 259 to 262 of org.apache.nutch.indexer.IndexerMapReduce are copied
>> below.
>>
>> if (fetchDatum == null || dbDatum == null || parseText == null
>>     || parseData == null) {
>>   return; // only have inlinks
>> }
>>
>> Development team please have a look at the code and determine whether
>> it is wrong. Thanks.
>>
>> Best,
>> Junqiang
>>
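
PS: To illustrate why OR is the safer guard here, a minimal sketch follows. It is not the Nutch source: the class and method names (NullGuardSketch, skipWithOr, skipWithAnd) are hypothetical, and plain Object parameters stand in for the CrawlDatum, ParseText and ParseData values. It only shows how the two conditions differ when some, but not all, of the four values are missing.

```java
// Hypothetical sketch, not code from org.apache.nutch.indexer.IndexerMapReduce.
// Object parameters are placeholders for the real Nutch types.
class NullGuardSketch {

  // OR guard: skip the document as soon as ANY required value is missing,
  // so later code never dereferences a null.
  static boolean skipWithOr(Object fetchDatum, Object dbDatum,
      Object parseText, Object parseData) {
    return fetchDatum == null || dbDatum == null
        || parseText == null || parseData == null;
  }

  // AND guard: skip only when ALL values are missing. A document with,
  // say, fetch and db data but no parse result would pass through.
  static boolean skipWithAnd(Object fetchDatum, Object dbDatum,
      Object parseText, Object parseData) {
    return fetchDatum == null && dbDatum == null
        && parseText == null && parseData == null;
  }

  public static void main(String[] args) {
    Object fetch = "fetchDatum";
    Object db = "dbDatum";

    // Parsing skipped or failed: parseText and parseData are null.
    System.out.println("OR guard skips:  " + skipWithOr(fetch, db, null, null));
    System.out.println("AND guard skips: " + skipWithAnd(fetch, db, null, null));
  }
}
```

With the AND variant, the partially populated case would reach the filters called later and could fail there with a NullPointerException, which is the situation the existing OR check prevents.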