Hit limiter off-by-one bug
--------------------------
Key: NUTCH-5
URL: http://issues.apache.org/jira/browse/NUTCH-5
Project: Nutch
Type: Bug
Components: searcher
Reporter: Andy Liu
Priority: Minor
When re-searching for more raw hits, the first result of the next site is
skipped.
From NutchBean.java:
*snip*
    // get the next raw hit
    if (rawHitNum >= hits.getLength()) {
      // optimize query by prohibiting more matches on some excluded sites
      Query optQuery = (Query) query.clone();
      for (int i = 0; i < excludedSites.size(); i++) {
        if (i == MAX_PROHIBITED_TERMS) {
          break;
        }
        optQuery.addProhibitedTerm(((String) excludedSites.get(i)),
                                   IndexSearcher.HIT_LIMIT_FIELD);
      }
      numHitsRaw = (int) (numHitsRaw * RAW_HITS_FACTOR);
      LOG.info("re-searching for " + numHitsRaw +
               " raw hits, query: " + optQuery);
      hits = searcher.search(optQuery, numHitsRaw);
      LOG.info("found " + hits.getTotal() + " raw hits");
      rawHitNum = 0;
      continue;
    }
*snip*
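rawHitNum is reset to 0, but the continue statement still runs the enclosing
for loop's increment, so the next iteration starts at index 1 and the first
hit of the new result set is skipped.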
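
The effect can be reproduced outside Nutch with a minimal sketch. Everything
in it is hypothetical and only for illustration: the class HitLimiterSketch,
the collect method, and the string arrays standing in for search results are
not part of NutchBean. Resetting the counter to -1 is one possible way to
compensate for the loop increment, not necessarily the fix applied in Nutch.

```java
import java.util.ArrayList;
import java.util.List;

public class HitLimiterSketch {

    // Walks `first`, then simulates one re-search that yields `second`.
    // `resetValue` is what the counter is set to just before `continue`.
    // (Hypothetical stand-in for the loop in NutchBean, not the real code.)
    static List<String> collect(String[] first, String[] second, int resetValue) {
        List<String> seen = new ArrayList<>();
        String[] hits = first;
        boolean researched = false;
        for (int rawHitNum = 0; ; rawHitNum++) {
            if (rawHitNum >= hits.length) {
                if (researched) {
                    break; // nothing left to fetch in this sketch
                }
                hits = second;
                researched = true;
                rawHitNum = resetValue;
                continue; // the for loop's rawHitNum++ still runs after this
            }
            seen.add(hits[rawHitNum]);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] first = {"a1", "a2"};
        String[] second = {"b1", "b2", "b3"};
        // Reset to 0: the increment moves the index to 1, so "b1" is skipped.
        System.out.println(collect(first, second, 0));
        // Reset to -1: the increment brings the index back to 0.
        System.out.println(collect(first, second, -1));
    }
}
```

With a reset value of 0 the first element of the second array never appears
in the output; with -1 every element is visited.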