Hi,

thanks for the hint about Luke. Using Luke I found the problem, but it has 
nothing to do with query translation. It is caused by the boost the documents 
were assigned. Most of the documents in the index had a boost of 0.0, and such 
documents seem to be ignored by default during query evaluation. Using the Luke 
option "Return all matching results, even low scored (unsorted)" on the 
HitCollector tab in Search, I got all the documents returned that I'd expect 
for such a query.
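To illustrate the effect described above, here is a simplified Java model. It is 
not Lucene's actual scoring or collector code. It assumes that a document's score 
is its raw match score multiplied by its boost, and that the default collector 
keeps only hits with a score above 0.0, which is consistent with what Luke 
showed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, NOT Lucene's real implementation: shows how documents
// indexed with boost 0.0 can match a query yet never show up in the results.
public class ZeroBoostDemo {

    // Assumption: the final score is the raw match score scaled by the document boost.
    static float score(float rawScore, float docBoost) {
        return rawScore * docBoost;
    }

    // Models a default collector that keeps only hits with a score above 0.0.
    static List<Integer> collectPositive(float[] rawScores, float[] boosts) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < rawScores.length; doc++) {
            if (score(rawScores[doc], boosts[doc]) > 0.0f) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Models Luke's "Return all matching results, even low scored (unsorted)":
    // every document that matched the query is kept, whatever its score.
    static List<Integer> collectAll(float[] rawScores) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < rawScores.length; doc++) {
            if (rawScores[doc] > 0.0f) { // raw score > 0 means the term occurs in the doc
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] rawScores = {0.8f, 0.5f, 0.3f, 0.9f}; // all four documents contain the term
        float[] boosts = {1.0f, 0.0f, 0.0f, 1.0f};    // documents 1 and 2 have boost 0.0
        System.out.println("default collector: " + collectPositive(rawScores, boosts));
        System.out.println("collect all:       " + collectAll(rawScores));
    }
}
```

Running main prints [0, 3] for the default collector and [0, 1, 2, 3] for the 
collect-all variant: the two documents with boost 0.0 match the query but are 
silently dropped because their score is 0.0.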

How can I tell Nutch to return low-scoring documents in a standard search 
using the NutchBean class? Is there a configuration property for this?

Thanks in advance.

Kind regards,
Martina

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]] 
Sent: Monday, February 23, 2009 13:41
To: [email protected]
Subject: Re: Indexed terms are not found during search in current trunk

Koch Martina wrote:
> Hi,
> 
> For a couple of weeks we have been observing a strange behaviour when 
> indexing/searching with the current trunk (we use the trunk of Feb 4th plus 
> some of the major patches released since then).
> 
> In an index containing only German documents, we find only about 30 documents 
> (out of 5,000 in the index) when searching for common German terms such as 
> the articles "der", "die" and "das". We don't do stop-word filtering, so we'd 
> expect almost all documents to be returned by such a search.
> 
> When using Luke or Limo we see that many more documents contain these terms 
> in the content field. That means the terms were indexed, but strangely these 
> documents are not returned when searching.
> 
> Has anybody observed something similar, or does anyone have an explanation 
> for what goes wrong?

This may be caused by some problem in query translation from the Nutch query 
to the Lucene query. Please add some logging in LuceneQueryOptimizer to log 
the Lucene query just before it's submitted to the Lucene IndexSearcher. 
This should already help you to understand what query is really executed 
at the Lucene level. No matter how many results you get, please run the 
same query in Luke; you should get exactly the same number of results.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
