Koch Martina wrote:
Hi,



For a couple of weeks we have been observing strange behaviour when
indexing/searching with the current trunk (we use the trunk of Feb. 4th with
some of the major patches applied that were released afterwards).



In an index containing only German documents, we find only about 30 documents 
(of 5,000 documents in the index) when searching for common German terms like 
articles (der, die, das). We don't do stop-word filtering, so we'd expect to get 
almost all documents returned on such a search.

When using Luke or Limo we see that many more documents contain these terms in 
the content field. That means the terms got indexed, but strangely they are not 
returned on searching.



Has anybody observed something similar, or does anybody have an explanation
for what is going wrong?

This may be caused by a problem in the translation from the Nutch query to the Lucene query. Please add some logging in LuceneQueryOptimizer to log the Lucene query just before it is submitted to the Lucene IndexSearcher. This should already help you understand which query is really executed at the Lucene level. Then run that same query in Luke: no matter how many results you get, you should get exactly the same number of results in both places.
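
To illustrate the kind of logging I mean, here is a minimal, standalone sketch. It is not the actual LuceneQueryOptimizer code: the class QueryLogSketch and the helpers describe() and logBeforeSearch() are hypothetical names, and the stand-in object in main() only mimics a translated query. The sketch relies on the fact that a Lucene Query prints itself in query syntax via toString(), which is the form you can paste into Luke. Where exactly the search call sits in your trunk snapshot is something to check in the source.

```java
import java.util.logging.Logger;

// Hypothetical sketch, not part of Nutch or Lucene.
class QueryLogSketch {
    private static final Logger LOG =
        Logger.getLogger(QueryLogSketch.class.getName());

    // Builds the log message. The parameter is typed as Object so the
    // sketch compiles without Lucene on the classpath; in real code you
    // would pass the org.apache.lucene.search.Query instance.
    static String describe(Object luceneQuery) {
        return "Lucene query before search: " + String.valueOf(luceneQuery);
    }

    // Assumption: call this immediately before the query is handed to
    // the IndexSearcher, so the logged form is what Lucene really runs.
    static void logBeforeSearch(Object luceneQuery) {
        LOG.info(describe(luceneQuery));
    }

    public static void main(String[] args) {
        // Stand-in whose toString() imitates a translated query string.
        Object standIn = new Object() {
            @Override
            public String toString() {
                return "+content:der";
            }
        };
        logBeforeSearch(standIn);
    }
}
```

Copy the logged query string into Luke's search tab against the same index and compare the hit counts. If they match, the index and the searcher agree and the problem is in the query translation; if they differ, the problem is somewhere else.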




--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
