Thanks for the hint. I have changed my script: instead of a single "nutch crawl" step, I now use the generate->fetch->updatedb->fetch->invertlinks->index commands. I don't use the dedup command. Now it seems to be OK; the search finds all occurrences. I think Nutch removes duplicate pages even when they are at different locations. But for me it is important to have information about every occurrence of a term.
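To illustrate why skipping dedup could matter here, below is a minimal Python sketch of content-hash deduplication. It assumes that duplicates are detected by hashing page content, so identical files stored at different paths collapse into one hit. The function name, URLs, and file contents are made up for illustration; this is not Nutch's own code.

```python
import hashlib


def dedup_by_content(pages):
    """Keep the first URL seen for each distinct content hash.

    pages: iterable of (url, content) pairs (hypothetical input format).
    Returns (kept_urls, dropped_urls).
    """
    seen = {}
    kept, dropped = [], []
    for url, content in pages:
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        if digest in seen:
            # Same content already indexed under another URL: drop this one.
            dropped.append(url)
        else:
            seen[digest] = url
            kept.append(url)
    return kept, dropped


# Hypothetical source tree: the same file copied into two directories.
pages = [
    ("file:///src/a/util.c", "int add(int a, int b) { return a + b; }"),
    ("file:///src/b/util.c", "int add(int a, int b) { return a + b; }"),
    ("file:///src/b/main.c", "int main(void) { return 0; }"),
]

kept, dropped = dedup_by_content(pages)
print("kept:", kept)
print("dropped:", dropped)
```

With a source tree that contains many copied files, a step like this would hide every copy but one, which would match the missing results I saw.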
Libor

Alvaro Cabrerizo wrote:
> I recommend you to check you index using luke. Whith luke you can manage
> (query, see structure..) your lucene index in order to discover if you have
> a problem during indexation or during the search.
>
> 2007/1/16, kauu <[EMAIL PROTECTED]>:
>>
>> so ,u must show us the logs ,
>> and did u change the nutch-site.xml in the tomcat ?
>>
>> On 1/16/07, Libor Štefek <[EMAIL PROTECTED]> wrote:
>> >
>> > Hi,
>> > I'm using nutch 0.8.1 to index several thousand text files (source code)
>> > and I use intranet crawling method to create an index.
>> >
>> > Everything looks fine, but when I try to search something, it often
>> > doesn't find what it should. I'm sure that the term is in several pages,
>> > but I got result only for some of them.
>> >
>> > I tried to set limits in properties like page sizes, number of links
>> > etc. but nothing helped.
>> > There aren't any error messages in logfile during crawl.
>> >
>> > Is there any way how to find a reason for this behavior ?
>> > How to make nutch more reliable in results?
>> >
>> > Thanks for any hint.
>> > Libor
>> >
>>
>> --
>> www.babatu.com
>>
>

--
Libor Štefek
LOGIS, s.r.o.
tel. +420 556 841 100
fax. +420 556 841 117
mobil +420 605 228 985
www.logis.cz <http://www.logis.cz/>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
