Did you put the URLs of all 350,000 documents in the text file? If so, check logs/hadoop.log to see whether any fetches failed.
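To make that log check less tedious, here is a minimal Python sketch that pulls failed URLs out of the log. It assumes the fetcher writes failures as lines containing "fetch of <url> failed with: <reason>"; please confirm that against your own hadoop.log, since the wording may differ between Nutch versions. The function name failed_fetches is just something I made up for illustration.

```python
import os
import re
import sys

# ASSUMPTION: failed fetches are logged as
#   "... fetch of <url> failed with: <reason>"
# Adjust the pattern if your hadoop.log uses different wording.
FAILED_FETCH = re.compile(r"fetch of (\S+) failed with: (.*)")


def failed_fetches(lines):
    """Return a dict mapping each failed URL to the last reason logged for it."""
    failures = {}
    for line in lines:
        match = FAILED_FETCH.search(line)
        if match:
            failures[match.group(1)] = match.group(2).strip()
    return failures


if __name__ == "__main__":
    # Default to the log path mentioned above; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "logs/hadoop.log"
    if os.path.exists(path):
        with open(path, errors="replace") as log:
            for url, reason in failed_fetches(log).items():
                print(url, reason, sep="\t")
```

Run it from the Nutch directory; it prints one failed URL per line with the reason, which you can then compare against your URL list.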
If not, maybe some of the documents are too deep in the link structure, and increasing the depth value when crawling might solve the problem.

Regards,
Susam Pal

On 3/3/08, Jean-Christophe Alleman <[EMAIL PROTECTED]> wrote:
> Hi list!
>
> I have a problem when indexing: not all of the documents I want to index
> are indexed. I have about 350,000 documents, but Nutch doesn't index all
> of them!
>
> I created a txt file in which I put the URLs I want to index, and in
> crawl-urlfilter.txt I changed MYDOMAINAME to what I need.
>
> What goes wrong when I index?
>
> Please help!
>
> Jisay
