Did you put the URLs of all 350000 documents in the text file?
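
One quick way to confirm this is to count the entries in the seed file. The following is only a minimal sketch, assuming the file holds one URL per line; the function name and the path in the usage note are hypothetical, not anything Nutch provides.

```python
def count_seed_urls(path):
    """Return (total, unique) counts of non-blank lines in a seed URL file.

    Assumes one URL per line; blank lines are ignored.
    """
    seen = set()
    total = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            total += 1
            seen.add(url)
    return total, len(seen)
```

Calling `count_seed_urls("urls/seed.txt")` (a hypothetical path) returns both counts; if the unique count is well below 350000, the seed list itself is incomplete or contains duplicates.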

If yes, you can check logs/hadoop.log to see whether any fetches failed.
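
As a rough aid for scanning the log, here is a small sketch that pulls out lines that look like failed fetches. It assumes failures are logged in the form "fetch of <url> failed with: <reason>"; that pattern is an assumption and may differ between Nutch versions, so adjust the regular expression to match what your log actually contains.

```python
import re

# Assumed log pattern (not guaranteed for every Nutch version):
#   "... fetch of <url> failed with: <reason>"
FAILED = re.compile(r"fetch of (\S+) failed with: (.*)")


def failed_fetches(lines):
    """Return a list of (url, reason) pairs for lines matching FAILED."""
    results = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            results.append((match.group(1), match.group(2).strip()))
    return results


if __name__ == "__main__":
    # Path taken from the advice above; run from the Nutch install directory.
    with open("logs/hadoop.log", encoding="utf-8", errors="replace") as log:
        for url, reason in failed_fetches(log):
            print(url, "->", reason)
```

Grouping the output by reason (timeouts, unknown hosts, and so on) usually makes it easier to see why documents are missing from the index.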

If not, maybe some of the documents are too deep, and increasing the
depth value when crawling might solve the problem.

Regards,
Susam Pal

On 3/3/08, Jean-Christophe Alleman <[EMAIL PROTECTED]> wrote:

> Hi list !
>
> I have a problem while indexing: not all of the documents I want to index
> get indexed... I have about 350 000 documents but Nutch doesn't index all
> of them !
>
> I created a txt file in which I put the URLs I want to index, and in
> crawl-urlfilter.txt I replaced MYDOMAINAME with what I need.
>
> What goes wrong when I index ?
>
> Please help !
>
> Jisay
