Fwd: PDF not crawled/indexed

Tolga Tue, 22 May 2012 01:07:05 -0700

Sorry, I forgot to include my original problem: PDF files are not crawled. I even changed -topN to 10.


-------- Original Message --------
Subject:        PDF not crawled/indexed
Date:   Tue, 22 May 2012 10:48:15 +0300
From:   Tolga <[email protected]>
To:     [email protected]




Hi,

I am crawling my website with this command:

bin/nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 20 -topN 5

Is it a good idea to modify the directory name? Should I always delete
indexes prior to crawling and stick to the same directory name?
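
To make the directory question concrete, here is a minimal sketch of the same command with the timestamped name pulled into a variable. SOLR_URL and CRAWL_DIR are variable names made up for illustration, not Nutch settings, and the echo only prints the command instead of running it.

```shell
# Sketch only: the crawl command from this message, with the directory
# name built separately so it is easy to see what each run creates.
# SOLR_URL and CRAWL_DIR are illustrative names, not Nutch options.
SOLR_URL="http://localhost:8983/solr/"

# date +%FT%H-%M-%S expands to something like 2012-05-22T10-48-15,
# so every run gets a new directory instead of reusing the previous one.
CRAWL_DIR="crawl-$(date +%FT%H-%M-%S)"

# Print the command for inspection; drop "echo" to actually run it
# from the Nutch installation directory.
echo bin/nutch crawl urls -dir "$CRAWL_DIR" -solr "$SOLR_URL" -depth 20 -topN 5
```

Setting CRAWL_DIR to a fixed value such as "crawl" would be the "same directory name" variant I am asking about.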

Regards,
