@ Dennis: Thanks for clarifying the difference between deep indexing and whole-web crawling. I think I have the text document with the URLs in the urlDir all right. I have been able to run a crawl, but it only fetches some 50 documents.
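For what it's worth, in Nutch 0.x the one-shot crawl tool takes options that bound how far and how much it fetches; a cap of roughly 50 documents often points at these limits. The invocation below is a hypothetical sketch (the urlDir name and the numbers are placeholders, not taken from this thread):

```shell
# Hypothetical example: -depth is the number of crawl/fetch rounds,
# -topN caps how many pages are fetched per round.
bin/nutch crawl urlDir -dir crawl -depth 10 -topN 1000
```

Raising `-depth` matters especially when each page (e.g. an Apache directory listing) only links one level further down.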
@ Paul: .htaccess file, Options +Indexes, IndexOptions +SuppressColumnSorting? Yes, I am using Apache (and I have to apologize for not mentioning that I am using Nutch 0.9). However, this looks a bit scary to me; I don't have any experience with programming in Java and the like. I already felt very clever for using a virtual machine in order to crawl my local file system.

In the same spirit, I have found a workaround. I have split my 2500 documents into batches of 50 across some 50 directories and nested those directories inside each other: directory 1 contains 50 documents plus directory 2, directory 2 contains 50 documents plus directory 3, and so on. Not the most beautiful solution, but it fits my purposes (running a test to compare two search engines) for the moment. This way, I have been able to index some 2100 documents. I could probably figure out why it stopped there, but for the moment, I am satisfied.

--
View this message in context: http://www.nabble.com/How-to-run-a-complete-crawl--tp25919860p25936033.html
Sent from the Nutch - User mailing list archive at Nabble.com.
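The nesting trick described above can be sketched in a few lines of shell. This is only an illustration under assumed names and sizes (directories `flat` and `crawlroot`, 200 stand-in files instead of 2500); the idea is that a depth-limited crawler descends one directory level per round, so each level holds one 50-document batch plus the directory for the next batch:

```shell
#!/bin/sh
# Sketch of the nesting workaround (hypothetical paths and counts).
set -e

# Create 200 empty stand-in documents in a flat directory.
mkdir -p flat
i=1
while [ "$i" -le 200 ]; do
  : > "flat/doc$(printf '%04d' "$i").txt"
  i=$((i + 1))
done

# Redistribute them: 50 files per level, each level nested in the previous.
root=crawlroot
current="$root"
mkdir -p "$current"
count=0
level=1
for f in flat/*.txt; do
  mv "$f" "$current/"
  count=$((count + 1))
  if [ "$count" -eq 50 ]; then   # batch full: open the next nested level
    count=0
    level=$((level + 1))
    current="$current/level$level"
    mkdir -p "$current"
  fi
done
```

With 2500 documents this yields about 50 nested levels, which is why a correspondingly large crawl depth would be needed to reach the deepest batch.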