@ Dennis: Thanks for clarifying the difference between deep indexing and
whole-web crawling. I think I have the text document with the URL in the
urlDir set up correctly. I have been able to run a crawl, but it only
fetches some 50 documents.
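
For what it's worth, a cap of about 50 fetched pages is often just the crawl parameters rather than a bug; the Nutch 0.9 crawl command takes -depth and -topN options that bound how far and how many URLs it fetches per round. A hedged sketch (the paths and numbers here are illustrative, not taken from my setup):

```shell
# Hypothetical invocation: urlDir is the directory holding the seed-URL file.
# -depth is how many link levels to follow; -topN caps URLs fetched per round.
# A small -topN (many tutorials use 50) would explain a ~50-document crawl.
bin/nutch crawl urlDir -dir crawl -depth 10 -topN 1000
```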

@ Paul: .htaccess file, Options +Indexes, IndexOptions
+SuppressColumnSorting? Yes, I am using Apache (and I have to apologize for
not mentioning that I am using Nutch 0.9). However, this looks a bit scary
to me - I don't have experience with programming in Java and such. I
already felt quite clever for using a virtual machine to crawl my local
file system.
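
For anyone else reading along, my understanding of Paul's suggestion is that Apache's auto-generated directory listings give the crawler a link to every file, so no nesting trick is needed. A minimal sketch of the .htaccess he describes (assuming mod_autoindex is enabled; SuppressColumnSorting keeps the listing's sort links from producing duplicate URLs like "?C=N;O=D" for the crawler):

```apache
# Hypothetical .htaccess sketch - enable auto-index pages for this directory
Options +Indexes
IndexOptions +SuppressColumnSorting
```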

Meanwhile, I have found a workaround of my own. I have split my 2500
documents into batches of 50 across some 50 directories and nested them
inside one another: directory 1 contains 50 documents plus directory 2,
directory 2 contains 50 documents plus directory 3, and so on. Not the most
beautiful solution, but it fits my purposes (running a test to compare two
search engines) for the moment.
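
The chaining trick above can be sketched as a small script. This is a hypothetical reconstruction, not my actual commands: the file names and the temporary working directory are made up for the demo, and it uses 120 placeholder files instead of the real 2500 documents. Each directory gets up to 50 files plus one subdirectory holding the next batch, so a depth-limited crawl that follows directory links can still reach everything.

```shell
#!/bin/sh
set -e

# Work in a scratch directory with stand-in documents (hypothetical names).
work=$(mktemp -d)
cd "$work"
for n in $(seq 1 120); do touch "doc$n.txt"; done

batch=50          # documents per directory
dir="level1"      # top of the chain
mkdir "$dir"
i=0
for f in doc*.txt; do
  if [ "$i" -eq "$batch" ]; then
    dir="$dir/sub"   # descend one level for the next batch of 50
    mkdir "$dir"
    i=0
  fi
  mv "$f" "$dir/"
  i=$((i + 1))
done

# Show the resulting chain of nested directories.
find "$work" -type d | sort
```

With 120 files this produces level1, level1/sub, and level1/sub/sub; with 2500 it would chain roughly 50 levels deep.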

This way, I have been able to index some 2100 documents. I could probably
figure out why it stopped there, but for the moment, I am satisfied.
-- 
View this message in context: 
http://www.nabble.com/How-to-run-a-complete-crawl--tp25919860p25936033.html
Sent from the Nutch - User mailing list archive at Nabble.com.
