Re: How to run a complete crawl?

2009-10-17 Thread Vincent155
@ Dennis: Thanks for clarifying the difference between deep indexing and whole-web crawling. I think I have the text document with the URLs in the urlDir all right. I have been able to run a crawl, but it only fetches some 50 documents. @ Paul: .htaccess file, Options +Indexes, IndexOptions
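A crawl that stops around 50 documents is often bounded by the -depth and -topN arguments, or by the default patterns in conf/crawl-urlfilter.txt, rather than by the seed list itself. A minimal sketch of a one-step crawl with those limits raised; the host name and directory names are only placeholders:

  echo "http://myhost.local/docs/" > urls/seeds.txt      # placeholder seed URL in the urlDir
  bin/nutch crawl urls -dir crawl -depth 5 -topN 5000    # raise depth/topN as needed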

Re: Nutch Enterprise

2009-10-17 Thread Andrzej Bialecki
Dennis Kubes wrote: Depending on what you are wanting to do, Solr may be a better choice as an Enterprise search server. If you are needing crawling, you can use Nutch or attach a different crawler to Solr. If you are wanting to do more full-web type search, then Nutch is a better option.
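For the route of attaching Nutch as the crawler in front of Solr, Nutch 1.0 ships a solrindex job that pushes crawled segments into a running Solr instance. A rough sketch, assuming a local Solr on the default port and the standard crawl directory layout produced by the crawl command:

  bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*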

Nutch for many pages

2009-10-17 Thread Oto Brglez
Hi all! I'm a beginner with crawlers. I want to use Nutch as a system for crawling ~500 online sites. Can I somehow configure Nutch so that it can read targets from a database or some other source? Is Nutch the right software for this kind of job? I was hoping to use Nutch because of Solr and
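Nutch reads its seeds from flat files in a URL directory, so the usual approach is to export the targets from the database into such a file and inject them before each crawl cycle. A sketch, assuming a MySQL database crawl_targets with a table sites and a url column (all three names are made up here); the ~500-site boundary would then be enforced through conf/crawl-urlfilter.txt or conf/regex-urlfilter.txt rather than through the seed list alone:

  mkdir -p urls
  mysql -N -B -e "SELECT url FROM sites" crawl_targets > urls/seeds.txt   # dump seed URLs from the (hypothetical) database
  bin/nutch inject crawl/crawldb urls                                     # inject them into the crawldb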

Re: ERROR datanode.DataNode - DatanodeRegistration ... BlockAlreadyExistsException

2009-10-17 Thread Andrzej Bialecki
Jesse Hires wrote: Does anyone have any insight into the following error I am seeing in the hadoop logs? Is this something I should be concerned with, or is it expected that this shows up in the logs from time to time? If it is not expected, where can I look for more information on what is going
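Before digging deeper into the DataNode logs, a block-level health check of HDFS is a reasonable first look when a node reports blocks it supposedly already holds. A sketch, assuming the crawl data lives under /user/nutch (adjust the path to your setup):

  bin/hadoop fsck /user/nutch -blocks -locations   # per-block report for that path
  bin/hadoop dfsadmin -report                      # overall DataNode status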

Re: How to run a complete crawl?

2009-10-17 Thread Andrzej Bialecki
Vincent155 wrote: I have a virtual machine running (VMware 1.0.7). Both host and guest run on Fedora 10. In the virtual machine, I have Nutch installed. I can index directories on my host as if they are websites. Now I want to compare Nutch with another search engine. For that, I want to index
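When comparing the two engines on the same corpus, it helps to confirm how many pages each crawl actually picked up; the CrawlDb statistics report that directly. A sketch, assuming the crawl directory produced by the one-step crawl command shown earlier:

  bin/nutch readdb crawl/crawldb -stats   # counts of fetched, unfetched, and gone URLs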