@ Dennis: Thanks for clarifying the difference between deep indexing and
whole-web crawling. I think I have the text document with the URL in the
urlDir set up correctly. I have been able to run a crawl, but it only fetches
about 50 documents.
@ Paul: .htaccess file, Options +Indexes, IndexOptions
Dennis Kubes wrote:
Depending on what you want to do, Solr may be a better choice as
an enterprise search server. If you need crawling, you can use
Nutch or attach a different crawler to Solr. If you want to do
more full-web search, then Nutch is the better option.
Hi you all!
I'm a beginner with crawlers. I want to use Nutch as a system for crawling
~500 online sites.
Can I somehow configure Nutch so that it can read targets from a
database or some other source?
Is Nutch the right software for this kind of job? I was hoping to use Nutch
because of Solr and
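
On the question of reading crawl targets from a database: Nutch takes its
seed list from plain-text files with one URL per line, so one possible
approach is to export the URLs from the database into such a file before
each crawl. The following is only a minimal sketch of that idea. It assumes
a SQLite database, and the table name `sites`, the column name `url`, the
file name `seed.txt`, and the function `export_seed_urls` are all
hypothetical names chosen for illustration, not part of Nutch.

```python
import sqlite3
from pathlib import Path


def export_seed_urls(db_path, seed_dir, query="SELECT url FROM sites"):
    """Dump URLs from a database into <seed_dir>/seed.txt, one per line.

    The default query assumes a hypothetical table `sites` with a text
    column `url`; adjust it to match the real schema.
    Returns the number of URLs written.
    """
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()

    # Skip NULL and blank entries, and trim stray whitespace.
    urls = [row[0].strip() for row in rows if row[0] and row[0].strip()]

    # Plain text, one URL per line.
    seed_file = seed_dir / "seed.txt"
    seed_file.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

The resulting directory could then be passed to Nutch as the URL directory,
for example with something like `bin/nutch inject crawl/crawldb urls` in
Nutch 1.x (the `crawl/crawldb` and `urls` paths here are assumptions).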
Jesse Hires wrote:
Does anyone have any insight into the following error I am seeing in the
hadoop logs? Is this something I should be concerned with, or is it expected
that this shows up in the logs from time to time? If it is not expected,
where can I look for more information on what is going
Vincent155 wrote:
I have a virtual machine running (VMware 1.0.7). Both host and guest run on
Fedora 10. In the virtual machine, I have Nutch installed. I can index
directories on my host as if they are websites.
Now I want to compare Nutch with another search engine. For that, I want to
index