ERROR datanode.DataNode - DatanodeRegistration ... BlockAlreadyExistsException

2009-10-16 Thread Jesse Hires
Does anyone have any insight into the following error I am seeing in the Hadoop logs? Is this something I should be concerned with, or is it expected that this shows up in the logs from time to time? If it is not expected, where can I look for more information on what is going on? 2009-10-16 17:02 ...
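
For context, a quick way to check whether the cluster is otherwise healthy when this exception shows up is the standard Hadoop CLI; the sketch below assumes a stock Hadoop install run from HADOOP_HOME, and the path / is just an example.

    # summary of datanode status and capacity for the cluster
    bin/hadoop dfsadmin -report

    # walk the namespace and report missing or corrupt blocks; a
    # BlockAlreadyExistsException raised while a block is being re-replicated
    # does not by itself show up here as corruption
    bin/hadoop fsck / -blocks -locations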

Re: Nutch Enterprise

2009-10-16 Thread fredericoagent
Thanks for the quick response. I am interested as my company is looking at Google enterprise search / the Google appliance, and I was wondering whether the Nutch software could be a possible option to evaluate. At the moment we will be using Google as a search engine for the intranet for provision of in...

Re: Nutch Enterprise

2009-10-16 Thread Dennis Kubes
Depending on what you want to do, Solr may be a better choice as an enterprise search server. If you need crawling, you can use Nutch or attach a different crawler to Solr. If you want to do more full-web type search, then Nutch is a better option. What are your requirements ...
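
If it helps, attaching Nutch's crawler to Solr is roughly a one-liner once a crawl has finished; the sketch below assumes Nutch 1.0's solrindex job and a Solr instance at localhost:8983, both placeholders for your own setup.

    # index an existing Nutch crawl (under crawl/) into a running Solr server
    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*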

Nutch Enterprise

2009-10-16 Thread fredericoagent
Does anybody have any information on using Nutch as enterprise search, and what would I need? Is it just a case of the current Nutch package, or do you need other add-ons? And how does that compare against Google Enterprise? Thanks.

Re: How to run a complete crawl?

2009-10-16 Thread Paul Tomblin
On Fri, Oct 16, 2009 at 10:19 AM, Dennis Kubes wrote:
> Because you are crawling the local files you would either need urls in the
> initial urlDir text file or those documents you are crawling would need to
> point to the other urls.
> Another way to do this is to put the following in the docume...
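
For the local-file case, a minimal sketch of the seed setup might look like the following; the paths are hypothetical, and the filter/plugin tweaks are the usual ones needed before Nutch will fetch file: URLs at all.

    # urls/seed.txt -- one seed URL per line; file: URLs point at local documents
    file:///home/me/docs/index.html

    # conf/crawl-urlfilter.txt -- the default filter skips file: URLs, so the
    # line '-^(file|ftp|mailto):' has to be relaxed, e.g. to '-^(ftp|mailto):'

    # conf/nutch-site.xml -- plugin.includes must list protocol-file so the
    # file: protocol is handled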

Re: How to run a complete crawl?

2009-10-16 Thread Dennis Kubes
Whole-web crawling is about indexing the entire web, versus deep indexing of a single site. The urls parameter is the urlDir, a directory that should hold one or more text files listing the URLs to be fetched. The dir parameter is the output directory for the crawls. Because you are cr...
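
For reference, those two parameters belong to the one-step crawl command; a minimal sketch, assuming Nutch 1.0 run from its install directory with a seed directory called urls:

    # urls/ holds one or more text files of seed URLs; crawl/ receives the
    # crawldb, linkdb, segments and indexes
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000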