ERROR datanode.DataNode - DatanodeRegistration ... BlockAlreadyExistsException

2009-10-16 Thread Jesse Hires
Does anyone have any insight into the following error I am seeing in the Hadoop logs? Is this something I should be concerned with, or is it expected that this shows up in the logs from time to time? If it is not expected, where can I look for more information on what is going on? 2009-10-16 17:02 ...
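
For context, a quick way to check whether the cluster is otherwise healthy when this exception shows up is the standard Hadoop CLI; the sketch below assumes a stock Hadoop install run from HADOOP_HOME, and the path / is just an example.

    # summary of datanode status and capacity for the cluster
    bin/hadoop dfsadmin -report

    # walk the namespace and report missing or corrupt blocks; a
    # BlockAlreadyExistsException raised while a block is being re-replicated
    # does not by itself show up here as corruption
    bin/hadoop fsck / -blocks -locations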

Re: Nutch Enterprise

2009-10-16 Thread fredericoagent
Thanks for the quick response. I am interested as my company is looking at Google enterprise search / the Google appliance, and I was wondering whether the Nutch software could be a possible option to evaluate. At the moment we will be using Google as a search engine for the intranet for provision of in...

Re: Nutch Enterprise

2009-10-16 Thread Dennis Kubes
Depending on what you want to do, Solr may be a better choice as an enterprise search server. If you need crawling, you can use Nutch or attach a different crawler to Solr. If you want to do more full-web type search, then Nutch is a better option. What are your requirements ...
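
If it helps, attaching Nutch's crawler to Solr is roughly a one-liner once a crawl has finished; the sketch below assumes Nutch 1.0's solrindex job and a Solr instance at localhost:8983, both placeholders for your own setup.

    # index an existing Nutch crawl (under crawl/) into a running Solr server
    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*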

Nutch Enterprise

2009-10-16 Thread fredericoagent
Does anybody have any information on using Nutch as enterprise search, and what would I need? Is it just a case of the current Nutch package, or do you need other add-ons? And how does that compare against Google Enterprise? Thanks.

Re: How to run a complete crawl?

2009-10-16 Thread Paul Tomblin
On Fri, Oct 16, 2009 at 10:19 AM, Dennis Kubes wrote:
> Because you are crawling the local files you would either need urls in the
> initial urlDir text file or those documents you are crawling would need to
> point to the other urls.
> Another way to do this is to put the following in the docume...
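
For the local-file case, a minimal sketch of the seed setup might look like the following; the paths are hypothetical, and the filter/plugin tweaks are the usual ones needed before Nutch will fetch file: URLs at all.

    # urls/seed.txt -- one seed URL per line; file: URLs point at local documents
    file:///home/me/docs/index.html

    # conf/crawl-urlfilter.txt -- the default filter skips file: URLs, so the
    # line '-^(file|ftp|mailto):' has to be relaxed, e.g. to '-^(ftp|mailto):'

    # conf/nutch-site.xml -- plugin.includes must list protocol-file so the
    # file: protocol is handled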

Re: How to run a complete crawl?

2009-10-16 Thread Dennis Kubes
Whole-web crawling is about indexing the entire web, versus deep indexing of a single site. The urls parameter is the urlDir, a directory that should hold one or more text files listing the URLs to be fetched. The dir parameter is the output directory for the crawls. Because you are cr...
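
For reference, those two parameters belong to the one-step crawl command; a minimal sketch, assuming Nutch 1.0 run from its install directory with a seed directory called urls:

    # urls/ holds one or more text files of seed URLs; crawl/ receives the
    # crawldb, linkdb, segments and indexes
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000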