My experience on EC2 has been that the RAM and disk space are overkill, while the computing speed is lacking. I had been running my crawler on a 1GB Slicehost slice, and when I moved it over to a medium high-CPU instance on EC2 (~2x the cost), the generate and update steps took 50% longer. Right now I'm looking at using Rackspace Cloud Servers instead.

Kevin

On Mon, Apr 12, 2010 at 5:37 AM, Stefano Cherchi <stefanocher...@yahoo.it> wrote:

> Hi Yves,
>
> I'm going to start some tests of nutch+solr on EC2 in a couple of days, so I
> will be able to give you some feedback on it soon.
>
> I'm actually a little concerned about computing speed, rather than RAM or
> disk space, because I've experienced a consistent lack of performance in
> CPU-intensive tasks such as compiling large amounts of code.
>
> S
> ----------------------------------
> "Anyone proposing to run Windows on servers should be prepared to explain
> what they know about servers that Google, Yahoo, and Amazon don't."
> Paul Graham
>
> "A mathematician is a device for turning coffee into theorems."
> Paul Erdos (who obviously never met a sysadmin)
>
> ----- Original Message -----
> > From: Yves Petinot <ypeti...@cs.columbia.edu>
> > To: nutch-user@lucene.apache.org
> > Sent: Fri, 9 April 2010, 16:49:47
> > Subject: Nutch and EC2
> >
> > Hi,
> >
> > I'm currently contemplating migrating my crawler cluster to EC2, and
> > while this appears very tempting (infinite number of nodes), I've read
> > about some potential limitations in terms of the number of map/red tasks
> > that can effectively run on any instance. Especially for the L/XL
> > instances there doesn't seem to be any swap space set up (by default at
> > least), so that running more than 2 to 4 tasks per instance may not be
> > feasible (assuming 8/16 G of RAM and ~3G per JVM). As a comparison, my
> > current setup with dedicated blade servers can easily sustain 5 to 10
> > map/red tasks per node. I'm basically trying to understand whether this
> > lack of swap space will effectively mean that I need an EC2 cluster with
> > at least 2 to 3 times more instances than I have nodes in my current
> > cluster.
> >
> > Does anyone on the list have some experience in transitioning to EC2,
> > maybe with respect to this swap issue and/or on how to spec out an EC2
> > cluster?
> >
> > cheers,
> >
> > -yp
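
As a rough back-of-the-envelope check on the sizing question in the quoted message, the sketch below estimates how many map/reduce task JVMs fit in an instance's RAM when there is no swap, and how many instances it would take to match an existing cluster's task capacity. The 8/16 GB of RAM, ~3 GB per JVM, and 5 to 10 tasks per node come from Yves's message; the function names, the 2 GB set aside for the OS and Hadoop daemons, and the 10-node cluster are assumptions made up for illustration.

```python
import math


def max_tasks_per_instance(ram_gb, jvm_heap_gb, reserved_gb=2.0):
    """Number of task JVMs that fit in physical RAM with no swap.

    reserved_gb is an assumed allowance for the OS and Hadoop daemons
    (not a figure from the thread).
    """
    usable_gb = ram_gb - reserved_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // jvm_heap_gb)


def instances_needed(current_nodes, tasks_per_current_node, tasks_per_instance):
    """Instances required to match the current cluster's concurrent task slots."""
    if tasks_per_instance <= 0:
        raise ValueError("an instance must be able to run at least one task")
    total_slots = current_nodes * tasks_per_current_node
    return math.ceil(total_slots / tasks_per_instance)


if __name__ == "__main__":
    jvm_heap_gb = 3      # ~3G per JVM, from the quoted message
    current_nodes = 10   # hypothetical cluster size, for illustration only

    # (instance RAM in GB, tasks per node on the current blades)
    for ram_gb, blade_tasks in ((8, 5), (16, 10)):
        per_instance = max_tasks_per_instance(ram_gb, jvm_heap_gb)
        needed = instances_needed(current_nodes, blade_tasks, per_instance)
        print(f"{ram_gb} GB instance: {per_instance} tasks, "
              f"{needed} instances to replace {current_nodes} nodes "
              f"running {blade_tasks} tasks each")
```

Under these assumptions an 8 GB instance fits 2 tasks and a 16 GB instance fits 4, which lines up with the "2 to 4 tasks per instance" estimate, and the hypothetical 10-node cluster would need about 25 instances in either pairing (2.5x). Changing `reserved_gb` or the heap size shifts the result quickly, so treat it as a starting point for benchmarking rather than a sizing rule.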