Hi Yves,

I'm going to start some tests of Nutch + Solr on EC2 in a couple of days, so I should be able to give you some feedback on it soon.
I'm actually more concerned about computing speed than about RAM or disk space, because I've consistently seen poor performance on CPU-intensive tasks such as compiling large amounts of code.

S

----------------------------------
"Anyone proposing to run Windows on servers should be prepared to explain what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham

"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)

----- Original message -----
> From: Yves Petinot <ypeti...@cs.columbia.edu>
> To: nutch-user@lucene.apache.org
> Sent: Fri, 9 April 2010, 16:49:47
> Subject: Nutch and EC2
>
> Hi,
>
> I'm currently contemplating migrating my crawler cluster to EC2, and
> while this appears very tempting (infinite number of nodes), I've read
> about some potential limitations in terms of the number of map/red tasks
> that can effectively run on any instance. Especially for the L/XL
> instances, there doesn't seem to be any swap space set up (by default at
> least), so running more than 2 to 4 tasks per instance may not be
> feasible (assuming 8/16 GB of RAM and ~3 GB per JVM). As a comparison, my
> current setup with dedicated blade servers can easily sustain 5 to 10
> map/red tasks per node.
>
> I'm basically trying to understand whether this lack of swap space will
> effectively mean that I need an EC2 cluster with at least 2 to 3 times
> more instances than I have nodes in my current cluster.
>
> Does anyone on the list have some experience in transitioning to EC2,
> maybe with respect to this swap issue and/or on how to spec out an EC2
> cluster?
>
> cheers,
>
> -yp
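
For what it's worth, the arithmetic in the quoted message can be sketched roughly as below. This is only a back-of-the-envelope estimate that assumes memory is the sole constraint (no swap, so every task JVM must fit in RAM). The function names, the 2 GB reserved for the OS and Hadoop daemons, and the 10-node example cluster are all hypothetical values chosen for illustration; they do not come from the thread.

```python
import math


def slots_per_instance(ram_gb, heap_gb_per_task, reserved_gb=2.0):
    """Number of task JVMs that fit in RAM when there is no swap.

    reserved_gb is an ASSUMED allowance for the OS and Hadoop daemons.
    """
    usable_gb = ram_gb - reserved_gb
    return max(0, int(usable_gb // heap_gb_per_task))


def instances_needed(current_nodes, tasks_per_node, slots):
    """EC2 instances required to match the current cluster's total task slots."""
    if slots <= 0:
        raise ValueError("instance cannot host even one task JVM")
    total_tasks = current_nodes * tasks_per_node
    return math.ceil(total_tasks / slots)


if __name__ == "__main__":
    # Figures from the quoted message: 8/16 GB of RAM, ~3 GB per JVM.
    for ram_gb in (8, 16):
        slots = slots_per_instance(ram_gb, heap_gb_per_task=3)
        print(f"{ram_gb} GB instance: {slots} task slots")

    # Hypothetical cluster: 10 blades, each sustaining 10 tasks.
    slots_xl = slots_per_instance(16, heap_gb_per_task=3)
    print("instances needed:", instances_needed(10, 10, slots_xl))
```

With these assumed numbers the estimate gives 2 slots on an 8 GB instance and 4 on a 16 GB one, which is in line with the "2 to 4 tasks per instance" figure above. It says nothing about CPU throughput, which would have to be measured separately.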