Hi,
I'm currently contemplating migrating my crawler cluster to EC2, and
while this appears very tempting (effectively unlimited nodes), I've read
about some potential limitations on the number of map/reduce tasks
that can effectively run on any one instance. On the L/XL instances in
particular, there doesn't seem to be any swap space set up (by default
at least), so running more than 2 to 4 tasks per instance may not be
feasible (assuming 8/16 GB of RAM and ~3 GB per JVM). As a comparison, my
current setup with dedicated blade servers can easily sustain 5 to 10
map/reduce tasks per node. I'm basically trying to understand whether this
lack of swap space effectively means that I need an EC2 cluster with
at least 2 to 3 times as many instances as I have nodes in my current
cluster.
Does anyone on the list have experience transitioning to EC2,
particularly with this swap issue and/or with how to spec out an EC2
cluster?
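
To make the arithmetic behind that estimate concrete, here is a rough
sketch of how I arrive at those numbers. The function names, the 2 GB
reserved for the OS and Hadoop daemons, and the 10-node cluster size are
all assumptions made up for illustration; only the 8/16 GB and ~3 GB per
JVM figures come from the numbers above.

```python
import math

def max_tasks(ram_gb, jvm_gb, reserved_gb=0.0):
    """Tasks that fit in physical RAM when there is no swap to fall back on."""
    return max(0, math.floor((ram_gb - reserved_gb) / jvm_gb))

def instances_needed(nodes, tasks_per_node, tasks_per_instance):
    """EC2 instances required to match the current cluster's task slots."""
    return math.ceil(nodes * tasks_per_node / tasks_per_instance)

# Assumed: 2 GB held back for the OS and daemons, ~3 GB per task JVM.
for ram_gb in (8, 16):
    print(ram_gb, "GB ->", max_tasks(ram_gb, 3, reserved_gb=2), "tasks")

# Hypothetical 10-node cluster running 5 tasks per node today,
# replaced by instances that can only hold 2 tasks each.
print(instances_needed(10, 5, 2), "instances")
```

Under these assumptions the ratio of instances to current nodes comes
out at about 2.5, which is where my "2 to 3 times" guess comes from.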
cheers,
-yp
- Nutch and EC2 Yves Petinot