I'm currently contemplating migrating my crawler cluster to EC2. While this appears very tempting (an effectively unlimited number of nodes), I've read about some potential limitations on the number of map/reduce tasks that can effectively run on any one instance. In particular, the L/XL instances don't seem to have any swap space set up (by default at least), so running more than 2 to 4 tasks per instance may not be feasible (assuming 8/16 GB of RAM and ~3 GB per JVM). As a comparison, my current setup with dedicated blade servers can easily sustain 5 to 10 map/reduce tasks per node. I'm basically trying to understand whether this lack of swap space effectively means that I need an EC2 cluster with at least 2 to 3 times more instances than I have nodes in my current cluster.
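
To make the arithmetic behind this concern concrete, here is a rough back-of-the-envelope sketch. It only assumes that, without swap, every task JVM has to fit in physical RAM. The 2 GB reserved for the OS and Hadoop daemons and the 10-node cluster size are hypothetical placeholders, not measured values, and the function names are made up for illustration.

```python
import math


def tasks_per_instance(ram_gb, jvm_gb, reserved_gb=2.0):
    """Task JVMs that fit in physical RAM when there is no swap.

    reserved_gb is an assumed allowance for the OS and Hadoop daemons.
    """
    usable_gb = ram_gb - reserved_gb
    return max(0, int(usable_gb // jvm_gb))


def instances_needed(nodes, tasks_per_node, ram_gb, jvm_gb, reserved_gb=2.0):
    """EC2 instances required to match the task slots of the current cluster."""
    slots = nodes * tasks_per_node
    per_instance = tasks_per_instance(ram_gb, jvm_gb, reserved_gb)
    if per_instance == 0:
        raise ValueError("not even one task JVM fits on this instance")
    return math.ceil(slots / per_instance)


if __name__ == "__main__":
    # Hypothetical current cluster: 10 blades running 5 tasks each.
    for label, ram_gb in (("L", 8), ("XL", 16)):
        per_instance = tasks_per_instance(ram_gb, jvm_gb=3)
        total = instances_needed(10, 5, ram_gb, jvm_gb=3)
        print(f"{label}: {per_instance} tasks/instance, {total} instances")
```

With these assumptions the per-instance figures come out at the 2 and 4 tasks mentioned above; plugging in the real node count, tasks per node, and JVM heap size gives the instance count for a like-for-like cluster.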

Does anyone on the list have experience transitioning to EC2, in particular with this swap issue and/or with how to spec out an EC2 cluster?



