Hi Yves,

I'm going to start some tests of Nutch + Solr on EC2 in a couple of days, so I
will be able to give you some feedback on it soon.

I'm actually more concerned about computing speed than about RAM or disk
space, because I've consistently seen poor performance on CPU-intensive tasks
such as compiling large amounts of code.

"Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham

"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)

> Hi,

> I'm currently contemplating migrating my crawler cluster to EC2, and while
> this appears very tempting (infinite number of nodes), I've read about some
> potential limitations on the number of map/reduce tasks that can effectively
> run on any one instance. Especially for the L/XL instances, there doesn't
> seem to be any swap space set up (by default, at least), so running more
> than 2 to 4 tasks per instance may not be feasible (assuming 8/16 GB of RAM
> and ~3 GB per JVM). By comparison, my current setup with dedicated blade
> servers can easily sustain 5 to 10 map/reduce tasks per node. I'm basically
> trying to understand whether this lack of swap space means that I will need
> an EC2 cluster with at least 2 to 3 times as many instances as I have nodes
> in my current cluster.

> Does anyone on the list have experience transitioning to EC2, particularly
> with respect to this swap issue, and/or with how to spec out an EC2 cluster?
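
The sizing question above boils down to simple arithmetic. Here is a rough Python sketch of it, not a benchmark: it assumes about 2 GB per instance is reserved for the OS, the Hadoop daemons and non-heap JVM overhead (my own assumption, chosen so that the 8/16 GB figures give the 2 to 4 tasks quoted), and a hypothetical 10-node current cluster. The function names are purely illustrative.

```python
import math


def tasks_per_instance(ram_gb: float, heap_gb: float, reserved_gb: float = 2.0) -> int:
    """Concurrent task JVMs that fit in physical RAM when there is no swap.

    reserved_gb is an assumed allowance for the OS, daemons and non-heap
    JVM overhead; it is not an official EC2 or Hadoop figure.
    """
    usable = ram_gb - reserved_gb
    if usable <= 0 or heap_gb <= 0:
        return 0
    return math.floor(usable / heap_gb)


def instances_needed(nodes: int, tasks_per_node: int, slots_per_instance: int) -> int:
    """Instances required to keep the same total number of concurrent task slots."""
    if slots_per_instance <= 0:
        raise ValueError("an instance that cannot run any task adds no capacity")
    total_slots = nodes * tasks_per_node
    return math.ceil(total_slots / slots_per_instance)


if __name__ == "__main__":
    # Figures from the message: 8/16 GB instances, ~3 GB per task JVM.
    for ram in (8, 16):
        print(f"{ram} GB instance -> {tasks_per_instance(ram, 3)} tasks")
    # Hypothetical 10-node blade cluster running 5 or 10 tasks per node.
    for per_node in (5, 10):
        for slots in (2, 4):
            print(f"{per_node} tasks/node, {slots} slots/instance -> "
                  f"{instances_needed(10, per_node, slots)} instances")
```

Under these assumptions, the ratio of instances to current nodes ranges from about 1.3x (5 tasks per node on 4-slot instances) to 5x (10 tasks per node on 2-slot instances), so the 2 to 3x estimate sits in the middle of that range.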



