Hi Yves,

I'm going to start some tests of Nutch+Solr on EC2 in a couple of days, so I 
should be able to give you some feedback on it soon. 

I'm actually more concerned about computing speed than about RAM or disk 
space, because I've experienced consistently poor performance in 
CPU-intensive tasks such as compiling large amounts of code.

S
 ---------------------------------- 
"Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham


"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)



----- Original message -----
> From: Yves Petinot <ypeti...@cs.columbia.edu>
> To: nutch-user@lucene.apache.org
> Sent: Fri 9 April 2010, 16:49:47
> Subject: Nutch and EC2
> 
> Hi,
> 
> I'm currently contemplating migrating my crawler cluster to EC2, and while
> this appears very tempting (infinite number of nodes), I've read about some
> potential limitations in terms of the number of map/reduce tasks that can
> effectively run on any instance. Especially for the L/XL instances, there
> doesn't seem to be any swap space set up (by default at least), so that
> running more than 2 to 4 tasks per instance may not be feasible (assuming
> 8/16 GB of RAM and ~3 GB per JVM). As a comparison, my current setup with
> dedicated blade servers can easily sustain 5 to 10 map/reduce tasks per
> node. I'm basically trying to understand whether this lack of swap space
> will effectively mean that I need an EC2 cluster with at least 2 to 3 times
> more instances than I have nodes in my current cluster.
> 
> Does anyone on the list have some experience in transitioning to EC2, maybe
> with respect to this swap issue and/or how to spec out an EC2 cluster?
> 
> cheers,
> 
> -yp
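
For reference, the sizing arithmetic in the quoted message can be sketched
roughly as below. This is only a back-of-the-envelope sketch, not a measured
result: the 8/16 GB of RAM, ~3 GB per JVM and 5 to 10 tasks per node come from
the message, while the function names, the 2 GB reserved for the OS and Hadoop
daemons, and the 10-node cluster size are assumptions made for illustration.

```python
import math


def tasks_per_instance(ram_gb, heap_gb_per_task, reserved_gb=2.0):
    """Task JVMs that fit in RAM when there is no swap to fall back on.

    reserved_gb is an assumed allowance for the OS and the Hadoop daemons.
    """
    usable_gb = ram_gb - reserved_gb
    return max(0, math.floor(usable_gb / heap_gb_per_task))


def instances_needed(current_nodes, tasks_per_node, slots_per_instance):
    """Instances required to match the task slots of the current cluster."""
    if slots_per_instance <= 0:
        raise ValueError("instance cannot host any task at this heap size")
    total_slots = current_nodes * tasks_per_node
    return math.ceil(total_slots / slots_per_instance)


if __name__ == "__main__":
    nodes = 10  # hypothetical size of the current blade cluster
    for ram_gb, tasks_per_node in ((8, 5), (16, 10)):
        slots = tasks_per_instance(ram_gb, heap_gb_per_task=3)
        needed = instances_needed(nodes, tasks_per_node, slots)
        print(f"{ram_gb} GB: {slots} tasks/instance, "
              f"{needed} instances to replace {nodes} nodes "
              f"at {tasks_per_node} tasks/node")
```

Under these assumptions both cases come out at 2.5 times the current node
count, which is in line with the "2 to 3 times" estimate in the message. A
smaller heap per task or a smaller reserve would change the ratio.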



