My experience on EC2 has been that the RAM and disk space are overkill,
while the computing speed is lacking.  I had been running my crawler on a
1 GB Slicehost slice, and when I moved it over to a High-CPU Medium instance
on EC2 (~2x the cost), the generate and update steps took 50% longer.  Right
now I'm looking at using Rackspace Cloud Servers instead.
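As a rough sketch of what those two figures imply together (assuming the bill scales linearly with wall-clock time, and that the generate and update steps are representative of the whole crawl, which may not hold for fetch-bound steps), the function name below is just illustrative:

```python
def relative_cost_per_job(price_ratio: float, runtime_ratio: float) -> float:
    """Cost of one job on the new host relative to the old host.

    price_ratio   -- new hourly price / old hourly price
    runtime_ratio -- new job duration / old job duration
    Assumes cost is simply hourly price multiplied by running time.
    """
    return price_ratio * runtime_ratio


# Figures from this message: ~2x the cost, steps taking 50% longer.
print(relative_cost_per_job(price_ratio=2.0, runtime_ratio=1.5))
```

Under those assumptions, each generate/update cycle works out to roughly three times the price.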

Kevin

On Mon, Apr 12, 2010 at 5:37 AM, Stefano Cherchi <stefanocher...@yahoo.it> wrote:

> Hi Yves,
>
> I'm going to start some tests of Nutch+Solr on EC2 in a couple of days, so I
> will be able to give you some feedback on it soon.
>
> I'm actually a little concerned about computing speed, rather than RAM or
> disk space, because I've experienced a consistent lack of performance in
> CPU-intensive tasks such as compiling large amounts of code.
>
> S
>  ----------------------------------
> "Anyone proposing to run Windows on servers should be prepared to explain
> what they know about servers that Google, Yahoo, and Amazon don't."
> Paul Graham
>
>
> "A mathematician is a device for turning coffee into theorems."
> Paul Erdos (who obviously never met a sysadmin)
>
>
>
> ----- Original message -----
> > From: Yves Petinot <ypeti...@cs.columbia.edu>
> > To: nutch-user@lucene.apache.org
> > Sent: Fri, 9 April 2010, 16:49:47
> > Subject: Nutch and EC2
> >
> > Hi,
> >
> > I'm currently contemplating migrating my crawler cluster to EC2, and
> > while this appears very tempting (infinite number of nodes), I've read
> > about some potential limitations in terms of the number of map/reduce
> > tasks that can effectively run on any instance. Especially for the L/XL
> > instances, there doesn't seem to be any swap space set up (by default at
> > least), so running more than 2 to 4 tasks per instance may not be
> > feasible (assuming 8/16 GB of RAM and ~3 GB per JVM). As a comparison, my
> > current setup with dedicated blade servers can easily sustain 5 to 10
> > map/reduce tasks per node. I'm basically trying to understand whether
> > this lack of swap space will effectively mean that I need an EC2 cluster
> > with at least 2 to 3 times more instances than I have nodes in my
> > current cluster.
> >
> > Does anyone on the list have experience in transitioning to EC2,
> > particularly with respect to this swap issue and/or how to spec out an
> > EC2 cluster?
> >
> > cheers,
> >
> > -yp
>
>
>
>
>
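For reference, the memory arithmetic in the original question (8/16 GB of RAM, ~3 GB per JVM, no swap) can be sketched as below. This is only a back-of-the-envelope estimate: the function name is made up for illustration, and the 2 GB reserved for the OS and Hadoop daemons is an assumption, not a figure from the thread.

```python
def max_tasks_without_swap(ram_gb: float, heap_per_task_gb: float,
                           reserved_gb: float = 2.0) -> int:
    """Estimate how many task JVMs fit in physical RAM when there is no swap.

    reserved_gb is an ASSUMED allowance for the OS and the Hadoop daemons;
    adjust it to match what the node actually runs.
    """
    usable_gb = ram_gb - reserved_gb
    return max(0, int(usable_gb // heap_per_task_gb))


# RAM sizes and per-JVM footprint quoted in the original question.
for ram_gb in (8, 16):
    print(ram_gb, "GB ->", max_tasks_without_swap(ram_gb, 3.0), "tasks")
```

With that allowance the estimate falls in the same 2 to 4 task range mentioned in the question; a smaller per-task heap would raise it accordingly.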
