Hi Otis,
[snip]
>> If you're planning on parsing the pages (sounds like it) then the
>> m1.small instances are going to take a very long time - their disk
>> I/O and CPU are pretty low-end.
> Yeah, I can imagine! :)
> But if your 550M page crawl pulled 21 TB of *raw*(?) data, then I
> have a feeling that even 40 large EC2 instances won't have enough
> storage, right? Would you recommend 75 of them (63 TB) or 100 of
> them (84 TB)?
We ran with 50, IIRC - but we used compressed output files, so there
was enough space.
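Since the disk question comes down to simple arithmetic, here's a quick sanity check. The 21 TB raw figure is from the thread; the per-instance capacity (~0.84 TB, implied by the 100-instance/84 TB figure above) and the gzip ratio are assumptions, not measured values:

```python
import math

# Back-of-the-envelope storage check for the numbers in this thread.
raw_tb = 21.0           # raw fetched data for the 550M-page crawl (from the thread)
tb_per_instance = 0.84  # assumed usable local disk per large instance (84 TB / 100)
gzip_ratio = 0.25       # assumed compression ratio for HTML-heavy text

compressed_tb = raw_tb * gzip_ratio
instances_for_disk = math.ceil(compressed_tb / tb_per_instance)

print(compressed_tb, instances_for_disk)  # 5.25 TB compressed -> 7 instances
```

With compression, disk alone would be covered by far fewer than 50 instances; the count we actually ran with was driven by parse CPU, not storage.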
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g