Hi Otis,

[snip]

If you're planning on parsing the pages (sounds like it), then the m1.small instances are going to take a very long time - their disk I/O and CPU are
pretty low-end.

Yeah, I can imagine! :)
But if your 550M-page crawl pulled 21 TB of *raw*(?) data, then I have a feeling
that even 40 large EC2 instances won't have enough storage, right?
Would you recommend 75 of them (63 TB) or 100 of them (84 TB)?

We ran with 50, IIRC - but we used compressed output files, so there was enough space.
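(A quick sketch of the arithmetic behind those numbers - assuming ~840 GB of usable local storage per large instance, which is the per-instance figure implied by the 63 TB / 84 TB totals above; the exact usable space will vary by instance type and filesystem overhead.)

```python
# Back-of-the-envelope EC2 cluster storage check.
# Assumption: ~840 GB usable local storage per "large" instance
# (implied by the 63 TB / 84 TB totals quoted in the thread).
GB_PER_LARGE_INSTANCE = 840

def cluster_storage_tb(num_instances, gb_per_instance=GB_PER_LARGE_INSTANCE):
    """Total raw cluster storage in TB (using 1 TB = 1000 GB)."""
    return num_instances * gb_per_instance / 1000

for n in (40, 75, 100):
    print(f"{n:>3} instances -> {cluster_storage_tb(n):.1f} TB")
```

With compressed output files (as below), the effective capacity is several times higher, which is why 50 instances sufficed for the 21 TB crawl.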

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
