Had problems sending, resending.

On Tue, Sep 23, 2008 at 6:33 PM, Guilherme Menezes <
[EMAIL PROTECTED]> wrote:

> Hi everyone,
>
> Our research group is planning to set up a cluster sufficient to crawl
> around 1 billion single Web pages (estimated Brazilian Web size) for
> academic purposes, possibly using Nutch. We currently have 4 boxes (each with
> 16 GB of RAM, 6 x 750 GB disks on 3 controllers, and a quad-core AMD Opteron
> processor), and we are considering buying more nodes. We have some questions
> that some of you may be able to help with:
>
> 1) Is it better to buy less powerful nodes in order to have more nodes and
> more parallelism, or is it better to have a smaller number of nodes
> equivalent to the ones we currently have? I guess just 1 disk per controller
> would help. I don't know whether 16 GB of RAM is really necessary, and a
> dual-core might be sufficient instead of a quad-core. In your experience,
> where is money best spent: RAM, disk, processing power, more nodes, or
> everything?
>
> 2) How many nodes would be necessary to perform a Web crawl of 1 billion
> pages in about 1 month? Have you had any similar experience? How many nodes
> did you use?
>
> Thank you for any help! We are very interested in understanding Nutch and
> collaborating in the future.
>
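
For question 2, a rough back-of-envelope sketch may help frame the sizing. It only turns the figures quoted above (1 billion pages, about 1 month, 4 boxes with 6 x 750 GB disks) into an average fetch rate and a raw disk total. The function names are made up for illustration and are not part of Nutch. The 30-day month and the 10 KB average page size are assumptions, not measurements, and the estimate ignores replication, compression, politeness delays, and re-fetching.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_fetch_rate(total_pages, days):
    """Average pages per second the whole cluster must sustain."""
    return total_pages / (days * SECONDS_PER_DAY)


def per_node_rate(total_pages, days, nodes):
    """Average pages per second each node must sustain, assuming an even split."""
    return required_fetch_rate(total_pages, days) / nodes


def raw_disk_tb(nodes, disks_per_node, disk_gb):
    """Total raw disk capacity in TB (decimal units, 1 TB = 1000 GB)."""
    return nodes * disks_per_node * disk_gb / 1000


def raw_storage_tb(total_pages, avg_page_kb):
    """Unreplicated, uncompressed page storage in TB (1 TB = 10**9 KB)."""
    return total_pages * avg_page_kb / 1e9


if __name__ == "__main__":
    pages = 1_000_000_000   # from the post: estimated Brazilian Web size
    days = 30               # assumption: "about 1 month" taken as 30 days
    nodes = 4               # from the post: boxes currently available
    avg_page_kb = 10        # hypothetical placeholder, not a measured value

    print(f"cluster rate:  {required_fetch_rate(pages, days):.1f} pages/s")
    print(f"per-node rate: {per_node_rate(pages, days, nodes):.1f} pages/s")
    print(f"raw disk:      {raw_disk_tb(nodes, 6, 750):.1f} TB")
    print(f"page storage:  {raw_storage_tb(pages, avg_page_kb):.1f} TB (assumed page size)")
```

With these inputs the crawl needs roughly 386 pages/s across the cluster, or about 96 pages/s per node on the 4 existing boxes, which together hold 18 TB of raw disk. Changing `nodes` or `days` shows how the per-node rate scales; whether a given node can sustain that rate is exactly what the questions above are asking.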
