I wonder if, crawling to that depth with that many links, you have any
choice but to set up a Hadoop cluster rather than trying to run it on a
single machine.
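For a rough sense of the working space involved, a back-of-envelope sketch
like the one below might help. Every number in it is an assumption for
illustration (average page size, parse overhead, per-URL crawldb cost), not
something measured from your crawl, so plug in figures from your own
segments before trusting the total.

    # Rough disk estimate for one Nutch fetch/updatedb cycle.
    # All constants below are assumptions -- adjust to your own crawl.

    pages_to_fetch = 2_200_000      # links reported at depth 6 (from the post)
    avg_page_bytes = 60 * 1024      # assumed average fetched page size (~60 KB)
    parse_overhead = 1.5            # assumed factor for parse_text/parse_data output
    per_url_db_bytes = 1024         # assumed crawldb cost per URL entry
    db_copies = 2                   # assumed: updatedb needs room for old + new crawldb

    segment_bytes = pages_to_fetch * avg_page_bytes * parse_overhead
    crawldb_bytes = pages_to_fetch * per_url_db_bytes * db_copies

    total_gb = (segment_bytes + crawldb_bytes) / (1024 ** 3)
    print("rough working-space estimate: %.0f GB" % total_gb)

With those assumed figures the estimate lands near 200 GB for this one
cycle, which at least squares with 100 GB being too tight; the real answer
depends heavily on your average page size and how many segments you keep
around.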

On Wed, Sep 17, 2008 at 6:30 AM, Edward Quick <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> I'm running an intranet crawl and have got to the 6th depth, which
> apparently has 2.2 million links to fetch. I started off with 100 GB, but
> that was barely enough for the fetch, not to mention the updatedb step, so
> I'm just trying to find a reliable method for determining how much space
> is required to do the crawl.
>
> Any ideas?
>
> Ed.
>