On 9/6/06, Andrei Hajdukewycz <[EMAIL PROTECTED]> wrote:
> Another problem I've noticed is that it seems the db grows *rapidly* with 
> each successive recrawl. Mine started at 379MB, and it seems to increase by 
> roughly 350MB every time I run a recrawl, despite there not being anywhere 
> near that many additional pages.
>
> This seems like a pretty severe problem, honestly. Obviously there's a lot of 
> duplicated data in the segments.

I have the same problem: my index grew from 1.5GB after the original
crawl to over 5GB(!) after the recrawl. From the looks of it, I might
as well crawl anew every time. :\

t.n.a.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
