On 9/6/06, Andrei Hajdukewycz <[EMAIL PROTECTED]> wrote:
> Another problem I've noticed is that the db seems to grow *rapidly* with
> each successive recrawl. Mine started at 379MB, and it seems to increase by
> roughly 350MB every time I run a recrawl, even though nowhere near that many
> pages have been added.
>
> This seems like a pretty severe problem, honestly; obviously there's a lot of
> duplicated data in the segments.
I have the same problem: my index grew from 1.5GB after the original crawl to
over 5GB(!) after the recrawl. From the looks of it, I might as well crawl
anew every time. :\

t.n.a.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
