I don't see this with trunk... I just tried TestIndexingPerformance with 1M docs, and it seemed to work fine. Memory use stabilized at 40MB. Most memory use was for indexing (not analysis); char[] topped out at 4.5MB.
-Yonik
http://www.lucidimagination.com

On Tue, Oct 6, 2009 at 12:31 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Yeah - I was wondering about that ... not sure how these guys are
> stacking up ...
>
> Yonik Seeley wrote:
>> TestIndexingPerformance?
>> What the heck... that's not even multi-threaded!
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Tue, Oct 6, 2009 at 12:17 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> Darnit - didn't finish that email. This is after running your old short
>>> doc perf test for 10,000 iterations. You see the same thing with 1,000
>>> iterations, but it's much less pronounced - i.e. it gets worse with more
>>> iterations.
>>>
>>> Mark Miller wrote:
>>>> A little before and after. The before is from around May 5th; the after
>>>> is trunk.
>>>>
>>>> http://myhardshadow.com/memanalysis/before.png
>>>> http://myhardshadow.com/memanalysis/after.png
>>>>
>>>> Mark Miller wrote:
>>>>> Took a peek at the checkout from around the time he says he was using.
>>>>>
>>>>> CharTokenizer appears to be holding onto much larger char[] arrays now
>>>>> than before. Same with snowball.Among - it used to be almost nothing;
>>>>> now it's largio.
>>>>>
>>>>> The new TokenStream stuff appears to be clinging. Needs to find some
>>>>> inner peace.
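For anyone following along: the kind of retention Mark describes can come from a reused, grow-only term buffer. Here's a minimal sketch (hypothetical class, not Lucene's actual code) of the pattern - a buffer that doubles to fit the largest token ever seen and never shrinks, so one pathological token pins a large char[] for the life of the tokenizer:

```java
// Hypothetical sketch of a grow-only term buffer in the style of a
// reusable attribute. NOT Lucene's actual implementation - just the
// retention pattern under discussion.
public class GrowOnlyTermBuffer {
    private char[] buffer = new char[16];

    // Grow by doubling until newSize fits; the buffer never shrinks,
    // so capacity only ratchets upward.
    public char[] resizeBuffer(int newSize) {
        if (buffer.length < newSize) {
            int size = buffer.length;
            while (size < newSize) {
                size *= 2;
            }
            buffer = new char[size];
        }
        return buffer;
    }

    public int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        GrowOnlyTermBuffer term = new GrowOnlyTermBuffer();
        term.resizeBuffer(100000); // one huge token...
        term.resizeBuffer(5);      // ...then back to ordinary tokens
        // Capacity stays at 131072 chars (16 doubled 13 times):
        // the large char[] is retained even though no token needs it.
        System.out.println(term.capacity());
    }
}
```

If the profiler shows CharTokenizer instances each holding a big char[], a grow-only buffer like this (multiplied across however many tokenizer instances are kept alive) would match the before/after pictures.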