On Mon, Sep 07, 2009 at 02:33:06PM -0400, Rich Lane wrote:
> Xapian keeps writes buffered in memory. Try setting the environment
> variable XAPIAN_FLUSH_THRESHOLD to a smaller value (the default is 10000
> documents) and see if that helps.

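For the archives, here is a minimal sketch of applying that suggestion to a single run from a small Python wrapper. Only the variable name XAPIAN_FLUSH_THRESHOLD and the sup-sync command come from this thread; the helper names `flush_env` and `run_sup_sync` are made up for illustration, and the sketch assumes `sup-sync` is on your PATH.

```python
import os
import subprocess


def flush_env(threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    `flush_env` is a hypothetical helper, not part of sup or Xapian.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings, so convert the document count.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_sup_sync(threshold, args=()):
    """Run sup-sync once with the given flush threshold.

    Assumes a `sup-sync` executable is on PATH; the variable is set only
    for the child process, so the calling shell is left untouched.
    """
    return subprocess.run(["sup-sync", *args], env=flush_env(threshold))


# Example (not executed here): run_sup_sync(1000)
```

Setting the variable per process rather than exporting it globally keeps the lower threshold confined to the big sync.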
Thanks--it was hard for me to find that kind of information.

I first tried setting XAPIAN_FLUSH_THRESHOLD to 1, and sup-sync ran
slowly and just kept getting slower:

## read 139m (about 7%) @ 9.2m/s. 0:00:15 elapsed, about 0:03:21 remaining
...
## read 1238m (about 35%) @ 3.1m/s. 0:06:36 elapsed, about 0:12:08 remaining

I stopped at this point because it was taking too long. The memory use
seemed stable, but that could have been because it was making such slow
progress.

I guess xapian gets a lot slower at writing as the db grows? That's a
bit discouraging. Using ferret, sup-sync only dropped from 28.1m/s to
27.3m/s during its run. For reference, when I didn't set
XAPIAN_FLUSH_THRESHOLD, I was getting 35-36m/s until it ran out of
memory.

I then set XAPIAN_FLUSH_THRESHOLD to 100 and got more reasonable
results. It started at 25.6m/s and slowed to 17.8m/s. It stabilized at
around 41M virtual memory used and finished successfully. I also note
that the memory use didn't jump during the finish-up phase ("Deleting
missing messages") as it had with ferret.

Finally, I set XAPIAN_FLUSH_THRESHOLD to 1000. It started at 34.6m/s
and dropped to 29.8m/s, stabilized at around 51M virtual memory, and
finished successfully. In this case, it stays faster than ferret, but
it still bugs me that xapian slows down while ferret doesn't.

So I conclude... I don't know what I conclude. Letting xapian use a lot
of memory sure helps its performance. And a big sup-sync should only
have to be done rarely. So maybe just document that those on low-memory
systems should consider setting XAPIAN_FLUSH_THRESHOLD during sup-sync.

Thanks again for your help!

Andrew
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk
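
P.S. To put the slowdowns side by side, here is a rough sketch that turns the start and end rates quoted above into a relative drop for each completed run. The figures are the ones reported in this message; the names `runs` and `slowdown_percent` are just for illustration, and the threshold-1 run is left out because it was stopped at about 35%.

```python
# (start rate, end rate) in m/s, as reported for each completed run.
runs = {
    "ferret": (28.1, 27.3),
    "xapian, threshold 100": (25.6, 17.8),
    "xapian, threshold 1000": (34.6, 29.8),
}


def slowdown_percent(start, end):
    """Relative drop in throughput from start to end, in percent."""
    return (start - end) / start * 100.0


for name, (start, end) in runs.items():
    print(f"{name}: {slowdown_percent(start, end):.1f}% slower by the end")
```

That works out to roughly 3% for ferret, 30% for a threshold of 100, and 14% for a threshold of 1000, so the larger threshold narrows the gap but does not close it.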