Hi Tim, For what it is worth, behind Trove (http://trove.nla.gov.au/) are 3 SOLR-managed indices and 1 Lucene index. None of ours is as big as one of your shards, and one of our SOLR-managed indices is tiny, but your experiences with long GC pauses are familiar to us.
One of the most difficult indices to tune is our bibliographic index of around 38M mostly metadata records, which is around 125GB with 97MB tii files. We need to commit updates and reopen the index every 90 seconds, and the facet recalculation (using UnInverted) was taking quite a lot of time and seemed to generate lots of objects to be collected on each reopening. Although we've been through several rounds of tuning which seemed to work, at least temporarily, a few months ago we started getting 12 sec "full GC" pauses every 90 secs, which was no good! We noticed/did three things:

1) Optimise to 1 segment - we'd got to the stage where 50% of the documents had been updated (hence deleted), so the maxdocid was 50% bigger than it needed to be, and data structures whose size was proportional to maxdocid had grown a lot. Optimising to 1 segment greatly reduced full GC frequency and times.

2) For most of our facets, forcing the facets to be filters rather than uninverted happened to work better - but this depends on many factors and certainly isn't a cure-all for all facets - uninverted often works much better than filters! (There's a sketch of the per-field switch further down in this message.)

3) After lots of benchmarking of real updates and queries on a dev system, we came up with the set of JVM parameters that worked "best" for our environment (at the moment!):

   -Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode

I can't say exactly why, except that with this combination of parameters and our data, a much bigger newgen led to less movement of objects to oldgen, and non-full-GC collections on oldgen worked much better. Currently we are seeing fewer than 10 full GCs a day, and they almost always take less than 4 seconds. This index is running on an 8 core X5570 machine with 64GB, shared with a large/busy mysql instance and the Trove web server.

One of our other indices is only updated once per day but is larger: 33.5M docs representing the full text of archived web pages, 246GB, with a 36MB tii file. JVM parms are -Xmx10000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. It also does fewer than 10 full GCs per day, taking less than 5 sec each.

Our other large index, newspapers, is a native Lucene index, about 180GB with a comparatively large tii of 280MB (probably for the same reason your tii is large - the contents of this database are mostly OCR'ed text). This index is updated/reopened every 3 minutes (to incorporate OCR text corrections and tagging), and we use a bitmap to represent all facet values, which typically takes 5 secs to rebuild on each reopen. JVM parms: -mx15000M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. Although this JVM usually does fewer than 5 full GCs per day, those full GCs often take 20-30 seconds, and we need to test increasing the NewSize on this JVM to see if we can reduce these pauses. The web archive and newspaper indices are running on an 8 core X5570 machine with 72GB.
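Going back to point 2: in Solr the filter-vs-uninverted choice can be made per field with the facet.method parameter ("enum" walks the field's terms and intersects filters via the filterCache, "fc" is the field cache/UnInverted approach). A rough sketch only - the host and field names below are placeholders, not our real schema:

    http://localhost:8983/solr/select?q=history&facet=true
        &facet.field=format&f.format.facet.method=enum
        &facet.field=decade&f.decade.facet.method=fc

The same f.<field>.facet.method overrides can go in the request handler defaults in solrconfig.xml, so the choice is made once per facet rather than on every query.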
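And since most of this tuning comes down to watching collections under a realistic load, the HotSpot GC logging flags make the comparisons much easier. A generic example of a startup line - the log path and the "-jar start.jar" launch are placeholders for however your container is actually started:

    java -Xmx17000M -XX:NewSize=3500M -XX:SurvivorRatio=3 \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xloggc:/var/log/solr-gc.log \
         -jar start.jar

The resulting log shows each ParNew collection and CMS phase with its duration, which makes it straightforward to compare NewSize/SurvivorRatio settings before putting them into production.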
We are also running a separate copy/version of the newspaper index behind the site http://newspapers.nla.gov.au/ - the main difference is that the Trove version uses shingling (inspired by the Hathi Trust results) to improve searches containing common words. This other version is running on a machine with 32GB and 8 X5460 cores and has JVM parms: -mx11500M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.

Apart from the old newspapers index, all our other SOLR/Lucene indices are maintained on SSDs (Intel X25-M 160GB), which, whilst having nothing to do with GC, work very, very well - we couldn't cope with our current query volumes on rotating disk without spending a great deal of money. The old newspaper index is running on a SAN with 24 fast disks backing it, and we can't support the same query rate on it as we can with the other newspaper index on SSDs (even before the shingling change).

Kent Fitch
Trove development team
National Library of Australia