Peter Sturge, this was a nice hint, thanks again! If you are ever in
Germany, I'll buy you a beer or an Apfelschorle! :-)
I only needed to change the lockType to none in solrconfig.xml,
disable replication, and point the data dir at the master's data dir!
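
For anyone who wants to try the same thing: the relevant bits of the
read-only instance's solrconfig.xml look roughly like the sketch
below. The data dir path is only an illustration (point it at
wherever your indexing instance keeps its index), and depending on
your Solr version the lockType may live in <indexDefaults> rather
than <mainIndex>:

    <!-- share the indexing instance's index directory -->
    <dataDir>/path/to/master/solr/data</dataDir>

    <mainIndex>
      <!-- no lock file, so the read-only instance never competes
           with the writer for the write lock -->
      <lockType>none</lockType>
    </mainIndex>

The <requestHandler name="/replication" ...> section is simply
commented out, since there is nothing to replicate once both
instances share one index.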
Regards,
Peter Karich.

> Hi Peter,
>
> this scenario would be really great for us - I didn't know that
> this is possible and works, so: thanks!
> At the moment we are doing something similar by replicating to the
> read-only instance, but the replication is somewhat lengthy and
> resource-intensive at this data volume ;-)
>
> Regards,
> Peter.
>
>> 1. You can run multiple Solr instances in separate JVMs, with both
>> having their solr.xml configured to use the same index folder.
>> You need to be careful that one and only one of these instances
>> will ever update the index at a time. The best way to ensure this
>> is to use one instance for writing only, and the other as
>> read-only, never writing to the index. This read-only instance is
>> the one to tune for high search performance. Even though the RO
>> instance doesn't write to the index, it still needs periodic
>> (albeit empty) commits to kick off autowarming/cache refresh.
>>
>> Depending on your needs, you might not need two separate
>> instances. We need them because the 'write' instance is also doing
>> a lot of metadata pre-write operations in the same JVM as Solr,
>> and so has its own memory requirements.
>>
>> 2. We use sharding all the time, and it works just fine with this
>> scenario, as the RO instance is simply another shard in the pack.
>>
>> On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:
>>
>>> Peter,
>>>
>>> thanks a lot for your in-depth explanations!
>>> Your findings will definitely be helpful for my next performance
>>> improvement tests :-)
>>>
>>> Two questions:
>>>
>>> 1. How would I do that:
>>>
>>>> or a local read-only instance that reads the same core as the
>>>> indexing instance (for the latter, you'll need something that
>>>> periodically refreshes - i.e. runs commit()).
>>>
>>> 2. Did you try sharding with your current setup (e.g. one big,
>>> nearly-static index and a tiny write+read index)?
>>>
>>> Regards,
>>> Peter.
>>>
>>>> Hi,
>>>>
>>>> Below are some notes regarding Solr cache tuning that should
>>>> prove useful for anyone who uses Solr with frequent commits
>>>> (e.g. every <5 min).
>>>>
>>>> Environment:
>>>> Solr 1.4.1 or branch_3x trunk.
>>>> Note the 4.x trunk has lots of neat new features, so the notes
>>>> here are likely less relevant to the 4.x environment.
>>>>
>>>> Overview:
>>>> Our Solr environment makes extensive use of faceting, we perform
>>>> commits every 30 secs, and the indexes tend to be on the
>>>> large-ish side (>20 million docs).
>>>> Note: for our data, when we commit, we are always adding new
>>>> data, never changing existing data.
>>>> This type of environment can be tricky to tune, as Solr is more
>>>> geared toward fast reads than frequent writes.
>>>>
>>>> Symptoms:
>>>> If you have used faceting in searches while also performing
>>>> frequent commits, you've likely encountered the dreaded
>>>> OutOfMemory or GC Overhead Exceeded errors.
>>>> In high commit rate environments, this is almost always due to
>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
>>>> don't finish autowarming their caches before the next commit()
>>>> comes along and invalidates them.
>>>> Once this starts happening on a regular basis, your Solr JVM
>>>> will likely run out of memory eventually, as the number of
>>>> searchers (and their cache arrays) will keep growing until the
>>>> JVM dies of thirst.
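>>>>
>>>> (A related knob: solrconfig.xml has a maxWarmingSearchers
>>>> setting that caps how many searchers may be warming at once -
>>>> when the cap is hit, the commit fails fast instead of stacking
>>>> up yet another warming searcher. The value below is just an
>>>> example, not a recommendation:
>>>>
>>>>     <maxWarmingSearchers>2</maxWarmingSearchers>
>>>>
>>>> This doesn't make warming any cheaper, but it turns silent
>>>> memory growth into a visible error you can act on.)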
>>>> To check whether your Solr environment is suffering from this,
>>>> turn on INFO level logging and look for: 'PERFORMANCE WARNING:
>>>> Overlapping onDeckSearchers=x'.
>>>>
>>>> In tests, we've only ever seen this problem when using faceting
>>>> with facet.method=fc.
>>>>
>>>> Some solutions to this are:
>>>>  - Reduce the commit rate to allow searchers to fully warm
>>>>    before the next commit
>>>>  - Reduce or eliminate the autowarming in caches
>>>>  - Both of the above
>>>>
>>>> The trouble is, if you're doing NRT commits, you likely have a
>>>> good reason for it, and reducing/eliminating autowarming will
>>>> very significantly impact search performance in high commit rate
>>>> environments.
>>>>
>>>> Solution:
>>>> Here are some setup steps we've used that allow lots of faceting
>>>> (we typically search with at least 20-35 different facet fields,
>>>> plus date faceting/sorting) on large indexes, while still
>>>> keeping decent search performance:
>>>>
>>>> 1. Firstly, consider using the enum method for facet searches
>>>> (facet.method=enum) unless you've got A LOT of memory on your
>>>> machine. In our tests, this method uses a lot less memory and
>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>> segment-based 'fcs' option, as I can't find support for it in
>>>> branch_3x - looks nice for 4.x though.)
>>>> Admittedly, for our data, enum is not quite as fast for
>>>> searching as fc, but short of purchasing a Taiwanese RAM
>>>> factory, it's a worthwhile tradeoff.
>>>> If you do have access to LOTS of memory, AND you can guarantee
>>>> that the index won't grow beyond the memory capacity (i.e. you
>>>> have some sort of deletion policy in place), fc can be a lot
>>>> faster than enum when searching with lots of facets across many
>>>> terms.
>>>>
>>>> 2. Secondly, we've found that LRUCache is faster at autowarming
>>>> than FastLRUCache - in our tests, about 20% faster. Maybe this
>>>> is just our environment - your mileage may vary.
>>>>
>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>
>>>>     <filterCache
>>>>       class="solr.LRUCache"
>>>>       size="3600"
>>>>       initialSize="1400"
>>>>       autowarmCount="3600"/>
>>>>
>>>> For a 28GB index with 30 warmed facet fields, running in a
>>>> quad-core x64 VMware instance, Solr runs at ~4GB. The
>>>> filterCache size on the stats page is usually in the region of
>>>> ~2400.
>>>>
>>>> 3. It's also a good idea to have some firstSearcher/newSearcher
>>>> event listener queries to allow new data to populate the caches
>>>> (a minimal sketch of such a listener follows after point 5).
>>>> Of course, what you put in these depends on the facets you
>>>> need/use.
>>>> We've found a good combination is a firstSearcher with as many
>>>> facets in the search as your environment can handle, then a
>>>> subset of the most common facets for the newSearcher.
>>>>
>>>> 4. We also set:
>>>>
>>>>     <useColdSearcher>true</useColdSearcher>
>>>>
>>>> just in case.
>>>>
>>>> 5. Another key to search performance with a high commit rate is
>>>> to use 2 Solr instances - one for the high commit rate indexing,
>>>> and one for searching.
>>>> The read-only searching instance can be a remote replica, or a
>>>> local read-only instance that reads the same core as the
>>>> indexing instance (for the latter, you'll need something that
>>>> periodically refreshes - i.e. runs commit()).
>>>> This way, you can tune the indexing instance for write
>>>> performance and the searching instance, as above, for max read
>>>> performance.
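>>>>
>>>> For the 'periodically refreshes' part, anything that posts an
>>>> empty commit to the read-only instance's update handler will do
>>>> - e.g. a cron job that POSTs this XML update message (host and
>>>> core name in the URL are of course setup-specific):
>>>>
>>>>     <commit/>
>>>>
>>>> And since point 3 promised it, here is a minimal sketch of a
>>>> newSearcher event listener (the facet field name is a
>>>> placeholder - use your own most common facets, and a larger set
>>>> of queries for firstSearcher):
>>>>
>>>>     <listener event="newSearcher" class="solr.QuerySenderListener">
>>>>       <arr name="queries">
>>>>         <lst>
>>>>           <str name="q">*:*</str>
>>>>           <str name="facet">true</str>
>>>>           <str name="facet.method">enum</str>
>>>>           <str name="facet.field">my_common_facet_field</str>
>>>>         </lst>
>>>>       </arr>
>>>>     </listener>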
>>>>
>>>> Using the setup above, we get fantastic search speed for small
>>>> facet sets (well under 1 sec), and really good search speed for
>>>> large facet sets (a couple of seconds, depending on index size,
>>>> number of facets, unique terms, etc.), even when searching
>>>> against large-ish indexes (>20 million docs).
>>>> We have yet to see any OOM or GC errors using the techniques
>>>> above, even in low memory conditions.
>>>>
>>>> I hope some people find this useful. I know I've spent a lot of
>>>> time looking for stuff like this, so hopefully this will save
>>>> someone some time.
>>>>
>>>> Peter