On 3/29/2012 2:49 AM, Rafal Gwizdala wrote:
That's bad news. If 5-7 seconds is not safe, then what is the safe interval for updates? Near real-time is not for me, as it works only when querying by document Id - that doesn't solve anything in my case. I just want the index to be updated in real time; a 30-40 second delay is acceptable, but not much more than that. Is there anything that can be done, or should I start looking for some other indexing tool?
I'm wondering why there's such terrible performance degradation over time. Solr runs fine for the first 10-20 hours, updates are extremely fast, and then they become slower and slower until eventually they stop executing at all. Is there an issue with garbage collection, index fragmentation, or some internal data structure that can't manage its data effectively when updates are frequent?

You've gotten some replies from experts already. I'm nowhere near their caliber, but I do have some things to say about my experiences.

When I do a commit, it usually takes between 5 and 15 seconds, but it can take 30 seconds or longer. The bulk of that time is spent warming the caches. I have a program that starts updates at the top of every minute, but it won't start watching the clock again until the previous update is done. I've checked things carefully, and it's warming the filter cache that takes most of the time. The crazy thing is that my autoWarmCount for filterCache is only 4 - we just have some very, very nasty filter queries.
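
For reference, this is roughly what that cache definition looks like in solrconfig.xml - the size numbers here are just illustrative, not a recommendation:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="4"/>

The autowarmCount is how many entries from the old cache get re-executed against the new searcher after a commit, which is exactly the work that makes my commits slow.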

Are you kicking off these updates every 5-7 seconds even if the previous update has not finished running? You might be able to make things better by only issuing the next update once the previous one has finished, which means using the default waitSearcher=true on your commits. You can also try other things: reducing the size of Solr's caches and reducing the autoWarmCount, possibly to zero.
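
If you're sending commits over HTTP, the default already waits for the new searcher; something along these lines (host, port, and path are just placeholders for whatever your setup uses):

curl 'http://localhost:8983/solr/update?commit=true&waitSearcher=true'

With waitSearcher=true the request doesn't return until the new searcher is warmed and registered, so an update program naturally can't start the next cycle early. Turning off autowarming is just autowarmCount="0" on the cache definitions in solrconfig.xml.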

Garbage collection can definitely be a problem, and it is compounded if the machine does not have enough RAM for the OS to keep a large chunk of your index cached, and/or you have not given enough RAM to the JVM. As far as garbage collection goes, I have had good luck with the following options added to the java command line. As you can see, I have an 8GB heap, which is quite a bit more than my Solr actually needs. Garbage collection is less of a problem when the JVM has plenty of memory to work with - though I understand that if it has too much memory, you start running into different GC problems.

-Xms8192M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
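
In case those options aren't familiar: -Xms and -Xmx pin the heap at 8GB so it never has to resize, NewRatio=1 splits the heap evenly between the young and old generations, UseParNewGC collects the young generation with multiple threads, UseConcMarkSweepGC turns on the concurrent (CMS) collector so most old-generation work happens alongside the application instead of in long pauses, and CMSParallelRemarkEnabled uses multiple threads for the CMS remark pause.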

The servers are maxed out at 64GB of RAM, and each one handles three index cores totaling about 60GB, so I can't quite fit all of my index into RAM. I wish I had 256GB per server - Solr would perform much better.

You say your server has 4GB of memory, and that Solr is only using 300MB? I would guess that you need to upgrade to 8GB, 16GB, or more if you can. Then you should give Solr at least 2-3GB of that, leaving the rest to cache your index. With 5 million records, your index is probably several gigabytes in size.
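
If you want a real number instead of a guess, look at the size of the index directory on disk - something like the following on Linux, with the path adjusted to wherever your data directory actually lives:

du -sh /path/to/solr/data/index

Whatever that reports is roughly how much free RAM you'd want left over for the OS disk cache, on top of whatever you give the JVM.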

Thanks,
Shawn