On 3/29/2012 2:49 AM, Rafal Gwizdala wrote:
That's bad news.
If 5-7 seconds is not safe, then what is the safe interval for updates?
Near real-time is not for me, as it works only when querying by document
Id - that doesn't solve anything in my case. I just want the index to be
updated in real time; a 30-40 second delay is acceptable, but not much
more than that. Is there anything that can be done, or should I start
looking for some other indexing tool?
I'm wondering why there's such terrible performance degradation over
time - Solr runs fine for the first 10-20 hours and updates are extremely
fast, but then they become slower and slower until eventually they stop
executing at all. Is there an issue with garbage collection, index
fragmentation, or some internal data structure that can't manage its data
effectively when updates are frequent?
You've gotten some replies from experts already. I'm nowhere near their
caliber, but I do have some things to say about my experiences.
When I do a commit, it usually takes between 5 and 15 seconds, but it can
take 30 seconds or longer. The bulk of that time is spent warming the
caches. I have a program that starts updates at the top of every minute,
but it won't begin checking the time again until the previous update is
done. I've checked things carefully, and it's warming the filter cache
that takes so much time. The crazy thing is that my autoWarmCount for
filterCache is only 4 - we just have some very, very nasty filter queries.
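For context, the filter cache and its autowarm count are configured in
solrconfig.xml. A purely illustrative entry (made-up numbers, not my
actual settings) looks something like this:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="4"/>

When a commit opens a new searcher, Solr re-runs up to autowarmCount of
the old cache's entries against it to pre-populate the new cache - that
re-running is the warming time I'm talking about.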
Are you kicking off these updates every 5-7 seconds even if the previous
update has not finished running? You might be able to make things better
by only starting the next update once the previous one has finished,
which means using the default waitSearcher=true on your commits. You can
also try reducing the size of Solr's caches and reducing the
autoWarmCount, possibly to zero.
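If you send your updates over HTTP, the commit request itself carries
that flag. A minimal sketch, assuming the standard /update handler on the
default port (adjust host, port, and core path for your setup):

curl "http://localhost:8983/solr/update" \
     -H "Content-Type: text/xml" \
     --data-binary '<commit waitSearcher="true"/>'

Because waitSearcher is true, the request doesn't return until the new
searcher has finished warming and is registered, so a client that issues
these requests one at a time can never stack up overlapping commits.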
Garbage collection can definitely be a problem, and it is compounded if
the machine does not have enough RAM for the OS to keep a large chunk of
your index cached, and/or you have not given enough RAM to the JVM. As
far as garbage collection goes, I have had good luck with the following
options added to the java command line. As you can see, I have an 8GB
heap, which is quite a bit more than my Solr actually needs. Garbage
collection is less of a problem if the JVM has plenty of memory to work
with - though I understand that if you give it too much memory, you
start running into a different set of GC problems.
-Xms8192M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
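If you run the example Jetty that ships with Solr, these options go on
the line that launches start.jar; under Tomcat they would go into
JAVA_OPTS or CATALINA_OPTS instead. Roughly:

java -Xms8192M -Xmx8192M -XX:NewRatio=1 -XX:+UseParNewGC \
     -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
     -jar start.jar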
The servers are maxed out at 64GB, and each server is handling three
index cores totaling about 60GB, so I can't quite fit all of my index
into RAM. I wish I had 256GB per server - Solr would perform much better.
You say your server has 4GB of memory, and that Solr is only using
300MB? I would guess that you need to upgrade to 8GB, 16GB, or more if
you can. Then you should give Solr at least 2-3GB of that as heap
(-Xms/-Xmx), leaving the rest for the OS to cache your index. With 5
million records, your index is probably several gigabytes in size.
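As a rough illustration only: if each document averages something like
1KB in the index (just an assumption - stored fields, term vectors, and
analysis settings change this a lot), 5 million documents would already
come to around 5GB on disk, more than a 4GB machine can keep in its OS
disk cache.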
Thanks,
Shawn