On 1/31/2013 12:47 PM, Mou wrote:
To clarify, the third shard is used to store recently added/updated data. The
two main big cores take a very long time to replicate (when a full
replication is required), so the third one helps us return newly indexed
documents quickly. It gets deleted every hour, after we replicate the other
two cores with the last hour's new/changed data. This third core is very
small.

I use this approach. My entire index is 74 million documents, but all new data is added to a shard that only contains about 400K documents. The other six shards each contain over 12 million documents and occupy about 22GB of disk space. It takes two servers to house one complete copy of my index.

Index updates happen once a minute. Because most delete/reinsert activity happens on recently added content and all new content gets added only to the small shard, the large shards can run for many minutes without seeing commits.
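
In case it is useful, a distributed query against a layout like this just lists every shard, the small one included, in the shards parameter. The hostnames and core names here are invented for illustration:

curl "http://idx1:8983/solr/live/select?q=test&shards=idx1:8983/solr/s1,idx1:8983/solr/s2,idx1:8983/solr/s3,idx2:8983/solr/s4,idx2:8983/solr/s5,idx2:8983/solr/s6,idx2:8983/solr/inc"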

As you said, with that big index and distributed queries, searches were too
slow, so we tried to use the filterCache to speed up the queries. The
filterCache was big because we have thousands of different filters. Other
caches were not very helpful, as queries are not repetitive and there is
heavy add/update activity on the index. So we had to use a bigger heap size.
Now, with that big heap, GC pauses were horrible, so we moved to the Zing
JVM. Zing is now using 134GB of heap and does not have those big pauses, but
it also does not leave much memory for the OS.

I am now testing with a small heap, a small filterCache (just the basic
filters), and a lot of memory left available for the OS disk cache. If that
does not work, I am thinking of breaking my index down into smaller pieces.

I hope it works for you! With this approach, the first queries will probably still be pretty slow, but as the data gets cached, things should speed up.
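
If you do end up on a small heap, the startup is nothing exotic. Something like this for the example Jetty setup, where the heap size and collector choice are only illustrations to tune for your own hardware:

java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -jar start.jar

The important part is that whatever -Xmx you pick leaves most of the machine's RAM untouched, so the OS can use it for the disk cache.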

You can pre-cache the important parts of your index by running a command like the following in the index directory.

cat `ls | egrep -v "(\.fd|\.tv)"` > /dev/null

That command will read all the index files except the stored fields (.fdx, .fdt) and term vectors (.tvx, .tvd, .tvf), which pulls them into the OS disk cache. Before trying it, you would want to find out how much disk space those files take up, to make sure they will all fit in RAM. You might even schedule it in cron so the files stay cached as the index changes.
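
If it helps, this will total up exactly the files that the cat command would read (run it in the same index directory):

du -ch `ls | egrep -v "(\.fd|\.tv)"` | tail -n 1

Compare that number against your free memory (free -m will show it) before warming the cache.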

Thanks,
Shawn
