@Shawn
Any idea why the cache doesn't use roaring bitsets?

On Thu, Dec 1, 2016 at 3:49 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving a huge number of requests in my application.
> >
> > I need to know how I can utilize caching in Solr.
> >
> > As of now, under Core Selector → [core name] → Plugins / Stats, I am
> > seeing a hit ratio of 0 for all the caches. What does this mean, and
> > how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that cache
> are finding matches.  This means that your client systems are never
> sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date queries
> -- NOW changes every millisecond, and the actual timestamp value is what
> ends up in the cache.  This means that the same query with NOW executed
> more than once will actually be different from the cache's perspective.
> The solution is date rounding -- using things like NOW/HOUR or NOW/DAY.
> You could use NOW/MINUTE, but the window for caching would be quite small.
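
To make the rounding point above concrete, here is a minimal Python sketch. It is only a simulation of the effect Shawn describes, not Solr's code: resolve_now is a hypothetical helper, and the field name "created" is made up for illustration. It substitutes a concrete timestamp for NOW, so two requests one millisecond apart produce different query strings unless the value is rounded.

```python
from datetime import datetime, timezone


def fmt(d: datetime) -> str:
    """Format a datetime as an ISO-8601 UTC string with millisecond precision."""
    return d.strftime("%Y-%m-%dT%H:%M:%S.") + f"{d.microsecond // 1000:03d}Z"


def resolve_now(query: str, now: datetime) -> str:
    """Hypothetical stand-in: replace NOW or NOW/HOUR with a concrete timestamp."""
    hour = now.replace(minute=0, second=0, microsecond=0)
    # Replace the rounded token first so "NOW/HOUR" is not clobbered by "NOW".
    return query.replace("NOW/HOUR", fmt(hour)).replace("NOW", fmt(now))


# Two requests arriving one millisecond apart.
t1 = datetime(2016, 12, 1, 15, 49, 0, 123000, tzinfo=timezone.utc)
t2 = datetime(2016, 12, 1, 15, 49, 0, 124000, tzinfo=timezone.utc)

raw = "created:[* TO NOW]"           # "created" is an assumed field name
rounded = "created:[* TO NOW/HOUR]"

# Unrounded: the resolved strings differ, so a cache keyed on them never hits.
print(resolve_now(raw, t1) == resolve_now(raw, t2))
# Rounded: both resolve to the same string for the rest of the hour.
print(resolve_now(rounded, t1) == resolve_now(rounded, t2))
```

The same reasoning applies to NOW/DAY, only with a longer window in which repeated queries stay identical.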
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten million
> documents in it, then each filterCache entry would be 1.25 million bytes
> in size -- the entry is a bitset of all documents in the core.  This
> includes deleted docs that have not yet been reclaimed by merging.  If a
> filterCache for an index that size (which is not all that big) were to
> actually fill up with 5000 entries, it would require over six gigabytes
> of memory just for the cache.
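
The arithmetic above can be checked with a short Python sketch. It assumes, as the quoted text does, that every filterCache entry is a full bitset with one bit per document in the core; filter_cache_bytes is a hypothetical helper name, and per-entry object overhead is ignored.

```python
def filter_cache_bytes(max_doc: int, entries: int) -> tuple[int, int]:
    """Return (bytes per entry, bytes for a full cache), assuming one bit per document."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry, bytes_per_entry * entries


# Figures from the message: ten million documents, 5000 cache entries.
per_entry, total = filter_cache_bytes(10_000_000, 5000)
print(per_entry)        # bytes per entry
print(total / 1e9)      # total size in decimal gigabytes
```

With these inputs each entry comes to 1,250,000 bytes and a full cache to 6.25 billion bytes, which matches the "over six gigabytes" figure when counted in decimal gigabytes.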
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally is
> OK to have sized at several thousand -- I have 16384 on mine.  If your
> documents are particularly large, then you probably would want to have a
> smaller number.
>
> It's good that your autowarmCount values are low.  High values here tend
> to make commits take a very long time.
>
> You do not need to send your message more than once.  The first repeat
> was after less than 40 minutes.  The second was after about two hours.
> Waiting a day or two for a response, particularly for a difficult
> problem, is not unusual for a mailing list.  I began this reply as soon
> as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
>
>
