Re: Queries regarding solr cache

2017-01-04 Thread Shawn Heisey
On 1/4/2017 3:45 AM, kshitij tyagi wrote:
> Problem:
>
> I am noticing that my slaves are not able to use caching properly:
>
> 1. I am indexing on my master and committing frequently; what I am noticing
> is that my slaves are also committing very frequently, so the cache is never
> built properly and my hit ratio for caching is almost zero.
>
> 2. What changes do I need to make so that the cache builds up properly even
> after commits and can be used effectively? This is wasting a lot of my
> resources and also slowing down the queries.

Whenever you commit with openSearcher set to true (which is the
default), Solr immediately throws the cache away.  This is by design --
the cache contains internal document IDs from the previous index; due to
merging, the new index might have entirely different ID values for the
same documents.  A commit on the master will cause the slave to copy the
index on its next configured replication interval, and then basically do
a commit of its own to signal that a new searcher is needed.

The caches have a feature called autowarming, which takes the top N
entries in the cache and re-executes the queries that produced the
entries to populate the new cache before the new searcher starts.  If
you set autowarmCount too high, it makes the commits take a really long
time.
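For reference, autowarmCount is configured per cache in the <query> section of
solrconfig.xml. A minimal sketch (the class choices and sizes here are
illustrative examples, not recommendations):

```xml
<query>
  <!-- re-executes the top N filter queries when a new searcher opens -->
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="8"/>
  <!-- same idea for top-level queries (the q parameter) -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="8"/>
  <!-- the documentCache cannot usefully be autowarmed: internal doc IDs
       change between searchers, so there is nothing to re-execute -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```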

If you are committing so frequently that your cache is ineffective, then
you need to commit less frequently.  Whenever you do a commit on the
master, the slave will also do a commit after it copies the new index.
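One common way to commit less frequently is to stop sending explicit commits
from the indexing client and let the master commit on a timer instead. A
hedged sketch for the master's solrconfig.xml (the 5-minute interval is an
arbitrary example, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit at most every 5 minutes; on a pure master that is
       never queried directly, openSearcher can stay false -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```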

Thanks,
Shawn



Re: Queries regarding solr cache

2017-01-04 Thread kshitij tyagi
Hi Shawn,

Need your help:

I am using a master/slave architecture in my system, and here is the
relevant replication section of solrconfig.xml:
   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
       <str name="enable">${enable.master:false}</str>
       <str name="replicateAfter">startup</str>
       <str name="replicateAfter">commit</str>
       <str name="commitReserveDuration">00:00:10</str>
       <str name="confFiles">managed-schema</str>
     </lst>
     <lst name="slave">
       <str name="enable">${enable.slave:false}</str>
       <str name="masterUrl">http://${MASTER_CORE_URL}/${solr.core.name}</str>
       <str name="pollInterval">${POLL_TIME}</str>
     </lst>
   </requestHandler>

Problem:

I am noticing that my slaves are not able to use caching properly:

1. I am indexing on my master and committing frequently; what I am noticing
is that my slaves are also committing very frequently, so the cache is never
built properly and my hit ratio for caching is almost zero.

2. What changes do I need to make so that the cache builds up properly even
after commits and can be used effectively? This is wasting a lot of my
resources and also slowing down the queries.

On Mon, Dec 5, 2016 at 9:06 PM, Shawn Heisey  wrote:

> On 12/5/2016 6:44 AM, kshitij tyagi wrote:
> >   - lookups:381
> >   - hits:24
> >   - hitratio:0.06
> >   - inserts:363
> >   - evictions:0
> >   - size:345
> >   - warmupTime:2932
> >   - cumulative_lookups:294948
> >   - cumulative_hits:15840
> >   - cumulative_hitratio:0.05
> >   - cumulative_inserts:277963
> >   - cumulative_evictions:70078
> >
> >   How can I increase my hit ratio? I am not able to understand solr
> >   caching mechanism clearly. Please help.
>
> This means that out of the nearly 300,000 queries executed by that
> handler, only five percent (15,840) of them were found in the cache.  The
> rest of them were not found in the cache at the moment they were made.
> Since these numbers come from the queryResultCache, this refers to the
> "q" parameter.  The filterCache handles things in the fq parameter.  The
> documentCache holds actual documents from your index and fills in stored
> data in results so the document doesn't have to be fetched from the index.
>
> Possible reasons:  1) Your users are rarely entering the same query more
> than once.  2) Your client code is adding something unique to every
> query (q parameter) so very few of them are the same.  3) You are
> committing so frequently that the cache never has a chance to get large
> enough to make a difference.
>
> Here are some queryResultCache stats from one of my indexes:
>
> class:org.apache.solr.search.FastLRUCache
> version:1.0
> description:Concurrent LRU Cache(maxSize=512, initialSize=512,
> minSize=460, acceptableSize=486, cleanupThread=true,
> autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
> src:$URL:
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
> lookups:   3496
> hits:  3145
> hitratio:  0.9
> inserts:   335
> evictions: 0
> size:  338
> warmupTime: 2209
> cumulative_lookups:   12394606
> cumulative_hits:  11247114
> cumulative_hitratio:  0.91
> cumulative_inserts:   1110375
> cumulative_evictions: 409887
>
> These numbers indicate that 91 percent of the queries made to this
> handler were served from the cache.
>
> Thanks,
> Shawn
>
>


Re: Queries regarding solr cache

2016-12-05 Thread Shawn Heisey
On 12/5/2016 6:44 AM, kshitij tyagi wrote:
>   - lookups:381
>   - hits:24
>   - hitratio:0.06
>   - inserts:363
>   - evictions:0
>   - size:345
>   - warmupTime:2932
>   - cumulative_lookups:294948
>   - cumulative_hits:15840
>   - cumulative_hitratio:0.05
>   - cumulative_inserts:277963
>   - cumulative_evictions:70078
>
>   How can I increase my hit ratio? I am not able to understand solr
>   caching mechanism clearly. Please help.

This means that out of the nearly 300,000 queries executed by that
handler, only five percent (15,840) of them were found in the cache.  The
rest of them were not found in the cache at the moment they were made.
Since these numbers come from the queryResultCache, this refers to the
"q" parameter.  The filterCache handles things in the fq parameter.  The
documentCache holds actual documents from your index and fills in stored
data in results so the document doesn't have to be fetched from the index.

Possible reasons:  1) Your users are rarely entering the same query more
than once.  2) Your client code is adding something unique to every
query (q parameter) so very few of them are the same.  3) You are
committing so frequently that the cache never has a chance to get large
enough to make a difference.
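The hitratio Solr reports is simply hits divided by lookups. A quick sanity
check in Python against the cumulative numbers quoted above:

```python
# Cumulative queryResultCache stats quoted above.
cum_lookups = 294948
cum_hits = 15840

hitratio = cum_hits / cum_lookups
print(round(hitratio, 2))  # 0.05, matching the reported cumulative_hitratio
```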

Here are some queryResultCache stats from one of my indexes:

class:org.apache.solr.search.FastLRUCache
version:1.0
description:Concurrent LRU Cache(maxSize=512, initialSize=512,
minSize=460, acceptableSize=486, cleanupThread=true,
autowarmCount=8,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@1d172ac0)
src:$URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_7/solr/core/src/java/org/apache/solr/search/FastLRUCache.java
lookups:   3496
hits:  3145
hitratio:  0.9
inserts:   335
evictions: 0
size:  338
warmupTime: 2209
cumulative_lookups:   12394606
cumulative_hits:  11247114
cumulative_hitratio:  0.91
cumulative_inserts:   1110375
cumulative_evictions: 409887

These numbers indicate that 91 percent of the queries made to this
handler were served from the cache.

Thanks,
Shawn



Re: Queries regarding solr cache

2016-12-05 Thread kshitij tyagi
Hi Shawn,

Thanks for the reply:

Here are the details for the queryResultCache (I am not using NOW in my
queries, and most of the queries are common):


   - class:org.apache.solr.search.LRUCache
   - version:1.0
   - description:LRU Cache(maxSize=1000, initialSize=1000,
   autowarmCount=10,
   regenerator=org.apache.solr.search.SolrIndexSearcher$3@73380510)
   - src:null
   - stats:
  - lookups:381
  - hits:24
  - hitratio:0.06
  - inserts:363
  - evictions:0
  - size:345
  - warmupTime:2932
  - cumulative_lookups:294948
  - cumulative_hits:15840
  - cumulative_hitratio:0.05
  - cumulative_inserts:277963
  - cumulative_evictions:70078

  How can I increase my hit ratio? I am not able to understand the Solr
  caching mechanism clearly. Please help.



On Thu, Dec 1, 2016 at 8:19 PM, Shawn Heisey  wrote:

> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving huge number of requests in my application.
> >
> > I need to know how can I utilize caching in Solr.
> >
> > As of now, in the Solr admin UI, I am clicking Core Selector → [core
> > name] → Plugins / Stats.
> >
> > I am seeing my hit ratio as 0 for all the caches. What does this mean,
> > and how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that cache
> are finding matches.  This means that your client systems are never
> sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date queries
> -- NOW changes every millisecond, and the actual timestamp value is what
> ends up in the cache.  This means that the same query with NOW executed
> more than once will actually be different from the cache's perspective.
> The solution is date rounding -- using things like NOW/HOUR or NOW/DAY.
> You could use NOW/MINUTE, but the window for caching would be quite small.
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten million
> documents in it, then each filterCache entry would be 1.25 million bytes
> in size -- the entry is a bitset of all documents in the core.  This
> includes deleted docs that have not yet been reclaimed by merging.  If a
> filterCache for an index that size (which is not all that big) were to
> actually fill up with 5000 entries, it would require over six gigabytes
> of memory just for the cache.
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally is
> OK to have sized at several thousand -- I have 16384 on mine.  If your
> documents are particularly large, then you probably would want to have a
> smaller number.
>
> It's good that your autowarmCount values are low.  High values here tend
> to make commits take a very long time.
>
> You do not need to send your message more than once.  The first repeat
> was after less than 40 minutes.  The second was after about two hours.
> Waiting a day or two for a response, particularly for a difficult
> problem, is not unusual for a mailing list.  I began this reply as soon
> as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
>
>


Re: Queries regarding solr cache

2016-12-01 Thread Jeff Wartes
I found this, which intends to explore the usage of RoaringDocIdSet for solr: 
https://issues.apache.org/jira/browse/SOLR-9008
This suggests Lucene’s filter cache already uses it, or did at one point: 
https://issues.apache.org/jira/browse/LUCENE-6077

I was playing with id set implementations earlier this year for 
https://issues.apache.org/jira/browse/LUCENE-7211. 
I know I tried a SparseFixedBitSet there, and I think that I observed that 
RoaringDocIdSet existed and tried that too, but I apparently didn’t write 
anything down. My vague recollection is that I couldn’t use it for my use case, 
due to the in-order insertion requirement.



On 12/1/16, 8:10 AM, "Shawn Heisey"  wrote:

On 12/1/2016 8:16 AM, Dorian Hoxha wrote:
> @Shawn
> Any idea why the cache doesn't use roaring bitsets ?

I had to look that up to even know what it was.  Apparently Lucene does
have an implementation of that, a class called RoaringDocIdSet.  It was
incorporated into the source code in October 2014 with this issue:

https://issues.apache.org/jira/browse/LUCENE-5983

As for the reason that it wasn't used for the filterCache, I think
that's because the filterCache existed LONG before that bitset
implementation was available, and when things work well (which describes
the filterCache), devs try not to mess with them too much.

I have mentioned the idea on a recently-filed issue regarding bitset
memory efficiency:

https://issues.apache.org/jira/browse/SOLR-9764

Thanks,
Shawn





Re: Queries regarding solr cache

2016-12-01 Thread Shawn Heisey
On 12/1/2016 8:16 AM, Dorian Hoxha wrote:
> @Shawn
> Any idea why the cache doesn't use roaring bitsets ?

I had to look that up to even know what it was.  Apparently Lucene does
have an implementation of that, a class called RoaringDocIdSet.  It was
incorporated into the source code in October 2014 with this issue:

https://issues.apache.org/jira/browse/LUCENE-5983

As for the reason that it wasn't used for the filterCache, I think
that's because the filterCache existed LONG before that bitset
implementation was available, and when things work well (which describes
the filterCache), devs try not to mess with them too much.

I have mentioned the idea on a recently-filed issue regarding bitset
memory efficiency:

https://issues.apache.org/jira/browse/SOLR-9764

Thanks,
Shawn



Re: Queries regarding solr cache

2016-12-01 Thread Dorian Hoxha
@Shawn
Any idea why the cache doesn't use roaring bitsets ?

On Thu, Dec 1, 2016 at 3:49 PM, Shawn Heisey  wrote:

> On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> > I am using Solr and serving huge number of requests in my application.
> >
> > I need to know how can I utilize caching in Solr.
> >
> > As of now, in the Solr admin UI, I am clicking Core Selector → [core
> > name] → Plugins / Stats.
> >
> > I am seeing my hit ratio as 0 for all the caches. What does this mean,
> > and how can this be optimized?
>
> If your hitratio is zero, then none of the queries related to that cache
> are finding matches.  This means that your client systems are never
> sending the same query twice.
>
> One possible reason for a zero hitratio is using "NOW" in date queries
> -- NOW changes every millisecond, and the actual timestamp value is what
> ends up in the cache.  This means that the same query with NOW executed
> more than once will actually be different from the cache's perspective.
> The solution is date rounding -- using things like NOW/HOUR or NOW/DAY.
> You could use NOW/MINUTE, but the window for caching would be quite small.
>
> 5000 entries for your filterCache is almost certainly too big.  Each
> filterCache entry tends to be quite large.  If the core has ten million
> documents in it, then each filterCache entry would be 1.25 million bytes
> in size -- the entry is a bitset of all documents in the core.  This
> includes deleted docs that have not yet been reclaimed by merging.  If a
> filterCache for an index that size (which is not all that big) were to
> actually fill up with 5000 entries, it would require over six gigabytes
> of memory just for the cache.
>
> The 1000 that you have on queryResultCache is also rather large, but
> probably not a problem.  There's also documentCache, which generally is
> OK to have sized at several thousand -- I have 16384 on mine.  If your
> documents are particularly large, then you probably would want to have a
> smaller number.
>
> It's good that your autowarmCount values are low.  High values here tend
> to make commits take a very long time.
>
> You do not need to send your message more than once.  The first repeat
> was after less than 40 minutes.  The second was after about two hours.
> Waiting a day or two for a response, particularly for a difficult
> problem, is not unusual for a mailing list.  I began this reply as soon
> as I saw your message -- about 7:30 AM in my timezone.
>
> Thanks,
> Shawn
>
>


Re: Queries regarding solr cache

2016-12-01 Thread Shawn Heisey
On 12/1/2016 4:04 AM, kshitij tyagi wrote:
> I am using Solr and serving huge number of requests in my application.
>
> I need to know how can I utilize caching in Solr.
>
> As of now, in the Solr admin UI, I am clicking Core Selector → [core name]
> → Plugins / Stats.
>
> I am seeing my hit ratio as 0 for all the caches. What does this mean, and
> how can this be optimized?

If your hitratio is zero, then none of the queries related to that cache
are finding matches.  This means that your client systems are never
sending the same query twice.

One possible reason for a zero hitratio is using "NOW" in date queries
-- NOW changes every millisecond, and the actual timestamp value is what
ends up in the cache.  This means that the same query with NOW executed
more than once will actually be different from the cache's perspective. 
The solution is date rounding -- using things like NOW/HOUR or NOW/DAY. 
You could use NOW/MINUTE, but the window for caching would be quite small.
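To see why rounding helps, here is a small Python sketch (not Solr code; the
formatting function is a simplified stand-in for how Solr expands NOW into a
concrete timestamp that ends up in the cache key):

```python
from datetime import datetime, timezone

def solr_now(dt, round_to_day=False):
    """Mimic how NOW (or NOW/DAY) expands to a concrete timestamp,
    which is what actually lands in the cache key."""
    if round_to_day:
        dt = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"

# Two "identical" queries issued milliseconds apart:
t1 = datetime(2016, 12, 1, 8, 10, 3, 123000, tzinfo=timezone.utc)
t2 = datetime(2016, 12, 1, 8, 10, 3, 456000, tzinfo=timezone.utc)

print(solr_now(t1) == solr_now(t2))  # False: different cache keys, no hit
print(solr_now(t1, round_to_day=True) == solr_now(t2, round_to_day=True))
# True: with NOW/DAY both queries share one cache key, so the cache can hit
```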

5000 entries for your filterCache is almost certainly too big.  Each
filterCache entry tends to be quite large.  If the core has ten million
documents in it, then each filterCache entry would be 1.25 million bytes
in size -- the entry is a bitset of all documents in the core.  This
includes deleted docs that have not yet been reclaimed by merging.  If a
filterCache for an index that size (which is not all that big) were to
actually fill up with 5000 entries, it would require over six gigabytes
of memory just for the cache.
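The arithmetic behind those numbers, as a quick sketch:

```python
num_docs = 10_000_000            # documents in the core, including deleted ones
bytes_per_entry = num_docs // 8  # one bit per document in each filterCache bitset
max_entries = 5000               # the configured filterCache size

print(bytes_per_entry)                        # 1250000 bytes per entry
print(bytes_per_entry * max_entries / 10**9)  # 6.25 GB for a completely full cache
```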

The 1000 that you have on queryResultCache is also rather large, but
probably not a problem.  There's also documentCache, which generally is
OK to have sized at several thousand -- I have 16384 on mine.  If your
documents are particularly large, then you probably would want to have a
smaller number.

It's good that your autowarmCount values are low.  High values here tend
to make commits take a very long time.

You do not need to send your message more than once.  The first repeat
was after less than 40 minutes.  The second was after about two hours. 
Waiting a day or two for a response, particularly for a difficult
problem, is not unusual for a mailing list.  I began this reply as soon
as I saw your message -- about 7:30 AM in my timezone.

Thanks,
Shawn



Fwd: Queries regarding solr cache

2016-12-01 Thread kshitij tyagi
-- Forwarded message --
From: kshitij tyagi 
Date: Thu, Dec 1, 2016 at 4:34 PM
Subject: Queries regarding solr cache
To: solr-user@lucene.apache.org


Hi All,

I am using Solr and serving huge number of requests in my application.

I need to know how can I utilize caching in Solr.

As of now, in the Solr admin UI, I am clicking Core Selector → [core name] →
Plugins / Stats.

I am seeing my hit ratio as 0 for all the caches. What does this mean, and
how can this be optimized?

My current solr configurations are:





Regards,
Kshitij


Queries regarding solr cache

2016-12-01 Thread kshitij tyagi
Hi All,

I am using Solr and serving huge number of requests in my application.

I need to know how can I utilize caching in Solr.

As of now, in the Solr admin UI, I am clicking Core Selector → [core name] →
Plugins / Stats.

I am seeing my hit ratio as 0 for all the caches. What does this mean, and
how can this be optimized?

My current solr configurations are:





Regards,
Kshitij