Re: More Like This and Caching
Hi David, Jason and Otis,

Thank you for the feedback on the question. It is very much appreciated.

To confirm which caches are being used, I will remove one of the Solr servers from the cluster, restart it, note the status of the various Solr caches, issue some MLT queries to it, and compare the status of the caches against the notes previously taken. I believe this will provide a definitive answer. I will reply to this thread with my findings.

Kind regards,
Giammarco

On Fri, May 10, 2013 at 1:14 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

This is correct, doc cache for previously read docs regardless of which query read them and query cache for repeat query. Plus OS cache for actual index files.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On May 9, 2013 2:32 PM, Jason Hellman jhell...@innoventsolutions.com wrote:

Purely from empirical observation, both the DocumentCache and QueryResultCache are being populated and reused in reloads of a simple MLT search. You can see in the cache inserts how much extra-curricular activity is happening to populate the MLT data by how many inserts and lookups occur on the first load.

(lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis)

http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

There is no activity in the filterCache, fieldCache, or fieldValueCache - and that makes plenty of sense.

On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:

I'm not the expert here, but perhaps what you're noticing is actually the OS's disk cache. The actual solr index isn't cached by solr, but as you read the blocks off disk the OS disk cache probably did cache those blocks for you. On the 2nd run the index blocks were read out of memory.

There was a very extensive discussion on this list not long back titled: Re: SolrCloud loadbalancing, replication, and failover look that thread up and you'll get a lot of in-depth on the topic.

David

-----Original Message-----
From: Giammarco Schisani [mailto:giamma...@schisani.com]
Sent: Thursday, May 09, 2013 2:59 PM
To: solr-user@lucene.apache.org
Subject: More Like This and Caching

Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache, documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This handler does not take advantage of any of the Solr caches. However, if I issue two identical MLT requests to the same Solr instance, the second request will execute much faster than the first request (for example, the first request will execute in 200ms and the second request will execute in 20ms). This makes me believe that at least one of the Solr caches is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used, but would you be able to confirm?

As information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani
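
The before/after check proposed in the reply at the top of this message (note the cache status, issue MLT queries, compare) could be sketched roughly as follows. This is only an illustration: the function names `cache_deltas` and `active_caches` are made up here, the snapshots are assumed to have been copied by hand from the Solr statistics page into plain dictionaries, and the numbers are hypothetical, not measurements from this thread.

```python
def cache_deltas(before, after):
    """Return per-cache counter changes between two snapshots.

    Each snapshot maps a cache name to a dict of counters, for example
    {"documentCache": {"lookups": 0, "hits": 0, "inserts": 0}}.
    Counters missing from the 'before' snapshot are treated as 0.
    """
    deltas = {}
    for cache, counters in after.items():
        base = before.get(cache, {})
        deltas[cache] = {
            name: value - base.get(name, 0) for name, value in counters.items()
        }
    return deltas


def active_caches(deltas):
    """Sorted names of the caches whose counters changed at all."""
    return sorted(
        cache for cache, d in deltas.items() if any(v != 0 for v in d.values())
    )


# Hypothetical snapshots, typed in by hand for illustration only.
before = {
    "documentCache": {"lookups": 0, "hits": 0, "inserts": 0},
    "queryResultCache": {"lookups": 0, "hits": 0, "inserts": 0},
    "filterCache": {"lookups": 0, "hits": 0, "inserts": 0},
}
after = {
    "documentCache": {"lookups": 40, "hits": 20, "inserts": 20},
    "queryResultCache": {"lookups": 2, "hits": 1, "inserts": 1},
    "filterCache": {"lookups": 0, "hits": 0, "inserts": 0},
}

print(active_caches(cache_deltas(before, after)))
```

A cache that shows new lookups and hits after the second identical request is one the MLT request actually reused; a cache whose counters never move was not involved.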
More Like This and Caching
Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache, documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This handler does not take advantage of any of the Solr caches. However, if I issue two identical MLT requests to the same Solr instance, the second request will execute much faster than the first request (for example, the first request will execute in 200ms and the second request will execute in 20ms). This makes me believe that at least one of the Solr caches is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used, but would you be able to confirm?

For information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani
RE: More Like This and Caching
I'm not the expert here, but perhaps what you're noticing is actually the OS's disk cache. The actual Solr index isn't cached by Solr, but as you read the blocks off disk, the OS disk cache probably cached those blocks for you. On the second run the index blocks were read out of memory.

There was a very extensive discussion on this list not long back titled "Re: SolrCloud loadbalancing, replication, and failover". Look that thread up and you'll get a lot of in-depth information on the topic.

David
Re: More Like This and Caching
Purely from empirical observation, both the DocumentCache and QueryResultCache are being populated and reused in reloads of a simple MLT search. You can see in the cache inserts how much extra-curricular activity is happening to populate the MLT data by how many inserts and lookups occur on the first load.

(lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis)

http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

There is no activity in the filterCache, fieldCache, or fieldValueCache - and that makes plenty of sense.
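
The MLT request quoted in Jason's message is easy to mangle when the `&` separators are lost in copying. One way to build it is from explicit parameters, as in the sketch below. The helper name `mlt_url` and its argument names are hypothetical; the parameter names (`q`, `mlt`, `mlt.fl`, `mlt.mindf`, `mlt.mintf`, `fl`) and the host and path are the ones in the quoted request.

```python
from urllib.parse import urlencode


def mlt_url(base, query, fields, mindf=1, mintf=1, fl="id,score"):
    """Build a More Like This select URL with properly separated parameters."""
    params = [
        ("q", query),
        ("mlt", "true"),
        ("mlt.fl", ",".join(fields)),  # fields used to find similar documents
        ("mlt.mindf", mindf),
        ("mlt.mintf", mintf),
        ("fl", fl),
    ]
    # safe="," keeps the comma-separated field lists readable in the URL.
    return base + "?" + urlencode(params, safe=",")


url = mlt_url("http://localhost:8983/solr/select", "apache", ["manu", "cat"])
print(url)
```

Issuing the resulting URL twice against the same instance and comparing cache statistics between the two runs is the experiment described above.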
Re: More Like This and Caching
This is correct: doc cache for previously read docs regardless of which query read them, and query cache for repeat queries. Plus OS cache for the actual index files.

Otis
Solr & ElasticSearch Support
http://sematext.com/
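
The distinction Otis draws (a document cache shared by all queries, a query cache that only helps repeat queries) can be illustrated with a toy model. This is not Solr's implementation: the class `ToyCaches` and its callbacks are invented for this sketch, and it ignores cache sizing, eviction and invalidation entirely.

```python
class ToyCaches:
    """Toy model: one cache keyed by query, one cache keyed by document id."""

    def __init__(self, run_query, load_doc):
        self.run_query = run_query  # query -> list of doc ids (the slow search)
        self.load_doc = load_doc    # doc id -> stored document (the slow read)
        self.query_result_cache = {}
        self.document_cache = {}
        self.stats = {"query_hits": 0, "query_misses": 0,
                      "doc_hits": 0, "doc_misses": 0}

    def search(self, query):
        # The query cache only helps when the same query is repeated.
        if query in self.query_result_cache:
            self.stats["query_hits"] += 1
        else:
            self.stats["query_misses"] += 1
            self.query_result_cache[query] = self.run_query(query)
        docs = []
        for doc_id in self.query_result_cache[query]:
            # The document cache helps no matter which query first read the doc.
            if doc_id in self.document_cache:
                self.stats["doc_hits"] += 1
            else:
                self.stats["doc_misses"] += 1
                self.document_cache[doc_id] = self.load_doc(doc_id)
            docs.append(self.document_cache[doc_id])
        return docs


# Hypothetical two-query index used only for this illustration.
index = {"apache": [1, 2], "solr": [2, 3]}
caches = ToyCaches(lambda q: index[q], lambda i: {"id": i})

caches.search("apache")  # first run: everything is a miss
caches.search("apache")  # repeat query: query cache and document cache hit
caches.search("solr")    # new query: doc 2 is still served from the document cache
print(caches.stats)
```

In this model the second identical request touches neither the slow search nor the slow document read, which matches the kind of speed-up described in the original question.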