Re: Solr Pagination

Salman Ansari Sat, 10 Oct 2015 01:56:12 -0700

Thanks, Shawn, for your response. Based on that:
1) Can you please point me to more information about cold shards vs. hot
shards?


2) "That 10GB number assumes there's no other software on the machine, like
a database server or a webserver."
Yes, the machine is dedicated to Solr.

3) "How much index data is on the machine?"
I have 3 collections: 2 for testing (together they do not exceed 1M
documents) and the main collection that I am querying now, which contains
around 69M documents. Each collection is distributed across 2 shards, each
with 2 replicas. Disk usage is about 40GB.

4) "A memory size of 14GB would be unusual for a physical machine, and makes
me wonder if you're using virtual machines"
Yes, I am using a virtual machine. Bare metal would be difficult in my case
because our entire data center is in the cloud. I can increase the VM's
capacity, though. While testing some edge cases, I noticed in the Solr admin
UI that memory sometimes reaches its limit (14GB RAM, 4GB for the JVM).
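
As a quick check of the figures in this thread, the arithmetic behind the "RAM minus Java heap is what is left for the OS disk cache" reasoning can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes nothing else on the machine uses a significant amount of memory.

```python
def page_cache_headroom(total_ram_gb, jvm_heap_gb, index_gb):
    """Return (GB left for the OS disk cache, fraction of the index that fits).

    Assumes a machine dedicated to Solr, so everything outside the
    Java heap is treated as available for caching index files.
    """
    cache_gb = total_ram_gb - jvm_heap_gb
    return cache_gb, cache_gb / index_gb


# This VM: 14GB RAM, 4GB heap, about 40GB of index on disk.
cache_gb, fraction = page_cache_headroom(14, 4, 40)
print(f"this VM: {cache_gb} GB cache, {fraction:.0%} of the index fits")

# Each of Shawn's servers: 64GB RAM, 8GB heap, around 135GB of index.
cache_gb, fraction = page_cache_headroom(64, 8, 135)
print(f"Shawn's server: {cache_gb} GB cache, {fraction:.0%} of the index fits")
```

With these inputs the VM can cache about a quarter of its index, while each of Shawn's servers can cache a little less than half.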

5) Just to confirm, I have combined the lessons from

http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
and
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

to come up with the following settings:

FilterCache

    <filterCache class="solr.FastLRUCache"
                 size="16384"
                 initialSize="4096"
                 autowarmCount="4096"/>
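
A rough upper bound on what a filterCache of this size could occupy on the heap can be sketched as below. A cached filter over a dense result set is held as a bitset with one bit per document in the core (maxDoc). The sketch assumes the 69M documents are split evenly across the two shards, which is a guess because per-shard counts are not given, and it assumes every entry is a full bitset. That is a worst case, since small result sets are stored more compactly. The function name is hypothetical.

```python
def filter_cache_worst_case_bytes(max_doc, entries):
    """Worst-case filterCache size if every entry is a full bitset.

    One bit per document in the core, rounded up to whole bytes.
    """
    bytes_per_entry = (max_doc + 7) // 8
    return bytes_per_entry * entries


docs_per_shard = 69_000_000 // 2   # assumption: even split over 2 shards
entries = 16384                    # size= from the filterCache config

worst = filter_cache_worst_case_bytes(docs_per_shard, entries)
print(f"per entry : {worst / entries / 2**20:.1f} MiB")
print(f"full cache: {worst / 2**30:.1f} GiB")
```

Even if only a small fraction of entries are dense bitsets, this bound is far above a 4GB heap, so the configured size is worth comparing against the number of distinct filter queries actually in use.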

DocumentCache

    <documentCache class="solr.LRUCache"
                   size="16384"
                   initialSize="16384"
                   autowarmCount="0"/>

NewSearcher and FirstSearcher

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*</str>
          <str name="sort">score desc id desc</str>
        </lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*</str>
          <str name="sort">score desc id desc</str>
        </lst>
        <!-- seed common facets and filter queries -->
        <lst>
          <str name="q">*</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>

Will this make Solr use more of its caches and prepopulate them?
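
One way to check whether warming queries like these actually populate the caches is to send the same requests by hand and compare QTime on the first and second run. The sketch below only builds the request URLs. The host and port are assumptions, and the collection name is taken from the log lines quoted later in this message. Two details differ from the listener config above on purpose: Solr's documented sort syntax separates clauses with a comma, and facet.field is only acted on when facet=true is also sent.

```python
from urllib.parse import urlencode


def warming_query_url(base_url, params):
    """Build a /select URL for one warming query (no request is sent)."""
    return f"{base_url}/select?{urlencode(params)}"


# Assumed host and port; "sabr102" is the collection named in the quoted log.
BASE = "http://localhost:8983/solr/sabr102"

# Mirrors the first warming query, with a comma between the sort clauses.
sort_query = {"q": "*", "sort": "score desc, id desc"}

# Mirrors the facet warming query, with facet=true added to enable faceting.
facet_query = {"q": "*", "facet": "true", "facet.field": "category"}

for params in (sort_query, facet_query):
    print(warming_query_url(BASE, params))
```

Fetching each URL twice and comparing the reported QTime values gives a direct view of how much the caches help for that query.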

Regards,
Salman




On Sat, Oct 10, 2015 at 5:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/9/2015 1:39 PM, Salman Ansari wrote:
>
> > INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=3391
>
> Over 3 seconds for a query like this definitely sounds like there's a
> problem.
>
> > INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=21569
>
> Adding a start value of 1000 increases QTime by a factor of more than
> 6?  Even more evidence of a performance problem.
>
> For comparison purposes, I did a couple of simple queries on a large
> index of mine.  Here are the response headers showing the QTime value
> and all the parameters (except my shard URLs) for each query:
>
>   "responseHeader": {
>     "status": 0,
>     "QTime": 1253,
>     "params": {
>       "df": "catchall",
>       "spellcheck.maxCollationEvaluations": "2",
>       "spellcheck.dictionary": "default",
>       "echoParams": "all",
>       "spellcheck.maxCollations": "5",
>       "q.op": "AND",
>       "shards.info": "true",
>       "spellcheck.maxCollationTries": "2",
>       "rows": "70",
>       "spellcheck.extendedResults": "false",
>       "shards": "REDACTED SEVEN SHARD URLS",
>       "shards.tolerant": "true",
>       "spellcheck.onlyMorePopular": "false",
>       "facet.method": "enum",
>       "spellcheck.count": "9",
>       "q": "catchall:carriage",
>       "indent": "true",
>       "wt": "json",
>       "_": "1444420900498"
>     }
>   }
>
>
>   "responseHeader": {
>     "status": 0,
>     "QTime": 176,
>     "params": {
>       "df": "catchall",
>       "spellcheck.maxCollationEvaluations": "2",
>       "spellcheck.dictionary": "default",
>       "echoParams": "all",
>       "spellcheck.maxCollations": "5",
>       "q.op": "AND",
>       "shards.info": "true",
>       "spellcheck.maxCollationTries": "2",
>       "rows": "70",
>       "spellcheck.extendedResults": "false",
>       "shards": "REDACTED SEVEN SHARD URLS",
>       "shards.tolerant": "true",
>       "spellcheck.onlyMorePopular": "false",
>       "facet.method": "enum",
>       "spellcheck.count": "9",
>       "q": "catchall:wibble",
>       "indent": "true",
>       "wt": "json",
>       "_": "1444421001024"
>     }
>   }
>
> The first query had a numFound of 120906, the second a numFound of 32.
> When I re-executed the first query (the one with a QTime of 1253) so it
> would use the Solr caches, QTime was 17.
>
> This is an index that has six cold shards with 38.8 million documents
> each and a hot shard with 1.5 million documents.  Total document count
> for the index is over 234 million documents, and the total size of the
> index is about 272GB.  Each copy of the index has its shards split
> between two servers that each have 64GB of RAM, with an 8GB max Java
> heap.  I do not have enough memory to cache all the index contents in
> RAM, but I can get a little less than half of it in the cache -- each
> machine has about 56GB of cache available and contains around 135GB of
> index data.  The index data is stored on a RAID10 array with six SATA
> disks, so it's fairly fast, but nowhere near as fast as SSD.
>
> You've already mentioned the SolrPerformanceProblems wiki page that I
> wrote, which is where I would normally send you for more information.
> You said that your machine has 14GB of RAM and 4GB is allocated to Solr,
> leaving about 10GB for caching.  That 10GB number assumes there's no
> other software on the machine, like a database server or a webserver.
> How much index data is on the machine?  You need to count all the Solr
> cores.  If the "10GB for caching" figure is accurate, then more than
> about 20GB of index data means you might need more memory.  If it's more
> than about 40GB of index data, you definitely need more memory.
>
> A memory size of 14GB would be unusual for a physical machine, and makes
> me wonder if you're using virtual machines.  Bare metal is always going
> to offer better performance than a VM.  Another potential problem with
> VMs is that the host system might have its memory oversubscribed -- the
> total amount of memory in the host machine might be less than the total
> amount of memory allocated to all the running virtual machines.  Solr
> performance will be terrible if VM memory is oversubscribed.
>
> Thanks,
> Shawn
>
>
