Re: how to debug solr performance degradation
On 2/27/2015 12:51 PM, Tang, Rebecca wrote:
> Thank you guys for all the suggestions and help! I've identified the main
> culprit with debug=timing. It was the mlt component. After I removed it,
> the speed of the query went back to reasonable. Another culprit is the
> expand component, but I can't remove it. We've downgraded our amazon
> instance to 60G mem with general purpose SSD and the performance is pretty
> good. It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :)
>
> I also added all the suggested JVM parameters. Now I have a gc.log that I
> can dig into.
>
> One thing I would like to understand is how memory is managed by solr.
>
> If I do 'top -u solr', I see something like this:
>
> Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
> Swap:        0k total,        0k used,         0k free, 54500892k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4266 solr  20   0  192g 5.1g 854m S  0.0  8.4 37:09.97 java
>
> There are two things:
> 1) Mem: 62920240k total, 62582524k used. I think this is what the solr
> admin "physical memory" bar graph reports on. Can I assume that most of
> the mem is used for loading part of the index?
>
> 2) And then there's the VIRT 192g and RES 5.1g. What is the 5.1g RES
> (physical memory) that is used by solr?

The "total" and "used" values from top refer to *all* memory in the entire machine, and they do match the "physical memory" graph in the admin UI. Notice that the "cached" value is 54GB -- that's where most of the memory usage is actually happening. This is the OS disk cache: the OS automatically uses otherwise-idle memory to cache data on disk. You are only caching about a third of your index, which may not be enough for good performance, especially with complex queries.

The VIRT (virtual) and RES (resident) values describe how Java is using memory from the OS point of view. The java process has allocated 5.1GB of RAM for the heap and all other memory structures.
The VIRT number is the total amount of *address space* (virtual memory, not actual memory) that the process has allocated. For Solr, this will typically be (approximately) the size of all your indexes plus the RES and SHR values. Solr (Lucene) uses the mmap functionality in the operating system for all disk access by default (configurable) -- it maps the files on disk into virtual memory. A program then doesn't need to make disk I/O calls to access the data ... it just pretends that the file is sitting in memory, and the operating system takes care of translating those memory reads and writes into disk access. All memory that is not explicitly allocated to a program is automatically used to cache that disk access -- this is the "cached" number from top that I already mentioned.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
http://en.wikipedia.org/wiki/Page_cache

Thanks,
Shawn
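Shawn's "about a third" estimate can be checked with quick arithmetic from numbers in this thread (the 54500892k "cached" value from top, and the 183G index size mentioned later); a minimal sketch with those figures hard-coded:

```shell
# Rough check: what fraction of the index fits in the OS page cache?
# Numbers are taken from the thread, not measured live.
cached_kb=54500892               # "cached" column from top
index_kb=$((183 * 1024 * 1024))  # 183G index, expressed in kB
pct=$((cached_kb * 100 / index_kb))
echo "roughly ${pct}% of the index fits in the page cache"
```

On a live box you would read the cached figure from /proc/meminfo and the index size from du instead of hard-coding them.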
Re: how to debug solr performance degradation
Thank you guys for all the suggestions and help! I've identified the main culprit with debug=timing. It was the mlt component. After I removed it, the speed of the query went back to reasonable. Another culprit is the expand component, but I can't remove it. We've downgraded our amazon instance to 60G mem with general purpose SSD and the performance is pretty good. It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :)

I also added all the suggested JVM parameters. Now I have a gc.log that I can dig into.

One thing I would like to understand is how memory is managed by solr. If I do 'top -u solr', I see something like this:

Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,         0k free, 54500892k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4266 solr  20   0  192g 5.1g 854m S  0.0  8.4 37:09.97 java

There are two things:
1) Mem: 62920240k total, 62582524k used. I think this is what the solr admin "physical memory" bar graph reports on. Can I assume that most of the mem is used for loading part of the index?
2) And then there's the VIRT 192g and RES 5.1g. What is the 5.1g RES (physical memory) that is used by solr?

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu

On 2/25/15 7:57 PM, "Otis Gospodnetic" wrote:
>Lots of suggestions here already. +1 for those JVM params from Boogie and
>for looking at JMX.
>Rebecca, try SPM <http://sematext.com/spm> (will look at JMX for you, among
>other things), it may save you time figuring out
>JVM/heap/memory/performance issues. If you can't tell what's slow via SPM,
>we can have a look at your metrics (charts are sharable) and may be able to
>help you faster than guessing.
>
>Otis
>--
>Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>Solr & Elasticsearch Support * http://sematext.com/
>
>
>On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson wrote:
>
>> Before diving in too deeply, try attaching &debug=timing to the query.
>> Near the bottom of the response there'll be a list of the time taken
>> by each _component_. So there'll be separate entries for query,
>> highlighting, etc.
>>
>> This may not show any surprises, you might be spending all your time
>> scoring. But it's worth doing as a check and might save you from going
>> down some dead-ends. I mean if your query winds up spending 80% of its
>> time in the highlighter, you know where to start looking.
>>
>> Best,
>> Erick
>>
>>
>> On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer wrote:
>> > rebecca,
>> >
>> > you probably need to dig into your queries, but if you want to
>> > force/preload the index into memory you could try doing something like
>> >
>> > cat `find /path/to/solr/index` > /dev/null
>> >
>> > if you haven't already reviewed the following, you might take a look
>> > here: https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > perhaps going back to a very vanilla/default solr configuration and
>> > building back up from that baseline to better isolate what specific
>> > setting might be impacting your environment
>> >
>> > From: Tang, Rebecca
>> > Sent: Wednesday, February 25, 2015 11:44
>> > To: solr-user@lucene.apache.org
>> > Subject: RE: how to debug solr performance degradation
>> >
>> > Sorry, I should have been more specific.
>> >
>> > I was referring to the solr admin UI page. Today we started up an AWS
>> > instance with 240 G of memory to see if we fit all of our index (183G) in
>> > the memory and have enough for the JVM, could it improve the performance.
>> >
>> > I attached the admin UI screen shot with the email.
>> >
>> > The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
>> > (9.52 GB) is used.
>> >
>> > The next bar is Swap Space and it's at 0.00 MB.
>> >
>> > The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
>> >
>> > My understanding is that when Solr starts up, it reserves some memory
>> > for the JVM, and then it tries to use up as much of the remaining
>> > physical memory as possible. And I used to see the physical memory at
>> > anywhere between 70% to 90+%. Is this understanding correct?
>> >
>> > And now,
Re: how to debug solr performance degradation
Lots of suggestions here already. +1 for those JVM params from Boogie and for looking at JMX.

Rebecca, try SPM <http://sematext.com/spm> (will look at JMX for you, among other things), it may save you time figuring out JVM/heap/memory/performance issues. If you can't tell what's slow via SPM, we can have a look at your metrics (charts are sharable) and may be able to help you faster than guessing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson wrote:
> Before diving in too deeply, try attaching &debug=timing to the query.
> Near the bottom of the response there'll be a list of the time taken
> by each _component_. So there'll be separate entries for query,
> highlighting, etc.
>
> This may not show any surprises, you might be spending all your time
> scoring. But it's worth doing as a check and might save you from going
> down some dead-ends. I mean if your query winds up spending 80% of its
> time in the highlighter, you know where to start looking.
>
> Best,
> Erick
>
> On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer wrote:
> > rebecca,
> >
> > you probably need to dig into your queries, but if you want to force/preload
> > the index into memory you could try doing something like
> >
> > cat `find /path/to/solr/index` > /dev/null
> >
> > if you haven't already reviewed the following, you might take a look here
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > perhaps going back to a very vanilla/default solr configuration and
> > building back up from that baseline to better isolate what specific
> > setting might be impacting your environment
> >
> > ________
> > From: Tang, Rebecca
> > Sent: Wednesday, February 25, 2015 11:44
> > To: solr-user@lucene.apache.org
> > Subject: RE: how to debug solr performance degradation
> >
> > Sorry, I should have been more specific.
> >
> > I was referring to the solr admin UI page. Today we started up an AWS
> > instance with 240 G of memory to see if we fit all of our index (183G) in
> > the memory and have enough for the JVM, could it improve the performance.
> >
> > I attached the admin UI screen shot with the email.
> >
> > The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
> > (9.52 GB) is used.
> >
> > The next bar is Swap Space and it's at 0.00 MB.
> >
> > The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
> >
> > My understanding is that when Solr starts up, it reserves some memory for
> > the JVM, and then it tries to use up as much of the remaining physical
> > memory as possible. And I used to see the physical memory at anywhere
> > between 70% to 90+%. Is this understanding correct?
> >
> > And now, even with 240G of memory, our index is performing at 10 - 20
> > seconds for a query. Granted that our queries have fq's and highlighting
> > and faceting, I think with a machine this powerful I should be able to get
> > the queries executed under 5 seconds.
> >
> > This is what we send to Solr:
> > q=(phillip%20morris)
> > &wt=json
> > &start=0
> > &rows=50
> > &facet=true
> > &facet.mincount=0
> > &facet.pivot=industry,collection_facet
> > &facet.pivot=availability_facet,availabilitystatus_facet
> > &facet.field=dddate
> > &fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22))
> > &facet.field=dt_facet
> > &facet.field=brd_facet
> > &facet.field=dg_facet
> > &hl=true
> > &hl.simple.pre=%3Ch1%3E
> > &hl.simple.post=%3C%2Fh1%3E
> > &hl.requireFieldMatch=false
> > &hl.preserveMulti=true
> > &hl.fl=ot,ti
> > &f.ot.hl.fragsize=300
> > &f.ot.hl.alternateField=ot
> > &f.ot.hl.maxAlternateFieldLength=300
> > &f.ti.hl.fragsize=300
> > &f.ti.hl.alternateField=ti
> > &f.ti.hl.maxAlternateFieldLength=300
> > &fq={!collapse%20field=signature}
> > &expand=true
> > &sort=score+desc,availability_facet+asc
Re: how to debug solr performance degradation
Before diving in too deeply, try attaching &debug=timing to the query. Near the bottom of the response there'll be a list of the time taken by each _component_. So there'll be separate entries for query, highlighting, etc.

This may not show any surprises, you might be spending all your time scoring. But it's worth doing as a check and might save you from going down some dead-ends. I mean if your query winds up spending 80% of its time in the highlighter, you know where to start looking.

Best,
Erick

On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer wrote:
> rebecca,
>
> you probably need to dig into your queries, but if you want to force/preload
> the index into memory you could try doing something like
>
> cat `find /path/to/solr/index` > /dev/null
>
> if you haven't already reviewed the following, you might take a look here
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> perhaps going back to a very vanilla/default solr configuration and building
> back up from that baseline to better isolate what specific setting might be
> impacting your environment
>
> From: Tang, Rebecca
> Sent: Wednesday, February 25, 2015 11:44
> To: solr-user@lucene.apache.org
> Subject: RE: how to debug solr performance degradation
>
> Sorry, I should have been more specific.
>
> I was referring to the solr admin UI page. Today we started up an AWS
> instance with 240 G of memory to see if we fit all of our index (183G) in
> the memory and have enough for the JVM, could it improve the performance.
>
> I attached the admin UI screen shot with the email.
>
> The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
> (9.52 GB) is used.
>
> The next bar is Swap Space and it's at 0.00 MB.
>
> The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
>
> My understanding is that when Solr starts up, it reserves some memory for
> the JVM, and then it tries to use up as much of the remaining physical
> memory as possible.
> And I used to see the physical memory at anywhere
> between 70% to 90+%. Is this understanding correct?
>
> And now, even with 240G of memory, our index is performing at 10 - 20
> seconds for a query. Granted that our queries have fq's and highlighting
> and faceting, I think with a machine this powerful I should be able to get
> the queries executed under 5 seconds.
>
> This is what we send to Solr:
> q=(phillip%20morris)
> &wt=json
> &start=0
> &rows=50
> &facet=true
> &facet.mincount=0
> &facet.pivot=industry,collection_facet
> &facet.pivot=availability_facet,availabilitystatus_facet
> &facet.field=dddate
> &fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22))
> &facet.field=dt_facet
> &facet.field=brd_facet
> &facet.field=dg_facet
> &hl=true
> &hl.simple.pre=%3Ch1%3E
> &hl.simple.post=%3C%2Fh1%3E
> &hl.requireFieldMatch=false
> &hl.preserveMulti=true
> &hl.fl=ot,ti
> &f.ot.hl.fragsize=300
> &f.ot.hl.alternateField=ot
> &f.ot.hl.maxAlternateFieldLength=300
> &f.ti.hl.fragsize=300
> &f.ti.hl.alternateField=ti
> &f.ti.hl.maxAlternateFieldLength=300
> &fq={!collapse%20field=signature}
> &expand=true
> &sort=score+desc,availability_facet+asc
>
> My guess is that it's performing so badly because it's only using 4% of
> the memory? And searches require disk access.
>
> Rebecca
>
> From: Shawn Heisey [apa...@elyograg.org]
> Sent: Tuesday, February 24, 2015 5:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to debug solr performance degradation
>
> On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
>> We gave the machine 180G mem to see if it improves performance.
However, >> after we increased the memory, Solr started using only 5% of the physical >> memory. It has always used 90-something%. >> >> What could be causing solr to not grab all the physical memory (grabbing >> so little of the physical memory)? > > I would like to know what memory numbers in which program you are > looking at, and why you believe those numbers are a problem. > > The JVM has a very different view of memory than the operating system. > Numbers in "top" mean different things than numbers on the dashboard of > the admin UI, or the numbers in jconsole. If you're on Windows, then > replace "top" with task manager, process explorer, resource monitor, etc. > > Please provide as many details as you can about the things you are > looking at. > > Thanks, > Shawn >
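Erick's &debug=timing suggestion adds a nested "timing" section to the response. A sketch of pulling the per-component times out of a saved response; the JSON below is a fabricated, trimmed example of the structure, and the component names and times are illustrative, not real output:

```shell
# Save a fabricated debug=timing fragment, then list component/time pairs.
cat > /tmp/timing.json <<'EOF'
{"debug":{"timing":{"time":1250.0,
  "process":{"time":1200.0,
    "query":{"time":150.0},
    "highlight":{"time":900.0},
    "mlt":{"time":120.0},
    "facet":{"time":30.0}}}}}
EOF
# Crude text extraction -- a JSON-aware tool (jq, python) would be more robust.
grep -o '"[a-z]*":{"time":[0-9.]*' /tmp/timing.json | tr -d '"{' | tr ':' ' '
```

In this made-up example the highlighter dominates (900 of 1250 ms), which is exactly the kind of signal Erick describes looking for.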
Re: how to debug solr performance degradation
rebecca,

you probably need to dig into your queries, but if you want to force/preload the index into memory you could try doing something like

cat `find /path/to/solr/index` > /dev/null

if you haven't already reviewed the following, you might take a look here
https://wiki.apache.org/solr/SolrPerformanceProblems

perhaps going back to a very vanilla/default solr configuration and building back up from that baseline to better isolate what specific setting might be impacting your environment

From: Tang, Rebecca
Sent: Wednesday, February 25, 2015 11:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS instance with 240 G of memory to see if we fit all of our index (183G) in the memory and have enough for the JVM, could it improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for the JVM, and then it tries to use up as much of the remaining physical memory as possible. And I used to see the physical memory at anywhere between 70% to 90+%. Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20 seconds for a query. Granted that our queries have fq's and highlighting and faceting, I think with a machine this powerful I should be able to get the queries executed under 5 seconds.
This is what we send to Solr:

q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc

My guess is that it's performing so badly because it's only using 4% of the memory? And searches require disk access.

Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of the admin UI, or the numbers in jconsole. If you're on Windows, then replace "top" with task manager, process explorer, resource monitor, etc. Please provide as many details as you can about the things you are looking at. Thanks, Shawn
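Boogie's preload one-liner works because reading every index file once pulls it into the OS page cache (assuming enough free RAM). A self-contained demonstration of the mechanism on a throwaway directory, standing in for the real index path, which is deployment-specific:

```shell
# Create a stand-in "index" directory and read every file in it once.
idx=$(mktemp -d)
echo "segment data" > "$idx/seg0"
echo "term dictionary" > "$idx/seg1"
cat $(find "$idx" -type f) > /dev/null \
  && echo "preloaded $(find "$idx" -type f | wc -l) files"
```

On the real machine, re-running a previously slow query after the preload shows whether cold disk reads were the bottleneck.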
RE: how to debug solr performance degradation
Unfortunately (or luckily, depending on view), attachments do not work with this mailing list. You'll have to upload it somewhere and provide a URL.

It is quite hard _not_ to get your whole index into disk cache, so my guess is that it will get there eventually. Just to check: if you re-issue your queries, does the response time change? If not, then disk caching is not the problem.

Anyway, with your new information, I would say that pivot faceting is the culprit. Do the timing tests in https://issues.apache.org/jira/browse/SOLR-6803 line up with the cardinalities of your fields?

My next step would be to disable parts of the query (highlighting, faceting and collapsing, one at a time) to check which part is the heaviest.

- Toke Eskildsen

From: Tang, Rebecca [rebecca.t...@ucsf.edu]
Sent: 25 February 2015 20:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS instance with 240 G of memory to see if we fit all of our index (183G) in the memory and have enough for the JVM, could it improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for the JVM, and then it tries to use up as much of the remaining physical memory as possible. And I used to see the physical memory at anywhere between 70% to 90+%. Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20 seconds for a query. Granted that our queries have fq's and highlighting and faceting, I think with a machine this powerful I should be able to get the queries executed under 5 seconds.
This is what we send to Solr:

q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc

My guess is that it's performing so badly because it's only using 4% of the memory? And searches require disk access.

Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of the admin UI, or the numbers in jconsole. If you're on Windows, then replace "top" with task manager, process explorer, resource monitor, etc. Please provide as many details as you can about the things you are looking at. Thanks, Shawn
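Toke's isolation step can be mechanized: issue the same query several times, switching one feature off per run, and compare QTime. The host and core name below are assumptions, and this sketch only prints the URLs to try (disabling the collapse means removing its {!collapse} fq, which requires editing the query rather than adding a flag):

```shell
# One URL per experiment, each with a single feature switched off.
BASE='http://localhost:8983/solr/mycore/select?q=(phillip%20morris)&wt=json'
for off in 'hl=false' 'facet=false' 'expand=false'; do
  echo "$BASE&$off"
done
```

Whichever toggle produces the biggest QTime drop points at the heaviest component.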
RE: how to debug solr performance degradation
Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS instance with 240 G of memory to see if we fit all of our index (183G) in the memory and have enough for the JVM, could it improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for the JVM, and then it tries to use up as much of the remaining physical memory as possible. And I used to see the physical memory at anywhere between 70% to 90+%. Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20 seconds for a query. Granted that our queries have fq's and highlighting and faceting, I think with a machine this powerful I should be able to get the queries executed under 5 seconds.
This is what we send to Solr:

q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20sheet%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc

My guess is that it's performing so badly because it's only using 4% of the memory? And searches require disk access.

Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of the admin UI, or the numbers in jconsole. If you're on Windows, then replace "top" with task manager, process explorer, resource monitor, etc. Please provide as many details as you can about the things you are looking at. Thanks, Shawn
Re: how to debug solr performance degradation
meant to type "JMX or sflow agent"

also should have mentioned you want to be running a very recent JDK

From: Boogie Shafer
Sent: Tuesday, February 24, 2015 18:03
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

rebecca,

i would suggest making sure you have some gc logging configured so you have some visibility into the JVM, esp if you don't already have JMX or an sflow agent configured to give you external visibility of those internal metrics

the options below just print out the gc activity to a log

-Xloggc:gc.log -verbose:gc
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution -XX:+PrintClassHistogram -XX:+PrintHeapAtGC
-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure -XX:+PrintAdaptiveSizePolicy -XX:+PrintTLAB
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m

on the memory tuning side of things, as has already been mentioned, try to leave as much memory (outside the JVM) available to your OS to cache as much of the actual index as possible

in your case, you have a lot of RAM, so i would suggest starting with the gc logging options above, plus these very basic JVM memory settings

-XX:+UseG1GC -Xms2G -Xmx4G -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=19

in short, start by letting the JVM tune itself ;) then start looking at the actual GC behavior (this will be visible in the gc logs)

---

on the OS performance monitoring, a few real time tools which i like to use on linux: nmon, dstat, htop

for trending, start with the basics (sysstat/sar) and build from there (hsflowd is super easy to install and get pushing data up to a central console like ganglia); you can add to that by adding the sflow JVM agent to your solr environment

enabling the JMX interface on jetty will let you use tools like jconsole or jvisualvm

From: François Schiettecatte
Sent: Tuesday, February 24, 2015 17:06
To:
solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

Rebecca

You don’t want to give all the memory to the JVM. You want to give it just enough for it to work optimally and leave the rest of the memory for the OS to use for caching data. Giving the JVM too much memory can result in worse performance because of GC.

There is no magic formula for figuring out the memory allocation for the JVM; it is very dependent on the workload. In your case I would start with 5GB, and increment by 5GB with each run. I also use these settings for the JVM:

-XX:+UseG1GC -Xms1G -Xmx1G -XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them, but they work for me.

Cheers

François

> On Feb 24, 2015, at 7:45 PM, Tang, Rebecca wrote:
>
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
>
>
> Rebecca Tang
> Applications Developer, UCSF CKM
> Industry Documents Digital Libraries
> E: rebecca.t...@ucsf.edu
>
>
> On 2/24/15 12:44 PM, "Shawn Heisey" wrote:
>
>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>> Our solr index used to perform OK on our beta production box (anywhere
>>> between 0-3 seconds to complete any query), but today I noticed that the
>>> performance is very bad (queries take between 12 - 15 seconds).
>>>
>>> I haven't updated the solr index configuration
>>> (schema.xml/solrconfig.xml) lately. All that's changed is the data --
>>> every month, I rebuild the solr index from scratch and deploy it to the
>>> box. We will eventually go to incremental builds. But for now, all
>>> indexes are built from scratch.
>>>
>>> Here are the stats:
>>> Solr index size: 183G
>>> Documents in index: 14364201
>>> We just have a single solr box
>>> It has 100G memory
>>> 500G hard drive
>>> 16 cpus
>>
>> The bottom line on this problem, and I'm sure it's not something you're
>> going to want to hear: You don't have enough memory available to cache
>> your index. I'd plan on at least 192GB of RAM for an index this size,
>> and 256GB would be better.
>>
>> Depending on the exact index schema, the nature of your queries, and how
>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>> performance on an index that size ... or it mi
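Shawn's 192-256GB recommendation follows from wanting the whole index in the OS disk cache on top of the JVM heap; a back-of-the-envelope sketch using figures from this thread (183G index, and the 26G max heap mentioned earlier):

```shell
# Rule-of-thumb RAM sizing: OS cache for the full index plus the JVM heap.
index_gb=183
heap_gb=26
echo "comfortable minimum: $((index_gb + heap_gb)) GB of RAM"
```

The 209 GB result is consistent with "at least 192GB ... 256GB would be better".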
Re: how to debug solr performance degradation
Rebecca, I would suggest making sure you have some GC logging configured so you have some visibility into the JVM, especially if you don't already have JMX or an sFlow agent configured to give you external visibility of those internal metrics. The options below just print the GC activity to a log:

-Xloggc:gc.log -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintClassHistogram
-XX:+PrintHeapAtGC -XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure
-XX:+PrintAdaptiveSizePolicy -XX:+PrintTLAB
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m

On the memory tuning side of things, as has already been mentioned, try to leave as much memory (outside the JVM) available to the OS to cache as much of the actual index as possible. In your case, you have a lot of RAM, so I would suggest starting with the GC logging options above, plus these very basic JVM memory settings:

-XX:+UseG1GC -Xms2G -Xmx4G -XX:+UseAdaptiveSizePolicy
-XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=19

In short, start by letting the JVM tune itself ;) then start looking at the actual GC behavior (this will be visible in the GC logs).

On OS performance monitoring, a few real-time tools I like to use on Linux are nmon, dstat, and htop. For trending, start with the basics (sysstat/sar) and build from there (hsflowd is super easy to install and get pushing data up to a central console like Ganglia). You can add to that with the sFlow JVM agent in your Solr environment. Enabling the JMX interface on Jetty will let you use tools like jconsole or jvisualvm.

From: François Schiettecatte
Sent: Tuesday, February 24, 2015 17:06
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

[...]
Re: how to debug solr performance degradation
On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know which memory numbers in which program you are looking at, and why you believe those numbers are a problem. The JVM has a very different view of memory than the operating system. Numbers in "top" mean different things than the numbers on the dashboard of the admin UI, or the numbers in jconsole. If you're on Windows, then replace "top" with Task Manager, Process Explorer, Resource Monitor, etc.

Please provide as many details as you can about the things you are looking at.

Thanks,
Shawn
Re: how to debug solr performance degradation
Rebecca

You don't want to give all the memory to the JVM. You want to give it just enough for it to work optimally and leave the rest of the memory for the OS to use for caching data. Giving the JVM too much memory can result in worse performance because of GC.

There is no magic formula for figuring out the memory allocation for the JVM; it is very dependent on the workload. In your case I would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM:

-XX:+UseG1GC -Xms1G -Xmx1G -XX:+AggressiveOpts -XX:+OptimizeStringConcat
-XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200

I got them from this list so I can't take credit for them, but they work for me.

Cheers

François

> On Feb 24, 2015, at 7:45 PM, Tang, Rebecca wrote:
>
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
>
> [...]
Re: how to debug solr performance degradation
Be careful what you think is being used by Solr, since Lucene uses MMapDirectory under the covers, and this means you might be seeing virtual memory. See Uwe's excellent blog here:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Tue, Feb 24, 2015 at 5:02 PM, Walter Underwood wrote:
> The other memory is used by the OS as file buffers. All the important parts
> of the on-disk search index are buffered in memory. When the Solr process
> wants a block, it is already right there, no delays for disk access.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> [...]
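The VIRT-versus-RES distinction above can be checked directly from /proc on Linux. A minimal sketch, using this shell's own pid as a stand-in for the Solr pid:

```shell
# MMapDirectory maps the index files into Solr's virtual address space, so
# VIRT balloons while RES stays near heap size plus hot pages.
pid=$$   # stand-in pid; substitute your Solr pid here
virt_kb=$(awk '/^VmSize:/ {print $2}' /proc/$pid/status)
res_kb=$(awk '/^VmRSS:/ {print $2}' /proc/$pid/status)
echo "VIRT=${virt_kb} kB  RES=${res_kb} kB"
# A large VIRT-RES gap is normal with mmap: it mostly reflects mapped index
# files, not memory the process has actually claimed for itself.
```

For a Solr process with a 183G mapped index, VIRT near 192g with RES around 5g (as in this thread) is exactly the pattern Uwe's post describes.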
Re: how to debug solr performance degradation
The other memory is used by the OS as file buffers. All the important parts of the on-disk search index are buffered in memory. When the Solr process wants a block, it is already right there, no delays for disk access.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 24, 2015, at 4:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance. However,
> after we increased the memory, Solr started using only 5% of the physical
> memory. It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
>
> [...]
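Those OS file buffers can be inspected directly. A minimal sketch, assuming Linux and reading /proc/meminfo (the same numbers `free` and `top` report as "cached"):

```shell
# How much RAM is the OS currently using as file cache? On a healthy
# search box, this should be large; it is where index blocks live.
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "OS file cache: ${cached_kb} kB of ${total_kb} kB total"
```

If this cache value is far smaller than the index, most queries will be paying disk-latency prices instead of memory-latency ones.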
Re: how to debug solr performance degradation
We gave the machine 180G mem to see if it improves performance. However, after we increased the memory, Solr started using only 5% of the physical memory. It has always used 90-something%.

What could be causing Solr to not grab all the physical memory (grabbing so little of the physical memory)?

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu

On 2/24/15 12:44 PM, "Shawn Heisey" wrote:
> [...]
Re: how to debug solr performance degradation
On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
> Our solr index used to perform OK on our beta production box (anywhere
> between 0-3 seconds to complete any query), but today I noticed that the
> performance is very bad (queries take between 12 and 15 seconds).
>
> I haven't updated the solr index configuration (schema.xml/solrconfig.xml)
> lately. All that's changed is the data: every month, I rebuild the solr
> index from scratch and deploy it to the box. We will eventually go to
> incremental builds. But for now, all indexes are built from scratch.
>
> Here are the stats:
> Solr index size: 183G
> Documents in index: 14364201
> We just have a single solr box
> It has 100G memory
> 500G hard drive
> 16 CPUs

The bottom line on this problem, and I'm sure it's not something you're going to want to hear: You don't have enough memory available to cache your index. I'd plan on at least 192GB of RAM for an index this size, and 256GB would be better.

Depending on the exact index schema, the nature of your queries, and how large your Java heap for Solr is, 100GB of RAM could be enough for good performance on an index that size ... or it might be nowhere near enough. I would imagine that one of two things is true here, possibly both: 1) Your queries are very complex and involve accessing a very large percentage of the index data. 2) Your Java heap is enormous, leaving very little RAM for the OS to automatically cache the index.

Adding more memory to the machine, if that's possible, might fix some of the problems. You can find a discussion of the problem here:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you have any questions after reading that wiki article, feel free to ask them.

Thanks,
Shawn
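The sizing argument above can be made concrete with back-of-the-envelope arithmetic using the numbers from this thread (183G index, 100G RAM). The heap and OS-overhead figures below are assumptions for illustration; substitute your own -Xmx and baseline usage.

```shell
# Rough cache-coverage check: what fraction of the index fits in the
# memory left over for the OS disk cache?
index_gb=183
ram_gb=100
heap_gb=8          # assumed Solr heap (-Xmx8g)
os_overhead_gb=4   # assumed OS + other processes
cache_gb=$((ram_gb - heap_gb - os_overhead_gb))
pct=$((100 * cache_gb / index_gb))
echo "~${cache_gb}G left to cache a ${index_gb}G index (~${pct}% coverage)"
# prints: ~88G left to cache a 183G index (~48% coverage)
```

Roughly half the index cached may or may not be enough, which is exactly the "could be enough ... or nowhere near" uncertainty described above; the larger the heap grows, the worse this coverage gets.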
RE: how to debug solr performance degradation
Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
[12-15 second response time instead of 0-3]
> Solr index size 183G
> Documents in index 14364201
> We just have single solr box
> It has 100G memory
> 500G Harddrive
> 16 cpus

The usual culprit is memory (if you are using a spinning drive as your storage). It appears that you have enough raw memory, though. Could you check how much memory the machine has free for disk caching? If it is a relatively small amount, let's say below 50GB, then please provide a breakdown of what the memory is used for (a very large JVM heap, for example).

> I want to pinpoint where the performance issue is coming from. Could I have
> some suggestions/help on how to benchmark/debug solr performance issues.

Rough checking of IOWait and CPU load is a fine starting point. If it is CPU load, then you can turn on debug in the Solr admin, which should tell you where the time is spent resolving the queries. If it is IOWait, then ensure a lot of free memory for the disk cache and/or improve your storage speed (SSDs instead of spinning drives, local storage instead of remote).

- Toke Eskildsen, State and University Library, Denmark.
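The IOWait check suggested above can be done without installing anything, by reading /proc/stat. A minimal sketch, assuming Linux; for the live per-interval view, `iostat -x 5` or `sar -u 5` are better tools.

```shell
# Cumulative IOWait share since boot, from the "cpu" summary line of
# /proc/stat (fields: user nice system idle iowait ...).
read -r cpu user nice system idle iowait rest < /proc/stat
total=$((user + nice + system + idle + iowait))
echo "iowait share since boot: $((100 * iowait / total))%"
```

A persistently high IOWait percentage here points at storage (buy RAM or SSDs); a low one with high query latency points back at CPU-bound query work, where debug=timing applies.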