Re: how to debug solr performance degradation

2015-02-27 Thread Shawn Heisey
On 2/27/2015 12:51 PM, Tang, Rebecca wrote:
 Thank you guys for all the suggestions and help! I've identified the main
 culprit with debug=timing.  It was the mlt component.  After I removed it,
 the speed of the query went back to reasonable.  Another culprit is the
 expand component, but I can't remove it.  We've downgraded our amazon
 instance to 60G mem with general purpose SSD and the performance is pretty
 good.  It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :)

 I also added all the suggested JVM parameters.  Now I have a gc.log that I
 can dig into.

 One thing I would like to understand is how memory is managed by solr.

 If I do 'top -u solr', I see something like this:

 Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
 Swap:        0k total,        0k used,        0k free, 54500892k cached

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  4266 solr  20   0  192g 5.1g 854m S  0.0  8.4  37:09.97 java

 There are two things:
 1) Mem: 62920240k total, 62582524k used. I think this is what the solr
 admin physical memory bar graph reports on.  Can I assume that most of
 the mem is used for loading part of the index?

 2) And then there's the VIRT 192g and RES 5.1g.  What is the 5.1 RES
 (physical memory) that is used by solr?

The total and used values from top refer to *all* memory in the
entire machine, and it does match the physical memory graph in the
admin UI.  If you notice that the cached value is 54GB, that's where
most of the memory usage is actually happening.  This is the OS disk
cache -- the OS is automatically using extra memory to cache data on the
disk.  You are only caching about a third of your index, which may not
be enough for good performance, especially with complex queries.

The VIRT (virtual) and RES (resident) values are describing how Java is
using memory from the OS point of view.  The java process has allocated
5.1GB of RAM for the heap and all other memory structures.  The VIRT
number is the total amount of *address space* (virtual memory, not
actual memory) that the process has allocated.  For Solr, this will
typically be (approximately) the size of all your indexes plus the RES
and SHR values.
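
If it helps to see those same numbers outside of top, here is a minimal shell
sketch of that view; the pgrep pattern is an assumption about how Solr was
started (a stock Jetty start.jar launch), so adjust it to your install:

free -m                            # the "cached" column is the OS disk cache mentioned above
SOLR_PID=$(pgrep -f start.jar)     # assumes Solr was started with java ... -jar start.jar
ps -o pid,vsz,rss -p "$SOLR_PID"   # vsz ~ VIRT (address space), rss ~ RES (resident), in KB
pmap -x "$SOLR_PID" | tail -n 1    # total mapped address space, mostly the mmapped index files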

Solr (Lucene) uses the mmap functionality in the operating system for
all disk access by default (configurable) -- this means that it maps the
file on the disk into virtual memory.  This makes it so that a program
doesn't need to use disk I/O calls to access the data ... it just
pretends that the file is sitting in memory.  The operating system takes
care of translating those memory reads and writes into disk access.  All
memory that is not explicitly allocated to a program is automatically
used to cache that disk access -- this is the cached number from top
that I already mentioned.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
http://en.wikipedia.org/wiki/Page_cache
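
A quick, hedged way to sanity-check that VIRT ~ index size + heap relationship
(the index path below is a placeholder -- point du at your core's data/index
directory):

du -sh /var/solr/collection1/data/index       # on-disk index size
ps -o vsz=,rss= -p "$(pgrep -f start.jar)"    # VIRT and RES in KB; VIRT should be roughly index + RES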

Thanks,
Shawn



Re: how to debug solr performance degradation

2015-02-27 Thread Tang, Rebecca
Thank you guys for all the suggestions and help! I've identified the main
culprit with debug=timing.  It was the mlt component.  After I removed it,
the speed of the query went back to reasonable.  Another culprit is the
expand component, but I can't remove it.  We've downgraded our amazon
instance to 60G mem with general purpose SSD and the performance is pretty
good.  It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :)
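
For anyone repeating this test: assuming those components are switched on via
the usual mlt/expand request parameters rather than pinned as invariants in the
handler, their cost can usually be measured without editing solrconfig.xml by
toggling them per request, e.g.

curl "http://localhost:8983/solr/yourcore/select?q=test&wt=json&mlt=false&expand=false"

where the host, core name and query are placeholders.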

I also added all the suggested JVM parameters.  Now I have a gc.log that I
can dig into.
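
A couple of quick, hedged ways to skim that gc.log from a shell (the exact
line formats depend on which of the suggested logging flags are enabled):

grep "Total time for which application threads were stopped" gc.log | tail
grep -c "Full GC" gc.log     # full collections are the usual latency killers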

One thing I would like to understand is how memory is managed by solr.

If I do 'top -u solr', I see something like this:

Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,        0k free, 54500892k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4266 solr  20   0  192g 5.1g 854m S  0.0  8.4  37:09.97 java

There are two things:
1) Mem: 62920240k total, 62582524k used. I think this is what the solr
admin physical memory bar graph reports on.  Can I assume that most of
the mem is used for loading part of the index?

2) And then there's the VIRT 192g and RES 5.1g.  What is the 5.1 RES
(physical memory) that is used by solr?




Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu





On 2/25/15 7:57 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Lots of suggestions here already.  +1 for those JVM params from Boogie and
for looking at JMX.
Rebecca, try SPM http://sematext.com/spm (will look at JMX for you, among
other things), it may save you time figuring out
JVM/heap/memory/performance issues.  If you can't tell what's slow via SPM,
we can have a look at your metrics (charts are sharable) and may be able to
help you faster than guessing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Before diving in too deeply, try attaching debug=timing to the query.
 Near the bottom of the response there'll be a list of the time taken
 by each _component_. So there'll be separate entries for query,
 highlighting, etc.

 This may not show any surprises, you might be spending all your time
 scoring. But it's worth doing as a check and might save you from going
 down some dead-ends. I mean if your query winds up spending 80% of its
 time in the highlighter you know where to start looking..

 Best,
 Erick


 On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
 boogie.sha...@proquest.com wrote:
  rebecca,
 
  you probably need to dig into your queries, but if you want to
 force/preload the index into memory you could try doing something like
 
  cat `find /path/to/solr/index` > /dev/null
 
 
  if you haven't already reviewed the following, you might take a look here
  https://wiki.apache.org/solr/SolrPerformanceProblems
 
  perhaps going back to a very vanilla/default solr configuration and
 building back up from that baseline to better isolate what specific
 setting might be impacting your environment
 
  
  From: Tang, Rebecca rebecca.t...@ucsf.edu
  Sent: Wednesday, February 25, 2015 11:44
  To: solr-user@lucene.apache.org
  Subject: RE: how to debug solr performance degradation
 
  Sorry, I should have been more specific.
 
  I was referring to the solr admin UI page. Today we started up an AWS
  instance with 240 G of memory to see if fitting all of our index (183G) in
  memory, with enough left over for the JVM, would improve the performance.
 
  I attached the admin UI screen shot with the email.
 
  The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
  (9.52 GB) is used.

  The next bar is Swap Space and it's at 0.00 MB.
 
  The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
 
  My understanding is that when Solr starts up, it reserves some memory for
  the JVM, and then it tries to use up as much of the remaining physical
  memory as possible.  And I used to see the physical memory at anywhere
  between 70% to 90+%.  Is this understanding correct?
 
  And now, even with 240G of memory, our index is performing at 10 - 20
  seconds for a query.  Granted that our queries have fq's and highlighting
  and faceting, I think with a machine this powerful I should be able to get
  the queries executed under 5 seconds.
 
  This is what we send to Solr:
  q=(phillip%20morris)
  wt=json
  start=0
  rows=50
  facet=true
  facet.mincount=0
  facet.pivot=industry,collection_facet
  facet.pivot=availability_facet,availabilitystatus_facet
  facet.field=dddate
 
 
fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A

Re: how to debug solr performance degradation

2015-02-25 Thread Boogie Shafer
rebecca,

you probably need to dig into your queries, but if you want to force/preload 
the index into memory you could try doing something like

cat `find /path/to/solr/index` > /dev/null


if you haven't already reviewed the following, you might take a look here
https://wiki.apache.org/solr/SolrPerformanceProblems

perhaps going back to a very vanilla/default solr configuration and building 
back up from that baseline to better isolate what specific setting might be
impacting your environment


From: Tang, Rebecca rebecca.t...@ucsf.edu
Sent: Wednesday, February 25, 2015 11:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, would improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB)
is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
wt=json
start=0
rows=50
facet=true
facet.mincount=0
facet.pivot=industry,collection_facet
facet.pivot=availability_facet,availabilitystatus_facet
facet.field=dddate
fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
facet.field=dt_facet
facet.field=brd_facet
facet.field=dg_facet
hl=true
hl.simple.pre=%3Ch1%3E
hl.simple.post=%3C%2Fh1%3E
hl.requireFieldMatch=false
hl.preserveMulti=true
hl.fl=ot,ti
f.ot.hl.fragsize=300
f.ot.hl.alternateField=ot
f.ot.hl.maxAlternateFieldLength=300
f.ti.hl.fragsize=300
f.ti.hl.alternateField=ti
f.ti.hl.maxAlternateFieldLength=300
fq={!collapse%20field=signature}
expand=true
sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in top mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace top with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn



RE: how to debug solr performance degradation

2015-02-25 Thread Toke Eskildsen
Unfortunately (or luckily, depending on view), attachments do not work with
this mailing list. You'll have to upload it somewhere and provide a URL. It is
quite hard _not_ to get your whole index into disk cache, so my guess is that 
it will get there eventually. Just to check: If you re-issue your queries, does 
the response time change? If not, then disk caching is not the problem.

Anyway, with your new information, I would say that pivot faceting is the 
culprit. Do the timing tests in
https://issues.apache.org/jira/browse/SOLR-6803 line up with the cardinalities 
of your fields?

My next step would be to disable parts of the query (highlighting, faceting and
collapsing one at a time) to check which part is the heaviest.
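
A hedged sketch of that elimination run with curl, assuming the query is sent
to a stock /select handler on a core named collection1 (host, core and handler
are placeholders). Each variant keeps the same q/wt/rows and adds one feature
back at a time; compare the QTime of each response:

Q='q=(phillip%20morris)&wt=json&rows=50'
curl "http://localhost:8983/solr/collection1/select?$Q"                                   # bare query
curl "http://localhost:8983/solr/collection1/select?$Q&fq=%7B!collapse%20field=signature%7D&expand=true"   # + collapse/expand
curl "http://localhost:8983/solr/collection1/select?$Q&hl=true&hl.fl=ot,ti"               # + highlighting
curl "http://localhost:8983/solr/collection1/select?$Q&facet=true&facet.pivot=industry,collection_facet"   # + one pivot facet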

- Toke Eskildsen

From: Tang, Rebecca [rebecca.t...@ucsf.edu]
Sent: 25 February 2015 20:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, would improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB)
is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
wt=json
start=0
rows=50
facet=true
facet.mincount=0
facet.pivot=industry,collection_facet
facet.pivot=availability_facet,availabilitystatus_facet
facet.field=dddate
fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
facet.field=dt_facet
facet.field=brd_facet
facet.field=dg_facet
hl=true
hl.simple.pre=%3Ch1%3E
hl.simple.post=%3C%2Fh1%3E
hl.requireFieldMatch=false
hl.preserveMulti=true
hl.fl=ot,ti
f.ot.hl.fragsize=300
f.ot.hl.alternateField=ot
f.ot.hl.maxAlternateFieldLength=300
f.ti.hl.fragsize=300
f.ti.hl.alternateField=ti
f.ti.hl.maxAlternateFieldLength=300
fq={!collapse%20field=signature}
expand=true
sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in top mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace top with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn




RE: how to debug solr performance degradation

2015-02-25 Thread Tang, Rebecca
Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, would improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB)
is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
wt=json
start=0
rows=50
facet=true
facet.mincount=0
facet.pivot=industry,collection_facet
facet.pivot=availability_facet,availabilitystatus_facet
facet.field=dddate
fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
facet.field=dt_facet
facet.field=brd_facet
facet.field=dg_facet
hl=true
hl.simple.pre=%3Ch1%3E
hl.simple.post=%3C%2Fh1%3E
hl.requireFieldMatch=false
hl.preserveMulti=true
hl.fl=ot,ti
f.ot.hl.fragsize=300
f.ot.hl.alternateField=ot
f.ot.hl.maxAlternateFieldLength=300
f.ti.hl.fragsize=300
f.ti.hl.alternateField=ti
f.ti.hl.maxAlternateFieldLength=300
fq={!collapse%20field=signature}
expand=true
sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in top mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace top with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn




Re: how to debug solr performance degradation

2015-02-25 Thread Erick Erickson
Before diving in too deeply, try attaching debug=timing to the query.
Near the bottom of the response there'll be a list of the time taken
by each _component_. So there'll be separate entries for query,
highlighting, etc.

This may not show any surprises, you might be spending all your time
scoring. But it's worth doing as a check and might save you from going
down some dead-ends. I mean if your query winds up spending 80% of its
time in the highlighter you know where to start looking..
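
A minimal example of that check with curl, assuming a core named collection1
and the stock /select handler (both placeholders); the per-component breakdown
(query, facet, highlight, mlt, expand, ...) appears under "debug" -> "timing"
near the bottom of the response:

curl "http://localhost:8983/solr/collection1/select?q=(phillip%20morris)&wt=json&indent=true&debug=timing"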

Best,
Erick


On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
boogie.sha...@proquest.com wrote:
 rebecca,

 you probably need to dig into your queries, but if you want to force/preload 
 the index into memory you could try doing something like

 cat `find /path/to/solr/index` > /dev/null


 if you haven't already reviewed the following, you might take a look here
 https://wiki.apache.org/solr/SolrPerformanceProblems

 perhaps going back to a very vanilla/default solr configuration and building 
 back up from that baseline to better isolate what specific setting might be
 impacting your environment

 
 From: Tang, Rebecca rebecca.t...@ucsf.edu
 Sent: Wednesday, February 25, 2015 11:44
 To: solr-user@lucene.apache.org
 Subject: RE: how to debug solr performance degradation

 Sorry, I should have been more specific.

 I was referring to the solr admin UI page. Today we started up an AWS
 instance with 240 G of memory to see if fitting all of our index (183G) in
 memory, with enough left over for the JVM, would improve the performance.

 I attached the admin UI screen shot with the email.

 The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52
 GB) is used.

 The next bar is Swap Space and it's at 0.00 MB.

 The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

 My understanding is that when Solr starts up, it reserves some memory for
 the JVM, and then it tries to use up as much of the remaining physical
 memory as possible.  And I used to see the physical memory at anywhere
 between 70% to 90+%.  Is this understanding correct?

 And now, even with 240G of memory, our index is performing at 10 - 20
 seconds for a query.  Granted that our queries have fq's and highlighting
 and faceting, I think with a machine this powerful I should be able to get
 the queries executed under 5 seconds.

 This is what we send to Solr:
 q=(phillip%20morris)
 wt=json
 start=0
 rows=50
 facet=true
 facet.mincount=0
 facet.pivot=industry,collection_facet
 facet.pivot=availability_facet,availabilitystatus_facet
 facet.field=dddate
 fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
 20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
 gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
 %20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
 et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
 22%20OR%20dt%3A%22tab%20sheet%22))
 facet.field=dt_facet
 facet.field=brd_facet
 facet.field=dg_facet
 hl=true
 hl.simple.pre=%3Ch1%3E
 hl.simple.post=%3C%2Fh1%3E
 hl.requireFieldMatch=false
 hl.preserveMulti=true
 hl.fl=ot,ti
 f.ot.hl.fragsize=300
 f.ot.hl.alternateField=ot
 f.ot.hl.maxAlternateFieldLength=300
 f.ti.hl.fragsize=300
 f.ti.hl.alternateField=ti
 f.ti.hl.maxAlternateFieldLength=300
 fq={!collapse%20field=signature}
 expand=true
 sort=score+desc,availability_facet+asc


 My guess is that it's performing so badly because it's only using 4% of
 the memory? And searches require disk access.


 Rebecca
 
 From: Shawn Heisey [apa...@elyograg.org]
 Sent: Tuesday, February 24, 2015 5:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: how to debug solr performance degradation

 On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

 I would like to know what memory numbers in which program you are
 looking at, and why you believe those numbers are a problem.

 The JVM has a very different view of memory than the operating system.
 Numbers in top mean different things than numbers on the dashboard of
 the admin UI, or the numbers in jconsole.  If you're on Windows, then
 replace top with task manager, process explorer, resource monitor, etc.

 Please provide as many details as you can about the things you are
 looking at.

 Thanks,
 Shawn



Re: how to debug solr performance degradation

2015-02-25 Thread Otis Gospodnetic
Lots of suggestions here already.  +1 for those JVM params from Boogie and
for looking at JMX.
Rebecca, try SPM http://sematext.com/spm (will look at JMX for you, among
other things), it may save you time figuring out
JVM/heap/memory/performance issues.  If you can't tell what's slow via SPM,
we can have a look at your metrics (charts are sharable) and may be able to
help you faster than guessing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Before diving in too deeply, try attaching debug=timing to the query.
 Near the bottom of the response there'll be a list of the time taken
 by each _component_. So there'll be separate entries for query,
 highlighting, etc.

 This may not show any surprises, you might be spending all your time
 scoring. But it's worth doing as a check and might save you from going
 down some dead-ends. I mean if your query winds up spending 80% of its
 time in the highlighter you know where to start looking..

 Best,
 Erick


 On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
 boogie.sha...@proquest.com wrote:
  rebecca,
 
  you probably need to dig into your queries, but if you want to
 force/preload the index into memory you could try doing something like
 
  cat `find /path/to/solr/index` > /dev/null
 
 
  if you haven't already reviewed the following, you might take a look here
  https://wiki.apache.org/solr/SolrPerformanceProblems
 
  perhaps going back to a very vanilla/default solr configuration and
 building back up from that baseline to better isolate what specific
 setting might be impacting your environment
 
  
  From: Tang, Rebecca rebecca.t...@ucsf.edu
  Sent: Wednesday, February 25, 2015 11:44
  To: solr-user@lucene.apache.org
  Subject: RE: how to debug solr performance degradation
 
  Sorry, I should have been more specific.
 
  I was referring to the solr admin UI page. Today we started up an AWS
  instance with 240 G of memory to see if fitting all of our index (183G) in
  memory, with enough left over for the JVM, would improve the performance.
 
  I attached the admin UI screen shot with the email.
 
  The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52
  GB) is used.

  The next bar is Swap Space and it's at 0.00 MB.
 
  The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
 
  My understanding is that when Solr starts up, it reserves some memory for
  the JVM, and then it tries to use up as much of the remaining physical
  memory as possible.  And I used to see the physical memory at anywhere
  between 70% to 90+%.  Is this understanding correct?
 
  And now, even with 240G of memory, our index is performing at 10 - 20
  seconds for a query.  Granted that our queries have fq's and highlighting
  and faceting, I think with a machine this powerful I should be able to get
  the queries executed under 5 seconds.
 
  This is what we send to Solr:
  q=(phillip%20morris)
  wt=json
  start=0
  rows=50
  facet=true
  facet.mincount=0
  facet.pivot=industry,collection_facet
  facet.pivot=availability_facet,availabilitystatus_facet
  facet.field=dddate
 
  fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
  20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
  gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
  %20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
  et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
  22%20OR%20dt%3A%22tab%20sheet%22))
  facet.field=dt_facet
  facet.field=brd_facet
  facet.field=dg_facet
  hl=true
  hl.simple.pre=%3Ch1%3E
  hl.simple.post=%3C%2Fh1%3E
  hl.requireFieldMatch=false
  hl.preserveMulti=true
  hl.fl=ot,ti
  f.ot.hl.fragsize=300
  f.ot.hl.alternateField=ot
  f.ot.hl.maxAlternateFieldLength=300
  f.ti.hl.fragsize=300
  f.ti.hl.alternateField=ti
  f.ti.hl.maxAlternateFieldLength=300
  fq={!collapse%20field=signature}
  expand=true
  sort=score+desc,availability_facet+asc
 
 
  My guess is that it's performing so badly because it's only using 4% of
  the memory? And searches require disk access.
 
 
  Rebecca
  
  From: Shawn Heisey [apa...@elyograg.org]
  Sent: Tuesday, February 24, 2015 5:23 PM
  To: solr-user@lucene.apache.org
  Subject: Re: how to debug solr performance degradation
 
  On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
  We gave the machine 180G mem to see if it improves performance.  However,
  after we increased the memory, Solr started using only 5% of the physical
  memory.  It has always used 90-something%.
 
  What could be causing solr to not grab all the physical memory (grabbing
  so little of the physical memory)?
 
  I would like to know what memory numbers in which program you are
  looking at, and why you

RE: how to debug solr performance degradation

2015-02-24 Thread Toke Eskildsen
Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
[12-15 second response time instead of 0-3]
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

The usual culprit is memory (if you are using a spinning drive as your storage).
It appears that you have enough raw memory though. Could you check how much
memory the machine has free for disk caching? If it is a relatively small amount,
let's say below 50GB, then please provide a breakdown of what the memory is
used for (a very large JVM heap, for example).

 I want to pinpoint where the performance issue is coming from.  Could I have 
 some suggestions/help on how to benchmark/debug solr performance issues.

Rough checking of IOWait and CPU load is a fine starting point. If it is CPU
load then you can turn on debug in Solr admin, which should tell you where the
time is spent resolving the queries. If it is IOWait then ensure a lot of free
memory for disk cache and/or improve your storage speed (SSDs instead of
spinning drives, local storage instead of remote).
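
A hedged starting point for that check on Linux (iostat and sar come from the
sysstat package; run these while one of the slow queries is executing):

top              # watch %wa (IOWait) vs %us/%sy (CPU) during a slow query
iostat -x 5      # per-device utilisation and await times
sar -u 5 5       # CPU/IOWait averages over five 5-second intervals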

- Toke Eskildsen, State and University Library, Denmark.


Re: how to debug solr performance degradation

2015-02-24 Thread Shawn Heisey
On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere 
 between 0-3 seconds to complete any query), but today I noticed that the 
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration (schema.xml/solrconfig.xml) 
 lately.  All that's changed is the data — every month, I rebuild the solr 
 index from scratch and deploy it to the box.  We will eventually go to 
 incremental builds. But for now, all indexes are built from scratch.

 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

The bottom line on this problem, and I'm sure it's not something you're
going to want to hear:  You don't have enough memory available to cache
your index.  I'd plan on at least 192GB of RAM for an index this size,
and 256GB would be better.

Depending on the exact index schema, the nature of your queries, and how
large your Java heap for Solr is, 100GB of RAM could be enough for good
performance on an index that size ... or it might be nowhere near
enough.  I would imagine that one of two things is true here, possibly
both:  1) Your queries are very complex and involve accessing a very
large percentage of the index data.  2) Your Java heap is enormous,
leaving very little RAM for the OS to automatically cache the index.

Adding more memory to the machine, if that's possible, might fix some of
the problems.  You can find a discussion of the problem here:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you have any questions after reading that wiki article, feel free to
ask them.

Thanks,
Shawn



Re: how to debug solr performance degradation

2015-02-24 Thread Shawn Heisey
On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system. 
Numbers in top mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace top with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn



Re: how to debug solr performance degradation

2015-02-24 Thread Erick Erickson
Be careful what you think is being used by Solr since Lucene uses
MMapDirectories under the covers, and this means you might be seeing
virtual memory. See Uwe's excellent blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Tue, Feb 24, 2015 at 5:02 PM, Walter Underwood wun...@wunderwood.org wrote:
 The other memory is used by the OS as file buffers. All the important parts 
 of the on-disk search index are buffered in memory. When the Solr process 
 wants a block, it is already right there, no delays for disk access.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Feb 24, 2015, at 4:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:

 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?

 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu

 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
 (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.

 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.

 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.

 Adding more memory to the machine, if that's possible, might fix some of
 the problems.  You can find a discussion of the problem here:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 If you have any questions after reading that wiki article, feel free to
 ask them.

 Thanks,
 Shawn





Re: how to debug solr performance degradation

2015-02-24 Thread François Schiettecatte
Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


 On Feb 24, 2015, at 7:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:
 
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.
 
 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?
 
 
 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu
 
 
 
 
 
 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
 (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.
 
 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus
 
 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.
 
 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.
 
 Adding more memory to the machine, if that's possible, might fix some of
 the problems.  You can find a discussion of the problem here:
 
 http://wiki.apache.org/solr/SolrPerformanceProblems
 
 If you have any questions after reading that wiki article, feel free to
 ask them.
 
 Thanks,
 Shawn
 
 



Re: how to debug solr performance degradation

2015-02-24 Thread Boogie Shafer

rebecca,

i would suggest making sure you have some gc logging configured so you have 
some visibility into the JVM, esp if you don't already have JMX for sflow agent 
configured to give you external visibility of those internal metrics

the options below just print out the gc activity to a log

-Xloggc:gc.log
-verbose:gc 
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram 
-XX:+PrintHeapAtGC 
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure 
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTLAB
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m




on the memory tuning side of things, as has already been mentioned, try to 
leave as much memory (outside the JVM) available to your OS to cache as much of 
the actual index as possible

in your case, you have a lot of RAM, so i would suggest starting with the gc 
logging options above, plus these very basic JVM memory settings
-XX:+UseG1GC
-Xms2G
-Xmx4G
-XX:+UseAdaptiveSizePolicy 
-XX:MaxGCPauseMillis=1000 
-XX:GCTimeRatio=19

in short, start by letting the JVM tune itself ;)

then start looking at the actual GC behavior (this will be visible in the gc 
logs)
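
putting the two lists above together, a hedged example of what the startup
might look like for a stock Solr 4.x Jetty install (run from the directory
containing start.jar; this shows only a subset of the logging flags, and paths
and the gc.log location are placeholders):

java -Xms2G -Xmx4G -XX:+UseG1GC -XX:+UseAdaptiveSizePolicy \
     -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=19 \
     -Xloggc:gc.log -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps \
     -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime \
     -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m \
     -jar start.jar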


---
on the OS performance monitoring, a few real time tools which i like to use on 
linux

nmon
dstat
htop

for trending start with the basics (sysstat/sar) 
and build from there (hsflowd is super easy to install and get pushing data up 
to a central console like ganglia)
you can add to that by adding the sflow JVM agent to your solr environment

enabling JMX interface on jetty will let you use tools like jconsole or 
jvisualvm
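
the JMX part is usually just the standard remote-JMX system properties added to
the same startup line (the port is arbitrary, and disabling auth/ssl as below is
only sensible on a locked-down test box):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false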





From: François Schiettecatte fschietteca...@gmail.com
Sent: Tuesday, February 24, 2015 17:06
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


 On Feb 24, 2015, at 7:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:

 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?


 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu





 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
 (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.

 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.

 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.

 Adding more memory to the machine, if that's possible, might fix some of
 the problems.  You can find a discussion of the problem here:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 If you have any questions after reading that wiki article, feel free to
 ask them.

 Thanks,
 Shawn




Re: how to debug solr performance degradation

2015-02-24 Thread Boogie Shafer

meant to type JMX or sflow agent

also should have mentioned you want to be running a very recent JDK


From: Boogie Shafer boogie.sha...@proquest.com
Sent: Tuesday, February 24, 2015 18:03
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

rebecca,

i would suggest making sure you have some gc logging configured so you have 
some visibility into the JVM, esp if you don't already have JMX for sflow agent 
configured to give you external visibility of those internal metrics

the options below just print out the gc activity to a log

-Xloggc:gc.log
-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintHeapAtGC
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTLAB
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m




on the memory tuning side of things, as has already been mentioned, try to 
leave as much memory (outside the JVM) available to your OS to cache as much of 
the actual index as possible

in your case, you have a lot of RAM, so i would suggest starting with the gc 
logging options above, plus these very basic JVM memory settings
-XX:+UseG1GC
-Xms2G
-Xmx4G
-XX:+UseAdaptiveSizePolicy
-XX:MaxGCPauseMillis=1000
-XX:GCTimeRatio=19

in short, start by letting the JVM tune itself ;)

then start looking at the actual GC behavior (this will be visible in the gc 
logs)


---
on the OS performance monitoring, a few real time tools which i like to use on 
linux

nmon
dstat
htop

for trending start with the basics (sysstat/sar)
and build from there (hsflowd is super easy to install and get pushing data up 
to a central console like ganglia)
you can add to that by adding the sflow JVM agent to your solr environment

enabling JMX interface on jetty will let you use tools like jconsole or 
jvisualvm





From: François Schiettecatte fschietteca...@gmail.com
Sent: Tuesday, February 24, 2015 17:06
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


 On Feb 24, 2015, at 7:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:

 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.

 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?


 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu





 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
 (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.

 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.

 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.

 Adding more memory to the machine

Re: how to debug solr performance degradation

2015-02-24 Thread Walter Underwood
The other memory is used by the OS as file buffers. All the important parts of 
the on-disk search index are buffered in memory. When the Solr process wants a 
block, it is already right there, no delays for disk access.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Feb 24, 2015, at 4:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:

 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.
 
 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?
 
 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu
 
 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
 performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
 (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.
 
 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus
 
 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.
 
 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.
 
 Adding more memory to the machine, if that's possible, might fix some of
 the problems.  You can find a discussion of the problem here:
 
 http://wiki.apache.org/solr/SolrPerformanceProblems
 
 If you have any questions after reading that wiki article, feel free to
 ask them.
 
 Thanks,
 Shawn
 
 



Re: how to debug solr performance degradation

2015-02-24 Thread Tang, Rebecca
We gave the machine 180G mem to see if it improves performance.  However,
after we increased the memory, Solr started using only 5% of the physical
memory.  It has always used 90-something%.

What could be causing solr to not grab all the physical memory (grabbing
so little of the physical memory)?


Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu





On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:

On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
between 0-3 seconds to complete any query), but today I noticed that the
performance is very bad (queries take between 12 – 15 seconds).

 I haven't updated the solr index configuration
(schema.xml/solrconfig.xml) lately.  All that's changed is the data —
every month, I rebuild the solr index from scratch and deploy it to the
box.  We will eventually go to incremental builds. But for now, all
indexes are built from scratch.

 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus

The bottom line on this problem, and I'm sure it's not something you're
going to want to hear:  You don't have enough memory available to cache
your index.  I'd plan on at least 192GB of RAM for an index this size,
and 256GB would be better.

Depending on the exact index schema, the nature of your queries, and how
large your Java heap for Solr is, 100GB of RAM could be enough for good
performance on an index that size ... or it might be nowhere near
enough.  I would imagine that one of two things is true here, possibly
both:  1) Your queries are very complex and involve accessing a very
large percentage of the index data.  2) Your Java heap is enormous,
leaving very little RAM for the OS to automatically cache the index.

Adding more memory to the machine, if that's possible, might fix some of
the problems.  You can find a discussion of the problem here:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you have any questions after reading that wiki article, feel free to
ask them.

Thanks,
Shawn