Re: how to debug solr performance degradation

2015-02-27 Thread Shawn Heisey
On 2/27/2015 12:51 PM, Tang, Rebecca wrote:
> Thank you guys for all the suggestions and help! I've identified the main
> culprit with debug=timing.  It was the mlt component.  After I removed it,
> the speed of the query went back to reasonable.  Another culprit is the
> expand component, but I can't remove it.  We've downgraded our amazon
> instance to 60G mem with general purpose SSD and the performance is pretty
> good.  It's only 70 cents/hr versus $2.80/hr for the 244G mem instance :)
>
> I also added all the suggested JVM parameters.  Now I have a gc.log that I
> can dig into.
>
> One thing I would like to understand is how memory is managed by solr.
>
> If I do 'top -u solr', I see something like this:
>
> Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,        0k free, 54500892k cached
>
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 
>  4266 solr  20   0  192g 5.1g 854m S  0.0  8.4  37:09.97 java
>
> There are two things:
> 1) Mem: 62920240k total, 62582524k used. I think this is what the solr
> admin "physical memory" bar graph reports on.  Can I assume that most of
> the mem is used for loading part of the index?
>
> 2) And then there's the VIRT 192g and RES 5.1g.  What is the 5.1 RES
> (physical memory) that is used by solr?

The "total" and "used" values from top refer to *all* memory in the
entire machine, and it does match the "physical memory" graph in the
admin UI.  If you notice that the "cached" value is 54GB, that's where
most of the memory usage is actually happening.  This is the OS disk
cache -- the OS is automatically using extra memory to cache data on the
disk.  You are only caching about a third of your index, which may not
be enough for good performance, especially with complex queries.
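
As a quick check of those figures: the 54500892k cached works out to roughly
54GB, and 54GB of a 183GB index is about 30%, i.e. roughly a third.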

The VIRT (virtual) and RES (resident) values are describing how Java is
using memory from the OS point of view.  The java process has allocated
5.1GB of RAM for the heap and all other memory structures.  The VIRT
number is the total amount of *address space* (virtual memory, not
actual memory) that the process has allocated.  For Solr, this will
typically be (approximately) the size of all your indexes plus the RES
and SHR values.
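
Sanity-checking that against the top output above: the 183G of index reported
elsewhere in this thread, plus the 5.1GB RES and the 854MB SHR, comes to
roughly 189GB, which lines up with the 192g VIRT that top shows.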

Solr (Lucene) uses the mmap functionality in the operating system for
all disk access by default (configurable) -- this means that it maps the
file on the disk into virtual memory.  This makes it so that a program
doesn't need to use disk I/O calls to access the data ... it just
pretends that the file is sitting in memory.  The operating system takes
care of translating those memory reads and writes into disk access.  All
memory that is not explicitly allocated to a program is automatically
used to cache that disk access -- this is the "cached" number from top
that I already mentioned.
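
A rough way to watch this in action (Linux only; the index path below is a
placeholder for your own):

free -k                                             # note the "cached" column
cat `find /path/to/solr/index -type f` > /dev/null  # read every index file once
free -k                                             # "cached" should now be much larger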

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
http://en.wikipedia.org/wiki/Page_cache

Thanks,
Shawn



Re: how to debug solr performance degradation

2015-02-27 Thread Tang, Rebecca
Thank you guys for all the suggestions and help! I've identified the main
culprit with debug=timing.  It was the mlt component.  After I removed it,
the speed of the query went back to reasonable.  Another culprit is the
expand component, but I can't remove it.  We've downgraded our amazon
instance to 60G mem with general purpose SSD and the performance is pretty
good.  It's only 70 cents/hr versus $2.80/hr for the 244G mem instance :)

I also added all the suggested JVM parameters.  Now I have a gc.log that I
can dig into.

One thing I would like to understand is how memory is managed by solr.

If I do 'top -u solr', I see something like this:

Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,        0k free, 54500892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

 4266 solr  20   0  192g 5.1g 854m S  0.0  8.4  37:09.97 java

There are two things:
1) Mem: 62920240k total, 62582524k used. I think this is what the solr
admin "physical memory" bar graph reports on.  Can I assume that most of
the mem is used for loading part of the index?

2) And then there's the VIRT 192g and RES 5.1g.  What is the 5.1 RES
(physical memory) that is used by solr?




Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu





On 2/25/15 7:57 PM, "Otis Gospodnetic"  wrote:

>Lots of suggestions here already.  +1 for those JVM params from Boogie and
>for looking at JMX.
>Rebecca, try SPM <http://sematext.com/spm> (will look at JMX for you, among
>other things), it may save you time figuring out JVM/heap/memory/performance
>issues.  If you can't tell what's slow via SPM, we can have a look at your
>metrics (charts are sharable) and may be able to help you faster than
>guessing.
>
>Otis
>--
>Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>Solr & Elasticsearch Support * http://sematext.com/
>
>
>On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson 
>wrote:
>
>> Before diving in too deeply, try attaching &debug=timing to the query.
>> Near the bottom of the response there'll be a list of the time taken
>> by each _component_. So there'll be separate entries for query,
>> highlighting, etc.
>>
>> This may not show any surprises, you might be spending all your time
>> scoring. But it's worth doing as a check and might save you from going
>> down some dead-ends. I mean if your query winds up spending 80% of its
>> time in the highlighter you know where to start looking.
>>
>> Best,
>> Erick
>>
>>
>> On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
>>  wrote:
>> > rebecca,
>> >
>> > you probably need to dig into your queries, but if you want to
>> force/preload the index into memory you could try doing something like
>> >
>> > cat `find /path/to/solr/index` > /dev/null
>> >
>> >
>> > if you haven't already reviewed the following, you might take a look
>>here
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > perhaps going back to a very vanilla/default solr configuration and
>> > building back up from that baseline to better isolate what specific
>> > setting might be impacting your environment
>> >
>> > 
>> > From: Tang, Rebecca 
>> > Sent: Wednesday, February 25, 2015 11:44
>> > To: solr-user@lucene.apache.org
>> > Subject: RE: how to debug solr performance degradation
>> >
>> > Sorry, I should have been more specific.
>> >
>> > I was referring to the solr admin UI page. Today we started up an AWS
>> > instance with 240 G of memory to see if fitting all of our index (183G)
>> > in memory, with enough left over for the JVM, could improve the
>> > performance.
>> >
>> > I attached the admin UI screen shot with the email.
>> >
>> > The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
>> > (9.52 GB) is used.
>> >
>> > The next bar is Swap Space and it's at 0.00 MB.
>> >
>> > The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
>> >
>> > My understanding is that when Solr starts up, it reserves some memory
>> > for the JVM, and then it tries to use up as much of the remaining
>> > physical memory as possible.  And I used to see the physical memory at
>> > anywhere between 70% to 90+%.  Is this understanding correct?
>> >
>> > And now,

Re: how to debug solr performance degradation

2015-02-25 Thread Otis Gospodnetic
Lots of suggestions here already.  +1 for those JVM params from Boogie and
for looking at JMX.
Rebecca, try SPM <http://sematext.com/spm> (will look at JMX for you, among
other things), it may save you time figuring out
JVM/heap/memory/performance issues.  If you can't tell what's slow via SPM,
we can have a look at your metrics (charts are sharable) and may be able to
help you faster than guessing.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson 
wrote:

> Before diving in too deeply, try attaching &debug=timing to the query.
> Near the bottom of the response there'll be a list of the time taken
> by each _component_. So there'll be separate entries for query,
> highlighting, etc.
>
> This may not show any surprises, you might be spending all your time
> scoring. But it's worth doing as a check and might save you from going
> down some dead-ends. I mean if your query winds up spending 80% of its
> time in the highlighter you know where to start looking.
>
> Best,
> Erick
>
>
> On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
>  wrote:
> > rebecca,
> >
> > you probably need to dig into your queries, but if you want to
> force/preload the index into memory you could try doing something like
> >
> > cat `find /path/to/solr/index` > /dev/null
> >
> >
> > if you haven't already reviewed the following, you might take a look here
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > perhaps going back to a very vanilla/default solr configuration and
> > building back up from that baseline to better isolate what specific
> > setting might be impacting your environment
> >
> > ________
> > From: Tang, Rebecca 
> > Sent: Wednesday, February 25, 2015 11:44
> > To: solr-user@lucene.apache.org
> > Subject: RE: how to debug solr performance degradation
> >
> > Sorry, I should have been more specific.
> >
> > I was referring to the solr admin UI page. Today we started up an AWS
> > instance with 240 G of memory to see if fitting all of our index (183G)
> > in memory, with enough left over for the JVM, could improve the
> > performance.
> >
> > I attached the admin UI screen shot with the email.
> >
> > The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
> > (9.52 GB) is used.
> >
> > The next bar is Swap Space and it's at 0.00 MB.
> >
> > The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
> >
> > My understanding is that when Solr starts up, it reserves some memory for
> > the JVM, and then it tries to use up as much of the remaining physical
> > memory as possible.  And I used to see the physical memory at anywhere
> > between 70% to 90+%.  Is this understanding correct?
> >
> > And now, even with 240G of memory, our index is performing at 10 - 20
> > seconds for a query.  Granted that our queries have fq's and highlighting
> > and faceting, I think with a machine this powerful I should be able to
> > get the queries executed under 5 seconds.
> >
> > This is what we send to Solr:
> > q=(phillip%20morris)
> > &wt=json
> > &start=0
> > &rows=50
> > &facet=true
> > &facet.mincount=0
> > &facet.pivot=industry,collection_facet
> > &facet.pivot=availability_facet,availabilitystatus_facet
> > &facet.field=dddate
> >
> &fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
> >
> 20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
> >
> gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
> >
> %20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
> >
> et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
> > 22%20OR%20dt%3A%22tab%20sheet%22))
> > &facet.field=dt_facet
> > &facet.field=brd_facet
> > &facet.field=dg_facet
> > &hl=true
> > &hl.simple.pre=%3Ch1%3E
> > &hl.simple.post=%3C%2Fh1%3E
> > &hl.requireFieldMatch=false
> > &hl.preserveMulti=true
> > &hl.fl=ot,ti
> > &f.ot.hl.fragsize=300
> > &f.ot.hl.alternateField=ot
> > &f.ot.hl.maxAlternateFieldLength=300
> > &f.ti.hl.fragsize=300
> > &f.ti.hl.alternateField=ti
> > &f.ti.hl.maxAlternateFieldLength=300
> > &fq={!collapse%20field=signature}
> > &expand=true
> > &sort=score+des

Re: how to debug solr performance degradation

2015-02-25 Thread Erick Erickson
Before diving in too deeply, try attaching &debug=timing to the query.
Near the bottom of the response there'll be a list of the time taken
by each _component_. So there'll be separate entries for query,
highlighting, etc.
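
As a minimal sketch of the check (host, port and collection name here are
placeholders, not your setup):

curl 'http://localhost:8983/solr/collection1/select?q=test&wt=json&debug=timing'

The response then carries a debug section whose "timing" entry is broken down
into "prepare" and "process" blocks, with one entry per search component
(query, facet, mlt, highlight and so on).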

This may not show any surprises, you might be spending all your time
scoring. But it's worth doing as a check and might save you from going
down some dead-ends. I mean if your query winds up spending 80% of its
time in the highlighter you know where to start looking.

Best,
Erick


On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer
 wrote:
> rebecca,
>
> you probably need to dig into your queries, but if you want to force/preload 
> the index into memory you could try doing something like
>
> cat `find /path/to/solr/index` > /dev/null
>
>
> if you haven't already reviewed the following, you might take a look here
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> perhaps going back to a very vanilla/default solr configuration and building
> back up from that baseline to better isolate what specific setting might be
> impacting your environment
>
> 
> From: Tang, Rebecca 
> Sent: Wednesday, February 25, 2015 11:44
> To: solr-user@lucene.apache.org
> Subject: RE: how to debug solr performance degradation
>
> Sorry, I should have been more specific.
>
> I was referring to the solr admin UI page. Today we started up an AWS
> instance with 240 G of memory to see if fitting all of our index (183G) in
> memory, with enough left over for the JVM, could improve the performance.
>
> I attached the admin UI screen shot with the email.
>
> The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
> (9.52 GB) is used.
>
> The next bar is Swap Space and it's at 0.00 MB.
>
> The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.
>
> My understanding is that when Solr starts up, it reserves some memory for
> the JVM, and then it tries to use up as much of the remaining physical
> memory as possible.  And I used to see the physical memory at anywhere
> between 70% to 90+%.  Is this understanding correct?
>
> And now, even with 240G of memory, our index is performing at 10 - 20
> seconds for a query.  Granted that our queries have fq's and highlighting
> and faceting, I think with a machine this powerful I should be able to get
> the queries executed under 5 seconds.
>
> This is what we send to Solr:
> q=(phillip%20morris)
> &wt=json
> &start=0
> &rows=50
> &facet=true
> &facet.mincount=0
> &facet.pivot=industry,collection_facet
> &facet.pivot=availability_facet,availabilitystatus_facet
> &facet.field=dddate
> &fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
> 20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
> gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
> %20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
> et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
> 22%20OR%20dt%3A%22tab%20sheet%22))
> &facet.field=dt_facet
> &facet.field=brd_facet
> &facet.field=dg_facet
> &hl=true
> &hl.simple.pre=%3Ch1%3E
> &hl.simple.post=%3C%2Fh1%3E
> &hl.requireFieldMatch=false
> &hl.preserveMulti=true
> &hl.fl=ot,ti
> &f.ot.hl.fragsize=300
> &f.ot.hl.alternateField=ot
> &f.ot.hl.maxAlternateFieldLength=300
> &f.ti.hl.fragsize=300
> &f.ti.hl.alternateField=ti
> &f.ti.hl.maxAlternateFieldLength=300
> &fq={!collapse%20field=signature}
> &expand=true
> &sort=score+desc,availability_facet+asc
>
>
> My guess is that it's performing so badly because it's only using 4% of
> the memory? And searches require disk access.
>
>
> Rebecca
> 
> From: Shawn Heisey [apa...@elyograg.org]
> Sent: Tuesday, February 24, 2015 5:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how to debug solr performance degradation
>
> On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
>> We gave the machine 180G mem to see if it improves performance.  However,
>> after we increased the memory, Solr started using only 5% of the physical
>> memory.  It has always used 90-something%.
>>
>> What could be causing solr to not grab all the physical memory (grabbing
>> so little of the physical memory)?
>
> I would like to know what memory numbers in which program you are
> looking at, and why you believe those numbers are a problem.
>
> The JVM has a very different view of memory than the operating system.
> Numbers in "top" mean different things than numbers on the dashboard of
> the admin UI, or the numbers in jconsole.  If you're on Windows, then
> replace "top" with task manager, process explorer, resource monitor, etc.
>
> Please provide as many details as you can about the things you are
> looking at.
>
> Thanks,
> Shawn
>


Re: how to debug solr performance degradation

2015-02-25 Thread Boogie Shafer
rebecca,

you probably need to dig into your queries, but if you want to force/preload 
the index into memory you could try doing something like

cat `find /path/to/solr/index` > /dev/null
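
If the vmtouch utility happens to be installed, it gives a more controlled
version of the same trick (same placeholder path; check residency first, then
pull everything in):

vmtouch -v /path/to/solr/index    # report how much of the index is already in the page cache
vmtouch -t /path/to/solr/index    # touch every page, loading the whole index into the cache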


if you haven't already reviewed the following, you might take a look here
https://wiki.apache.org/solr/SolrPerformanceProblems

perhaps going back to a very vanilla/default solr configuration and building
back up from that baseline to better isolate what specific setting might be
impacting your environment


From: Tang, Rebecca 
Sent: Wednesday, February 25, 2015 11:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, could improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
(9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace "top" with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn



RE: how to debug solr performance degradation

2015-02-25 Thread Toke Eskildsen
Unfortunately (or luckily, depending on view), attachments do not work with 
this mailing list. You'll have to upload it somewhere and provide a URL. It is 
quite hard _not_ to get your whole index into disk cache, so my guess is that 
it will get there eventually. Just to check: If you re-issue your queries, does 
the response time change? If not, then disk caching is not the problem.

Anyway, with your new information, I would say that pivot faceting is the 
culprit. Do the timing tests in 
https://issues.apache.org/jira/browse/SOLR-6803 line up with the cardinalities 
of your fields?

My next step would be to disable parts of the query (highlight, faceting and 
collapsing one at a time) to check which part is the heaviest.
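
For example, against the query quoted below, one change per run (hl and facet
are stock Solr parameters; the collapse fq and expand come from your own query):

&hl=false          turns off highlighting
&facet=false       turns off field, pivot and all other faceting
drop the fq={!collapse field=signature} and &expand=true pair to test collapsing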

- Toke Eskildsen

From: Tang, Rebecca [rebecca.t...@ucsf.edu]
Sent: 25 February 2015 20:44
To: solr-user@lucene.apache.org
Subject: RE: how to debug solr performance degradation

Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, could improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
(9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace "top" with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn




RE: how to debug solr performance degradation

2015-02-25 Thread Tang, Rebecca
Sorry, I should have been more specific.

I was referring to the solr admin UI page. Today we started up an AWS
instance with 240 G of memory to see if fitting all of our index (183G) in
memory, with enough left over for the JVM, could improve the performance.

I attached the admin UI screen shot with the email.

The top bar is "Physical Memory" and we have 240.24 GB, but only 4%
(9.52 GB) is used.

The next bar is Swap Space and it's at 0.00 MB.

The bottom bar is JVM Memory which is at 2.67 GB and the max is 26G.

My understanding is that when Solr starts up, it reserves some memory for
the JVM, and then it tries to use up as much of the remaining physical
memory as possible.  And I used to see the physical memory at anywhere
between 70% to 90+%.  Is this understanding correct?

And now, even with 240G of memory, our index is performing at 10 - 20
seconds for a query.  Granted that our queries have fq's and highlighting
and faceting, I think with a machine this powerful I should be able to get
the queries executed under 5 seconds.

This is what we send to Solr:
q=(phillip%20morris)
&wt=json
&start=0
&rows=50
&facet=true
&facet.mincount=0
&facet.pivot=industry,collection_facet
&facet.pivot=availability_facet,availabilitystatus_facet
&facet.field=dddate
&fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%
20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20be
gin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
%20end%22%20OR%20dt%3A%22file%20folder%20label%22%20OR%20dt%3A%22file%20she
et%22%20OR%20dt%3A%22file%20sheet%20beginning%22%20OR%20dt%3A%22tab%20page%
22%20OR%20dt%3A%22tab%20sheet%22))
&facet.field=dt_facet
&facet.field=brd_facet
&facet.field=dg_facet
&hl=true
&hl.simple.pre=%3Ch1%3E
&hl.simple.post=%3C%2Fh1%3E
&hl.requireFieldMatch=false
&hl.preserveMulti=true
&hl.fl=ot,ti
&f.ot.hl.fragsize=300
&f.ot.hl.alternateField=ot
&f.ot.hl.maxAlternateFieldLength=300
&f.ti.hl.fragsize=300
&f.ti.hl.alternateField=ti
&f.ti.hl.maxAlternateFieldLength=300
&fq={!collapse%20field=signature}
&expand=true
&sort=score+desc,availability_facet+asc


My guess is that it's performing so badly because it's only using 4% of
the memory? And searches require disk access.


Rebecca

From: Shawn Heisey [apa...@elyograg.org]
Sent: Tuesday, February 24, 2015 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system.
Numbers in "top" mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace "top" with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn




Re: how to debug solr performance degradation

2015-02-24 Thread Boogie Shafer

meant to type "JMX or sflow agent"

also should have mentioned you want to be running a very recent JDK


From: Boogie Shafer 
Sent: Tuesday, February 24, 2015 18:03
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

rebecca,

i would suggest making sure you have some gc logging configured so you have 
some visibility into the JVM, esp if you don't already have JMX for sflow agent 
configured to give you external visibility of those internal metrics

the options below just print out the gc activity to a log

-Xloggc:gc.log
-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintHeapAtGC
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTLAB
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m




on the memory tuning side of things, as has already been mentioned, try to 
leave as much memory (outside the JVM) available to your OS to cache as much of 
the actual index as possible

in your case, you have a lot of RAM, so i would suggest starting with the gc 
logging options above, plus these very basic JVM memory settings
-XX:+UseG1GC
-Xms2G
-Xmx4G
-XX:+UseAdaptiveSizePolicy
-XX:MaxGCPauseMillis=1000
-XX:GCTimeRatio=19

in short, start by letting the JVM tune itself ;)

then start looking at the actual GC behavior (this will be visible in the gc 
logs)


---
on the OS performance monitoring, a few real time tools which i like to use on 
linux

nmon
dstat
htop

for trending start with the basics (sysstat/sar)
and build from there (hsflowd is super easy to install and get pushing data up 
to a central console like ganglia)
you can add to that by adding the sflow JVM agent to your solr environment

enabling JMX interface on jetty will let you use tools like jconsole or 
jvisualvm





From: François Schiettecatte 
Sent: Tuesday, February 24, 2015 17:06
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


> On Feb 24, 2015, at 7:45 PM, Tang, Rebecca  wrote:
>
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
>
>
> Rebecca Tang
> Applications Developer, UCSF CKM
> Industry Documents Digital Libraries
> E: rebecca.t...@ucsf.edu
>
>
>
>
>
> On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:
>
>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>> Our solr index used to perform OK on our beta production box (anywhere
>>> between 0-3 seconds to complete any query), but today I noticed that the
>>> performance is very bad (queries take between 12 – 15 seconds).
>>>
>>> I haven't updated the solr index configuration
>>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>>> every month, I rebuild the solr index from scratch and deploy it to the
>>> box.  We will eventually go to incremental builds. But for now, all
>>> indexes are built from scratch.
>>>
>>> Here are the stats:
>>> Solr index size 183G
>>> Documents in index 14364201
>>> We just have single solr box
>>> It has 100G memory
>>> 500G Harddrive
>>> 16 cpus
>>
>> The bottom line on this problem, and I'm sure it's not something you're
>> going to want to hear:  You don't have enough memory available to cache
>> your index.  I'd plan on at least 192GB of RAM for an index this size,
>> and 256GB would be better.
>>
>> Depending on the exact index schema, the nature of your queries, and how
>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>> performance on an index that size ... or it mi

Re: how to debug solr performance degradation

2015-02-24 Thread Boogie Shafer

rebecca,

i would suggest making sure you have some gc logging configured so you have 
some visibility into the JVM, esp if you don't already have JMX for sflow agent 
configured to give you external visibility of those internal metrics

the options below just print out the gc activity to a log

-Xloggc:gc.log
-verbose:gc 
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram 
-XX:+PrintHeapAtGC 
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure 
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTLAB
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m




on the memory tuning side of things, as has already been mentioned, try to 
leave as much memory (outside the JVM) available to your OS to cache as much of 
the actual index as possible

in your case, you have a lot of RAM, so i would suggest starting with the gc 
logging options above, plus these very basic JVM memory settings
-XX:+UseG1GC
-Xms2G
-Xmx4G
-XX:+UseAdaptiveSizePolicy 
-XX:MaxGCPauseMillis=1000 
-XX:GCTimeRatio=19

in short, start by letting the JVM tune itself ;)

then start looking at the actual GC behavior (this will be visible in the gc 
logs)
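
Pulled together, a sketch of what the resulting start line could look like
(the java -jar start.jar launch style is an assumption about your install;
adjust to however you actually start Solr):

java -Xms2G -Xmx4G -XX:+UseG1GC -XX:+UseAdaptiveSizePolicy \
 -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=19 \
 -Xloggc:gc.log -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
 -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m \
 -jar start.jar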


---
on the OS performance monitoring, a few real time tools which i like to use on 
linux

nmon
dstat
htop

for trending start with the basics (sysstat/sar) 
and build from there (hsflowd is super easy to install and get pushing data up 
to a central console like ganglia)
you can add to that by adding the sflow JVM agent to your solr environment

enabling JMX interface on jetty will let you use tools like jconsole or 
jvisualvm





From: François Schiettecatte 
Sent: Tuesday, February 24, 2015 17:06
To: solr-user@lucene.apache.org
Subject: Re: how to debug solr performance degradation

Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


> On Feb 24, 2015, at 7:45 PM, Tang, Rebecca  wrote:
>
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
>
>
> Rebecca Tang
> Applications Developer, UCSF CKM
> Industry Documents Digital Libraries
> E: rebecca.t...@ucsf.edu
>
>
>
>
>
> On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:
>
>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>> Our solr index used to perform OK on our beta production box (anywhere
>>> between 0-3 seconds to complete any query), but today I noticed that the
>>> performance is very bad (queries take between 12 – 15 seconds).
>>>
>>> I haven't updated the solr index configuration
>>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>>> every month, I rebuild the solr index from scratch and deploy it to the
>>> box.  We will eventually go to incremental builds. But for now, all
>>> indexes are built from scratch.
>>>
>>> Here are the stats:
>>> Solr index size 183G
>>> Documents in index 14364201
>>> We just have single solr box
>>> It has 100G memory
>>> 500G Harddrive
>>> 16 cpus
>>
>> The bottom line on this problem, and I'm sure it's not something you're
>> going to want to hear:  You don't have enough memory available to cache
>> your index.  I'd plan on at least 192GB of RAM for an index this size,
>> and 256GB would be better.
>>
>> Depending on the exact index schema, the nature of your queries, and how
>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>> performance on an index that size ... or it might be nowhere near
>> enough.  I would imagine that one of two things is true here, possibly
>> both:  1) Your queries are very complex and involve accessing a very
>> large percentage of the index data.  2) Your Java heap is enormous,
>> leaving very little RAM 

Re: how to debug solr performance degradation

2015-02-24 Thread Shawn Heisey
On 2/24/2015 5:45 PM, Tang, Rebecca wrote:
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
>
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?

I would like to know what memory numbers in which program you are
looking at, and why you believe those numbers are a problem.

The JVM has a very different view of memory than the operating system. 
Numbers in "top" mean different things than numbers on the dashboard of
the admin UI, or the numbers in jconsole.  If you're on Windows, then
replace "top" with task manager, process explorer, resource monitor, etc.

Please provide as many details as you can about the things you are
looking at.

Thanks,
Shawn



Re: how to debug solr performance degradation

2015-02-24 Thread François Schiettecatte
Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula to figuring out the memory 
allocation for the JVM, that is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.
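
As a sketch of that progression (heap sizes purely illustrative):

-Xms5G  -Xmx5G     first run; watch GC behavior and response times
-Xms10G -Xmx10G    next run, only if the GC logs show the heap under pressure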

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


> On Feb 24, 2015, at 7:45 PM, Tang, Rebecca  wrote:
> 
> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
> 
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
> 
> 
> Rebecca Tang
> Applications Developer, UCSF CKM
> Industry Documents Digital Libraries
> E: rebecca.t...@ucsf.edu
> 
> 
> 
> 
> 
> On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:
> 
>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>> Our solr index used to perform OK on our beta production box (anywhere
>>> between 0-3 seconds to complete any query), but today I noticed that the
>>> performance is very bad (queries take between 12 – 15 seconds).
>>> 
>>> I haven't updated the solr index configuration
>>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>>> every month, I rebuild the solr index from scratch and deploy it to the
>>> box.  We will eventually go to incremental builds. But for now, all
>>> indexes are built from scratch.
>>> 
>>> Here are the stats:
>>> Solr index size 183G
>>> Documents in index 14364201
>>> We just have single solr box
>>> It has 100G memory
>>> 500G Harddrive
>>> 16 cpus
>> 
>> The bottom line on this problem, and I'm sure it's not something you're
>> going to want to hear:  You don't have enough memory available to cache
>> your index.  I'd plan on at least 192GB of RAM for an index this size,
>> and 256GB would be better.
>> 
>> Depending on the exact index schema, the nature of your queries, and how
>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>> performance on an index that size ... or it might be nowhere near
>> enough.  I would imagine that one of two things is true here, possibly
>> both:  1) Your queries are very complex and involve accessing a very
>> large percentage of the index data.  2) Your Java heap is enormous,
>> leaving very little RAM for the OS to automatically cache the index.
>> 
>> Adding more memory to the machine, if that's possible, might fix some of
>> the problems.  You can find a discussion of the problem here:
>> 
>> http://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> If you have any questions after reading that wiki article, feel free to
>> ask them.
>> 
>> Thanks,
>> Shawn
>> 
> 



Re: how to debug solr performance degradation

2015-02-24 Thread Erick Erickson
Be careful what you think is being used by Solr since Lucene uses
MMapDirectories under the covers, and this means you might be seeing
virtual memory. See Uwe's excellent blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Tue, Feb 24, 2015 at 5:02 PM, Walter Underwood  wrote:
> The other memory is used by the OS as file buffers. All the important parts 
> of the on-disk search index are buffered in memory. When the Solr process 
> wants a block, it is already right there, no delays for disk access.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Feb 24, 2015, at 4:45 PM, Tang, Rebecca  wrote:
>
>> We gave the machine 180G mem to see if it improves performance.  However,
>> after we increased the memory, Solr started using only 5% of the physical
>> memory.  It has always used 90-something%.
>>
>> What could be causing solr to not grab all the physical memory (grabbing
>> so little of the physical memory)?
>>
>> Rebecca Tang
>> Applications Developer, UCSF CKM
>> Industry Documents Digital Libraries
>> E: rebecca.t...@ucsf.edu
>>
>> On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:
>>
>>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>>> Our solr index used to perform OK on our beta production box (anywhere
>>>> between 0-3 seconds to complete any query), but today I noticed that the
>>>> performance is very bad (queries take between 12 – 15 seconds).
>>>>
>>>> I haven't updated the solr index configuration
>>>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>>>> every month, I rebuild the solr index from scratch and deploy it to the
>>>> box.  We will eventually go to incremental builds. But for now, all
>>>> indexes are built from scratch.
>>>>
>>>> Here are the stats:
>>>> Solr index size 183G
>>>> Documents in index 14364201
>>>> We just have single solr box
>>>> It has 100G memory
>>>> 500G Harddrive
>>>> 16 cpus
>>>
>>> The bottom line on this problem, and I'm sure it's not something you're
>>> going to want to hear:  You don't have enough memory available to cache
>>> your index.  I'd plan on at least 192GB of RAM for an index this size,
>>> and 256GB would be better.
>>>
>>> Depending on the exact index schema, the nature of your queries, and how
>>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>>> performance on an index that size ... or it might be nowhere near
>>> enough.  I would imagine that one of two things is true here, possibly
>>> both:  1) Your queries are very complex and involve accessing a very
>>> large percentage of the index data.  2) Your Java heap is enormous,
>>> leaving very little RAM for the OS to automatically cache the index.
>>>
>>> Adding more memory to the machine, if that's possible, might fix some of
>>> the problems.  You can find a discussion of the problem here:
>>>
>>> http://wiki.apache.org/solr/SolrPerformanceProblems
>>>
>>> If you have any questions after reading that wiki article, feel free to
>>> ask them.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>


Re: how to debug solr performance degradation

2015-02-24 Thread Walter Underwood
The other memory is used by the OS as file buffers. All the important parts of 
the on-disk search index are buffered in memory. When the Solr process wants a 
block, it is already right there, no delays for disk access.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Feb 24, 2015, at 4:45 PM, Tang, Rebecca  wrote:

> We gave the machine 180G mem to see if it improves performance.  However,
> after we increased the memory, Solr started using only 5% of the physical
> memory.  It has always used 90-something%.
> 
> What could be causing solr to not grab all the physical memory (grabbing
> so little of the physical memory)?
> 
> Rebecca Tang
> Applications Developer, UCSF CKM
> Industry Documents Digital Libraries
> E: rebecca.t...@ucsf.edu
> 
> On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:
> 
>> On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>>> Our solr index used to perform OK on our beta production box (anywhere
>>> between 0-3 seconds to complete any query), but today I noticed that the
>>> performance is very bad (queries take between 12 – 15 seconds).
>>> 
>>> I haven't updated the solr index configuration
>>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>>> every month, I rebuild the solr index from scratch and deploy it to the
>>> box.  We will eventually go to incremental builds. But for now, all
>>> indexes are built from scratch.
>>> 
>>> Here are the stats:
>>> Solr index size 183G
>>> Documents in index 14364201
>>> We just have single solr box
>>> It has 100G memory
>>> 500G Harddrive
>>> 16 cpus
>> 
>> The bottom line on this problem, and I'm sure it's not something you're
>> going to want to hear:  You don't have enough memory available to cache
>> your index.  I'd plan on at least 192GB of RAM for an index this size,
>> and 256GB would be better.
>> 
>> Depending on the exact index schema, the nature of your queries, and how
>> large your Java heap for Solr is, 100GB of RAM could be enough for good
>> performance on an index that size ... or it might be nowhere near
>> enough.  I would imagine that one of two things is true here, possibly
>> both:  1) Your queries are very complex and involve accessing a very
>> large percentage of the index data.  2) Your Java heap is enormous,
>> leaving very little RAM for the OS to automatically cache the index.
>> 
>> Adding more memory to the machine, if that's possible, might fix some of
>> the problems.  You can find a discussion of the problem here:
>> 
>> http://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> If you have any questions after reading that wiki article, feel free to
>> ask them.
>> 
>> Thanks,
>> Shawn
>> 
> 



Re: how to debug solr performance degradation

2015-02-24 Thread Tang, Rebecca
We gave the machine 180G mem to see if it improves performance.  However,
after we increased the memory, Solr started using only 5% of the physical
memory.  It has always used 90-something%.

What could be causing solr to not grab all the physical memory (grabbing
so little of the physical memory)?


Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu





On 2/24/15 12:44 PM, "Shawn Heisey"  wrote:

>On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
>> Our solr index used to perform OK on our beta production box (anywhere
>> between 0-3 seconds to complete any query), but today I noticed that the
>> performance is very bad (queries take between 12 – 15 seconds).
>>
>> I haven't updated the solr index configuration
>> (schema.xml/solrconfig.xml) lately.  All that's changed is the data —
>> every month, I rebuild the solr index from scratch and deploy it to the
>> box.  We will eventually go to incremental builds. But for now, all
>> indexes are built from scratch.
>>
>> Here are the stats:
>> Solr index size 183G
>> Documents in index 14364201
>> We just have single solr box
>> It has 100G memory
>> 500G Harddrive
>> 16 cpus
>
>The bottom line on this problem, and I'm sure it's not something you're
>going to want to hear:  You don't have enough memory available to cache
>your index.  I'd plan on at least 192GB of RAM for an index this size,
>and 256GB would be better.
>
>Depending on the exact index schema, the nature of your queries, and how
>large your Java heap for Solr is, 100GB of RAM could be enough for good
>performance on an index that size ... or it might be nowhere near
>enough.  I would imagine that one of two things is true here, possibly
>both:  1) Your queries are very complex and involve accessing a very
>large percentage of the index data.  2) Your Java heap is enormous,
>leaving very little RAM for the OS to automatically cache the index.
>
>Adding more memory to the machine, if that's possible, might fix some of
>the problems.  You can find a discussion of the problem here:
>
>http://wiki.apache.org/solr/SolrPerformanceProblems
>
>If you have any questions after reading that wiki article, feel free to
>ask them.
>
>Thanks,
>Shawn
>



Re: how to debug solr performance degradation

2015-02-24 Thread Shawn Heisey
On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
> Our solr index used to perform OK on our beta production box (anywhere 
> between 0-3 seconds to complete any query), but today I noticed that the 
> performance is very bad (queries take between 12 – 15 seconds).
>
> I haven't updated the solr index configuration (schema.xml/solrconfig.xml) 
> lately.  All that's changed is the data — every month, I rebuild the solr 
> index from scratch and deploy it to the box.  We will eventually go to 
> incremental builds. But for now, all indexes are built from scratch.
>
> Here are the stats:
> Solr index size 183G
> Documents in index 14364201
> We just have single solr box
> It has 100G memory
> 500G Harddrive
> 16 cpus

The bottom line on this problem, and I'm sure it's not something you're
going to want to hear:  You don't have enough memory available to cache
your index.  I'd plan on at least 192GB of RAM for an index this size,
and 256GB would be better.

Depending on the exact index schema, the nature of your queries, and how
large your Java heap for Solr is, 100GB of RAM could be enough for good
performance on an index that size ... or it might be nowhere near
enough.  I would imagine that one of two things is true here, possibly
both:  1) Your queries are very complex and involve accessing a very
large percentage of the index data.  2) Your Java heap is enormous,
leaving very little RAM for the OS to automatically cache the index.
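
Working the numbers: fully caching the 183GB index wants roughly 183GB of
free RAM, and the Java heap plus OS overhead sit on top of that, which is how
you arrive at the 192GB to 256GB range above.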

Adding more memory to the machine, if that's possible, might fix some of
the problems.  You can find a discussion of the problem here:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you have any questions after reading that wiki article, feel free to
ask them.

Thanks,
Shawn



RE: how to debug solr performance degradation

2015-02-24 Thread Toke Eskildsen
Tang, Rebecca [rebecca.t...@ucsf.edu] wrote:
[12-15 second response time instead of 0-3]
> Solr index size 183G
> Documents in index 14364201
> We just have single solr box
> It has 100G memory
> 500G Harddrive
> 16 cpus

The usual culprit is memory (if you are using spinning drive as your storage). 
It appears that you have enough raw memory though. Could you check how much 
memory the machine has free for disk caching? If it is a relatively small 
amount, 
let's say below 50GB, then please provide a breakdown of what the memory is 
used for (very large JVM heap for example).

> I want to pinpoint where the performance issue is coming from.  Could I have 
> some suggestions/help on how to benchmark/debug solr performance issues.

Rough checking of IOWait and CPU load is a fine starting point. If it is CPU 
load then you can turn on debug in Solr admin, which should tell you where the 
time is spent resolving the queries. If it is IOWait then ensure a lot of free 
memory for disk cache and/or improve your storage speed (SSDs instead of 
spinning drives, local storage instead of remote).
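
For example (Linux; iostat is part of the sysstat package):

top              # the %wa figure in the CPU line is IOWait
iostat -x 5      # per-device utilization and wait times, refreshed every 5 seconds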

- Toke Eskildsen, State and University Library, Denmark.