solr4 performance question

2014-04-08 Thread Joshi, Shital
Hi,

We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM heap on 60GB
machines and 40 GB of index.
We're constantly noticing that Solr queries take longer while an update (with
commit=false) is in progress. A query that usually takes 0.5 seconds can take up
to 2 minutes while updates are in progress. It doesn't affect all queries; the
behavior is very sporadic.

Any pointers to nail down this issue would be appreciated.

Is there a way to find how much of a query result came from cache? Can we 
enable any log settings to start printing what came from cache vs. what was 
queried?

Thanks!


Re: solr4 performance question

2014-04-08 Thread Erick Erickson
What do you have for your _softcommit_ settings in solrconfig.xml? I'm
guessing you're using SolrJ or similar, but the solrconfig settings
will trip a commit as well.

For that matter, what are all your commit settings in solrconfig.xml,
both hard and soft?

Best,
Erick

On Tue, Apr 8, 2014 at 10:28 AM, Joshi, Shital shital.jo...@gs.com wrote:
 Hi,

 We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM heap on 60GB
 machines and 40 GB of index.
 We're constantly noticing that Solr queries take longer while an update (with
 commit=false) is in progress. A query that usually takes 0.5 seconds can take up
 to 2 minutes while updates are in progress. It doesn't affect all queries; the
 behavior is very sporadic.

 Any pointers to nail down this issue would be appreciated.

 Is there a way to find how much of a query result came from cache? Can we 
 enable any log settings to start printing what came from cache vs. what was 
 queried?

 Thanks!


Re: solr4 performance question

2014-04-08 Thread Furkan KAMACI
Hi Joshi;

Click the Plugins / Stats section under your collection in the Solr Admin UI.
You will see the cache statistics for the different types of caches; hitratio
and evictions are good statistics to look at first. You should also read this:
https://wiki.apache.org/solr/SolrPerformanceFactors
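
If you'd rather script it than click through the UI, the same statistics are
exposed over HTTP by the MBeans handler; a minimal sketch (host, port, and
core name are placeholders for your own):

    # Dump cache statistics (hitratio, evictions, size, ...) for one core as JSON.
    curl "http://localhost:8983/solr/collection1/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true"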

Thanks;
Furkan KAMACI


2014-04-08 20:28 GMT+03:00 Joshi, Shital shital.jo...@gs.com:

 Hi,

 We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM heap on 60GB
 machines and 40 GB of index.
 We're constantly noticing that Solr queries take longer while an update
 (with commit=false) is in progress. A query that usually takes 0.5 seconds
 can take up to 2 minutes while updates are in progress. It doesn't affect
 all queries; the behavior is very sporadic.

 Any pointers to nail down this issue would be appreciated.

 Is there a way to find how much of a query result came from cache? Can we
 enable any log settings to start printing what came from cache vs. what was
 queried?

 Thanks!



RE: solr4 performance question

2014-04-08 Thread Joshi, Shital
We don't do any soft commit. This is our hard commit setting. 

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
   <maxDocs>100000</maxDocs>
   <openSearcher>true</openSearcher>
</autoCommit>

We use this update command: 

solr_command=$(cat <<EnD
time zcat --force $file2load | /usr/bin/curl --proxy  --silent --show-error \
--max-time 3600 \
"http://$solr_url/solr/$solr_core/update/csv?\
commit=false&\
separator=|&\
escape=\\&\
trim=true&\
header=false&\
skipLines=2&\
overwrite=true&\
_shard_=$shardid&\
fieldnames=$fieldnames&\
f.cs_rep.split=true&\
f.cs_rep.separator=%5E" --data-binary @- -H 'Content-type:text/plain; charset=utf-8'
EnD
)


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, April 08, 2014 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: solr4 performance question

What do you have for your _softcommit_ settings in solrconfig.xml? I'm
guessing you're using SolrJ or similar, but the solrconfig settings
will trip a commit as well.

For that matter, what are all your commit settings in solrconfig.xml,
both hard and soft?

Best,
Erick

On Tue, Apr 8, 2014 at 10:28 AM, Joshi, Shital shital.jo...@gs.com wrote:
 Hi,

 We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM heap on 60GB
 machines and 40 GB of index.
 We're constantly noticing that Solr queries take longer while an update (with
 commit=false) is in progress. A query that usually takes 0.5 seconds can take up
 to 2 minutes while updates are in progress. It doesn't affect all queries; the
 behavior is very sporadic.

 Any pointers to nail down this issue would be appreciated.

 Is there a way to find how much of a query result came from cache? Can we 
 enable any log settings to start printing what came from cache vs. what was 
 queried?

 Thanks!


Re: solr4 performance question

2014-04-08 Thread Erick Erickson
bq: <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>true</openSearcher>

Every 100K documents or 10 minutes (whichever comes first) your
current searchers will be closed and a new searcher opened, and all the
warmup queries etc. may happen. I suspect you're not doing much with
autowarming and/or newSearcher queries, so occasionally your search has
to wait for caches to be read, terms to be populated, etc.

Some possibilities to test this (see the sketch after this list):
1> Create some newSearcher queries in solrconfig.xml.
2> Specify a reasonable autowarm count for queryResultCache (don't go
crazy here, start with 16 or something similar).
3> Set openSearcher to false above. In this case you won't be able to
see the documents until either a hard or soft commit happens; you
could cure this with a single hard commit at the end of your indexing
run. It all depends on what latency you can tolerate in terms of
searching newly-indexed documents.
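
A minimal sketch of what 1> and 2> could look like in solrconfig.xml (the
warming query and cache sizes are placeholders, not from your setup):

    <query>
      <!-- Autowarm the newest entries from the old cache into each new searcher. -->
      <queryResultCache class="solr.LRUCache" size="512"
                        initialSize="512" autowarmCount="16"/>

      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <!-- Hypothetical warming query: exercises the index before the searcher serves traffic. -->
          <lst><str name="q">*:*</str><str name="sort">id asc</str></lst>
        </arr>
      </listener>
    </query>

and for 3>, the same autoCommit block with the searcher left closed:

    <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <maxDocs>100000</maxDocs>
       <openSearcher>false</openSearcher>
    </autoCommit>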

Here's a reference...

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Apr 8, 2014 at 12:11 PM, Joshi, Shital shital.jo...@gs.com wrote:
 We don't do any soft commit. This is our hard commit setting.

 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>true</openSearcher>
 </autoCommit>

 We use this update command:

  solr_command=$(cat <<EnD
 time zcat --force $file2load | /usr/bin/curl --proxy  --silent --show-error \
 --max-time 3600 \
 "http://$solr_url/solr/$solr_core/update/csv?\
 commit=false&\
 separator=|&\
 escape=\\&\
 trim=true&\
 header=false&\
 skipLines=2&\
 overwrite=true&\
 _shard_=$shardid&\
 fieldnames=$fieldnames&\
 f.cs_rep.split=true&\
 f.cs_rep.separator=%5E" --data-binary @- -H 'Content-type:text/plain; charset=utf-8'
 EnD
 )


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, April 08, 2014 2:21 PM
 To: solr-user@lucene.apache.org
 Subject: Re: solr4 performance question

 What do you have for your _softcommit_ settings in solrconfig.xml? I'm
 guessing you're using SolrJ or similar, but the solrconfig settings
 will trip a commit as well.

 For that matter, what are all your commit settings in solrconfig.xml,
 both hard and soft?

 Best,
 Erick

 On Tue, Apr 8, 2014 at 10:28 AM, Joshi, Shital shital.jo...@gs.com wrote:
 Hi,

 We have a 10-node Solr Cloud (5 shards, 2 replicas) with a 30 GB JVM heap on 60GB
 machines and 40 GB of index.
 We're constantly noticing that Solr queries take longer while an update (with
 commit=false) is in progress. A query that usually takes 0.5 seconds can take up
 to 2 minutes while updates are in progress. It doesn't affect all queries; the
 behavior is very sporadic.

 Any pointers to nail down this issue would be appreciated.

 Is there a way to find how much of a query result came from cache? Can we 
 enable any log settings to start printing what came from cache vs. what was 
 queried?

 Thanks!


RE: Solr4 performance

2014-02-28 Thread Joshi, Shital
Thanks. 

We find little evidence that the page/disk cache is causing this issue. We use sar
to collect statistics. Here are the statistics on the node where the query took the
longest (out of 5 shards, the one with the most data takes the longest). In the
meantime, we're reducing the heap size and testing in QA.

          CPU    %user   %nice   %system   %iowait   %steal    %idle
17:00:01  all     2.11    0.00      0.04      0.00     0.00    97.85
Average:  all     7.52    0.00      0.16      0.02     0.00    92.31

             tps    rtps    wtps   bread/s   bwrtn/s
17:00:01   10.63    0.00   10.63      0.00    140.56
Average:   73.90    2.65   71.24    314.24   1507.93

          pgpgin/s   pgpgout/s   fault/s   majflt/s
17:00:01      0.00       23.42    367.95       0.00
Average:     52.37      251.32    586.79       0.82

Our current JVM is 30G and usage is ~26G. If we reduce the JVM to 25G, we're afraid
of hitting an OOM error in Java.


-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Thursday, February 27, 2014 3:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

You would get more room for disk cache by reducing your large heap.
Otherwise, you'd have to add more RAM to your systems or shard your index
to more nodes to gain more RAM that way.

The Linux VM subsystem actually has a number of tuning parameters (like
vm.bdflush, vm.swappiness and vm.pagecache), but I don't know if there's
any definitive information about how to set them appropriately for Solr.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Thu, Feb 27, 2014 at 3:09 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi Michael,

 If page cache is the issue, what is the solution?

 Thanks!

 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Monday, February 24, 2014 9:54 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 I'm not sure how you're measuring free RAM. Maybe this will help:

 http://www.linuxatemyram.com/play.html

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 The Science of Influence Marketing

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 w: appinions.com http://www.appinions.com/


 On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Thanks.
 
  We found some evidence that this could be the issue. We're monitoring
  closely to confirm this.
 
  One question though: none of our nodes shows more than 50% of physical
  memory used. So there is enough memory available for memory-mapped files.
  Can this kind of pause still happen?
 
 
  -Original Message-
  From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
  Sent: Friday, February 21, 2014 5:28 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr4 performance
 
  It could be that your query is churning the page cache on that node
  sometimes, so Solr pauses so the OS can drag those pages off of disk.
 Have
  you tried profiling your iowait in top or iostat during these pauses?
  (assuming you're using linux).
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062
 
  appinions inc.
 
  The Science of Influence Marketing
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  
  w: appinions.com http://www.appinions.com/
 
 
  On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital shital.jo...@gs.com
  wrote:
 
   Thanks for your answer.
  
    We confirmed that it is not a GC issue.

    The autowarming query looks good too, and queries before and after the
    long-running query come back really quickly. The only thing that stands out
    is that the shard on which the query takes a long time has a couple million
    more documents than the other shards.
  
   -Original Message-
   From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
   Sent: Thursday, February 20, 2014 5:26 PM
   To: solr-user@lucene.apache.org
   Subject: RE: Solr4 performance
  
   Hi,
  
   As for your first question, setting openSearcher to true means you will
  see
   the new docs after every hard commit. Soft and hard commits only become
   isolated from one another with that set to false.
  

RE: Solr4 performance

2014-02-27 Thread Joshi, Shital
Hi Michael,

If page cache is the issue, what is the solution? 

Thanks!

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Monday, February 24, 2014 9:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

I'm not sure how you're measuring free RAM. Maybe this will help:

http://www.linuxatemyram.com/play.html

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Thanks.

 We found some evidence that this could be the issue. We're monitoring
 closely to confirm this.

 One question though: none of our nodes shows more than 50% of physical
 memory used. So there is enough memory available for memory-mapped files.
 Can this kind of pause still happen?


 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Friday, February 21, 2014 5:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 It could be that your query is churning the page cache on that node
 sometimes, so Solr pauses so the OS can drag those pages off of disk. Have
 you tried profiling your iowait in top or iostat during these pauses?
 (assuming you're using linux).

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 The Science of Influence Marketing

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 w: appinions.com http://www.appinions.com/


 On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Thanks for your answer.
 
  We confirmed that it is not a GC issue.

  The autowarming query looks good too, and queries before and after the
  long-running query come back really quickly. The only thing that stands out
  is that the shard on which the query takes a long time has a couple million
  more documents than the other shards.
 
  -Original Message-
  From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
  Sent: Thursday, February 20, 2014 5:26 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Solr4 performance
 
  Hi,
 
  As for your first question, setting openSearcher to true means you will
 see
  the new docs after every hard commit. Soft and hard commits only become
  isolated from one another with that set to false.
 
  Your second problem might be explained by your large heap and garbage
  collection. Walking a heap that large can take an appreciable amount of
  time. You might consider turning on the JVM options for logging GC and
  seeing if you can correlate your slow responses to times when your JVM is
  garbage collecting.
 
  Hope that helps,
  On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:
 
   Hi!
  
    I have a few other questions regarding the Solr4 performance issue we're
    facing.

    We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
    commit=false in the update URL. We have only the hard commit setting in the
    Solr4 config.

    <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <maxDocs>100000</maxDocs>
       <openSearcher>true</openSearcher>
    </autoCommit>

    Since we're not using soft commits at all (commit=false), the caches will
    not get reloaded on every commit and recently added documents will not be
    visible, correct?

    What we see is that queries which usually take a few milliseconds take ~40
    seconds once in a while. Can high IO during a hard commit cause queries to
    slow down?

    For some shards we see 98% full physical memory. We have a 60GB machine (30
    GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
    physical memory would cause queries to slow down. We're in the process of
    reducing the JVM size anyway.

    We have never run an optimize until now. Optimizing in QA didn't yield a
    performance gain.
  
   Thanks much for all help.
  
   -Original Message-
   From: Shawn Heisey [mailto:s...@elyograg.org]
   Sent: Tuesday, February 18, 2014 4:55 PM
   To: solr-user@lucene.apache.org
   Subject: Re: Solr4 performance
  
   On 2/18/2014 2:14 PM, Joshi, Shital wrote:
 Thanks much for all suggestions. We're looking into reducing the allocated
 heap size of the Solr4 JVM.
   
We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
   internally? Can someone please confirm?
  
   In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
   default delegate.  That's probably also the case with Lucene -- these
   are Lucene classes, after all.
  
   MMapDirectory is almost always the most efficient way to handle on-disk
   indexes.

Re: Solr4 performance

2014-02-27 Thread Shawn Heisey

On 2/27/2014 1:09 PM, Joshi, Shital wrote:

If page cache is the issue, what is the solution?


What operating system are you using, and what tool are you looking at to 
see your memory usage?  Can you share a screenshot with us?  Use a file 
sharing website for that - the list generally doesn't like attachments.


Thanks,
Shawn



Re: Solr4 performance

2014-02-27 Thread Michael Della Bitta
You would get more room for disk cache by reducing your large heap.
Otherwise, you'd have to add more RAM to your systems or shard your index
to more nodes to gain more RAM that way.

The Linux VM subsystem actually has a number of tuning parameters (like
vm.bdflush, vm.swappiness and vm.pagecache), but I don't know if there's
any definitive information about how to set them appropriately for Solr.
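
If you do experiment, a sketch of how vm.swappiness is typically inspected and
set on Linux (the value 10 here is illustrative, not a recommendation):

    # Show the current value; 60 is a common default.
    sysctl vm.swappiness

    # Lower it so the kernel prefers dropping page cache over swapping out the heap.
    sudo sysctl -w vm.swappiness=10

    # Persist the setting across reboots.
    echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf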

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Thu, Feb 27, 2014 at 3:09 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi Michael,

 If page cache is the issue, what is the solution?

 Thanks!

 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Monday, February 24, 2014 9:54 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 I'm not sure how you're measuring free RAM. Maybe this will help:

 http://www.linuxatemyram.com/play.html

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 The Science of Influence Marketing

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 w: appinions.com http://www.appinions.com/


 On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Thanks.
 
  We found some evidence that this could be the issue. We're monitoring
  closely to confirm this.
 
   One question though: none of our nodes shows more than 50% of physical
   memory used. So there is enough memory available for memory-mapped files.
   Can this kind of pause still happen?
 
 
  -Original Message-
  From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
  Sent: Friday, February 21, 2014 5:28 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr4 performance
 
  It could be that your query is churning the page cache on that node
  sometimes, so Solr pauses so the OS can drag those pages off of disk.
 Have
  you tried profiling your iowait in top or iostat during these pauses?
  (assuming you're using linux).
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062
 
  appinions inc.
 
  The Science of Influence Marketing
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
 
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
  
  w: appinions.com http://www.appinions.com/
 
 
  On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital shital.jo...@gs.com
  wrote:
 
   Thanks for your answer.
  
    We confirmed that it is not a GC issue.

    The autowarming query looks good too, and queries before and after the
    long-running query come back really quickly. The only thing that stands
    out is that the shard on which the query takes a long time has a couple
    million more documents than the other shards.
  
   -Original Message-
   From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
   Sent: Thursday, February 20, 2014 5:26 PM
   To: solr-user@lucene.apache.org
   Subject: RE: Solr4 performance
  
   Hi,
  
   As for your first question, setting openSearcher to true means you will
  see
   the new docs after every hard commit. Soft and hard commits only become
   isolated from one another with that set to false.
  
   Your second problem might be explained by your large heap and garbage
   collection. Walking a heap that large can take an appreciable amount of
   time. You might consider turning on the JVM options for logging GC and
   seeing if you can correlate your slow responses to times when your JVM
 is
   garbage collecting.
  
   Hope that helps,
   On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:
  
Hi!
   
 I have a few other questions regarding the Solr4 performance issue we're
 facing.

 We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
 commit=false in the update URL. We have only the hard commit setting in the
 Solr4 config.

 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>true</openSearcher>
 </autoCommit>

 Since we're not using soft commits at all (commit=false), the caches will
 not get reloaded on every commit and recently added documents will not be
 visible, correct?

 What we see is that queries which usually take a few milliseconds take ~40
 seconds once in a while. Can high IO during a hard commit cause queries to
 slow down?

 For some shards we see 98% full physical memory. We have a 60GB machine (30
 GB JVM, 28 GB free RAM, ~35 GB of index).

RE: Solr4 performance

2014-02-24 Thread Joshi, Shital
Thanks. 

We found some evidence that this could be the issue. We're monitoring closely 
to confirm this. 

One question though: none of our nodes shows more than 50% of physical memory
used. So there is enough memory available for memory-mapped files. Can this
kind of pause still happen?


-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Friday, February 21, 2014 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

It could be that your query is churning the page cache on that node
sometimes, so Solr pauses so the OS can drag those pages off of disk. Have
you tried profiling your iowait in top or iostat during these pauses?
(assuming you're using linux).
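
A sketch of what that profiling could look like (device names and intervals are
up to you):

    # Extended per-device stats every 5 seconds; watch whether %util and await
    # spike while a slow query is in flight.
    iostat -x 5

    # In top, the 'wa' value on the Cpu(s) line is the iowait percentage.
    top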

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Thanks for your answer.

 We confirmed that it is not a GC issue.

 The autowarming query looks good too, and queries before and after the
 long-running query come back really quickly. The only thing that stands out is
 that the shard on which the query takes a long time has a couple million more
 documents than the other shards.

 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Thursday, February 20, 2014 5:26 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr4 performance

 Hi,

 As for your first question, setting openSearcher to true means you will see
 the new docs after every hard commit. Soft and hard commits only become
 isolated from one another with that set to false.

 Your second problem might be explained by your large heap and garbage
 collection. Walking a heap that large can take an appreciable amount of
 time. You might consider turning on the JVM options for logging GC and
 seeing if you can correlate your slow responses to times when your JVM is
 garbage collecting.

 Hope that helps,
 On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:

  Hi!
 
  I have a few other questions regarding the Solr4 performance issue we're
  facing.

  We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
  commit=false in the update URL. We have only the hard commit setting in the
  Solr4 config.

  <autoCommit>
     <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
     <maxDocs>100000</maxDocs>
     <openSearcher>true</openSearcher>
  </autoCommit>

  Since we're not using soft commits at all (commit=false), the caches will
  not get reloaded on every commit and recently added documents will not be
  visible, correct?

  What we see is that queries which usually take a few milliseconds take ~40
  seconds once in a while. Can high IO during a hard commit cause queries to
  slow down?

  For some shards we see 98% full physical memory. We have a 60GB machine (30
  GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
  physical memory would cause queries to slow down. We're in the process of
  reducing the JVM size anyway.

  We have never run an optimize until now. Optimizing in QA didn't yield a
  performance gain.
 
  Thanks much for all help.
 
  -Original Message-
  From: Shawn Heisey [mailto:s...@elyograg.org]
  Sent: Tuesday, February 18, 2014 4:55 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr4 performance
 
  On 2/18/2014 2:14 PM, Joshi, Shital wrote:
   Thanks much for all suggestions. We're looking into reducing the allocated
   heap size of the Solr4 JVM.
  
   We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
  internally? Can someone please confirm?
 
  In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
  default delegate.  That's probably also the case with Lucene -- these
  are Lucene classes, after all.
 
  MMapDirectory is almost always the most efficient way to handle on-disk
  indexes.
 
  Thanks,
  Shawn
 
 



Re: Solr4 performance

2014-02-24 Thread Michael Della Bitta
I'm not sure how you're measuring free RAM. Maybe this will help:

http://www.linuxatemyram.com/play.html
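
For example, on Linux the distinction that page makes looks like this (a
hypothetical box; your numbers will differ):

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:         60000      58000       2000          0        300      25000
    -/+ buffers/cache:      32700      27300

    # 'cached' is page cache holding index files and is reclaimable, so the
    # '-/+ buffers/cache' line is the realistic measure of free memory.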

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Thanks.

 We found some evidence that this could be the issue. We're monitoring
 closely to confirm this.

 One question though: none of our nodes shows more than 50% of physical
 memory used. So there is enough memory available for memory-mapped files.
 Can this kind of pause still happen?


 -Original Message-
 From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
 Sent: Friday, February 21, 2014 5:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 It could be that your query is churning the page cache on that node
 sometimes, so Solr pauses so the OS can drag those pages off of disk. Have
 you tried profiling your iowait in top or iostat during these pauses?
 (assuming you're using linux).

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 The Science of Influence Marketing

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
 w: appinions.com http://www.appinions.com/


 On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital shital.jo...@gs.com
 wrote:

  Thanks for your answer.
 
  We confirmed that it is not a GC issue.

  The autowarming query looks good too, and queries before and after the
  long-running query come back really quickly. The only thing that stands out
  is that the shard on which the query takes a long time has a couple million
  more documents than the other shards.
 
  -Original Message-
  From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
  Sent: Thursday, February 20, 2014 5:26 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Solr4 performance
 
  Hi,
 
  As for your first question, setting openSearcher to true means you will
 see
  the new docs after every hard commit. Soft and hard commits only become
  isolated from one another with that set to false.
 
  Your second problem might be explained by your large heap and garbage
  collection. Walking a heap that large can take an appreciable amount of
  time. You might consider turning on the JVM options for logging GC and
  seeing if you can correlate your slow responses to times when your JVM is
  garbage collecting.
 
  Hope that helps,
  On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:
 
   Hi!
  
   I have a few other questions regarding the Solr4 performance issue we're
   facing.

   We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
   commit=false in the update URL. We have only the hard commit setting in the
   Solr4 config.

   <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
      <maxDocs>100000</maxDocs>
      <openSearcher>true</openSearcher>
   </autoCommit>

   Since we're not using soft commits at all (commit=false), the caches will
   not get reloaded on every commit and recently added documents will not be
   visible, correct?

   What we see is that queries which usually take a few milliseconds take ~40
   seconds once in a while. Can high IO during a hard commit cause queries to
   slow down?

   For some shards we see 98% full physical memory. We have a 60GB machine (30
   GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
   physical memory would cause queries to slow down. We're in the process of
   reducing the JVM size anyway.

   We have never run an optimize until now. Optimizing in QA didn't yield a
   performance gain.
  
   Thanks much for all help.
  
   -Original Message-
   From: Shawn Heisey [mailto:s...@elyograg.org]
   Sent: Tuesday, February 18, 2014 4:55 PM
   To: solr-user@lucene.apache.org
   Subject: Re: Solr4 performance
  
   On 2/18/2014 2:14 PM, Joshi, Shital wrote:
 Thanks much for all suggestions. We're looking into reducing the allocated
 heap size of the Solr4 JVM.
   
We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
   internally? Can someone please confirm?
  
   In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
   default delegate.  That's probably also the case with Lucene -- these
   are Lucene classes, after all.
  
   MMapDirectory is almost always the most efficient way to handle on-disk
   indexes.
  
   Thanks,
   Shawn
  
  
 



RE: Solr4 performance

2014-02-20 Thread Joshi, Shital
Hi!

I have a few other questions regarding the Solr4 performance issue we're facing.

We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
commit=false in the update URL. We have only the hard commit setting in the
Solr4 config.

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
   <maxDocs>100000</maxDocs>
   <openSearcher>true</openSearcher>
</autoCommit>


Since we're not using soft commits at all (commit=false), the caches will not
get reloaded on every commit and recently added documents will not be visible,
correct?

What we see is that queries which usually take a few milliseconds take ~40
seconds once in a while. Can high IO during a hard commit cause queries to slow
down?

For some shards we see 98% full physical memory. We have a 60GB machine (30 GB
JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high physical
memory would cause queries to slow down. We're in the process of reducing the
JVM size anyway.

We have never run an optimize until now. Optimizing in QA didn't yield a
performance gain.

Thanks much for all help.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, February 18, 2014 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

On 2/18/2014 2:14 PM, Joshi, Shital wrote:
 Thanks much for all suggestions. We're looking into reducing the allocated heap
 size of the Solr4 JVM.

 We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
 Can someone please confirm?

In Solr, NRTCachingDirectory does indeed use MMapDirectory as its 
default delegate.  That's probably also the case with Lucene -- these 
are Lucene classes, after all.

MMapDirectory is almost always the most efficient way to handle on-disk 
indexes.

Thanks,
Shawn



RE: Solr4 performance

2014-02-20 Thread Michael Della Bitta
Hi,

As for your first question, setting openSearcher to true means you will see
the new docs after every hard commit. Soft and hard commits only become
isolated from one another with that set to false.

Your second problem might be explained by your large heap and garbage
collection. Walking a heap that large can take an appreciable amount of
time. You might consider turning on the JVM options for logging GC and
seeing if you can correlate your slow responses to times when your JVM is
garbage collecting.
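
For instance, a sketch of the classic HotSpot GC-logging flags from the Java
6/7 era (added to the Solr start command; the log path is a placeholder):

    -verbose:gc
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -Xloggc:/var/log/solr/gc.log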

Hope that helps,
On Feb 20, 2014 4:52 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi!

 I have a few other questions regarding the Solr4 performance issue we're facing.

 We're committing data to Solr4 every ~30 seconds (up to 20K rows). We use
 commit=false in the update URL. We have only the hard commit setting in the
 Solr4 config.

 <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
    <maxDocs>100000</maxDocs>
    <openSearcher>true</openSearcher>
 </autoCommit>


 Since we're not using soft commits at all (commit=false), the caches will
 not get reloaded on every commit and recently added documents will not be
 visible, correct?

 What we see is that queries which usually take a few milliseconds take ~40
 seconds once in a while. Can high IO during a hard commit cause queries to
 slow down?

 For some shards we see 98% full physical memory. We have a 60GB machine (30
 GB JVM, 28 GB free RAM, ~35 GB of index). We're ruling out that high
 physical memory would cause queries to slow down. We're in the process of
 reducing the JVM size anyway.

 We have never run an optimize until now. Optimizing in QA didn't yield a
 performance gain.

 Thanks much for all help.

 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org]
 Sent: Tuesday, February 18, 2014 4:55 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr4 performance

 On 2/18/2014 2:14 PM, Joshi, Shital wrote:
  Thanks much for all suggestions. We're looking into reducing the allocated
  heap size of the Solr4 JVM.
 
  We're using NRTCachingDirectoryFactory. Does it use MMapDirectory
 internally? Can someone please confirm?

 In Solr, NRTCachingDirectory does indeed use MMapDirectory as its
 default delegate.  That's probably also the case with Lucene -- these
 are Lucene classes, after all.

 MMapDirectory is almost always the most efficient way to handle on-disk
 indexes.

 Thanks,
 Shawn




RE: Solr4 performance

2014-02-18 Thread Joshi, Shital
Hi,

Thanks much for all suggestions. We're looking into reducing the allocated heap
size of the Solr4 JVM.

We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
Can someone please confirm?

Would optimizing help with performance? We did that in QA (it took about 13
hours for 700 million documents).

Thanks!

-Original Message-
From: Roman Chyla [mailto:roman.ch...@gmail.com] 
Sent: Wednesday, February 12, 2014 3:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

And perhaps one other, but very pertinent, recommendation: allocate only
as much heap as is necessary. By allocating more, you are working against
the OS caching. Knowing how much is enough is a bit tricky, though.

Best,

  roman


On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/12/2014 12:07 PM, Greg Walters wrote:

 Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
 on-64bit.html as it's a pretty decent explanation of memory mapped
 files. I don't believe that the default configuration for solr is to use
 MMapDirectory but even if it does my understanding is that the entire file
 won't be forcibly cached by solr. The OS's filesystem cache should control
 what's actually in ram and the eviction process will depend on the OS.


 I only have a little bit to add.  Here's the first thing that Uwe's blog
 post (linked above) says:

  Since version 3.1, Apache Lucene and Solr use MMapDirectory by default
  on 64-bit Windows and Solaris systems; since version 3.3 also for 64-bit
  Linux systems.

 The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory
 by default under the hood.

 A summary about all this that should be relevant to the original question:

 It's the *operating system* that handles memory mapping, including any
 caching that happens.  Assuming that you don't have a badly configured
 virtual machine setup, I'm fairly sure that only real memory gets used,
 never swap space on the disk.  If something else on the system makes a
 memory allocation, the operating system will instantly give up memory used
 for caching and mapping.  One of the strengths of mmap is that it can't
 exceed available resources unless it's used incorrectly.

 Thanks,
 Shawn




Re: Solr4 performance

2014-02-18 Thread Shawn Heisey

On 2/18/2014 2:14 PM, Joshi, Shital wrote:

Thanks much for all suggestions. We're looking into reducing the allocated heap
size of the Solr4 JVM.

We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
Can someone please confirm?


In Solr, NRTCachingDirectory does indeed use MMapDirectory as its 
default delegate.  That's probably also the case with Lucene -- these 
are Lucene classes, after all.


MMapDirectory is almost always the most efficient way to handle on-disk 
indexes.
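
For reference, the stock Solr 4 solrconfig.xml declares the factory along these
lines (a typical example; your file may differ):

    <directoryFactory name="DirectoryFactory"
                      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>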


Thanks,
Shawn



RE: Solr4 performance

2014-02-12 Thread Joshi, Shital
Does Solr4 load the entire index into a memory-mapped file? What is the eviction
policy of this memory-mapped file? Can we control it?

_
From: Joshi, Shital [Tech]
Sent: Wednesday, February 05, 2014 12:00 PM
To: 'solr-user@lucene.apache.org'
Subject: Solr4 performance


Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes
(cloud). We're using local disk (/local/data) to store the Solr index files. All
hosts have 60GB RAM and the Solr4 JVMs are running with a max 30GB heap size. So
far we have 470 million documents. We are using custom sharding and all shards
have ~9-10 million documents. We have a GUI sending queries to this cloud, and
the GUI has a 30-second timeout.

Lately we're getting many timeouts on the GUI, and upon checking we found that
all the timeouts are happening on 2 hosts. The admin GUI for one of the hosts
shows 96% of physical memory used, but the other host looks perfectly good. The
two hosts are for different shards. Would increasing the RAM of these two hosts
make these timeouts go away? What else can we check?

Many Thanks!




Re: Solr4 performance

2014-02-12 Thread Shalin Shekhar Mangar
No, Solr doesn't load the entire index in memory. I think you'll find
Uwe's blog most helpful on this matter:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital shital.jo...@gs.com wrote:
 Does Solr4 load the entire index into a memory-mapped file? What is the
 eviction policy of this memory-mapped file? Can we control it?

 _
 From: Joshi, Shital [Tech]
 Sent: Wednesday, February 05, 2014 12:00 PM
 To: 'solr-user@lucene.apache.org'
 Subject: Solr4 performance


 Hi,

 We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute
 boxes (cloud). We're using local disk (/local/data) to store the Solr index
 files. All hosts have 60GB RAM and the Solr4 JVMs are running with a max 30GB
 heap size. So far we have 470 million documents. We are using custom sharding
 and all shards have ~9-10 million documents. We have a GUI sending queries to
 this cloud, and the GUI has a 30-second timeout.

 Lately we're getting many timeouts on the GUI, and upon checking we found that
 all the timeouts are happening on 2 hosts. The admin GUI for one of the hosts
 shows 96% of physical memory used, but the other host looks perfectly good.
 The two hosts are for different shards. Would increasing the RAM of these two
 hosts make these timeouts go away? What else can we check?

 Many Thanks!





-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr4 performance

2014-02-12 Thread Greg Walters
Shital,

Take a look at 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's 
a pretty decent explanation of memory mapped files. I don't believe that the 
default configuration for solr is to use MMapDirectory but even if it does my 
understanding is that the entire file won't be forcibly cached by solr. The 
OS's filesystem cache should control what's actually in ram and the eviction 
process will depend on the OS.

Thanks,
Greg

On Feb 12, 2014, at 12:57 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Does Solr4 load the entire index into a memory-mapped file? What is the
 eviction policy of this memory-mapped file? Can we control it?
 
 _
 From: Joshi, Shital [Tech]
 Sent: Wednesday, February 05, 2014 12:00 PM
 To: 'solr-user@lucene.apache.org'
 Subject: Solr4 performance
 
 
 Hi,
 
 We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute
 boxes (cloud). We're using local disk (/local/data) to store the Solr index
 files. All hosts have 60GB RAM and the Solr4 JVMs are running with a max 30GB
 heap size. So far we have 470 million documents. We are using custom sharding
 and all shards have ~9-10 million documents. We have a GUI sending queries to
 this cloud, and the GUI has a 30-second timeout.

 Lately we're getting many timeouts on the GUI, and upon checking we found that
 all the timeouts are happening on 2 hosts. The admin GUI for one of the hosts
 shows 96% of physical memory used, but the other host looks perfectly good.
 The two hosts are for different shards. Would increasing the RAM of these two
 hosts make these timeouts go away? What else can we check?
 
 Many Thanks!
 
 



Re: Solr4 performance

2014-02-12 Thread Shawn Heisey

On 2/12/2014 12:07 PM, Greg Walters wrote:

Take a look at 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's 
a pretty decent explanation of memory mapped files. I don't believe that the 
default configuration for solr is to use MMapDirectory but even if it does my 
understanding is that the entire file won't be forcibly cached by solr. The 
OS's filesystem cache should control what's actually in ram and the eviction 
process will depend on the OS.


I only have a little bit to add.  Here's the first thing that Uwe's blog 
post (linked above) says:


Since version 3.1, Apache Lucene and Solr use MMapDirectory by default
on 64-bit Windows and Solaris systems; since version 3.3 also for 64-bit
Linux systems.


The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory 
by default under the hood.


A summary about all this that should be relevant to the original question:

It's the *operating system* that handles memory mapping, including any 
caching that happens.  Assuming that you don't have a badly configured 
virtual machine setup, I'm fairly sure that only real memory gets used, 
never swap space on the disk.  If something else on the system makes a 
memory allocation, the operating system will instantly give up memory 
used for caching and mapping.  One of the strengths of mmap is that it 
can't exceed available resources unless it's used incorrectly.
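
One way to watch this in action (a sketch; the pid and index path are
placeholders for your own):

    # List the memory-mapped index segment files for a running Solr JVM;
    # the RSS column shows how much of each mapping the OS currently keeps in RAM.
    pmap -x <solr_pid> | grep /local/data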


Thanks,
Shawn



Re: Solr4 performance

2014-02-12 Thread Roman Chyla
And perhaps one other, but very pertinent, recommendation: allocate only
as much heap as is necessary. By allocating more, you are working against
the OS caching. Knowing how much is enough is a bit tricky, though.
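
As a purely illustrative starting point (the numbers are invented, not a
recommendation for this index), that means preferring something like

    # Heap sized to what Solr actually needs; the rest of the 60GB machine is
    # left to the OS page cache for the memory-mapped index.
    java -Xms8g -Xmx8g -jar start.jar

over -Xmx30g, then watching GC logs and cache hit ratios to see whether it is
enough.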

Best,

  roman


On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/12/2014 12:07 PM, Greg Walters wrote:

 Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
 on-64bit.html as it's a pretty decent explanation of memory mapped
 files. I don't believe that the default configuration for solr is to use
 MMapDirectory but even if it does my understanding is that the entire file
 won't be forcibly cached by solr. The OS's filesystem cache should control
 what's actually in ram and the eviction process will depend on the OS.


 I only have a little bit to add.  Here's the first thing that Uwe's blog
 post (linked above) says:

 Since version 3.1, Apache Lucene and Solr use MMapDirectory by default
 on 64-bit Windows and Solaris systems; since version 3.3 also for 64-bit
 Linux systems.

 The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory
 by default under the hood.

 A summary about all this that should be relevant to the original question:

 It's the *operating system* that handles memory mapping, including any
 caching that happens.  Assuming that you don't have a badly configured
 virtual machine setup, I'm fairly sure that only real memory gets used,
 never swap space on the disk.  If something else on the system makes a
 memory allocation, the operating system will instantly give up memory used
 for caching and mapping.  One of the strengths of mmap is that it can't
 exceed available resources unless it's used incorrectly.

 Thanks,
 Shawn




Solr4 performance

2014-02-05 Thread Joshi, Shital
Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes
(cloud). We're using local disk (/local/data) to store the Solr index files. All
hosts have 60GB RAM and the Solr4 JVMs are running with a max 30GB heap size. So
far we have 470 million documents. We are using custom sharding and all shards
have ~9-10 million documents. We have a GUI sending queries to this cloud, and
the GUI has a 30-second timeout.

Lately we're getting many timeouts on the GUI, and upon checking we found that
all the timeouts are happening on 2 hosts. The admin GUI for one of the hosts
shows 96% of physical memory used, but the other host looks perfectly good. The
two hosts are for different shards. Would increasing the RAM of these two hosts
make these timeouts go away? What else can we check?

Many Thanks!