Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Hi Shawn,

Wow! Thank you for your considered reply!

I'm going to dig into these issues, but I have a few questions:

Regarding memory: Including the duplicate data in shard replicas, the entire
index is 350GB. Each server hosts a total of 44GB of data. Each server has
28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java
would take the memory it needs and leave the rest to the OS for cache.

Given that I'll never need to serve 200 concurrent connections in
production, do you think my servers need more memory?
Should I be tinkering with -Xmx and -Xms?

Regarding commits: My end-users want new data to be made available quickly.
Thankfully I'm only inserting between 1 and 3 documents per second so the
change-rate isn't crazy.

Should I just slow down my commit frequency, and depend on soft-commits? If
I do this, will the commits take even longer?
Given 1000 documents, is it generally faster to do 10 commits of 100, or 1
commit of 1000?

Thanks so much!

-D



On Fri, Nov 22, 2013 at 2:27 AM, Shawn Heisey s...@elyograg.org wrote:

 On 11/21/2013 6:41 PM, Dave Seltzer wrote:
  In digging a little deeper and looking at the config I see that
  <nrtMode>true</nrtMode> is commented out.  I believe this is the default
  setting. So I don't know if NRT is enabled or not. Maybe just a red
 herring.

 I had never seen this setting before.  The default is true.  SolrCloud
 requires that it be set to true.  Looks like it's a new parameter in
 4.5, added by SOLR-4909.  From what I can tell reading the issue,
 turning it off effectively disables soft commits.

 https://issues.apache.org/jira/browse/SOLR-4909

 You've said that you are adding about 3 documents per second, but you
 haven't said anything about how often you are doing commits.  Erick's
 question basically boils down to this:  How quickly after indexing do
 you expect the changes to be visible on a search, and how often are you
 doing commits?

 Generally speaking (and ignoring the fact that nrtMode now exists), NRT
 is not something you enable, it's something you try to achieve, by using
 soft commits quickly and often, and by adjusting the configuration to
 make the commits go faster.

 If you are trying to keep the interval between indexing and document
 visibility down to less than a few seconds (especially if it's less than
 one second), then you are trying to achieve NRT.

 There's a lot of information on the following wiki page about
 performance problems.  This specific link is to the last part of that
 page, which deals with slow commits:

 http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

  I don't know what Garbage Collector we're using. In this test I'm running
  Solr 4.5.1 using Jetty from the example directory.

 If you aren't using any tuning parameters beyond setting the max heap,
 then you are using the default parallel collector.  It's a poor choice
 for Solr unless your heap is very small.  At 6GB, yours isn't very
 small.  It's not particularly huge either, but not small.

  The CPU on the 8 nodes all stay around 70% use during the test. The nodes
  have 28GB of RAM. Java is using about 6GB and the rest is being used by
 OS
  cache.

 How big is your index?  If it's larger than about 30 GB, you probably
 need more memory.  If it's much larger than about 40 GB, you definitely
 need more memory.

  To perform the test we're running 200 concurrent threads in JMeter. The
  threads hit HAProxy which loadbalances the requests among the nodes. Each
  query is for a random word out of a list of about 10,000 words. Some of
 the
  queries have faceting turned on.

 That's a pretty high query load.  If you want to get anywhere near top
 performance out of it, you'll want to have enough memory to fit your
 entire index into RAM.  You'll also need to reduce the load introduced
 by indexing.  A large part of the load from indexing comes from commits.

  Because we're heavily loading the system the queries are returning quite
  slowly. For a simple search, the average response time was 300ms. The
 peak
  response time was 11,000ms. The spikes in latency seem to occur about
 every
  2.5 minutes.

 I would bet that you're having one or both of the following issues:

 1) Garbage collection issues from one or more of the following:
  a) Heap too small.
  b) Using the default GC instead of CMS with tuning.
 2) General performance issues from one or more of the following:
  a) Not enough cache memory for your index size.
  b) Too-frequent commits.
  c) Commits taking a lot of time and resources due to cache warming.

 With a high query and index load, any problems become magnified.

  I haven't spent that much time messing with SolrConfig, so most of the
  settings are the out-of-the-box defaults.

 The defaults are very good for small to medium indexes and low to medium
 query load.  If you have a big index and/or high query load, you'll
 generally need to tune.

 Thanks,
 Shawn




Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 8:13 AM, Dave Seltzer wrote:

Regarding memory: Including the duplicate data in shard replicas, the entire
index is 350GB. Each server hosts a total of 44GB of data. Each server has
28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java
would take the memory it needs and leave the rest to the OS for cache.


That's not how Java works.  Java has a min heap and max heap setting.  
If you (or the auto-detected settings) tell it that the max heap is 4GB, 
it will only ever use slightly more than 4GB of RAM.  If the app needs 
more than that, this will lead to terrible performance and/or out of 
memory errors.


You can see how much the max heap is in the Solr admin UI dashboard - 
it'll be the right-most number on the JVM-Memory graph.  On my 64-bit 
linux development machine with 16GB of RAM, it looks like Java defaults 
to a 4GB max heap.  I have the heap size manually set to 7GB for Solr on 
that machine.  The 6GB heap you have mentioned might not be enough, or 
it might be more than you need.  It all depends on the kind of queries 
you are doing and exactly how Solr is configured.


If it were me, I'd want a memory size between 48 and 64GB for a total 
index size of 44GB.  Whether you really need that much is very dependent 
on your exact requirements, index makeup, and queries.  To support the 
high query load you're sending, it probably is a requirement.  More 
memory is likely to help performance, but I can't guarantee it without 
looking a lot deeper into your setup, and that's difficult to do via email.


One thing I can tell you about checking performance - see how much of 
your 70% CPU usage is going to I/O wait.  If it's more than a few 
percent, more memory might help.  First try increasing the max heap by 1 
or 2GB.
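
For example, on Linux you can watch the %iowait column with something like the
command below (this assumes the sysstat package is installed; top and vmstat
report the same thing as "wa"):

  iostat -c 5

If %iowait stays above a few percent while the test is running, the OS disk
cache is probably too small for your index.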



Given that I'll never need to serve 200 concurrent connections in
production, do you think my servers need more memory?
Should I be tinkering with -Xmx and -Xms?


If you'll never need to serve that many, test with a lower number.  Make 
it higher than you'll need, but not a lot higher. The test with 200 
connections isn't a bad idea -- you do want to stress test things way 
beyond your actual requirements, but you'll also want to see how it does 
with a more realistic load.


Those are the min/max heap settings I just mentioned.  IMHO you should 
set at least the max heap.  If you want to handle a high load, it's a 
good idea to set the min heap to the same value as the max heap, so that 
it doesn't need to worry about hitting limits in order to allocate 
additional memory.  It'll eventually allocate the max heap anyway.
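
With the stock example Jetty you're running, that's just two extra arguments on
the startup command -- something like the line below, where the 6g value is only
a placeholder for whatever heap size you settle on:

  java -Xms6g -Xmx6g -jar start.jar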



Regarding commits: My end-users want new data to be made available quickly.
Thankfully I'm only inserting between 1 and 3 documents per second so the
change-rate isn't crazy.

Should I just slow down my commit frequency, and depend on soft-commits? If
I do this, will the commits take even longer?
Given 1000 documents, is it generally faster to do 10 commits of 100, or 1
commit of 1000?


Fewer commits is always better.  The amount of time they take isn't 
strongly affected by the number of new documents, unless there are a LOT 
of them.  Figure out the timeframe that's the maximum amount of time (in 
milliseconds) that you think people are willing to wait for new data to 
become visible.  Use that as your autoSoftCommit interval, or as the 
commitWithin parameter on your indexing requests.  Set your autoCommit 
interval to around five minutes, as described on the wiki page I 
linked.  If you are using auto settings and/or commitWithin, then you 
will never need to send an explicit commit command.  Reducing commit 
frequency is one of the first things you'll want to try.  Frequent 
commits use a *lot* of I/O and CPU resources.
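
In solrconfig.xml that looks roughly like the sketch below. The 10-second soft
commit value is only an example -- substitute whatever visibility delay your
users will accept -- and openSearcher=false keeps the hard commit from opening
a new searcher, so it stays cheap:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>300000</maxTime>      <!-- hard commit about every 5 minutes -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>10000</maxTime>       <!-- soft commit every 10 seconds for visibility -->
    </autoSoftCommit>
  </updateHandler>

If you'd rather control it from the indexing side, skip autoSoftCommit and pass
commitWithin (also in milliseconds) as a parameter on your update requests.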


Although there are exceptions, most installs rarely NEED commits to 
happen more often than about once a minute, and longer intervals are 
often perfectly acceptable.  Even in situations where a higher frequency 
is required, 10-15 seconds is often good enough.  Getting sub-second 
commit times is *possible*, but usually requires significant hardware 
investment or changing the config in a way that is detrimental to query 
performance.


Thanks,
Shawn



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 10:01 AM, Shawn Heisey wrote:
You can see how much the max heap is in the Solr admin UI dashboard - 
it'll be the right-most number on the JVM-Memory graph.  On my 64-bit 
linux development machine with 16GB of RAM, it looks like Java 
defaults to a 4GB max heap.  I have the heap size manually set to 7GB 
for Solr on that machine.  The 6GB heap you have mentioned might not 
be enough, or it might be more than you need.  It all depends on the 
kind of queries you are doing and exactly how Solr is configured.


Followup: I would also recommend starting with my garbage collection 
settings.  This wiki page is linked on the wiki page I've already given you.


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

You might need a script to start Solr.  There is also a redhat-specific 
init script on that wiki page.  I haven't included any instructions for 
installing it.  Someone who already knows about init scripts won't have 
much trouble getting it working on a redhat-derived OS, and someone who 
doesn't will need extensive instructions or an install script, neither 
of which has been written.


Thanks,
Shawn



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Thanks so much Shawn,

I think you (and others) are completely right about this being heap and GC
related. I just did a test while not indexing data and the same periodic
slowness was observable.

On to GC/Memory Tuning!

Many Thanks!

-Dave


On Fri, Nov 22, 2013 at 12:09 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/22/2013 10:01 AM, Shawn Heisey wrote:

 You can see how much the max heap is in the Solr admin UI dashboard -
 it'll be the right-most number on the JVM-Memory graph.  On my 64-bit linux
 development machine with 16GB of RAM, it looks like Java defaults to a 4GB
 max heap.  I have the heap size manually set to 7GB for Solr on that
 machine.  The 6GB heap you have mentioned might not be enough, or it might
 be more than you need.  It all depends on the kind of queries you are doing
 and exactly how Solr is configured.


 Followup: I would also recommend starting with my garbage collection
 settings.  This wiki page is linked on the wiki page I've already given you.

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 You might need a script to start Solr.  There is also a redhat-specific
 init script on that wiki page.  I haven't included any instructions for
 installing it.  Someone who already knows about init scripts won't have
 much trouble getting it working on a redhat-derived OS, and someone who
 doesn't will need extensive instructions or an install script, neither of
 which has been written.

 Thanks,
 Shawn



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Raymond Wiker
You mentioned earlier that you are not setting -Xms/-Xmx; the values actually 
in use would then depend on the Java version, whether you're running 32- or 
64-bit Java, whether Java thinks your machines are servers, and whether you 
have specified the -server flag – and possibly a few other things.

What do you get if you run the command below?

java -XX:+PrintFlagsFinal -version

(Ref: 
http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5
 for details; I stole the incantation above from that location, but there are 
more complete examples of how it could be used there.)

Note: you need to adjust the command line so that it uses the same java version 
as the one you're using, and also add whatever JRE-modifying parameters that 
you use when starting Solr.

On 22 Nov 2013, at 18:12 , Dave Seltzer dselt...@tveyes.com wrote:

 Thanks so much Shawn,
 
 I think you (and others) are completely right about this being heap and GC
 related. I just did a test while not indexing data and the same periodic
 slowness was observable.
 
 On to GC/Memory Tuning!



Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
Wow. That is one noisy command!

Full output is below. The grepped output looks like:

[solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version | grep -i -E 'heapsize|permsize|version'
    uintx AdaptivePermSizeWeight               = 20              {product}
    uintx ErgoHeapSizeLimit                    = 0               {product}
    uintx HeapSizePerGCThread                  = 87241520        {product}
    uintx InitialHeapSize                     := 447247104       {product}
    uintx LargePageHeapSizeThreshold           = 134217728       {product}
    uintx MaxHeapSize                         := 7157579776      {product}
    uintx MaxPermSize                          = 85983232        {pd product}
    uintx PermSize                             = 21757952        {pd product}
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

It looks like Java is correctly determining that this is in fact a
server. It seems to start with an Xmx of 25% of the RAM, or around 7GB.

So, in addition to tweaking GC I'm going to increase Xmx. Any advice as to
how much memory should go to the heap and how much should go to the OS disk
cache? Should I split it 50/50?

Again. Many Thanks.

-Dave


 Full output from printflags
--
[solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version
[Global flags]
uintx AdaptivePermSizeWeight= 20
 {product}
uintx AdaptiveSizeDecrementScaleFactor  = 4
{product}
uintx AdaptiveSizeMajorGCDecayTimeScale = 10
 {product}
uintx AdaptiveSizePausePolicy   = 0
{product}
uintx AdaptiveSizePolicyCollectionCostMargin= 50
 {product}
uintx AdaptiveSizePolicyInitializingSteps   = 20
 {product}
uintx AdaptiveSizePolicyOutputInterval  = 0
{product}
uintx AdaptiveSizePolicyWeight  = 10
 {product}
uintx AdaptiveSizeThroughPutPolicy  = 0
{product}
uintx AdaptiveTimeWeight= 25
 {product}
 bool AdjustConcurrency = false
{product}
 bool AggressiveOpts= false
{product}
 intx AliasLevel= 3   {C2
product}
 bool AlignVector   = false   {C2
product}
 intx AllocateInstancePrefetchLines = 1
{product}
 intx AllocatePrefetchDistance  = 192
{product}
 intx AllocatePrefetchInstr = 0
{product}
 intx AllocatePrefetchLines = 4
{product}
 intx AllocatePrefetchStepSize  = 64
 {product}
 intx AllocatePrefetchStyle = 1
{product}
 bool AllowJNIEnvProxy  = false
{product}
 bool AllowNonVirtualCalls  = false
{product}
 bool AllowParallelDefineClass  = false
{product}
 bool AllowUserSignalHandlers   = false
{product}
 bool AlwaysActAsServerClassMachine = false
{product}
 bool AlwaysCompileLoopMethods  = false
{product}
 bool AlwaysLockClassLoader = false
{product}
 bool AlwaysPreTouch= false
{product}
 bool AlwaysRestoreFPU  = false
{product}
 bool AlwaysTenure  = false
{product}
 bool AssertOnSuspendWaitFailure= false
{product}
 intx Atomics   = 0
{product}
 intx AutoBoxCacheMax   = 128 {C2
product}
uintx AutoGCSelectPauseMillis   = 5000
 {product}
 intx BCEATraceLevel= 0
{product}
 intx BackEdgeThreshold = 10  {pd
product}
 bool BackgroundCompilation = true{pd
product}
uintx BaseFootPrintEstimate = 268435456
{product}
 intx BiasedLockingBulkRebiasThreshold  = 20
 {product}
 intx BiasedLockingBulkRevokeThreshold  = 40
 {product}
 intx BiasedLockingDecayTime= 25000
{product}
 intx BiasedLockingStartupDelay = 4000
 {product}
 bool BindGCTaskThreadsToCPUs   = false
{product}
 bool BlockLayoutByFrequency= true{C2
product}
 intx BlockLayoutMinDiamondPercentage   = 20  {C2
product}
 bool BlockLayoutRotateLoops= true{C2
product}
 bool BranchOnRegister  = false   {C2
product}
 bool BytecodeVerificationLocal = false
{product}
 bool BytecodeVerificationRemote= true
 {product}
 bool C1OptimizeVirtualCallProfiling   

Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Dave Seltzer
So I made a few changes, but I still seem to be dealing with this pesky
periodic slowness.

Changes:
1) I'm now only forcing commits every 5 minutes. This was done by
specifying commitWithin=30 when doing document adds.
2) I'm specifying -Xmx12g to force the Java heap to take more memory
3) I'm using the GC configuration parameters from the wiki (
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)

The new startup args are:
-DzkRun
-Xmx12g
-XX:+AggressiveOpts
-XX:+UseLargePages
-XX:+ParallelRefProcEnabled
-XX:+CMSParallelRemarkEnabled
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:CMSTriggerPermRatio=80
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSFullGCsBeforeCompaction=1
-XX:PretenureSizeThreshold=64m
-XX:+CMSScavengeBeforeRemark
-XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=8
-XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=4
-XX:NewRatio=3

I'm still seeing the same periodic slowness about every 3.5 minutes. This
slowness occurs whether or not I'm indexing content, so it appears to be
unrelated to my commit schedule.

See the most recent graph here:
http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png

To keep things consistent I'm still testing with 200 threads. When I test
with 10 threads everything is much faster, but I still get the same
periodic slowness.

One thing I've noticed is that while Java is aware of the 12 gig heap, Solr
doesn't seem to be using much of it. The system panel of the Web UI shows
11.5GB of JVM-Memory available, but only 2.11GB in use.

Screenshot: http://farm4.staticflickr.com/3822/10999509515_72a9013ec7_o.jpg

So I've told Java to use more memory. Do I need to tell Solr to use more as
well?

Thanks everyone!

-Dave



On Fri, Nov 22, 2013 at 12:09 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/22/2013 10:01 AM, Shawn Heisey wrote:

 You can see how much the max heap is in the Solr admin UI dashboard -
 it'll be the right-most number on the JVM-Memory graph.  On my 64-bit linux
 development machine with 16GB of RAM, it looks like Java defaults to a 4GB
 max heap.  I have the heap size manually set to 7GB for Solr on that
 machine.  The 6GB heap you have mentioned might not be enough, or it might
 be more than you need.  It all depends on the kind of queries you are doing
 and exactly how Solr is configured.


 Followup: I would also recommend starting with my garbage collection
 settings.  This wiki page is linked on the wiki page I've already given you.

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 You might need a script to start Solr.  There is also a redhat-specific
 init script on that wiki page.  I haven't included any instructions for
 installing it.  Someone who already knows about init scripts won't have
 much trouble getting it working on a redhat-derived OS, and someone who
 doesn't will need extensive instructions or an install script, neither of
 which has been written.

 Thanks,
 Shawn




Re: Periodic Slowness on Solr Cloud

2013-11-22 Thread Shawn Heisey

On 11/22/2013 2:17 PM, Dave Seltzer wrote:

So I made a few changes, but I still seem to be dealing with this pesky
periodic slowness.

Changes:
1) I'm now only forcing commits every 5 minutes. This was done by
specifying commitWithin=30 when doing document adds.
2) I'm specifying an -Xmx12g to force the java heap to take more memory
3) I'm using the GC configuration parameters from the wiki (
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)


snip


I'm still seeing the same periodic slowness about every 3.5 minutes. This
slowness occurs whether or not I'm indexing content, so it appears to be
unrelated to my commit schedule.


It sounds like your heap isn't too small.  Try reducing it to 5GB, then 
to 4GB after some testing, so more memory gets used by the OS disk 
cache.  I would also recommend trying perhaps 100 threads on your test 
app rather than 200.  Work your way up until you find the point where it 
just can't handle the load.



See the most recent graph here:
http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png

To keep things consistent I'm still testing with 200 threads. When I test
with 10 threads everything is much faster, but I still get the same
periodic slowness.

One thing I've noticed is that while Java is aware of the 12 gig heap, Solr
doesn't seem to be using much of it. The system panel of the Web UI shows
11.5GB of JVM-Memory available, but only 2.11GB in use.


The memory usage in the admin UI is an instantaneous snapshot.  If you 
use jvisualvm or jconsole (included in the Java JDK) to get a graph of 
memory usage, you'll see it change over time.  As Java allocates 
objects, memory usage increases until it's using all the heap.  Some 
amount of that allocation will be objects that are no longer in use -- 
garbage.  Then garbage collection will kick in and memory usage will 
drop down to however much is actually in use in the particular memory 
pool that's being collected.  This is what people often refer to as the 
sawtooth pattern.


Here are a couple of screenshots.  The jconsole program is running on
Windows 7; Solr is running on Linux.  One screenshot is the graph, the
other is the VM summary, where you can see that Solr has been running for
nearly 8 days.  This is one of my production Solr servers, so some of
the parameters are slightly different from what's on my wiki:


https://dl.dropboxusercontent.com/u/97770508/solr-jconsole.png
https://dl.dropboxusercontent.com/u/97770508/solr-jconsole-summary.png

If you do not have a GUI installed on the actual Solr machine, you'll 
need to use remote JMX to connect jconsole.  In the init script on my 
wiki page, you can see JMX options.  With those, you can tell a remote 
jconsole to use server.example.com:8686 instead of a local PID.  You can 
use any port you want that's not already in use instead of 8686.  
Running jconsole with -interval=1 will make the graph update once a 
second, I think it's every 5 seconds by default.
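
The standard JMX properties look something like the list below -- these are the
stock JVM options rather than necessarily an exact copy of what's in my script,
and note that they disable authentication, so only use them on a trusted network:

  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=8686
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false

Then from your workstation:

  jconsole -interval=1 server.example.com:8686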


You can also hit reload on the dashboard page to see how memory usage 
changes over time, but it's not as useful as a graph.  Memory usage will 
not change by much if you are not actively querying or indexing.


Thanks,
Shawn



Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
I'm doing some performance testing against an 8-node Solr cloud cluster,
and I'm noticing some periodic slowness.


http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png

I'm doing random test searches against an Alias Collection made up of four
smaller (monthly) collections. Like this:

MasterCollection
|- Collection201308
|- Collection201309
|- Collection201310
|- Collection201311

The last collection is constantly updated. New documents are being added at
the rate of about 3 documents per second.

I believe the slowness may be due to NRT, but I'm not sure. How should I
investigate this?

If the slowness is related to NRT, how can I alleviate the issue without
disabling NRT?

Thanks Much!

-Dave


Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Erick Erickson
How real time is NRT? In particular, what are you commit settings?

And can you characterize periodic slowness? Queries that usually
take 500ms now take 10s? Or 1s? How often? How are you measuring?

Details matter, a lot...

Best,
Erick




On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com wrote:

 I'm doing some performance testing against an 8-node Solr cloud cluster,
 and I'm noticing some periodic slowness.


 http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png

 I'm doing random test searches against an Alias Collection made up of four
 smaller (monthly) collections. Like this:

 MasterCollection
 |- Collection201308
 |- Collection201309
 |- Collection201310
 |- Collection201311

 The last collection is constantly updated. New documents are being added at
 the rate of about 3 documents per second.

 I believe the slowness may be due to NRT, but I'm not sure. How should I
 investigate this?

 If the slowness is related to NRT, how can I alleviate the issue without
 disabling NRT?

 Thanks Much!

 -Dave



Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Mark Miller
Yes, more details…

Solr version, which garbage collector, how does heap usage look, cpu, etc.

- Mark

On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com wrote:

 How real time is NRT? In particular, what are you commit settings?
 
 And can you characterize periodic slowness? Queries that usually
 take 500ms not tail 10s? Or 1s? How often? How are you measuring?
 
 Details matter, a lot...
 
 Best,
 Erick
 
 
 
 
 On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com wrote:
 
 I'm doing some performance testing against an 8-node Solr cloud cluster,
 and I'm noticing some periodic slowness.
 
 
 http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
 
 I'm doing random test searches against an Alias Collection made up of four
 smaller (monthly) collections. Like this:
 
 MasterCollection
 |- Collection201308
 |- Collection201309
 |- Collection201310
 |- Collection201311
 
 The last collection is constantly updated. New documents are being added at
 the rate of about 3 documents per second.
 
 I believe the slowness may be due to NRT, but I'm not sure. How should I
 investigate this?
 
 If the slowness is related to NRT, how can I alleviate the issue without
 disabling NRT?
 
 Thanks Much!
 
 -Dave
 



Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
Lots of questions. Okay.

In digging a little deeper and looking at the config I see that
<nrtMode>true</nrtMode> is commented out.  I believe this is the default
setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I don't know what Garbage Collector we're using. In this test I'm running
Solr 4.5.1 using Jetty from the example directory.

The CPU on the 8 nodes all stay around 70% use during the test. The nodes
have 28GB of RAM. Java is using about 6GB and the rest is being used by OS
cache.

To perform the test we're running 200 concurrent threads in JMeter. The
threads hit HAProxy which loadbalances the requests among the nodes. Each
query is for a random word out of a list of about 10,000 words. Some of the
queries have faceting turned on.

Because we're heavily loading the system the queries are returning quite
slowly. For a simple search, the average response time was 300ms. The peak
response time was 11,000ms. The spikes in latency seem to occur about every
2.5 minutes.

I haven't spent that much time messing with SolrConfig, so most of the
settings are the out-of-the-box defaults.

Where should I start to look?

Thanks so much!

-Dave





On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com wrote:

 Yes, more details…

 Solr version, which garbage collector, how does heap usage look, cpu, etc.

 - Mark

 On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  How real time is NRT? In particular, what are you commit settings?
 
  And can you characterize periodic slowness? Queries that usually
  take 500ms not tail 10s? Or 1s? How often? How are you measuring?
 
  Details matter, a lot...
 
  Best,
  Erick
 
 
 
 
  On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com
 wrote:
 
  I'm doing some performance testing against an 8-node Solr cloud cluster,
  and I'm noticing some periodic slowness.
 
 
  http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
 
  I'm doing random test searches against an Alias Collection made up of
 four
  smaller (monthly) collections. Like this:
 
  MasterCollection
  |- Collection201308
  |- Collection201309
  |- Collection201310
  |- Collection201311
 
  The last collection is constantly updated. New documents are being
 added at
  the rate of about 3 documents per second.
 
  I believe the slowness may be due to NRT, but I'm not sure. How should I
  investigate this?
 
  If the slowness is related to NRT, how can I alleviate the issue without
  disabling NRT?
 
  Thanks Much!
 
  -Dave
 


RE: Periodic Slowness on Solr Cloud

2013-11-21 Thread Doug Turnbull
Dave, you might want to connect JVisualVM and see if there's any pattern
with latency and garbage collection. That's a frequent culprit for
periodic hits in latency.

More info here:
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html

There are a couple of GC implementations in Java that can be tuned as needed.

With JVisualVM you can also add the MBeans plugin to get a ton of
performance stats out of Solr that might help debug latency issues.

Doug

Sent from my Windows Phone

From: Dave Seltzer
Sent: 11/21/2013 8:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Periodic Slowness on Solr Cloud
Lots of questions. Okay.

In digging a little deeper and looking at the config I see that
<nrtMode>true</nrtMode> is commented out.  I believe this is the default
setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I don't know what Garbage Collector we're using. In this test I'm running
Solr 4.5.1 using Jetty from the example directory.

The CPU on the 8 nodes all stay around 70% use during the test. The nodes
have 28GB of RAM. Java is using about 6GB and the rest is being used by OS
cache.

To perform the test we're running 200 concurrent threads in JMeter. The
threads hit HAProxy which loadbalances the requests among the nodes. Each
query is for a random word out of a list of about 10,000 words. Some of the
queries have faceting turned on.

Because we're heavily loading the system the queries are returning quite
slowly. For a simple search, the average response time was 300ms. The peak
response time was 11,000ms. The spikes in latency seem to occur about every
2.5 minutes.

I haven't spent that much time messing with SolrConfig, so most of the
settings are the out-of-the-box defaults.

Where should I start to look?

Thanks so much!

-Dave





On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com wrote:

 Yes, more details…

 Solr version, which garbage collector, how does heap usage look, cpu, etc.

 - Mark

 On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  How real time is NRT? In particular, what are you commit settings?
 
  And can you characterize periodic slowness? Queries that usually
  take 500ms not tail 10s? Or 1s? How often? How are you measuring?
 
  Details matter, a lot...
 
  Best,
  Erick
 
 
 
 
  On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com
 wrote:
 
  I'm doing some performance testing against an 8-node Solr cloud cluster,
  and I'm noticing some periodic slowness.
 
 
  http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
 
  I'm doing random test searches against an Alias Collection made up of
 four
  smaller (monthly) collections. Like this:
 
  MasterCollection
  |- Collection201308
  |- Collection201309
  |- Collection201310
  |- Collection201311
 
  The last collection is constantly updated. New documents are being
 added at
  the rate of about 3 documents per second.
 
  I believe the slowness may be due to NRT, but I'm not sure. How should I
  investigate this?
 
  If the slowness is related to NRT, how can I alleviate the issue without
  disabling NRT?
 
  Thanks Much!
 
  -Dave
 


Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Doug Turnbull
Additional info on GC selection
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#available_collectors

 If response time is more important than overall throughput and garbage
collection pauses must be kept shorter than approximately one second, then
select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one
or two processors are available, consider using incremental mode, described
below.

I'm not entirely certain of the implications of GC tuning for SolrCloud. I
imagine distributed searching is going to be as slow as the slowest core
being queried.

I'd also be curious as to the root cause of any excess GC churn. It sounds
like you're doing a ton of random queries. This probably creates a lot of
evictions in your caches. There's nothing really worth caching, so the caches
fill up and empty frequently, causing a lot of heap activity. If you expect
to have high load and a ton of turnover in queries, then tuning down cache
size might help minimize GC churn.
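
The caches live in solrconfig.xml, so shrinking them is just a matter of
lowering size (and probably autowarmCount) on entries like the ones below.
The numbers here are only illustrative, not a recommendation:

  <filterCache class="solr.FastLRUCache" size="128" initialSize="128" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>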

Solr Meter is another great tool for your perf testing that can help get at
some of these caching issues. It gives you some higher-level stats about
cache eviction, etc.
https://code.google.com/p/solrmeter/

-Doug



On Thu, Nov 21, 2013 at 10:24 PM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:

 Dave you might want to connect JVisualVm and see if there's any pattern
 with latency and garbage collection. That's a frequent culprit for
 periodic hits in latency.

 More info here

 http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html

 There's a couple GC implementations in Java that can be tuned as needed

 With JvisualVM You can also add the mbeans plugin to get a ton of
 performance stats out of Solr that might help debug latency issues.

 Doug

 Sent from my Windows Phone

 From: Dave Seltzer
 Sent: 11/21/2013 8:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Periodic Slowness on Solr Cloud
 Lots of questions. Okay.

 In digging a little deeper and looking at the config I see that
 <nrtMode>true</nrtMode> is commented out.  I believe this is the default
 setting. So I don't know if NRT is enabled or not. Maybe just a red
 herring.

 I don't know what Garbage Collector we're using. In this test I'm running
 Solr 4.5.1 using Jetty from the example directory.

 The CPU on the 8 nodes all stay around 70% use during the test. The nodes
 have 28GB of RAM. Java is using about 6GB and the rest is being used by OS
 cache.

 To perform the test we're running 200 concurrent threads in JMeter. The
 threads hit HAProxy which loadbalances the requests among the nodes. Each
 query is for a random word out of a list of about 10,000 words. Some of the
 queries have faceting turned on.

 Because we're heavily loading the system the queries are returning quite
 slowly. For a simple search, the average response time was 300ms. The peak
 response time was 11,000ms. The spikes in latency seem to occur about every
 2.5 minutes.

 I haven't spent that much time messing with SolrConfig, so most of the
 settings are the out-of-the-box defaults.

 Where should I start to look?

 Thanks so much!

 -Dave





 On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com
 wrote:

  Yes, more details…
 
  Solr version, which garbage collector, how does heap usage look, cpu,
 etc.
 
  - Mark
 
  On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   How real time is NRT? In particular, what are you commit settings?
  
   And can you characterize periodic slowness? Queries that usually
   take 500ms not tail 10s? Or 1s? How often? How are you measuring?
  
   Details matter, a lot...
  
   Best,
   Erick
  
  
  
  
   On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com
  wrote:
  
   I'm doing some performance testing against an 8-node Solr cloud
 cluster,
   and I'm noticing some periodic slowness.
  
  
   http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
  
   I'm doing random test searches against an Alias Collection made up of
  four
   smaller (monthly) collections. Like this:
  
   MasterCollection
   |- Collection201308
   |- Collection201309
   |- Collection201310
   |- Collection201311
  
   The last collection is constantly updated. New documents are being
  added at
   the rate of about 3 documents per second.
  
   I believe the slowness may be due to NRT, but I'm not sure. How
 should I
   investigate this?
  
   If the slowness is related to NRT, how can I alleviate the issue
 without
   disabling NRT?
  
   Thanks Much!
  
   -Dave
  




-- 
Doug Turnbull
Search  Big Data Architect
OpenSource Connections http://o19s.com


Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
Thanks Doug!

One thing I'm not clear on is how I can tell whether this is in fact related to
garbage collection. If you're right, and the cluster is only as slow as its
slowest link, how do I determine that this is GC? Do I have to run the
profiler on all eight nodes?

Or is it a matter of turning on the correct logging and then watching and
waiting?

Thanks!

-D


On Thu, Nov 21, 2013 at 11:20 PM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:

 Additional info on GC selection

 http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#available_collectors

  If response time is more important than overall throughput and garbage
 collection pauses must be kept shorter than approximately one second, then
 select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one
 or two processors are available, consider using incremental mode, described
 below.

 I'm not entirely certain of the implications of GC tuning for SolrCloud. I
 imagine distributed searching is going to be as slow as the slowest core
 being queried.

 I'd also be curious as to the root-cause of any excess GC churn. It sounds
 like you're doing a ton of random queries. This probably creates a lot of
 evictions in your caches. There's nothing really worth caching, so the caches
 fill up and empty frequently, causing a lot of heap activity. If you expect
 to have high-load and a ton of turnover in queries, then tuning down cache
 size might help minimize GC churn.

 Solr Meter is another great tool for your perf testing that can help get
 at some of these caching issues. It gives you some higher-level stats about
 cache eviction, etc.
 https://code.google.com/p/solrmeter/

 -Doug



 On Thu, Nov 21, 2013 at 10:24 PM, Doug Turnbull 
 dturnb...@opensourceconnections.com wrote:

 Dave you might want to connect JVisualVm and see if there's any pattern
 with latency and garbage collection. That's a frequent culprit for
 periodic hits in latency.

 More info here

 http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html

 There's a couple GC implementations in Java that can be tuned as needed

 With JvisualVM You can also add the mbeans plugin to get a ton of
 performance stats out of Solr that might help debug latency issues.

 Doug

 Sent from my Windows Phone

 From: Dave Seltzer
 Sent: 11/21/2013 8:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Periodic Slowness on Solr Cloud
 Lots of questions. Okay.

 In digging a little deeper and looking at the config I see that
 <nrtMode>true</nrtMode> is commented out.  I believe this is the default
 setting. So I don't know if NRT is enabled or not. Maybe just a red
 herring.

 I don't know what Garbage Collector we're using. In this test I'm running
 Solr 4.5.1 using Jetty from the example directory.

 The CPU on the 8 nodes all stay around 70% use during the test. The nodes
 have 28GB of RAM. Java is using about 6GB and the rest is being used by OS
 cache.

 To perform the test we're running 200 concurrent threads in JMeter. The
 threads hit HAProxy which loadbalances the requests among the nodes. Each
 query is for a random word out of a list of about 10,000 words. Some of
 the
 queries have faceting turned on.

 Because we're heavily loading the system the queries are returning quite
 slowly. For a simple search, the average response time was 300ms. The peak
 response time was 11,000ms. The spikes in latency seem to occur about
 every
 2.5 minutes.

 I haven't spent that much time messing with SolrConfig, so most of the
 settings are the out-of-the-box defaults.

 Where should I start to look?

 Thanks so much!

 -Dave





 On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com
 wrote:

  Yes, more details…
 
  Solr version, which garbage collector, how does heap usage look, cpu,
 etc.
 
  - Mark
 
  On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   How real time is NRT? In particular, what are you commit settings?
  
   And can you characterize periodic slowness? Queries that usually
   take 500ms not tail 10s? Or 1s? How often? How are you measuring?
  
   Details matter, a lot...
  
   Best,
   Erick
  
  
  
  
   On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com
  wrote:
  
   I'm doing some performance testing against an 8-node Solr cloud
 cluster,
   and I'm noticing some periodic slowness.
  
  
   http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
  
   I'm doing random test searches against an Alias Collection made up of
  four
   smaller (monthly) collections. Like this:
  
   MasterCollection
   |- Collection201308
   |- Collection201309
   |- Collection201310
   |- Collection201311
  
   The last collection is constantly updated. New documents are being
  added at
   the rate of about 3 documents per second.
  
   I believe the slowness may be due to NRT, but I'm not sure. How
 should I
   investigate this?
  
   If the slowness is related to NRT, how can I

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Shawn Heisey
On 11/21/2013 6:41 PM, Dave Seltzer wrote:
 In digging a little deeper and looking at the config I see that
 <nrtMode>true</nrtMode> is commented out.  I believe this is the default
 setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I had never seen this setting before.  The default is true.  SolrCloud
requires that it be set to true.  Looks like it's a new parameter in
4.5, added by SOLR-4909.  From what I can tell reading the issue,
turning it off effectively disables soft commits.

https://issues.apache.org/jira/browse/SOLR-4909

You've said that you are adding about 3 documents per second, but you
haven't said anything about how often you are doing commits.  Erick's
question basically boils down to this:  How quickly after indexing do
you expect the changes to be visible on a search, and how often are you
doing commits?

Generally speaking (and ignoring the fact that nrtMode now exists), NRT
is not something you enable, it's something you try to achieve, by using
soft commits quickly and often, and by adjusting the configuration to
make the commits go faster.

If you are trying to keep the interval between indexing and document
visibility down to less than a few seconds (especially if it's less than
one second), then you are trying to achieve NRT.

There's a lot of information on the following wiki page about
performance problems.  This specific link is to the last part of that
page, which deals with slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

 I don't know what Garbage Collector we're using. In this test I'm running
 Solr 4.5.1 using Jetty from the example directory.

If you aren't using any tuning parameters beyond setting the max heap,
then you are using the default parallel collector.  It's a poor choice
for Solr unless your heap is very small.  At 6GB, yours isn't very
small.  It's not particularly huge either, but not small.

 The CPU on the 8 nodes all stay around 70% use during the test. The nodes
 have 28GB of RAM. Java is using about 6GB and the rest is being used by OS
 cache.

How big is your index?  If it's larger than about 30 GB, you probably
need more memory.  If it's much larger than about 40 GB, you definitely
need more memory.

 To perform the test we're running 200 concurrent threads in JMeter. The
 threads hit HAProxy which loadbalances the requests among the nodes. Each
 query is for a random word out of a list of about 10,000 words. Some of the
 queries have faceting turned on.

That's a pretty high query load.  If you want to get anywhere near top
performance out of it, you'll want to have enough memory to fit your
entire index into RAM.  You'll also need to reduce the load introduced
by indexing.  A large part of the load from indexing comes from commits.

 Because we're heavily loading the system the queries are returning quite
 slowly. For a simple search, the average response time was 300ms. The peak
 response time was 11,000ms. The spikes in latency seem to occur about every
 2.5 minutes.

I would bet that you're having one or both of the following issues:

1) Garbage collection issues from one or more of the following:
 a) Heap too small.
 b) Using the default GC instead of CMS with tuning.
2) General performance issues from one or more of the following:
 a) Not enough cache memory for your index size.
 b) Too-frequent commits.
 c) Commits taking a lot of time and resources due to cache warming.

With a high query and index load, any problems become magnified.

 I haven't spent that much time messing with SolrConfig, so most of the
 settings are the out-of-the-box defaults.

The defaults are very good for small to medium indexes and low to medium
query load.  If you have a big index and/or high query load, you'll
generally need to tune.

Thanks,
Shawn