Re: Periodic Slowness on Solr Cloud
Hi Shawn,

Wow! Thank you for your considered reply! I'm going to dig into these issues, but I have a few questions:

Regarding memory: Including duplicate data in shard replicas, the entire index is 350GB. Each server hosts a total of 44GB of data. Each server has 28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java would take the memory it needs and leave the rest to the OS for cache.

Given that I'll never need to serve 200 concurrent connections in production, do you think my servers need more memory? Should I be tinkering with -Xmx and -Xms?

Regarding commits: My end-users want new data to be made available quickly. Thankfully I'm only inserting between 1 and 3 documents per second, so the change-rate isn't crazy. Should I just slow down my commit frequency, and depend on soft-commits? If I do this, will the commits take even longer? Given 1000 documents, is it generally faster to do 10 commits of 100, or 1 commit of 1000?

Thanks so much!

-D

On Fri, Nov 22, 2013 at 2:27 AM, Shawn Heisey s...@elyograg.org wrote:

> On 11/21/2013 6:41 PM, Dave Seltzer wrote:
> > In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring.
>
> I had never seen this setting before. The default is true. SolrCloud requires that it be set to true. Looks like it's a new parameter in 4.5, added by SOLR-4909. From what I can tell reading the issue, turning it off effectively disables soft commits.
>
> https://issues.apache.org/jira/browse/SOLR-4909
>
> You've said that you are adding about 3 documents per second, but you haven't said anything about how often you are doing commits. Erick's question basically boils down to this: How quickly after indexing do you expect the changes to be visible on a search, and how often are you doing commits?
>
> Generally speaking (and ignoring the fact that nrtMode now exists), NRT is not something you enable, it's something you try to achieve, by using soft commits quickly and often, and by adjusting the configuration to make the commits go faster.
>
> If you are trying to keep the interval between indexing and document visibility down to less than a few seconds (especially if it's less than one second), then you are trying to achieve NRT.
>
> There's a lot of information on the following wiki page about performance problems. This specific link is to the last part of that page, which deals with slow commits:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits
>
> > I don't know what Garbage Collector we're using. In this test I'm running Solr 4.5.1 using Jetty from the example directory.
>
> If you aren't using any tuning parameters beyond setting the max heap, then you are using the default parallel collector. It's a poor choice for Solr unless your heap is very small. At 6GB, yours isn't very small. It's not particularly huge either, but not small.
>
> > The CPU on the 8 nodes all stay around 70% use during the test. The nodes have 28GB of RAM. Java is using about 6GB and the rest is being used by OS cache.
>
> How big is your index? If it's larger than about 30 GB, you probably need more memory. If it's much larger than about 40 GB, you definitely need more memory.
>
> > To perform the test we're running 200 concurrent threads in JMeter. The threads hit HAProxy which loadbalances the requests among the nodes. Each query is for a random word out of a list of about 10,000 words. Some of the queries have faceting turned on.
>
> That's a pretty high query load. If you want to get anywhere near top performance out of it, you'll want to have enough memory to fit your entire index into RAM. You'll also need to reduce the load introduced by indexing. A large part of the load from indexing comes from commits.
>
> > Because we're heavily loading the system the queries are returning quite slowly. For a simple search, the average response time was 300ms. The peak response time was 11,000ms. The spikes in latency seem to occur about every 2.5 minutes.
>
> I would bet that you're having one or both of the following issues:
>
> 1) Garbage collection issues from one or more of the following:
>    a) Heap too small.
>    b) Using the default GC instead of CMS with tuning.
> 2) General performance issues from one or more of the following:
>    a) Not enough cache memory for your index size.
>    b) Too-frequent commits.
>    c) Commits taking a lot of time and resources due to cache warming.
>
> With a high query and index load, any problems become magnified.
>
> > I haven't spent that much time messing with SolrConfig, so most of the settings are the out-of-the-box defaults.
>
> The defaults are very good for small to medium indexes and low to medium query load. If you have a big index and/or high query load, you'll generally need to tune.
>
> Thanks,
> Shawn
Re: Periodic Slowness on Solr Cloud
On 11/22/2013 8:13 AM, Dave Seltzer wrote:
> Regarding memory: Including duplicate data in shard replicas the entire index is 350GB. Each server hosts a total of 44GB of data. Each server has 28GB of memory. I haven't been setting -Xmx or -Xms, in the hopes that Java would take the memory it needs and leave the rest to the OS for cache.

That's not how Java works. Java has a min heap and max heap setting. If you (or the auto-detected settings) tell it that the max heap is 4GB, it will only ever use slightly more than 4GB of RAM. If the app needs more than that, this will lead to terrible performance and/or out of memory errors.

You can see how much the max heap is in the Solr admin UI dashboard - it'll be the right-most number on the JVM-Memory graph. On my 64-bit linux development machine with 16GB of RAM, it looks like Java defaults to a 4GB max heap. I have the heap size manually set to 7GB for Solr on that machine. The 6GB heap you have mentioned might not be enough, or it might be more than you need. It all depends on the kind of queries you are doing and exactly how Solr is configured.

If it were me, I'd want a memory size between 48 and 64GB for a total index size of 44GB. Whether you really need that much is very dependent on your exact requirements, index makeup, and queries. To support the high query load you're sending, it probably is a requirement. More memory is likely to help performance, but I can't guarantee it without looking a lot deeper into your setup, and that's difficult to do via email.

One thing I can tell you about checking performance - see how much of your 70% CPU usage is going to I/O wait. If it's more than a few percent, more memory might help. First try increasing the max heap by 1 or 2GB.

> Given that I'll never need to serve 200 concurrent connections in production, do you think my servers need more memory? Should I be tinkering with -Xmx and -Xms?

If you'll never need to serve that many, test with a lower number. Make it higher than you'll need, but not a lot higher. The test with 200 connections isn't a bad idea -- you do want to stress test things way beyond your actual requirements, but you'll also want to see how it does with a more realistic load.

Those are the min/max heap settings I just mentioned. IMHO you should set at least the max heap. If you want to handle a high load, it's a good idea to set the min heap to the same value as the max heap, so that it doesn't need to worry about hitting limits in order to allocate additional memory. It'll eventually allocate the max heap anyway.

> Regarding commits: My end-users want new data to be made available quickly. Thankfully I'm only inserting between 1 and 3 documents per second so the change-rate isn't crazy. Should I just slow down my commit frequency, and depend on soft-commits? If I do this, will the commits take even longer? Given 1000 documents, is it generally faster to do 10 commits of 100, or 1 commit of 1000?

Fewer commits is always better. The amount of time they take isn't strongly affected by the number of new documents, unless there are a LOT of them.

Figure out the timeframe that's the maximum amount of time (in milliseconds) that you think people are willing to wait for new data to become visible. Use that as your autoSoftCommit interval, or as the commitWithin parameter on your indexing requests. Set your autoCommit interval to around five minutes, as described on the wiki page I linked. If you are using auto settings and/or commitWithin, then you will never need to send an explicit commit command.

Reducing commit frequency is one of the first things you'll want to try. Frequent commits use a *lot* of I/O and CPU resources. Although there are exceptions, most installs rarely NEED commits to happen more often than about once a minute, and longer intervals are often perfectly acceptable. Even in situations where a higher frequency is required, 10-15 seconds is often good enough.

Getting sub-second commit times is *possible*, but usually requires significant hardware investment or changing the config in a way that is detrimental to query performance.

Thanks,
Shawn
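
The memory figures discussed in this message can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: it assumes that everything outside the Java heap is available to the OS disk cache (which ignores the OS itself and any other processes), and the function name is made up for this example. The 64GB case is a hypothetical upgrade, not a measured configuration.

```python
def os_cache_coverage(ram_gb: float, heap_gb: float, index_gb: float) -> float:
    """Fraction of the on-disk index that could fit in the OS disk cache.

    Simplifying assumption: all RAM not given to the Java heap is usable
    as disk cache.
    """
    cache_gb = max(ram_gb - heap_gb, 0.0)
    return min(cache_gb / index_gb, 1.0)


# Figures quoted in the thread: 28GB RAM, about 6GB heap, 44GB of index per server.
current = os_cache_coverage(ram_gb=28, heap_gb=6, index_gb=44)

# Hypothetical upgrade to 64GB of RAM with the same heap size.
upgraded = os_cache_coverage(ram_gb=64, heap_gb=6, index_gb=44)

print(f"current: {current:.0%} of the index cacheable")
print(f"upgraded: {upgraded:.0%} of the index cacheable")
```

Under these assumptions only about half of each server's index can sit in the disk cache today, which is consistent with the suggestion to look at I/O wait before deciding on more memory.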
Re: Periodic Slowness on Solr Cloud
On 11/22/2013 10:01 AM, Shawn Heisey wrote:
> You can see how much the max heap is in the Solr admin UI dashboard - it'll be the right-most number on the JVM-Memory graph. On my 64-bit linux development machine with 16GB of RAM, it looks like Java defaults to a 4GB max heap. I have the heap size manually set to 7GB for Solr on that machine. The 6GB heap you have mentioned might not be enough, or it might be more than you need. It all depends on the kind of queries you are doing and exactly how Solr is configured.

Followup: I would also recommend starting with my garbage collection settings. This wiki page is linked on the wiki page I've already given you.

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

You might need a script to start Solr. There is also a redhat-specific init script on that wiki page. I haven't included any instructions for installing it. Someone who already knows about init scripts won't have much trouble getting it working on a redhat-derived OS, and someone who doesn't will need extensive instructions or an install script, neither of which has been written.

Thanks,
Shawn
Re: Periodic Slowness on Solr Cloud
Thanks so much Shawn,

I think you (and others) are completely right about this being heap and GC related. I just did a test while not indexing data and the same periodic slowness was observable.

On to GC/Memory Tuning!

Many Thanks!

-Dave

On Fri, Nov 22, 2013 at 12:09 PM, Shawn Heisey s...@elyograg.org wrote:

> Followup: I would also recommend starting with my garbage collection settings. This wiki page is linked on the wiki page I've already given you.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> You might need a script to start Solr. There is also a redhat-specific init script on that wiki page. I haven't included any instructions for installing it. Someone who already knows about init scripts won't have much trouble getting it working on a redhat-derived OS, and someone who doesn't will need extensive instructions or an install script, neither of which has been written.
>
> Thanks,
> Shawn
Re: Periodic Slowness on Solr Cloud
You mentioned earlier that you are not setting -Xms/-Xmx; the values actually in use would then depend on the Java version, whether you're running 32- or 64-bit Java, whether Java thinks your machines are servers, and whether you have specified the -server flag – and possibly a few other things.

What do you get if you run the command below?

    java -XX:+PrintFlagsFinal -version

(Ref: http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5 for details; I stole the incantation above from that location, but there are more complete examples of how it could be used there.)

Note: you need to adjust the command line so that it uses the same java version as the one you're using, and also add whatever JRE-modifying parameters that you use when starting Solr.

On 22 Nov 2013, at 18:12 , Dave Seltzer dselt...@tveyes.com wrote:

> Thanks so much Shawn,
>
> I think you (and others) are completely right about this being heap and GC related. I just did a test while not indexing data and the same periodic slowness was observable.
>
> On to GC/Memory Tuning!
Re: Periodic Slowness on Solr Cloud
Wow. That is one noisy command! Full output is below.

The grepped output looks like:

    [solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version | grep -i -E 'heapsize|permsize|version'
    uintx AdaptivePermSizeWeight = 20 {product}
    uintx ErgoHeapSizeLimit = 0 {product}
    uintx HeapSizePerGCThread = 87241520 {product}
    uintx InitialHeapSize := 447247104 {product}
    uintx LargePageHeapSizeThreshold = 134217728 {product}
    uintx MaxHeapSize := 7157579776 {product}
    uintx MaxPermSize = 85983232 {pd product}
    uintx PermSize = 21757952 {pd product}
    java version 1.7.0_45
    Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
    Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

It looks like Java is correctly determining that this is in fact a server. It seems to start with an Xmx of 25% of the RAM or around 7GB.

So, in addition to tweaking GC I'm going to increase Xmx. Any advice as to how much memory should go to the heap and how much should go to the OS disk cache? Should I split it 50/50?

Again. Many Thanks.

-Dave

Full output from printflags --

    [solr@searchtest07 ~]$ java -XX:+PrintFlagsFinal -version
    [Global flags]
    uintx AdaptivePermSizeWeight = 20 {product}
    uintx AdaptiveSizeDecrementScaleFactor = 4 {product}
    uintx AdaptiveSizeMajorGCDecayTimeScale = 10 {product}
    uintx AdaptiveSizePausePolicy = 0 {product}
    uintx AdaptiveSizePolicyCollectionCostMargin = 50 {product}
    uintx AdaptiveSizePolicyInitializingSteps = 20 {product}
    uintx AdaptiveSizePolicyOutputInterval = 0 {product}
    uintx AdaptiveSizePolicyWeight = 10 {product}
    uintx AdaptiveSizeThroughPutPolicy = 0 {product}
    uintx AdaptiveTimeWeight = 25 {product}
    bool AdjustConcurrency = false {product}
    bool AggressiveOpts = false {product}
    intx AliasLevel = 3 {C2 product}
    bool AlignVector = false {C2 product}
    intx AllocateInstancePrefetchLines = 1 {product}
    intx AllocatePrefetchDistance = 192 {product}
    intx AllocatePrefetchInstr = 0 {product}
    intx AllocatePrefetchLines = 4 {product}
    intx AllocatePrefetchStepSize = 64 {product}
    intx AllocatePrefetchStyle = 1 {product}
    bool AllowJNIEnvProxy = false {product}
    bool AllowNonVirtualCalls = false {product}
    bool AllowParallelDefineClass = false {product}
    bool AllowUserSignalHandlers = false {product}
    bool AlwaysActAsServerClassMachine = false {product}
    bool AlwaysCompileLoopMethods = false {product}
    bool AlwaysLockClassLoader = false {product}
    bool AlwaysPreTouch = false {product}
    bool AlwaysRestoreFPU = false {product}
    bool AlwaysTenure = false {product}
    bool AssertOnSuspendWaitFailure = false {product}
    intx Atomics = 0 {product}
    intx AutoBoxCacheMax = 128 {C2 product}
    uintx AutoGCSelectPauseMillis = 5000 {product}
    intx BCEATraceLevel = 0 {product}
    intx BackEdgeThreshold = 10 {pd product}
    bool BackgroundCompilation = true {pd product}
    uintx BaseFootPrintEstimate = 268435456 {product}
    intx BiasedLockingBulkRebiasThreshold = 20 {product}
    intx BiasedLockingBulkRevokeThreshold = 40 {product}
    intx BiasedLockingDecayTime = 25000 {product}
    intx BiasedLockingStartupDelay = 4000 {product}
    bool BindGCTaskThreadsToCPUs = false {product}
    bool BlockLayoutByFrequency = true {C2 product}
    intx BlockLayoutMinDiamondPercentage = 20 {C2 product}
    bool BlockLayoutRotateLoops = true {C2 product}
    bool BranchOnRegister = false {C2 product}
    bool BytecodeVerificationLocal = false {product}
    bool BytecodeVerificationRemote = true {product}
    bool C1OptimizeVirtualCallProfiling
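
The heap values in the grepped output earlier in this message are raw byte counts, which are hard to read at a glance. As a rough sketch (the regular expression and helper names are assumptions made for this example, not part of any Java tooling), a few of those lines can be parsed and converted like this:

```python
import re

# Sample lines copied from the grepped PrintFlagsFinal output in this message.
SAMPLE = """\
uintx InitialHeapSize := 447247104 {product}
uintx MaxHeapSize := 7157579776 {product}
uintx MaxPermSize = 85983232 {pd product}
"""

# Matches "<type> <name> = <number> {" as well as the ":=" form.
FLAG_RE = re.compile(r"^\s*\w+\s+(\w+)\s*:?=\s*(\d+)\s*\{", re.MULTILINE)


def parse_flags(text):
    """Return a dict mapping numeric flag names to their integer values."""
    return {name: int(value) for name, value in FLAG_RE.findall(text)}


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


flags = parse_flags(SAMPLE)
print(f"MaxHeapSize     = {to_gib(flags['MaxHeapSize']):.2f} GiB")
print(f"InitialHeapSize = {flags['InitialHeapSize'] / 2**20:.1f} MiB")
```

The converted MaxHeapSize comes out a little under 7 GiB, in line with the "25% of the RAM" observation for a 28GB machine.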
Re: Periodic Slowness on Solr Cloud
So I made a few changes, but I still seem to be dealing with this pesky periodic slowness.

Changes:

1) I'm now only forcing commits every 5 minutes. This was done by specifying commitWithin=30 when doing document adds.
2) I'm specifying an -Xmx12g to force the java heap to take more memory
3) I'm using the GC configuration parameters from the wiki ( http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)

The new startup args are:

    -DzkRun
    -Xmx12g
    -XX:+AggressiveOpts
    -XX:+UseLargePages
    -XX:+ParallelRefProcEnabled
    -XX:+CMSParallelRemarkEnabled
    -XX:CMSMaxAbortablePrecleanTime=6000
    -XX:CMSTriggerPermRatio=80
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:CMSFullGCsBeforeCompaction=1
    -XX:PretenureSizeThreshold=64m
    -XX:+CMSScavengeBeforeRemark
    -XX:+UseConcMarkSweepGC
    -XX:MaxTenuringThreshold=8
    -XX:TargetSurvivorRatio=90
    -XX:SurvivorRatio=4
    -XX:NewRatio=3

I'm still seeing the same periodic slowness about every 3.5 minutes. This slowness occurs whether or not I'm indexing content, so it appears to be unrelated to my commit schedule.

See the most recent graph here:
http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png

To keep things consistent I'm still testing with 200 threads. When I test with 10 threads everything is much faster, but I still get the same periodic slowness.

One thing I've noticed is that while Java is aware of the 12 gig heap, Solr doesn't seem to be using much of it. The system panel of the Web UI shows 11.5GB of JVM-Memory available, but only 2.11GB in use.

Screenshot:
http://farm4.staticflickr.com/3822/10999509515_72a9013ec7_o.jpg

So I've told Java to use more memory. Do I need to tell Solr to use more as well?

Thanks everyone!

-Dave
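
Earlier in the thread the commit window is described as a value in milliseconds, so a five-minute window corresponds to a much larger number than 30. The following is a minimal sketch of that unit conversion and of attaching the value as a `commitWithin` request parameter on an update URL. The host and port are placeholders and the helper names are invented for this example; only the collection name is taken from the thread.

```python
from urllib.parse import urlencode


def commit_within_ms(minutes: float = 0, seconds: float = 0) -> int:
    """Convert a human-friendly interval to the milliseconds commitWithin expects."""
    return int((minutes * 60 + seconds) * 1000)


def update_url(base_url: str, collection: str, commit_within: int) -> str:
    """Build an update URL carrying commitWithin as a request parameter.

    base_url is a placeholder; adjust it to the actual Solr node.
    """
    query = urlencode({"commitWithin": commit_within})
    return f"{base_url}/{collection}/update?{query}"


five_minutes = commit_within_ms(minutes=5)
print(five_minutes)
print(update_url("http://localhost:8983/solr", "Collection201311", five_minutes))
```

Read literally, commitWithin=30 would request a commit within 30 milliseconds of each add, so the value actually reaching Solr is worth double-checking.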
Re: Periodic Slowness on Solr Cloud
On 11/22/2013 2:17 PM, Dave Seltzer wrote:
> So I made a few changes, but I still seem to be dealing with this pesky periodic slowness.
>
> Changes:
>
> 1) I'm now only forcing commits every 5 minutes. This was done by specifying commitWithin=30 when doing document adds.
> 2) I'm specifying an -Xmx12g to force the java heap to take more memory
> 3) I'm using the GC configuration parameters from the wiki ( http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning)
>
> snip
>
> I'm still seeing the same periodic slowness about every 3.5 minutes. This slowness occurs whether or not I'm indexing content, so it appears to be unrelated to my commit schedule.

It sounds like your heap isn't too small. Try reducing it to 5GB, then to 4GB after some testing, so more memory gets used by the OS disk cache. I would also recommend trying perhaps 100 threads on your test app rather than 200. Work your way up until you find the point where it just can't handle the load.

> See the most recent graph here:
> http://farm4.staticflickr.com/3819/10999523464_328814e358_o.png
>
> To keep things consistent I'm still testing with 200 threads. When I test with 10 threads everything is much faster, but I still get the same periodic slowness.
>
> One thing I've noticed is that while Java is aware of the 12 gig heap, Solr doesn't seem to be using much of it. The system panel of the Web UI shows 11.5GB of JVM-Memory available, but only 2.11GB in use.

The memory usage in the admin UI is an instantaneous snapshot. If you use jvisualvm or jconsole (included in the Java JDK) to get a graph of memory usage, you'll see it change over time. As Java allocates objects, memory usage increases until it's using all the heap. Some amount of that allocation will be objects that are no longer in use -- garbage. Then garbage collection will kick in and memory usage will drop down to however much is actually in use in the particular memory pool that's being collected. This is what people often refer to as the sawtooth pattern.

Here's a couple of screenshots. The jconsole program is running on Windows 7, Solr is running on Linux. One screenshot is the graph, the other is the VM summary where you can see that Solr has been running for nearly 8 days. This is one of my production Solr servers, so some of the parameters are slightly different than what's on my wiki:

https://dl.dropboxusercontent.com/u/97770508/solr-jconsole.png
https://dl.dropboxusercontent.com/u/97770508/solr-jconsole-summary.png

If you do not have a GUI installed on the actual Solr machine, you'll need to use remote JMX to connect jconsole. In the init script on my wiki page, you can see JMX options. With those, you can tell a remote jconsole to use server.example.com:8686 instead of a local PID. You can use any port you want that's not already in use instead of 8686. Running jconsole with -interval=1 will make the graph update once a second; I think it's every 5 seconds by default.

You can also hit reload on the dashboard page to see how memory usage changes over time, but it's not as useful as a graph. Memory usage will not change by much if you are not actively querying or indexing.

Thanks,
Shawn
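
The sawtooth pattern described in this message can also be spotted in a plain list of heap-usage samples, which may help when lining up collections against the latency spikes. The sketch below is only an illustration: the sample values are invented, the 20% threshold is an arbitrary assumption, and the function is not part of jconsole or Solr.

```python
def find_collections(samples, min_drop_fraction=0.2):
    """Return the indexes where heap usage fell sharply versus the previous sample.

    A sharp fall is taken here to mean a drop of at least min_drop_fraction
    of the previous value, which is an arbitrary threshold for illustration.
    """
    drops = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev > 0 and (prev - cur) / prev >= min_drop_fraction:
            drops.append(i)
    return drops


# Hypothetical heap-used readings in GB, taken at a fixed sampling interval.
heap_used_gb = [2.1, 3.4, 4.9, 6.3, 7.8, 2.3, 3.6, 5.0, 6.4, 7.9, 2.2]

drops = find_collections(heap_used_gb)
intervals = [b - a for a, b in zip(drops, drops[1:])]
print("collections at samples:", drops)
print("samples between collections:", intervals)
```

If the spacing between drops, multiplied by the sampling interval, matches the spacing of the slow periods, that points toward garbage collection as the cause.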
Periodic Slowness on Solr Cloud
I'm doing some performance testing against an 8-node Solr cloud cluster, and I'm noticing some periodic slowness.

http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png

I'm doing random test searches against an Alias Collection made up of four smaller (monthly) collections. Like this:

MasterCollection
|- Collection201308
|- Collection201309
|- Collection201310
|- Collection201311

The last collection is constantly updated. New documents are being added at the rate of about 3 documents per second.

I believe the slowness may be due to NRT, but I'm not sure. How should I investigate this?

If the slowness is related to NRT, how can I alleviate the issue without disabling NRT?

Thanks Much!

-Dave
Re: Periodic Slowness on Solr Cloud
How real time is NRT? In particular, what are your commit settings?

And can you characterize periodic slowness? Queries that usually take 500ms now take 10s? Or 1s? How often? How are you measuring?

Details matter, a lot...

Best,
Erick

On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com wrote:

> I'm doing some performance testing against an 8-node Solr cloud cluster, and I'm noticing some periodic slowness.
>
> http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
>
> I'm doing random test searches against an Alias Collection made up of four smaller (monthly) collections. Like this:
>
> MasterCollection
> |- Collection201308
> |- Collection201309
> |- Collection201310
> |- Collection201311
>
> The last collection is constantly updated. New documents are being added at the rate of about 3 documents per second.
>
> I believe the slowness may be due to NRT, but I'm not sure. How should I investigate this?
>
> If the slowness is related to NRT, how can I alleviate the issue without disabling NRT?
>
> Thanks Much!
>
> -Dave
Re: Periodic Slowness on Solr Cloud
Yes, more details… Solr version, which garbage collector, how does heap usage look, cpu, etc.

- Mark

On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com wrote:

> How real time is NRT? In particular, what are your commit settings?
>
> And can you characterize periodic slowness? Queries that usually take 500ms now take 10s? Or 1s? How often? How are you measuring?
>
> Details matter, a lot...
>
> Best,
> Erick
Re: Periodic Slowness on Solr Cloud
Lots of questions. Okay.

In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I don't know what Garbage Collector we're using. In this test I'm running Solr 4.5.1 using Jetty from the example directory.

The CPU on the 8 nodes all stay around 70% use during the test. The nodes have 28GB of RAM. Java is using about 6GB and the rest is being used by OS cache.

To perform the test we're running 200 concurrent threads in JMeter. The threads hit HAProxy which loadbalances the requests among the nodes. Each query is for a random word out of a list of about 10,000 words. Some of the queries have faceting turned on.

Because we're heavily loading the system the queries are returning quite slowly. For a simple search, the average response time was 300ms. The peak response time was 11,000ms. The spikes in latency seem to occur about every 2.5 minutes.

I haven't spent that much time messing with SolrConfig, so most of the settings are the out-of-the-box defaults.

Where should I start to look?

Thanks so much!

-Dave

On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com wrote:

> Yes, more details… Solr version, which garbage collector, how does heap usage look, cpu, etc.
>
> - Mark
RE: Periodic Slowness on Solr Cloud
Dave, you might want to connect JVisualVM and see if there's any pattern with latency and garbage collection. That's a frequent culprit for periodic hits in latency.

More info here:
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html

There are a couple of GC implementations in Java that can be tuned as needed.

With JVisualVM you can also add the MBeans plugin to get a ton of performance stats out of Solr that might help debug latency issues.

Doug

Sent from my Windows Phone

From: Dave Seltzer
Sent: 11/21/2013 8:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Periodic Slowness on Solr Cloud

> Lots of questions. Okay.
>
> In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring.
>
> I don't know what Garbage Collector we're using. In this test I'm running Solr 4.5.1 using Jetty from the example directory.
>
> The CPU on the 8 nodes all stay around 70% use during the test. The nodes have 28GB of RAM. Java is using about 6GB and the rest is being used by OS cache.
>
> To perform the test we're running 200 concurrent threads in JMeter. The threads hit HAProxy which loadbalances the requests among the nodes. Each query is for a random word out of a list of about 10,000 words. Some of the queries have faceting turned on.
>
> Because we're heavily loading the system the queries are returning quite slowly. For a simple search, the average response time was 300ms. The peak response time was 11,000ms. The spikes in latency seem to occur about every 2.5 minutes.
>
> I haven't spent that much time messing with SolrConfig, so most of the settings are the out-of-the-box defaults.
>
> Where should I start to look?
>
> Thanks so much!
>
> -Dave
Re: Periodic Slowness on Solr Cloud
Additional info on GC selection:

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#available_collectors

"If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then select the concurrent collector with -XX:+UseConcMarkSweepGC. If only one or two processors are available, consider using incremental mode, described below."

I'm not entirely certain of the implications of GC tuning for SolrCloud. I imagine distributed searching is going to be as slow as the slowest core being queried.

I'd also be curious as to the root cause of any excess GC churn. It sounds like you're doing a ton of random queries. This probably creates a lot of evictions from your caches. There's nothing really worth caching, so the caches fill up and empty frequently, causing a lot of heap activity. If you expect to have high load and a ton of turnover in queries, then tuning down cache size might help minimize GC churn.

SolrMeter is another great tool for your perf testing that can help get at some of these caching issues. It gives you some higher-level stats about cache eviction, etc.

https://code.google.com/p/solrmeter/

-Doug

On Thu, Nov 21, 2013 at 10:24 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

Dave, you might want to connect JVisualVM and see if there's any pattern with latency and garbage collection. That's a frequent culprit for periodic hits in latency.

More info here: http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html

There are a couple of GC implementations in Java that can be tuned as needed.

With JVisualVM you can also add the MBeans plugin to get a ton of performance stats out of Solr that might help debug latency issues.

Doug

Sent from my Windows Phone

From: Dave Seltzer
Sent: 11/21/2013 8:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Periodic Slowness on Solr Cloud

Lots of questions. Okay.

In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I don't know what Garbage Collector we're using. In this test I'm running Solr 4.5.1 using Jetty from the example directory.

The CPU on the 8 nodes all stay around 70% use during the test. The nodes have 28GB of RAM. Java is using about 6GB and the rest is being used by OS cache.

To perform the test we're running 200 concurrent threads in JMeter. The threads hit HAProxy which loadbalances the requests among the nodes. Each query is for a random word out of a list of about 10,000 words. Some of the queries have faceting turned on.

Because we're heavily loading the system the queries are returning quite slowly. For a simple search, the average response time was 300ms. The peak response time was 11,000ms. The spikes in latency seem to occur about every 2.5 minutes.

I haven't spent that much time messing with SolrConfig, so most of the settings are the out-of-the-box defaults.

Where should I start to look?

Thanks so much!

-Dave

On Thu, Nov 21, 2013 at 6:53 PM, Mark Miller markrmil...@gmail.com wrote:

Yes, more details…

Solr version, which garbage collector, how does heap usage look, cpu, etc.

- Mark

On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com wrote:

How real time is NRT? In particular, what are your commit settings? And can you characterize periodic slowness? Queries that usually take 500ms now take 10s? Or 1s? How often? How are you measuring? Details matter, a lot...

Best,
Erick

On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer dselt...@tveyes.com wrote:

I'm doing some performance testing against an 8-node Solr cloud cluster, and I'm noticing some periodic slowness.

http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png

I'm doing random test searches against an Alias Collection made up of four smaller (monthly) collections. Like this:

MasterCollection
|- Collection201308
|- Collection201309
|- Collection201310
|- Collection201311

The last collection is constantly updated. New documents are being added at the rate of about 3 documents per second.

I believe the slowness may be due to NRT, but I'm not sure. How should I investigate this?

If the slowness is related to NRT, how can I alleviate the issue without disabling NRT?

Thanks Much!

-Dave

--
Doug Turnbull
Search Big Data Architect
OpenSource Connections
http://o19s.com
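To put a rough number on the cache-churn point above, here is a small simulation of an LRU cache fed uniformly random terms from a 10,000-word list, as in the test described in this thread. The cache size of 512 entries is an assumption for illustration only (check the sizes in your own solrconfig.xml), and a real Solr cache key also includes filters, sort, and so on, so treat this as a sketch of the effect rather than a model of Solr itself.

```python
import random
from collections import OrderedDict


def simulate_lru_hit_ratio(vocab_size, cache_size, num_queries, seed=0):
    """Return the hit ratio of an LRU cache fed uniformly random query terms."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(num_queries):
        term = rng.randrange(vocab_size)
        if term in cache:
            hits += 1
            cache.move_to_end(term)  # mark as most recently used
        else:
            cache[term] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / num_queries


# cache_size=512 is a hypothetical value, not taken from the thread.
hit_ratio = simulate_lru_hit_ratio(vocab_size=10_000, cache_size=512, num_queries=200_000)
print(f"hit ratio: {hit_ratio:.3f}")
```

With uniformly random terms the hit ratio settles near cache_size / vocab_size, so only about one query in twenty is served from cache and nearly every other query inserts a new entry and evicts an old one. That constant turnover is the heap activity being described.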
Re: Periodic Slowness on Solr Cloud
Thanks Doug!

One thing I'm not clear on is how I can tell whether this is in fact related to garbage collection. If you're right, and the cluster is only as slow as its slowest link, how do I determine that this is GC? Do I have to run the profiler on all eight nodes? Or is it a matter of turning on the correct logging and then watching and waiting?

Thanks!

-D
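One way to approach the logging route is sketched below. It assumes each node's JVM is started with HotSpot GC logging enabled (for example -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.log) and that the log contains date-stamped "Total time for which application threads were stopped" lines. The exact line format varies between JVM versions, so the regular expression, the sample log lines, and the spike timestamps here are all illustrative assumptions, not output from the cluster in this thread.

```python
import re
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Assumed line shape; adjust the pattern to whatever your JVM actually writes.
PAUSE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): .*?"
    r"Total time for which application threads were stopped: "
    r"(?P<secs>\d+\.\d+) seconds"
)


def parse_long_pauses(lines, min_pause_s=1.0):
    """Return (timestamp, seconds) for every logged pause of at least min_pause_s."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.match(line)
        if match and float(match.group("secs")) >= min_pause_s:
            when = datetime.strptime(match.group("ts"), TS_FORMAT)
            pauses.append((when, float(match.group("secs"))))
    return pauses


def count_spikes_near_pauses(spike_times, pauses, window_s=15):
    """Count latency spikes that fall within window_s seconds of a long pause."""
    window = timedelta(seconds=window_s)
    return sum(
        1 for spike in spike_times
        if any(abs(spike - when) <= window for when, _ in pauses)
    )


# Hypothetical sample data for illustration only.
SAMPLE_GC_LOG = [
    "2013-11-22T02:30:00.000-0500: 150.001: Total time for which application threads were stopped: 9.8000000 seconds",
    "2013-11-22T02:31:00.000-0500: 210.002: Total time for which application threads were stopped: 0.0003000 seconds",
    "2013-11-22T02:32:30.000-0500: 300.003: Total time for which application threads were stopped: 10.5000000 seconds",
]
SAMPLE_SPIKES = [
    datetime.strptime(ts, TS_FORMAT)
    for ts in (
        "2013-11-22T02:30:02.000-0500",
        "2013-11-22T02:31:40.000-0500",
        "2013-11-22T02:32:31.000-0500",
    )
]

pauses = parse_long_pauses(SAMPLE_GC_LOG)
explained = count_spikes_near_pauses(SAMPLE_SPIKES, pauses)
print(f"{explained} of {len(SAMPLE_SPIKES)} spikes fall near a long GC pause")
```

If most of the latency spikes recorded by JMeter line up with multi-second pauses on at least one node, GC is a likely culprit. If they do not, commits and cache warming become the more probable suspects. The log is cheap to leave on, so it can run on all eight nodes without attaching a profiler to each.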
Re: Periodic Slowness on Solr Cloud
On 11/21/2013 6:41 PM, Dave Seltzer wrote:
> In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring.

I had never seen this setting before. The default is true. SolrCloud requires that it be set to true. Looks like it's a new parameter in 4.5, added by SOLR-4909. From what I can tell reading the issue, turning it off effectively disables soft commits.

https://issues.apache.org/jira/browse/SOLR-4909

You've said that you are adding about 3 documents per second, but you haven't said anything about how often you are doing commits. Erick's question basically boils down to this: How quickly after indexing do you expect the changes to be visible on a search, and how often are you doing commits?

Generally speaking (and ignoring the fact that nrtMode now exists), NRT is not something you enable, it's something you try to achieve, by using soft commits quickly and often, and by adjusting the configuration to make the commits go faster.

If you are trying to keep the interval between indexing and document visibility down to less than a few seconds (especially if it's less than one second), then you are trying to achieve NRT.

There's a lot of information on the following wiki page about performance problems. This specific link is to the last part of that page, which deals with slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

> I don't know what Garbage Collector we're using. In this test I'm running Solr 4.5.1 using Jetty from the example directory.

If you aren't using any tuning parameters beyond setting the max heap, then you are using the default parallel collector. It's a poor choice for Solr unless your heap is very small. At 6GB, yours isn't very small. It's not particularly huge either, but not small.

> The CPU on the 8 nodes all stay around 70% use during the test. The nodes have 28GB of RAM. Java is using about 6GB and the rest is being used by OS cache.

How big is your index? If it's larger than about 30 GB, you probably need more memory. If it's much larger than about 40 GB, you definitely need more memory.

> To perform the test we're running 200 concurrent threads in JMeter. The threads hit HAProxy which loadbalances the requests among the nodes. Each query is for a random word out of a list of about 10,000 words. Some of the queries have faceting turned on.

That's a pretty high query load. If you want to get anywhere near top performance out of it, you'll want to have enough memory to fit your entire index into RAM. You'll also need to reduce the load introduced by indexing. A large part of the load from indexing comes from commits.

> Because we're heavily loading the system the queries are returning quite slowly. For a simple search, the average response time was 300ms. The peak response time was 11,000ms. The spikes in latency seem to occur about every 2.5 minutes.

I would bet that you're having one or both of the following issues:

1) Garbage collection issues from one or more of the following:
   a) Heap too small.
   b) Using the default GC instead of CMS with tuning.
2) General performance issues from one or more of the following:
   a) Not enough cache memory for your index size.
   b) Too-frequent commits.
   c) Commits taking a lot of time and resources due to cache warming.

With a high query and index load, any problems become magnified.

> I haven't spent that much time messing with SolrConfig, so most of the settings are the out-of-the-box defaults.

The defaults are very good for small to medium indexes and low to medium query load. If you have a big index and/or high query load, you'll generally need to tune.

Thanks,
Shawn
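The memory rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below uses the 28GB of RAM and roughly 6GB of Java heap mentioned in this message, plus the 44GB of index per server that Dave reports elsewhere in this thread. The function name and the other_gb parameter are hypothetical, and the calculation ignores everything else that competes for memory on the box, so read the result as an optimistic upper bound.

```python
def page_cache_coverage(ram_gb, heap_gb, index_gb, other_gb=0.0):
    """Fraction of the on-disk index that could fit in the OS page cache."""
    cache_gb = max(ram_gb - heap_gb - other_gb, 0.0)  # memory left over for the OS cache
    return min(cache_gb / index_gb, 1.0)


coverage = page_cache_coverage(ram_gb=28, heap_gb=6, index_gb=44)
print(f"about {coverage:.0%} of the index can be cached")
```

With those figures only about half of each server's index can sit in the OS cache at once. Under a load of random queries the other half has to come from disk, which is consistent with the advice that an index much larger than about 40 GB needs more memory on these machines.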