Thanks, Shawn.

As mentioned previously, we have been hard committing every 60 seconds for years and had no issues until enabling CDCR. We had never seen large tlog sizes before, and even manually issuing a hard commit to the collection does not reduce the size of the tlogs. I believe this is because, when using the CDCRUpdateLog, the tlogs are not purged until the docs have been replicated over. In any case, since we manually purged the tlogs they now seem to be staying at an acceptable size, so I don't think that is the cause. The documents are not abnormally large: maybe ~20 string/numeric fields with simple whitespace tokenization.
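For reference, the updateLog section we switched to is roughly the stock
CDCR configuration (the dir value is the default placeholder):

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- CDCR requires the CdcrUpdateLog; tlog entries are retained
           until the target cluster has received the updates -->
      <updateLog class="solr.CdcrUpdateLog">
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>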
To answer your questions:

- Solr version: 7.2.1
- OS vendor and version: CentOS 6
- Total document count on the server (counting all index cores): 13 collections totaling ~60 million docs
- Total index size on the server (counting all cores): ~60GB
- Total of all Solr heaps on the server: 16GB (we had to increase it for CDCR because it was using a lot more heap)
- Software other than Solr on the server: No
- Total memory installed: 64GB

All of this has been consistent for multiple years across multiple Solr versions, and we only started seeing this issue once we began using the CDCRUpdateLog and CDCR, hence why that is the only real thing we can point to. And again, the issue affects only 1 of the 13 collections on the server, so if it were hardware/heap/GC related I would expect to see it for every collection, not just one, as they all share the same resources.

I will take a look at the GC logs, but I don't think that is the cause. The consistently slow performance doesn't really point to GC issues, and we have profiling set up in New Relic, which does not show any long or frequent GC pauses.

We are going to try rebuilding the collection from scratch again this weekend, as that has solved the issue in some lower environments, although not consistently. At this point it's all we can think of to do.

Thanks,

Chris
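P.S. For the archives: the commit settings you describe below map to
solrconfig.xml roughly like this. The hard commit interval matches our
60-second setting; the soft commit value is just an illustrative
placeholder, not our actual configuration:

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60 seconds -->
      <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>15000</maxTime>            <!-- illustrative; soft commits control visibility -->
    </autoSoftCommit>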
On Thu, Jun 14, 2018 at 6:23 PM, Shawn Heisey <[email protected]> wrote:

> On 6/12/2018 12:06 PM, Chris Troullis wrote:
> > The issue we are seeing is with 1 collection in particular, after we
> > set up CDCR, we are getting extremely slow response times when
> > retrieving documents. Debugging the query shows QTime is almost
> > nothing, but the overall responseTime is like 5x what it should be.
> > The problem is exacerbated by larger result sizes. IE retrieving 25
> > results is almost normal, but 200 results is way slower than normal.
> > I can run the exact same query multiple times in a row (so everything
> > should be cached), and I still see response times way higher than
> > another environment that is not using CDCR. It doesn't seem to matter
> > if CDCR is enabled or disabled, just that we are using the
> > CDCRUpdateLog. The problem started happening even before we enabled
> > CDCR.
> >
> > In a lower environment we noticed that the transaction logs were huge
> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
> > restarting, and that seemed to fix the performance issue. We tried the
> > same thing in production the other day but it had no effect, so now I
> > don't know if it was a coincidence or not.
>
> There is one other cause besides CDCR buffering that I know of for huge
> transaction logs, and it has nothing to do with CDCR: A lack of hard
> commits. It is strongly recommended to have autoCommit set to a
> reasonably short interval (about a minute in my opinion, but 15 seconds
> is VERY common). Most of the time openSearcher should be set to false
> in the autoCommit config, and other mechanisms (which might include
> autoSoftCommit) should be used for change visibility. The example
> autoCommit settings might seem superfluous because they don't affect
> what's searchable, but it is actually a very important configuration to
> keep.
>
> Are the docs in this collection really big, by chance?
>
> As I went through previous threads you've started on the mailing list,
> I have noticed that none of your messages provided some details that
> would be useful for looking into performance problems:
>
> * What OS vendor and version Solr is running on.
> * Total document count on the server (counting all index cores).
> * Total index size on the server (counting all cores).
> * What the total of all Solr heaps on the server is.
> * Whether there is software other than Solr on the server.
> * How much total memory the server has installed.
>
> If you name the OS, I can use that information to help you gather some
> additional info which will actually show me most of that list. Total
> document count is something that I cannot get from the info I would
> help you gather.
>
> Something else that can cause performance issues is GC pauses. If you
> provide a GC log (the script that starts Solr logs this by default), we
> can analyze it to see if that's a problem.
>
> Attachments to messages on the mailing list typically do not make it to
> the list, so a file sharing website is a better way to share large
> logfiles. A paste website is good for log data that's smaller.
>
> Thanks,
> Shawn
