Thanks for the quick responses, Shawn & Erick! Just to clarify a few more points:

1. Does a larger heap size affect ingestion of additional documents into the index (all CRUD operations) on a TLOG replica?
2. Does a machine with more RAM (in this case 32 GB) also affect ingestion on the TLOG replicas?
3. We currently route queries via an Amazon ASG / load balancer. Is this one of the recommended ways to set up Solr infrastructure?
Best Regards,
Ash

On Thu, Jul 19, 2018 at 12:56 AM Erick Erickson <erickerick...@gmail.com> wrote:

> There's little good reason to _not_ route searches to your TLOG
> replicas. The only difference between the PULL and TLOG replicas is
> that the TLOG replicas get a raw copy of the incoming document from
> the leader and write them to the TLOG. I.e. there's some additional
> I/O.
>
> It's possible that if you have extremely heavy indexing you might
> notice some additional load on the TLOG .vs. PULL replicas, but from
> what you've said I doubt you have that much indexing traffic.
>
> So basically I'd configure my TLOG and PULL replicas pretty much
> identically and search them both.
>
> Best,
> Erick
>
> On Wed, Jul 18, 2018 at 7:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 7/18/2018 12:04 AM, Ash Ramesh wrote:
> >>
> >> I have a quick question about what the memory requirements for TLOG
> >> machines are on 7.3.1. We currently run replication where there are
> >> 3 TLOGs with 8gb ram (2gb heap) and N PULL replicas with 32gb ram
> >> (4gb heap). We have > 10M documents (1 collection) with the index
> >> size being ~ 17gb. We send all read traffic to the PULLs and send
> >> Updates and Additions to the Leader TLOG.
> >>
> >> We are wondering how this setup can affect performance for
> >> replication, etc. We are thinking of increasing the heap of the TLOG
> >> to 4gb but leaving the total memory on the machine at 8gb. What will
> >> that do to performance? We also expect our index to grow 3/4x in the
> >> next 6 months.
> >
> > Performance has more to do with index size and memory size than the
> > type of replication you're doing.
> >
> > SolrCloud will load balance queries across the cloud, so your
> > low-memory TLOG replicas are most likely handling queries as well. In
> > a SolrCloud cluster, a query is not necessarily handled by the machine
> > that you send the query to.
> >
> > With memory resources that low compared to index size, the 8GB
> > machines probably do not perform queries as well as the 32GB machines.
> > If you increase the heap to 4GB, that will only leave 4GB available
> > for the OS disk cache, and that's going to drop query performance even
> > further.
> >
> > There is a feature in Solr 7.4 that will allow you to prefer certain
> > replica types, so you can tell Solr that it should prefer PULL
> > replicas. But since you're running 7.3.1, you don't have that feature.
> >
> > https://issues.apache.org/jira/browse/SOLR-11982
> >
> > There is also a "preferLocalShards" parameter that has existed for
> > longer than the new feature mentioned above. This tells Solr that it
> > should not load balance queries in the cloud if there is a local index
> > that can satisfy the query. This parameter should only be used if you
> > have an external load balancer.
> >
> > Indexing is a heap-intensive operation that doesn't benefit much from
> > having a lot of extra memory for the operating system. I have no idea
> > whether 2GB of heap is enough or not. Increasing the heap size MIGHT
> > make performance better, or it might make no difference at all.
> >
> > Thanks,
> > Shawn
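
To put rough numbers on the disk-cache point in the quoted reply, here is a minimal sketch that works through the arithmetic for the machine sizes mentioned in this thread. It assumes that everything outside the Java heap is available to the OS disk cache, which is optimistic (the OS and other processes also need memory), so the results are upper bounds. The function name `page_cache_headroom` is made up for illustration.

```python
def page_cache_headroom(ram_gb, heap_gb, index_gb):
    """Return (GB left over for the OS disk cache, fraction of the index that could fit).

    Assumption: all RAM not given to the Java heap is usable as disk cache.
    Real machines will have less, so treat the result as an upper bound.
    """
    cache_gb = ram_gb - heap_gb
    return cache_gb, min(1.0, cache_gb / index_gb)


INDEX_GB = 17  # approximate index size quoted in the thread

for label, ram_gb, heap_gb in [
    ("TLOG, 2gb heap (current)", 8, 2),
    ("TLOG, 4gb heap (proposed)", 8, 4),
    ("PULL, 4gb heap", 32, 4),
]:
    cache_gb, fraction = page_cache_headroom(ram_gb, heap_gb, INDEX_GB)
    print(f"{label}: {cache_gb} GB for disk cache, ~{fraction:.0%} of the index")
```

Under these assumptions the 8gb machines can cache at most about a third of the index today and about a quarter with a 4gb heap, while the 32gb machines can hold all of it. If the index grows 3/4x as expected, re-running this with a larger `INDEX_GB` shows how quickly that headroom disappears.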
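
The two routing options mentioned in the quoted reply are both request parameters, so they can be sketched as a small URL builder. This is only an illustration: the host `solr.example.internal`, the collection name `assets`, and the helper `select_url` are hypothetical, and the `shards.preference=replica.type:PULL` form is my understanding of the feature added by SOLR-11982, so check it against the reference guide for your version.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, solr_version):
    """Build a /select URL that asks Solr to favour certain replicas.

    solr_version is a (major, minor) tuple, e.g. (7, 3).
    """
    params = {"q": query}
    if solr_version >= (7, 4):
        # Solr 7.4+: prefer PULL replicas when choosing where to run the query.
        params["shards.preference"] = "replica.type:PULL"
    else:
        # Older versions: skip SolrCloud's internal load balancing when the
        # node receiving the request has a local index that can answer it.
        # Per the reply above, only sensible behind an external load balancer.
        params["preferLocalShards"] = "true"
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
BASE = "http://solr.example.internal:8983/solr"
print(select_url(BASE, "assets", "*:*", (7, 3)))
print(select_url(BASE, "assets", "*:*", (7, 4)))
```

On 7.3.1 only the `preferLocalShards` branch applies, and it relies on the external load balancer sending read traffic to the PULL machines in the first place.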