Hello,

We have a 3-node SolrCloud cluster with 3 collections (roughly 1 million, 2 million, and 8 million docs), each with a single shard. In Production we have set each collection up with 2 replicas for redundancy.
Each night we get a burst of indexing as the documents we index are updated, and each night I see timeout errors (and sometimes Task Queue errors) on the transaction collection. (We're using pysolr to do the indexing, with 5 parallel processes; a simplified sketch of the indexing loop is at the end of this message.) They typically resolve themselves after a few retries, but with larger sets of changes it can take a few hours, which is longer than I'd like.

Most common error:

Connection to server 'https://<redacted>/solr/transaction_solrize/update/' timed out: HTTPSConnectionPool(host='<redacted>', port=443): Read timed out. (read timeout=60)

Second most common error:

Solr responded with an error (HTTP 500): [Reason: Task queue processing has stalled for 20011 ms with 0 remaining elements to process.]

When we were designing the system, we experimented with the autoCommit and softCommit settings following this guide: https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/. That was mainly to optimise the initial bulk load / reindex after a schema change. We landed on a 15-second autoCommit and no soft commit, since the only thing that really seemed to mitigate the timeouts was to have no replicas while indexing heavily and then add them back afterwards:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>

That works fine for a full reindex, which I do side-by-side using aliases, but it doesn't really work for our nightly updates, and I'd like to keep the replicas active at all times on the live collections. I also still get the "Read timed out" errors even with no replicas, so that isn't a complete silver bullet, although it does help a lot.

Can anyone recommend next steps for troubleshooting this and reducing the timeouts? I haven't tried extending the autoCommit time because I was concerned about the tlog growing too large. We don't have any strong requirement for real-time search.

Config details for each node:

  9 GB RAM
  -Xms3g -Xmx3g -Xss256k

https://iatistandard.org/en/
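In case it's useful, here is roughly what each of the indexing workers does. This is a simplified sketch rather than our actual code; the collection URL, batch size and function name are placeholders:

  import pysolr

  # One of the 5 parallel worker processes (simplified sketch).
  # The 60s timeout below is the read timeout that shows up in the error.
  solr = pysolr.Solr(
      "https://<redacted>/solr/transaction_solrize",
      timeout=60,
  )

  BATCH_SIZE = 1000  # placeholder value

  def index_changed_docs(docs):
      # docs is a list of dicts; commits are left to Solr's autoCommit
      for i in range(0, len(docs), BATCH_SIZE):
          solr.add(docs[i:i + BATCH_SIZE], commit=False)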
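And this is the kind of change I've been wondering about but haven't tried: keeping the hard commit interval but not opening a searcher on it, and adding a less frequent soft commit for visibility. The intervals below are just illustrative, since we have no real-time search requirement:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>

Would something along those lines be a sensible starting point?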
Thank you!

Nik Osvalds | IATI Developer
Development Initiatives