On 7/2/2018 1:40 PM, Joe Obernberger wrote:
> Hi All - having this same problem again with a large index in HDFS. A
> replica needs to recover, and it just spins retrying over and over
> again. Any ideas? Is there an adjustable timeout?
>
> Screenshot:
> http://lovehorsepower.com/images/SolrShot1.jpg
There is considerably more log detail available than can be seen in the screenshot. Can you please make the solr.log file from this server available so we can see the full error and warning messages, and let us know the exact Solr version that wrote the log? You'll probably need to use a file sharing site, and make sure the file stays available until after the problem has been examined. Attachments sent to the mailing list are almost always stripped.

Based on the timestamps in the screenshot, it is taking about 22 to 24 seconds to transfer 1750073344 bytes. That works out to right around the 75 MB per second rate you were configuring in your last email thread -- 1750073344 bytes is about 1669 MiB, and 1669 divided by 75 is roughly 22 seconds. In order for that single large file to transfer successfully, you're going to need a timeout of at least 40 seconds. Based on what I see, it sounds like the timeout has been set to 20 seconds. The default client socket timeout on replication should be about two minutes, which would be plenty for a file of that size to transfer.

This might be a timeout issue, but without seeing the full log and knowing the exact version of Solr that created it, it is difficult to know for sure where the problem lies or what can be done to fix it. We will need that logfile. If there are multiple servers involved, we may need logfiles from both ends of the replication.

Do you have any config in solrconfig.xml for the /replication handler other than the maxWriteMBPerSec config you showed last time? Have you configured anything (particularly a socket timeout or sotimeout setting) to a value near 20 or 20000?

Thanks,
Shawn
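P.S. In case it helps you check your own config, here is roughly what those two pieces usually look like. I'm sketching these from memory, and the numeric values are placeholders taken from this discussion, not recommendations.

In solrconfig.xml, rate limiting on the replication handler:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <!-- limit index transfer speed to about 75 MB per second -->
    <str name="maxWriteMBPerSec">75</str>
  </requestHandler>

In solr.xml, the socket and connect timeouts used for inter-node requests:

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <!-- both values are in milliseconds -->
    <int name="socketTimeout">120000</int>
    <int name="connTimeout">60000</int>
  </shardHandlerFactory>

If your socketTimeout (or anything similar) is sitting at 20000, that would line up with the roughly 20 second failures in the screenshot.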