On 7/2/2018 1:40 PM, Joe Obernberger wrote:
> Hi All - having this same problem again with a large index in HDFS. A
> replica needs to recover, and it just spins retrying over and over
> again. Any ideas? Is there an adjustable timeout?
>
> Screenshot:
> http://lovehorsepower.com/images/SolrShot1.jpg
There is considerably more log detail available than can be seen in the screenshot. Can you please make the solr.log file from this server available so we can see the full error and warning messages, and let us know the exact Solr version that wrote the log? You'll probably need to use a file sharing site, and make sure the file stays available until after the problem has been examined. Attachments sent to the mailing list are almost always stripped.

Based on the timestamps in the screenshot, it is taking about 22 to 24 seconds to transfer 1750073344 bytes. That works out to right around the 75 MB per second rate you were configuring in your last email thread -- 1750073344 bytes is about 1669 MiB, and 1669 divided by 75 is roughly 22 seconds. In order for that single large file to transfer successfully, you're going to need a timeout of at least 40 seconds. Based on what I see, it sounds like the timeout has been set to 20 seconds. The default client socket timeout on replication should be about two minutes, which would be plenty for a file of that size to transfer.

This might be a timeout issue, but without seeing the full log and knowing the exact version of Solr that created it, it is difficult to know for sure where the problem lies or what can be done to fix it. We will need that logfile. If there are multiple servers involved, we may need logfiles from both ends of the replication.

Do you have any config in solrconfig.xml for the /replication handler other than the maxWriteMBPerSec config you showed last time? Have you configured anything (particularly a socket timeout or sotimeout setting) to a value near 20 or 20000?

Thanks,
Shawn
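P.S. In case it helps you check your own config, here is roughly what those two pieces usually look like. I'm sketching these from memory, and the numeric values are placeholders taken from this discussion, not recommendations.

In solrconfig.xml, rate limiting on the replication handler:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <!-- limit index transfer speed to about 75 MB per second -->
    <str name="maxWriteMBPerSec">75</str>
  </requestHandler>

In solr.xml, the socket and connect timeouts used for inter-node requests:

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <!-- both values are in milliseconds -->
    <int name="socketTimeout">120000</int>
    <int name="connTimeout">60000</int>
  </shardHandlerFactory>

If your socketTimeout (or anything similar) is sitting at 20000, that would line up with the roughly 20 second failures in the screenshot.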