Both servers have errors like these in their logs:

========
2015-10-22 03:28:00,599 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
2015-10-22 03:28:04,283 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
2015-10-22 03:28:06,116 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
========

On 10/22/15, Denis <[email protected]> wrote:
> Hi
>
> Sometimes my Tablet Servers go into a strange state: they have some
> very old scans (see picture: http://i.imgur.com/2sOUM99.png) and, while
> in this state, they cannot be decommissioned gracefully using "accumulo
> stop": the number of their tablets decreases to some fixed number
> (say from 6K tablets to 2K), not to zero.
> It is difficult to reproduce.
> Now I have a live system with 2 tablet servers in this state.
> Any suggestions on how to catch the bug?
>
