Thanks, I'll check it out. There weren't any obvious errors pointing to hardware issues.
Is it likely that the TTransportException and commits held are related?

On Fri, 12 Jul 2019, 18:56 Josh Elser wrote:

> "Commits are held" can be for a couple of different reasons, some from
> within Accumulo and some from outside.
>
> In general, there is an expected ordering of mutations that a
> TabletServer has to apply. A "commit" here is the application of some
> mutations by a TabletServer to the memory map and the WAL.
>
> This could be completely normal and you have some clients which are just
> writing "faster" than your TabletServers can keep up with. This could be
> indicative of slow flushes from memory maps to HDFS. This could be GC
> pressure causing slowness in the TServer.
>
> I'd suggest to take a step back:
>
> * Look at other messages in the DEBUG log for the tabletserver to see if
>   Accumulo is telling you what it's waiting on (before and after you
>   see the message about commits being held)
> * Check that you're using the Accumulo native memory maps
> * Sanity-check performance of HDFS
> * Get a thread dump from a TabletServer in this state.
>
> If the problem truly only happens on two servers, it might indicate some
> bad hardware on that device (memory with errors, a disk that flips to r/o).
>
> - Josh
>
> On 7/12/19 10:57 AM, James Srinivasan wrote:
> > Hi all,
> >
> > We have a Kerberized Accumulo 1.7.0 (HDP3) cluster with 25 tservers.
> > Recently, a couple of clients were reporting errors writing data (fat
> > fingered from cluster, apologies for typos):
> >
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures
> > ...
> > Caused by:
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer
> >
> > Digging into the logs on the problematic tservers, I think the
> > following was firing, but don't know why:
> >
> > https://github.com/apache/thrift/blob/0.9.1/lib/java/src/org/apache/thrift/transport/TIOStreamTransport.java#L132
> >
> > Also, the tserver logs report:
> >
> > Internal error processing closeUpdate....TException: Commits are held
> >
> > For now, I have stopped the two problematic tservers but any help
> > debugging would be much appreciated.
> >
> > James
