Oh, it occurs to me that this may be related to the WAL bugs that Keith fixed for 1.9.1... which could affect the metadata table recovery after a failure.
On Fri, May 11, 2018 at 6:11 PM Michael Wall <mjw...@gmail.com> wrote:

> Adam,
>
> Do you have GC logs? Can you see if those missing RFiles were removed by
> the GC process? That could indicate you somehow got old metadata info
> replayed. Also, the rfiles increment, so compare the current rfile names in
> the srv.dir directory vs what is in the metadata table. Are the existing
> files after the files in the metadata? Finally, pick a few of the missing
> files and grep all your master and tserver logs to see if you can learn
> anything. This sounds ungood.
>
> Mike
>
> On Fri, May 11, 2018 at 6:06 PM Christopher <ctubb...@apache.org> wrote:
>
>> This is strange. I've only ever seen this when HDFS has reported
>> problems, such as missing blocks, or another obvious failure. What are
>> your durability settings (were WALs turned on)?
>>
>> On Fri, May 11, 2018 at 12:45 PM Adam J. Shook <adamjsh...@gmail.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> On one of our clusters, there are a good number of missing RFiles from
>>> HDFS; however, HDFS is not/has not reported any missing blocks. We were
>>> experiencing issues with HDFS: some flapping DataNode processes that
>>> needed more heap.
>>>
>>> I don't anticipate I can do much besides create a bunch of empty RFiles
>>> (open to suggestions). My question is: is it possible that Accumulo
>>> could have written the metadata for these RFiles but failed to write
>>> them to HDFS, in which case the write would have been retried later and
>>> the data persisted to a different RFile? Or is it an 'RFile is in
>>> Accumulo metadata if and only if it is in HDFS' situation?
>>>
>>> Accumulo 1.8.1 on HDFS 2.6.0.
>>>
>>> Thank you,
>>> --Adam
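For what it's worth, Mike's suggested comparison (rfile names referenced in the metadata table vs. what actually exists in the table directory) can be sketched with comm(1). The real inputs would come from something like `accumulo shell -e 'scan -t accumulo.metadata -c file'` on one side and `hdfs dfs -ls -R /accumulo/tables` on the other; the file names below are placeholders just to show the diffing step:

```shell
# Sketch only: placeholder rfile names stand in for the real listings.
# meta_files.txt  = rfile names referenced by the metadata table
# hdfs_files.txt  = rfile names actually present in HDFS
printf 'A000001.rf\nA000002.rf\nA000003.rf\n' | sort > meta_files.txt
printf 'A000001.rf\nA000003.rf\n' | sort > hdfs_files.txt

# comm -23 prints lines unique to the first file, i.e. files that the
# metadata references but that are missing from HDFS.
comm -23 meta_files.txt hdfs_files.txt
# -> A000002.rf
```

Running the same comparison the other way (`comm -13`) would surface rfiles present in HDFS but unreferenced in metadata, which the GC would normally be expected to remove.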