WALs are turned on.  Durability is set to flush for all tables except root
and metadata, which are sync.  The current RFile names in HDFS and in the
metadata table all sort after the names of the missing files (a rough sketch
of that comparison is below, after the list).  I searched through all of our
current and historical logs in Splunk (which only capture INFO level and
higher).  Issues found in the logs:

* Problem reports saying the files are not found
* IllegalStateException saying the RFile is closed when trying to load the
Bloom filter (likely the flapping DataNode)
* IOException saying the stream is closed when reading the file (likely the
flapping DataNode)
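
For anyone who wants to repeat that check, it can be scripted against the
client API roughly like this (a minimal sketch only; the instance name,
ZooKeepers, credentials, and the /accumulo/tables volume prefix are
placeholders for whatever your cluster uses):

import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;

public class FindMissingRFiles {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("root", new PasswordToken("secret"));

    // Scan the "file" column family of the metadata table, which holds
    // one entry per RFile referenced by a tablet.
    Scanner scan = conn.createScanner("accumulo.metadata", Authorizations.EMPTY);
    scan.fetchColumnFamily(new Text("file"));

    FileSystem fs = FileSystem.get(new Configuration());
    for (Map.Entry<Key,Value> e : scan) {
      String row = e.getKey().getRow().toString();   // tableId;endRow or tableId<
      String tableId = row.split("[;<]", 2)[0];
      String qual = e.getKey().getColumnQualifier().toString();
      // Entries are either full URIs or paths relative to the table directory;
      // the /accumulo/tables prefix below is an assumption about the volume layout.
      Path p = qual.contains(":") ? new Path(qual)
                                  : new Path("/accumulo/tables/" + tableId + qual);
      if (!fs.exists(p)) {
        System.out.println("missing: " + p + " (tablet " + row + ")");
      }
    }
  }
}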

Nothing in the GC logs -- all the above errors are in the tablet server
logs.  The logs may have rolled over, though, and our debug logs don't make
it into Splunk.
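
And for completeness, the per-table durability mentioned at the top can be
confirmed with something along these lines (again just a sketch; it reuses a
Connector obtained as in the previous snippet, and it assumes the setting is
exposed under the "table.durability" property):

import java.util.Map;

import org.apache.accumulo.core.client.Connector;

public class ShowDurability {
  // Print the effective table.durability for every table the user can see.
  static void showDurability(Connector conn) throws Exception {
    for (String table : conn.tableOperations().list()) {
      for (Map.Entry<String,String> prop : conn.tableOperations().getProperties(table)) {
        if ("table.durability".equals(prop.getKey())) {
          System.out.println(table + " -> " + prop.getValue());
        }
      }
    }
  }
}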

--Adam

On Fri, May 11, 2018 at 6:16 PM, Christopher <ctubb...@apache.org> wrote:

> Oh, it occurs to me that this may be related to the WAL bugs that Keith
> fixed for 1.9.1... which could affect the metadata table recovery after a
> failure.
>
> On Fri, May 11, 2018 at 6:11 PM Michael Wall <mjw...@gmail.com> wrote:
>
>> Adam,
>>
>> Do you have GC logs?  Can you see if those missing RFiles were removed by
>> the GC process?  That could indicate you somehow got old metadata info
>> replayed.  Also, the rfiles increment, so compare the current rfile names in
>> the srv.dir directory vs what is in the metadata table.  Are the existing
>> files after the files in the metadata?  Finally, pick a few of the missing
>> files and grep all your master and tserver logs to see if you can learn
>> anything.  This sounds ungood.
>>
>> Mike
>>
>> On Fri, May 11, 2018 at 6:06 PM Christopher <ctubb...@apache.org> wrote:
>>
>>> This is strange. I've only ever seen this when HDFS has reported
>>> problems, such as missing blocks, or another obvious failure. What are your
>>> durability settings (were WALs turned on)?
>>>
>>> On Fri, May 11, 2018 at 12:45 PM Adam J. Shook <adamjsh...@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> On one of our clusters, there are a good number of RFiles missing from
>>>> HDFS; however, HDFS is not reporting, and has not reported, any missing
>>>> blocks.  We were experiencing issues with HDFS: some flapping DataNode
>>>> processes that needed more heap.
>>>>
>>>> I don't anticipate I can do much besides create a bunch of empty RFiles
>>>> (open to suggestions).  My question is: is it possible that Accumulo wrote
>>>> the metadata for these RFiles but failed to write the files themselves to
>>>> HDFS, in which case the writes would have been retried later and the data
>>>> persisted to different RFiles?  Or is it a case of 'an RFile is in the
>>>> Accumulo metadata if and only if it is in HDFS'?
>>>>
>>>> Accumulo 1.8.1 on HDFS 2.6.0.
>>>>
>>>> Thank you,
>>>> --Adam
>>>>
>>>
