Re: Question on missing RFiles

2018-05-16 Thread Adam J. Shook
Thanks for all of your help. We have a peer cluster that we'll be using to do some data reconciliation. On Wed, May 16, 2018 at 11:29 AM, Michael Wall wrote: > Since the rfiles on disk are "later" than the ones referenced, I tend to > think old metadata got rewritten. Since

Re: Question on missing RFiles

2018-05-16 Thread Michael Wall
Since the rfiles on disk are "later" than the ones referenced, I tend to think old metadata got rewritten. Since you can't get a timeline to better understand what happened, the only thing I can think of is to reingest all data since a known good point. And then do things to make the future better

Re: Question on missing RFiles

2018-05-16 Thread Adam J. Shook
I tried building a timeline but the logs are just not there. We weren't sending the debug logs to Splunk due to the verbosity, but we may be tweaking the log4j settings a bit to make sure we get the log data stored in the event this happens again. This very well could be attributed to the

Re: Question on missing RFiles

2018-05-14 Thread Michael Wall
Can you pick some of the files that are missing and search through your logs to put together a timeline? See if you can find that file for a specific tablet. Then grab all the logs for when a file was created as a result of a compaction, and when a file was included in a compaction for that table.
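A minimal sketch of the timeline search described above, in Python. It assumes each log line starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS,mmm`; that format, and the names `build_timeline` and `TIMESTAMP_FORMAT`, are illustrative assumptions rather than anything confirmed in this thread. It keeps only the lines that mention a given RFile name and orders them by time.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = len("2018-05-11 12:45:00,123")


def build_timeline(log_lines, rfile_name):
    """Return (timestamp, line) pairs mentioning rfile_name, oldest first."""
    events = []
    for line in log_lines:
        if rfile_name not in line:
            continue
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            # Skip lines that do not start with a parseable timestamp,
            # for example stack-trace continuation lines.
            continue
        events.append((stamp, line.rstrip("\n")))
    return sorted(events)
```

Feeding it the combined tablet server, master, and GC logs for one missing file would give a single ordered view of when that file was created, compacted, and deleted, provided those log lines were retained.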

Re: Question on missing RFiles

2018-05-12 Thread Adam J. Shook
WALs are turned on. Durability is set to flush for all tables except for root and metadata which are sync. The current rfile names on HDFS and in the metadata table are greater than the files that are missing. Searched through all of our current and historical logs in Splunk (which are only

Re: Question on missing RFiles

2018-05-11 Thread Christopher
Oh, it occurs to me that this may be related to the WAL bugs that Keith fixed for 1.9.1... which could affect the metadata table recovery after a failure. On Fri, May 11, 2018 at 6:11 PM Michael Wall wrote: > Adam, > > Do you have GC logs? Can you see if those missing RFiles

Re: Question on missing RFiles

2018-05-11 Thread Michael Wall
Adam, Do you have GC logs? Can you see if those missing RFiles were removed by the GC process? That could indicate you somehow got old metadata info replayed. Also, the rfile names increment, so compare the current rfile names in the srv.dir directory vs what is in the metadata table. Are the
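A rough sketch of that comparison, in Python. It assumes RFile names consist of one prefix letter followed by a base-36 counter and the `.rf` extension; that naming scheme is an assumption here, and the function names are hypothetical. The two input lists would come from an HDFS listing of the tablet directory and from a scan of the tablet's file entries in the metadata table.

```python
import re

# Assumed naming convention: prefix letter, base-36 counter, ".rf".
RFILE_NAME = re.compile(r"^([A-Za-z])([0-9a-z]+)\.rf$")


def rfile_counter(name):
    """Return the base-36 counter encoded in an RFile name, or None."""
    match = RFILE_NAME.match(name)
    return int(match.group(2), 36) if match else None


def compare_rfiles(on_disk, in_metadata):
    """Compare file names found on HDFS with those the metadata references."""
    on_disk, in_metadata = set(on_disk), set(in_metadata)
    missing = sorted(in_metadata - on_disk)       # referenced but not on disk
    unreferenced = sorted(on_disk - in_metadata)  # on disk but not referenced

    counters = [c for c in map(rfile_counter, on_disk) if c is not None]
    newest_on_disk = max(counters, default=None)

    # Missing files whose counter is lower than the newest file on disk
    # were allocated earlier than files that still exist.
    older_missing = [
        name for name in missing
        if newest_on_disk is not None
        and rfile_counter(name) is not None
        and rfile_counter(name) < newest_on_disk
    ]
    return {
        "missing": missing,
        "unreferenced": unreferenced,
        "missing_older_than_disk": older_missing,
    }
```

If most of the missing files land in `missing_older_than_disk`, that would be consistent with the metadata pointing at older files than the ones currently on disk.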

Re: Question on missing RFiles

2018-05-11 Thread Christopher
This is strange. I've only ever seen this when HDFS has reported problems, such as missing blocks, or another obvious failure. What are your durability settings (were WALs turned on)? On Fri, May 11, 2018 at 12:45 PM Adam J. Shook wrote: > Hello all, > > On one of our

Question on missing RFiles

2018-05-11 Thread Adam J. Shook
Hello all, On one of our clusters, a good number of RFiles are missing from HDFS; however, HDFS is not reporting, and has not reported, any missing blocks. We were experiencing issues with HDFS: some flapping DataNode processes that needed more heap. I don't anticipate I can do much besides create a bunch