Hi, I would be very grateful if somebody could clarify the following for me, please. (We are running 0.20.5.)
Yesterday we lost a small table (~100 rows) in production without a trace. No matter how deep I looked in the logs of the regionservers and the master, I couldn't find a clue as to how it might have happened. When I looked at the table directory, though, I did not find any files, which may mean that the data was never flushed from the WAL and never compacted. When I reconstituted the table and ran a compaction, the file did finally appear.

Also, in the .log (hlog) directory, I noticed some duplicate entries with both short and long host names (i.e. something like 'data4,...' and 'data4.foo.bar,...'). These may date from the moment we decided to switch to /etc/hosts name resolution instead of DNS (just to see if that would improve our networking issues).

So I have the following questions:

1 -- When is a flush from the WAL (or hlog, if it's the same thing?) triggered and the actual tablet file built? Is it a size-based threshold? Is there a max age threshold?

2 -- If the region server crashes, its hlog is supposed to be split and recovered, right? Is there a situation in which the hlog can be lost? I suppose it doesn't matter that a region server with the same name never comes online again (e.g. if it starts using the short name instead of the FQDN)?

Thanks.
-Dmitriy
