On Mon, Sep 27, 2010 at 7:52 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Hi,
>
> i would be very grateful if somebody could clarify the following for me
> please.  (0.20.5)
>
> yesterday we lost a short table (~100 rows) in production without a trace.
> no matter how deep i looked in the logs of regionservers and the master, i
> haven't got a clue how it might have happened.
>
> When i looked at the table though, i did not find any files, which may mean
> that it never got flushed from WAL and got compacted. When i reconstituted
> it and ran compaction, the file did  finally appear .
>

What did you do to 'reconstitute'?

FYI, edits go first to WAL and then to memstore.  A file will not
appear in the filesystem until memstore flushes.  100 rows is probably
not enough to bring on a flush.
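
To make the above concrete, here is a toy model of that write path in Python.  It is not HBase code: the ToyRegion class, its fields, and the 64 MB threshold are all made up for illustration.  It only shows why 100 small rows can sit in the WAL and memstore without any file showing up.

```python
class ToyRegion:
    """Toy model of the write path: WAL first, then memstore, flush on size.

    Hypothetical illustration only; none of these names are HBase APIs.
    """

    def __init__(self, flush_threshold_bytes):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.wal = []           # stand-in for the regionserver's WAL
        self.memstore = {}      # in-memory edits not yet written out as a file
        self.memstore_bytes = 0
        self.store_files = []   # stand-in for files in the filesystem

    def put(self, row, value):
        self.wal.append((row, value))   # 1. the edit goes to the WAL
        self.memstore[row] = value      # 2. then to the memstore
        # Crude accounting: every edit counts, even an overwrite of a row.
        self.memstore_bytes += len(row) + len(value)
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        """Write the memstore out as one 'file' and empty it."""
        if not self.memstore:
            return
        self.store_files.append(dict(self.memstore))
        self.memstore.clear()
        self.memstore_bytes = 0


# Assumed threshold of 64 MB, chosen only for the example.
region = ToyRegion(flush_threshold_bytes=64 * 1024 * 1024)
for i in range(100):
    region.put("row%03d" % i, "x" * 100)

print(len(region.wal), len(region.store_files))
```

In this model the 100 puts stay far below the threshold, so store_files stays empty until something calls region.flush() explicitly.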


> also, in the .log (hlog) directory, i noticed some duplicate entries with
> both long and short host names (i.e. something like 'data4,...' and
> 'data4.foo.bar,...' ) which may result from the moment we decided to switch
> to /etc/hosts name resolution instead of dns (just to see if that'll improve
> our networking issues).
>

Yeah, probably.

In 0.90.x the hostname lookup is done once, and we keep using that
name from then on, so this should not happen going forward.


> So i have the following questions :
>
> 1 -- when is WAL (or hlog, if it's the same?) is triggered and the actual
> tablet file is built? is it a size-based threshold? is there an max age
> threshold?

Yes, the hbase WAL is also referred to as hlog and our WAL is
implemented by the o.a.h.h.regionserver.wal.HLog class.

Every edit is first appended to the WAL that each regionserver maintains.

But in 0.20 hbase, our WAL is mostly ineffective because there is no
append support in hadoop 0.20 hdfs; basically, edits are preserved only
if the file is successfully closed.  (This has changed in 0.89.x: WAL
append now works.  Make sure you are running with the hadoop
0.20-append branch or CDH3b2 to ensure your 0.89.x install's WAL works
properly.)


> 2 -- if the region server crashes, its hlog is supposed to be split and
> recovered, right?

Yes
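
Roughly, splitting means taking the dead server's log, which interleaves edits for all of its regions, and regrouping those edits per region so each region can replay its own.  A toy sketch in Python follows; split_log and the sample entries are made up for illustration and this is not the actual HBase code path.

```python
from collections import defaultdict


def split_log(wal_entries):
    """Group a dead server's WAL entries by region, keeping per-region order.

    Hypothetical helper: entries are (region, row, value) tuples.
    """
    per_region = defaultdict(list)
    for region, row, value in wal_entries:
        per_region[region].append((row, value))
    return dict(per_region)


# Made-up log contents: edits for two regions, interleaved in arrival order.
dead_server_wal = [
    ("regionA", "row1", "v1"),
    ("regionB", "row9", "v2"),
    ("regionA", "row2", "v3"),
]

recovered = split_log(dead_server_wal)
for region, edits in sorted(recovered.items()):
    print(region, edits)
```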


> is there a situation when hlog can be lost? I suppose it
> doesn't matter that the region server with the same name never goes online
> again (e.g. if it starts using short name instead of FQDN)?
>

I'd have to check the code, but I don't think messing w/ hostnames
would be a reason to lose edits.  I think we go ahead and split the
log files even if they are not associated with a particular host; i.e.
though you "changed" hostnames, and our WALs are associated with
hosts, we pick up any straggler log files anyway.  Check your master
logs to confirm.

St.Ack
