St.Ack, just to make sure: by CDH3b2 aka 'hadoop-append', do you mean hadoop 0.20.2+320 from the CDH3 distro?
Thank you.
-Dmitriy

On Tue, Sep 28, 2010 at 9:15 AM, Stack <[email protected]> wrote:
> On Mon, Sep 27, 2010 at 7:52 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > Hi,
> >
> > i would be very grateful if somebody could clarify the following for me
> > please. (0.20.5)
> >
> > yesterday we lost a short table (~100 rows) in production without a trace.
> > no matter how deep i looked in the logs of regionservers and the master, i
> > haven't got a clue how it might have happened.
> >
> > When i looked at the table though, i did not find any files, which may mean
> > that it never got flushed from WAL and got compacted. When i reconstituted
> > it and ran compaction, the file did finally appear .
> >
>
> What did you do to 'reconstitute'?
>
> FYI, edits go first to WAL and then to memstore. A file will not
> appear in the filesystem until memstore flushes. 100 rows is probably
> not enough to bring on a flush.
>
> > also, in the .log (hlog) directory, i noticed some duplicate entries with
> > both long and short host names (i.e. something like 'data4,...' and
> > 'data4.foo.bar,...' ) which may result from the moment we decided to switch
> > to /etc/hosts name resolution instead of dns (just to see if that'll improve
> > our networking issues).
> >
>
> Yeah, probably.
>
> Our hostname lookup is done once in 0.90.x and forever after we keep
> on w/ that name regardless so this should not happen going forward.
>
> > So i have the following questions :
> >
> > 1 -- when is WAL (or hlog, if it's the same?) is triggered and the actual
> > tablet file is built? is it a size-based threshold? is there an max age
> > threshold?
>
> Yes, the hbase WAL is also referred to as hlog and our WAL is
> implemented by the o.a.h.h.regionserver.wal.HLog class.
>
> Every edit first is appended to the WAL that each regionserver keeps up.
>
> But, in 0.20 hbase, our WAL is mostly ineffective given as there is no
> append support in hadoop 0.20 hdfs; basically only if the file is
> successfully closed will edits be preserved (This state has changed in
> 0.89.x in that the WAL append now works. Make sure you are running
> with the hadoop 0.20-append branch or CDH3b2 to ensure your 0.89.x
> install WAL works properly).
>
> > 2 -- if the region server crashes, its hlog is supposed to be split and
> > recovered, right?
>
> Yes
>
> > is there a situation when hlog can be lost? I suppose it
> > doesn't matter that the region server with the same name never goes online
> > again (e.g. if it starts using short name instead of FQDN)?
> >
>
> I'd have to check the code but I think the messing w/ hostnames would
> not be a reason to lose edits. I think we go ahead and split the logs
> for files even if they are not associated with a particular host; i.e.
> though you "changed" hostnames, and our WALs are associated with
> hosts, we pick up any straggler log files anyways. Check your master
> logs to confirm.
>
> St.Ack
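
For my own understanding, here is a rough toy model of the write path as described in the quoted reply (edit goes to the WAL, then to the memstore, and a file only appears once the memstore flushes). This is not HBase code: the class `ToyRegion`, its fields, and the threshold value are all made up for illustration, and sizes are approximated by string lengths.

```python
class ToyRegion:
    """Toy model of the WAL -> memstore -> flush path; hypothetical, not HBase's API."""

    def __init__(self, flush_threshold_bytes):
        self.flush_threshold_bytes = flush_threshold_bytes  # assumed size-based trigger
        self.wal = []            # append-only log of every edit
        self.memstore = {}       # in-memory buffer: row -> latest value
        self.memstore_bytes = 0  # approximate size of the memstore
        self.store_files = []    # each flush produces one immutable "file"

    def put(self, row, value):
        # 1. The edit is appended to the WAL first.
        self.wal.append((row, value))
        # 2. Then it is applied to the memstore (overwriting any older value).
        old = self.memstore.get(row)
        if old is not None:
            self.memstore_bytes -= len(row) + len(old)
        self.memstore[row] = value
        self.memstore_bytes += len(row) + len(value)
        # 3. A file appears only once the memstore crosses the threshold.
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        self.store_files.append(dict(self.memstore))
        self.memstore.clear()
        self.memstore_bytes = 0


# 100 tiny rows against a (made-up) 1 KB threshold: nothing reaches "disk".
region = ToyRegion(flush_threshold_bytes=1024)
for i in range(100):
    region.put(f"row{i}", "v")
print(len(region.wal))          # 100 edits logged
print(len(region.store_files))  # 0 files until a flush happens
```

If this picture is right, a 100-row table would live only in the WAL and the memstore until something forces a flush, which would match the empty table directory i saw.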
