HBCK can give you false positives while regions are in transition, which happens during normal operations such as splits. Stack could probably say more here.
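
One way to filter out that kind of transient report is to run hbck a few times and only alarm when an inconsistency shows up in every run. Here is a minimal sketch of the idea, not something I know anyone to be running. It assumes `hbase hbck` is on the PATH and that its report contains the word INCONSISTENT when it finds a problem; the function names, the run count, and the delay are all made up for illustration. The exit codes follow the usual Nagios plugin convention (0 OK, 1 warning, 2 critical).

```python
import subprocess
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def looks_inconsistent(hbck_output):
    """Return True if one hbck report appears to flag a problem.

    Assumption: the report contains the word INCONSISTENT when hbck
    finds something wrong. Adjust the match for your HBase version.
    """
    return "INCONSISTENT" in hbck_output


def classify(outputs):
    """Map the reports from several consecutive runs to a Nagios status.

    A problem seen in every run is treated as real (CRITICAL). A problem
    seen in only some runs is treated as a likely region-in-transition
    false positive (WARNING).
    """
    flags = [looks_inconsistent(text) for text in outputs]
    if flags and all(flags):
        return CRITICAL
    if any(flags):
        return WARNING
    return OK


def run_hbck():
    """Run hbck once and return its combined output.

    Assumption: the `hbase` launcher script is on the PATH.
    """
    proc = subprocess.run(["hbase", "hbck"], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def check(runs=3, delay_seconds=60, runner=run_hbck, sleep=time.sleep):
    """Run hbck `runs` times, pausing between runs, and classify the results."""
    outputs = []
    for attempt in range(runs):
        if attempt > 0:
            sleep(delay_seconds)  # give in-flight splits and moves time to settle
        outputs.append(runner())
    return classify(outputs)
```

A Nagios wrapper would call `sys.exit(check())`. The `runner` and `sleep` parameters exist only so the logic can be exercised without a cluster.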
I don't know firsthand, but I would assume that Facebook starts with HBCK results and then has some additional process for determining whether something is really wrong. I think that with a little effort HBCK could be taught to be smart enough to become the basis for a Nagios sensor, or what have you, for widespread use.

   - Andy

----- Original Message -----
> From: Joseph Pallas <[email protected]>
> To: [email protected]
> Cc:
> Sent: Sunday, July 3, 2011 10:13 PM
> Subject: HBase operations
>
> One of the really useful things about the Hadoop Summit and HBase meetup
> was hearing about what people are doing to manage, monitor and maintain
> the health of their systems.
>
> So, I was just reading the Facebook SIGMOD 2011 paper
> <http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf>, and I came across
> this line: “Nowadays we run HBCK almost continuously against our
> production clusters to catch problems as early as possible.”
>
> Does anyone else do this? Can anyone from Facebook comment on what sort of
> problems you've caught this way? My naïve thought is that hbck would be
> recommended after some kind of notable failure but I wouldn't have thought
> it likely to turn up problems if run routinely. Maybe my perspective will
> be different when I get a real cluster going instead of a virtual one. It
> would be nice to know what to watch out for.
>
> Thanks.
> joe
>
