HBCK can report false positives while regions are in transition, which happens
during normal operations such as splits. Stack could probably say more here.

I don't know firsthand but I would assume that Facebook starts with HBCK 
results and then has some additional process for determining if something is 
really wrong. 

I think that with a little effort HBCK could be made smart enough to serve as
the basis for a Nagios sensor, or something similar, for widespread use.
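
To make that concrete, here is a rough sketch of what such a sensor might look
like. It is only an illustration, not something I have run against a
production cluster. It assumes the `hbase` launcher is on the PATH and that
hbck prints a "Status: OK" line when it finds nothing wrong. The retry count,
the delay, and the function names are all made up for the example. The idea is
to alert only when the inconsistency persists across several runs, so a region
that is merely in transition does not trigger an alarm.

```python
import subprocess
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_hbck():
    """Run `hbase hbck` and return its combined output.

    Assumption: the `hbase` launcher script is on the PATH.
    """
    proc = subprocess.run(
        ["hbase", "hbck"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.stdout


def is_consistent(output):
    # Assumption: hbck ends a clean report with a "Status: OK" line.
    return any(line.strip() == "Status: OK" for line in output.splitlines())


def check(runner=run_hbck, attempts=3, delay=60, sleep=time.sleep):
    """Return (exit_code, message), retrying to ride out transient states."""
    for attempt in range(1, attempts + 1):
        try:
            output = runner()
        except OSError as exc:
            return UNKNOWN, "HBCK UNKNOWN - could not run hbck: %s" % exc
        if is_consistent(output):
            if attempt == 1:
                return OK, "HBCK OK"
            return OK, "HBCK OK - cleared on attempt %d" % attempt
        # Inconsistent: wait and look again, unless this was the last try.
        if attempt < attempts:
            sleep(delay)
    return CRITICAL, "HBCK CRITICAL - inconsistent on %d consecutive runs" % attempts


def main():
    code, message = check()
    print(message)
    sys.exit(code)
```

A plugin script would simply call main(). The runner and sleep parameters are
there only so the retry logic can be exercised without a cluster.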

   - Andy



----- Original Message -----
> From: Joseph Pallas <[email protected]>
> To: [email protected]
> Cc: 
> Sent: Sunday, July 3, 2011 10:13 PM
> Subject: HBase operations
> 
> One of the really useful things about the Hadoop Summit and HBase meetup was 
> hearing about what people are doing to manage, monitor and maintain the 
> health of their systems.
> 
> So, I was just reading the Facebook SIGMOD 2011 paper 
> <http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf>, and I came across 
> this line: “Nowadays we run HBCK almost continuously against our production 
> clusters to catch problems as early as possible.”
> 
> Does anyone else do this?  Can anyone from Facebook comment on what sort of 
> problems you've caught this way?  My naïve thought is that hbck would be 
> recommended after some kind of notable failure but I wouldn't have thought 
> it likely to turn up problems if run routinely.  Maybe my perspective will be 
> different when I get a real cluster going instead of a virtual one.  It would 
> be nice to know what to watch out for.
> 
> Thanks.
> joe
>
