One of the really useful things about the Hadoop Summit and HBase meetup was 
hearing about what people are doing to manage, monitor and maintain the health 
of their systems.

I was just reading the Facebook SIGMOD 2011 paper 
<http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf> and came across this 
line: “Nowadays we run HBCK almost continuously against our production clusters 
to catch problems as early as possible.”

Does anyone else do this?  Can anyone from Facebook comment on what sort of 
problems you've caught this way?  My naïve thought is that hbck would be 
recommended after some kind of notable failure, but I wouldn't have expected 
it to turn up problems when run routinely.  Maybe my perspective will be 
different when I get a real cluster going instead of a virtual one.  It would 
be nice to know what to watch out for.
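
To make "run routinely" concrete, here is a rough sketch of the kind of check 
I imagine a cron job doing.  It is only a guess at a setup, not what Facebook 
runs.  The function names are made up, and the two summary lines it looks for 
("N inconsistencies detected." and "Status: ...") are my assumption about what 
hbck prints at the end of a run, so check them against your version.

```python
import re
import subprocess


def parse_hbck_summary(output):
    """Pull (inconsistency_count, status) out of hbck's text output.

    Assumes the report ends with lines such as
    "2 inconsistencies detected." and "Status: INCONSISTENT".
    Returns None for any field that is not found.
    """
    count = re.search(r"(\d+) inconsistencies detected", output)
    status = re.search(r"Status: (\w+)", output)
    return (
        int(count.group(1)) if count else None,
        status.group(1) if status else None,
    )


def run_hbck(cmd=("hbase", "hbck")):
    """Run hbck once (hypothetical wrapper) and return its parsed summary."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return parse_hbck_summary(result.stdout + result.stderr)
```

A scheduler could call run_hbck() every few minutes and raise an alert whenever 
the status is anything other than OK, or when no summary could be parsed at all.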

Thanks.
joe