There is a very good explanation (and food for thought) of how Bloom filters are used in HBase in the answers here: http://www.quora.com/How-are-bloom-filters-used-in-HBase.
Should we link to it from the Apache HBase book (reference guide)?

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

On Thu, Jul 26, 2012 at 8:38 PM, Mohit Anchlia <[email protected]> wrote:

> On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen <[email protected]> wrote:
>
> > Mohit,
> >
> > According to HBase: The Definitive Guide,
> >
> > The row+column Bloom filter is useful when you cannot batch updates for a
> > specific row, and end up with store files which all contain parts of the
> > row. The more specific row+column filter can then identify which of the
> > files contain the data you are requesting. Obviously, if you always load
> > the entire row, this filter is once again hardly useful, as the region
> > server will need to load the matching block out of each file anyway. Since
> > the row+column filter will require more storage, you need to do the math to
> > determine whether it is worth the extra resources.
> >
>
> Thanks! I have time-series data, so I am thinking I should enable Bloom
> filters for rows only.
>
> >
> > ~ Minh
> >
> > On Thu, Jul 26, 2012 at 4:30 PM, Mohit Anchlia <[email protected]> wrote:
> >
> > > Is it advisable to enable Bloom filters on the column family?
> > >
> > > Also, why is it called a global kill switch?
> > >
> > > Bloom Filter Configuration
> > > 2.9.1. io.hfile.bloom.enabled global kill switch
> > >
> > > io.hfile.bloom.enabled in Configuration serves as the kill switch in case
> > > something goes wrong. Default = true.
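To make the row vs. row+column trade-off from the Definitive Guide quote concrete, here is a minimal Python sketch. It is not HBase code: `BloomFilter`, `StoreFile`, `rowcol_key` and `files_to_read` are hypothetical stand-ins, the row+column key encoding (row, a zero byte, column) is an assumption, and the sketch ignores how HBase actually lays filters out inside each HFile. It models a row whose cells were written across several flushes and shows which files a point Get for one specific column would still have to open under the NONE, ROW and ROWCOL filter types.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Minimal Bloom filter: m bits, k bit positions derived from one SHA-256 digest."""

    def __init__(self, m_bits: int = 1 << 16, k: int = 3) -> None:
        assert m_bits % 8 == 0 and 1 <= k <= 8  # 8 slices of 4 bytes = 32-byte digest
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        digest = hashlib.sha256(key).digest()
        for i in range(self.k):
            # Each "hash function" uses a different 4-byte slice of the digest.
            yield int.from_bytes(digest[4 * i : 4 * i + 4], "big") % self.m

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def rowcol_key(row: str, col: str) -> bytes:
    # Hypothetical encoding of a row+column Bloom key.
    return row.encode() + b"\x00" + col.encode()


class StoreFile:
    """Hypothetical stand-in for one flushed store file holding some cells."""

    def __init__(self, name: str, cells: List[Tuple[str, str]]) -> None:
        self.name = name
        self.row_bloom = BloomFilter()
        self.rowcol_bloom = BloomFilter()
        for row, col in cells:
            self.row_bloom.add(row.encode())
            self.rowcol_bloom.add(rowcol_key(row, col))


def files_to_read(files: List[StoreFile], row: str, col: str, bloom_type: str) -> List[str]:
    """Names of the files a point Get for (row, col) would still have to open."""
    if bloom_type == "ROW":
        return [f.name for f in files if f.row_bloom.might_contain(row.encode())]
    if bloom_type == "ROWCOL":
        return [f.name for f in files if f.rowcol_bloom.might_contain(rowcol_key(row, col))]
    return [f.name for f in files]  # NONE: no filter, so no file can be skipped


# Row "user1" was never written in one batch, so its cells are spread over two flushes.
files = [
    StoreFile("hfile-1", [("user1", "cf:a"), ("user2", "cf:a")]),
    StoreFile("hfile-2", [("user1", "cf:b")]),
    StoreFile("hfile-3", [("user3", "cf:a")]),
]

for bloom_type in ("NONE", "ROW", "ROWCOL"):
    print(bloom_type, files_to_read(files, "user1", "cf:b", bloom_type))
```

With filters this large relative to the number of keys, ROW should narrow the candidates to hfile-1 and hfile-2, and ROWCOL to hfile-2 alone. A false positive could add a file, but a Bloom filter never drops a file that really holds the key. This also shows the quote's caveat: a read of the whole row has no column to build a row+column key from, so only the ROW filter can help it.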
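The quote also says you "need to do the math" before paying for a row+column filter. One way to do that math, sketched below in Python, uses the textbook optimal Bloom filter size m = -n ln p / (ln 2)^2 bits for n keys at false-positive rate p: a ROW filter needs one key per distinct row in a store file, a ROWCOL filter one key per distinct row+column pair. The 1% error rate and the one-million-row, ten-column table are illustrative assumptions, and HBase's own sizing may differ in detail.

```python
import math


def bloom_bits(n_keys: int, fp_rate: float) -> int:
    """Optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_hashes(n_keys: int, m_bits: int) -> int:
    """Optimal number of hash functions: k = (m / n) * ln 2, at least 1."""
    return max(1, round(m_bits / n_keys * math.log(2)))


# Illustrative assumptions, not measured numbers.
rows = 1_000_000
columns_per_row = 10
fp_rate = 0.01

row_bits = bloom_bits(rows, fp_rate)                       # one key per distinct row
rowcol_bits = bloom_bits(rows * columns_per_row, fp_rate)  # one key per distinct row+column

print(f"ROW:    {row_bits / 8 / 2**20:.1f} MiB, k={bloom_hashes(rows, row_bits)}")
print(f"ROWCOL: {rowcol_bits / 8 / 2**20:.1f} MiB, k={bloom_hashes(rows * columns_per_row, rowcol_bits)}")
```

At 1% this works out to roughly 9.6 bits per key either way (about 1.1 MiB for the ROW filter versus about 11.4 MiB for ROWCOL), so the row+column filter grows with the number of cells rather than the number of rows. For wide rows that are usually read whole, that extra space buys little.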
