Re: Bloom Filter

Alex Baranau Fri, 27 Jul 2012 07:26:05 -0700

There is a very good explanation of (and food for thought on) using bloom
filters in HBase in the answers here:
http://www.quora.com/How-are-bloom-filters-used-in-HBase


Should we link to it from the Apache HBase book (ref guide)?

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Thu, Jul 26, 2012 at 8:38 PM, Mohit Anchlia <[email protected]> wrote:

> On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen <[email protected]>
> wrote:
>
> > Mohit,
> >
> > According to HBase: The Definitive Guide,
> >
> > The row+column Bloom filter is useful when you cannot batch updates for a
> > specific row, and end up with store files which all contain parts of the
> > row. The more specific row+column filter can then identify which of the
> > files contain the data you are requesting. Obviously, if you always load
> > the entire row, this filter is once again hardly useful, as the region
> > server will need to load the matching block out of each file anyway. Since
> > the row+column filter will require more storage, you need to do the math to
> > determine whether it is worth the extra resources.
> >
>
> Thanks! I have time-series data, so I am thinking I should enable bloom
> filters for rows only.
>
> >
> >
> >    ~ Minh
> >
> > On Thu, Jul 26, 2012 at 4:30 PM, Mohit Anchlia <[email protected]>
> > wrote:
> >
> > > Is it advisable to enable bloom filters on the column family?
> > >
> > > Also, why is it called a global kill switch?
> > >
> > > Bloom Filter Configuration
> > >   2.9.1. io.hfile.bloom.enabled global kill switch
> > >
> > > io.hfile.bloom.enabled in Configuration serves as the kill switch in case
> > > something goes wrong. Default = true.
> > >
> >
>
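
To make the row vs. row+column trade-off quoted above a bit more concrete,
here is a minimal Python sketch. It is not HBase code: ToyBloom, the
store_files layout, and the "row/column" key format are all made up for
illustration, and the sizing helper uses the textbook formula for a standard
Bloom filter rather than anything HBase-specific. It shows that a row-keyed
filter cannot rule out any file when every file holds part of the row, while
a row+column-keyed filter can, and that the latter stores one key per cell
instead of one per row, which is where the extra storage comes from.

```python
import hashlib
import math


class ToyBloom:
    """Tiny Bloom filter for illustration only (not HBase's implementation)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def bloom_bits(num_keys, fp_rate):
    """Textbook optimal size in bits: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))


# Hypothetical store files; updates to row1 were not batched, so every
# file ends up holding a piece of that row.
store_files = {
    "file_a": [("row1", "cf:temp"), ("row2", "cf:temp")],
    "file_b": [("row1", "cf:humidity")],
    "file_c": [("row1", "cf:pressure"), ("row3", "cf:temp")],
}


def build_filters(key_fn):
    """Build one ToyBloom per store file, keyed by key_fn(row, col)."""
    filters = {}
    for name, cells in store_files.items():
        bf = ToyBloom(num_bits=1024, num_hashes=3)
        for row, col in cells:
            bf.add(key_fn(row, col))
        filters[name] = bf
    return filters


def candidate_files(filters, key):
    """Files whose filter cannot rule the key out."""
    return sorted(name for name, bf in filters.items() if bf.might_contain(key))


row_filters = build_filters(lambda row, col: row)
rowcol_filters = build_filters(lambda row, col: f"{row}/{col}")

# Looking up row1 / cf:humidity.
row_hits = candidate_files(row_filters, "row1")
rowcol_hits = candidate_files(rowcol_filters, "row1/cf:humidity")
print("row filter candidates:       ", row_hits)
print("row+column filter candidates:", rowcol_hits)

# Rough sizing at a 1% false-positive rate: one key per row vs. one per cell
# (assuming, purely for illustration, 20 columns per row).
print("bits for 1M rows: ", bloom_bits(1_000_000, 0.01))
print("bits for 20M cells:", bloom_bits(20_000_000, 0.01))
```

The row filter has to report all three files, since each really does contain
row1. The row+column filter always reports file_b and, barring a rare false
positive, nothing else. For time-series data read a whole row at a time, that
extra selectivity buys little, which matches the advice quoted above.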



