Re: scan filter: how to ignore everything with a prefix except one particular column with that prefix

Ted Yu Fri, 06 Jun 2014 09:19:37 -0700

bq. to do some test to compare the 2 solutions against the dataset.

We're on the same page, JMS.



On Fri, Jun 6, 2014 at 5:00 AM, Jean-Marc Spaggiari <[email protected]
> wrote:

> Yep, it's exactly my point. In one case we call 2 binary comparator (very
> fast) in another case with call one regex comparator (slower). Now,
> depending of the size of the strings, the columns names, etc. One solution
> might be faster than the other one. But I can not tell which one. And I was
> just suggesting to do some test to compare the 2 solutions against the
> dataset.
>
>
> 2014-06-05 22:08 GMT-04:00 Ted Yu <[email protected]>:
>
> > For FilterList approach, a row where no qualifier starts with 'c!', each
> > qualifier would go through both sub-filters.
> >
> > For RegexStringComparator, each qualifier in such row would be evaluated
> > once - since prefix doesn't match, result is drawn quickly.
> >
> > Cheers
> >
> >
> > On Thu, Jun 5, 2014 at 5:33 PM, Jean-Marc Spaggiari <
> > [email protected]
> > > wrote:
> >
> > > I just re-used what Vrushali sent. I write that in the email so might
> not
> > > compile. But will give the idea.
> > >
> > > FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
> > > SingleColumnValueFilter filter1 = new
> > > QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
> > >  new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!"),
> > > Constants.SEP_BYTES))));
> > >
> > > list.add(filter1);
> > >
> > > SingleColumnValueFilter filter2 = new
> > > QualifierFilter(CompareFilter.CompareOp.EQUAL,
> > >
> > > new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!someName"),
> > > Constants.SEP_BYTES))))
> > > list.add(filter2);
> > > scan.setFilter(list);
> > >
> > >
> > > To pass the first, value should NOT be starting with c!.
> > > To pass the 2nd, value SHOULD start with c!someName.
> > >
> > > So c!notThis will fail for the first since it start with c!. and it
> will
> > > fail for the second since it's not starting with c!someName.
> > >
> > > Make sense?
> > >
> > >
> > > 2014-06-05 20:27 GMT-04:00 Ted Yu <[email protected]>:
> > >
> > > > If we test c!notThis first will give false, second too. We rejest.
> > > > If we test d!this first will give true, second false. We take it.
> > > >
> > > > Assuming the first filter compares against c!someName (negated), why
> > > > would 'c!notThis'
> > > > give false ?
> > > >
> > > > Mind showing the definition of the FilterList ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Thu, Jun 5, 2014 at 4:52 PM, Jean-Marc Spaggiari <
> > > > [email protected]
> > > > > wrote:
> > > >
> > > > > He want to excluse everything starting with "c!" and keep
> c!someName.
> > > > >
> > > > > So. First filter is a NOT, second is a include.
> > > > >
> > > > > If we test c!notThis first will give false, second too. We rejest.
> > > > > If we test d!this first will give true, second false. We take it.
> > > > > If we test c!someName first will give false, second will give true.
> > We
> > > > take
> > > > > it.
> > > > >
> > > > > Do I miss something? It's possible because it's confusing ;) But I
> > > think
> > > > it
> > > > > might work.
> > > > >
> > > > > JM
> > > > >
> > > > >
> > > > > 2014-06-05 19:47 GMT-04:00 Ted Yu <[email protected]>:
> > > > >
> > > > > > MUST_PASS_ONE represents boolean OR operator.
> > > > > >
> > > > > > According to Vrushali's description, "c!someName" should be
> > excluded.
> > > > > >
> > > > > > Would MUST_PASS_ONE achieve what Vrushali wanted ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 5, 2014 at 4:33 PM, Jean-Marc Spaggiari <
> > > > > > [email protected]
> > > > > > > wrote:
> > > > > >
> > > > > > > I will still give a try to the 2 filters options.
> > > > > > >
> > > > > > > RegEx are nice and powerful but very expensive. It's non
> trivial.
> > > > While
> > > > > > the
> > > > > > > prefix comparator is pretty simple and fast. So I'm not sure
> > which
> > > of
> > > > > > the 2
> > > > > > > options will be faster.
> > > > > > >
> > > > > > > My opinion: Code wise, RegEx will be simpler, 2 filters will be
> > > > faster.
> > > > > > >
> > > > > > >
> > > > > > > 2014-06-05 18:55 GMT-04:00 Ted Yu <[email protected]>:
> > > > > > >
> > > > > > > > You're welcome.
> > > > > > > >
> > > > > > > > Filters / comparators shipped with HBase are pretty powerful.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 5, 2014 at 3:04 PM, Vrushali C <
> [email protected]
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Ted! Using that regex comparator helped me resolve
> > this.
> > > > > > > > Appreciate
> > > > > > > > > it very much!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >   On Thursday, June 5, 2014 2:23 PM, Ted Yu <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Or, you can use RegexStringComparator.
> > > > > > > > >
> > > > > > > > > Here is a regex string, in Java, that matches columns with
> > > prefix
> > > > > c!
> > > > > > > > except
> > > > > > > > > column called c!someName :
> > > > > > > > >
> > > > > > > > > "^c\\!((?!someName).)*$"
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 5, 2014 at 1:26 PM, Ted Yu <
> [email protected]>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > One option is to write your own Comparator (similar to
> > > > > > > > > BinaryPrefixComparator
> > > > > > > > > > in essence) that treats the known column name specially.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 5, 2014 at 12:52 PM, Vrushali C <
> > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Hi
> > > > > > > > > >> Is there a way to do this kind of filtering : In my
> scan,
> > I
> > > > want
> > > > > > to
> > > > > > > > > >> retrieve all columns except for columns starting with a
> > > > certain
> > > > > > > > prefix.
> > > > > > > > > But
> > > > > > > > > >> within that set of columns being ignored, I have one
> known
> > > > > column
> > > > > > > name
> > > > > > > > > that
> > > > > > > > > >> I want to retrieve but ignore the rest. The reason is
> that
> > > > > columns
> > > > > > > > with
> > > > > > > > > >> this prefix have a lot of data and I am not interested
> in
> > > > > > everything
> > > > > > > > > EXCEPT
> > > > > > > > > >> one of those.
> > > > > > > > > >>
> > > > > > > > > >> So for ignoring the columns with a certain prefix in the
> > > > scan, I
> > > > > > am
> > > > > > > > > doing
> > > > > > > > > >> something like
> > > > > > > > > >> filters.addFilter(
> > > > > > > > > >>      new
> > QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
> > > > > > > > > >>        new BinaryPrefixComparator(
> > > > > > > > > >>                Bytes.add(Bytes.toBytes("c!"),
> > > > > > > Constants.SEP_BYTES))))
> > > > > > > > > >>
> > > > > > > > > >> Which works. But what I also want to add, is something
> > like
> > > > this
> > > > > > > > > >>
> > > > > > > > > >> filters.addFilter(
> > > > > > > > > >>      new QualifierFilter(CompareFilter.CompareOp.EQUAL,
> > > > > > > > > >>        new BinaryPrefixComparator(
> > > > > > > > > >>                Bytes.add(Bytes.toBytes("c!someName"),
> > > > > > > > > >> Constants.SEP_BYTES))))
> > > > > > > > > >>
> > > > > > > > > >> I realize both filters are contradictory to each other,
> so
> > > how
> > > > > do
> > > > > > I
> > > > > > > > > >> achieve this?
> > > > > > > > > >>
> > > > > > > > > >> thanks
> > > > > > > > > >> Vrushali
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: scan filter: how to ignore everything with a prefix except one particular column with that prefix

Reply via email to