bq. merge your other filters into the RegEx?

Please keep in mind that the regex should stay readable for other developers.
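
One way to keep such a pattern readable is to build it from named pieces and comment each part. The following is only a minimal sketch using plain java.util.regex, not HBase itself; the class, constant, and method names (QualifierRegexDemo, SKIP_PREFIX, KEEP_NAME, isSkipped) are hypothetical, and the pattern is the one from this thread, "^c\\!((?!someName).)*$", rebuilt piece by piece.

```java
import java.util.regex.Pattern;

class QualifierRegexDemo {
    // Values taken from the thread; the names are made up for this sketch.
    static final String SKIP_PREFIX = "c!";
    static final String KEEP_NAME = "someName";

    // Same meaning as "^c\\!((?!someName).)*$": the qualifier starts with
    // the prefix and KEEP_NAME does not occur anywhere after the prefix.
    static final Pattern SKIP = Pattern.compile(
            "^" + Pattern.quote(SKIP_PREFIX)              // literal "c!"
            + "((?!" + Pattern.quote(KEEP_NAME) + ").)*"  // chars not starting KEEP_NAME
            + "$");

    // True when the qualifier matches the "skip" pattern.
    static boolean isSkipped(String qualifier) {
        return SKIP.matcher(qualifier).find();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"c!notThis", "c!someName", "d!this"}) {
            System.out.println(q + " skipped=" + isSkipped(q));
        }
    }
}
```

Here c!notThis matches the pattern, while c!someName and d!this do not. Note that the pattern spares any c! qualifier that contains someName anywhere after the prefix, not only the exact name.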

bq. I need to have a MUST_PASS_ALL

Listing the requirements clearly up front would have helped keep the
discussion more focused.
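
To see why the two prefix filters from this thread only make sense under MUST_PASS_ONE, here is a minimal sketch that models each filter as a plain Java predicate on the qualifier string. It does not use HBase: the class and method names are made up for illustration, the Constants.SEP_BYTES suffix is left out, and real FilterList evaluation works on cells with return codes rather than on booleans.

```java
import java.util.function.Predicate;

class FilterListLogicDemo {
    // Stand-in for QualifierFilter(NOT_EQUAL, BinaryPrefixComparator("c!")).
    static final Predicate<String> NOT_C_PREFIX = q -> !q.startsWith("c!");
    // Stand-in for QualifierFilter(EQUAL, BinaryPrefixComparator("c!someName")).
    static final Predicate<String> IS_SOME_NAME = q -> q.startsWith("c!someName");

    // MUST_PASS_ONE modelled as a boolean OR of the two predicates.
    static boolean mustPassOne(String qualifier) {
        return NOT_C_PREFIX.or(IS_SOME_NAME).test(qualifier);
    }

    // MUST_PASS_ALL modelled as a boolean AND of the two predicates.
    static boolean mustPassAll(String qualifier) {
        return NOT_C_PREFIX.and(IS_SOME_NAME).test(qualifier);
    }

    public static void main(String[] args) {
        for (String q : new String[] {"c!notThis", "d!this", "c!someName"}) {
            System.out.println(q + " ONE=" + mustPassOne(q) + " ALL=" + mustPassAll(q));
        }
    }
}
```

Under OR, c!notThis is rejected while d!this and c!someName are kept. Under AND nothing can pass, since no qualifier both starts with c!someName and does not start with c!. Other filters that must all pass would therefore have to be combined with this pair some other way, for example by nesting, since a FilterList is itself a Filter.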

Cheers


On Fri, Jun 6, 2014 at 2:22 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Then you need to use Ted's approach... Because with the 2 filters you
> listed, you can not pass all as you said in your first message.
>
> You might be able to merge your other filters into the RegEx?
>
> JM
>
>
> 2014-06-06 17:17 GMT-04:00 Vrushali C <[email protected]>:
>
>> Thanks for the discussion! This helps me understand these filters better.
>>
>> FWIW, I need to have a MUST_PASS_ALL since I have some other filters as
>> well in this scan.
>>
>>
>>   On Friday, June 6, 2014 9:18 AM, Ted Yu <[email protected]> wrote:
>>
>>
>> bq. to do some test to compare the 2 solutions against the dataset.
>>
>> We're on the same page, JMS.
>>
>>
>> On Fri, Jun 6, 2014 at 5:00 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>
>> > Yep, that's exactly my point. In one case we call 2 binary comparators
>> > (very fast); in the other case we call one regex comparator (slower).
>> > Now, depending on the size of the strings, the column names, etc., one
>> > solution might be faster than the other one, but I cannot tell which.
>> > I was just suggesting to do some test to compare the 2 solutions
>> > against the dataset.
>> >
>> >
>> > 2014-06-05 22:08 GMT-04:00 Ted Yu <[email protected]>:
>> >
>> > > With the FilterList approach, in a row where no qualifier starts with
>> > > 'c!', each qualifier would go through both sub-filters.
>> > >
>> > > With RegexStringComparator, each qualifier in such a row would be
>> > > evaluated once - since the prefix doesn't match, the result is drawn
>> > > quickly.
>> > >
>> > > Cheers
>> > >
>> > >
>> > > On Thu, Jun 5, 2014 at 5:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > >
>> > > > I just re-used what Vrushali sent. I wrote it directly in the email,
>> > > > so it might not compile, but it should give the idea.
>> > > >
>> > > > FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
>> > > > // Passes qualifiers that do NOT start with "c!" + separator.
>> > > > QualifierFilter filter1 = new QualifierFilter(
>> > > >     CompareFilter.CompareOp.NOT_EQUAL,
>> > > >     new BinaryPrefixComparator(
>> > > >         Bytes.add(Bytes.toBytes("c!"), Constants.SEP_BYTES)));
>> > > > list.addFilter(filter1);
>> > > >
>> > > > // Passes qualifiers that start with "c!someName" + separator.
>> > > > QualifierFilter filter2 = new QualifierFilter(
>> > > >     CompareFilter.CompareOp.EQUAL,
>> > > >     new BinaryPrefixComparator(
>> > > >         Bytes.add(Bytes.toBytes("c!someName"), Constants.SEP_BYTES)));
>> > > > list.addFilter(filter2);
>> > > > scan.setFilter(list);
>> > > >
>> > > >
>> > > > To pass the first, the value should NOT start with c!.
>> > > > To pass the 2nd, the value SHOULD start with c!someName.
>> > > >
>> > > > So c!notThis will fail the first since it starts with c!, and it will
>> > > > fail the second since it does not start with c!someName.
>> > > >
>> > > > Make sense?
>> > > >
>> > > >
>> > > > 2014-06-05 20:27 GMT-04:00 Ted Yu <[email protected]>:
>> > > >
>> > > > > If we test c!notThis first will give false, second too. We reject.
>> > > > > If we test d!this first will give true, second false. We take it.
>> > > > >
>> > > > > Assuming the first filter compares against c!someName (negated),
>> > > > > why would 'c!notThis' give false ?
>> > > > >
>> > > > > Mind showing the definition of the FilterList ?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > >
>> > > > > On Thu, Jun 5, 2014 at 4:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > > > >
>> > > > > > He wants to exclude everything starting with "c!" and keep
>> > > > > > c!someName.
>> > > > > >
>> > > > > > So the first filter is a NOT, the second is an include.
>> > > > > >
>> > > > > > If we test c!notThis, first will give false, second too. We reject.
>> > > > > > If we test d!this, first will give true, second false. We take it.
>> > > > > > If we test c!someName, first will give false, second will give
>> > > > > > true. We take it.
>> > > > > >
>> > > > > > Am I missing something? It's possible because it's confusing ;)
>> > > > > > But I think it might work.
>> > > > > >
>> > > > > > JM
>> > > > > >
>> > > > > >
>> > > > > > 2014-06-05 19:47 GMT-04:00 Ted Yu <[email protected]>:
>> > > > > >
>> > > > > > > MUST_PASS_ONE represents boolean OR operator.
>> > > > > > >
>> > > > > > > According to Vrushali's description, "c!someName" should be
>> > > > > > > excluded.
>> > > > > > >
>> > > > > > > Would MUST_PASS_ONE achieve what Vrushali wanted ?
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Jun 5, 2014 at 4:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > I would still give the 2 filters option a try.
>> > > > > > > >
>> > > > > > > > RegEx are nice and powerful but very expensive; evaluating
>> > > > > > > > them is non-trivial, while the prefix comparator is pretty
>> > > > > > > > simple and fast. So I'm not sure which of the 2 options will
>> > > > > > > > be faster.
>> > > > > > > >
>> > > > > > > > My opinion: code-wise, RegEx will be simpler; 2 filters will
>> > > > > > > > be faster.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 2014-06-05 18:55 GMT-04:00 Ted Yu <[email protected]>:
>> > > > > > > >
>> > > > > > > > > You're welcome.
>> > > > > > > > >
>> > > > > > > > > Filters / comparators shipped with HBase are pretty powerful.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Jun 5, 2014 at 3:04 PM, Vrushali C <[email protected]> wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks Ted! Using that regex comparator helped me resolve
>> > > > > > > > > > this. Appreciate it very much!
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >  On Thursday, June 5, 2014 2:23 PM, Ted Yu <[email protected]> wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Or, you can use RegexStringComparator.
>> > > > > > > > > >
>> > > > > > > > > > Here is a regex string, in Java, that matches columns with
>> > > > > > > > > > prefix c! except column called c!someName :
>> > > > > > > > > >
>> > > > > > > > > > "^c\\!((?!someName).)*$"
>> > > > > > > > > >
>> > > > > > > > > > Cheers
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Jun 5, 2014 at 1:26 PM, Ted Yu <[email protected]> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > One option is to write your own Comparator (similar to
>> > > > > > > > > > > BinaryPrefixComparator in essence) that treats the known
>> > > > > > > > > > > column name specially.
>> > > > > > > > > > >
>> > > > > > > > > > > Cheers
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Jun 5, 2014 at 12:52 PM, Vrushali C <[email protected]> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >>
>> > > > > > > > > > >>
>> > > > > > > > > > >> Hi
>> > > > > > > > > > >> Is there a way to do this kind of filtering: in my scan, I
>> > > > > > > > > > >> want to retrieve all columns except for columns starting
>> > > > > > > > > > >> with a certain prefix. But within that set of ignored
>> > > > > > > > > > >> columns, there is one known column name that I do want to
>> > > > > > > > > > >> retrieve, while ignoring the rest. The reason is that
>> > > > > > > > > > >> columns with this prefix hold a lot of data and I am not
>> > > > > > > > > > >> interested in any of them EXCEPT that one.
>> > > > > > > > > > >>
>> > > > > > > > > > >> So for ignoring the columns with a certain prefix in the
>> > > > > > > > > > >> scan, I am doing something like
>> > > > > > > > > > >> filters.addFilter(
>> > > > > > > > > > >>      new QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
>> > > > > > > > > > >>        new BinaryPrefixComparator(
>> > > > > > > > > > >>                Bytes.add(Bytes.toBytes("c!"),
>> > > > > > > > > > >>                Constants.SEP_BYTES))));
>> > > > > > > > > > >>
>> > > > > > > > > > >> Which works. But what I also want to add is something like
>> > > > > > > > > > >> this
>> > > > > > > > > > >>
>> > > > > > > > > > >> filters.addFilter(
>> > > > > > > > > > >>      new QualifierFilter(CompareFilter.CompareOp.EQUAL,
>> > > > > > > > > > >>        new BinaryPrefixComparator(
>> > > > > > > > > > >>                Bytes.add(Bytes.toBytes("c!someName"),
>> > > > > > > > > > >>                Constants.SEP_BYTES))));
>> > > > > > > > > > >>
>> > > > > > > > > > >> I realize both filters are contradictory to each other, so
>> > > > > > > > > > >> how do I achieve this?
>> > > > > > > > > > >>
>> > > > > > > > > > >> thanks
>> > > > > > > > > > >> Vrushali
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>
