Yep, that's exactly my point. In one case we call two binary prefix comparators (very fast); in the other we call one regex comparator (slower). Depending on the size of the strings, the column names, etc., one solution might be faster than the other, but I cannot tell which one. I was just suggesting running some tests to compare the two solutions against the dataset.
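To make that concrete, here is a rough plain-Java sketch of the kind of test I have in mind. It does not touch HBase at all: `keepByPrefixes` only models the MUST_PASS_ONE list with the two prefix filters, and `keepByRegex` models a NOT_EQUAL filter on Ted's regex, assuming the regex comparator behaves like `Matcher.find()`. The class name, the method names and the sample qualifiers are made up for illustration, and I left out `Constants.SEP_BYTES` to keep it short. The timing loop is crude (no JIT warm-up control), so treat the numbers as a hint only.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the two filter setups; not HBase code.
final class QualifierCheck {
    // Regex from the thread: qualifiers starting with "c!" that do not
    // contain "someName" anywhere after the prefix. Compiled once.
    static final Pattern DROP = Pattern.compile("^c\\!((?!someName).)*$");

    // Models FilterList(MUST_PASS_ONE) with the two prefix filters:
    // keep if NOT prefixed by "c!" OR prefixed by "c!someName".
    static boolean keepByPrefixes(String qualifier) {
        return !qualifier.startsWith("c!") || qualifier.startsWith("c!someName");
    }

    // Models a NOT_EQUAL qualifier filter on the regex:
    // keep when the regex does not match.
    static boolean keepByRegex(String qualifier) {
        return !DROP.matcher(qualifier).find();
    }

    public static void main(String[] args) {
        String[] samples = {"d!this", "c!notThis", "c!someName", "c!other_someName"};
        for (String q : samples) {
            System.out.println(q + " prefixes=" + keepByPrefixes(q)
                    + " regex=" + keepByRegex(q));
        }

        // Crude timing of both predicates over the same samples.
        int rounds = 1_000_000;
        long kept = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String q : samples) {
                if (keepByPrefixes(q)) kept++;
            }
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String q : samples) {
                if (keepByRegex(q)) kept++;
            }
        }
        long t2 = System.nanoTime();
        System.out.println("prefixes ms: " + (t1 - t0) / 1_000_000
                + ", regex ms: " + (t2 - t1) / 1_000_000 + " (kept=" + kept + ")");
    }
}
```

One thing the sketch shows: the two predicates agree on d!this, c!notThis and c!someName, but not on something like c!other_someName. The regex keeps any c! qualifier that contains someName anywhere, while the prefix pair keeps only those that start with c!someName. Whether that matters depends on the column names in the dataset.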

2014-06-05 22:08 GMT-04:00 Ted Yu <[email protected]>:

> For FilterList approach, a row where no qualifier starts with 'c!', each
> qualifier would go through both sub-filters.
>
> For RegexStringComparator, each qualifier in such row would be evaluated
> once - since prefix doesn't match, result is drawn quickly.
>
> Cheers
>
>
> On Thu, Jun 5, 2014 at 5:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
> > I just re-used what Vrushali sent. I wrote that in the email so it might
> > not compile. But it will give the idea.
> >
> > FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
> > QualifierFilter filter1 = new QualifierFilter(
> >     CompareFilter.CompareOp.NOT_EQUAL,
> >     new BinaryPrefixComparator(
> >         Bytes.add(Bytes.toBytes("c!"), Constants.SEP_BYTES)));
> > list.addFilter(filter1);
> >
> > QualifierFilter filter2 = new QualifierFilter(
> >     CompareFilter.CompareOp.EQUAL,
> >     new BinaryPrefixComparator(
> >         Bytes.add(Bytes.toBytes("c!someName"), Constants.SEP_BYTES)));
> > list.addFilter(filter2);
> > scan.setFilter(list);
> >
> > To pass the first, the qualifier should NOT start with c!.
> > To pass the 2nd, the qualifier SHOULD start with c!someName.
> >
> > So c!notThis will fail the first since it starts with c!, and it will
> > fail the second since it does not start with c!someName.
> >
> > Make sense?
> >
> >
> > 2014-06-05 20:27 GMT-04:00 Ted Yu <[email protected]>:
> >
> > > If we test c!notThis first will give false, second too. We reject.
> > > If we test d!this first will give true, second false. We take it.
> > >
> > > Assuming the first filter compares against c!someName (negated), why
> > > would 'c!notThis' give false ?
> > >
> > > Mind showing the definition of the FilterList ?
> > >
> > > Cheers
> > >
> > >
> > > On Thu, Jun 5, 2014 at 4:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > >
> > > > He wants to exclude everything starting with "c!" and keep c!someName.
> > > >
> > > > So. First filter is a NOT, second is an include.
> > > >
> > > > If we test c!notThis first will give false, second too. We reject.
> > > > If we test d!this first will give true, second false. We take it.
> > > > If we test c!someName first will give false, second will give true.
> > > > We take it.
> > > >
> > > > Do I miss something? It's possible because it's confusing ;) But I
> > > > think it might work.
> > > >
> > > > JM
> > > >
> > > >
> > > > 2014-06-05 19:47 GMT-04:00 Ted Yu <[email protected]>:
> > > >
> > > > > MUST_PASS_ONE represents boolean OR operator.
> > > > >
> > > > > According to Vrushali's description, "c!someName" should be excluded.
> > > > >
> > > > > Would MUST_PASS_ONE achieve what Vrushali wanted ?
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Thu, Jun 5, 2014 at 4:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > >
> > > > > > I will still give a try to the 2 filters option.
> > > > > >
> > > > > > RegEx are nice and powerful but very expensive. It's non trivial.
> > > > > > While the prefix comparator is pretty simple and fast. So I'm not
> > > > > > sure which of the 2 options will be faster.
> > > > > >
> > > > > > My opinion: Code wise, RegEx will be simpler, 2 filters will be
> > > > > > faster.
> > > > > >
> > > > > >
> > > > > > 2014-06-05 18:55 GMT-04:00 Ted Yu <[email protected]>:
> > > > > >
> > > > > > > You're welcome.
> > > > > > >
> > > > > > > Filters / comparators shipped with HBase are pretty powerful.
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 5, 2014 at 3:04 PM, Vrushali C <[email protected]> wrote:
> > > > > > >
> > > > > > > > Thanks Ted! Using that regex comparator helped me resolve this.
> > > > > > > > Appreciate it very much!
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thursday, June 5, 2014 2:23 PM, Ted Yu <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Or, you can use RegexStringComparator.
> > > > > > > >
> > > > > > > > Here is a regex string, in Java, that matches columns with
> > > > > > > > prefix c! except column called c!someName :
> > > > > > > >
> > > > > > > > "^c\\!((?!someName).)*$"
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 5, 2014 at 1:26 PM, Ted Yu <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > One option is to write your own Comparator (similar to
> > > > > > > > > BinaryPrefixComparator in essence) that treats the known
> > > > > > > > > column name specially.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 5, 2014 at 12:52 PM, Vrushali C <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > >> Hi
> > > > > > > > >> Is there a way to do this kind of filtering : In my scan, I
> > > > > > > > >> want to retrieve all columns except for columns starting with
> > > > > > > > >> a certain prefix. But within that set of columns being
> > > > > > > > >> ignored, I have one known column name that I want to retrieve
> > > > > > > > >> but ignore the rest. The reason is that columns with this
> > > > > > > > >> prefix have a lot of data and I am not interested in
> > > > > > > > >> everything EXCEPT one of those.
> > > > > > > > >>
> > > > > > > > >> So for ignoring the columns with a certain prefix in the
> > > > > > > > >> scan, I am doing something like
> > > > > > > > >>
> > > > > > > > >> filters.addFilter(
> > > > > > > > >>     new QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
> > > > > > > > >>         new BinaryPrefixComparator(
> > > > > > > > >>             Bytes.add(Bytes.toBytes("c!"), Constants.SEP_BYTES))));
> > > > > > > > >>
> > > > > > > > >> Which works. But what I also want to add, is something like
> > > > > > > > >> this
> > > > > > > > >>
> > > > > > > > >> filters.addFilter(
> > > > > > > > >>     new QualifierFilter(CompareFilter.CompareOp.EQUAL,
> > > > > > > > >>         new BinaryPrefixComparator(
> > > > > > > > >>             Bytes.add(Bytes.toBytes("c!someName"),
> > > > > > > > >>                 Constants.SEP_BYTES))));
> > > > > > > > >>
> > > > > > > > >> I realize both filters are contradictory to each other, so
> > > > > > > > >> how do I achieve this?
> > > > > > > > >>
> > > > > > > > >> thanks
> > > > > > > > >> Vrushali
