bq. to do some test to compare the 2 solutions against the dataset.

We're on the same page, JMS.

On Fri, Jun 6, 2014 at 5:00 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> Yep, it's exactly my point. In one case we call 2 binary comparators (very
> fast); in the other case we call one regex comparator (slower). Now,
> depending on the size of the strings, the column names, etc., one solution
> might be faster than the other one. But I cannot tell which one. And I was
> just suggesting to do some test to compare the 2 solutions against the
> dataset.
>
>
> 2014-06-05 22:08 GMT-04:00 Ted Yu <[email protected]>:
>
> > For the FilterList approach, in a row where no qualifier starts with 'c!',
> > each qualifier would go through both sub-filters.
> >
> > For RegexStringComparator, each qualifier in such a row would be evaluated
> > once - since the prefix doesn't match, the result is determined quickly.
> >
> > Cheers
> >
> >
> > On Thu, Jun 5, 2014 at 5:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > I just re-used what Vrushali sent. I wrote that in the email so it might
> > > not compile. But it will give the idea.
> > >
> > > FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
> > > // Filter 1 passes qualifiers that do NOT start with "c!".
> > > QualifierFilter filter1 = new
> > > QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
> > > new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!"),
> > > Constants.SEP_BYTES)));
> > >
> > > list.addFilter(filter1);
> > >
> > > // Filter 2 passes qualifiers that DO start with "c!someName".
> > > QualifierFilter filter2 = new
> > > QualifierFilter(CompareFilter.CompareOp.EQUAL,
> > > new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!someName"),
> > > Constants.SEP_BYTES)));
> > > list.addFilter(filter2);
> > > scan.setFilter(list);
> > >
> > >
> > > To pass the first, the value should NOT start with c!.
> > > To pass the 2nd, the value SHOULD start with c!someName.
> > >
> > > So c!notThis will fail the first since it starts with c!, and it will
> > > fail the second since it does not start with c!someName.
> > >
> > > Make sense?
> > >
> > >
> > > 2014-06-05 20:27 GMT-04:00 Ted Yu <[email protected]>:
> > >
> > > > If we test c!notThis first will give false, second too. We reject.
> > > > If we test d!this first will give true, second false. We take it.
> > > >
> > > > Assuming the first filter compares against c!someName (negated), why
> > > > would 'c!notThis' give false ?
> > > >
> > > > Mind showing the definition of the FilterList ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Thu, Jun 5, 2014 at 4:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > He wants to exclude everything starting with "c!" and keep c!someName.
> > > > >
> > > > > So. First filter is a NOT, second is an include.
> > > > >
> > > > > If we test c!notThis first will give false, second too. We reject.
> > > > > If we test d!this first will give true, second false. We take it.
> > > > > If we test c!someName first will give false, second will give true.
> > > > > We take it.
> > > > >
> > > > > Am I missing something? It's possible because it's confusing ;) But I
> > > > > think it might work.
> > > > >
> > > > > JM
> > > > >
> > > > >
> > > > > 2014-06-05 19:47 GMT-04:00 Ted Yu <[email protected]>:
> > > > >
> > > > > > MUST_PASS_ONE represents boolean OR operator.
> > > > > >
> > > > > > According to Vrushali's description, "c!someName" should be excluded.
> > > > > >
> > > > > > Would MUST_PASS_ONE achieve what Vrushali wanted ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 5, 2014 at 4:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > >
> > > > > > > I would still give the 2-filter option a try.
> > > > > > >
> > > > > > > Regexes are nice and powerful but very expensive; they are
> > > > > > > non-trivial, while the prefix comparator is pretty simple and
> > > > > > > fast. So I'm not sure which of the 2 options will be faster.
> > > > > > >
> > > > > > > My opinion: Code wise, RegEx will be simpler, 2 filters will be
> > > > > > > faster.
> > > > > > >
> > > > > > >
> > > > > > > 2014-06-05 18:55 GMT-04:00 Ted Yu <[email protected]>:
> > > > > > >
> > > > > > > > You're welcome.
> > > > > > > >
> > > > > > > > Filters / comparators shipped with HBase are pretty powerful.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 5, 2014 at 3:04 PM, Vrushali C <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thanks Ted! Using that regex comparator helped me resolve this.
> > > > > > > > > Appreciate it very much!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thursday, June 5, 2014 2:23 PM, Ted Yu <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Or, you can use RegexStringComparator.
> > > > > > > > >
> > > > > > > > > Here is a regex string, in Java, that matches columns with
> > > > > > > > > prefix c! except column called c!someName :
> > > > > > > > >
> > > > > > > > > "^c\\!((?!someName).)*$"
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 5, 2014 at 1:26 PM, Ted Yu <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > One option is to write your own Comparator (similar to
> > > > > > > > > > BinaryPrefixComparator in essence) that treats the known
> > > > > > > > > > column name specially.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 5, 2014 at 12:52 PM, Vrushali C <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi
> > > > > > > > > >> Is there a way to do this kind of filtering: in my scan, I
> > > > > > > > > >> want to retrieve all columns except for columns starting
> > > > > > > > > >> with a certain prefix. But within that set of columns being
> > > > > > > > > >> ignored, I have one known column name that I want to
> > > > > > > > > >> retrieve while ignoring the rest. The reason is that columns
> > > > > > > > > >> with this prefix have a lot of data and I am not interested
> > > > > > > > > >> in any of them EXCEPT one.
> > > > > > > > > >>
> > > > > > > > > >> So for ignoring the columns with a certain prefix in the
> > > > > > > > > >> scan, I am doing something like
> > > > > > > > > >> filters.addFilter(
> > > > > > > > > >> new QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
> > > > > > > > > >> new BinaryPrefixComparator(
> > > > > > > > > >> Bytes.add(Bytes.toBytes("c!"), Constants.SEP_BYTES))));
> > > > > > > > > >>
> > > > > > > > > >> Which works. But what I also want to add is something like
> > > > > > > > > >> this
> > > > > > > > > >>
> > > > > > > > > >> filters.addFilter(
> > > > > > > > > >> new QualifierFilter(CompareFilter.CompareOp.EQUAL,
> > > > > > > > > >> new BinaryPrefixComparator(
> > > > > > > > > >> Bytes.add(Bytes.toBytes("c!someName"),
> > > > > > > > > >> Constants.SEP_BYTES))));
> > > > > > > > > >>
> > > > > > > > > >> I realize both filters contradict each other, so how do I
> > > > > > > > > >> achieve this?
> > > > > > > > > >>
> > > > > > > > > >> thanks
> > > > > > > > > >> Vrushali
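
For anyone following the thread later, here is a minimal plain-Java sketch (no HBase dependency) of the two selection rules discussed above, so the truth table can be checked by hand. It is only an illustration under assumptions: the class and method names are hypothetical, the Constants.SEP_BYTES suffix from the original filters is left out, the regex is assumed to be used with NOT_EQUAL so that matching qualifiers are dropped, and the sketch says nothing about which approach is faster on a real dataset.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: models the keep/drop decision only.
final class QualifierSelectionSketch {

    // Regex from the thread: qualifiers that start with "c!" and do not
    // contain "someName" anywhere after the prefix.
    static final Pattern EXCLUDED = Pattern.compile("^c\\!((?!someName).)*$");

    // Models the MUST_PASS_ONE (boolean OR) list of two prefix checks:
    // pass if the qualifier does NOT start with "c!", or if it DOES start
    // with "c!someName". The separator bytes are omitted (assumption).
    static boolean keepByPrefixPair(String qualifier) {
        boolean notCPrefix = !qualifier.startsWith("c!");
        boolean isWanted = qualifier.startsWith("c!someName");
        return notCPrefix || isWanted;
    }

    // Models the regex used as an exclusion: keep whatever it does not match.
    static boolean keepByRegex(String qualifier) {
        return !EXCLUDED.matcher(qualifier).find();
    }

    public static void main(String[] args) {
        String[] samples = {"c!notThis", "d!this", "c!someName", "c!xsomeName"};
        for (String q : samples) {
            System.out.printf("%-12s prefixPair=%b regex=%b%n",
                    q, keepByPrefixPair(q), keepByRegex(q));
        }
    }
}
```

On the three qualifiers used in the thread (c!notThis, d!this, c!someName) both rules give the same answer. They are not strictly equivalent, though: the negative lookahead rejects "someName" anywhere after "c!", so a qualifier such as c!xsomeName is kept by the regex rule but dropped by the prefix pair. That is worth checking against the actual column names before picking one.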
