bq. merge your other filters into the RegEx?

Please make sure the regex stays readable for other developers.
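One way to keep such a pattern readable might be to assemble it from named pieces instead of writing one opaque literal. The sketch below is plain Java using only `java.util.regex` (no HBase classes). The class and constant names are hypothetical. It assumes the same matching semantics as the `"^c\\!((?!someName).)*$"` string quoted later in this thread: match a qualifier that starts with `c!` unless the text after the prefix contains `someName`.

```java
import java.util.regex.Pattern;

public class ReadableQualifierRegex {
    // Hypothetical names; the literal values are the ones used in this thread.
    static final String EXCLUDED_PREFIX = "c!";
    static final String KEPT_COLUMN = "someName";

    // A run of characters, none of which starts an occurrence of KEPT_COLUMN.
    // Same idea as "((?!someName).)*", written as a non-capturing group.
    static final String NO_KEPT_COLUMN_AHEAD =
        "(?:(?!" + Pattern.quote(KEPT_COLUMN) + ").)*";

    // Anchored: prefix at the start, then the guarded run up to the end.
    static final Pattern EXCLUDED_QUALIFIER = Pattern.compile(
        "^" + Pattern.quote(EXCLUDED_PREFIX) + NO_KEPT_COLUMN_AHEAD + "$");

    /** True if the qualifier should be dropped from the scan results. */
    static boolean isExcluded(String qualifier) {
        return EXCLUDED_QUALIFIER.matcher(qualifier).find();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"c!notThis", "c!someName", "d!this"}) {
            System.out.println(q + " -> excluded=" + isExcluded(q));
        }
    }
}
```

If this were used with HBase, the assembled string (`EXCLUDED_QUALIFIER.pattern()`) could presumably be passed to `RegexStringComparator`; that wiring is an assumption and is not shown here.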
bq. I need to have a MUST_PASS_ALL

Listing the requirement clearly would have helped make the discussion more focused.

Cheers

On Fri, Jun 6, 2014 at 2:22 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Then you need to use Ted's approach... Because with the 2 filters you
> listed, you cannot pass all as you said in your first message.
>
> You might be able to merge your other filters into the RegEx?
>
> JM
>
>
> 2014-06-06 17:17 GMT-04:00 Vrushali C <[email protected]>:
>
>> Thanks for the discussion! This helps me understand these filters better.
>>
>> FWIW, I need to have a MUST_PASS_ALL since I have some other filters as
>> well in this scan.
>>
>>
>> On Friday, June 6, 2014 9:18 AM, Ted Yu <[email protected]> wrote:
>>
>>
>> bq. to do some tests to compare the 2 solutions against the dataset.
>>
>> We're on the same page, JMS.
>>
>>
>> On Fri, Jun 6, 2014 at 5:00 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>
>> > Yep, it's exactly my point. In one case we call 2 binary comparators
>> > (very fast); in the other case we call one regex comparator (slower).
>> > Now, depending on the size of the strings, the column names, etc., one
>> > solution might be faster than the other. But I cannot tell which one.
>> > And I was just suggesting to do some tests to compare the 2 solutions
>> > against the dataset.
>> >
>> >
>> > 2014-06-05 22:08 GMT-04:00 Ted Yu <[email protected]>:
>> >
>> > > With the FilterList approach, in a row where no qualifier starts with
>> > > 'c!', each qualifier would go through both sub-filters.
>> > >
>> > > With RegexStringComparator, each qualifier in such a row would be
>> > > evaluated once - since the prefix doesn't match, the result is drawn
>> > > quickly.
>> > >
>> > > Cheers
>> > >
>> > >
>> > > On Thu, Jun 5, 2014 at 5:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > >
>> > > > I just re-used what Vrushali sent. I wrote that in the email, so it
>> > > > might not compile. But it will give the idea.
>> > > >
>> > > > FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
>> > > > QualifierFilter filter1 = new QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
>> > > >     new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!"),
>> > > >         Constants.SEP_BYTES)));
>> > > >
>> > > > list.addFilter(filter1);
>> > > >
>> > > > QualifierFilter filter2 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
>> > > >     new BinaryPrefixComparator(Bytes.add(Bytes.toBytes("c!someName"),
>> > > >         Constants.SEP_BYTES)));
>> > > > list.addFilter(filter2);
>> > > > scan.setFilter(list);
>> > > >
>> > > >
>> > > > To pass the first, the qualifier should NOT start with c!.
>> > > > To pass the 2nd, the qualifier SHOULD start with c!someName.
>> > > >
>> > > > So c!notThis will fail the first since it starts with c!, and it will
>> > > > fail the second since it doesn't start with c!someName.
>> > > >
>> > > > Makes sense?
>> > > >
>> > > >
>> > > > 2014-06-05 20:27 GMT-04:00 Ted Yu <[email protected]>:
>> > > >
>> > > > > If we test c!notThis, the first will give false, the second too. We reject it.
>> > > > > If we test d!this, the first will give true, the second false. We take it.
>> > > > >
>> > > > > Assuming the first filter compares against c!someName (negated), why
>> > > > > would 'c!notThis' give false?
>> > > > >
>> > > > > Mind showing the definition of the FilterList?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > >
>> > > > > On Thu, Jun 5, 2014 at 4:52 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > > > >
>> > > > > > He wants to exclude everything starting with "c!" but keep c!someName.
>> > > > > >
>> > > > > > So the first filter is a NOT, the second is an include.
>> > > > > >
>> > > > > > If we test c!notThis, the first will give false, the second too. We reject it.
>> > > > > > If we test d!this, the first will give true, the second false. We take it.
>> > > > > > If we test c!someName, the first will give false, the second will give
>> > > > > > true. We take it.
>> > > > > >
>> > > > > > Am I missing something? It's possible, because it's confusing ;) But I
>> > > > > > think it might work.
>> > > > > >
>> > > > > > JM
>> > > > > >
>> > > > > >
>> > > > > > 2014-06-05 19:47 GMT-04:00 Ted Yu <[email protected]>:
>> > > > > >
>> > > > > > > MUST_PASS_ONE represents the boolean OR operator.
>> > > > > > >
>> > > > > > > According to Vrushali's description, "c!someName" should be excluded.
>> > > > > > >
>> > > > > > > Would MUST_PASS_ONE achieve what Vrushali wanted?
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Jun 5, 2014 at 4:33 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > I will still give the 2-filter option a try.
>> > > > > > > >
>> > > > > > > > RegExes are nice and powerful but very expensive; they are non-trivial,
>> > > > > > > > while the prefix comparator is pretty simple and fast. So I'm not sure
>> > > > > > > > which of the 2 options will be faster.
>> > > > > > > >
>> > > > > > > > My opinion: code-wise, the RegEx will be simpler; the 2 filters will
>> > > > > > > > be faster.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 2014-06-05 18:55 GMT-04:00 Ted Yu <[email protected]>:
>> > > > > > > >
>> > > > > > > > > You're welcome.
>> > > > > > > > >
>> > > > > > > > > Filters / comparators shipped with HBase are pretty powerful.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Jun 5, 2014 at 3:04 PM, Vrushali C <[email protected]> wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks Ted! Using that regex comparator helped me resolve this.
>> > > > > > > > > > Appreciate it very much!
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thursday, June 5, 2014 2:23 PM, Ted Yu <[email protected]> wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Or, you can use RegexStringComparator.
>> > > > > > > > > >
>> > > > > > > > > > Here is a regex string, in Java, that matches columns with prefix c!
>> > > > > > > > > > except the column called c!someName:
>> > > > > > > > > >
>> > > > > > > > > > "^c\\!((?!someName).)*$"
>> > > > > > > > > >
>> > > > > > > > > > Cheers
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Jun 5, 2014 at 1:26 PM, Ted Yu <[email protected]> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > One option is to write your own Comparator (similar to
>> > > > > > > > > > > BinaryPrefixComparator in essence) that treats the known
>> > > > > > > > > > > column name specially.
>> > > > > > > > > > >
>> > > > > > > > > > > Cheers
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Jun 5, 2014 at 12:52 PM, Vrushali C <[email protected]> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Hi
>> > > > > > > > > > >> Is there a way to do this kind of filtering? In my scan, I want to
>> > > > > > > > > > >> retrieve all columns except for columns starting with a certain
>> > > > > > > > > > >> prefix. But within that set of ignored columns, I have one known
>> > > > > > > > > > >> column name that I want to retrieve, while ignoring the rest.
>> > > > > > > > > > >> The reason is that columns with this prefix have a lot of data and
>> > > > > > > > > > >> I am not interested in any of them EXCEPT one.
>> > > > > > > > > > >>
>> > > > > > > > > > >> So for ignoring the columns with a certain prefix in the scan, I am
>> > > > > > > > > > >> doing something like
>> > > > > > > > > > >> filters.addFilter(
>> > > > > > > > > > >>     new QualifierFilter(CompareFilter.CompareOp.NOT_EQUAL,
>> > > > > > > > > > >>         new BinaryPrefixComparator(
>> > > > > > > > > > >>             Bytes.add(Bytes.toBytes("c!"), Constants.SEP_BYTES))));
>> > > > > > > > > > >>
>> > > > > > > > > > >> Which works. But what I also want to add is something like this:
>> > > > > > > > > > >>
>> > > > > > > > > > >> filters.addFilter(
>> > > > > > > > > > >>     new QualifierFilter(CompareFilter.CompareOp.EQUAL,
>> > > > > > > > > > >>         new BinaryPrefixComparator(
>> > > > > > > > > > >>             Bytes.add(Bytes.toBytes("c!someName"), Constants.SEP_BYTES))));
>> > > > > > > > > > >>
>> > > > > > > > > > >> I realize the two filters contradict each other, so how do I
>> > > > > > > > > > >> achieve this?
>> > > > > > > > > > >>
>> > > > > > > > > > >> thanks
>> > > > > > > > > > >> Vrushali
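To make the two candidate solutions in this thread concrete, here is a plain-Java simulation of how each would decide whether to keep a qualifier. It is not HBase code and makes several assumptions: qualifiers are modeled as Strings, `Constants.SEP_BYTES` is ignored (its value is not shown), MUST_PASS_ONE is treated as a logical OR of the two prefix checks (as Ted describes it), and the regex is assumed to be used with `CompareOp.NOT_EQUAL`, so matching qualifiers are dropped. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class QualifierSelectionSketch {
    // Approach 1: FilterList(MUST_PASS_ONE) of two prefix checks, i.e. a logical OR.
    static boolean keepWithFilterList(String qualifier) {
        boolean passesFirst = !qualifier.startsWith("c!");         // NOT_EQUAL, prefix "c!"
        boolean passesSecond = qualifier.startsWith("c!someName"); // EQUAL, prefix "c!someName"
        return passesFirst || passesSecond;
    }

    // Approach 2: Ted's regex; a qualifier it matches is dropped (NOT_EQUAL assumed).
    static final Pattern EXCLUDE = Pattern.compile("^c\\!((?!someName).)*$");

    static boolean keepWithRegex(String qualifier) {
        // The ^...$ anchors make find() behave like a whole-string match here.
        return !EXCLUDE.matcher(qualifier).find();
    }

    // Applies a keep-predicate to a list of qualifiers, preserving their order.
    static List<String> select(List<String> qualifiers, Predicate<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            if (keep.test(q)) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> qualifiers =
            Arrays.asList("c!notThis", "c!someName", "d!this", "c!xsomeName");
        System.out.println("FilterList: "
            + select(qualifiers, QualifierSelectionSketch::keepWithFilterList));
        System.out.println("Regex:      "
            + select(qualifiers, QualifierSelectionSketch::keepWithRegex));
    }
}
```

Under these assumptions, both approaches agree on the cases walked through in the thread (`c!notThis` dropped, `c!someName` and `d!this` kept). They diverge on a qualifier such as `c!xsomeName`: the prefix filters drop it, while the regex keeps it, because its lookahead rejects `someName` anywhere after the prefix, not only directly after it. Whether that matters depends on the real qualifier layout. The sketch says nothing about relative speed, which the thread suggests measuring against the actual dataset.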
