Hi Lars,
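import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregationCountForMultiFilter {
    private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest");
    private static final byte[] CF = Bytes.toBytes("cf");
    // Sentinel assumed never to occur as a real value; NOT_EQUAL against it,
    // combined with setFilterIfMissing(true), acts as a "column exists" check.
    private static final byte[] FAKE_VALUE = Bytes.toBytes("DOESNOTEXIST");

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        Configuration configuration = HBaseConfiguration.create(conf);
        AggregationClient aggregationClient = new AggregationClient(configuration);

        byte[] colA = Bytes.toBytes("tags");
        byte[] colB = Bytes.toBytes("googleid");
        byte[] colC = Bytes.toBytes("createtime");

        List<Filter> filters = new ArrayList<Filter>();

        // Condition 1: the row has column "tags".
        SingleColumnValueFilter filter1 = new SingleColumnValueFilter(CF, colA,
                CompareOp.NOT_EQUAL, FAKE_VALUE);
        filter1.setFilterIfMissing(true);
        filters.add(filter1);

        // Condition 2: the row has column "googleid".
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(CF, colB,
                CompareOp.NOT_EQUAL, FAKE_VALUE);
        filter2.setFilterIfMissing(true);
        filters.add(filter2);

        // Condition 3: the row has column "createtime" and its value matches the regex.
        SingleColumnValueFilter filter3 = new SingleColumnValueFilter(CF, colC,
                CompareOp.EQUAL, new RegexStringComparator("^2014-01-15"));
        filter3.setFilterIfMissing(true);
        filters.add(filter3);

        // A row is counted only if all three filters accept it.
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL,
                filters);

        Scan scan = new Scan();
        scan.addFamily(CF);
        scan.setFilter(filterList);

        long rowCount = 0;
        try {
            rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.out.println("rowCount: " + rowCount);
    }
}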
The HBase version is 0.94.6-cdh4.3.1.
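For anyone trying to reproduce this without a cluster, the per-row check that the filter list above is meant to express can be modelled in plain Java. This is only a sketch and does not use the HBase API: the class name RowMatchSketch, the helper row(...), and the in-memory Map standing in for one row of family "cf" are all hypothetical, and it assumes the regex is applied with find() semantics, so the "^" anchor turns it into a prefix match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java model of the three conditions; NOT HBase code.
class RowMatchSketch {
    private static final String SENTINEL = "DOESNOTEXIST";
    // Assumption: the comparator uses find(), so "^" makes this a prefix match.
    private static final Pattern CREATE_TIME = Pattern.compile("^2014-01-15");

    // row maps qualifier -> value (hypothetical stand-in for one HBase row).
    static boolean matches(Map<String, String> row) {
        String tags = row.get("tags");
        String googleId = row.get("googleid");
        String createTime = row.get("createtime");
        // Conditions 1 and 2: column present and not equal to the sentinel.
        if (tags == null || tags.equals(SENTINEL)) return false;
        if (googleId == null || googleId.equals(SENTINEL)) return false;
        // Condition 3: column present and value matches the regex.
        return createTime != null && CREATE_TIME.matcher(createTime).find();
    }

    // Counts the rows that satisfy all three conditions.
    static long count(List<Map<String, String>> rows) {
        long n = 0;
        for (Map<String, String> row : rows) {
            if (matches(row)) n++;
        }
        return n;
    }

    // Builds a row from alternating qualifier/value strings.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("tags", "a", "googleid", "g1", "createtime", "2014-01-15 10:00:00"));
        rows.add(row("tags", "b", "createtime", "2014-01-15 11:30:00")); // no googleid
        rows.add(row("tags", "c", "googleid", "g3", "createtime", "2014-01-16 09:00:00"));
        System.out.println("rowCount: " + count(rows));
    }
}
```

With the three sample rows in main, only the first one passes all three checks, so this prints "rowCount: 1". It may be handy as a reference result when seeding a small test table.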
Thanks,
Lei
[email protected]
From: lars hofhansl
Date: 2014-01-18 11:18
To: [email protected]
Subject: Re: Re: How to quickly count the rows that meet several conditions
using hbase coprocessor
Offhand there is no reason for that.
If you send some sample code that can seed the data and then run the filter
that shows the problem, I'll offer to do some profiling.
Which version of HBase are you using?
-- Lars
________________________________
From: "[email protected]" <[email protected]>
To: user <[email protected]>
Cc: user <[email protected]>
Sent: Friday, January 17, 2014 5:24 PM
Subject: Re: Re: How to quickly count the rows that meet several conditions
using hbase coprocessor
Hi,
I have tried it.
For a table with about 600 million row keys, passing just a single
QualifierFilter returns the result quickly.
But when I add the SingleColumnValueFilters in a FilterList, it becomes too
slow to be usable.
I think I could write my own custom aggregation client. Is there any example
or user guide on how to write a custom aggregation client using coprocessors?
Thanks,
Lei
[email protected]
From: Ted Yu
Date: 2014-01-17 18:03
To: [email protected]
CC: user
Subject: Re: How to quickly count the rows that meet several conditions using
hbase coprocessor
Take a look at
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html#rowCount(byte[],%20org.apache.hadoop.hbase.coprocessor.ColumnInterpreter,%20org.apache.hadoop.hbase.client.Scan)
You can pass a custom filter through the Scan parameter.
Cheers
On Jan 16, 2014, at 11:58 PM, "[email protected]" <[email protected]>
wrote:
> Hi,
>
> I know that the HBase coprocessor provides a quick way to count the rows of a
> table.
> But how can I count the rows that meet several conditions?
>
> Take this for example.
> I have an HBase table with one column family and several columns. I want to
> calculate the number of rows that meet 3 conditions:
> has column1
> has column2
> has column3, and the value of column3 satisfies a regular expression
>
> Thanks,
> Lei
>
>
>
> [email protected]