Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

[email protected] Fri, 17 Jan 2014 20:36:02 -0800

Hi Lars,

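import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregationCountForMultiFilter {

    private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest");
    private static final byte[] CF = Bytes.toBytes("cf");
    // Sentinel value that no real cell is expected to contain.
    private static final byte[] FAKE_VALUE = Bytes.toBytes("DOESNOTEXIST");

    public static void main(String[] args) {

        Configuration conf = new Configuration();
        Configuration configuration = HBaseConfiguration.create(conf);
        AggregationClient aggregationClient = new AggregationClient(configuration);

        byte[] colA = Bytes.toBytes("tags");
        byte[] colB = Bytes.toBytes("googleid");
        byte[] colC = Bytes.toBytes("createtime");

        List<Filter> filters = new ArrayList<Filter>();

        // "Row has column tags": NOT_EQUAL to the sentinel, and drop rows
        // where the column is missing.
        SingleColumnValueFilter filter1 = new SingleColumnValueFilter(CF, colA,
                CompareOp.NOT_EQUAL, FAKE_VALUE);
        filter1.setFilterIfMissing(true);
        filters.add(filter1);

        // "Row has column googleid": same approach.
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(CF, colB,
                CompareOp.NOT_EQUAL, FAKE_VALUE);
        filter2.setFilterIfMissing(true);
        filters.add(filter2);

        // "Row has column createtime and its value matches the regex".
        SingleColumnValueFilter filter3 = new SingleColumnValueFilter(CF, colC,
                CompareOp.EQUAL, new RegexStringComparator("^2014-01-15"));
        filter3.setFilterIfMissing(true);
        filters.add(filter3);

        // A row is counted only if it passes all three filters.
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL,
                filters);

        Scan scan = new Scan();
        scan.addFamily(CF);
        scan.setFilter(filterList);

        long rowCount = 0;
        try {
            rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.out.println("rowCount: " + rowCount);
    }
}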

The HBase version is 0.94.6-cdh4.3.1.

Thanks,
Lei



[email protected]

From: lars hofhansl
Date: 2014-01-18 11:18
To: [email protected]
Subject: Re: Re: How to quickly count the rows that meet several conditions 
using hbase coprocessor
Offhand there is no reason for that.
If you send some sample code that can seed the data and then run the filter 
that shows the problem, I'll offer to do some profiling.

Which version of HBase are you using?

-- Lars 


________________________________
From: "[email protected]" <[email protected]>
To: user <[email protected]> 
Cc: user <[email protected]> 
Sent: Friday, January 17, 2014 5:24 PM
Subject: Re: Re: How to quickly count the rows that meet several conditions 
using hbase coprocessor

Hi, 

I have tried it.
For a table with about 600 million row keys, passing just a single
QualifierFilter returns the result quickly.
But when I add the SingleColumnValueFilters in a FilterList, it becomes too
slow to be usable.

I think I could write my own custom aggregation client. Is there any example
or user guide on how to write a custom aggregation client using a coprocessor?

Thanks,
Lei




[email protected]
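
On the custom aggregation client question above: the built-in AggregationClient.rowCount has each region count its own matching rows and then sums those partial counts on the client, and a custom endpoint would most likely keep that shape. The sketch below models only that structure in plain Java, with no HBase API at all. The "regions" are hypothetical in-memory lists of values, the loop is sequential where a real cluster would work in parallel, and the names RegionCountModel, countInRegion and totalCount are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of a scatter/gather row count; no HBase classes are used. */
final class RegionCountModel {

    /** "Server side": one region counts only its own rows that pass the condition. */
    static long countInRegion(List<String> regionRows, Predicate<String> condition) {
        long count = 0;
        for (String row : regionRows) {
            if (condition.test(row)) {
                count++;
            }
        }
        return count;
    }

    /** "Client side": sum the partial counts, one per region. */
    static long totalCount(List<List<String>> regions, Predicate<String> condition) {
        long total = 0;
        for (List<String> region : regions) {
            total += countInRegion(region, condition);
        }
        return total;
    }

    public static void main(String[] args) {
        // Each inner list stands in for the createtime values held by one region.
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("2014-01-15 08:00", "2014-01-14 23:59"),
                Arrays.asList("2014-01-15 12:30", "2014-01-15 18:45", "2014-01-16 00:01"));
        Predicate<String> onJan15 = value -> value.startsWith("2014-01-15");
        System.out.println("rowCount: " + totalCount(regions, onJan15));
    }
}
```

In this model only one number per region is returned to the caller, so the per-region work of evaluating the condition is where the time would go.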

From: Ted Yu
Date: 2014-01-17 18:03
To: [email protected]
CC: user
Subject: Re: How to quickly count the rows that meet several conditions using 
hbase coprocessor
Take a look at 
http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html#rowCount(byte[],%20org.apache.hadoop.hbase.coprocessor.ColumnInterpreter,%20org.apache.hadoop.hbase.client.Scan)

You can pass a custom filter through the Scan parameter.

Cheers

On Jan 16, 2014, at 11:58 PM, "[email protected]" <[email protected]> 
wrote:

> Hi,
> 
> I know that the HBase coprocessor framework provides a quick way to count
> the rows of a table.
> But how can I count the rows that meet several conditions?
> 
> Take this for example. 
> I have an HBase table with one column family and several columns. I want to 
> calculate the number of rows that meet 3 conditions:
> has column1
> has column2
> has column3, and the value of column3 satisfies a regular expression
> 
> Thanks,
> Lei
> 
> 
> 
> [email protected]
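
The three conditions in the original question can be modeled without a cluster, which may help when checking what a filter list is expected to count on a small sample. The following is a minimal plain-Java sketch, not HBase code: a row is represented as a hypothetical Map from column name to value, and the names RowConditionModel, matches, countMatching and row are illustrative only. It uses Matcher.find() on the assumption that this mirrors how RegexStringComparator applies a pattern to a cell value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Plain-Java model of the three row conditions; no HBase classes are used. */
final class RowConditionModel {

    // Same pattern as in the posted filter: values that start with the date.
    static final Pattern CREATE_TIME = Pattern.compile("^2014-01-15");

    /** True if the row has "tags", has "googleid", and "createtime" matches. */
    static boolean matches(Map<String, String> row) {
        String createTime = row.get("createtime");
        return row.containsKey("tags")
                && row.containsKey("googleid")
                && createTime != null
                && CREATE_TIME.matcher(createTime).find();
    }

    /** Counts rows for which all three conditions hold (MUST_PASS_ALL). */
    static long countMatching(List<Map<String, String>> rows) {
        long count = 0;
        for (Map<String, String> row : rows) {
            if (matches(row)) {
                count++;
            }
        }
        return count;
    }

    /** Helper for building a row from alternating column/value strings. */
    static Map<String, String> row(String... columnsAndValues) {
        Map<String, String> row = new HashMap<String, String>();
        for (int i = 0; i + 1 < columnsAndValues.length; i += 2) {
            row.put(columnsAndValues[i], columnsAndValues[i + 1]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                row("tags", "a", "googleid", "g1", "createtime", "2014-01-15 08:00:00"),
                row("tags", "b", "createtime", "2014-01-15 09:00:00"),   // no googleid
                row("tags", "c", "googleid", "g3", "createtime", "2014-01-16 10:00:00"));
        System.out.println("rowCount: " + countMatching(rows));
    }
}
```

Of the three sample rows, only the first satisfies all conditions: the second lacks googleid and the third has a createtime on a different day.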
