Hi Andy and Ted!

Thanks for your reply. Basically, I'm currently trying a range scan and a regex 
row filter on a very small table (~115K rows), just to get familiar with them. 
Hadoop/HBase etc. are running in the available Cloudera VM.

I have the following row key, as already discussed in other threads.

vehicle_id: up to 16 characters
device_id: up to 16 characters
timestamp: YYYYMMDDhhmmss
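
For reference, the key layout above can be sketched as a small helper. This is only an illustration: it assumes both IDs are right-aligned in a 16-character field and zero-padded (which is what the regex further down implies), and that the three parts are joined with "-". The names RowKeys, ID_WIDTH, pad and build are made up for this sketch and are not part of our code base.

```java
final class RowKeys {
    // Assumption: IDs occupy a fixed 16-character field, left-padded with '0'.
    static final int ID_WIDTH = 16;

    // Right-align the ID in a 16-character field and fill the gap with zeros.
    static String pad(String id) {
        if (id.length() > ID_WIDTH) {
            throw new IllegalArgumentException("ID longer than " + ID_WIDTH + " characters: " + id);
        }
        return String.format("%" + ID_WIDTH + "s", id).replace(' ', '0');
    }

    // Compose vehicle ID, device ID and a YYYYMMDDhhmmss timestamp into one key.
    static String build(String vehicleId, String deviceId, String timestamp) {
        if (timestamp.length() != 14) {
            throw new IllegalArgumentException("timestamp must be YYYYMMDDhhmmss: " + timestamp);
        }
        return pad(vehicleId) + "-" + pad(deviceId) + "-" + timestamp;
    }
}
```

Because every part has a fixed width, the keys sort lexicographically by vehicle, then device, then time, which is what makes a range scan over one day possible.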

Pretty much one row every 5 minutes for a particular vehicle and device.

Now I want to get the rows for an entire day for a particular vehicle and 
device.

The following range scan implementation:

        Scan scan = new Scan();

        String startKey =
                String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
                + "-"
                + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
                + "-"
                + "20110808000000";
        String endKey =
                String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
                + "-"
                + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
                + "-"
                + "20110808235959";
        scan.setStartRow(Bytes.toBytes(startKey));
        scan.setStopRow(Bytes.toBytes(endKey));
        scan.addColumn(Bytes.toBytes("data_details"), Bytes.toBytes("temperature1_value"));

Takes < 1 sec.
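
One detail about the range above: the stop row of a Scan is exclusive, so a row stamped exactly 23:59:59 would not be returned with that end key. A possible way around this, sketched here under the assumption that the timestamp part is always YYYYMMDDhhmmss, is to use midnight of the following day as the stop key. The class DayRange and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DayRange {
    // BASIC_ISO_DATE parses and prints dates as yyyyMMdd, e.g. 20110808.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Timestamp part of the start key: midnight of the requested day (inclusive).
    static String startSuffix(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DAY).format(DAY) + "000000";
    }

    // Timestamp part of the stop key: midnight of the next day (exclusive),
    // so every timestamp of the requested day sorts before it.
    static String stopSuffixExclusive(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DAY).plusDays(1).format(DAY) + "000000";
    }
}
```

The two suffixes would replace the hard-coded "20110808000000" and "20110808235959" when building startKey and endKey.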

Whereas the following regex based row filter implementation:

        List<Filter> filters = new ArrayList<Filter>();
        RowFilter rf = new RowFilter(
                CompareFilter.CompareOp.EQUAL
                , new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
        );
        filters.add(rf);
        
        QualifierFilter qf = new QualifierFilter(
                CompareFilter.CompareOp.EQUAL
                , new RegexStringComparator("temperature1_value")
        );
        filters.add(qf);
        
        FilterList filterList1 = new FilterList(filters);
        scan.setFilter(filterList1);


Takes around 6 sec on a very small table.
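
My current understanding of the difference, which may well be incomplete: rows are stored sorted by key, so a scan with start and stop rows only touches the rows inside that range, whereas a RowFilter on a scan without bounds has to be tested against every row in the table. The sketch below is only an analogy in plain Java, using a sorted set in place of the table; it does not use the HBase API, and the class ScanSketch, its sample data and its counters are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class ScanSketch {
    // Rows returned plus the number of keys that had to be looked at.
    static final class Result {
        final List<String> rows;
        final int examined;
        Result(List<String> rows, int examined) { this.rows = rows; this.examined = examined; }
    }

    // Hypothetical sample data: 10 vehicles, 1 device, 10 days, one row per hour.
    static TreeSet<String> sampleKeys() {
        TreeSet<String> keys = new TreeSet<>();
        for (int vehicle = 50; vehicle < 60; vehicle++)
            for (int day = 1; day <= 10; day++)
                for (int hour = 0; hour < 24; hour++)
                    keys.add(String.format("%016d-%016d-201108%02d%02d0000", vehicle, 1, day, hour));
        return keys;
    }

    // Range scan: jump to the start key and stop at the (exclusive) stop key.
    static Result rangeScan(TreeSet<String> keys, String start, String stop) {
        List<String> rows = new ArrayList<>(keys.subSet(start, true, stop, false));
        return new Result(rows, rows.size());
    }

    // Regex filter without bounds: every key is tested against the pattern.
    static Result regexScan(TreeSet<String> keys, Pattern pattern) {
        List<String> rows = new ArrayList<>();
        int examined = 0;
        for (String key : keys) {
            examined++;
            if (pattern.matcher(key).matches()) rows.add(key);
        }
        return new Result(rows, examined);
    }
}
```

In this toy data set both approaches return the same 24 rows, but the range version looks at 24 keys and the regex version at all 2400. If the regex is still needed, setting start and stop rows on the same Scan alongside the filter should limit how many rows the filter has to test.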


We aren't sure whether we need the regex row filter capabilities at all, or 
whether range scans are sufficient for our access pattern. But a better 
understanding of how to optimize regex row filters would be helpful.


Thanks!

Thomas


-----Original Message-----
From: Andrew Purtell [mailto:[email protected]] 
Sent: Wednesday, July 27, 2011 08:25
To: [email protected]
Subject: Re: Something like Execution Plan as in the RDBMS world?

> Or is this a complete different thinking?

Yes.

There isn't an "execution plan" when using HBase, as that term is commonly 
understood from RDBMS systems. The commands you issue against HBase using the 
client API are executed in order as you issue them.

> Depending on the access pattern, we might be in a situation to use 
> e.g. RegEx filters on rowkeys. I wonder if there is some kind of an 
> execution plan when running a HBase query to better understand

Exposing filter statistics (hit/skip ratio etc.) and other per-query metrics 
like number of store files read, how many keys examined, etc. is an interesting 
idea perhaps along the lines of what you ask, but HBase does not have support 
for that level of query performance introspection at the moment. 

What people do is measure the application metrics of interest and try different 
approaches to optimize them.
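
A minimal sketch of such a measurement on the client side, assuming wall-clock time per query is the metric of interest. The class QueryTimer and its members are hypothetical names, and the Supplier stands in for whatever code runs the scan and collects the results.

```java
import java.util.function.Supplier;

final class QueryTimer {
    // Result of the measured call together with its wall-clock duration.
    static final class Timed<T> {
        final T result;
        final long elapsedNanos;
        Timed(T result, long elapsedNanos) { this.result = result; this.elapsedNanos = elapsedNanos; }
        double elapsedMillis() { return elapsedNanos / 1_000_000.0; }
    }

    // Run the query once and record how long it took.
    static <T> Timed<T> time(Supplier<T> query) {
        long start = System.nanoTime();
        T result = query.get();
        return new Timed<>(result, System.nanoTime() - start);
    }
}
```

A single run is noisy (caches, JVM warm-up), so repeating each variant several times and comparing the spread gives a fairer picture than one number.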

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


>________________________________
>From: Steinmaurer Thomas <[email protected]>
>To: [email protected]
>Sent: Tuesday, July 26, 2011 11:10 PM
>Subject: Something like Execution Plan as in the RDBMS world?
>
>Hello,
>
>
>
>we have a three part row-key taking into account that the first part is 
>important for distribution/partitioning when the system grows. 
>Depending on the access pattern, we might be in a situation to use e.g. 
>RegEx filters on rowkeys. I wonder if there is some kind of an 
>execution plan (as known in RDBMS) when running a HBase query to better 
>understand how HBase processes the query and what execution path it 
>takes to generate the result set.
>
>
>
>Or is this a complete different thinking?
>
>
>
>Thanks,
>
>Thomas