Your first scan is set to start with an offset in the table and has a stop row, whereas the other one is a full table scan with filtering (if my understanding is correct). Compare the second scan with a straight up full table scan and you should see where the slowdown comes from (should be from the fact that it has to read everything).
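To make that point concrete, here is a toy model in plain Java (no HBase involved, purely illustrative): row keys held in lexicographic order, the way rows sit in a region. A start/stop-row scan can seek straight to the range, while a RowFilter scan has to read and test every row — the key counts below show the difference J-D describes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy model of the two scan types (plain Java, no HBase): row keys live in
// lexicographic order, like rows in a region. A start/stop-row scan seeks
// straight to the range; a RowFilter scan must examine every row.
public class ScanVsFilter {
    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        // Hypothetical keys: paddedVehicleId-paddedDeviceId-timestamp
        table.put("0000000000000057-0000000000000001-20110807120000", "a");
        table.put("0000000000000057-0000000000000001-20110808060000", "b");
        table.put("0000000000000057-0000000000000001-20110808180000", "c");
        table.put("0000000000000099-0000000000000002-20110808060000", "d");

        // "Range scan": subMap jumps to the start key, stops at the stop key.
        Map<String, String> range = table.subMap(
            "0000000000000057-0000000000000001-20110808000000",
            "0000000000000057-0000000000000001-20110808235959");
        System.out.println("range scan examined " + range.size() + " rows");

        // "Filter scan": every row is read and tested against the regex.
        Pattern p = Pattern.compile(".{14}57\\-.{15}1\\-20110808.{6}");
        int examined = 0, matched = 0;
        for (String key : table.keySet()) {
            examined++;
            if (p.matcher(key).matches()) matched++;
        }
        System.out.println("filter scan examined " + examined
            + " rows, matched " + matched);
    }
}
```

Both approaches return the same two rows, but the filter version touches all four keys; scale that up to millions of rows and the cost difference dominates.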
Hope that helps,

J-D

On Thu, Aug 4, 2011 at 3:57 AM, Steinmaurer Thomas
<[email protected]> wrote:
> Hi Andy and Ted!
>
> Thanks for your reply. Basically, I'm currently trying a range scan and a
> regex row filter on a very small table (~ 115K rows), just to get used to
> it. Hadoop/HBase ... is running in the available Cloudera VM.
>
> I have the following row key, as already discussed in other threads:
>
> vehicle_id: up to 16 characters
> device_id: up to 16 characters
> timestamp: YYYYMMDDhhmmss
>
> Pretty much one row every 5 minutes for a particular vehicle and device.
>
> Now I want to get the rows for an entire day for a particular vehicle and
> device.
>
> The following range scan implementation:
>
> Scan scan = new Scan();
>
> String startKey =
>     String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0')  // Vehicle ID
>     + "-"
>     + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>     + "-"
>     + "20110808000000";
> String endKey =
>     String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0')  // Vehicle ID
>     + "-"
>     + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>     + "-"
>     + "20110808235959";
> // Note: the stop row is exclusive, so a row stamped exactly 235959 would
> // be skipped; using "20110808235959~" (or the next day's 000000) avoids that.
> scan.setStartRow(Bytes.toBytes(startKey));
> scan.setStopRow(Bytes.toBytes(endKey));
> scan.addColumn(Bytes.toBytes("data_details"),
>     Bytes.toBytes("temperature1_value"));
>
> Takes < 1 sec.
>
> Whereas the following regex-based row filter implementation:
>
> List<Filter> filters = new ArrayList<Filter>();
> RowFilter rf = new RowFilter(
>     CompareFilter.CompareOp.EQUAL,
>     new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}"));
> filters.add(rf);
>
> QualifierFilter qf = new QualifierFilter(
>     CompareFilter.CompareOp.EQUAL,
>     new RegexStringComparator("temperature1_value"));
> filters.add(qf);
>
> FilterList filterList1 = new FilterList(filters);
> scan.setFilter(filterList1);
>
> Takes around 6 sec on a very small table.
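For what it's worth, the two approaches do select the same keys. A small standalone check, assuming HBASE_ROWKEY_DATASOURCEID_FORMAT is "%16s" (right-aligned, width 16 — that constant isn't shown in the thread, so this is a guess at its value):

```java
import java.util.regex.Pattern;

public class KeyFormatCheck {
    // Assumption: the constant from the thread pads IDs to 16 characters.
    static final String HBASE_ROWKEY_DATASOURCEID_FORMAT = "%16s";

    static String key(String vehicleId, String deviceId, String timestamp) {
        return String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, vehicleId).replace(' ', '0')
            + "-"
            + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, deviceId).replace(' ', '0')
            + "-"
            + timestamp;
    }

    public static void main(String[] args) {
        // 16 + 1 + 16 + 1 + 14 = 48 characters total
        String k = key("57", "1", "20110808123000");
        System.out.println(k); // 0000000000000057-0000000000000001-20110808123000

        // The regex from the filter version matches the same day's keys...
        Pattern p = Pattern.compile(".{14}57\\-.{15}1\\-20110808.{6}");
        System.out.println(p.matcher(k).matches()); // true
        // ...but HBase still has to read every row in the table to test it,
        // hence the ~6 sec vs < 1 sec difference on even a small table.
    }
}
```

So the regex isn't wrong, it's just applied after the rows are read; the start/stop keys let the scan skip straight to the slice of the table it needs.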
>
> We aren't sure if we need the regex row filter capabilities at all or if
> range scans are sufficient for our access pattern. But a better
> understanding of how to optimize regex stuff would be helpful.
>
> Thanks!
>
> Thomas
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Mittwoch, 27. Juli 2011 08:25
> To: [email protected]
> Subject: Re: Something like Execution Plan as in the RDBMS world?
>
>> Or is this a completely different way of thinking?
>
> Yes.
>
> There isn't an "execution plan" when using HBase, as that term is commonly
> understood from RDBMS systems. The commands you issue against HBase using
> the client API are executed in order as you issue them.
>
>> Depending on the access pattern, we might be in a situation to use
>> e.g. RegEx filters on rowkeys. I wonder if there is some kind of an
>> execution plan when running a HBase query to better understand
>
> Exposing filter statistics (hit/skip ratio etc.) and other per-query
> metrics like number of store files read, how many keys examined, etc. is
> an interesting idea, perhaps along the lines of what you ask, but HBase
> does not have support for that level of query performance introspection
> at the moment.
>
> What people do is measure the application metrics of interest and try
> different approaches to optimize them.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>> ________________________________
>> From: Steinmaurer Thomas <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, July 26, 2011 11:10 PM
>> Subject: Something like Execution Plan as in the RDBMS world?
>>
>> Hello,
>>
>> we have a three-part row key, taking into account that the first part is
>> important for distribution/partitioning when the system grows.
>> Depending on the access pattern, we might be in a situation to use e.g.
>> RegEx filters on rowkeys.
>> I wonder if there is some kind of an execution plan (as known in RDBMS)
>> when running a HBase query, to better understand how HBase processes the
>> query and what execution path it takes to generate the result set.
>>
>> Or is this a completely different way of thinking?
>>
>> Thanks,
>>
>> Thomas
