RE: Something like Execution Plan as in the RDBMS world?

Michael Segel Thu, 04 Aug 2011 15:05:48 -0700

Tomas,

If I understand you correctly you have a row key of A,B,C and you wan to fetch 
only the rows on A and C 
You can do a start row of A 
And then do the end row of A1


So that you get the first row for the give vehicle_id, and then stop when the 
vehicle_id changes.

You would then have to do a server side filter on values for C to get the 
timestamp for a given day.
(You could do this with a client side filter, but that means pushing all the 
data over the wire.) 
[Note having said that, you could just do a client side filter since you only 
have 115K rows and you're going to get a subset of that returned by the range 
key.]

The idea of doing something like the following:
SELECT * 
FROM TABLE 
WHERE A=x
AND DAY(C) = y [or some variation]
{A and C are part of a composite index}

doesn't work in HBase.

If your key was ACB, meaning that Vehicle_id, timestamp, device_id  was the 
composite key, then you could do a start/stop range scan using A and C.

Sorry if I'm missing something since I jumped in the middle of a discussion.

-Mike


> Subject: RE: Something like Execution Plan as in the RDBMS world?
> Date: Thu, 4 Aug 2011 12:57:12 +0200
> From: [email protected]
> To: [email protected]; [email protected]
> 
> Hi Andy and Ted!
> 
> Thanks for your reply. Basically, I'm currently trying a range scan and a 
> regex row filter on a very small table (~ 115K rows), just to get used to. 
> Hadoop/HBase ... is running in the available Cloudera VM.
> 
> I have the following row key, as already discussed in other threads.
> 
> vehicle_id: up to 16 characters
> device_id: up to 16 characters
> timestamp: YYYYMMDDhhmmss
> 
> Pretty much one row every 5 minutes for a particular vehicle and device.
> 
> Now I want to get the rows for an entire day for a particular vehicle and 
> device.
> 
> The following range scan implementation:
> 
>       Scan scan = new Scan();
> 
>       String startKey =
>               String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' 
> ', '0') // Vehicle ID
>               + "-"
>               + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, 
> "1").replace(' ', '0') // Device ID
>               + "-"
>               + "20110808000000";
>       String endKey =
>               String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' 
> ', '0') // Vehicle ID
>               + "-"
>               + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, 
> "1").replace(' ', '0') // Device ID
>               + "-"
>               + "20110808235959";
>       scan.setStartRow(Bytes.toBytes(startKey));
>       scan.setStopRow(Bytes.toBytes(endKey));
>       scan.addColumn(Bytes.toBytes("data_details"), 
> Bytes.toBytes("temperature1_value"));
> 
> Takes < 1 sec.
> 
> Whereas the following regex based row filter implementation:
> 
>       List<Filter> filters = new ArrayList<Filter>();
>       RowFilter rf = new RowFilter(
>               CompareFilter.CompareOp.EQUAL
>               , new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
>       );
>       filters.add(rf);
>       
>       QualifierFilter qf = new QualifierFilter(
>               CompareFilter.CompareOp.EQUAL
>               , new RegexStringComparator("temperature1_value")
>       );
>       filters.add(qf);
>       
>       FilterList filterList1 = new FilterList(filters);
>       scan.setFilter(filterList1);
> 
> 
> Takes around 6 sec on a very small table.
> 
> 
> We aren't sure if we need the regex row filter capabilities at all or if 
> range scans are sufficient for our access pattern. But a better understanding 
> on how to optimize regex stuff would be helpful.
> 
> 
> Thanks!
> 
> Thomas
> 
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]] 
> Sent: Mittwoch, 27. Juli 2011 08:25
> To: [email protected]
> Subject: Re: Something like Execution Plan as in the RDBMS world?
> 
> > Or is this a complete different thinking?
> 
> Yes.
> 
> There isn't an "execution plan" when using HBase, as that term is commonly 
> understood from RDBMS systems. The commands you issue against HBase using the 
> client API are executed in order as you issue them.
> 
> > Depending on the access pattern, we might be in a situation to use 
> >e.g. RegEx filters on rowkeys. I wonder if there is some kind of an 
> >execution plan when running a HBase query to better understand
> 
> Exposing filter statistics (hit/skip ratio etc.) and other per-query metrics 
> like number of store files read, how many keys examined, etc. is an 
> interesting idea perhaps along the lines of what you ask, but HBase does not 
> have support for that level of query performance introspection at the moment. 
> 
> What people do is measure the application metrics of interest and try 
> different approaches to optimize them.
> 
> Best regards,
> 
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
> 
> 
> >________________________________
> >From: Steinmaurer Thomas <[email protected]>
> >To: [email protected]
> >Sent: Tuesday, July 26, 2011 11:10 PM
> >Subject: Something like Execution Plan as in the RDBMS world?
> >
> >Hello,
> >
> >
> >
> >we have a three part row-key taking into account that the first part is 
> >important for distribution/partitioning when the system grows. 
> >Depending on the access pattern, we might be in a situation to use e.g. 
> >RegEx filters on rowkeys. I wonder if there is some kind of an 
> >execution plan (as known in RDBMS) when running a HBase query to better 
> >understand how HBase processes the query and what execution path it 
> >takes to generate the result set.
> >
> >
> >
> >Or is this a complete different thinking?
> >
> >
> >
> >Thanks,
> >
> >Thomas
> >
> >
> >
> >
> >
> >

RE: Something like Execution Plan as in the RDBMS world?

Reply via email to