Thomas,
If I understand you correctly, you have a row key of A,B,C and you want to fetch
only the rows matching on A and C.
You can set a start row of A and a stop row of A1 (meaning a key that sorts just
after every key beginning with A), so that you get the first row for the given
vehicle_id and then stop when the vehicle_id changes.
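The "A1" stop row above is shorthand for a key that sorts just after every key beginning with A. A minimal sketch of computing it, assuming keys compare as unsigned bytes the way HBase compares them: bump the last byte that isn't 0xFF and drop whatever follows it. The class and method names are hypothetical; newer HBase releases also offer Scan#setRowPrefixFilter, which as far as I know does a similar calculation for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: exclusive stop row for a prefix scan.
class PrefixScanKeys {

    // Returns the smallest key that sorts after every key starting with
    // `prefix` under unsigned byte order, or null if none exists (the prefix
    // is all 0xFF bytes), in which case the scan should run to the table end.
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1); // drop trailing 0xFF bytes
                stop[i]++;                                  // bump the last remaining byte
                return stop;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Include the '-' delimiter so vehicle "57" does not also pick up "570".
        byte[] start = "57-".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // prints "57."
    }
}
```

The scan would then use "57-" as the start row and the returned "57." as the stop row, since '.' is the byte right after '-'.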
You would then have to apply a server-side filter on the C part of the key to
select the timestamps for a given day.
(You could do this with a client-side filter instead, but that means pushing all
the data over the wire.)
[Note: having said that, you could just do a client-side filter, since you only
have 115K rows and you're only going to get a subset of them back from the key
range scan.]
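A small in-memory model of that suggestion, assuming a TreeMap as a stand-in for the table: HBase keeps rows sorted by key, and for plain ASCII keys String order matches HBase's byte order. The range scan is a subMap from "A-" up to (not including) "A.", and the day check on C runs afterwards over just those rows, which is what the client-side variant amounts to. The short unpadded IDs and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of "range scan on A, then filter on C" for keys A-B-C.
class RangeThenFilter {

    static List<String> rowsForVehicleOnDay(NavigableMap<String, String> table,
                                            String vehicle, String day) {
        String start = vehicle + "-"; // inclusive: first possible key for this vehicle
        String stop = vehicle + ".";  // exclusive: '.' is the byte right after '-'
        List<String> result = new ArrayList<>();
        // Range scan: only rows for this vehicle (any device) are visited.
        for (String key : table.subMap(start, true, stop, false).keySet()) {
            // Filter on C, the timestamp after the last '-'.
            String timestamp = key.substring(key.lastIndexOf('-') + 1);
            if (timestamp.startsWith(day)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("57-1-20110807235500", "v0");
        table.put("57-1-20110808000000", "v1");
        table.put("57-2-20110808120000", "v2");
        table.put("57-1-20110809000000", "v3");
        table.put("58-1-20110808000000", "v4");
        System.out.println(rowsForVehicleOnDay(table, "57", "20110808"));
        // prints [57-1-20110808000000, 57-2-20110808120000]
    }
}
```

Only the four "57-" rows are ever touched; the "58-" row lies outside the range and is never read.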
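Doing something like the following, as you would in an RDBMS:
SELECT *
FROM TABLE
WHERE A=x
AND DAY(C) = y [or some variation]
{A and C are part of a composite index}
doesn't work in HBase, because the row key is the only index; there are no
secondary or composite indexes to look up A and C independently.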
If your key were ACB, meaning that vehicle_id, timestamp, device_id made up the
composite key, then you could do a plain start/stop range scan using A and C.
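With the ACB layout, "one vehicle, one day, all devices" becomes a contiguous key range, so no filter is needed at all. A sketch using the same kind of TreeMap stand-in for the table, with hypothetical names; the stop key is the next day's date because HBase stop rows are exclusive.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of the ACB key layout: vehicle_id-timestamp-device_id.
class VehicleTimeDeviceKey {

    static String rowKey(String vehicle, String timestamp, String device) {
        return vehicle + "-" + timestamp + "-" + device;
    }

    // One vehicle, one day, all devices: a pure start/stop range, no filter.
    static NavigableMap<String, String> dayScan(NavigableMap<String, String> table,
                                                String vehicle, LocalDate day) {
        DateTimeFormatter f = DateTimeFormatter.BASIC_ISO_DATE;   // yyyyMMdd
        String start = vehicle + "-" + day.format(f);             // inclusive
        String stop = vehicle + "-" + day.plusDays(1).format(f);  // exclusive
        return table.subMap(start, true, stop, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("57", "20110807235500", "1"), "v0");
        table.put(rowKey("57", "20110808000000", "1"), "v1");
        table.put(rowKey("57", "20110808000000", "2"), "v2");
        table.put(rowKey("57", "20110808235500", "1"), "v3");
        table.put(rowKey("57", "20110809000000", "1"), "v4");
        System.out.println(dayScan(table, "57", LocalDate.of(2011, 8, 8)).keySet());
        // prints [57-20110808000000-1, 57-20110808000000-2, 57-20110808235500-1]
    }
}
```

The trade-off is that a query for one device across many days now needs a filter on B, so the layout should follow whichever access pattern matters most.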
Sorry if I'm missing something; I jumped into the middle of the discussion.
-Mike
> Subject: RE: Something like Execution Plan as in the RDBMS world?
> Date: Thu, 4 Aug 2011 12:57:12 +0200
> From: [email protected]
> To: [email protected]; [email protected]
>
> Hi Andy and Ted!
>
> Thanks for your reply. Basically, I'm currently trying a range scan and a
> regex row filter on a very small table (~115K rows), just to get used to it.
> Hadoop/HBase ... is running in the available Cloudera VM.
>
> I have the following row key, as already discussed in other threads.
>
> vehicle_id: up to 16 characters
> device_id: up to 16 characters
> timestamp: YYYYMMDDhhmmss
>
> Pretty much one row every 5 minutes for a particular vehicle and device.
>
> Now I want to get the rows for an entire day for a particular vehicle and
> device.
>
> The following range scan implementation:
>
> Scan scan = new Scan();
>
> String startKey =
>     String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
>     + "-"
>     + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>     + "-"
>     + "20110808000000";
> // The stop row is exclusive, so use the first key of the next day;
> // "20110808235959" would miss a row stored at exactly 23:59:59.
> String endKey =
>     String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
>     + "-"
>     + String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
>     + "-"
>     + "20110809000000";
> scan.setStartRow(Bytes.toBytes(startKey));
> scan.setStopRow(Bytes.toBytes(endKey));
> scan.addColumn(Bytes.toBytes("data_details"),
>     Bytes.toBytes("temperature1_value"));
>
> Takes < 1 sec.
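Because HBase treats the stop row as exclusive, a stop key ending in 235959 would skip a row stored at exactly 23:59:59; the first possible key of the next day covers the whole day. A sketch of building both bounds, assuming HBASE_ROWKEY_DATASOURCEID_FORMAT is "%16s" (inferred from the .{14}57 and .{15}1 parts of the regex further down, not confirmed anywhere); the helper names are hypothetical. The resulting strings would go through Bytes.toBytes into setStartRow/setStopRow as in the code above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical key builder for the vehicle-device-timestamp layout.
class DayScanKeys {

    // ASSUMPTION: HBASE_ROWKEY_DATASOURCEID_FORMAT is "%16s"; this matches the
    // shape of the regex filter in this thread but is not confirmed.
    static final String ID_FORMAT = "%16s";

    // Right-justify to 16 chars, then turn the space padding into zeros.
    static String padId(String id) {
        return String.format(ID_FORMAT, id).replace(' ', '0');
    }

    static String rowKey(String vehicle, String device, String timestamp) {
        return padId(vehicle) + "-" + padId(device) + "-" + timestamp;
    }

    // {start, stop} covering one whole day; stop is exclusive, so it is the
    // first possible key of the following day.
    static String[] dayBounds(String vehicle, String device, LocalDate day) {
        DateTimeFormatter f = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd
        String start = rowKey(vehicle, device, day.format(f) + "000000");
        String stop = rowKey(vehicle, device, day.plusDays(1).format(f) + "000000");
        return new String[] {start, stop};
    }

    public static void main(String[] args) {
        String[] b = dayBounds("57", "1", LocalDate.of(2011, 8, 8));
        System.out.println(b[0]); // 0000000000000057-0000000000000001-20110808000000
        System.out.println(b[1]); // 0000000000000057-0000000000000001-20110809000000
    }
}
```

Computing the next day with LocalDate also handles month and year boundaries, which string concatenation of "235959" does not need but "next day" literals would.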
>
> Whereas the following regex based row filter implementation:
>
> List<Filter> filters = new ArrayList<Filter>();
> RowFilter rf = new RowFilter(
> CompareFilter.CompareOp.EQUAL
> , new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
> );
> filters.add(rf);
>
> QualifierFilter qf = new QualifierFilter(
> CompareFilter.CompareOp.EQUAL
> , new RegexStringComparator("temperature1_value")
> );
> filters.add(qf);
>
> FilterList filterList1 = new FilterList(filters);
> scan.setFilter(filterList1);
>
>
> Takes around 6 sec on a very small table.
>
>
> We aren't sure whether we need the regex row filter capabilities at all or
> whether range scans are sufficient for our access pattern. But a better
> understanding of how to optimize regex filters would be helpful.
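On optimizing the regex approach: a RowFilter on a Scan without start/stop rows still has to read and test every row in the table on the region servers, so its cost tracks table size rather than result size. One common mitigation is to put whatever fixed prefix you know into the start/stop rows and leave only the variable part to the regex. The sketch below counts rows examined both ways using an in-memory TreeMap instead of HBase; the sample data, sizes and names are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

// In-memory model of why an unbounded regex row filter is slow: it examines
// every row, while a start/stop range only touches the relevant slice.
class RegexScanCost {

    // Zero-padded key in the vehicle-device-timestamp layout from this thread.
    static String key(int vehicle, int device, String timestamp) {
        return String.format("%016d-%016d-%s", vehicle, device, timestamp);
    }

    // Hypothetical sample data: one row every 5 minutes over two days.
    static NavigableMap<String, String> sampleTable(int vehicles, int devices) {
        NavigableMap<String, String> table = new TreeMap<>();
        String[] days = {"20110807", "20110808"};
        for (int v = 1; v <= vehicles; v++) {
            for (int d = 1; d <= devices; d++) {
                for (String day : days) {
                    for (int m = 0; m < 24 * 60; m += 5) {
                        String ts = day + String.format("%02d%02d00", m / 60, m % 60);
                        table.put(key(v, d, ts), "value");
                    }
                }
            }
        }
        return table;
    }

    // Returns {rows examined, rows matched} for a regex over the given keys.
    static int[] regexOver(Iterable<String> keys, Pattern rowRegex) {
        int examined = 0;
        int matched = 0;
        for (String k : keys) {
            examined++;
            if (rowRegex.matcher(k).matches()) {
                matched++;
            }
        }
        return new int[] {examined, matched};
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable(20, 2);
        String prefix = key(12, 1, ""); // "...0012-...0001-"
        Pattern dayRegex = Pattern.compile(Pattern.quote(prefix) + "20110808\\d{6}");

        // Unbounded: like a RowFilter on a Scan without start/stop rows.
        int[] full = regexOver(table.keySet(), dayRegex);

        // Bounded: same regex, but only over [prefix, prefix with '-' -> '.').
        String stop = prefix.substring(0, prefix.length() - 1) + ".";
        int[] bounded = regexOver(table.subMap(prefix, true, stop, false).keySet(), dayRegex);

        System.out.printf("full scan:    examined %d, matched %d%n", full[0], full[1]);
        System.out.printf("bounded scan: examined %d, matched %d%n", bounded[0], bounded[1]);
    }
}
```

Both variants find the same 288 rows; only the number of rows examined differs (23040 versus 576 in this sample), and that gap widens as the table grows.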
>
>
> Thanks!
>
> Thomas
>
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Wednesday, 27 July 2011 08:25
> To: [email protected]
> Subject: Re: Something like Execution Plan as in the RDBMS world?
>
> > Or is this a completely different way of thinking?
>
> Yes.
>
> There isn't an "execution plan" when using HBase, as that term is commonly
> understood from RDBMS systems. The commands you issue against HBase using the
> client API are executed in order as you issue them.
>
> > Depending on the access pattern, we might be in a situation to use
> >e.g. RegEx filters on rowkeys. I wonder if there is some kind of an
> >execution plan when running an HBase query to better understand
>
> Exposing filter statistics (hit/skip ratio, etc.) and other per-query metrics,
> like the number of store files read or how many keys were examined, is an
> interesting idea, perhaps along the lines of what you ask, but HBase does not
> support that level of query performance introspection at the moment.
>
> What people do is measure the application metrics of interest and try
> different approaches to optimize them.
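Along those lines, a minimal timing helper for comparing two approaches, such as the range scan and the regex filter earlier in this thread. It assumes you wrap each approach in a Runnable (for HBase that would mean opening a scanner with getScanner(scan), iterating the results and closing it); the class name, warm-up count and run count are arbitrary. Taking the median after a few warm-up runs keeps a single cold first run from dominating the comparison.

```java
import java.util.Arrays;

// Hypothetical helper for comparing approaches by wall-clock time.
class ScanTimer {

    // Runs `action` `warmups` times untimed, then `runs` times timed,
    // and returns the median duration in milliseconds.
    static double medianMillis(Runnable action, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            action.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            action.run();
            nanos[i] = System.nanoTime() - t0;
        }
        Arrays.sort(nanos);
        long median = (runs % 2 == 1)
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workloads; replace with the real range-scan and regex-filter code.
        double a = medianMillis(() -> busyWork(100_000), 3, 9);
        double b = medianMillis(() -> busyWork(1_000_000), 3, 9);
        System.out.printf("approach A: %.3f ms, approach B: %.3f ms%n", a, b);
    }

    private static long sink;

    // Stand-in workload so the example runs without a cluster.
    private static void busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i % 7;
        }
        sink = acc; // publish the result so the loop is not trivially dead code
    }
}
```

Counting rows returned alongside the timing is worthwhile too, so that a faster approach is not simply returning less data.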
>
> Best regards,
>
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
>
>
> >________________________________
> >From: Steinmaurer Thomas <[email protected]>
> >To: [email protected]
> >Sent: Tuesday, July 26, 2011 11:10 PM
> >Subject: Something like Execution Plan as in the RDBMS world?
> >
> >Hello,
> >
> >We have a three-part row key, where the first part matters for
> >distribution/partitioning as the system grows.
> >Depending on the access pattern, we might end up using e.g.
> >RegEx filters on row keys. I wonder if there is some kind of
> >execution plan (as known in RDBMS) when running an HBase query, to better
> >understand how HBase processes the query and what execution path it
> >takes to generate the result set.
> >
> >Or is this a completely different way of thinking?
> >
> >Thanks,
> >
> >Thomas