Both DATE and TIME have millisecond granularity (both are stored as
8-byte longs), so I'd recommend using either of those. Phoenix also
supports date arithmetic, so you can do queries like this to get the
last week's worth of data:

SELECT * FROM SENSOR_DATA
WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7;

Also, built-in functions such as TRUNC(dt, 'HOUR') will let you group
by time "buckets", and TO_CHAR(dt) will convert a date/time value to
its string representation.
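
For example, hourly averages per sensor could be computed with a query
along these lines (a sketch; "val" stands in for whichever value column
your table actually has):

SELECT TRUNC(dt, 'HOUR') AS hr, AVG(val)
FROM SENSOR_DATA
WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7
GROUP BY TRUNC(dt, 'HOUR');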

I wouldn't recommend using a string-based representation for your
date/time column.

Will most of your queries include a sensor ID (or a small subset of
sensor IDs)? If so, then it makes sense to lead your PK with sid. If
most of your queries will be across all or a large percentage of your
sensors, then it'd be better to lead with your date/time column (and
salt the table to prevent write hot spotting).
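
For example, a sid-leading version of your table could look like this
(a sketch based on the DDL from your earlier message, with dt switched
to a DATE):

CREATE TABLE SENSOR_DATA (
    sid VARCHAR NOT NULL,
    dt DATE NOT NULL,
    us DOUBLE,
    gen DOUBLE
    CONSTRAINT pk PRIMARY KEY (sid, dt))
COMPRESSION='GZ';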

Thanks,
James

On Mon, Jun 8, 2015 at 9:17 AM, Vladimir Rodionov
<[email protected]> wrote:
> There are several time data types natively supported by Phoenix; TIME is
> probably the most suitable for your case (it should have millisecond
> accuracy, but you'd better check that yourself).
>
> -Vlad
>
> On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas <[email protected]>
> wrote:
>>
>> Hi Vladimir,
>>
>> thanks a lot for your input, just some followup questions:
>> 1) When you say "try to fit it in long" you mean UNSIGNED_LONG from
>> https://phoenix.apache.org/language/datatypes.html right?
>> 2) Would a string format also be efficient? Something like YYYYMMDDHHmm, right?
>>
>> Thanks a lot!
>>
>> On 8 June 2015 at 16:48, Vladimir Rodionov <[email protected]> wrote:
>>>
>>> PRIMARY KEY (dt, sid) won't work well for your query.
>>> PRIMARY KEY (sid, dt) is much better for time range queries for a
>>> particular sensor. In the latter case, this query will be translated into
>>> an efficient range scan.
>>> Do not use BIGINT for the timestamp; try to fit it into a long, or use a
>>> stringified version in a format suitable for byte-by-byte comparison:
>>>
>>> "2015 06/08 05:23:25.345"
>>>
>>> -Vlad
>>>
>>> On Mon, Jun 8, 2015 at 2:48 AM, Yiannis Gkoufas <[email protected]>
>>> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I am investigating Phoenix as a potential data-store for time-series
>>>> data on sensors.
>>>> What I am really interested in, as a first milestone, is to have
>>>> efficient time range queries for a particular sensor. The results of
>>>> those queries would consist of 1 or 2 columns (so small rows).
>>>> I was looking for some advice about the schema design and indexes.
>>>> What I have done so far:
>>>>
>>>> CREATE TABLE VALUES (dt bigint not null,sid varchar not null, us double,
>>>> gen double CONSTRAINT PK PRIMARY KEY(dt,sid)) COMPRESSION='GZ',
>>>> SALT_BUCKETS=120
>>>>
>>>> loaded the data and performed queries like:
>>>>
>>>> SELECT DT,US,GEN FROM VALUES WHERE DT>=1369676800000 AND DT <=
>>>> 1370000800000 AND SID='ID1'
>>>>
>>>> Is this the optimal way to go?
>>>>
>>>> Thanks a lot
>>>
>>>
>>
>