Thanks a lot for your replies! Will try the DATE field and change the order of the Composite Key.
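For anyone following the thread: pulling the suggestions together, the revised schema I'm planning to try would look roughly like the sketch below. This is just my interpretation of the advice (table/column names from my original post; the TO_DATE bounds are placeholder values), not a tested design — corrections welcome. Since most of my queries target a single sensor, the PK leads with sid, so per James's note I've dropped SALT_BUCKETS:

```sql
-- Sketch: DATE column for the timestamp, PK led by sensor ID
-- so single-sensor time-range queries become efficient range scans.
CREATE TABLE SENSOR_DATA (
    sid VARCHAR NOT NULL,
    dt  DATE NOT NULL,
    us  DOUBLE,
    gen DOUBLE
    CONSTRAINT pk PRIMARY KEY (sid, dt)
) COMPRESSION='GZ';

-- Time-range query for one sensor (range scan over the PK):
SELECT dt, us, gen
FROM SENSOR_DATA
WHERE sid = 'ID1'
  AND dt >= TO_DATE('2013-05-27 00:00:00')
  AND dt <= TO_DATE('2013-05-31 00:00:00');

-- Hourly buckets over the last week, using TRUNC and date
-- arithmetic as James suggested:
SELECT TRUNC(dt, 'HOUR') AS hour_bucket, AVG(us)
FROM SENSOR_DATA
WHERE sid = 'ID1' AND dt > CURRENT_DATE() - 7
GROUP BY TRUNC(dt, 'HOUR');
```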
On 8 June 2015 at 17:37, James Taylor <[email protected]> wrote:
> Both DATE and TIME have millisecond granularity (both are stored as 8
> byte longs), so I'd recommend using either of those. Phoenix also
> supports date arithmetic, so you can do queries like this to get the
> last week's worth of data:
>
> SELECT * FROM SENSOR_DATA
> WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7;
>
> Also, built-in functions such as TRUNC(dt, 'HOUR') will let you group
> by in time "buckets", and TO_CHAR(dt) will convert to a string
> representation.
>
> I wouldn't recommend using a string-based representation for your
> date/time column.
>
> Will most of your queries include a sensor ID (or a small subset of
> sensor IDs)? If so, then it makes sense to lead your PK with sid. If
> most of your queries will be across all or a large percentage of your
> sensors, then it'd be better to lead with your date/time column (and
> salt the table to prevent write hot spotting).
>
> Thanks,
> James
>
> On Mon, Jun 8, 2015 at 9:17 AM, Vladimir Rodionov
> <[email protected]> wrote:
> > There are several time data types natively supported by Phoenix; TIME is
> > probably most suitable for your case (it should have millisecond
> > accuracy, but you'd better check it yourself.)
> >
> > -Vlad
> >
> > On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas <[email protected]>
> > wrote:
> >> Hi Vladimir,
> >>
> >> thanks a lot for your input, just some follow-up questions:
> >> 1) When you say "try to fit it in long" you mean UNSIGNED_LONG from
> >> https://phoenix.apache.org/language/datatypes.html right?
> >> 2) Would a string format also be efficient? Like YYYYMMDDHHmm?
> >>
> >> Thanks a lot!
> >>
> >> On 8 June 2015 at 16:48, Vladimir Rodionov <[email protected]>
> >> wrote:
> >>> PRIMARY KEY(dt, sid) won't work well for your query.
> >>> PRIMARY KEY(sid, dt) is much better for time range queries for a
> >>> particular sensor.
> >>> In the latter case this query will be translated into an
> >>> efficient range scan.
> >>> Do not use bigint for the timestamp; try to fit it into long, or use a
> >>> stringified version in a format suitable for byte-by-byte comparison:
> >>>
> >>> "2015 06/08 05:23:25.345"
> >>>
> >>> -Vlad
> >>>
> >>> On Mon, Jun 8, 2015 at 2:48 AM, Yiannis Gkoufas <[email protected]>
> >>> wrote:
> >>>> Hi there,
> >>>>
> >>>> I am investigating Phoenix as a potential data store for time-series
> >>>> data from sensors.
> >>>> What I am really interested in, as a first milestone, is efficient
> >>>> time range queries for a particular sensor. The results of those
> >>>> queries would consist of 1 or 2 columns (so small rows).
> >>>> I was looking for some advice about the schema design and indexes.
> >>>> What I have done so far:
> >>>>
> >>>> CREATE TABLE VALUES (dt bigint not null, sid varchar not null, us double,
> >>>> gen double CONSTRAINT PK PRIMARY KEY(dt, sid)) COMPRESSION='GZ',
> >>>> SALT_BUCKETS=120
> >>>>
> >>>> loaded the data and performed queries like:
> >>>>
> >>>> SELECT DT, US, GEN FROM VALUES WHERE DT >= 1369676800000 AND DT <=
> >>>> 1370000800000 AND SID='ID1'
> >>>>
> >>>> Is this the optimal way to go?
> >>>>
> >>>> Thanks a lot
