Both DATE and TIME have millisecond granularity (both are stored as 8-byte longs), so I'd recommend using either of those. Phoenix also supports date arithmetic, so you can do queries like this to get the last week's worth of data:
SELECT * FROM SENSOR_DATA WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7;

Also, built-in functions such as TRUNC(dt, 'HOUR') will let you group by in time "buckets", and TO_CHAR(dt) will convert to a string representation. I wouldn't recommend using a string-based representation for your date/time column.

Will most of your queries include a sensor ID (or a small subset of sensor IDs)? If so, then it makes sense to lead your PK with sid. If most of your queries will be across all or a large percentage of your sensors, then it'd be better to lead with your date/time column (and salt the table to prevent write hot spotting).

Thanks,
James

On Mon, Jun 8, 2015 at 9:17 AM, Vladimir Rodionov <[email protected]> wrote:

> There are several time data types natively supported by Phoenix: TIME is
> probably the most suitable for your case (it should have millisecond
> accuracy, but you had better check that yourself).
>
> -Vlad
>
> On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas <[email protected]> wrote:
>>
>> Hi Vladimir,
>>
>> thanks a lot for your input, just some follow-up questions:
>> 1) When you say "try to fit it in long", you mean UNSIGNED_LONG from
>> https://phoenix.apache.org/language/datatypes.html, right?
>> 2) Would a string format also be efficient? Like YYYYMMDDHHmm?
>>
>> Thanks a lot!
>>
>> On 8 June 2015 at 16:48, Vladimir Rodionov <[email protected]> wrote:
>>>
>>> PRIMARY KEY(dt, sid) won't work well for your query.
>>> PRIMARY KEY(sid, dt) is much better for time range queries for a
>>> particular sensor. In the latter case the query will be translated into
>>> an efficient range scan.
>>> Do not use bigint for the timestamp; try to fit it into long, or use a
>>> stringified version in a format suitable for byte-by-byte comparison:
>>>
>>> "2015 06/08 05:23:25.345"
>>>
>>> -Vlad
>>>
>>> On Mon, Jun 8, 2015 at 2:48 AM, Yiannis Gkoufas <[email protected]> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I am investigating Phoenix as a potential data store for time-series
>>>> data from sensors.
>>>> What I am really interested in, as a first milestone, is to have
>>>> efficient time range queries for a particular sensor. The results of
>>>> those queries would consist of 1 or 2 columns (so small rows).
>>>> I was looking for some advice about the schema design and indexes.
>>>> What I have done so far:
>>>>
>>>> CREATE TABLE VALUES (dt bigint not null, sid varchar not null,
>>>> us double, gen double CONSTRAINT PK PRIMARY KEY(dt, sid))
>>>> COMPRESSION='GZ', SALT_BUCKETS=120
>>>>
>>>> loaded the data and performed queries like:
>>>>
>>>> SELECT DT, US, GEN FROM VALUES WHERE DT >= 1369676800000 AND
>>>> DT <= 1370000800000 AND SID='ID1'
>>>>
>>>> Is this the optimal way to go?
>>>>
>>>> Thanks a lot
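[Editor's note: pulling the thread's advice together, a revised schema in the direction James and Vlad suggest might look like the sketch below. The table name SENSOR_DATA is taken from James's query example (VALUES is an awkward table name anyway, being an SQL keyword); the choice of DATE over TIME and the sample queries are illustrative assumptions to verify against your own query mix, not a tested recommendation.]

```sql
-- Lead the PK with sid, since most queries target a single sensor;
-- store the timestamp as DATE (8 bytes, millisecond granularity)
-- rather than a bigint epoch value.
CREATE TABLE SENSOR_DATA (
    sid VARCHAR NOT NULL,
    dt  DATE NOT NULL,
    us  DOUBLE,
    gen DOUBLE
    CONSTRAINT PK PRIMARY KEY (sid, dt)
) COMPRESSION='GZ';

-- Time range query for one sensor: becomes an efficient range scan
-- because (sid, dt) is a prefix of the row key.
SELECT dt, us, gen
FROM SENSOR_DATA
WHERE sid = 'ID1'
  AND dt > CURRENT_DATE() - 7;

-- Hourly "buckets" via TRUNC, as James mentions:
SELECT TRUNC(dt, 'HOUR'), AVG(us)
FROM SENSOR_DATA
WHERE sid = 'ID1'
GROUP BY TRUNC(dt, 'HOUR');
```

Note that SALT_BUCKETS is omitted here: with sid leading the key, writes from many sensors already spread across the cluster. Per James's advice, salting matters mainly if you instead lead with the date/time column, where all new writes would otherwise hit the same region.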
