Thanks a lot for your replies! Will try the DATE field and change the order of the Composite Key.
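For anyone following the thread: pulling the suggestions together, the revised schema I'm planning to try would look roughly like the sketch below. This is just my interpretation of the advice (table/column names from my original post; the TO_DATE bounds are placeholder values), not a tested design — corrections welcome. Since most of my queries target a single sensor, the PK leads with sid, so per James's note I've dropped SALT_BUCKETS:

```sql
-- Sketch: DATE column for the timestamp, PK led by sensor ID
-- so single-sensor time-range queries become efficient range scans.
CREATE TABLE SENSOR_DATA (
    sid VARCHAR NOT NULL,
    dt  DATE NOT NULL,
    us  DOUBLE,
    gen DOUBLE
    CONSTRAINT pk PRIMARY KEY (sid, dt)
) COMPRESSION='GZ';

-- Time-range query for one sensor (range scan over the PK):
SELECT dt, us, gen
FROM SENSOR_DATA
WHERE sid = 'ID1'
  AND dt >= TO_DATE('2013-05-27 00:00:00')
  AND dt <= TO_DATE('2013-05-31 00:00:00');

-- Hourly buckets over the last week, using TRUNC and date
-- arithmetic as James suggested:
SELECT TRUNC(dt, 'HOUR') AS hour_bucket, AVG(us)
FROM SENSOR_DATA
WHERE sid = 'ID1' AND dt > CURRENT_DATE() - 7
GROUP BY TRUNC(dt, 'HOUR');
```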
On 8 June 2015 at 17:37, James Taylor <[email protected]> wrote:
> Both DATE and TIME have millisecond granularity (both are stored as 8
> byte longs), so I'd recommend using either of those. Phoenix also
> supports date arithmetic, so you can do queries like this to get the
> last week's worth of data:
>
> SELECT * FROM SENSOR_DATA
> WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7;
>
> Also, built-in functions such as TRUNC(dt, 'HOUR') will let you group
> by in time "buckets", and TO_CHAR(dt) will convert to a string
> representation.
>
> I wouldn't recommend using a string-based representation for your
> date/time column.
>
> Will most of your queries include a sensor ID (or a small subset of
> sensor IDs)? If so, then it makes sense to lead your PK with sid. If
> most of your queries will be across all or a large percentage of your
> sensors, then it'd be better to lead with your date/time column (and
> salt the table to prevent write hot spotting).
>
> Thanks,
> James
>
> On Mon, Jun 8, 2015 at 9:17 AM, Vladimir Rodionov
> <[email protected]> wrote:
> > There are several time data types natively supported by Phoenix; TIME is
> > probably most suitable for your case (it should have millisecond
> > accuracy, but you'd better check it yourself.)
> >
> > -Vlad
> >
> > On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas <[email protected]>
> > wrote:
> >> Hi Vladimir,
> >>
> >> thanks a lot for your input, just some follow-up questions:
> >> 1) When you say "try to fit it in long" you mean UNSIGNED_LONG from
> >> https://phoenix.apache.org/language/datatypes.html right?
> >> 2) Would a string format also be efficient? Like YYYYMMDDHHmm?
> >>
> >> Thanks a lot!
> >>
> >> On 8 June 2015 at 16:48, Vladimir Rodionov <[email protected]>
> >> wrote:
> >>> PRIMARY KEY(dt, sid) won't work well for your query.
> >>> PRIMARY KEY(sid, dt) is much better for time range queries for a
> >>> particular sensor.
> >>> In the latter case this query will be translated into an
> >>> efficient range scan.
> >>> Do not use bigint for the timestamp; try to fit it into long, or use a
> >>> stringified version in a format suitable for byte-by-byte comparison:
> >>>
> >>> "2015 06/08 05:23:25.345"
> >>>
> >>> -Vlad
> >>>
> >>> On Mon, Jun 8, 2015 at 2:48 AM, Yiannis Gkoufas <[email protected]>
> >>> wrote:
> >>>> Hi there,
> >>>>
> >>>> I am investigating Phoenix as a potential data store for time-series
> >>>> data from sensors.
> >>>> What I am really interested in, as a first milestone, is efficient
> >>>> time range queries for a particular sensor. The results of those
> >>>> queries would consist of 1 or 2 columns (so small rows).
> >>>> I was looking for some advice about the schema design and indexes.
> >>>> What I have done so far:
> >>>>
> >>>> CREATE TABLE VALUES (dt bigint not null, sid varchar not null, us double,
> >>>> gen double CONSTRAINT PK PRIMARY KEY(dt, sid)) COMPRESSION='GZ',
> >>>> SALT_BUCKETS=120
> >>>>
> >>>> loaded the data and performed queries like:
> >>>>
> >>>> SELECT DT, US, GEN FROM VALUES WHERE DT >= 1369676800000 AND DT <=
> >>>> 1370000800000 AND SID='ID1'
> >>>>
> >>>> Is this the optimal way to go?
> >>>>
> >>>> Thanks a lot
