Hi there,

I did try to reverse the order of the composite key and it worked! I changed BIGINT to UNSIGNED_LONG because I was facing some issues with the time zone of the DATE field. There is one small detail whose behaviour I cannot understand:
When I give the query:

SELECT * FROM SENSOR_DATA WHERE DT>=1410677800000 AND DT <= 1500677800000 AND SID='ID1';

the time that it takes is:

Rows read: 100, Elapsed time (seconds) - *Total: 15.462*, SQL query: 0.02, *Reading results: 15.442*

When I give a more difficult query (in my head):

SELECT * FROM SENSOR_DATA WHERE DT>=1410677800000 AND DT <= 1500677800000 AND SID='ID1' ORDER BY DT;

the time that it takes is:

Rows read: 100, Elapsed time (seconds) - *Total: 1.435*, SQL query: 0.014, *Reading results: 1.421*

In either case my application is interested in the results of the second query; I was just curious about the difference in performance.

Thanks a lot!

On 9 June 2015 at 10:26, Yiannis Gkoufas <[email protected]> wrote:
> Thanks a lot for your replies!
> Will try the DATE field and change the order of the composite key.
>
> On 8 June 2015 at 17:37, James Taylor <[email protected]> wrote:
>
>> Both DATE and TIME have millisecond granularity (both are stored as
>> 8-byte longs), so I'd recommend using either of those. Phoenix also
>> supports date arithmetic, so you can do queries like this to get the
>> last week's worth of data:
>>
>> SELECT * FROM SENSOR_DATA
>> WHERE sid = 'ID1' AND dt > CURRENT_TIME() - 7;
>>
>> Also, built-in functions such as TRUNC(dt, 'HOUR') will let you group
>> by in time "buckets", and TO_CHAR(dt) will convert to a string
>> representation.
>>
>> I wouldn't recommend using a string-based representation for your
>> date/time column.
>>
>> Will most of your queries include a sensor ID (or a small subset of
>> sensor IDs)? If so, then it makes sense to lead your PK with sid. If
>> most of your queries will be across all or a large percentage of your
>> sensors, then it'd be better to lead with your date/time column (and
>> salt the table to prevent write hot spotting).
>>
>> Thanks,
>> James
>>
>> On Mon, Jun 8, 2015 at 9:17 AM, Vladimir Rodionov <[email protected]> wrote:
>> > There are several time data types natively supported by Phoenix: TIME is
>> > probably the most suitable for your case (it should have millisecond
>> > accuracy, but you'd better check it yourself.)
>> >
>> > -Vlad
>> >
>> > On Mon, Jun 8, 2015 at 9:02 AM, Yiannis Gkoufas <[email protected]> wrote:
>> >>
>> >> Hi Vladimir,
>> >>
>> >> thanks a lot for your input, just some follow-up questions:
>> >> 1) When you say "try to fit it in long", you mean UNSIGNED_LONG from
>> >> https://phoenix.apache.org/language/datatypes.html, right?
>> >> 2) Would a string format also be efficient? Like YYYYMMDDHHmm?
>> >>
>> >> Thanks a lot!
>> >>
>> >> On 8 June 2015 at 16:48, Vladimir Rodionov <[email protected]> wrote:
>> >>>
>> >>> PRIMARY KEY(dt, sid) won't work well for your query.
>> >>> PRIMARY KEY(sid, dt) is much better for time-range queries for a
>> >>> particular sensor. In the latter case this query will be translated
>> >>> into an efficient range scan.
>> >>> Do not use bigint for the timestamp; try to fit it into a long, or use
>> >>> a stringified version in a format suitable for byte-by-byte comparison:
>> >>>
>> >>> "2015 06/08 05:23:25.345"
>> >>>
>> >>> -Vlad
>> >>>
>> >>> On Mon, Jun 8, 2015 at 2:48 AM, Yiannis Gkoufas <[email protected]> wrote:
>> >>>>
>> >>>> Hi there,
>> >>>>
>> >>>> I am investigating Phoenix as a potential data store for time-series
>> >>>> data from sensors.
>> >>>> What I am really interested in, as a first milestone, is having
>> >>>> efficient time-range queries for a particular sensor. For those
>> >>>> queries the results would consist of 1 or 2 columns (so small rows).
>> >>>> I was looking for some advice about the schema design and indexes.
>> >>>> What I have done so far:
>> >>>>
>> >>>> CREATE TABLE VALUES (dt bigint not null, sid varchar not null,
>> >>>> us double, gen double CONSTRAINT PK PRIMARY KEY(dt,sid))
>> >>>> COMPRESSION='GZ', SALT_BUCKETS=120
>> >>>>
>> >>>> loaded the data and performed queries like:
>> >>>>
>> >>>> SELECT DT,US,GEN FROM VALUES WHERE DT>=1369676800000 AND
>> >>>> DT <= 1370000800000 AND SID='ID1'
>> >>>>
>> >>>> Is this the optimal way to go?
>> >>>>
>> >>>> Thanks a lot
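Putting the thread's advice together, the revised schema could look roughly like the following. This is a sketch, not the exact DDL the poster ended up with: the table name SENSOR_DATA and the switch away from BIGINT follow the later messages, and DATE is used here as one of the time types James recommends.

```sql
-- Lead the primary key with the sensor ID so that a per-sensor
-- time-range query becomes a single efficient range scan.
-- DATE has millisecond granularity (stored as an 8-byte long).
CREATE TABLE SENSOR_DATA (
    sid VARCHAR NOT NULL,
    dt  DATE    NOT NULL,
    us  DOUBLE,
    gen DOUBLE
    CONSTRAINT pk PRIMARY KEY (sid, dt)
) COMPRESSION='GZ';
```

Note that salting mainly helps when a monotonically increasing value (such as a timestamp) leads the row key; with sid leading the key, SALT_BUCKETS may be unnecessary, per James's comment about when to salt.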
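The TRUNC-based bucketing James mentions could be combined with an aggregate along these lines (a sketch; AVG is just one illustrative aggregate over the us/gen columns from the schema in the thread, and it assumes dt is a DATE/TIME column so the date arithmetic applies):

```sql
-- Average readings per hourly bucket for one sensor over the
-- last week, using TRUNC to truncate the time column to the hour.
SELECT TRUNC(dt, 'HOUR') AS hour_bucket,
       AVG(us)  AS avg_us,
       AVG(gen) AS avg_gen
FROM SENSOR_DATA
WHERE sid = 'ID1'
  AND dt > CURRENT_TIME() - 7
GROUP BY TRUNC(dt, 'HOUR');
```

TO_CHAR(dt) could likewise be applied in the SELECT list to render the bucket as a string for display.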
