From a performance point of view, fewer columns are better when doing a scan. The more columns you have in a filter, the more comparisons HBase must perform to decide whether it needs to skip or include a particular cell in the result set.

-Vlad
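To make that cost concrete, a rough sketch (AD_DAILY and AD_HOURLY are illustrative names for the wide and tall layouts discussed below in this thread, not tables from the thread itself):

    -- Wide layout: a filter that references the 24 hourly columns makes
    -- HBase compare up to 24 cells per row before it can include or
    -- skip the row.
    SELECT ADID, COUNTRY, DAYID
    FROM AD_DAILY
    WHERE IMP01 > 0 OR IMP02 > 0 OR IMP03 > 0; -- ...through IMP24

    -- Tall layout: the equivalent filter compares a single cell per row.
    SELECT ADID, COUNTRY, HOURID
    FROM AD_HOURLY
    WHERE IMP > 0;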
On Tue, Jun 23, 2015 at 10:01 AM, James Taylor <[email protected]> wrote:
> Perhaps I'm missing something, but I'm not sure how 24 additional
> KeyValue columns are going to help performance. Sid would be a good one
> to chime in, as he's done a lot of work with time series data for the
> new metrics framework of Ambari (which uses Phoenix underneath).
>
> Will you have queries that look up an AD_INFO row based on AD_ID? If
> not, lead with AD_TS and salt your table. How about something like
> this?
>
> CREATE TABLE AD_INFO (
>     AD_TS DATE NOT NULL,
>     AD_ID BIGINT NOT NULL,
>     CONSTRAINT PK PRIMARY KEY (AD_TS, AD_ID))
>     IMMUTABLE_ROWS=true, SALT_BUCKETS=64;
>
> (This assumes that you won't be updating a row in place.)
>
> You can run queries that bucketize by hour like this (over the last 7
> days' worth of data):
>
> SELECT count(*) FROM AD_INFO
> WHERE AD_TS > CURRENT_DATE() - 7
> GROUP BY TRUNC(AD_TS, 'HOUR');
>
> If need be, you can add a functional index on AD_TS based on the hour
> of the day, and Phoenix will maintain a separate table for you (and use
> it automatically when you're filtering on HOUR(AD_TS)). This will have
> the same number of rows as the original, but the rows for the same
> hour of the day will be grouped together.
>
> CREATE INDEX hr_of_day_idx ON AD_INFO(HOUR(AD_TS));
>
> If this doesn't perform well enough, then another alternative would be
> for you to maintain a separate rollup table manually.
>
> HTH. Thanks,
> James
>
> On Tue, Jun 23, 2015 at 8:38 AM, Puneet Kumar Ojha <[email protected]> wrote:
> > Yes, you are correct. Performance will be better with 24 columns.
> > Better to add one more column (a 25th) to keep the daily aggregate of
> > the 24 hours. That will give a further performance boost.
> >
> > From: Pariksheet Barapatre [mailto:[email protected]]
> > Sent: Tuesday, June 23, 2015 8:07 PM
> > To: [email protected]
> > Subject: Re: Data Model Suggestion
> >
> > Hi Puneet,
> >
> > The upsert could be easy. E.g., if we want to load data for 2015062412:
> >
> > ADID | COUNTRY | HOUR ID    | CF.IMP
> > -----+---------+------------+-------
> > 1    | US      | 2015062412 | 2000
> >
> > then the upsert will be:
> >
> > UPSERT INTO myTABLE (ADID, COUNTRY, DAYID, IMP12)
> > VALUES (1, 'US', 20150624, 2000);
> >
> > Your point is true that HBase will store 24 key-values. One of the
> > eBay blog posts says to think of the key-value store as a sorted map:
> >
> > SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>
> >
> > Going by this, I guess keeping 24 columns does make a difference
> > versus having the hour ID in the row key.
> >
> > Thanks
> > Pari
> >
> > On 23 June 2015 at 19:10, Puneet Kumar Ojha <[email protected]> wrote:
> >
> > Even if you keep 24 columns, HBase will still store them as 24 records.
> >
> > There are benefits in some use cases, like breaking the data into
> > column families, and in how updates behave for a particular rowkey if
> > you go with 24 columns.
> >
> > From: Pariksheet Barapatre [mailto:[email protected]]
> > Sent: Tuesday, June 23, 2015 7:07 PM
> > To: [email protected]
> > Subject: Re: Data Model Suggestion
> >
> > Thanks Skanda for your suggestion. If we go with 24 columns, won't it
> > reduce the number of rows over a large data set and improve scan
> > performance? If I want to get a daily report, I just have to scan a
> > single row.
> >
> > Your thoughts?
> >
> > Thanks
> > Pari
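A sketch of the 25th "daily aggregate" column Puneet suggests above (AD_DAILY and IMP_DAY are illustrative names, and the loader itself would have to compute the daily total):

    -- Write the hour-12 count and the precomputed daily total together.
    UPSERT INTO AD_DAILY (ADID, COUNTRY, DAYID, IMP12, IMP_DAY)
    VALUES (1, 'US', 20150624, 2000, 45000); -- 45000 = loader-computed day total

    -- A daily report then reads one cell instead of summing 24 columns:
    SELECT IMP_DAY
    FROM AD_DAILY
    WHERE ADID = 1 AND COUNTRY = 'US' AND DAYID = 20150624;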
> > On 23 June 2015 at 18:57, Skanda <[email protected]> wrote:
> >
> > Hi Pari,
> >
> > For your use case, having the hour as part of the rowkey should be a
> > better design than creating so many columns.
> >
> > Regards,
> > Skanda
> >
> > On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre <[email protected]> wrote:
> >
> > Hello All,
> >
> > This is more of an HBase question, but as I am planning to use Phoenix
> > as an access layer, I hope Phoenix users will help me.
> >
> > I would like to store time series data to get on-the-fly analytics.
> > The use case is for adTech.
> >
> > Report: hourly, daily, and weekly impression counts at country level
> > for a given advertisement ID (ADID).
> >
> > I am doing hourly aggregation and loading the results into a Phoenix
> > table.
> >
> > Primary key: ADID | COUNTRY | HOUR ID
> >
> > ADID | COUNTRY | HOUR ID    | CF.IMP
> > -----+---------+------------+-------
> > 1    | US      | 2015062301 | 3000
> > 1    | US      | 2015062302 | 3421
> > 1    | UK      | 2015062302 | 1212
> >
> > Is this a good schema design, or shall I create the alternate schema
> > below?
> >
> > Primary key: ADID | COUNTRY | DAY ID
> >
> > ADID | COUNTRY | DAY ID   | CF.IMP01 | CF.IMP02
> > -----+---------+----------+----------+---------
> > 1    | US      | 20150623 | 3000     | 3421
> > 1    | UK      | 20150623 | NULL     | 1212
> >
> > Here, I have taken the hour part of the hour ID and created 24 columns.
> >
> > I have gone through many time-series NoSQL blog posts, and most
> > authors suggest going with wider rows as above. This will reduce the
> > scan, but I don't see much difference between the two data models in
> > terms of scan latency.
> >
> > Can anybody please suggest a good approach for my use case?
> >
> > Cheers,
> > Pari
> >
> > --
> > Cheers,
> > Pari
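For reference, the two candidate schemas from the original question written out as Phoenix DDL (a sketch only; AD_HOURLY and AD_DAILY are illustrative table names):

    -- Option 1: tall schema, hour in the row key, one row per hour.
    CREATE TABLE AD_HOURLY (
        ADID    BIGINT NOT NULL,
        COUNTRY VARCHAR NOT NULL,
        HOURID  BIGINT NOT NULL,
        CF.IMP  BIGINT,
        CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, HOURID));

    -- Option 2: wide schema, one row per day, one column per hour.
    CREATE TABLE AD_DAILY (
        ADID     BIGINT NOT NULL,
        COUNTRY  VARCHAR NOT NULL,
        DAYID    BIGINT NOT NULL,
        CF.IMP01 BIGINT,
        CF.IMP02 BIGINT,
        -- ...one column per hour, through CF.IMP24...
        CONSTRAINT PK PRIMARY KEY (ADID, COUNTRY, DAYID));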
