Hi Pari For your use-case, having it as part of the rowkey should be a better design than creating so many columns.
Regards Skanda On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre <[email protected]> wrote: > Hello All, > > This is more like a HBase question but as I am planning to use Phoenix as > a access layer, I hope phoenix user will help me. > > I would like to create time series data to get on-the-fly analytics. > > This use case is for adTech. > > Report - what is houly,daily,weekly impression counts at country level for > a given advertisement ID (ADID). > > I am doing hourly aggregation and loading into a Phoenix table. > > Primary Key - *ADID | COUNTRY | HOUR ID* > > > --------------------------------------------------------------------------------- > *ADID | COUNTRY | HOUR ID* | CF.IMP | > > --------------------------------------------------------------------------------- > 1 | US | 2015062301 | 3000 | > 1 | US | 2015062302 | 3421 | > 1 | UK | 2015062302 | 1212 | > > --------------------------------------------------------------------------------- > > Is it a good schema design or shall I create alternate schema as below > Primary Key - *ADID | COUNTRY | DAY ID* > > ---------------------------------------------------------------------------------------------------- > *ADID | COUNTRY | DAY ID* | CF.IMP*01* | CF.IMP*02* > | > > ---------------------------------------------------------------------------------------------------- > 1 | US | 20150623 | 3000 | > 3421 | > 1 | UK | 20150623 | NULL | > 1212 | > > ---------------------------------------------------------------------------------------------------- > Here, I have taken hour part from hour ID and created 24 columns. > > I gone through many time-series NoSQL blog posts, most the author suggest > to go with wider rows as above. This will reduce the scan, but I don't see > much difference in both Data Models in term of latency for scanning. > > Can anybody please suggest good approach for my use case? > > > Cheers, > Pari >
