Whether you keep 24 column .Hbase will still store it as 24 records only.

There are benefits on use cases. Like breaking into column families , how 
updates behave for a particular rowkey if you go with 24 columns.


From: Pariksheet Barapatre [mailto:[email protected]]
Sent: Tuesday, June 23, 2015 7:07 PM
To: [email protected]
Subject: Re: Data Model Suggestion

Thanks Skanda for your suggestion. If we go with 24 columns , won't it reduce 
number of rows over large data set and will improve scan performance. If I want 
to get daily report, I just have to scan only one row.
Your thoughts?
Thanks
Pari

On 23 June 2015 at 18:57, Skanda 
<[email protected]<mailto:[email protected]>> wrote:
Hi Pari
For your use-case, having it as part of the rowkey should be a better design 
than creating so many columns.
Regards
Skanda

On Tue, Jun 23, 2015 at 6:49 PM, Pariksheet Barapatre 
<[email protected]<mailto:[email protected]>> wrote:
Hello All,

This is more like a HBase question but as I am planning to use Phoenix as a 
access layer, I hope phoenix user will help me.
I would like to create time series data to get on-the-fly analytics.
This use case is for adTech.
Report - what is houly,daily,weekly impression counts at country level for a 
given advertisement ID (ADID).
I am doing hourly aggregation and loading into a Phoenix table.
Primary Key - ADID          | COUNTRY       | HOUR ID

---------------------------------------------------------------------------------
ADID          | COUNTRY       | HOUR ID      |  CF.IMP  |
---------------------------------------------------------------------------------
1                | US                  | 2015062301<tel:2015062301>  | 3000     
   |
1                | US                  | 2015062302<tel:2015062302>  | 3421     
   |
1                | UK                  | 2015062302<tel:2015062302>  | 1212     
   |
---------------------------------------------------------------------------------
Is it a good schema design or shall I create alternate schema as below
Primary Key - ADID          | COUNTRY       | DAY ID
----------------------------------------------------------------------------------------------------
ADID          | COUNTRY       | DAY ID      |  CF.IMP01  | CF.IMP02  |
----------------------------------------------------------------------------------------------------
1                | US                  | 20150623  | 3000        | 3421         
|
1                | UK                  | 20150623  | NULL        | 1212         
 |
----------------------------------------------------------------------------------------------------
Here, I have taken hour part from hour ID and created 24 columns.
I gone through many time-series NoSQL blog posts, most the author suggest to go 
with wider rows as above. This will reduce the scan, but I don't see much 
difference in both Data Models in term of latency for scanning.
Can anybody please suggest good approach for my use case?

Cheers,
Pari




--
Cheers,
Pari

Reply via email to