Vlad and Eli, thanks for your answers and comments.
1. Normally I query by the whole Anumber (country code + operator id + user
number), but as you suggested I could just reverse everything, and it should
work well if I also reverse the number entered by the user.
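For illustration, the reversed lookup could be sketched in Phoenix SQL like this (all table and column names are made up; REVERSE is a Phoenix built-in string function):

```sql
-- Hypothetical table keyed on the reversed Anumber, so the random
-- user-number digits come first and writes spread across regions.
CREATE TABLE IF NOT EXISTS cdr (
    anumber_rev VARCHAR NOT NULL,  -- REVERSE(country + operator + user number)
    ts          DATE    NOT NULL,
    bnumber     VARCHAR,
    CONSTRAINT pk PRIMARY KEY (anumber_rev, ts)
);

-- Reverse the user-entered number before comparing against the key.
SELECT * FROM cdr
WHERE anumber_rev = REVERSE('38640123456')  -- example number, made up
  AND ts >= TO_DATE('2015-06-01 00:00:00');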
2. What's the suggested number of splits? Regarding the syntax for table
splitting, I haven't found any example for Phoenix, only for HBase:
create 'mytable', 'mycolumnfamily', {SPLITS =>
['10000000000000000000000000000000',
'20000000000000000000000000000000',
'30000000000000000000000000000000',
'40000000000000000000000000000000',
'50000000000000000000000000000000',
'60000000000000000000000000000000']}
Do I have to use the full row key when defining splits? For example, in my
case, where the first 6 bytes represent the reversed user number, followed by
the operator and country code, followed by the other parts of the row key
(timestamp, job id and record number):
0000000000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000010000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000020000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000030000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000040000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000050000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000060000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000070000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000080000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
0000090000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
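For what it's worth, Phoenix has its own pre-split syntax, a SPLIT ON clause at the end of CREATE TABLE, so the HBase shell should not be needed. A sketch with made-up names, using short key prefixes rather than full row keys (split points act as row-key prefixes, as in the HBase shell SPLITS list):

```sql
-- Hypothetical Phoenix DDL with pre-split points on the leading
-- digit of the reversed Anumber.
CREATE TABLE IF NOT EXISTS cdr (
    anumber_rev VARCHAR NOT NULL,
    ts          DATE    NOT NULL,
    CONSTRAINT pk PRIMARY KEY (anumber_rev, ts)
)
SPLIT ON ('1', '2', '3', '4', '5', '6');
```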
3. I was thinking about this solution too, but the problem is that for the
same IMEI there can be multiple Anumbers. Imagine someone using the same phone
(IMEI) but changing the SIM card (IMSI/Anumber). What I tried yesterday was to
force use of the index, and it worked pretty well, but I read this should only
be done if the result set is rather small. In our case the result set is
typically a few hundred records; at most it could be a few thousand, but that
would happen rarely. Is it advisable to force the index in such a case?
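Forcing the index as described can be done with Phoenix's INDEX hint; a rough sketch with placeholder table and index names:

```sql
-- Hypothetical secondary index on IMEI and a hinted query that forces it.
CREATE INDEX idx_cdr_imei ON cdr (imei);

SELECT /*+ INDEX(cdr idx_cdr_imei) */ anumber_rev, ts, imei
FROM cdr
WHERE imei = '123456789012345';  -- example IMEI, made up
```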
4. For TTL, is there a way to set it at record level? A CDR has to expire 1
year after it was created, not after it was inserted into the table. Some
CDRs, like roaming, arrive later, so it can happen that today you get a CDR
which is already 1 month old and it should expire in 11 months, not 12. I
haven't found any examples of setting TTL in Phoenix.
Another question I have is regarding IMMUTABLE_ROWS=true. It's suggested to
use this for append-only tables with no updates. What about deletes? Can I use
IMMUTABLE_ROWS=true if I delete records from the table?
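For reference, both settings above can go straight into a Phoenix DDL statement, since Phoenix passes table-level properties through to HBase. Note that TTL is a column-family property measured in seconds from when the cell is written, so by itself it cannot implement expiry from CDR creation time. A sketch with made-up names:

```sql
-- Hypothetical DDL: TTL is an HBase column-family property in seconds
-- (365 days = 31536000), counted from insert time, not CDR creation time.
CREATE TABLE IF NOT EXISTS cdr (
    anumber_rev VARCHAR NOT NULL,
    ts          DATE    NOT NULL,
    CONSTRAINT pk PRIMARY KEY (anumber_rev, ts)
)
IMMUTABLE_ROWS=true, TTL=31536000;
```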
On 06 Jul 2015, at 20:32, Vladimir Rodionov <[email protected]> wrote:
1. Unless you query by Anumber prefix (country code + operator id), reverse
it: random 6 + operator id + country code. In this case you will not need row
salting.
2. Pre-split the table. Make sure you won't need to split it during normal
operation.
3. Keep an index between Bnumber (IMEI, IMSI?) and Anumber. Get the Anumber by
IMEI, then run the query by Anumber. This index is going to be much smaller.
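Sketched in Phoenix SQL, this two-step lookup could look as follows (all table and column names are placeholders, and the CDR table is assumed to be keyed by the reversed Anumber as discussed):

```sql
-- Hypothetical IMEI -> Anumber mapping table; much smaller than the CDR table.
CREATE TABLE IF NOT EXISTS imei_anumber (
    imei    VARCHAR NOT NULL,
    anumber VARCHAR NOT NULL,
    CONSTRAINT pk PRIMARY KEY (imei, anumber)  -- one IMEI may map to several Anumbers
);

-- Step 1: resolve the Anumber(s) for an IMEI.
SELECT anumber FROM imei_anumber WHERE imei = '123456789012345';

-- Step 2: query CDRs by each resolved Anumber.
SELECT * FROM cdr WHERE anumber_rev = REVERSE('38640123456');
```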
Phoenix supports any table-level configuration option, so you can specify TTL
in your DDL statement.
As for capacity planning, you can read:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html
-Vlad
On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <[email protected]> wrote:
Hi fellow Phoenix users!
We are considering using HBase and Phoenix for CDR data retention. The number
of records is around 50 million per day, and we should keep them for about one
year. I have played around a bit but would like to hear a second opinion from
people who have more experience, so I have a few questions:
1. Based on your experience, can anyone recommend an approximate number of
nodes in the cluster and the hardware configuration of one node (RAM)?
2. Regarding the row key, I was thinking of Anumber + timestamp + Bnumber +
jobId + recordIndex. Any other ideas? Do I need to use salting or not? Let's
assume the Anumber in most cases starts with the same first 5 digits (country
+ operator code), followed by 6 random digits for the user number.
3. Searches are typically done by Anumber and timestamp, but some other
criteria may apply as well, like IMEI or IMSI number. Do you suggest having
secondary indexes for that? I read that when using a secondary index, all
columns in the select statement should be included in the index as well.
Keeping in mind that I'm returning almost all columns, does this mean almost
double the data for each index? Any other suggestions on how to handle this?
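For what it's worth, the extra columns a query returns can be copied into a Phoenix index with an INCLUDE clause, which is exactly what duplicates the data; a sketch with placeholder names:

```sql
-- Hypothetical covered index: indexed on imei, with extra columns stored
-- in the index table so queries selecting them never touch the data table.
CREATE INDEX idx_cdr_imei ON cdr (imei)
INCLUDE (bnumber, job_id, record_index);
```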
4. For the timestamp, do you suggest using LONG and storing epoch time, or
sticking with the DATE format?
5. Another requirement is that after some time we need to be able to
efficiently delete all CDRs older than, let's say, 1 year. Is the row key
design still good for that, given that the only argument here will be the
timestamp? Is it possible to use TTL with Phoenix?
Any other suggestions and advice on how to design the system are more than welcome.
Thanks, Matjaz