Actually, we support deletes on tables with IMMUTABLE_ROWS=true (I believe as of the 4.2 release), as long as you're not filtering on a column that isn't contained in the index.
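For example, a minimal sketch of that case — table and index names here are hypothetical, not from the thread:

```sql
-- Append-only table with immutable rows and a covered secondary index.
CREATE TABLE cdr (
    anumber VARCHAR NOT NULL,
    ts      DATE NOT NULL,
    imei    VARCHAR
    CONSTRAINT pk PRIMARY KEY (anumber, ts)
) IMMUTABLE_ROWS=true;

CREATE INDEX cdr_imei_idx ON cdr (imei) INCLUDE (ts);

-- Allowed: the filter column (imei) is contained in the index.
DELETE FROM cdr WHERE imei = '354551060000000';
```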
On Tuesday, July 7, 2015, Vladimir Rodionov <[email protected]> wrote:

> Phoenix grammar contains examples of usage. For example, create table:
> https://phoenix.apache.org/language/index.html#create_table
>
> You cannot specify TTL per record. I suggest using a 1-year TTL for the whole
> table and additional logic inside your application to filter expired rows out.
>
> When you set IMMUTABLE_ROWS=true, no updates and deletes are allowed. Your
> only option in this case is to rely on TTL, or to drop the table entirely.
>
> The optimal number of splits depends on the size of the cluster, the max region
> size, and the projected data store size - you will need to do some math here.
>
> -Vlad
>
> On Tue, Jul 7, 2015 at 1:57 AM, Matjaž Trtnik <[email protected]> wrote:
>
>> Vlad and Eli, thanks for your answers and comments.
>>
>> 1. Normally I query by the whole Anumber, meaning country code + operator
>> id + user number, but as you suggested I could just reverse everything, and it
>> should work well if I reverse the number entered by the user.
>>
>> 2. What's the suggested number of splits? Regarding the syntax for table
>> splitting, I haven't found any example for Phoenix, only for HBase:
>>
>> create 'mytable', 'mycolumnfamily', {SPLITS =>
>> ['10000000000000000000000000000000',
>> '20000000000000000000000000000000',
>> '30000000000000000000000000000000',
>> '40000000000000000000000000000000',
>> '50000000000000000000000000000000',
>> '60000000000000000000000000000000']}
>>
>> Do I have to use the full row key when defining a split?
>> For example, in my case the first 6 bytes represent the reversed user number,
>> followed by the operator and country code, followed by the other parts of the
>> row key - timestamp, job id, and record number:
>>
>> 0000000000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000010000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000020000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000030000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000040000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000050000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000060000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000070000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000080000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000090000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>
>> 3. I was thinking about this solution too, but the problem is that for the
>> same IMEI there can be multiple Anumbers. Imagine someone using the same
>> phone (IMEI) but changing the SIM card (IMSI/Anumber). What I tried
>> yesterday was forcing the use of the index, and it worked pretty well, but I
>> read this should only be done if the result set is rather small. In our case
>> the result set is typically a few hundred records; at most it could be a few
>> thousand, but that would happen rarely. Is it advisable to force the index
>> in such a case?
>>
>> 4.
For TTL, is there a way to set TTL at the record level? A CDR has to
>> expire 1 year after it was created, not after it was inserted into the table.
>> Some CDRs, like roaming CDRs, come in later, so it can happen that you get a
>> CDR today which is already 1 month old and should expire in 11 months, not
>> 12. I haven't found any examples of setting TTL in Phoenix.
>>
>> Another question I have is regarding IMMUTABLE_ROWS=true. It's
>> suggested to use this for append-only tables with no updates. What about
>> deletes? Can I use IMMUTABLE_ROWS=true if I delete records from the table?
>>
>> On 06 Jul 2015, at 20:32, Vladimir Rodionov <[email protected]> wrote:
>>
>> 1. Unless you query by Anumber prefix (country code + operator id),
>> reverse it: random 6 + operator id + country code. In this case you will
>> not need to salt the row key.
>> 2. Presplit the table. Make sure you won't need to split the table during
>> normal operation.
>> 3. Keep an index between Bnumber (IMEI, IMSI?) and Anumber. Get the
>> Anumber by IMEI, then run the query by Anumber. This index is going to be
>> much smaller.
>>
>> Phoenix supports any table-level configuration options, so you can
>> specify TTL in your DDL statement.
>>
>> As for capacity planning, you can read:
>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html
>>
>> -Vlad
>>
>> On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <[email protected]> wrote:
>>
>>> Hi fellow Phoenix users!
>>>
>>> We are considering using HBase and Phoenix for CDR data retention.
>>> The number of records is around 50 million per day, and we need to keep
>>> them for about one year.
I have played around a bit but would like to hear a second
>>> opinion from people who have more experience, so I have a few questions:
>>>
>>> 1. Based on your experience, can anyone recommend an approximate number
>>> of nodes in the cluster and the hardware configuration of one node (RAM)?
>>> 2. Regarding the row key, I was thinking of Anumber + timestamp + Bnumber
>>> + jobId + recordIndex. Any other ideas? Do I need to use salting or not?
>>> Let's assume the Anumber in most cases starts with the same first 5 digits
>>> (country + operator code), followed by 6 random digits for the user number.
>>> 3. Searches are typically done by Anumber and timestamp, but other
>>> criteria may apply as well, like the IMEI or IMSI number. Do you suggest
>>> having secondary indexes for that? I read that when using a secondary
>>> index, all columns in the select statement should be included in the index
>>> as well. Keeping in mind that I'm returning almost all columns, does this
>>> mean almost double the data for each index? Any other suggestions on how
>>> to handle this?
>>> 4. For the timestamp, do you suggest using LONG and storing epoch time,
>>> or sticking with the DATE format?
>>> 5. Another requirement is that after some time we need to be able to
>>> efficiently delete all CDRs older than, let's say, 1 year. Is the row key
>>> design still good for that, as the only argument here will be the
>>> timestamp? Is it possible to use TTL with Phoenix?
>>>
>>> Any other suggestions and advice on how to design the system are more than
>>> welcome.
>>>
>>> Thanks, Matjaz
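To illustrate the split-syntax and TTL answers above: Phoenix lets you declare both split points and the HBase-level TTL directly in the CREATE TABLE statement, so the HBase shell is not needed. A sketch, with illustrative column names (not the actual schema from the thread):

```sql
-- Presplit on leading row-key prefixes and set a table-wide TTL of
-- 31536000 seconds (365 days). Note that HBase TTL counts from the cell
-- insert timestamp, so expiring by CDR *creation* time (e.g. late-arriving
-- roaming CDRs) still needs application-side filtering.
CREATE TABLE cdr (
    anumber      VARCHAR NOT NULL,
    ts           DATE NOT NULL,
    bnumber      VARCHAR,
    job_id       BIGINT,
    record_index INTEGER
    CONSTRAINT pk PRIMARY KEY (anumber, ts)
) TTL=31536000, IMMUTABLE_ROWS=true
SPLIT ON ('1', '2', '3', '4', '5', '6', '7', '8', '9');
```

Split points only need to be a leading prefix of the row key, not the full padded key, which also answers the "full row key" question above.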
