Phoenix grammar contains examples of usage. For example, create table: https://phoenix.apache.org/language/index.html#create_table
You can not specify TTL per record. I suggest you using 1 year for whole table and additional logic inside your application to filter expired rows out. When you set IMMUTABLE_ROWS=true no updates and deletes are allowed. Your only option in this case - rely on TTL , or drop table entirely. Optimal number of splits depends on a size of a cluster, max region size and projected data store size - you will need to do some math here. -Vlad On Tue, Jul 7, 2015 at 1:57 AM, Matjaž Trtnik <[email protected]> wrote: > Vlad and Eli, thanks for your answer and comments. > > 1. Normally I do query by whole Anumber, meaning country code + operator > id + user number but as you suggested I could just reverse everything and > it should work well if I’ll reverse number entered by user. > > 2. What’s the suggested number of > Regarding syntax for table splitting I haven’t found any example in > Phoenix but only for HBase. > > create ‘mytable, ‘mycolumnfamlity’, {SPLITS=› > > ['10000000000000000000000000000000', > '20000000000000000000000000000000', > '30000000000000000000000000000000', > '40000000000000000000000000000000', > '50000000000000000000000000000000', > '60000000000000000000000000000000']} > Do I have to use full row key when defining split? > For example in my case where first 6 bytes represent reversed user number > followed by oeprator and country code followed by other parts of row key - > timestamp, job id and record number: > > > 0000000000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000010000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000020000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000030000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000040000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000050000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000060000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000070000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000080000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > 0000090000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 > > > 3. I was thinking about this solution too but problem is that let’s say > for same IMEI there can be multiple Anumbers. Imagine someone using same > phone (IMEI) but changing sim card (IMSI/Anumber). What I was trying > yesterday was to enforce using of index and it worked pretty well but I > read this should be only used if result set is rather small. I think in our > case result set is typically few hundred records, maximum could be few > thousand records but that would happen rarely. Is it advisable to use index > enforcing for such case? > > 4. For TTL is there a way to set TTL to record level? Because CDR has to > expire in 1 year since it was created and not inserted into table. And some > CDRs like roaming are coming later so for example it can happen that you > get today CDR which is already 1 month old and it should expire in 11 > months and not 12. I haven’t found any examples of setting TTL in Phoenix. > > > Another question I have is regarding IMMUTABLE_ROWS=true. It’s suggested > to use this for append-only table with no updates. What about deletes? Can > I use IMMUTABLE_ROWS=true if I delete records from table? > > > > On 06 Jul 2015, at 20:32, Vladimir Rodionov <[email protected]> > wrote: > > 1. Unless you do query by Anumber prefix (country code + operator id) - > reverse it : random 6 + operator id + country code. In this case you will > not need salting row. > 2. Presplit table. Make sure you won't need to split table during normal > operation. > 3. Keep index between Bnumber (IMEI, IMSI?) and Anumber. Get Anumber by > IMEI then run query by Anumber. This index is going to be much smaller. > > Phoenix supports any table level configuration options, so you can > specify TTL in your DDL statement > > As for capacity planning, you can read: > > http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html > > -Vlad > > As for capacity planning, please read HBase book > > > On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <[email protected]> wrote: > >> Hi fellow Phoenix users! >> >> We are considering using HBase and Phoenix for CDR data retention. >> Number of records is around 50 million per day and we should keep them for >> about one year. I have played around a bit but would like to hear second >> opinion from people who have more experience so I have few questions: >> >> >> 1. Based on your experience can anyone recommend me approx number of >> nodes in cluster and hardware configuration of one node (RAM). >> 2. Regarding row key I was thinking of Anumber + timestamp + Bnumber >> + jobId + recordIndex. Any other ideas? Do I need to use salting or no? >> Let’s assume aNumber in most cases start with first 5 digits the same >> (country + operator code), followed by 6 random digits for user number. >> 3. Searches are typically done by Anumber and timestamp but also some >> other criterias may apply, like IMEI or IMSI number. Do you suggest to >> have >> secondary indexes for that? I read that if using secondary index all >> columns in select statement should be included in index as well. Keeping >> in >> mind I’m returning almost all columns does this mean almost double of data >> for each index? Any other suggestions how to handle this? >> 4. For time stamp, do you suggest using LONG and storing epoch time >> or stick with DATE format? >> 5. Another request is that after some time we need to be able to >> efficiently delete all CDRs that are older than let’s say 1 year. Is >> design >> of row key still good for that as only argument here will be timestamp? Is >> it possible to use TTL with Phoenix? >> >> >> Any other suggestions and advices how to design system are more than >> welcomed. >> >> Thanks, Matjaz >> > > >
