Actually, we support deletes on tables with IMMUTABLE_ROWS=true (I believe as of the 4.2 release), as long as you're not filtering on a column that isn't contained in the index.
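For example, a minimal sketch of that case — table and index names here are hypothetical, not from the thread:

```sql
-- Append-only table with immutable rows and a covered secondary index.
CREATE TABLE cdr (
    anumber VARCHAR NOT NULL,
    ts      DATE NOT NULL,
    imei    VARCHAR
    CONSTRAINT pk PRIMARY KEY (anumber, ts)
) IMMUTABLE_ROWS=true;

CREATE INDEX cdr_imei_idx ON cdr (imei) INCLUDE (ts);

-- Allowed: the filter column (imei) is contained in the index.
DELETE FROM cdr WHERE imei = '354551060000000';
```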
On Tuesday, July 7, 2015, Vladimir Rodionov <[email protected]> wrote:

> Phoenix grammar contains examples of usage. For example, create table:
> https://phoenix.apache.org/language/index.html#create_table
>
> You cannot specify TTL per record. I suggest using a 1-year TTL for the whole
> table and additional logic inside your application to filter expired rows out.
>
> When you set IMMUTABLE_ROWS=true, no updates and deletes are allowed. Your
> only option in this case is to rely on TTL, or to drop the table entirely.
>
> The optimal number of splits depends on the size of the cluster, the max region
> size, and the projected data store size - you will need to do some math here.
>
> -Vlad
>
> On Tue, Jul 7, 2015 at 1:57 AM, Matjaž Trtnik <[email protected]> wrote:
>
>> Vlad and Eli, thanks for your answers and comments.
>>
>> 1. Normally I query by the whole Anumber, meaning country code + operator
>> id + user number, but as you suggested I could just reverse everything, and it
>> should work well if I reverse the number entered by the user.
>>
>> 2. What's the suggested number of splits? Regarding the syntax for table
>> splitting, I haven't found any example for Phoenix, only for HBase:
>>
>> create 'mytable', 'mycolumnfamily', {SPLITS =>
>> ['10000000000000000000000000000000',
>> '20000000000000000000000000000000',
>> '30000000000000000000000000000000',
>> '40000000000000000000000000000000',
>> '50000000000000000000000000000000',
>> '60000000000000000000000000000000']}
>>
>> Do I have to use the full row key when defining a split?
>> For example, in my case the first 6 bytes represent the reversed user number,
>> followed by the operator and country code, followed by the other parts of the
>> row key - timestamp, job id, and record number:
>>
>> 0000000000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000010000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000020000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000030000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000040000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000050000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000060000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000070000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000080000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>> 0000090000000\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
>>
>> 3. I was thinking about this solution too, but the problem is that for the
>> same IMEI there can be multiple Anumbers. Imagine someone using the same
>> phone (IMEI) but changing the SIM card (IMSI/Anumber). What I tried
>> yesterday was forcing the use of the index, and it worked pretty well, but I
>> read this should only be done if the result set is rather small. In our case
>> the result set is typically a few hundred records; at most it could be a few
>> thousand, but that would happen rarely. Is it advisable to force the index
>> in such a case?
>>
>> 4.
For TTL, is there a way to set TTL at the record level? A CDR has to
>> expire 1 year after it was created, not after it was inserted into the table.
>> Some CDRs, like roaming CDRs, come in later, so it can happen that you get a
>> CDR today which is already 1 month old and should expire in 11 months, not
>> 12. I haven't found any examples of setting TTL in Phoenix.
>>
>> Another question I have is regarding IMMUTABLE_ROWS=true. It's
>> suggested to use this for append-only tables with no updates. What about
>> deletes? Can I use IMMUTABLE_ROWS=true if I delete records from the table?
>>
>> On 06 Jul 2015, at 20:32, Vladimir Rodionov <[email protected]> wrote:
>>
>> 1. Unless you query by Anumber prefix (country code + operator id),
>> reverse it: random 6 + operator id + country code. In this case you will
>> not need to salt the row key.
>> 2. Presplit the table. Make sure you won't need to split the table during
>> normal operation.
>> 3. Keep an index between Bnumber (IMEI, IMSI?) and Anumber. Get the
>> Anumber by IMEI, then run the query by Anumber. This index is going to be
>> much smaller.
>>
>> Phoenix supports any table-level configuration options, so you can
>> specify TTL in your DDL statement.
>>
>> As for capacity planning, you can read:
>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_system-admin-guide/content/ch_hbase_cluster_capacity_region_sizing.html
>>
>> -Vlad
>>
>> On Mon, Jul 6, 2015 at 8:52 AM, Matjaž Trtnik <[email protected]> wrote:
>>
>>> Hi fellow Phoenix users!
>>>
>>> We are considering using HBase and Phoenix for CDR data retention.
>>> The number of records is around 50 million per day, and we need to keep
>>> them for about one year.
I have played around a bit but would like to hear a second
>>> opinion from people who have more experience, so I have a few questions:
>>>
>>> 1. Based on your experience, can anyone recommend an approximate number
>>> of nodes in the cluster and the hardware configuration of one node (RAM)?
>>> 2. Regarding the row key, I was thinking of Anumber + timestamp + Bnumber
>>> + jobId + recordIndex. Any other ideas? Do I need to use salting or not?
>>> Let's assume the Anumber in most cases starts with the same first 5 digits
>>> (country + operator code), followed by 6 random digits for the user number.
>>> 3. Searches are typically done by Anumber and timestamp, but other
>>> criteria may apply as well, like the IMEI or IMSI number. Do you suggest
>>> having secondary indexes for that? I read that when using a secondary
>>> index, all columns in the select statement should be included in the index
>>> as well. Keeping in mind that I'm returning almost all columns, does this
>>> mean almost double the data for each index? Any other suggestions on how
>>> to handle this?
>>> 4. For the timestamp, do you suggest using LONG and storing epoch time,
>>> or sticking with the DATE format?
>>> 5. Another requirement is that after some time we need to be able to
>>> efficiently delete all CDRs older than, let's say, 1 year. Is the row key
>>> design still good for that, as the only argument here will be the
>>> timestamp? Is it possible to use TTL with Phoenix?
>>>
>>> Any other suggestions and advice on how to design the system are more than
>>> welcome.
>>>
>>> Thanks, Matjaz
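To illustrate the split-syntax and TTL answers above: Phoenix lets you declare both split points and the HBase-level TTL directly in the CREATE TABLE statement, so the HBase shell is not needed. A sketch, with illustrative column names (not the actual schema from the thread):

```sql
-- Presplit on leading row-key prefixes and set a table-wide TTL of
-- 31536000 seconds (365 days). Note that HBase TTL counts from the cell
-- insert timestamp, so expiring by CDR *creation* time (e.g. late-arriving
-- roaming CDRs) still needs application-side filtering.
CREATE TABLE cdr (
    anumber      VARCHAR NOT NULL,
    ts           DATE NOT NULL,
    bnumber      VARCHAR,
    job_id       BIGINT,
    record_index INTEGER
    CONSTRAINT pk PRIMARY KEY (anumber, ts)
) TTL=31536000, IMMUTABLE_ROWS=true
SPLIT ON ('1', '2', '3', '4', '5', '6', '7', '8', '9');
```

Split points only need to be a leading prefix of the row key, not the full padded key, which also answers the "full row key" question above.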
