Hi,

Let's imagine this scenario.
I want to store IPs with counters, and I also want counters for groups of IPs. All of that will be calculated with MR jobs and stored in HBase.

Let's take some IPs and make sure they sort correctly by zero-padding each octet to three digits:

037.113.031.119
058.022.018.176
058.022.159.151
109.169.201.076
109.169.201.150
109.254.019.140
122.031.039.016
122.224.005.210
178.137.167.041

I want to have counters for all "levels" of those IPs, which means for these groups:

Group 1: 037 058 109 122 178
Group 2: 037.113 058.022 109.169 109.254 122.031 122.224 178.137
Group 3: 037.113.031 058.022.018 058.022.159 109.169.201 109.254.019 122.031.039 122.224.005 178.137.167
Group 4: the complete IP list.

Each time I see an IP, I will increment the corresponding value in each of the 4 groups.

What's the best way to store that, knowing that I want to be able to easily list all the entries (range-based) from one group?

Option 1 is to have one table per group, each with 1 CF and 1 C.
Pros: Very easy to access, retrieve, etc.
Cons: Will generate 4 tables.

Option 2 is to have one table, but 1 CF per group.
Pros: Only one table, easy access.
Cons: I've heard that we should try to keep the number of CFs under 3. Might have a bad performance impact.

Option 3 is to have one table, one CF, and one C per group.
Pros: Only one table, only one CF.
Cons: Access is less easy than with options 1 and 2.

I think option 2 is the worst one. Option 1 is very easy to implement. And for option 3, I don't see any benefit compared to option 1.

So I'm tempted to go with option 1, but I don't like the idea of multiplying the tables.

Does anyone have any comments on which option might be the best one, or want to propose another option?

JM
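
To make the key layout concrete, here is a minimal in-memory sketch of the padding and the per-group increments described above. It is only an illustration: the function names (pad_ip, group_keys, count_groups) are made up for this example, it uses plain Python counters instead of HBase, and it says nothing about which of the three storage options is better.

```python
from collections import Counter
from typing import Iterable, List


def pad_ip(ip: str) -> str:
    """Zero-pad each octet to three digits so string order matches numeric order."""
    return ".".join(f"{int(octet):03d}" for octet in ip.split("."))


def group_keys(ip: str) -> List[str]:
    """Return the four prefix keys (group 1 to group 4) for one IP."""
    octets = pad_ip(ip).split(".")
    return [".".join(octets[:level]) for level in range(1, 5)]


def count_groups(ips: Iterable[str]) -> List[Counter]:
    """Increment one counter per group for every IP seen (hypothetical stand-in for the MR job)."""
    counters = [Counter() for _ in range(4)]
    for ip in ips:
        for level, key in enumerate(group_keys(ip)):
            counters[level][key] += 1
    return counters


if __name__ == "__main__":
    seen = ["58.22.18.176", "58.22.159.151", "178.137.167.41"]
    for number, counter in enumerate(count_groups(seen), start=1):
        # Sorted keys mimic the range-ordered listing wanted for each group.
        print(f"Group {number}:", sorted(counter.items()))
```

Whatever the storage option, the same four keys would be produced per IP; only where each key is written (table, CF, or C) changes.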
