What I would like is faster (direct?) access to the number of entries starting with "058".
For IPv4 it's 0 to 255, so that works fine. For IPv6, though, it can take a while to scan the full range and aggregate.

JM

2013/1/27, lars hofhansl <[email protected]>:
> I might be missing something. Why not just have a counter per IP and then
> aggregate at read time?
> If you wanted the total of the 058 group you'd start a scanner with "058" as
> start row and "058\0" as stop row. On the client you sum up the counter
> values.
> Similarly for the 109.169 group: start with "109.169" and stop at "109.169\0".
>
> -- Lars
>
> ________________________________
> From: Jean-Marc Spaggiari <[email protected]>
> To: user <[email protected]>
> Sent: Sunday, January 27, 2013 8:51 AM
> Subject: Tables vs CFs vs Cs
>
> Hi,
>
> Let's imagine this scenario.
>
> I want to store IPs with counters, and I also want counters for
> groups of IPs. All of that will be calculated with MR jobs and stored
> in HBase.
>
> Let's take some IPs and make sure they sort correctly by padding each
> part with leading "0"s when required.
>
> 037.113.031.119
> 058.022.018.176
> 058.022.159.151
> 109.169.201.076
> 109.169.201.150
> 109.254.019.140
> 122.031.039.016
> 122.224.005.210
> 178.137.167.041
>
> I want to have counters for all "levels" of those IPs, which means for
> these groups:
>
> Group 1:
> 037
> 058
> 109
> 122
> 178
>
> Group 2:
> 037.113
> 058.022
> 109.169
> 109.254
> 122.031
> 122.224
> 178.137
>
> Group 3:
> 037.113.031
> 058.022.018
> 058.022.159
> 109.169.201
> 109.254.019
> 122.031.039
> 122.224.005
> 178.137.167
>
> Group 4 is the complete list of IPs.
>
> Each time I see an IP, I will increment the corresponding values in all 4
> groups.
>
> What's the best way to store that, given that I want to be able to
> easily list all the entries (range-based) of one group?
>
> Option 1 is to have one table per group: 1 CF, 1 C.
> Pros: Very easy to access, retrieve, etc.
> Cons: Will generate 4 tables.
>
> Option 2 is to have one table, but 1 CF per group.
> Pros: Only one table, easy access.
> Cons: I've heard that we should try to keep the number of CFs under 3;
> more might have a bad performance impact.
>
> Option 3 is to have one table, one CF, and one C per group.
> Pros: Only one table, only one CF.
> Cons: Access is less straightforward than with options 1 and 2.
>
> I think option 2 is the worst one. Option 1 is very easy to implement.
> And for option 3, I don't see any benefit compared to option 1.
>
> So I'm tempted to go with option 1, but I don't like the idea of
> multiplying the tables.
>
> Does anyone have any comment on which option might be the best one,
> or even propose another option?
>
> JM
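The four-level counter scheme from the original message can be sketched as follows. This is a minimal in-memory simulation, not HBase client code: each `TreeMap` stands in for one of the per-group tables of option 1, and all names (`IpGroupCounters`, `pad`, `groupKeys`, `record`) are hypothetical. It handles dotted IPv4 addresses only. In HBase, each `merge` call would correspond to an atomic increment on that row (for example via `incrementColumnValue`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IpGroupCounters {

    // Zero-pads each octet to three digits so that lexicographic (byte-wise)
    // order matches numeric order, e.g. "37.113.31.119" -> "037.113.031.119".
    static String pad(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not a dotted IPv4 address: " + ip);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < octets.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(String.format("%03d", Integer.parseInt(octets[i])));
        }
        return sb.toString();
    }

    // Row keys for groups 1..4 of a padded IP:
    // "058", "058.022", "058.022.018", "058.022.018.176".
    static List<String> groupKeys(String paddedIp) {
        List<String> keys = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        String[] octets = paddedIp.split("\\.");
        for (int i = 0; i < octets.length; i++) {
            if (i > 0) prefix.append('.');
            prefix.append(octets[i]);
            keys.add(prefix.toString());
        }
        return keys;
    }

    // One sorted map per group, standing in for one table per group (option 1).
    final List<TreeMap<String, Long>> groups = new ArrayList<>();

    IpGroupCounters() {
        for (int g = 0; g < 4; g++) {
            groups.add(new TreeMap<>());
        }
    }

    // Each time an IP is seen, increment one counter in each of the 4 groups.
    void record(String ip) {
        List<String> keys = groupKeys(pad(ip));
        for (int g = 0; g < keys.size(); g++) {
            groups.get(g).merge(keys.get(g), 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        IpGroupCounters c = new IpGroupCounters();
        c.record("58.22.18.176");
        c.record("58.22.159.151");
        c.record("109.169.201.76");
        System.out.println(c.groups.get(0)); // {058=2, 109=1}
        System.out.println(c.groups.get(1)); // {058.022=2, 109.169=1}
    }
}
```

With counters pre-aggregated this way, reading the total for "058" is a single point lookup in the group 1 table (a `Get` in HBase) rather than a scan over every matching IP, which is the kind of direct access asked for at the top of this message. The trade-off is four increments per IP seen instead of one.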

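Lars's read-time alternative (one counter per IP, summed over a row range) can be sketched the same way, again with a `TreeMap` standing in for the table and with hypothetical names (`PrefixScanSum`, `stopRowForPrefix`, `sumPrefix`). One detail to watch: HBase compares row keys byte by byte and treats the stop row as exclusive, so a stop row of "058\0" sorts before "058.022.018.176" (`\0` is 0x00, `.` is 0x2E) and the range would contain none of the IPs. This sketch instead derives the stop row by incrementing the last character of the prefix ("058" becomes "059"). That only works for ASCII row keys like the zero-padded ones above.

```java
import java.util.TreeMap;

public class PrefixScanSum {

    // Exclusive stop row for a prefix scan: the prefix with its last character
    // incremented, e.g. "058" -> "059". Assumes a non-empty ASCII prefix.
    static String stopRowForPrefix(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Sums the counters of all rows in [prefix, stopRow), i.e. every row that
    // starts with the prefix, the way a client would after a range scan.
    static long sumPrefix(TreeMap<String, Long> table, String prefix) {
        long total = 0;
        for (long count : table.subMap(prefix, stopRowForPrefix(prefix)).values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Group 4 only: one counter per zero-padded IP (illustrative counts).
        TreeMap<String, Long> ips = new TreeMap<>();
        ips.put("037.113.031.119", 1L);
        ips.put("058.022.018.176", 3L);
        ips.put("058.022.159.151", 2L);
        ips.put("109.169.201.076", 4L);
        ips.put("109.169.201.150", 2L);
        ips.put("109.254.019.140", 5L);

        System.out.println(sumPrefix(ips, "058"));     // 5
        System.out.println(sumPrefix(ips, "109.169")); // 6
        // With "058\0" as the exclusive stop row, no IP falls in the range.
        System.out.println(ips.subMap("058", "058\0").size()); // 0
    }
}
```

The cost of this approach grows with the number of rows under the prefix, which is the IPv6 concern raised above. Newer HBase clients provide `Scan#setRowPrefixFilter`, which computes such a stop row for you. Check that your client version has it before relying on it.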