Yes, I'll be storing at multiple levels of aggregation.
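
A rough sketch of the arithmetic behind the granularity question, assuming a hypothetical six-month window (2013-11-01 up to 2014-05-01) and one pre-aggregated row per time bucket. The function name and the dates are made up for illustration; only the hourly-vs-coarser comparison comes from the thread.

```python
from datetime import date

def rows_needed(start: date, end: date) -> dict:
    """Count the pre-aggregated rows covering [start, end) at each granularity.

    Assumes one row per time bucket and, for the month count, that both
    dates fall on the first day of a month.
    """
    days = (end - start).days
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return {
        "hour": days * 24,
        "day": days,
        "week": -(-days // 7),  # ceiling division: a partial week still needs a row
        "month": months,
    }

# Hypothetical six-month window, chosen only for illustration.
counts = rows_needed(date(2013, 11, 1), date(2014, 5, 1))
for level, n in counts.items():
    print(f"{level:>5}: {n} gets")
```

Hourly rows put the same window in the "4K+ gets" range, while month-level rows need only a handful, which is the reason for keeping several levels around.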

On Wed, Apr 30, 2014 at 9:21 AM, Rendon, Carlos (KBB) <[email protected]> wrote:
>> Ok didn't know if the sheer number of gets would be a limiting factor. Thanks
>
> Yes, retrieving and summing thousands of rows is much slower and requires more
> network, memory, and CPU than doing that for a hundred or <10.
> Perhaps day-level, week-level, or month-level granularity would be a better
> fit for a 6 month aggregation?
> You did say you were going to store data at multiple levels of time
> aggregation, right?
>
>
> -----Original Message-----
> From: Software Dev [mailto:[email protected]]
> Sent: Tuesday, April 29, 2014 8:05 PM
> To: [email protected]
> Subject: Re: Help with row and column design
>
> Ok didn't know if the sheer number of gets would be a limiting factor. Thanks
>
> On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu <[email protected]> wrote:
>> As I said this afternoon:
>> See the following API in HTable for batching Gets:
>>
>> public Result[] get(List<Get> gets) throws IOException {
>>
>> Cheers
>>
>>
>> On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <[email protected]> wrote:
>>
>>> Nothing against your code. I just meant that if we are doing a scan
>>> say for hourly metrics across a 6 month period we are talking about
>>> 4K+ gets. Is that something that can easily be handled?
>>>
>>> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <[email protected]> wrote:
>>> >> Gets a bit hairy when doing say a shitload of gets though.. no?
>>> >
>>> > If by "hairy" you mean the code is ugly, it was written for maximal clarity.
>>> > I think you'll find a few sensible loops makes it fairly clean.
>>> > Otherwise I'm not sure what you mean.
>>> >
>>> > -----Original Message-----
>>> > From: Software Dev [mailto:[email protected]]
>>> > Sent: Tuesday, April 29, 2014 5:02 PM
>>> > To: [email protected]
>>> > Subject: Re: Help with row and column design
>>> >
>>> >> Yes. See total_usa vs. total_female_usa above. Basically you have
>>> >> to pre-store every level of aggregation you care about.
>>> >
>>> > Ok I think this makes sense. Gets a bit hairy when doing say a
>>> > shitload of gets though.. no?
>>> >
>>> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <[email protected]> wrote:
>>> >> You don't do a scan, you do a series of gets, which I believe you
>>> >> can batch into one call.
>>> >>
>>> >> last 5 days query in pseudocode
>>> >> res1 = Get( hash("2014-04-29") + "2014-04-29")
>>> >> res2 = Get( hash("2014-04-28") + "2014-04-28")
>>> >> res3 = Get( hash("2014-04-27") + "2014-04-27")
>>> >> res4 = Get( hash("2014-04-26") + "2014-04-26")
>>> >> res5 = Get( hash("2014-04-25") + "2014-04-25")
>>> >>
>>> >> For each result you look for the particular column or columns you
>>> >> are interested in
>>> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
>>> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
>>> >>
>>> >> "What happens when we add more fields? Do we just keep adding in
>>> >> more column qualifiers? If so, how would we filter across columns to get
>>> >> an aggregate total?"
>>> >>
>>> >> Yes. See total_usa vs. total_female_usa above. Basically you have
>>> >> to pre-store every level of aggregation you care about.
>>> >>
>>> >> -----Original Message-----
>>> >> From: Software Dev [mailto:[email protected]]
>>> >> Sent: Tuesday, April 29, 2014 4:36 PM
>>> >> To: [email protected]
>>> >> Subject: Re: Help with row and column design
>>> >>
>>> >>> The downside is it still has a hotspot when inserting, but when
>>> >>> reading a range of time it does not
>>> >>
>>> >> How can you do a scan query between dates when you hash the date?
>>> >>
>>> >>> Column qualifiers are just the collection of items you are
>>> >>> aggregating on. Values are increments. In your case qualifiers
>>> >>> might look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
>>> >>> c:italy:sex:f, c:italy,
>>> >>
>>> >> What happens when we add more fields? Do we just keep adding in
>>> >> more column qualifiers? If so, how would we filter across columns to get
>>> >> an aggregate total?
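
For anyone reading the thread later, the "last 5 days" pseudocode and the pre-stored qualifier idea can be sketched roughly as follows. This is not HBase client code: a plain dict stands in for the table, `batch_get` stands in for a single batched multi-get call such as the `get(List<Get>)` method mentioned above, and the MD5-based prefix is only one assumed choice for `hash(...)`. All function names here are hypothetical.

```python
import hashlib
from datetime import date, timedelta

def row_key(day: str) -> str:
    """Prefix the date with a short hash so consecutive days do not sort together.

    MD5 and the 4-character prefix are assumptions made for this sketch.
    """
    prefix = hashlib.md5(day.encode("utf-8")).hexdigest()[:4]
    return prefix + day

def increment(table: dict, day: str, country: str, sex: str, amount: int = 1) -> None:
    """Bump every pre-stored aggregation level that this event belongs to."""
    row = table.setdefault(row_key(day), {})
    for qualifier in (f"c:{country}", f"c:{country}:sex:{sex}"):
        row[qualifier] = row.get(qualifier, 0) + amount

def batch_get(table: dict, keys: list) -> list:
    """In-memory stand-in for one batched multi-get; missing rows come back empty."""
    return [table.get(key, {}) for key in keys]

def total(table: dict, end_day: date, num_days: int, qualifier: str) -> int:
    """Sum one qualifier over the num_days days ending at end_day (inclusive)."""
    days = [(end_day - timedelta(days=i)).isoformat() for i in range(num_days)]
    results = batch_get(table, [row_key(d) for d in days])
    return sum(result.get(qualifier, 0) for result in results)

# Made-up sample events, purely for illustration.
table = {}
increment(table, "2014-04-29", "usa", "f")
increment(table, "2014-04-28", "usa", "m")
increment(table, "2014-04-25", "usa", "f", amount=3)
increment(table, "2014-04-24", "usa", "f")    # falls outside the 5-day window
increment(table, "2014-04-27", "italy", "m")

total_usa = total(table, date(2014, 4, 29), 5, "c:usa")
total_female_usa = total(table, date(2014, 4, 29), 5, "c:usa:sex:f")
```

Because each write increments both `c:usa` and `c:usa:sex:f`, the read side never filters across columns; it just picks the qualifier for the aggregation level it wants and sums it over the fetched rows. Adding a new field means adding new qualifiers at write time.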
