As I said this afternoon:
See the following API in HTable for batching Gets:
public Result[] get(List<Get> gets) throws IOException
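
To make that concrete, here is a rough sketch (plain Java, no HBase on the classpath) of the client-side work around that call, following Carlos's pseudocode quoted below: build one hashed row key per day, then sum a qualifier across the returned rows. The class name RowKeySketch, its helpers, and the MD5-based 4-character prefix are my assumptions for illustration only; the thread never says which hash is used, and the Map<String, Long> rows merely stand in for decoded Result objects.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowKeySketch {
    // Assumed hash: first two bytes of the MD5 of the date, as 4 hex chars.
    static String hashPrefix(String date) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(date.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Row keys (hash(date) + date) for the `days` days ending at `last`, newest first.
    static List<String> dailyKeys(LocalDate last, int days) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            String date = last.minusDays(i).toString(); // ISO yyyy-MM-dd
            keys.add(hashPrefix(date) + date);
        }
        return keys;
    }

    // Sum one qualifier (e.g. "c:usa") across rows; a missing qualifier counts as 0.
    static long sumColumn(List<Map<String, Long>> rows, String qualifier) {
        long total = 0;
        for (Map<String, Long> row : rows) {
            total += row.getOrDefault(qualifier, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Keys for the "last 5 days" query in the pseudocode.
        for (String key : dailyKeys(LocalDate.of(2014, 4, 29), 5)) {
            System.out.println(key);
        }
        // Stand-in rows: qualifier -> counter value.
        List<Map<String, Long>> rows = new ArrayList<>();
        Map<String, Long> day1 = new HashMap<>();
        day1.put("c:usa", 3L);
        day1.put("c:usa:sex:f", 1L);
        Map<String, Long> day2 = new HashMap<>();
        day2.put("c:usa", 4L);
        rows.add(day1);
        rows.add(day2);
        System.out.println("total_usa = " + sumColumn(rows, "c:usa"));
    }
}
```

In real client code each key would be wrapped as new Get(Bytes.toBytes(key)), the whole list handed to the get(List<Get>) call above, and the counters read out of each Result. The 4K+ hourly case is the same loop with a longer list, using whatever hourly key format you settle on.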
Cheers
On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <[email protected]> wrote:
> Nothing against your code. I just meant that if we are doing a scan
> say for hourly metrics across a 6 month period we are talking about
> 4K+ gets. Is that something that can easily be handled?
>
> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <[email protected]>
> wrote:
> >> Gets a bit hairy when doing say a shitload of gets though... no?
> >
> > If by "hairy" you mean the code is ugly, it was written for maximal
> > clarity.
> > I think you'll find a few sensible loops makes it fairly clean.
> > Otherwise I'm not sure what you mean.
> >
> > -----Original Message-----
> > From: Software Dev [mailto:[email protected]]
> > Sent: Tuesday, April 29, 2014 5:02 PM
> > To: [email protected]
> > Subject: Re: Help with row and column design
> >
> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >> pre-store every level of aggregation you care about.
> >
> > Ok I think this makes sense. Gets a bit hairy when doing say a shitload
> > of gets though... no?
> >
> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <[email protected]>
> > wrote:
> >> You don't do a scan, you do a series of gets, which I believe you can
> >> batch into one call.
> >>
> >> Last 5 days query in pseudocode:
> >> res1 = Get( hash("2014-04-29") + "2014-04-29")
> >> res2 = Get( hash("2014-04-28") + "2014-04-28")
> >> res3 = Get( hash("2014-04-27") + "2014-04-27")
> >> res4 = Get( hash("2014-04-26") + "2014-04-26")
> >> res5 = Get( hash("2014-04-25") + "2014-04-25")
> >>
> >> For each result you look for the particular column or columns you are
> >> interested in:
> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
> >>
> >> "What happens when we add more fields? Do we just keep adding in more
> >> column qualifiers? If so, how would we filter across columns to get an
> >> aggregate total?"
> >>
> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >> pre-store every level of aggregation you care about.
> >>
> >> -----Original Message-----
> >> From: Software Dev [mailto:[email protected]]
> >> Sent: Tuesday, April 29, 2014 4:36 PM
> >> To: [email protected]
> >> Subject: Re: Help with row and column design
> >>
> >>> The downside is it still has a hotspot when inserting, but when
> >>> reading a range of time it does not
> >>
> >> How can you do a scan query between dates when you hash the date?
> >>
> >>> Column qualifiers are just the collection of items you are
> >>> aggregating on. Values are increments. In your case qualifiers might
> >>> look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
> >>> c:italy:sex:f, c:italy,
> >>
> >> What happens when we add more fields? Do we just keep adding in more
> >> column qualifiers? If so, how would we filter across columns to get an
> >> aggregate total?
>