----- Original Message -----
From: Doug Meil <[email protected]>
Sent: Thu Jul 14 2011 22:29:16 GMT+0200 (CET)
To:
CC:
Subject: Re: data structure


Hi there-

A few high-level suggestions...

re:  "to generate a report: for example we want to know how many
impressions were done by all users in last x days"

Can you create a summary table by day (via MR job), and then have your
ad-hoc report hit the summary table?
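
As a rough illustration of the daily summary idea, here is a minimal sketch in plain Python. It stands in for the MR job and the summary table, which it does not actually use; the names `build_daily_summary` and `impressions_last_days` and the `(user_id, timestamp)` event shape are assumptions made for this example only.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def build_daily_summary(events):
    """Roll raw impressions up into one count per calendar day.

    `events` is an iterable of (user_id, timestamp) pairs; this shape is
    an assumption for the sketch, not the poster's actual schema.
    """
    summary = Counter()
    for _user_id, ts in events:
        summary[ts.date()] += 1
    return summary


def impressions_last_days(summary, today, days):
    """Sum the per-day counts for the `days` days ending at `today` (inclusive)."""
    return sum(summary[today - timedelta(days=i)] for i in range(days))


if __name__ == "__main__":
    events = [
        ("u1", datetime(2011, 7, 14, 10, 0)),
        ("u2", datetime(2011, 7, 14, 11, 0)),
        ("u1", datetime(2011, 7, 13, 9, 0)),
        ("u3", datetime(2011, 7, 10, 9, 0)),
    ]
    summary = build_daily_summary(events)
    print(impressions_last_days(summary, date(2011, 7, 14), 2))
```

The report then touches one small row per day instead of rescanning every raw impression, so its cost depends on the number of days requested rather than on the total data volume.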

Re:  "and with the data growing, the time will increase"


Yes.  As you add more and more data, processing will slow down.  That's
why you should expect to periodically expand your cluster.



i guess a summary table will be it
the only disadvantage of such tables is that they're not that flexible
so e.g. if i store data for every hour (24 entries a day), i can run fast reports 
for specific time ranges, e.g. 12:00 to 15:00

but there is no way to generate a report for the time range 12:30 to 13:45
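
to make the constraint concrete, here is a small sketch in plain python (no hbase involved). the function names and the in-memory counter standing in for the summary table are assumptions for illustration only. a range query can be answered from hourly buckets only when both ends fall on a full hour:

```python
from collections import Counter
from datetime import datetime, timedelta


def truncate_to_hour(ts):
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_hourly_summary(timestamps):
    """Count impressions per hour bucket (a stand-in for the summary table)."""
    summary = Counter()
    for ts in timestamps:
        summary[truncate_to_hour(ts)] += 1
    return summary


def count_range(summary, start, end):
    """Count impressions in [start, end) using only the hourly buckets.

    Raises ValueError when a bound is not on a full hour, because the
    buckets cannot be split any finer than that.
    """
    for bound in (start, end):
        if bound != truncate_to_hour(bound):
            raise ValueError("range must start and end on a full hour")
    total = 0
    hour = start
    while hour < end:
        total += summary[hour]
        hour += timedelta(hours=1)
    return total


if __name__ == "__main__":
    day = datetime(2011, 7, 14)
    stamps = [day.replace(hour=h, minute=m)
              for h, m in [(12, 10), (12, 40), (13, 50), (14, 59), (15, 0)]]
    summary = build_hourly_summary(stamps)
    print(count_range(summary, day.replace(hour=12), day.replace(hour=15)))
```

a finer bucket size (say 15 minutes) would allow 12:30 to 13:45, at the cost of more rows per day; that trade-off is the flexibility question raised here.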

i guess we will live with that constraint

i thought hadoop+hbase+mapreduce was such cool magic stuff that there is no 
need for summary tables, and data scans run within milliseconds... ;-)

