Yes, I will put something in front of the date.
If the date comes in milliseconds it can be millions of rows, even with a
combined key, but I will only need this data for maybe hourly MapReduce jobs.
My question is whether I gain anything by putting the timestamp in the columns
rather than in the row key, because I will have fewer rows but a lot more
columns with timestamps.
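To make the two layouts concrete, here is a minimal sketch using plain Python dictionaries in place of a real table. The names tall_layout and wide_layout, the "entity|timestamp" key format, and the single "v" column are all hypothetical and only for illustration. It shows that moving the timestamp into the column reduces the number of rows while the number of stored cells stays the same.

```python
from collections import defaultdict

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def tall_layout(events):
    """One row per event: the row key carries the full millisecond timestamp."""
    return {f"{entity}|{ts:013d}": {"v": value} for entity, ts, value in events}


def wide_layout(events):
    """One row per entity and day: the timestamp becomes the column name."""
    rows = defaultdict(dict)
    for entity, ts, value in events:
        day_start = ts - ts % MILLIS_PER_DAY  # truncate to the start of the day
        rows[f"{entity}|{day_start:013d}"][f"{ts:013d}"] = value
    return dict(rows)


# Hypothetical sample: two events on one day, one event a day later.
events = [
    ("a", 1301990400000, 1),
    ("a", 1301990400001, 2),
    ("a", 1301990400000 + MILLIS_PER_DAY, 3),
]
tall = tall_layout(events)
wide = wide_layout(events)
print(len(tall), len(wide), sum(len(cols) for cols in wide.values()))
```

With the wide layout a daily aggregation reads one row per entity and day, but each of those rows can grow very large when events arrive every millisecond.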
Thanks,
Miguel
From: Ted Dunning [mailto:[email protected]]
Sent: Tuesday, April 5, 2011 17:02
To: [email protected]
Cc: Miguel Costa
Subject: Re: Use Timestamp
Using the timestamp as the key will cause your scan to largely hit one region.
That may not be so good.
If you add something in front of the date, you may be able to spread your
scan over several machines.
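One possible way to put something in front of the date is a small hash-derived bucket prefix. The sketch below is only an illustration under assumptions: the bucket count, the "bucket|timestamp|source" key format, and the function names salted_key and scan_ranges are hypothetical, not something prescribed in this thread. It builds such keys and the per-bucket key ranges a time-range scan would then need to cover.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical number of prefix buckets


def salted_key(source_id: str, ts_millis: int) -> bytes:
    """Build a row key of the form <bucket>|<timestamp>|<source_id>."""
    digest = hashlib.md5(source_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS  # stable bucket for a given source
    return b"%02d|%013d|%s" % (bucket, ts_millis, source_id.encode("utf-8"))


def scan_ranges(start_millis: int, stop_millis: int):
    """Return one (start_row, stop_row) pair per bucket for [start, stop)."""
    return [
        (b"%02d|%013d" % (b, start_millis), b"%02d|%013d" % (b, stop_millis))
        for b in range(NUM_BUCKETS)
    ]
```

The trade-off is that a query over one time range becomes NUM_BUCKETS separate range scans, one per prefix, which is what lets the work spread over several machines.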
On the other hand, your aggregation might be very small. In that case, the
convenience of a time key might be sufficient to make you prefer
that implementation.
How much data are you talking about aggregating each time you aggregate?
On Tue, Apr 5, 2011 at 2:16 AM, Miguel Costa <[email protected]>
wrote:
I want to have my data aggregated by day, so I would like to know which is
the best option to query my data: to put the timestamp of the data in my
row key, or to use the timestamp of the columns?
