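bq. Using timestamp in row-keys is discouraged

The above is true. Prefixing the row key with a timestamp would create a hot region: keys that increase monotonically all sort to the same end of the key space, so new writes land on a single region.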
bq. should I filter by a simpler row-key plus a filter on timestamp?

You can do the above.

On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier <[email protected]> wrote:

> Hi to everybody,
>
> in my use case I have to perform batch analysis skipping old data.
> For example, I want to process all rows created after a certain timestamp,
> passed as parameter.
>
> What is the most effective way to do this?
> Should I design my row-key to embed timestamp?
> Or just filtering by timestamp of the row is fast as well? Or what else?
>
> Initially I was thinking to compose my key as:
> timestamp|source|title|type
>
> but:
>
> 1) Using timestamp in row-keys is discouraged
> 2) If this design is ok, using this approach I still have problems
> filtering by timestamp because I cannot find a way to filter numerically
> (instead of alphanumerically/by string). Example:
> 1372776400441|something has a timestamp less
> than 1372778470913|somethingelse but I cannot filter all rows whose key is
> "numerically" greater than 1372776400441. Is it possible to overcome this
> issue?
> 3) If this design is not ok, should I filter by a simpler row-key plus a
> filter on timestamp? Or what else?
>
> Best,
> Flavio
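
On point 2 of the quoted message (numeric versus string comparison): row keys are plain byte arrays sorted in unsigned lexicographic order, so a timestamp written as a decimal string only sorts numerically while every value has the same number of digits. A minimal sketch of one way around that, assuming the timestamp is non-negative and stored as 8 fixed-width big-endian bytes (to my knowledge this is the layout HBase's `Bytes.toBytes(long)` produces, but please verify against your version). The class `TimestampKeys` and its methods are hypothetical helpers written for this illustration, using only the JDK. This does not change the hot-region concern about timestamp prefixes; it only shows the ordering behaviour.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of HBase.
final class TimestampKeys {

    // Encode a non-negative timestamp as 8 big-endian bytes
    // (ByteBuffer is big-endian by default).
    static byte[] encode(long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("negative timestamp");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(timestampMillis).array();
    }

    // Unsigned lexicographic comparison of two byte arrays,
    // i.e. the order in which byte[] row keys are sorted.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long older = 1372776400441L;
        long newer = 1372778470913L;

        // Fixed-width binary keys keep numeric order for non-negative values.
        System.out.println(compareKeys(encode(older), encode(newer)) < 0);

        // Decimal strings of different widths do not: "999" sorts after
        // "1372776400441" because '9' > '1'.
        System.out.println("999".compareTo("1372776400441") > 0);
    }
}
```

Both lines print `true`. With keys encoded this way, "numerically greater than 1372776400441" becomes an ordinary byte-wise range starting at `encode(1372776400441L)`; zero-padding the decimal string to a fixed width would be an alternative if the key has to stay human-readable.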
