Hi Ted,

Thanks for pointing out that HBASE-3488 is about CellCounter.
It will give better visibility into the stored cells.

Vishal's initial question was about having numerous versions for the same row id/key.

I know data model design depends on the use case, but from a technical point of view (read/write performance, ...), is there anything against having numerous (thousands, millions, ...) versions of the same key?

Tks,
- Eric


On 4/04/2011 18:38, Ted Yu wrote:
For 2, HBASE-3488 is for CellCounter.
In Vishal's case, 3 years of data are stored for a given row key. Issuing a
'get' command would not help much.

TIMERANGE support has been added in HBASE-3729

Cheers

On Sun, Apr 3, 2011 at 11:40 PM, Eric Charles <[email protected]> wrote:

1.- On my side, I could imagine using versions to store the history of
a key (without the need to add an extra index table). It really depends on
the requirements and data model, I think, but many versions can sometimes
make sense.

2.- HBASE-3488 is related to the Hadoop RowCounter job. To get versions from
code, you can use the setTimeStamp/setMaxVersions/setTimeRange methods of the
Get and Scan objects. Via the shell, you can use "get 't1', 'r1', {COLUMN
=> 'c1', TIMESTAMP => ts1, VERSIONS => 4}" (not sure if it's possible with
TIMERANGE via the shell?)
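
To make the VERSIONS and time-range semantics concrete, here is a rough
plain-Java sketch. It is a toy model, not the HBase client API: the class
CellVersions and its methods are hypothetical names made up for illustration.
It assumes that a cell's versions are kept newest first and that a time range
is half-open, [min, max), which is how Get.setTimeRange treats its arguments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of the versions of one cell (hypothetical, not an HBase class). */
class CellVersions {
    // timestamp -> value, ordered newest first
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    /** Stores a value under an explicit timestamp; same timestamp overwrites. */
    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Number of versions actually held in this model. */
    public int storedCount() {
        return versions.size();
    }

    /**
     * Returns up to maxVersions values, newest first, whose timestamps
     * satisfy minStamp <= ts < maxStamp.
     */
    public List<String> select(long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        // The map is in descending order, so the sub-map runs from
        // maxStamp (exclusive) down to minStamp (inclusive).
        for (String v : versions.subMap(maxStamp, false, minStamp, true).values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        CellVersions cell = new CellVersions();
        cell.put(100L, "a");
        cell.put(200L, "b");
        cell.put(300L, "c");
        System.out.println(cell.storedCount());          // 3
        System.out.println(cell.select(100L, 300L, 4));  // [b, a]
    }
}
```

With the real client, the equivalent would be a Get configured with
setMaxVersions and setTimeRange, then counting the entries returned for the
column; the sketch only shows which versions such a request would select.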

Tks,
- Eric



On 3/04/2011 22:12, Ted Yu wrote:

For 1, please give some background to justify the high number of versions.

For 2, take a look at HBASE-3488

On Sun, Apr 3, 2011 at 12:49 PM, Vishal Kapoor
<[email protected]> wrote:

Two questions:

1) If I set the number of versions for a family to 365*3, is it a bad
design? How many versions are good practice? If I have too many
versions, will it still be a single seek when I get the row id? If yes,
will it take longer to store data? Pros and cons?

2) How do I get the number of versions actually stored in a cell (not
the max versions it is configured to store)?

thanks,
Vishal



