Re: Secondary indexes suggestions

Michael Segel Tue, 14 Aug 2012 04:56:30 -0700

Ah... schema design...

Yes, you have identified both options... but just to add a twist: in the
column name, prepend (epoch - timestamp) to the message id. This will put
the messages in reverse chronological order (newest first).
The only drawback is that it's theoretically possible to create a row
that exceeds your region's size...
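A minimal sketch of that qualifier layout in Java. It assumes "epoch" means a fixed upper bound such as Long.MAX_VALUE (a common HBase convention, not stated above) and that message ids are stored as UTF-8 strings. The class and method names are hypothetical, and a plain list sort using unsigned byte comparison stands in for HBase's own qualifier ordering:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: column qualifier = (REVERSE_BASE - timestamp) followed by the message id,
 * so newer messages sort first within a user's row.
 * ASSUMPTION: the "epoch" is taken to be Long.MAX_VALUE.
 */
class ReverseQualifier {
    // Hypothetical constant standing in for the "epoch" in (epoch - timestamp).
    static final long REVERSE_BASE = Long.MAX_VALUE;

    /** 8-byte big-endian reversed timestamp, then the message id as UTF-8. */
    static byte[] qualifier(long timestampMillis, String messageId) {
        byte[] id = messageId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + id.length)
                // Non-negative for timestamps >= 0, so byte order matches numeric order.
                .putLong(REVERSE_BASE - timestampMillis)
                .put(id)
                .array();
    }

    /** Recovers the original timestamp from the first 8 bytes of a qualifier. */
    static long timestampOf(byte[] qualifier) {
        return REVERSE_BASE - ByteBuffer.wrap(qualifier).getLong();
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = new ArrayList<>();
        qualifiers.add(qualifier(1_000L, "1"));
        qualifiers.add(qualifier(3_000L, "255"));
        qualifiers.add(qualifier(2_000L, "8"));
        // HBase orders qualifiers by unsigned lexicographic byte comparison.
        qualifiers.sort(Arrays::compareUnsigned);
        for (byte[] q : qualifiers) {
            System.out.println(timestampOf(q)); // prints 3000, then 2000, then 1000
        }
    }
}
```

Subtracting from a fixed bound keeps the stored value non-negative, so its big-endian bytes sort in the same order as the numbers and the newest message comes first.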
 
You could also do this if you use a composite key: hash the user_id, then
append (epoch - timestamp), and then the message_id.
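Roughly, the composite key could be laid out as below. This is a sketch under assumptions that are not in the original: MD5 as the hash, Long.MAX_VALUE as the "epoch", and numeric message ids written as fixed-width 8-byte longs. The names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/**
 * Sketch of the row key: hash(user_id) | (REVERSE_BASE - timestamp) | message_id.
 * ASSUMPTIONS: MD5 hash, Long.MAX_VALUE as the "epoch", 8-byte numeric message ids.
 */
class CompositeKey {
    static final long REVERSE_BASE = Long.MAX_VALUE;
    static final int HASH_LEN = 16; // MD5 digest length in bytes

    static byte[] userHash(String userId) {
        try {
            return MessageDigest.getInstance("MD5").digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    static byte[] rowKey(String userId, long timestampMillis, long messageId) {
        return ByteBuffer.allocate(HASH_LEN + 2 * Long.BYTES)
                .put(userHash(userId))                   // same prefix for all of one user's rows
                .putLong(REVERSE_BASE - timestampMillis) // newest first within that user
                .putLong(messageId)                      // fixed width, so 8 sorts before 10
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("88", 1_000L, 8L);
        byte[] newer = rowKey("88", 2_000L, 10L);
        // Both keys share the 16-byte user prefix.
        System.out.println(Arrays.equals(older, 0, HASH_LEN, newer, 0, HASH_LEN)); // true
        // The newer message sorts before the older one.
        System.out.println(Arrays.compareUnsigned(newer, older) < 0); // true
    }
}
```

Because the whole user_id is hashed, all of one user's rows still share the same 16-byte prefix and stay contiguous, while different users are spread across the key space.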

You are correct that you have to scan many rows. However, you can bound the
scan by using the user_id as the start key and the user_id + the first
character after the separator as the end key, so the scanner only reads
that user's rows.
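The start/stop bounds for the <userId><SEPARATOR><messageId> layout can be sketched as follows, assuming '|' as a hypothetical separator that never occurs inside a user id. An in-memory TreeSet ordered by unsigned bytes stands in for the table; with the HBase client the two arrays would become the scan's start and stop rows (for example Scan.withStartRow and Scan.withStopRow in the 2.x API):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Sketch of bounding a scan to one user's index rows.
 * ASSUMPTION: '|' is the separator and user ids never contain it.
 */
class UserScanBounds {
    static final char SEPARATOR = '|'; // hypothetical separator

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    /** Inclusive start row: userId + SEPARATOR. */
    static byte[] startRow(String userId) { return bytes(userId + SEPARATOR); }

    /** Exclusive stop row: userId + the character right after SEPARATOR. */
    static byte[] stopRow(String userId) { return bytes(userId + (char) (SEPARATOR + 1)); }

    /** Returns the rows in [startRow, stopRow), like a bounded scan. */
    static List<String> scanUser(NavigableSet<byte[]> table, String userId) {
        List<String> out = new ArrayList<>();
        for (byte[] row : table.subSet(startRow(userId), true, stopRow(userId), false)) {
            out.add(new String(row, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows kept in the order HBase uses: unsigned lexicographic bytes.
        Comparator<byte[]> order = Arrays::compareUnsigned;
        NavigableSet<byte[]> table = new TreeSet<>(order);
        for (String k : new String[] {"88|1", "88|8", "88|10", "88|255", "880|7", "89|3"}) {
            table.add(bytes(k));
        }
        System.out.println(scanUser(table, "88")); // [88|1, 88|10, 88|255, 88|8]
    }
}
```

The stop row is exclusive, so rows for users 880 and 89 are not returned. The output also shows that string-encoded ids sort as 1, 10, 255, 8; a fixed-width binary id avoids that if numeric order matters.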

The only reason I would say to hash the key is so that you get a more even 
distribution of data across the cluster, but that's not really that important.

On Aug 14, 2012, at 6:44 AM, Lukáš Drbal <[email protected]> wrote:

> Hi,
> 
> Thanks a lot for all the responses.
> 
> Otis: the filters from your link are great; I'll check them in my tests.
> 
> Michael: I understand what secondary indexes are, but I still have no
> idea about an effective rowkey format. I'm OK with a delay in creating the
> secondary index and with it not being atomic; we don't need "realtime" data.
> 
> 
> Say I have 10 messages with ids 1, 8, 10, 255, ... from one user with
> id 88. I see only 2 options for the rowkey in the secondary index:
> 
> 1) a composite rowkey like <userId><SEPARATOR><messageId>
> 2) use userId as the rowkey and put the messageIds into cells
> Are there any others?
> 
> When I use the first method, I must scan over many rows. What about
> setting startRow on the scanner? Can this scan be efficient?
> 
> The second method needs many, many cells and I don't need them all at
> once, so IMHO this is a bad idea.
> 
> 
> -- 
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
> 
> Lukas Drbal
>
