Ah... schema design... Yes, you have identified both options... but just to add a twist: in the column name, prepend (epoch - timestamp) to the message id, i.e. a reverse timestamp such as Long.MAX_VALUE - timestamp. This stores the messages in reverse chronological order, newest first. The only drawback is that it's theoretically possible to create a row that exceeds your region's size... You could also do this with a composite key: hash the user_id, then append (epoch - timestamp), and then the message_id.
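To make the composite key idea concrete, here is a rough sketch in Python (just byte manipulation, no HBase client involved). The layout is an assumption for illustration: a 4-byte MD5 prefix of the user_id, then the reverse timestamp, then the message_id, both as big-endian 64-bit integers. The names make_row_key and decode_row_key are made up for this example. Since HBase sorts row keys as raw bytes, sorting the keys here mimics the order a scan would return them in.

```python
import hashlib
import struct

MAX_LONG = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def make_row_key(user_id: int, timestamp_ms: int, message_id: int) -> bytes:
    """Hypothetical layout: md5(user_id)[:4] | (MAX_LONG - timestamp) | message_id."""
    # Assumption: a 4-byte hash prefix is enough to spread users across regions.
    user_hash = hashlib.md5(str(user_id).encode("utf-8")).digest()[:4]
    reverse_ts = MAX_LONG - timestamp_ms
    # Big-endian unsigned 64-bit fields compare bytewise in numeric order.
    return user_hash + struct.pack(">QQ", reverse_ts, message_id)


def decode_row_key(key: bytes) -> tuple:
    """Return (timestamp_ms, message_id) from a key built by make_row_key."""
    reverse_ts, message_id = struct.unpack(">QQ", key[4:])
    return MAX_LONG - reverse_ts, message_id


# Three messages from user 88, written oldest first.
keys = [make_row_key(88, ts, mid) for ts, mid in [(1000, 1), (2000, 8), (3000, 10)]]

# Byte-wise sort, as HBase would store them: newest message comes first.
newest_first = [decode_row_key(k) for k in sorted(keys)]
```

All keys for one user share the same hash prefix, so they stay contiguous, and within that prefix the reverse timestamp puts the most recent message at the top.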
You are correct that you have to scan many rows. However, you can bound the scan: use the user_id as the start key and, as the end key, the user_id plus the first character that sorts after the separator. That keeps the scan within that user's range of keys instead of walking the whole table. The only reason I would say to hash the key is so that you get a more even distribution of data across the cluster, but that's not really that important.

On Aug 14, 2012, at 6:44 AM, Lukáš Drbal <[email protected]> wrote:

> Hi,
>
> thanks a lot for all response.
>
> Otis: filter from your link are great, i'll check it in my tests.
>
> Michael: i understand what is secondary indexes, but still don't have
> idea about effective rowkey format. I'm ok with delay in creating
> secondary index and atomicity, we don't need "realitime" data.
>
>
> When i have 10 messages with ids 1, 8, 10, 255, ... from one user with
> id 88. I see here only 2 options for rowkey in sec. index:
>
> 1) composite rowkey like <userId><SEPARATOR><messageId>
> 2) use userId as rowkey and put messageId into cells
> Exists any other?
>
> When i use first method, i must scan over many rows. What about
> startRow for scanner? Can be this scan effective?
>
> Second method need many many cells and i don't need all in one time,
> so this is imho bad idea.
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> Lukas Drbal
>
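On the startRow question, a small sketch of the start/stop key idea for the <userId><SEPARATOR><messageId> layout may help. It is plain Python over a sorted list standing in for the table, not HBase client code. The "|" separator, the zero-padded message id, and the helper names are assumptions for the example. One deliberate difference: the start key here includes the separator, because with variable-length ids a start key of just "88" would also pick up user 880. The stop key is treated as exclusive, which matches how an HBase scan treats its stop row.

```python
import bisect

SEPARATOR = b"|"  # assumed separator byte; must be below 0xFF for the +1 trick


def make_index_key(user_id: int, message_id: int) -> bytes:
    # Zero-pad the message id so byte order matches numeric order.
    return (str(user_id).encode("ascii") + SEPARATOR
            + f"{message_id:010d}".encode("ascii"))


def scan_bounds(user_id: int) -> tuple:
    """Return (start_key, stop_key) covering exactly one user's rows."""
    prefix = str(user_id).encode("ascii") + SEPARATOR
    # Stop key: same prefix with the separator replaced by the next byte value.
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop


# A sorted list stands in for the table; rows from several users are mixed in.
rows = sorted(
    make_index_key(u, m)
    for u, m in [(8, 5), (88, 1), (88, 8), (88, 10), (88, 255), (880, 3), (89, 2)]
)

start, stop = scan_bounds(88)
lo = bisect.bisect_left(rows, start)
hi = bisect.bisect_left(rows, stop)  # stop key is exclusive
hits = rows[lo:hi]
```

Only the four rows for user 88 fall between the two keys, so the scan cost depends on how many messages that user has, not on the size of the table.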
