Re: Update vs Delete/Insert

aaron morton Wed, 16 Jun 2010 05:06:35 -0700

It may make sense to use a secondary index for the counts. You could store the 
counts in both places and use a batch mutation to update them. A batch mutation 
does not give you a transactional guarantee, but it does mean you still make 
only one request to Cassandra.


e.g.

<lhid> {
        <rhid1> : <count>
        <rhid2> : <count>
}

The secondary index can be in the same CF, with a tweak to the key.

<lhid.count_index> {
        <count> : <rhid1>
        <count> : <rhid2>
}
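
As a rough sketch of the dual write, here is a small Python model of one CF as 
a dict of rows. Nothing in it is a Cassandra client call: cf, 
build_count_update and apply_batch are hypothetical names, and apply_batch 
only stands in for sending the whole list as one batch mutation. It also 
assumes each count is unique within the index row.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for one CF: row key -> {column name: value}.
cf = defaultdict(dict)


def build_count_update(lhid, rhid, old_count, new_count):
    """Return one batch of mutations covering the data row and the index row."""
    index_key = f"{lhid}.count_index"
    batch = []
    # Data row: overwrite the count stored under the rhid column.
    batch.append(("insert", lhid, rhid, new_count))
    # Index row: the count is the column name, so a change of count is a
    # delete of the old column plus an insert of the new one.
    if old_count is not None:
        batch.append(("delete", index_key, old_count, None))
    batch.append(("insert", index_key, new_count, rhid))
    return batch


def apply_batch(store, batch):
    """Apply the mutations in order; stands in for a single batch request."""
    for op, row_key, column, value in batch:
        if op == "insert":
            store[row_key][column] = value
        else:
            store[row_key].pop(column, None)


apply_batch(cf, build_count_update("lh1", "rhA", None, 3))
apply_batch(cf, build_count_update("lh1", "rhA", 3, 4))
```

After the second call the data row holds rhA with count 4, and the index row 
holds a single column named 4 that points back at rhA.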

Are the counts going to be unique? If not, you may want to store the secondary 
index in a super CF, where the super column name is the count and the columns 
in it are the ids that have that count. 
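
To illustrate the super CF layout for non-unique counts, here is a sketch that 
models it as nested dicts. The names index and move_id are hypothetical, and 
the dicts only mimic the row / super column / column nesting, not any real 
client API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a super CF:
# row key -> super column name (the count) -> {column name (an rhid): value}.
index = defaultdict(lambda: defaultdict(dict))


def move_id(index, row_key, rhid, old_count, new_count):
    """Move rhid from the old count's super column to the new count's."""
    row = index[row_key]
    if old_count is not None:
        row[old_count].pop(rhid, None)
        if not row[old_count]:
            del row[old_count]  # drop super columns that became empty
    # The value is unused here; the column name carries the id.
    row[new_count][rhid] = ""


move_id(index, "lh1.count_index", "rhA", None, 3)
move_id(index, "lh1.count_index", "rhB", None, 3)  # same count as rhA
move_id(index, "lh1.count_index", "rhA", 3, 4)
```

Two ids can share the count 3 without overwriting each other, which the flat 
<count> : <rhid> layout cannot do.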

Aaron

On 16 Jun 2010, at 22:00, Dr. Martin Grabmüller wrote:

> Hi Colin, 
> 
>> From: Colin Vipurs [mailto:zodiac...@gmail.com] 
> [...]
>> I've got some data that I'm doing counts on, stored in a CF as:
>> 
>> <lhid> {
>>    <rhid1> : <count>
>>    <rhid2> : <count>
>>    ....
>> }
> [...]
>> <lhid> {
>>   <count-rhid1> : PLACEHOLDER
>>   <count-rhid2> : PLACEHOLDER
>> }
>> 
>> would be a better way of storing the data? Does anyone know the
>> relative performance differences between doing the insert in the first
>> instance and a delete/insert in the second?
> 
> I can't say anything about performance differences, but I think it will
> not matter, as you are about to insert the same amount of data.
> 
> Just keep the following in mind:
> 
> - With the second scheme, it is more difficult to delete individual columns,
>  because you have to know the count and the name to construct the column
>  name.  You can iterate over the columns to find the names, of course, but
>  this may or may not work for you.
> 
>  Maybe you want to store the rhids instead of the placeholders to solve
>  that problem.
> 
> - You will need to left-pad the counts with zeros so that lexicographical
>  ordering works.
> 
> - (may be irrelevant, but anyway) there is a limit on column names which
>  AFAIK is lower than the limit on column values.
> 
> Cheers,
>  Martin
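
Martin's point about left-padding can be seen with a short sketch. It assumes 
column names are compared as plain strings, and the helper names and the 
WIDTH of 10 digits are made up for the example.

```python
WIDTH = 10  # assumed maximum number of digits a count can have


def column_name(count, rhid):
    """Build a <count-rhid> column name with a zero-padded count."""
    return f"{count:0{WIDTH}d}-{rhid}"


def parse_column_name(name):
    """Split a padded column name back into (count, rhid)."""
    count, rhid = name.split("-", 1)
    return int(count), rhid


pairs = [(9, "rhA"), (10, "rhB"), (100, "rhC")]

# Without padding, string order does not match numeric order.
unpadded = sorted(f"{count}-{rhid}" for count, rhid in pairs)

# With padding, string order and numeric order agree.
padded = sorted(column_name(count, rhid) for count, rhid in pairs)
```

Here unpadded sorts the name for count 9 last, while padded keeps the names in 
the order 9, 10, 100.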
