Re: Mutable primary key in a table

Eric Stevens Sun, 08 Feb 2015 07:18:34 -0800

It sounds like changing user names is the kind of thing which doesn't
happen often, in which case you probably don't have to worry too much about
the additional overhead of using logged batches (not like you're going to
be doing hundreds to thousands of these per second).  You probably also
want to look into conditional updates (search for Compare And Set - CAS) to
help avoid collisions when creating or renaming users.
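
As a rough sketch of what a CAS-guarded rename could look like when user_name
stays the primary key: the table layout (users with user_name, name, city) is
taken from the description in this thread, while the `execute` callable is a
hypothetical stand-in for whatever driver call you use, assumed here to
return rows as a list of dicts.  Cassandra reports the outcome of a
conditional statement in a column named `[applied]`.

```python
from typing import Callable, Sequence

# Assumed schema (hypothetical):
#   CREATE TABLE users (user_name text PRIMARY KEY, name text, city text);
SELECT_USER = "SELECT name, city FROM users WHERE user_name = %s"
# IF NOT EXISTS makes this a conditional (CAS) insert: it only succeeds
# when no row with the new user_name exists yet.
CLAIM_NAME = (
    "INSERT INTO users (user_name, name, city) "
    "VALUES (%s, %s, %s) IF NOT EXISTS"
)
DROP_OLD = "DELETE FROM users WHERE user_name = %s"

# Hypothetical driver hook: takes a CQL string and parameters,
# returns the result rows as a list of dicts.
Execute = Callable[[str, Sequence], list]


def rename_user(execute: Execute, old_name: str, new_name: str) -> bool:
    """Copy the user to the new name via CAS, then drop the old row."""
    rows = execute(SELECT_USER, (old_name,))
    if not rows:
        return False  # no such user
    user = rows[0]
    result = execute(CLAIM_NAME, (new_name, user["name"], user["city"]))
    if not result[0]["[applied]"]:
        return False  # somebody else already holds new_name
    # Policy choice (assumption): the old name is freed immediately.
    execute(DROP_OLD, (old_name,))
    return True
```

This is not atomic across the two rows: if the application dies between the
insert and the delete, both names exist until something cleans up.  Whether
the old row is deleted or only flagged as retired is an application decision.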


Colin's suggestion of using a surrogate key for the primary key on the user
table is also a good idea, but you'll still want to use CAS to help
maintain the integrity of your data.  Note that CAS carries overhead of its
own, as logged batches do: a CAS statement involves a Paxos round, and a
logged batch is first written to the batchlog.  So keep the number of
statements in either CAS or logged batches as small as possible.
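
A minimal sketch of the surrogate-key variant, under stated assumptions: the
three table definitions and their names are hypothetical, modelled on Colin's
description below (a user table keyed by a time-based uuid, a lookup table
from user name to uuid, and a per-user name history).  The `execute` callable
is again a hypothetical stand-in for a driver call that returns rows as a
list of dicts.  The new name is claimed with a single CAS insert, and the
remaining writes go into one small logged batch.

```python
import uuid
from typing import Callable, Sequence

# Assumed schema (hypothetical names and columns).
SCHEMA = (
    "CREATE TABLE users (user_id timeuuid PRIMARY KEY, "
    "user_name text, name text, city text)",
    "CREATE TABLE users_by_name (user_name text PRIMARY KEY, "
    "user_id timeuuid)",
    "CREATE TABLE user_name_history (user_id timeuuid, user_name text, "
    "PRIMARY KEY (user_id, user_name))",
)

# Single conditional statement: uniqueness is enforced on the lookup table.
CLAIM_NAME = (
    "INSERT INTO users_by_name (user_name, user_id) "
    "VALUES (%s, %s) IF NOT EXISTS"
)

# Logged batch: point the user row at the new name, release the old
# lookup entry, and record the new name in the history table.
SWITCH_NAME = """
BEGIN BATCH
  UPDATE users SET user_name = %s WHERE user_id = %s;
  DELETE FROM users_by_name WHERE user_name = %s;
  INSERT INTO user_name_history (user_id, user_name) VALUES (%s, %s);
APPLY BATCH
"""

Execute = Callable[[str, Sequence], list]  # hypothetical driver hook


def rename_user(
    execute: Execute, user_id: uuid.UUID, old_name: str, new_name: str
) -> bool:
    """Claim new_name via CAS; on success, switch over in one logged batch."""
    claimed = execute(CLAIM_NAME, (new_name, user_id))
    if not claimed[0]["[applied]"]:
        return False  # new_name already belongs to someone
    execute(SWITCH_NAME, (new_name, user_id, old_name, user_id, new_name))
    return True
```

Because the user row is keyed by user_id, a rename never touches the primary
key of the user table; only the lookup entry moves.  Releasing the old name
in the batch is one possible policy, and keeping it reserved is another.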

On Sun, Feb 8, 2015 at 7:17 AM, Colin <co...@clark.ws> wrote:

> Another way to do this is to use a time based uuid for the primary key
> (partition key) and to store the user name with that uuid.
>
> In addition, you'll need 2 additional tables, one that is used to get the
> uuid by user name and another to track user name changes over time which
> would be organized by uuid, and user name (cluster on the name).
>
> This pattern is referred to as an inverted index and provides a lot of
> power and flexibility once mastered.  I use it all the time with cassandra
> - in fact, to be successful with cassandra, it might actually be a
> requirement!
>
> --
> *Colin Clark*
> +1 612 859 6129
> Skype colin.p.clark
>
> On Feb 8, 2015, at 8:08 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> What is your full primary key? Specifically, what is the partition key, as
> opposed to clustering columns?
>
> The point is that the partition key for a row is hashed to determine the
> token for the partition, which in turn determines which node of the cluster
> owns that partition. Changing the partition key means that the partition
> would potentially need to be "moved" to another node, which is clearly not
> something that Cassandra would do, since the core design of Cassandra is
> that all operations should be blazingly fast and that slow features should
> not be offered.
>
> I would recommend that your application:
>
> 1. Read the existing user data
> 2. Create a new user, using the existing user data.
> 3. Update the old user row to indicate that it is no longer a valid user.
> Actually, you will have to decide on an application policy for old user
> names. For example, can they be reused, or are they locked, or... whatever.
>
>
> -- Jack Krupansky
>
> On Sun, Feb 8, 2015 at 1:48 AM, Ajaya Agrawal <ajku....@gmail.com> wrote:
>
>>
>> On Sun, Feb 8, 2015 at 5:03 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> I'm struggling to think of a model where it makes sense to update a
>>> primary key as a typical operation.  It suggests, as Adil said, that you
>>> may be reasoning wrong about your data model.  Maybe you can explain your
>>> problem in more detail - what kind of thing has you updating your PK on a
>>> regular basis?
>>>
>> I have a 'user' table which has a column called 'user_name' and other
>> columns like name, city etc. The application requires that user_name be
>> unique and that users be searchable by 'user_name'. The only way to do
>> this in C* would be to make the user_name column the primary key. Things
>> get trickier when there is a requirement that user_name can be changed by
>> the users of the application. This is a distributed application, which
>> means that it runs on multiple nodes. If I have to change user_name
>> atomically, then either I need to implement distributed locking or use
>> something C* provides.
>>
>>
>
