Hello,

I'm running a 2.0.17 cluster (I know, I know, I need to upgrade) with 46 nodes across 3 racks and RF=3. Under high contention, LWT does not appear to guarantee uniqueness: out of a total of 16 million LWT transactions (with peak LWT concurrency around 5k/sec), I found 38 conflicts that should have been impossible. Are there any known issues that break LWT in this old version of Cassandra?

I use LWT to guarantee that a 128-bit number (hash) maps to a unique 64-bit number (id). A large number of threads may be trying to allocate an id for the same hash at the same time.

My logic is as follows (the real code is slightly more complicated because of timeout handling):

 1  existing_id = SELECT id FROM hash_id WHERE hash = computed_hash | consistency = ONE
 2  if existing_id != null:
 3    return existing_id
 4  new_id = generateUniqueId()
 5  result = INSERT INTO hash_id (hash, id) VALUES (computed_hash, new_id) IF NOT EXISTS | consistency = QUORUM, serialConsistency = SERIAL
 6  if result == [applied]:  // i.e. we won the LWT
 7    return new_id
 8  else:  // we lost the LWT, fetch the winning value
 9    existing_id = SELECT id FROM hash_id WHERE hash = computed_hash | consistency = ONE
10    return existing_id

Is there anything flawed about this?
I do the reads at lines #1 and #9 at a consistency of ONE. Could that cause uncommitted changes to be seen (i.e., dirty reads)? Should they use SERIAL consistency instead? My understanding is that only one transaction will be able to apply the write (at QUORUM), so a read at a consistency of ONE will either return null or return the id that won the LWT race.
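
To make the invariant I expect concrete, here is a minimal Python sketch of the same ten steps run against an in-memory stand-in rather than Cassandra. Everything named here (`FakeHashIdTable`, `insert_if_not_exists`, `get_or_allocate_id`, `run_contention`) is hypothetical and only for illustration. The stand-in is a dict guarded by a lock, so it is linearizable by construction; it does not model replicas, consistency levels, or timeouts. It only shows the behaviour I am assuming LWT gives me: however many threads race on one hash, they should all end up with the same id.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeHashIdTable:
    """In-memory stand-in for the hash_id table (NOT Cassandra)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def select_id(self, hash_value):
        # Stands in for the SELECT at lines 1 and 9; returns None if absent.
        with self._lock:
            return self._rows.get(hash_value)

    def insert_if_not_exists(self, hash_value, new_id):
        # Stands in for INSERT ... IF NOT EXISTS; True only for the winner.
        with self._lock:
            if hash_value in self._rows:
                return False
            self._rows[hash_value] = new_id
            return True


_id_counter = itertools.count(1)
_id_lock = threading.Lock()


def generate_unique_id():
    # Placeholder for generateUniqueId(); a locked counter is an assumption.
    with _id_lock:
        return next(_id_counter)


def get_or_allocate_id(table, hash_value):
    existing_id = table.select_id(hash_value)           # line 1
    if existing_id is not None:                         # lines 2-3
        return existing_id
    new_id = generate_unique_id()                       # line 4
    if table.insert_if_not_exists(hash_value, new_id):  # lines 5-7
        return new_id
    return table.select_id(hash_value)                  # lines 8-10


def run_contention(num_threads=32, hash_value=0xABCDEF):
    # Race many threads on one hash and collect the distinct ids returned.
    table = FakeHashIdTable()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(get_or_allocate_id, table, hash_value)
            for _ in range(num_threads)
        ]
        return {f.result() for f in futures}
```

With this stand-in, `run_contention()` should always return a set containing exactly one id. The 38 conflicts I saw correspond to that set having more than one element on the real cluster.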

Any help is appreciated. I've been banging my head on this issue (thinking it was a bug in the code) for some time now.

--
Mahdi.
