Update: I replaced all quorum reads on that table with serial reads, and these errors have become less frequent. Somehow, quorum reads on CAS values seem to cause most of these WriteTimeoutExceptions (WTEs).
Also, I found two tickets on that topic:
https://issues.apache.org/jira/browse/CASSANDRA-9328
https://issues.apache.org/jira/browse/CASSANDRA-8672

On Thu, Dec 15, 2016 at 3:14 PM, horschi <hors...@gmail.com> wrote:

> Hi,
>
> I would like to warm up this old thread. I did some debugging and found
> out that the timeouts are coming from StorageProxy.proposePaxos()
> - callback.isFullyRefused() returns false and therefore triggers a
> WriteTimeout.
>
> Looking at my ccm cluster logs, I can see that two replica nodes return
> different results in their ProposeVerbHandler. In my opinion the
> coordinator should not throw an Exception in such a case, but instead retry
> the operation.
>
> What do the CAS/Paxos experts on this list say to this? Feel free to
> instruct me to do further tests/code changes. I'd be glad to help.
>
> Log:
>
> node1/logs/system.log:WARN [SharedPool-Worker-5] 2016-12-15 14:48:36,896
> PaxosState.java:124 - Rejecting proposal for
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node1/logs/system.log- Row: id=@ | value=<tombstone>) because
> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
> key=locktest_ 1 columns=[[] | [value]]
> --
> node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
> StorageProxy.java:506 - proposePaxos:
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node1/logs/system.log- Row: id=@ | value=<tombstone>)//1//0
> --
> node2/logs/system.log:WARN [SharedPool-Worker-7] 2016-12-15 14:48:36,969
> PaxosState.java:117 - Accepting proposal:
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node2/logs/system.log- Row: id=@ | value=<tombstone>)
> --
> node3/logs/system.log:WARN [SharedPool-Worker-2] 2016-12-15 14:48:36,897
> PaxosState.java:124 - Rejecting proposal for
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node3/logs/system.log- Row: id=@ | value=<tombstone>) because
> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
> key=locktest_ 1 columns=[[] | [value]]
>
>
> kind regards,
> Christian
>
>
> On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers <datag...@aol.com> wrote:
>
>> My thinking was that, due to the size of the data, there may be I/O
>> issues. But it sounds more like you're competing for locks and hit a
>> deadlock issue.
>>
>> Regards,
>> Denise
>> Cell - (860)989-3431
>>
>> Sent from my iPhone
>>
>> On Apr 15, 2016, at 9:00 AM, horschi <hors...@gmail.com> wrote:
>>
>> Hi Denise,
>>
>> in my case it's a small blob I am writing (should be around 100 bytes):
>>
>> CREATE TABLE "Lock" (
>>     lockname varchar,
>>     id varchar,
>>     value blob,
>>     PRIMARY KEY (lockname, id)
>> ) WITH COMPACT STORAGE
>>   AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
>>                       'chunk_length_kb' : '8' };
>>
>> You ask because large values are known to cause issues? Anything special
>> you have in mind?
>>
>> kind regards,
>> Christian
>>
>>
>> On Fri, Apr 15, 2016 at 2:42 PM, Denise Rogers <datag...@aol.com> wrote:
>>
>>> Also, what type of data were you reading/writing?
>>>
>>> Regards,
>>> Denise
>>>
>>> Sent from my iPad
>>>
>>> On Apr 15, 2016, at 8:29 AM, horschi <hors...@gmail.com> wrote:
>>>
>>> Hi Jan,
>>>
>>> were you able to resolve your problem?
>>>
>>> We are trying the same and also see a lot of WriteTimeouts:
>>> WriteTimeoutException: Cassandra timeout during write query at
>>> consistency SERIAL (2 replica were required but only 1 acknowledged the
>>> write)
>>>
>>> How many clients were competing for a lock in your case? In our case it's
>>> only two :-(
>>>
>>> cheers,
>>> Christian
>>>
>>>
>>> On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
>>>> jan.algermis...@nordsc.com> wrote:
>>>>
>>>>> I am experimenting with C* 2.0 (and today's java-driver 2.0 snapshot)
>>>>> for implementing distributed locks.
>>>>>
>>>>
>>>> [ and I'm experiencing the problem described in the subject ... ]
>>>>
>>>>
>>>>> Any idea how to approach this problem?
>>>>>
>>>>
>>>> 1) Upgrade to 2.0.1 release.
>>>> 2) Try to reproduce symptoms.
>>>> 3) If able to, file a JIRA at
>>>> https://issues.apache.org/jira/secure/Dashboard.jspa including repro steps
>>>> 4) Reply to this thread with the JIRA ticket URL
>>>>
>>>> =Rob