> Hmm, can you create a ticket with a simple way to reproduce that? We
> should be giving back an InvalidRequestException for
> multiple-mutations-on-same-key instead of erroring out later and
> causing timeouts.

Humm... this is actually quite confusing. When I look at the error, I don't see the same super column, although the columns are the same. It does look like it's the same key. I thought that was possible.
Isn't this really https://issues.apache.org/jira/browse/CASSANDRA-2949, except that I get a slightly different log message?
Sylvain, is this really the same? Any idea when 8.1.3 will be voted on?

ERROR [ReplicateOnWriteStage:409] 2011-08-08 16:49:11,182 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:409,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException: ColumnFamily ColumnFamily(PUBLIC_MONTHLY_17 [SuperColumn(ghcisco [00000000010000:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, 8}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, 2}]@1312814951133!-9223372036854775808,00000000010001:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, -776}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, -194}]@1312814951133!-9223372036854775808,]),]) already has modifications in this mutation: ColumnFamily(PUBLIC_MONTHLY_17 [SuperColumn(gdwls [00000000010000:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, 8}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, 2}]@1312814951133!-9223372036854775808,00000000010001:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, -756}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, -189}]@1312814951133!-9223372036854775808,]),])
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: ColumnFamily ColumnFamily(PUBLIC_MONTHLY_17 [SuperColumn(ghcisco [00000000010000:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, 8}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, 2}]@1312814951133!-9223372036854775808,00000000010001:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, -776}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, -194}]@1312814951133!-9223372036854775808,]),]) already has modifications in this mutation: ColumnFamily(PUBLIC_MONTHLY_17 [SuperColumn(gdwls [00000000010000:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, 8}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, 2}]@1312814951133!-9223372036854775808,00000000010001:false:[{223a4de0-b5fb-11e0-0000-826f85850cbd, 4, -756}*,{224ceb80-b5fb-11e0-0000-848783ceb9bf, 1, -189}]@1312814951133!-9223372036854775808,]),])
    at org.apache.cassandra.db.RowMutation.add(RowMutation.java:123)
    at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:120)
    at org.apache.cassandra.service.StorageProxy$5$1.runMayThrow(StorageProxy.java:455)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 3 more

>
> On Mon, Aug 8, 2011 at 12:34 AM, Philippe <watche...@gmail.com> wrote:
> > Hi Boris,
> > Thanks for the suggestion, I didn't know there was one.
> > I believe I have finally figured it out, and it turns out my last two
> > questions are related.
> > First, my batch loading was ignoring a bunch of rows when reading the
> > first file (so it took hundreds of potential mutations for the problem to
> > show up), and secondly, the ReplicateOnWriteStage error was generated by
> > the batch mutations themselves and explained the TimedOutException: I was
> > doing multiple mutations on the same key in one batch.
> >
> >
> > 2011/8/8 Boris Yen <yulin...@gmail.com>
> >>
> >> Maybe you could try to adjust the setting "cassandraThriftSocketTimeout"
> >> of hector. https://github.com/rantav/hector/wiki/User-Guide
> >>
> >> On Mon, Aug 8, 2011 at 6:54 AM, Philippe <watche...@gmail.com> wrote:
> >>>
> >>> Quick followup.
> >>> I have pushed the RPC timeout to 30s. Using Hector, I'm doing 1 thread
> >>> doing batches of 10 mutates at a time, so that's even slower than when I
> >>> was doing 16 threads in parallel doing non-batched mutations.
> >>> After a couple hundred execute() calls, I get a timeout for every node; I
> >>> have a 15 second grace period between retries. tpstats indicate no
> >>> pendings on any of the nodes. I never recover from that.
> >>> I then set the batch size to one and it seems to work a lot better.
> >>> The only difference I note is that the Mutator.execute() method returns a
> >>> result that sometimes has a null host and 0 microsecond time in the batch
> >>> sizes of ten but never in batch sizes of 1.
> >>>
> >>> I'm stumped! Any ideas?
> >>> Thanks
> >>
> >
> >
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
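
For anyone hitting the same "already has modifications in this mutation" error: one possible client-side workaround, until the server rejects such batches with an InvalidRequestException, is to merge counter increments that target the same row key before building each batch. The sketch below is only an illustration in plain Python with no client library involved. The tuple layout (row_key, super_column, column, delta) and the function names coalesce_increments and batches are assumptions made up for the example, not part of Hector or the Thrift API. It relies on counter increments being additive, so it does not apply to regular column writes.

```python
from collections import defaultdict


def coalesce_increments(increments):
    """Merge counter increments so each row key appears only once.

    `increments` is an iterable of (row_key, super_column, column, delta)
    tuples. This layout is a hypothetical client-side shape chosen for
    the sketch, not a Hector or Thrift type.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for row_key, super_column, column, delta in increments:
        # Increments to the same counter are additive, so sum them.
        merged[row_key][(super_column, column)] += delta
    # Return plain dicts so a later lookup cannot create empty entries.
    return {key: dict(columns) for key, columns in merged.items()}


def batches(merged, batch_size):
    """Yield lists of at most `batch_size` (row_key, columns) pairs.

    Because `merged` holds each row key once, no batch produced here
    can contain two mutations for the same key.
    """
    items = list(merged.items())
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded batch would then be translated into whatever mutation objects the client uses and sent in a single execute() call. Whether merging or simply sending same-key mutations in separate batches is preferable depends on the workload; merging also reduces the number of writes.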