Re: Failed to solve Digest mismatch
Is this Cassandra 1.1.1? How often do you observe this? How many columns are in the row? Can you reproduce it when querying by column name, or only when slicing the row?

On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:

Hi. First I delete one column, then I delete the row, then I try to read all columns from the same row; all operations come from the same client app. The consistency level is read/write QUORUM. Checking the Cassandra log, the local node does not perform the delete operation itself but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1). After the delete, I try to read all columns from the row, and the node reports a Digest mismatch because of the QUORUM consistency configuration, but the result is not correct. From the log I can see the delete mutation was already accepted by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 reads the responses from 0.6 and 0.1 and merges the data, 0.5 ends up returning the stale data. The following log shows the change of column 737461747573: 192.168.0.5 tries to read from 0.1 and 0.6; the column should be deleted, but the final result still contains the data.
log from 192.168.0.5:

DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653) Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc', key=7878323239537570657254616e67307878, columnParent='QueryPath(columnFamilyName='queue', superColumnName='null', columnName='null')', columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674) reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694) reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6557@/192.168.0.1
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(100572974179274741747356988451225858264, 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 3 of 2147483647: 6c6f67526f6f744964:false:7@1340870382109015
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 4 of 2147483647: 6d6f54797065:false:6@1340870382109009
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 5 of 2147483647: 706172746974696f6e:false:2@1340870382109001
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 6 of 2147483647: 7265636569766554696d65:false:13@1340870382109003
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 7 of 2147483647: 72657175657374:false:300@1340870382109013
DEBUG [RequestResponseStage:5] 2012-06-28 15:59:42,202 ResponseVerbHandler.java (line 44) Processing response on a callback from
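One detail worth pulling out of the log above: in the Digest mismatch line, the second digest (d41d8cd98f00b204e9800998ecf8427e) is the well-known MD5 of zero bytes, which is consistent with one replica computing its digest over no live column data, i.e. the delete had taken effect there. A quick check:

```python
import hashlib

# The mismatch line compares two MD5 digests:
#   b725ab25696111be49aaa7c4b7afa52d  (a replica that still had live data)
#   d41d8cd98f00b204e9800998ecf8427e  (a replica that digested nothing)
# The second value is exactly the MD5 of empty input:
empty_digest = hashlib.md5(b"").hexdigest()
print(empty_digest)  # d41d8cd98f00b204e9800998ecf8427e
```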
Re: Ball is rolling on High Performance Cassandra Cookbook second edition
On Wed, Jun 27, 2012 at 5:11 PM, Aaron Turner synfina...@gmail.com wrote:

Honestly, I think using the same terms as an RDBMS makes users think they're exactly the same thing and have the same properties... which is close enough in some cases, but dangerous in others.

The point is that thinking in terms of the storage engine is difficult and unnecessary. You can represent that data relationally, which is the Right Thing to do, both because people are familiar with that world and because it decouples the model from the representation, which lets us change the latter if necessary. http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: items removed from 1.1.0 cfstats output
They were removed because in 1.1 caches are global rather than per-CF: http://www.datastax.com/dev/blog/caching-in-cassandra-1-1

On Fri, Jun 29, 2012 at 5:45 AM, Bill b...@dehora.net wrote:

Were "Key cache capacity", "Key cache size", "Key cache hit rate", and "Row cache" removed from cfstats in 1.1.0? I can see them in 1.0.8 but not in 1.1.0. If so, I was wondering why, as they're fairly useful :)

Bill

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
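To expand on why the per-CF lines are gone: from 1.1 there is a single cache instance shared by all column families, so capacity, size, and hit rate are only meaningful globally. A toy illustration of the difference (plain Python, names made up; this is conceptual, not Cassandra's code):

```python
# Before 1.1 (conceptually): one cache per column family, so each
# CF could report its own capacity/size/hit rate in cfstats.
per_cf_caches = {"users": {}, "events": {}}

# From 1.1 (conceptually): one global cache keyed by (cf, row key),
# so the statistics only exist at the global level.
global_cache = {}
global_cache[("users", "alice")] = "row-data"
global_cache[("events", "alice")] = "other-row-data"

print(len(global_cache))  # 2 entries across all column families
```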
Re: upgrade issue
More generally, don't just throw your old config file at a new version of Cassandra; start with the new version's config file and then re-apply any customizations that are still relevant.

On Fri, Jun 29, 2012 at 8:40 AM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote:

commitlog_rotation_threshold_in_mb was removed in 1.0.0-beta1 (CASSANDRA-2771).

Adeel Akbar adeel.ak...@panasiangroup.com wrote on 29/06/2012 15:24:18:

Thanks for the help. Now I am facing another issue:

INFO 09:23:45,111 Logging initialized
INFO 09:23:45,119 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_24
INFO 09:23:45,119 Heap size: 511705088/511705088
INFO 09:23:45,120 Classpath: /opt/apache-cassandra-1.0.10/bin/../conf:/opt/apache-cassandra-1.0.10/bin/../build/classes/main:/opt/apache-cassandra-1.0.10/bin/../build/classes/thrift:/opt/apache-cassandra-1.0.10/bin/../lib/antlr-3.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-clientutil-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-thrift-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-cli-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-codec-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-lang-2.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/guava-r08.jar:/opt/apache-cassandra-1.0.10/bin/../lib/high-scale-lib-1.1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jamm-0.2.5.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jline-0.9.94.jar:/opt/apache-cassandra-1.0.10/bin/../lib/json-simple-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/libthrift-0.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/log4j-1.2.16.jar:/opt/apache-cassandra-1.0.10/bin/../lib/servlet-api-2.5-20081211.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-api-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snakeyaml-1.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snappy-java-1.0.4.1.jar
INFO 09:23:45,122 JNA not found. Native methods will be disabled.
INFO 09:23:45,131 Loading settings from file:/opt/apache-cassandra-1.0.10/conf/cassandra.yaml
ERROR 09:23:45,303 Fatal configuration error
error Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.cassandra.config.Config@4dd36dfe; Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
 in reader, line 10, column 1:
    cluster_name: 'Test Cluster'
    ^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:372)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:136)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:122)
at org.yaml.snakeyaml.Loader.load(Loader.java:52)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:166)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:131)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:131)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.cassandra.config.Config@4dd36dfe; Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:305)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:184)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:370)
... 9 more
Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.getProperty(Constructor.java:342)
at org.yaml.snakeyaml.constructor.
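Jonathan's advice (start from the new version's cassandra.yaml and re-apply your customizations) can be partly mechanized by diffing the top-level keys of the two files. A rough sketch using only the standard library; the naive regex is good enough for a flat config like cassandra.yaml, and the inline sample strings stand in for real files:

```python
import re

def top_level_keys(yaml_text):
    """Collect top-level keys of a YAML mapping (naive regex scan,
    sufficient for flat config files like cassandra.yaml)."""
    return {m.group(1) for m in re.finditer(r"^([A-Za-z_]\w*):", yaml_text, re.M)}

# Stand-ins for the old and new config files.
old_conf = "cluster_name: 'Test Cluster'\ncommitlog_rotation_threshold_in_mb: 128\n"
new_conf = "cluster_name: 'Test Cluster'\ncommitlog_directory: /var/lib/cassandra/commitlog\n"

# Keys you set in the old file that the new version no longer knows about:
removed = top_level_keys(old_conf) - top_level_keys(new_conf)
print(sorted(removed))  # ['commitlog_rotation_threshold_in_mb']
```

Any key that shows up in `removed` would trigger exactly the snakeyaml "Unable to find property" failure quoted above.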
Re: Question on pending tasks in compaction manager
The pending compactions figure is just an estimate of how many compactions Cassandra thinks it will take to reach a fully-compacted state; there are no actual tasks enqueued anywhere. You could enable debug logging on org.apache.cassandra.db.compaction and force a compaction with nodetool to see why no compactions happen while the estimate says there is still work to do.

On Fri, Jun 29, 2012 at 4:27 AM, Martin McGovern martin.mcgov...@gmail.com wrote:

Hi all, could someone explain why the compaction manager stops compacting while it still has a number of pending tasks? I have a test cluster that I am using to stress-test IO throughput, i.e. to find out what a safe load for our hardware is. Over a 16-hour period my cluster completes approximately 49,000 tasks per node. After I stop my test, compaction continues for a few minutes and then stops, with ~7,000 tasks still pending. No more tasks are executed until I start another test, and the 7,000 pending ones are never executed. I'm using leveled compaction with 5MB SSTables, and my tests have a 50:50 read:write ratio. Each value is a 10K byte array with random content.

Thanks, Martin

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
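For background on where such an estimate comes from under leveled compaction: each level above L0 targets roughly 10x the size of the previous one, so pending work can be derived from how far each level is over its target rather than from a real task queue. A back-of-the-envelope sketch, assuming the 5 MB sstable size from the test above and the LevelDB-style 10x growth factor (illustrative figures, not Cassandra's exact formula):

```python
# Rough per-level target sizes under leveled compaction:
# level 1 targets ~10 sstables, and each further level is ~10x larger.
sstable_mb = 5   # sstable_size_in_mb from the test setup above
fanout = 10      # LevelDB-style growth factor

targets = {level: sstable_mb * fanout ** level for level in range(1, 4)}
print(targets)  # {1: 50, 2: 500, 3: 5000}  (sizes in MB)
```

When a level holds more data than its target, the surplus implies some number of future compactions, which is what the "pending" counter is estimating.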
Re: jscv CPU Consumption
Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_ec2_hangs to me.

On Fri, Jun 29, 2012 at 1:45 AM, Olivier Mallassi omalla...@octo.com wrote:

Hi all, we have a 12-server cluster (8 cores per machine). The OS is Ubuntu 10.04.2. On one of the machines (only one), and without any load (no inserts, no reads), we have a huge CPU load even though there is no activity (no compaction in progress, etc.). A top on the machine shows that the jsvc process is using all the available CPUs. Is that linked to JNA? Do you have any ideas? Cheers

-- Olivier Mallassi
OCTO Technology
50, Avenue des Champs-Elysées, 75008 Paris
Mobile: (33) 6 28 70 26 61 / Tél: (33) 1 58 56 10 00 / Fax: (33) 1 58 56 10 01
http://www.octo.com | Octo Talks! http://blog.octo.com

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Memtable tuning in 1.0 and higher
On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd jwijg...@gmail.com wrote:

currentThroughput is increased even before the data is merged into the memtable, so it is actually measuring the throughput, afaik.

You're right. I've attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
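Joost's point is that the counter is bumped when a mutation arrives, before it is merged into the memtable, so on overwrites it tracks incoming write volume rather than live memtable size. A minimal illustration of that accounting gap (plain Python, purely illustrative; not Cassandra's actual code):

```python
memtable = {}
counted_throughput = 0  # incremented before the merge, as Joost describes

def write(key, value):
    global counted_throughput
    counted_throughput += len(value)  # counts every incoming byte...
    memtable[key] = value             # ...even when the merge replaces a value

write("row1", b"x" * 100)
write("row1", b"y" * 100)  # overwrite: live size unchanged

live_size = sum(len(v) for v in memtable.values())
print(counted_throughput, live_size)  # 200 100
```

A counter like this will trigger memtable flushes earlier than the actual live size warrants, which is the behaviour the patch addresses.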
Re: High CPU usage as of 8pm eastern time
Thank you for the mail. Same here, but I restarted the affected server before I noticed your mail. It affected both OpenJDK Java 6 (packaged with Ubuntu 10.04) and Oracle Java 7 processes. The Ubuntu 32-bit servers had no issues; only a 64-bit machine was affected. It is likely related to the leap second introduced today.

On 2012.07.01. 5:11, Mina Naguib wrote:

Hi folks. Our Cassandra (and other Java-based apps) started experiencing extremely high CPU usage as of 8pm Eastern time (midnight UTC). The issue appears to be related to specific versions of Java + Linux + ntpd. There are many solutions floating around on IRC, Twitter, StackExchange, and LKML. The simplest one that worked for us is simply to run this command on each affected machine:

date; date `date +%m%d%H%M%C%y.%S`; date;

The CPU drop was instantaneous; there was no need to restart the server, ntpd, or any of the affected JVMs.
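For anyone puzzling over the one-liner: date +%m%d%H%M%C%y.%S prints the current time in the MMDDhhmmCCyy.SS form that date accepts when setting the clock, so the command simply sets the system time to its current value, and that clock write is what reportedly clears the stuck leap-second state. A sketch of the same format string in Python (the century %C is assembled by hand, since Python's strftime does not support it portably):

```python
from datetime import datetime

def date_set_string(now):
    """Format a datetime the way `date +%m%d%H%M%C%y.%S` does,
    i.e. MMDDhhmmCCyy.SS, suitable as an argument to `date` for
    setting the system time."""
    century = now.year // 100  # %C, built manually for portability
    return now.strftime("%m%d%H%M") + "%02d" % century + now.strftime("%y.%S")

s = date_set_string(datetime(2012, 7, 1, 0, 0, 0))
print(s)  # 070100002012.00
```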
Re: Memtable tuning in 1.0 and higher
Hi Jonathan, looks good. Any chance of porting this fix to the 1.0 branch?

Kind regards,
Joost

Sent from my iPhone

On 1 Jul 2012, at 09:25, Jonathan Ellis jbel...@gmail.com wrote:
[snip]
Re: High CPU usage as of 8pm eastern time
More information for others who were affected. Our installation of Java:

[root@inv4 conf]# java -version
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
[root@inv4 conf]# uname -a
Linux inv4 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue Feb 14 04:00:16 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux

Jonathan pointed out a Linux bug that may be related: https://issues.apache.org/jira/browse/CASSANDRA-4066

In my case only the Java process went nuts, as seems to be the case in many other reports:
https://bugzilla.mozilla.org/show_bug.cgi?id=769972
http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/

I hope everyone got enough sleep!

- David

On Sun, Jul 1, 2012 at 4:49 AM, Hontvári József Levente hontv...@flyordie.com wrote:
[snip]
SnappyCompressor and Cassandra 1.1.1
I'm running Cassandra on a Raspberry Pi (for educational reasons) and have been successfully running 1.1.0 for some time. However, there is no native build of the Snappy library for the platform (I'm currently working on rectifying that if I can), so that compression is unavailable. When I try to start 1.1.1 on the platform I get the following error, which looks to me like 1.1.1 tries to load the Snappy compressor at startup and falls over when it can't find it. That was not the case with 1.1.0:

INFO 14:22:07,600 Global memtable threshold is enabled at 35MB
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1681)
at java.lang.Runtime.loadLibrary0(Runtime.java:840)
at java.lang.System.loadLibrary(System.java:1047)
at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 17 more
ERROR 14:22:09,934 Exception encountered during startup
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Exception encountered during startup: [FAILED_TO_LOAD_NATIVE_LIBRARY] null

Andy

The University of Dundee is a Scottish Registered Charity, No. SC015096.
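For context on the behaviour change being described: 1.1.0 apparently tolerated a missing native Snappy library, while 1.1.1 fails at startup when probing for it. A sketch of the graceful-fallback pattern the poster expected, in Python for illustration only (the snappy module name is a stand-in; this is not Cassandra's code):

```python
def snappy_available():
    """Probe for an optional native compressor; a failed load should
    disable compression rather than abort startup."""
    try:
        import snappy  # hypothetical optional dependency
        return True
    except ImportError:
        return False

# Fall back to no compression when the native library is absent,
# instead of propagating the load failure to the caller.
compressor = "snappy" if snappy_available() else None
print(compressor)
```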
Bootstrap code path
Could someone please tell me where I should start looking in the code to understand how Cassandra's bootstrap process works? I am sure it is complicated, but I have time. Also, is my understanding correct that newly added nodes do not join the ring until the bootstrap process is complete, i.e. they do not receive any read or write requests from outside?
Re: Failed to solve Digest mismatch
For the create/update/deleteColumn/deleteRow test case, with consistency level QUORUM, 6 nodes, and replication factor 3, I can reproduce this in roughly 1 round out of 100 with a single thread; with 20 client threads running the test client, the ratio is bigger. Each test group is executed by one thread, and the client timestamps are unique and sequenced, guaranteed by Hector. The client only accesses the data from the local Cassandra node, and the query uses only the row key, which is unique. The column names are not unique, e.g. "status". Each row has around 7 columns, all small, e.g. status:true, userName:Jason ...

BRs
//Ares

2012/7/1 Jonathan Ellis jbel...@gmail.com:

Is this Cassandra 1.1.1? How often do you observe this? How many columns are in the row? Can you reproduce it when querying by column name, or only when slicing the row?

On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:

Hi. First I delete one column, then I delete the row, then I try to read all columns from the same row; all operations come from the same client app. The consistency level is read/write QUORUM. Checking the Cassandra log, the local node does not perform the delete operation itself but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1). After the delete, I try to read all columns from the row, and the node reports a Digest mismatch because of the QUORUM consistency configuration, but the result is not correct. From the log I can see the delete mutation was already accepted by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 reads the responses from 0.6 and 0.1 and merges the data, 0.5 ends up returning the stale data. The following log shows the change of column 737461747573: 192.168.0.5 tries to read from 0.1 and 0.6; the column should be deleted, but the final result still contains the data.
[snip]
cassandra halt after started minutes later
I have a three-node cluster running 1.0.2. Today there was a very strange problem: suddenly two of the Cassandra nodes (let's say B and C) were using a lot of CPU; it turned out that for some reason the Java binary just wouldn't run. I was using OpenJDK 1.6.0_18, so I switched to the Sun JDK, which worked okay. After that, node A stopped working with the same problem; I installed the Sun JDK there too, and then it was okay. But minutes later B stopped working again: about 5-10 minutes after Cassandra started, it stopped responding to connections. I can't access port 9160, and nodetool doesn't return either.

I have turned on DEBUG logging and don't see much useful information; the last rows on node B are as below:

DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3

This problem is really driving me crazy, since I just don't know what happened or how to debug it. I tried to kill node A and restart it, then node B halted; after I restarted B, node C went down. One thing that may be related is that the log time on node B is not the same as the system time (A and C are okay). date on node B shows Sun Jul 1 23:10:57 CST 2012 (system time), but you may have noticed that the time is 2012-07-01 07:45:XX in the log messages above. The system time is right; I'm just not sure why Cassandra's log file shows the wrong time. I don't recall Cassandra having timezone settings.
Re: cassandra halt after started minutes later
I adjusted the timezone of Java with -Duser.timezone, and the timezone of Cassandra is now the same as the system's (Debian 6.0). After restarting Cassandra I found the following error message in the log file of node B; about 2 minutes later, node C stopped responding.

The error log of node B:

Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

The log info on node C:

DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark', key='3937343836623538363837363135353264313339333463343532623634373131656462306139', modifications=[ColumnFamily(permacache [76616c7565:false:67906@1341156582948365,])]) applied. Sending response to 79529@/192.168.1.129
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line 523) insert
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', key='636f6d6d656e74735f706172656e74735f32373232343938', modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843,])])]/QUORUM
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.40
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.129
DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 ResponseVerbHandler.java (line 44) Processing response on a callback from 50050@/192.168.1.129
DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 ResponseVerbHandler.java (line 44) Processing response on a callback from 50051@/192.168.1.40
DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3

On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu springri...@gmail.com wrote:

I have a three-node cluster running 1.0.2. Today there was a very strange problem: suddenly two of the Cassandra nodes (let's say B and C) were using a lot of CPU; it turned out that for some reason the Java binary just wouldn't run. I was using OpenJDK 1.6.0_18, so I switched to the Sun JDK, which worked okay. After that, node A stopped working with the same problem; I installed the Sun JDK there too, and then it was okay. But minutes later B stopped working again: about 5-10 minutes after Cassandra started, it stopped responding to connections. I can't access port 9160, and nodetool doesn't return either.
I have turned on DEBUG and dont see much useful information, the last rows on node B are as belows: DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms. DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms. DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3 DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3 this problem is really driving me crazy since I just dont know what happened, and how to debug it, I tried to kill node A and restart it, then node B halt, after I restart B, then node C goes down.. one thing may related is that the log time on node B is not the same with the system time(A and C are okay). while date on node B shows: Sun Jul 1 23:10:57 CST 2012 (system time) but you may noticed that the time is 2012-07-01 07:45:XX in those above log message. the system time is right, just not sure why cassandra's log
Re: cassandra halt after started minutes later
This looks like the problem a bunch of us were having yesterday that isn't cleared without a reboot or a date command. It seems to be related to the leap second that was added between the 30th of June and the 1st of July. See the mailing list thread with the subject "High CPU usage as of 8pm eastern time". If you are still seeing high CPU usage and a stall after restarting Cassandra, and you are on Linux, try:

date; date `date +%m%d%H%M%C%y.%S`; date;

in a terminal and see if everything starts working again. I hope this helps. -- David Daeschler
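For readers wondering what that one-liner does: the middle date command re-sets the system clock to its current value, which on the affected kernels clears the state left behind by the leap second. A small Python sketch (the helper name is illustrative) of how the %m%d%H%M%C%y.%S format builds the MMDDhhmmCCYY.ss string that date accepts when setting the clock:

```python
from datetime import datetime

def date_reset_command(now: datetime) -> str:
    # date's set-format is MMDDhhmm[[CC]YY][.ss]; Python's strftime has
    # no %C, so %Y (4-digit year) stands in for the shell's %C%y pair.
    return "date " + now.strftime("%m%d%H%M%Y.%S")

print(date_reset_command(datetime(2012, 7, 1, 23, 10, 57)))
# date 070123102012.57
```

Setting the clock requires root; the surrounding plain `date` calls in the one-liner just print the time before and after so you can confirm nothing actually changed.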
Re: cassandra halt after started minutes later
Huge thanks, it is the leap second problem! Finally I can go to bed.
Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high
Hello. We were under a DDoS attack, and as a result we saw high ksoftirqd activity, which made Cassandra answer very slowly. But after the DDoS was gone, the high ksoftirqd activity still existed: it disappears when I stop the Cassandra daemon and reappears when I start it again, and the only full resolution of the problem is a full reboot of the server. What can this be (why does ksoftirqd begin working so intensively when Cassandra is running? We disabled all working traffic to the cluster, but this didn't help, so it can't be due to heavy load), and how can it be solved? PS: OS Ubuntu 10.04 (kernel 2.6.32.41), Cassandra 1.0.10, Java 1.6 update 32 (from Oracle)
Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high
Hello, it is not related to Cassandra or the DDoS; it is a kernel problem due to the leap second. See http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second
Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high
Good afternoon. This again looks like it could be the leap second issue: this looks like the problem a bunch of us were having yesterday that isn't cleared without a reboot or a date command. It seems to be related to the leap second that was added between the 30th of June and the 1st of July. See the mailing list thread with the subject "High CPU usage as of 8pm eastern time". If you are still seeing high CPU usage and a stall after restarting Cassandra, and you are on Linux, try:

date; date `date +%m%d%H%M%C%y.%S`; date;

in a terminal and see if everything starts working again. I hope this helps. Please spread the word if you see others having issues with unresponsive kernels/high CPU. -- David Daeschler
Re: Cassandra consistency issue on cluster system
If you are reading at QUORUM there is no problem; this is how eventual consistency works in Cassandra. The coordinator will resolve the differences between the replicas, and the column with the higher timestamp will win. If the delete was applied to fewer than CL nodes, the client should have received a TimedOutException. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 28/06/2012, at 7:41 PM, 黄荣桢 wrote:

Background: My application is running on a cluster of 4 nodes whose system times are synchronized by NTP. I use the Write.QUORUM and Read.QUORUM strategy. The probability of this problem is not very high. The Cassandra version is 1.0.3; I have tried Cassandra 1.1.1, and the problem still exists.

Problem: I deleted a column, but 6 seconds later Cassandra could still get the old record, whose isMarkedForDelete is still false. Has anybody met the same problem? And how can it be solved?

Detail: See the log below:

Node 3 (local node):
[pool-2-thread-42] 2012-06-27 14:49:23,732 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: SuperColumn(667072 [..7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
[pool-2-thread-44] 2012-06-27 14:51:21,367 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])]/QUORUM -- I delete this record at 14:51:21,367
[pool-2-thread-37] 2012-06-27 14:51:27,400 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: SuperColumn(667072 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
-- But I can still get the old record at 14:51:27,400

Node 2:
[MutationStage:118] 2012-06-27 14:51:21,373 RowMutationVerbHandler.java (line 48) Applying RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
[MutationStage:118] 2012-06-27 14:51:21,374 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])]) applied. Sending response to 6692098@/192.168.0.3
[MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 48) Applying RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..])
[MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,...]),])]) applied. Sending response to 6698516@/192.168.0.3

Node 1:
[MutationStage:98] 2012-06-27 14:51:24,661 RowMutationVerbHandler.java (line 48) Applying RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
[MutationStage:98] 2012-06-27 14:51:24,675 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])]) applied. Sending response to 6692099@/192.168.0.3
[MutationStage:93] 2012-06-27 14:51:40,932 RowMutationVerbHandler.java (line 48) Applying RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779900915004,]),])])
DEBUG [MutationStage:93] 2012-06-27 14:51:40,933 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='3332', modifications=[ColumnFamily(fpr_index [SuperColumn(667072 [7fff01382ca96c8b636b698a:true:4@1340779900915004,]),])]) applied. Sending response to 6706555@/192.168.0.3
[ReadStage:55] 2012-06-27 14:51:43,074 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff01382ca96c8b636b698a:true:4@1340779900915004

Node 4: There is no log about this record on node 4.
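As a sketch of the reconciliation rule Aaron describes, here is a toy model (not Cassandra's actual code): a delete is just a tombstone cell, and whichever cell carries the higher client-supplied timestamp wins when the coordinator resolves replica responses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Column:
    value: Optional[str]   # None models a tombstone (markedForDelete)
    timestamp: int         # microseconds, supplied by the client

def reconcile(a: Column, b: Column) -> Column:
    # Last write wins: the cell with the higher timestamp takes precedence.
    return a if a.timestamp >= b.timestamp else b

live = Column("36", 1340779097312016)   # the old insert from the logs above
tomb = Column(None, 1340779881338000)   # the later delete
print(reconcile(live, tomb).value)      # None: the delete wins at read time
```

Note that the timestamps come from the clients: if a client's clock lags, a "later" delete can carry a lower timestamp and silently lose the reconciliation, which is one way dirty reads can appear even at QUORUM.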
Re: No indexed columns present in by-columns clause with equals operator
Like the exception says, "Bad Request: No indexed columns present in by-columns clause with equals operator": you must include an equality operator in the WHERE clause. That is why SELECT * FROM STEST WHERE VALUE1 = 10; works but SELECT * FROM STEST WHERE VALUE1 > 10; does not, and the same goes for the other relational operators (<, <=, >=). Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 28/06/2012, at 8:55 PM, Abhijit Chanda wrote:

Hi All, I have hit a strange exception while using Cassandra CQL: relational operators (<, >, <=, >=) are not working. My column family looks like this:

CREATE COLUMNFAMILY STEST (
  ROW_KEY text PRIMARY KEY,
  VALUE1 text,
  VALUE2 text
) WITH comment='' AND comparator=text AND read_repair_chance=0.10 AND gc_grace_seconds=864000 AND default_validation=text AND min_compaction_threshold=4 AND max_compaction_threshold=32 AND replicate_on_write=True;
CREATE INDEX VALUE1_IDX ON STEST (VALUE1);
CREATE INDEX VALUE2_IDX ON STEST (VALUE2);

Now in this column family, if I query SELECT * FROM STEST WHERE VALUE1 = 10; it returns:

ROW_KEY | VALUE1 | VALUE2
--------+--------+-------
      2 |     10 |     AB

But if I query like this: SELECT * FROM STEST WHERE VALUE1 > 10; it shows this exception: "Bad Request: No indexed columns present in by-columns clause with equals operator". The same happens with the other relational operators (<, <=, >=). These are the data in my column family:

ROW_KEY | VALUE1 | VALUE2
--------+--------+-------
      3 |    100 |    ABC
      5 |      9 |  ABCDE
      2 |     10 |     AB
      1 |      1 |      A
      4 |     19 |   ABCD

It looks like a configuration problem. Please help me. Thanks in advance. Regards, -- Abhijit Chanda Analyst VeHere Interactive Pvt. Ltd. +91-974395
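To illustrate why the equality operator is required (a simplified model, not Cassandra's implementation), the secondary index behaves like a hash map from exact column values to row keys: it can answer = directly, but <, >, <=, >= predicates have no entry point into it and can only filter rows already found via an equality.

```python
# Rows keyed by ROW_KEY, holding an indexed column VALUE1 (text, as in
# the schema above, so comparisons are lexicographic, not numeric).
rows = {
    "1": {"VALUE1": "1"},
    "2": {"VALUE1": "10"},
    "3": {"VALUE1": "100"},
}

# Build the index: exact value -> set of row keys. Hash lookups only.
index = {}
for key, cols in rows.items():
    index.setdefault(cols["VALUE1"], set()).add(key)

print(index["10"])   # equality is answerable straight from the index
# A "VALUE1 > '10'" predicate cannot start from the index at all; it
# would mean scanning every entry, which the server refuses with the
# "No indexed columns present ... with equals operator" error.
```

The practical workaround is to pair the range predicate with an equality on some indexed column, so the server has an index entry to start from.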
Re: BulkLoading SSTables and compression
When the data is streamed into the cluster by the bulk loader, it is compressed on the receiving end (if the target CF has compression enabled). If you are able to reproduce this, can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/06/2012, at 10:00 PM, Andy Cobley wrote: My (limited) experience of moving from 0.8 to 1.0 is that you do have to use rebuildsstables. I'm guessing BulkLoading is bypassing the compression? Andy On 28 Jun 2012, at 10:53, jmodha wrote: Hi, We are migrating our Cassandra cluster from v1.0.3 to v1.1.1; the data is migrated using SSTableLoader to an empty Cassandra cluster. The data in the source cluster (v1.0.3) is uncompressed, and the target cluster (1.1.1) has the column family created with compression turned on. What we are seeing is that once the data has been loaded into the target cluster, its size is similar to the data in the source cluster. Our expectation is that since we have turned on compression in the target cluster, the amount of data would be reduced. We have tried running the rebuildsstables nodetool command on a node after the data was loaded, and we do indeed see a huge reduction in size, e.g. from 30GB to 10GB for a given column family. We were hoping to see this at the point of loading the data in via the SSTableLoader. Is this behaviour expected? Do we need to run the rebuildsstables command on all nodes to actually compress the data after it has been streamed in? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BulkLoading-SSTables-and-compression-tp7580849.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. The University of Dundee is a Scottish Registered Charity, No. SC015096.
Re: Amazingly bad compaction performance
Can compression be changed or disabled on-the-fly with Cassandra? Yes. Disable it in the schema and then run nodetool upgradesstables. As Tyler said, JDK 7 is not officially supported yet and you may be running into issues others have not found. Any chance you could downgrade one node to JDK 6 and check the performance? If it looks like a JDK issue, could you post your findings to https://issues.apache.org/jira/browse/CASSANDRA and include the schema details? Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/06/2012, at 2:36 AM, Dustin Wenz wrote: My maximum and initial heap sizes are set to 6GB. Actual memory usage for the VM is around 11-12GB. The machine has 24GB of physical memory, so there isn't any paging going on. I don't see any GC events logged that are longer than a few hundred milliseconds. Is it possible that GC is taking significant time without it being reported? - .Dustin On Jun 27, 2012, at 1:31 AM, Igor wrote: Hello. Too much GC? Check the JVM heap settings and real usage. On 06/27/2012 01:37 AM, Dustin Wenz wrote: We occasionally see fairly poor compaction performance on random nodes in our 7-node cluster, and I have no idea why. This is one example from the log:

[CompactionExecutor:45] 2012-06-26 13:40:18,721 CompactionTask.java (line 221) Compacted to [/raid00/cassandra_data/main/basic/main-basic.basic_id_index-hd-160-Data.db,]. 26,632,210 to 26,679,667 (~100% of original) bytes for 2 keys at 0.006250MB/s. Time: 4,071,163ms.

That particular event took over an hour to compact only 25 megabytes. During that time there was very little disk IO, and the java process (OpenJDK 7) was pegged at 200% CPU. The node was also completely unresponsive to network requests until the compaction was finished. Most compactions run at just over 7MB/s. This is an extreme outlier, but users definitely notice the hit when it occurs.
I grabbed a sample of the process using jstack, and this was the only thread in CompactionExecutor:

"CompactionExecutor:54" daemon prio=1 tid=41247522816 nid=0x99a5ff740 runnable [140737253617664]
 java.lang.Thread.State: RUNNABLE
 at org.xerial.snappy.SnappyNative.rawCompress(Native Method)
 at org.xerial.snappy.Snappy.rawCompress(Snappy.java:358)
 at org.apache.cassandra.io.compress.SnappyCompressor.compress(SnappyCompressor.java:80)
 at org.apache.cassandra.io.compress.CompressedSequentialWriter.flushData(CompressedSequentialWriter.java:89)
 at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:196)
 at org.apache.cassandra.io.util.SequentialWriter.reBuffer(SequentialWriter.java:260)
 at org.apache.cassandra.io.util.SequentialWriter.writeAtMost(SequentialWriter.java:128)
 at org.apache.cassandra.io.util.SequentialWriter.write(SequentialWriter.java:112)
 at java.io.DataOutputStream.write(DataOutputStream.java:107)
 - locked 36527862064 (a java.io.DataOutputStream)
 at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:142)
 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
 at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)

Is it possible that there is an issue with snappy compression? Based on the lousy compression ratio, I think we could get by without it just fine.
Can compression be changed or disabled on-the-fly with cassandra? - .Dustin
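Re-deriving the rate from the compaction log line quoted above shows where the 0.006250MB/s figure comes from: it matches the compacted output size divided by the elapsed time.

```python
# Figures taken from the log line:
# "26,632,210 to 26,679,667 ... bytes ... at 0.006250MB/s. Time: 4,071,163ms."
out_bytes = 26_679_667
elapsed_ms = 4_071_163

rate_mb_per_s = out_bytes / (elapsed_ms / 1000) / (1024 * 1024)
print(f"{rate_mb_per_s:.6f}")   # 0.006250
```

At that rate, about 25 MB in roughly 68 minutes is consistent with the hour-long stall Dustin describes; his typical 7MB/s compactions are over a thousand times faster.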
Re: hector timeouts
Using Cassandra as a queue is generally thought of as a bad idea, owing to the high delete workload. Levelled compaction handles it better, but it is still not the best approach. Depending on your needs, consider running http://incubator.apache.org/kafka/ instead. "could you share some details on this? we're using hector and we see random timeout warns in the logs and not sure how to address them." First determine whether they are server-side or client-side timeouts. Then determine what the query was. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/06/2012, at 7:02 AM, Deno Vichas wrote: On 6/28/2012 9:37 AM, David Leimbach wrote: That coupled with Hector timeout issues became a real problem for us. could you share some details on this? we're using hector and we see random timeout warnings in the logs and are not sure how to address them. thanks, deno
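A toy illustration of the delete-workload problem Aaron mentions (illustrative only, not Cassandra internals): in a queue pattern, consumed messages become tombstones that every subsequent read must scan past until gc_grace expires and compaction purges them.

```python
# One wide row holding a queue: (column_name, is_tombstone) pairs.
row = [(i, False) for i in range(1000)]   # produce 1000 messages

for i in range(990):                      # consume 990 of them
    row[i] = (i, True)                    # each delete leaves a tombstone

# "Get the next message" must skip every tombstone before the live cell:
scanned = 0
for _, dead in row:
    scanned += 1
    if not dead:
        break
print(scanned)   # 991 cells touched to return one live message
```

The queue keeps looking "almost empty" to the application while reads touch ever more dead cells, which is why dedicated log/queue systems like Kafka are usually a better fit.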
Re: BulkLoading SSTables and compression
Sure, before I create a ticket, is there a way I can confirm that the sstables are indeed not compressed, other than running the rebuildsstables nodetool command (and observing the live size go down)? Thanks.
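One way to check without rewriting the sstables, assuming Cassandra 1.x's on-disk naming (worth verifying on your cluster): compressed sstables carry a -CompressionInfo.db component next to the -Data.db file. The helper and path below are illustrative:

```python
from pathlib import Path

def uncompressed_sstables(data_dir: str):
    # Data files with no sibling CompressionInfo component were,
    # by this heuristic, written without compression.
    missing = []
    for data in Path(data_dir).glob("**/*-Data.db"):
        info = data.with_name(data.name.replace("-Data.db", "-CompressionInfo.db"))
        if not info.exists():
            missing.append(data.name)
    return sorted(missing)

# e.g. uncompressed_sstables("/var/lib/cassandra/data/myks/mycf")
```

If every Data.db already has a CompressionInfo.db, the files are nominally compressed and the size question is about the compression ratio rather than compression being skipped.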
Re: Oftopic: ksoftirqd after ddos take more cpu? as result cassandra latensy very high
Hello, this really helps! In our case two problems crossed each other, and we hadn't assumed that it might be a kernel problem. On one data cluster we simply rebooted, and on the second we applied the date solution, and everything is fine. Thanks.
Re: No indexed columns present in by-columns clause with equals operator
Hey Aaron, I am able to sort out the problem. Thanks anyways. Regards, Abhijit