Re: Failed to solve Digest mismatch

2012-07-01 Thread Jonathan Ellis
Is this Cassandra 1.1.1?

How often do you observe this?  How many columns are in the row?  Can
you reproduce when querying by column name, or only when slicing the
row?

On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
 Hi

     First I delete one column, then I delete one row. Then I try to read all
 columns from the same row; all operations come from the same client app.

     The consistency level is read/write QUORUM.

     Checking the Cassandra log, the local node doesn't perform the delete
 operation itself but sends the mutation to the other nodes (192.168.0.6,
 192.168.0.1).

     After the delete, I try to read all columns from the row. The node
 detects a digest mismatch, as expected with the QUORUM consistency
 configuration, but the final result is not correct.

     From the log, I can see the delete mutation was already accepted
 by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 read the responses
 from 0.6 and 0.1 and merged the data, 0.5 still returned the dirty data.

     The following logs show the change of column 737461747573: 192.168.0.5
 reads from 0.1 and 0.6; the column should be deleted, but the final result
 still contains the data.

 log:
 192.168.0.5
 DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
 Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
 key=7878323239537570657254616e67307878,
 columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
 columnName='null')',
 columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
 DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
 Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
 DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
 reading data from /192.168.0.6
 DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
 reading digest from /192.168.0.1
 DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
 ResponseVerbHandler.java (line 44) Processing response on a callback from
 6556@/192.168.0.6
 DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
 AbstractRowResolver.java (line 66) Preprocessed data response
 DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
 ResponseVerbHandler.java (line 44) Processing response on a callback from
 6557@/192.168.0.1
 DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
 AbstractRowResolver.java (line 66) Preprocessed digest response
 DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65)
 resolving 2 responses
 DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
 Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
 Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
 d41d8cd98f00b204e9800998ecf8427e)
 DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
 ResponseVerbHandler.java (line 44) Processing response on a callback from
 6558@/192.168.0.6
 DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
 ResponseVerbHandler.java (line 44) Processing response on a callback from
 6559@/192.168.0.1
 DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
 AbstractRowResolver.java (line 66) Preprocessed data response
 DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
 AbstractRowResolver.java (line 66) Preprocessed data response
 DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63)
 resolving 2 responses
 DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
 collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
 DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
 collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
 DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
 collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
 DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
 collecting 3 of 2147483647: 6c6f67526f6f744964:false:7@1340870382109015
 DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
 collecting 4 of 2147483647: 6d6f54797065:false:6@1340870382109009
 DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
 collecting 5 of 2147483647: 706172746974696f6e:false:2@1340870382109001
 DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
 collecting 6 of 2147483647: 7265636569766554696d65:false:13@1340870382109003
 DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
 collecting 7 of 2147483647: 72657175657374:false:300@1340870382109013
 DEBUG [RequestResponseStage:5] 2012-06-28 15:59:42,202
 ResponseVerbHandler.java (line 44) Processing response on a callback from
 

Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-07-01 Thread Jonathan Ellis
On Wed, Jun 27, 2012 at 5:11 PM, Aaron Turner synfina...@gmail.com wrote:
 Honestly, I think using the same terms as an RDBMS does
 makes users think they're exactly the same thing and have the same
 properties... which is close enough in some cases, but dangerous in
 others.

The point is that thinking in terms of the storage engine is difficult
and unnecessary.  You can represent that data relationally, which is
the Right Thing to do both because people are familiar with that world
and because it decouples model from representation, which lets us
change the latter if necessary.

http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: items removed from 1.1.0 cfstats output

2012-07-01 Thread Jonathan Ellis
They were removed because in 1.1 caches are global and not per-cf:
http://www.datastax.com/dev/blog/caching-in-cassandra-1-1
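
With the caches now global, the equivalent numbers are reported by nodetool
info rather than by per-CF cfstats (hedged; the exact output format varies
by version):

nodetool -h localhost info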

On Fri, Jun 29, 2012 at 5:45 AM, Bill b...@dehora.net wrote:
 Were

 Key cache capacity:
 Key cache size:
 Key cache hit rate:
 Row cache:

 removed from cfstats in 1.1.0? I can see them in 1.0.8 but not in 1.1.0. If so,
 I was wondering why, as they're fairly useful :)

 Bill



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: upgrade issue

2012-07-01 Thread Jonathan Ellis
More generally, don't just throw your old config file at a new version
of Cassandra; start with the new version's config, then apply any
customizations that are still relevant.
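
A quick way to do that (illustrative paths, hedged):

diff -u old-install/conf/cassandra.yaml new-install/conf/cassandra.yaml

and port your surviving customizations into the new file, not the other
way around.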

On Fri, Jun 29, 2012 at 8:40 AM, Romain HARDOUIN
romain.hardo...@urssaf.fr wrote:

 commitlog_rotation_threshold_in_mb was removed in 1.0.0-beta1
 (CASSANDRA-2771).
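
So the fix is to delete that line from the yaml before reusing it, e.g.
(hedged; adjust the path to your install):

sed -i '/commitlog_rotation_threshold_in_mb/d' /opt/apache-cassandra-1.0.10/conf/cassandra.yaml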


 Adeel Akbar adeel.ak...@panasiangroup.com wrote on 29/06/2012 15:24:18
 :


 Thanks for the help. Now I am facing another issue:

 INFO 09:23:45,111 Logging initialized
  INFO 09:23:45,119 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_24
  INFO 09:23:45,119 Heap size: 511705088/511705088
 INFO 09:23:45,120 Classpath: /opt/apache-cassandra-1.0.10/bin/../conf:/opt/apache-cassandra-1.0.10/bin/../build/classes/main:/opt/apache-cassandra-1.0.10/bin/../build/classes/thrift:/opt/apache-cassandra-1.0.10/bin/../lib/antlr-3.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-clientutil-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-thrift-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-cli-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-codec-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-lang-2.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/guava-r08.jar:/opt/apache-cassandra-1.0.10/bin/../lib/high-scale-lib-1.1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jamm-0.2.5.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jline-0.9.94.jar:/opt/apache-cassandra-1.0.10/bin/../lib/json-simple-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/libthrift-0.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/log4j-1.2.16.jar:/opt/apache-cassandra-1.0.10/bin/../lib/servlet-api-2.5-20081211.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-api-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snakeyaml-1.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snappy-java-1.0.4.1.jar
  INFO 09:23:45,122 JNA not found. Native methods will be disabled.
 INFO 09:23:45,131 Loading settings from file:/opt/apache-cassandra-1.0.10/conf/cassandra.yaml
 ERROR 09:23:45,303 Fatal configuration error
 Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.cassandra.config.Config@4dd36dfe; Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
  in reader, line 10, column 1:
     cluster_name: 'Test Cluster'
     ^

         at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:372)
         at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
         at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:136)
         at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:122)
         at org.yaml.snakeyaml.Loader.load(Loader.java:52)
         at org.yaml.snakeyaml.Yaml.load(Yaml.java:166)
         at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:131)
         at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:131)
         at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
 Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.cassandra.config.Config@4dd36dfe; Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
         at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:305)
         at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:184)
         at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:370)
         ... 9 more
 Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
         at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.getProperty(Constructor.java:342)
         at org.yaml.snakeyaml.constructor.
 

Re: Question on pending tasks in compaction manager

2012-07-01 Thread Jonathan Ellis
"Pending compactions" is just an estimate of how many compactions Cassandra
thinks it will take to get to a fully-compacted state; there are no actual
tasks enqueued anywhere.

You could enable debug logging on org.apache.cassandra.db.compaction,
and force a compaction with nodetool to see why no compactions happen
when the estimate says there is still work to do.
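
Concretely (hedged; file locations vary by install), add

log4j.logger.org.apache.cassandra.db.compaction=DEBUG

to conf/log4j-server.properties, then run

nodetool -h localhost compact <keyspace> <cf>

and watch the log for what the compaction strategy decides to do.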

On Fri, Jun 29, 2012 at 4:27 AM, Martin McGovern
martin.mcgov...@gmail.com wrote:
 Hi All,

 Could someone explain why the compaction manager stops compacting when it
 has a number of pending tasks?

 I have a test cluster that I am using to stress-test IO throughput, i.e.
 to find out what a safe load for our hardware is. Over a 16-hour period my
 cluster completes approximately 49,000 tasks per node. After stopping my
 test, compaction continues for a few minutes and then stops. There are
 ~7,000 tasks still pending. No more tasks will be executed until I start
 another test, and the 7,000 pending tasks will never be executed.

 I'm using leveled compaction with 5 MB SSTables, and my tests have a 50:50
 read:write ratio. Each value is a 10K byte array with random content.

 Thanks,
 Martin



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: jsvc CPU Consumption

2012-07-01 Thread Jonathan Ellis
Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_ec2_hangs to me.

On Fri, Jun 29, 2012 at 1:45 AM, Olivier Mallassi omalla...@octo.com wrote:
 Hi all

 We have a 12-server cluster (8 cores per machine).
 OS is Ubuntu 10.04.2.

 On one of the machines (only one), and without any load (no inserts, no
 reads), we have a huge CPU load even though there is no activity (no
 compaction in progress, etc.).
 A top on the machine shows us that the jsvc process is using all the
 available CPUs.

 Is that linked to JNA? Do you have any ideas?

 Cheers

 --
 
 Olivier Mallassi
 OCTO Technology
 
 50, Avenue des Champs-Elysées
 75008 Paris

 Mobile: (33) 6 28 70 26 61
 Tél: (33) 1 58 56 10 00
 Fax: (33) 1 58 56 10 01

 http://www.octo.com
 Octo Talks! http://blog.octo.com





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Memtable tuning in 1.0 and higher

2012-07-01 Thread Jonathan Ellis
On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
jwijg...@gmail.com wrote:
 the currentThroughput is increased even before the data is merged into the
 memtable, so it is actually measuring the throughput afaik.

You're right.  I've attached a patch to
https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: High CPU usage as of 8pm eastern time

2012-07-01 Thread Hontvári József Levente
Thank you for the mail. Same here, but I restarted the affected server 
before I noticed your mail.

It affected both OpenJDK Java 6 (packaged with Ubuntu 10.04) and Oracle 
Java 7 processes. Ubuntu 32-bit servers had no issues, only a 64-bit 
machine.

Likely it is related to the leap second introduced today.
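
(A way to confirm, hedged since the exact wording varies by kernel: the
insertion is logged by the kernel, so

dmesg | grep -i 'leap second'

should show something like 'Clock: inserting leap second 23:59:60 UTC'.)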

On 2012.07.01. 5:11, Mina Naguib wrote:

Hi folks

Our cassandra (and other java-based apps) started experiencing extremely high 
CPU usage as of 8pm eastern time (midnight UTC).

The issue appears to be related to specific versions of java + linux + ntpd.

There are many solutions floating around on IRC, twitter, stackexchange, LKML.

The simplest one that worked for us is simply to run this command on each 
affected machine:

date; date `date +%m%d%H%M%C%y.%S`; date;

CPU drop was instantaneous - there was no need to restart the server, ntpd, or 
any of the affected JVMs.


Re: Memtable tuning in 1.0 and higher

2012-07-01 Thread Joost Van De Wijgerd
Hi Jonathan,

Looks good, any chance of porting this fix to the 1.0 branch?

Kind regards

Joost

Sent from my iPhone


On 1 Jul 2012, at 09:25, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
 jwijg...@gmail.com wrote:
  the currentThroughput is increased even before the data is merged into the
  memtable, so it is actually measuring the throughput afaik.
 
 You're right.  I've attached a patch to
 https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Re: High CPU usage as of 8pm eastern time

2012-07-01 Thread David Daeschler
More information for others that were affected.

Our installation of java:

[root@inv4 conf]# java -version
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)

[root@inv4 conf]# uname -a
Linux inv4 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue Feb 14 04:00:16 GMT
2012 x86_64 x86_64 x86_64 GNU/Linux

Jonathan pointed out a Linux bug that may be related:
https://issues.apache.org/jira/browse/CASSANDRA-4066

In my case only the Java process went nuts, as seems to be the case in
many other reports:
https://bugzilla.mozilla.org/show_bug.cgi?id=769972
http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/

I hope everyone got enough sleep!
- David


On Sun, Jul 1, 2012 at 4:49 AM, Hontvári József Levente
hontv...@flyordie.com wrote:
 Thank you for the mail. Same here, but I restarted the affected server
 before I noticed your mail.

 It affected both OpenJDK Java 6 (packaged with Ubuntu 10.04) and Oracle
 Java 7 processes. Ubuntu 32-bit servers had no issues, only a 64-bit
 machine.

 Likely it is related to the leap second introduced today.


 On 2012.07.01. 5:11, Mina Naguib wrote:

 Hi folks

 Our cassandra (and other java-based apps) started experiencing extremely
 high CPU usage as of 8pm eastern time (midnight UTC).

 The issue appears to be related to specific versions of java + linux +
 ntpd

 There are many solutions floating around on IRC, twitter, stackexchange,
 LKML.

 The simplest one that worked for us is simply to run this command on each
 affected machine:

 date; date `date +%m%d%H%M%C%y.%S`; date;

 CPU drop was instantaneous - there was no need to restart the server,
 ntpd, or any of the affected JVMs.


SnappyCompressor and Cassandra 1.1.1

2012-07-01 Thread Andy Cobley
I'm running Cassandra on a Raspberry Pi (for educational reasons) and have been 
successfully running 1.1.0 for some time.  However, there is no native build of 
SnappyCompressor for the platform (I'm currently working on rectifying that if I 
can), so that compression is unavailable.  When I try to start 1.1.1 on the 
platform I get the following error, which looks to me like 1.1.1 is trying to 
load the Snappy compressor at startup and falls over when it can't find it.  
That's not been the case with 1.1.0:

INFO 14:22:07,600 Global memtable threshold is enabled at 35MB
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
        at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
        at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
        at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
        at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
        at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
        at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1681)
        at java.lang.Runtime.loadLibrary0(Runtime.java:840)
        at java.lang.System.loadLibrary(System.java:1047)
        at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
        ... 17 more
ERROR 14:22:09,934 Exception encountered during startup
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
        at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
        at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
        at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
        at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
        at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
        at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
        at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
        at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
        at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
        at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
        at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:79)
        at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:439)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:118)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:126)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Exception encountered during startup: [FAILED_TO_LOAD_NATIVE_LIBRARY] null

Andy




Bootstrap code path

2012-07-01 Thread Bill Hastings
Could someone please tell me where I should start looking in the code to
understand how Cassandra's bootstrap process works? I am sure it is
complicated, but I have time. Also, is my understanding correct that newly
added nodes do not join the ring until the bootstrap process is complete,
i.e. they do not receive any read or write requests from outside?


Re: Failed to solve Digest mismatch

2012-07-01 Thread Jason Tang
For the create/update/deleteColumn/deleteRow test case, at QUORUM
consistency level with 6 nodes and replication factor 3, with one thread I
can reproduce this in roughly 1 round out of 100.

And if I have 20 client threads running the test client, the ratio is
higher.

Each test group is executed by one thread, and the client timestamps are
unique and sequenced, guaranteed by Hector.

And the client only accesses the data from the local Cassandra node.

And the query only uses the row key, which is unique. The column names are
not unique; in my case, e.g., status.

And the row has around 7 columns, all small, e.g. status:true,
userName:Jason ...
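
To make the scenario concrete, this is roughly what the test does via
Hector (a hedged sketch, not the actual test code: the column family and
column names are taken from the log below, while the keyspace/serializer
setup is assumed boilerplate):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class DeleteThenRead {
    private static final StringSerializer SS = StringSerializer.get();

    static void run(Keyspace ks, String rowKey) {
        Mutator<String> m = HFactory.createMutator(ks, SS);
        // step 1: delete one column
        m.addDeletion(rowKey, "queue", "status", SS);
        m.execute();
        // step 2: delete the whole row (a null column name deletes the row)
        m.addDeletion(rowKey, "queue", null, SS);
        m.execute();
        // step 3: slice all columns back at QUORUM; the row should come back
        // empty, but roughly 1 run in 100 still returns the deleted data
        HFactory.createSliceQuery(ks, SS, SS, SS)
                .setColumnFamily("queue")
                .setKey(rowKey)
                .setRange(null, null, false, Integer.MAX_VALUE)
                .execute();
    }
}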

BRs
//Ares

2012/7/1 Jonathan Ellis jbel...@gmail.com

 Is this Cassandra 1.1.1?

 How often do you observe this?  How many columns are in the row?  Can
 you reproduce when querying by column name, or only when slicing the
 row?

 On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
  Hi
 
 First I delete one column, then I delete one row. Then I try to read all
  columns from the same row; all operations come from the same client app.
 
 The consistency level is read/write QUORUM.
 
 Checking the Cassandra log, the local node doesn't perform the delete
  operation itself but sends the mutation to the other nodes (192.168.0.6,
  192.168.0.1).
 
 After the delete, I try to read all columns from the row. The node
  detects a digest mismatch, as expected with the QUORUM consistency
  configuration, but the final result is not correct.
 
 From the log, I can see the delete mutation was already accepted
  by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 read the responses
  from 0.6 and 0.1 and merged the data, 0.5 still returned the dirty data.
 
 The following logs show the change of column 737461747573: 192.168.0.5
  reads from 0.1 and 0.6; the column should be deleted, but the final result
  still contains the data.
 
  [debug log snipped; quoted in full earlier in this digest]

cassandra halts minutes after starting

2012-07-01 Thread Yan Chunlu
I have a three-node cluster running 1.0.2. Today there was a very strange
problem: suddenly two of the Cassandra nodes (let's say B and C) were
consuming a lot of CPU; it turned out that for some reason the java binary
just wouldn't run. I was using OpenJDK 1.6.0_18, so I switched to the Sun
JDK, which works okay.

After that, node A stopped working... same problem. I installed the Sun
JDK, and then it was okay. But minutes later, B stopped working again:
about 5-10 minutes after Cassandra started, it stopped responding to
connections; I couldn't access port 9160, and nodetool didn't return
either.

I have turned on DEBUG and don't see much useful information; the last rows
on node B are as below:
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
(line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line
694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
116) Version is now 3


This problem is really driving me crazy, since I just don't know what
happened or how to debug it. I tried to kill node A and restart it, then
node B halted; after I restarted B, node C went down...


One thing that may be related is that the log time on node B is not the
same as the system time (A and C are okay).

while date on node B shows:
Sun Jul  1 23:10:57 CST 2012 (system time)

but you may notice that the time is 2012-07-01 07:45:XX in the log
messages above. The system time is right; I'm just not sure why Cassandra's
log file shows the wrong time. I don't recall Cassandra having timezone
settings.


Re: cassandra halts minutes after starting

2012-07-01 Thread Yan Chunlu
I adjusted the timezone of the JVM with -Duser.timezone; the timezone of
Cassandra is now the same as the system's (Debian 6.0).

After restarting Cassandra I found the following error message in the log
file of node B. About 2 minutes later, node C stopped responding.

the error log of node B:

Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)



the log info in node C:


DEBUG [MutationStage:25] 2012-07-01 23:29:42,909
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark',
key='3937343836623538363837363135353264313339333463343532623634373131656462306139',
modifications=[ColumnFamily(permacache
[76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to
79529@/192.168.1.129
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java
(line 523) insert
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark',
key='636f6d6d656e74735f706172656e74735f32373232343938',
modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843
,])])]/QUORUM
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
/192.168.1.40
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
/192.168.1.129
DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913
ResponseVerbHandler.java (line 44) Processing response on a callback from
50050@/192.168.1.129
DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line
116) Version is now 3
DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914
ResponseVerbHandler.java (line 44) Processing response on a callback from
50051@/192.168.1.40
DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line
116) Version is now 3



On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu springri...@gmail.com wrote:

 [quoted message snipped]

Re: cassandra halts minutes after starting

2012-07-01 Thread David Daeschler
This looks like the problem a bunch of us were having yesterday that
isn't cleared without a reboot or a date command. It seems to be
related to the leap second that was added between the 30th June and
the 1st of July.

See the mailing list thread with subject "High CPU usage as of 8pm eastern time".

If you are still seeing high CPU usage and a stall after restarting
cassandra, and you are on Linux, run

date; date `date +%m%d%H%M%C%y.%S`; date;

in a terminal and see if everything starts working again.

I hope this helps.
-- 
David Daeschler



On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu springri...@gmail.com wrote:
 [quoted message snipped]

Re: cassandra halts minutes after starting

2012-07-01 Thread Yan Chunlu
Huge thanks, it is the leap second problem!

Finally I can go to bed.

On Mon, Jul 2, 2012 at 12:11 AM, David Daeschler
david.daesch...@gmail.com wrote:

 This looks like the problem a bunch of us were having yesterday that
 isn't cleared without a reboot or a date command. It seems to be
 related to the leap second that was added between the 30th June and
 the 1st of July.

 See the mailing list thread with subject "High CPU usage as of 8pm eastern
 time".

 If you are still seeing high CPU usage and a stall after restarting
 cassandra, and you are on Linux, run

 date; date `date +%m%d%H%M%C%y.%S`; date;

 in a terminal and see if everything starts working again.

 I hope this helps.
 --
 David Daeschler



 On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu springri...@gmail.com wrote:
  [quoted message snipped]

Offtopic: ksoftirqd takes more CPU after DDoS? As a result Cassandra latency is very high

2012-07-01 Thread ruslan usifov
Hello

We were under a DDoS attack, and as a result we got high ksoftirqd
activity; because of it, Cassandra began answering very slowly. But after
the DDoS was gone, the high ksoftirqd activity still persisted: it
disappears when I stop the Cassandra daemon and repeats again when I start
the Cassandra daemon, and the only full resolution of the problem is a
full reboot of the server. What can this be (why does ksoftirqd start
working very intensively when Cassandra is running? We disabled all
working traffic to the cluster, but this didn't help, so it can't be due
to heavy load)? And how do we solve this?

PS:
 OS: Ubuntu 10.04 (2.6.32.41)
 cassandra 1.0.10
 java 1.6.0_32 (from Oracle)


Re: Offtopic: ksoftirqd takes more CPU after DDoS? As a result Cassandra latency is very high

2012-07-01 Thread Sergey Kondratyev
Hello,
It is not related to Cassandra/DDoS; it is a kernel problem due to the
leap second. See
http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second

On Sun, Jul 1, 2012 at 1:05 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
 [quoted message snipped]


Re: Offtopic: ksoftirqd takes more CPU after DDoS? As a result Cassandra latency is very high

2012-07-01 Thread David Daeschler
Good afternoon,

This again looks like it could be the leap second issue:

This looks like the problem a bunch of us were having yesterday that
isn't cleared without a reboot or a date command. It seems to be
related to the leap second that was added between the 30th June and
the 1st of July.

See the mailing list thread with subject "High CPU usage as of 8pm eastern time".

If you are still seeing high CPU usage and a stall after restarting
cassandra, and you are on Linux, run

date; date `date +%m%d%H%M%C%y.%S`; date;

in a terminal and see if everything starts working again.

I hope this helps. Please spread the word if you see others having
issues with unresponsive kernels/high CPU.

-- 
David Daeschler



On Sun, Jul 1, 2012 at 1:05 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
 [quoted message snipped]


Re: Cassandra consistency issue on cluster system

2012-07-01 Thread aaron morton
If you are reading at QUORUM there is no problem; this is how eventual 
consistency works in Cassandra.

The coordinator will resolve the differences between the replicas, and the 
column with the higher timestamp will win. 

If the delete was applied to fewer than CL nodes, the client should have 
received a TimedOutException.
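
As a sketch of that rule (conceptual only, not the actual Cassandra
source): replicas' versions of a column are merged pairwise, and the
highest client-supplied timestamp wins, with tombstones competing like any
other write.

final class Column {
    final long timestamp;      // client-supplied; microseconds by convention
    final boolean tombstone;   // true if this version is a delete
    final byte[] value;

    Column(long timestamp, boolean tombstone, byte[] value) {
        this.timestamp = timestamp;
        this.tombstone = tombstone;
        this.value = value;
    }

    // reconcile rule: the version with the higher timestamp wins, so a
    // delete only masks writes that are older than the delete itself
    static Column reconcile(Column a, Column b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}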

 
Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 7:41 PM, 黄荣桢 wrote:

 Background: My application is running on a cluster system (which has 4 
 nodes), and the system time of these four nodes is synchronized by NTP. I use 
 the Write.QUORUM and Read.QUORUM strategy. The probability of this problem is 
 not very high. The Cassandra version is 1.0.3; I have tried Cassandra 1.1.1, 
 and this problem still exists.
 
 Problem: I deleted a column, but 6 seconds later Cassandra could still get 
 the old record, whose isMarkedForDelete is still false.
 
 Has anybody met the same problem? And how can it be solved?
 
 Detail: See the log below:
 
 Node 3(Local node):
 [pool-2-thread-42] 2012-06-27 14:49:23,732 SliceQueryFilter.java (line 123) 
 collecting 0 of 2147483647: SuperColumn(667072 
 [..7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
 
 [pool-2-thread-44] 2012-06-27 14:51:21,367 StorageProxy.java (line 172) 
 Mutations/ConsistencyLevel are [RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])]/QUORUM
 
 -- I delete this record at 14:51:21,367
 
 [pool-2-thread-37] 2012-06-27 14:51:27,400 SliceQueryFilter.java (line 123) 
 collecting 0 of 2147483647: SuperColumn(667072 
 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..)
 
 -- But I can still get the old record at 14:51:27,400
 
 Node2:
 [MutationStage:118] 2012-06-27 14:51:21,373 RowMutationVerbHandler.java (line 
 48) Applying RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
 
 [MutationStage:118] 2012-06-27 14:51:21,374 RowMutationVerbHandler.java (line 
 60) RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])]) 
 applied. Sending response to 6692098@/192.168.0.3
 
 [MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 
 48) Applying RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,..])
 
 [MutationStage:123] 2012-06-27 14:51:27,405 RowMutationVerbHandler.java (line 
 60) RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [..,7fff01382ca96c8b636b698a:false:36@1340779097312016,...]),])])
  applied. Sending response to 6698516@/192.168.0.3
 
 Node1:
 [MutationStage:98] 2012-06-27 14:51:24,661 RowMutationVerbHandler.java (line 
 48) Applying RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a:true:4@1340779881338000,]),])])
 
 [MutationStage:98] 2012-06-27 14:51:24,675 RowMutationVerbHandler.java (line 
 60) RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a: true :4@1340779881338000,]),])]) 
 applied. Sending response to 6692099@/192.168.0.3
 
 [MutationStage:93] 2012-06-27 14:51:40,932 RowMutationVerbHandler.java (line 
 48) Applying RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a:true:4@1340779900915004,]),])])
 
 DEBUG [MutationStage:93] 2012-06-27 14:51:40,933 RowMutationVerbHandler.java 
 (line 60) RowMutation(keyspace='drc', key='3332', 
 modifications=[ColumnFamily(fpr_index [SuperColumn(667072 
 [7fff01382ca96c8b636b698a: true :4@1340779900915004,]),])]) 
 applied. Sending response to 6706555@/192.168.0.3
 
 [ReadStage:55] 2012-06-27 14:51:43,074 SliceQueryFilter.java (line 123) 
 collecting 0 of 
 5000:7fff01382ca96c8b636b698a:true:4@1340779900915004
 
 Node 4:
 
 There is no log about this record on Node 4.
 



Re: No indexed columns present in by-columns clause with equals operator

2012-07-01 Thread aaron morton
Like the exception says:

 Bad Request: No indexed columns present in by-columns clause with equals 
 operator
 Same with other relational operators (<, <=, >=)
you must include an equality operator on an indexed column in the where
clause.

That is why
 SELECT * FROM STEST WHERE VALUE1 = 10; 

works but 
 SELECT * FROM STEST WHERE VALUE1 > 10; 
does not. 
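
A range predicate is accepted once some indexed column is constrained by
equality, e.g. (a hedged sketch against the STEST schema from the message
below; VALUE1 is text, so the comparison is lexical):

SELECT * FROM STEST WHERE VALUE2 = 'AB' AND VALUE1 > '10';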

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 8:55 PM, Abhijit Chanda wrote:

 Hi All,
 I have got a strange exception while using Cassandra CQL. Relational 
 operators like (<, >, <=, >=) are not working.
 my columnfamily looks like this.
 CREATE COLUMNFAMILY STEST (
   ROW_KEY text PRIMARY KEY,
   VALUE1 text,
   VALUE2 text
 ) WITH
   comment='' AND
   comparator=text AND
   read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   default_validation=text AND
   min_compaction_threshold=4 AND
   max_compaction_threshold=32 AND
   replicate_on_write=True;
 
 CREATE INDEX VALUE1_IDX ON STEST (VALUE1);
 
 CREATE INDEX VALUE2_IDX ON STEST (VALUE2);
 
 
 Now in this columnfamily if i query this 
 SELECT * FROM STEST WHERE VALUE1 = 10; it returns:
  ROW_KEY | VALUE1 | VALUE2
 ---------+--------+--------
        2 |     10 |     AB
 
 But if I query like this 
 SELECT * FROM STEST WHERE VALUE1 > 10; 
 it shows this exception:
 Bad Request: No indexed columns present in by-columns clause with equals 
 operator
 Same with the other relational operators (<, <=, >=).
 
 These are the data available in my column family: 
  ROW_KEY | VALUE1 | VALUE2
 ---------+--------+--------
        3 |    100 |    ABC
        5 |      9 |  ABCDE
        2 |     10 |     AB
        1 |      1 |      A
        4 |     19 |   ABCD
 
 Looks like some configuration problem. Please help me. Thanks in advance.
 
 
 
 
 Regards,
 -- 
 Abhijit Chanda
 Analyst
 VeHere Interactive Pvt. Ltd.
 +91-974395
 



Re: BulkLoading SSTables and compression

2012-07-01 Thread aaron morton
When the data is streamed into the cluster by the bulk loader it is compressed 
on the receiving end (if the target CF has compression enabled).

If you are able to reproduce this, can you create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/06/2012, at 10:00 PM, Andy Cobley wrote:

 My (limited) experience of moving from 0.8 to 1.0 is that you do have to use 
 rebuildsstables.  I'm guessing BulkLoading is bypassing the compression?
 
 Andy
 
 On 28 Jun 2012, at 10:53, jmodha wrote:
 
 Hi,
 
 We are migrating our Cassandra cluster from v1.0.3 to v1.1.1, the data is
 migrated using SSTableLoader to an empty Cassandra cluster.
 
 The data in the source cluster (v1.0.3) is uncompressed and the target
 cluster (1.1.1) has the column family created with compression turned on.
 
 What we are seeing is that once the data has been loaded into the target
 cluster, the size is similar to the data in the source cluster. Our
 expectation is that since we have turned on compression in the target
 cluster, the amount of data would be reduced.
 
 We have tried running the rebuildsstables nodetool command on a node after
 data has been loaded and we do indeed see a huge reduction in size e.g. from
 30GB to 10GB for a given column family. We were hoping to see this at the
 point of loading the data in via the SSTableLoader.
 
 Is this behaviour expected? 
 
 Do we need to run the rebuildsstables command on all nodes to actually
 compress the data after it has been streamed in?
 
 Thanks.
 
 
 
 
 



Re: Amazingly bad compaction performance

2012-07-01 Thread aaron morton
 Can compression be changed or disabled on-the-fly with cassandra?
Yes. Disable it in the schema and then run nodetool upgradesstables.
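
For example, in cassandra-cli (hedged; syntax as I recall the 1.0/1.1 cli,
with the CF name taken from the log below):

update column family basic with compression_options = null;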

As Tyler said, JDK7 is not officially supported yet, and you may be running into 
issues others have not found. Any chance you could downgrade one node to JDK6 
and check the performance? If it looks like a JDK issue, could you post your 
findings to https://issues.apache.org/jira/browse/CASSANDRA and include the 
schema details? 

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/06/2012, at 2:36 AM, Dustin Wenz wrote:

 My maximum and initial heap sizes are set to 6GB. Actual memory usage for the 
 VM is around 11-12GB. The machine has 24GB of physical memory, so there isn't 
 any paging going on.
 
 I don't see any GC events logged that are longer than a few hundred 
 milliseconds. Is it possible that GC is taking significant time without it 
 being reported?
 
   - .Dustin
 
 On Jun 27, 2012, at 1:31 AM, Igor wrote:
 
 Hello
 
 Too much GC? Check JVM heap settings and real usage.
 
 On 06/27/2012 01:37 AM, Dustin Wenz wrote:
 We occasionally see fairly poor compaction performance on random nodes in 
 our 7-node cluster, and I have no idea why. This is one example from the 
 log:
 
 [CompactionExecutor:45] 2012-06-26 13:40:18,721 CompactionTask.java 
 (line 221) Compacted to 
 [/raid00/cassandra_data/main/basic/main-basic.basic_id_index-hd-160-Data.db,].
   26,632,210 to 26,679,667 (~100% of original) bytes for 2 keys at 
 0.006250MB/s.  Time: 4,071,163ms.
 
 That particular event took over an hour to compact only 25 megabytes. 
 During that time, there was very little disk IO, and the java process 
 (OpenJDK 7) was pegged at 200% CPU. The node was also completely 
 unresponsive to network requests until the compaction was finished. Most 
 compactions run just over 7MB/s. This is an extreme outlier, but users 
 definitely notice the hit when it occurs.
 
 I grabbed a sample of the process using jstack, and this was the only 
 thread in CompactionExecutor:
 
 "CompactionExecutor:54" daemon prio=1 tid=41247522816 nid=0x99a5ff740 runnable [140737253617664]
    java.lang.Thread.State: RUNNABLE
         at org.xerial.snappy.SnappyNative.rawCompress(Native Method)
         at org.xerial.snappy.Snappy.rawCompress(Snappy.java:358)
         at org.apache.cassandra.io.compress.SnappyCompressor.compress(SnappyCompressor.java:80)
         at org.apache.cassandra.io.compress.CompressedSequentialWriter.flushData(CompressedSequentialWriter.java:89)
         at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:196)
         at org.apache.cassandra.io.util.SequentialWriter.reBuffer(SequentialWriter.java:260)
         at org.apache.cassandra.io.util.SequentialWriter.writeAtMost(SequentialWriter.java:128)
         at org.apache.cassandra.io.util.SequentialWriter.write(SequentialWriter.java:112)
         at java.io.DataOutputStream.write(DataOutputStream.java:107)
         - locked <36527862064> (a java.io.DataOutputStream)
         at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:142)
         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
         at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
         at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
         at java.lang.Thread.run(Thread.java:722)
 
 Is it possible that there is an issue with snappy compression? Based on the 
 lousy compression ratio, I think we could get by without it just fine. Can 
 compression be changed or disabled on-the-fly with cassandra?
 
 - .Dustin
 
 
 



Re: hector timeouts

2012-07-01 Thread aaron morton
Using Cassandra as a queue is generally thought of as a bad idea, owing to the 
high delete workload. Levelled compaction handles it better, but it is still not 
the best approach. 

Depending on your needs consider running http://incubator.apache.org/kafka/ 

 could you share some details on this?  We're using Hector and we see random 
 timeout warnings in the logs and are not sure how to address them.
First determine if they are server side or client side timeouts. Then determine 
what the query was. 
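
On the client side, the relevant Hector knob is on CassandraHostConfigurator
(a hedged sketch; host and parameter values are illustrative):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorTimeoutSetup {
    public static void main(String[] args) {
        CassandraHostConfigurator hosts = new CassandraHostConfigurator("127.0.0.1:9160");
        // client-side Thrift socket timeout, in milliseconds; queries that
        // exceed it fail in the client, while server-side timeouts come back
        // as Cassandra's TimedOutException
        hosts.setCassandraThriftSocketTimeout(10000);
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);
        System.out.println("connected to: " + cluster.getName());
    }
}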

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/06/2012, at 7:02 AM, Deno Vichas wrote:

 On 6/28/2012 9:37 AM, David Leimbach wrote:
 
 That coupled with Hector timeout issues became a real problem for us.
 
 could you share some details on this?  We're using Hector and we see random 
 timeout warnings in the logs and are not sure how to address them.
 
 
 thanks,
 deno



Re: BulkLoading SSTables and compression

2012-07-01 Thread jmodha
Sure, before I create a ticket, is there a way I can confirm that the
sstables are indeed not compressed other than running the rebuildsstables
nodetool command (and observing the live size go down)?
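
(One way to check, hedged: SSTables written with compression carry a
CompressionInfo component on disk, so its absence suggests the streamed
tables are uncompressed, e.g.

ls /var/lib/cassandra/data/<keyspace>/<cf>/ | grep CompressionInfo

with paths adjusted to your data directory.)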

Thanks.



Re: Offtopic: ksoftirqd takes more CPU after DDoS? As a result Cassandra latency is very high

2012-07-01 Thread ruslan usifov
2012/7/1 David Daeschler david.daesch...@gmail.com:
 [quoted message snipped]

 Hello, this really helps. In our case two problems crossed each other :-((
and we hadn't assumed that it might be a kernel problem. On one data
cluster we simply rebooted, and on the second we applied the date solution,
and everything is fine. Thanks!


Re: No indexed columns present in by-columns clause with equals operator

2012-07-01 Thread Abhijit Chanda
Hey Aaron,

I was able to sort out the problem. Thanks anyway.

Regards,
Abhijit