Side effects of hinted handoff lead to consistency problem
I have a 3-node cluster with replication_factor = 3, and the consistency level is QUORUM for both writes and reads. The traffic has three major steps per row: Create (Rowkey; Columns: status=new, requests=x), Update (Rowkey; Columns: status=executing, requests=x), and Delete (the whole row). When one node is down the cluster keeps working, per the consistency configuration, and the final state is that all requests are finished and deleted. If I run the Cassandra client to list the result (also at consistency QUORUM), it shows empty (only the row key left), which is correct. But when we start the dead node, hinted handoff writes the missed data back to it, so there are lots of creates, updates, and deletes replayed. I don't know whether it's due to GC grace or compaction, but the delete records on the other two nodes don't seem to take effect: listing the data with the Cassandra client (also at QUORUM) shows the deleted rows again with column values. If I check the data several times, I can see it changing; it seems that as hinted handoff replays operations, the deleted data shows up and then disappears. So the hinted handoff mechanism speeds up repair, but the temporary data is visible externally (when the data was deleted). Is there a way to keep this procedure invisible externally until the hinted handoff finishes? What I want is final-state synchronization; the temporary state is out of date and incorrect, and should never be visible externally. Is this caused by using row deletes instead of column deletes? Or by compaction?
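For context on why replayed hints normally cannot resurrect deleted data: Cassandra reconciles divergent cells by write timestamp (last write wins), and a replayed hint carries the mutation's original timestamp, so a later tombstone keeps winning until it is purged after gc_grace_seconds. Below is a toy sketch of that reconciliation rule (hypothetical illustration, not Cassandra source code):

```java
// Hypothetical sketch (not Cassandra source): how timestamp-based
// last-write-wins reconciliation decides between a tombstone and a
// replayed write.
public class Reconcile {
    // A cell is either a value or a tombstone, each carrying the write
    // timestamp (microseconds since epoch in real Cassandra).
    static final class Cell {
        final String value;      // null means tombstone
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
        boolean isTombstone() { return value == null; }
    }

    // Last-write-wins: the cell with the higher timestamp survives.
    // A replayed hint keeps its ORIGINAL timestamp, so it loses to a
    // later delete once the replicas are compared.
    static Cell reconcile(Cell a, Cell b) {
        return a.timestamp >= b.timestamp ? a : b;
    }

    public static void main(String[] args) {
        Cell create = new Cell("status=new", 100L);
        Cell delete = new Cell(null, 300L);  // tombstone written later
        // Even if the create is replayed after the delete, it loses:
        assert reconcile(delete, create).isTombstone();
    }
}
```

If deleted data reappears permanently rather than flickering, the usual suspects are tombstones already purged (node down longer than gc_grace_seconds) or clock skew between clients, rather than hinted handoff itself.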
Error during cleanup
Hi, When running cleanup on a node with C* 2.0.1, I got the following error: cassandra01 - Error during cleanup: javax.management.MBeanException: java.util.concurrent.ExecutionException: java.lang.ClassCastException: org.apache.cassandra.io.sstable.SSTableReader$EmptyCompactionScanner cannot be cast to org.apache.cassandra.io.sstable.SSTableScanner However, cleanup appears to have worked, since the size of the data on that particular node has decreased. Is this an error to worry about?
How to determine which node(s) an insert would go to in C* 2.0 with vnodes?
Hi, When using C* 2.0 in a large 100-node cluster with the Murmur3 partitioner, vnodes, and 256 tokens assigned to each node, is it possible to find out where a certain key is destined to go? If the keyspace is defined with replication factor = 3, then a specific key like 'row-1' would be destined to go to 3 nodes, right? Is there a way I can pre-determine which 3 of the 100 nodes the insert of 'row-1' would go to? Or alternatively, after I've already written 'row-1', can I find out which 3 nodes it went to?
RE: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?
In CQL there is a token() function you can use to find the result of your partitioning scheme's hash function for any value, e.g. select token(value) from column_family1 where partition_column = value; You then need to find out which nodes are responsible for that token using nodetool ring, or by looking at the system.peers table for tokens. Not that straightforward, especially with 100 nodes and vnodes. Maybe someone has written a script or something to do this already? Or I suppose you could turn on tracing and repeat the query until you've seen it hit three different end nodes, i.e. tracing on; select * from column_family1 where partition_column = value; From: Sameer Farooqui [mailto:sam...@blueplastic.com] Sent: 08 October 2013 10:20 To: user@cassandra.apache.org Subject: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?
Re: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?
hi, you can try: nodetool getendpoints <keyspace> <cf> <key> - prints the endpoints that own the key. 2013/10/8 Christopher Wirt chris.w...@struq.com
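To make the ownership lookup concrete, here is an illustrative sketch of the placement rule (SimpleStrategy-style, ignoring racks and data centers, and not the real NetworkTopologyStrategy code): given a sorted ring of vnode tokens mapped to nodes, the first RF distinct nodes clockwise from the key's token own the key. nodetool getendpoints performs this lookup against the real ring for you.

```java
import java.util.*;

// Illustrative replica lookup on a vnode ring. Each node owns several
// tokens; a key belongs to the first RF *distinct* nodes found walking
// clockwise from the key's token.
public class RingLookup {
    static List<String> replicasFor(long keyToken, TreeMap<Long, String> ring, int rf) {
        List<String> owners = new ArrayList<>();
        // Walk the ring clockwise starting at the first vnode token >= keyToken,
        // wrapping around to the beginning of the ring.
        List<String> walk = new ArrayList<>(ring.tailMap(keyToken).values());
        walk.addAll(ring.headMap(keyToken).values());
        for (String node : walk) {
            if (!owners.contains(node)) owners.add(node);
            if (owners.size() == rf) break;
        }
        return owners;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        // Toy ring: three nodes A, B, C, each owning a few vnode tokens.
        ring.put(-100L, "A"); ring.put(0L, "B"); ring.put(50L, "A");
        ring.put(120L, "C"); ring.put(200L, "B");
        // A key hashing to token 60 lands on C first, then B, then A.
        System.out.println(replicasFor(60L, ring, 3)); // prints [C, B, A]
    }
}
```

With 256 tokens per node and 100 nodes the real ring has 25,600 entries, which is why doing this by hand from nodetool ring output is painful and getendpoints is the practical answer.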
TimedOutException in CqlRecordWriter
Hello, I run Hadoop jobs which read data from Cassandra 1.2.8 and write results back to other tables. One of my reduce tasks was killed twice by the job tracker because it wasn't responding for more than 10 minutes; the third attempt was successful. The error message for the killed reduce tasks is: java.io.IOException: TimedOutException(acknowledged_by:0) at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:245) Caused by: TimedOutException(acknowledged_by:0) at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:41884) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1689) at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1674) at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:229), Task attempt_201310081258_0006_r_00_0 failed to report status for 600 seconds. Killing! I'm wondering how it could happen that the task didn't report status for 600 seconds, and how that's related to the TimedOutException at the top of the stacktrace. The write_request_timeout_in_ms is default 1, so it should fail much earlier. Thanks.
Proper Use of PreparedStatements in DataStax driver
Hello, I'm trying to determine the proper way to use PreparedStatements with the DataStax CQL Java driver. The API is:

Session session = ...
PreparedStatement ps = session.prepare("INSERT INTO table-name (column, ...) VALUES (?, ...)");
BoundStatement bs = new BoundStatement(ps);
// bind variables
session.execute(bs);

1. My question is: should I instantiate the PreparedStatement only once and reuse *the same instance* every time I want to execute an INSERT? Is that necessary to get the performance benefits of using PreparedStatements?

2. Or does session.prepare("INSERT INTO table-name (column, ...) VALUES (?, ...)") return the same logical PreparedStatement each time (not necessarily the identical instance, as in ==), with the same performance benefits? Note: this is how jdbc.PreparedStatements are used.

If the answer is #1, then PreparedStatements are effectively singletons that need to be thread-safe, and the app is responsible for lifecycle issues if the original Session dies. I've looked at the DataStax API docs (http://www.datastax.com/drivers/java/1.0/apidocs/) and chatted on IRC, with no answer. This seems to me like a pretty fundamental question.
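For what it's worth, the usual answer with the driver is #1: prepare each statement once, keep the PreparedStatement around (it is designed to be shared across threads), and bind fresh values per execution, because every prepare() call is a server round trip. Here is a minimal sketch of that prepare-once pattern; FakeSession is a stand-in for the driver's Session so the example is self-contained, not the real driver API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "prepare once, reuse" pattern. FakeSession counts
// prepare() calls so we can see that the cache avoids repeated
// round trips; with the real driver you would hold the
// PreparedStatement returned by session.prepare() in a field.
public class StatementCache {
    // Stand-in for the driver's Session.
    static final class FakeSession {
        int prepareCalls = 0;                 // counts server round trips
        String prepare(String cql) { prepareCalls++; return cql; }
    }

    private final FakeSession session;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    StatementCache(FakeSession session) { this.session = session; }

    // computeIfAbsent prepares each distinct CQL string at most once,
    // even with concurrent callers.
    String prepared(String cql) {
        return cache.computeIfAbsent(cql, session::prepare);
    }

    public static void main(String[] args) {
        FakeSession s = new FakeSession();
        StatementCache c = new StatementCache(s);
        for (int i = 0; i < 1000; i++) c.prepared("INSERT INTO t (k, v) VALUES (?, ?)");
        assert s.prepareCalls == 1;   // one round trip, not 1000
    }
}
```

A cache keyed by the CQL string, as above, is a common compromise when statements are built in many places; a long-lived field per statement is simpler when there are only a few.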
Re: Error during cleanup
Do you have a complete stacktrace available? On Tue, Oct 8, 2013 at 2:08 AM, Sameer Farooqui sam...@blueplastic.com wrote: -- Tyler Hobbs DataStax http://datastax.com/
Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted
SizeTieredCompactionStrategy only compacts sstables that are a similar size (by default, they basically need to be within 50% of each other). Perhaps your first SSTable was very large or small compared to the others? On Mon, Oct 7, 2013 at 8:06 PM, Sameer Farooqui sam...@blueplastic.com wrote: Hi, I have a fresh 1-node C* 2.0 install with a demo keyspace created with the SizeTiered compaction strategy. I've noticed that in the beginning this keyspace has just one SSTable: demodb-users-jb-1-Data.db But as I add more data to the table and do some flushes, the number of SSTables builds up. After I have a handful of SSTables, I trigger a flush using 'nodetool flush demodb users', but then not ALL of the SSTables get compacted. I've noticed that the 1st SSTable remains the same and doesn't disappear after the compaction, but the later SSTables do get compacted into one new data file. Is there a reason why the first SSTable is special and does not disappear after compaction? Also, I think I noticed that if I wait a few days and run another compaction, then that 1st SSTable does get compacted (and it disappears). Can someone help explain why the 1st SSTable behaves this way? -- Tyler Hobbs DataStax http://datastax.com/
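The 50% rule described above can be sketched as a toy bucketing pass (hypothetical code, not the actual SizeTieredCompactionStrategy source; the real options are bucket_low=0.5 and bucket_high=1.5 by default, and a bucket only gets compacted once it reaches min_threshold sstables):

```java
import java.util.*;

// Toy version of size-tiered grouping: an sstable joins a bucket when
// its size is within [avg * low, avg * high] of the bucket's running
// average size; otherwise it starts a new bucket.
public class SizeTiers {
    static List<List<Long>> buckets(List<Long> sizes, double low, double high) {
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size(); i++) {
                double avg = averages.get(i);
                if (size >= avg * low && size <= avg * high) {
                    buckets.get(i).add(size);
                    // update the bucket's running average
                    int n = buckets.get(i).size();
                    averages.set(i, (avg * (n - 1) + size) / n);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                buckets.add(new ArrayList<>(Collections.singletonList(size)));
                averages.add((double) size);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The two small sstables from the thread (227 and 242 bytes)
        // share a bucket; a 10x larger one is left alone.
        System.out.println(buckets(Arrays.asList(227L, 242L, 2400L), 0.5, 1.5));
        // prints [[227, 242], [2400]]
    }
}
```

So an sstable that is much larger (or smaller) than its peers simply never lands in a bucket with enough companions, which matches the observation that the first sstable survives until similarly sized ones accumulate.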
Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted
Thanks for the reply, Tyler. I thought that too... that maybe the SSTables are mismatched in size... but upon closer inspection, that doesn't appear to be the case: -rw-r--r-- 1 cassandra cassandra 227 Oct 7 23:26 demodb-users-jb-1-Data.db -rw-r--r-- 1 cassandra cassandra 242 Oct 8 00:38 demodb-users-jb-6-Data.db The two files look to be nearly the same size. There just appears to be something special about that first SSTable and it not getting compacted. On Tue, Oct 8, 2013 at 2:49 PM, Tyler Hobbs ty...@datastax.com wrote:
Re: Error during cleanup
No, but I may be able to get one for you if the issue is reproducible when I trigger another cleanup. I originally issued the cleanup on the node via OpsCenter Community Edition 3.2.2. If I issue another cleanup, can you give me the rough steps for how to get the stacktrace? On Tue, Oct 8, 2013 at 2:46 PM, Tyler Hobbs ty...@datastax.com wrote:
Re: Error during cleanup
On Tue, Oct 8, 2013 at 2:02 PM, Sameer Farooqui sam...@blueplastic.com wrote: If I issue another cleanup, can you give me the rough steps for how to get the stacktrace? I'm hoping it will show up in Cassandra's system.log, but since it's triggered through JMX, it's possible that it will not. -- Tyler Hobbs DataStax http://datastax.com/
Re: Any suggestions about running Cassandra on Windows servers for production use?
On Mon, Oct 7, 2013 at 5:35 AM, Vassilis Bekiaris bekiar...@iconplatforms.com wrote: we are planning a Cassandra 1.2 installation at a client site; the client will run operations themselves and based on their IT team's experience they are more inclined towards running Cassandra nodes on Windows servers, however given proper arguments they would also consider using linux servers. On the other hand, our team has experience running Cassandra on Linux, so we have no idea what we might face on Windows. If you benchmark the two against each other, I would be shocked if the Windows version is not significantly slower. The differences are often at quite a low level, like for example the mmap example, or the fact that Cassandra uses fadvise on Linux. Various optimizations only apply to the Linux version; even running Cassandra on Solaris would not make use of some of them. There are also supplementary tools (Priam, tablesnap) which may or may not work under Windows. It would be great if someone enumerated the in-code differences.. I have noted it for my extensive TODO list with the optimism of a new week... :) On the other hand, if you don't mind taking the performance hit and being among a relatively small group of operators in Production on Windows, it should work fine? =Rob
Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted
Well, 6 was created by the other sstables being compacted, correct? If so, they were probably quite a bit smaller (~25% of the size). Once you have two more sstables of roughly that size, they should be compacted automatically. On Tue, Oct 8, 2013 at 2:01 PM, Sameer Farooqui sam...@blueplastic.com wrote: -- Tyler Hobbs DataStax http://datastax.com/
Using cassandra-cli with Client-server encryption
Hi, I am trying to use cassandra-cli with client-server encryption enabled, but I'm somehow getting a handshake failure error (given below): org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156) at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) at org.apache.cassandra.thrift.Cassandra$Client.send_describe_cluster_name(Cassandra.java:1095) at org.apache.cassandra.thrift.Cassandra$Client.describe_cluster_name(Cassandra.java:1088) at org.apache.cassandra.cli.CliMain.connect(CliMain.java:147) at org.apache.cassandra.cli.CliMain.main(CliMain.java:246) Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.Alerts.getSSLException(Alerts.java:154) at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:1911) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1027) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1262) at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:680) at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:85) at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) I am trying to get it working on my local machine. -Vivek
Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted
Hmm, good point. I'll test this out again and see if the compaction behavior is as expected given the relative sizes of the SSTables. On Tue, Oct 8, 2013 at 3:06 PM, Tyler Hobbs ty...@datastax.com wrote:
ArrayIndexOutOfBoundsException in StorageService.extractExpireTime
Hi, after upgrading our cluster from 1.1 to 1.2.10 I'm seeing this exception in system.log (on all nodes): ERROR [GossipStage:1] 2013-10-08 21:03:41,906 CassandraDaemon.java (line 185) Exception in thread Thread[GossipStage:1,5,main] java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1594) at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1550) at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1174) at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1887) at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:844) at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:922) at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) I assume that's why some nodes display ring status incorrectly. There's a ticket for it, a month old with no reaction: https://issues.apache.org/jira/browse/CASSANDRA-6082 Does someone know how to fix it? Thanks, Viliam
cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I have been experimenting with using Hadoop for a map/reduce operation on Cassandra, outputting to CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count, except that I set my output column family to have a bigint primary key: CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid)); and simply tried setting this key as one of the output keys: keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue())); but it always failed with a strange error: java.io.IOException: InvalidRequestException(why:Key may not be empty)
Question about read consistency level
Apologies if this is an obvious question; I have looked but not seen much (particularly about what exactly "latest version" means when there is no data on a node for a key, though I'd assume it has to be treated as unknown, since you couldn't tell whether the data had never been created or the tombstone had been cleaned up). We are moving some stuff from prototype towards production. We have a new Cassandra 2.0 instance with 6 nodes. I'm using Astyanax 1.56.43 from Java via Thrift, though we see the same thing with CQL 3 from the node client. 1) The problem here is likely caused by one of our nodes maybe being in a dead state but not recognized as such; that is something we need to figure out, and any suggestions on determining root cause would help. I only have access to OpsCenter, which tells me little (except that this one node was growing in size while the others weren't, and NO keyspaces have replication factor 1, so that seems odd even if keys were skewed, which they aren't). Anyway, that isn't the main question; I can follow up with our ops guys when they are online. 2) I saw the problem when I fixed our read code to use QUORUM consistency level instead of ONE (the Astyanax default); it only affects some keys. The problem is a timeout exception; I assume it is not getting a response from this one node. (I have read that 2.0.2 may have something that would help this resolve itself quicker.) Note I had also seen some slow schema resolution today. Anyway, I can work around the problem with a ONE consistency level, it seems, but I did have a more general question. Our writes are all idempotent, in so far as a keyspace/cf/(key/column) may be written more than once but only with the same value. We use QUORUM writes (which will be LOCAL_QUORUM as soon as we configure the server to be DC-aware), and that is fine. The point being, I wonder if there is something between ONE and QUORUM read consistency level that I could specify in my situation to return as soon as any node returns a non-tombstone value? Thanks, Graham
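On the ONE versus QUORUM trade-off raised above: the standard arithmetic is that a read and a write are guaranteed to overlap on at least one replica when R + W > RF, which QUORUM/QUORUM satisfies and ONE reads give up. A quick sketch of that check:

```java
// Quick check of the consistency arithmetic behind the question: a read
// and a write are guaranteed to overlap on at least one replica when
// R + W > RF. With RF=3, QUORUM is 2, so QUORUM reads paired with
// QUORUM writes overlap, while a ONE read may hit a replica the write
// never reached.
public class QuorumMath {
    // Cassandra's quorum for a replication factor: floor(rf/2) + 1.
    static int quorum(int rf) { return rf / 2 + 1; }

    // Strong consistency condition: read and write replica sets intersect.
    static boolean overlaps(int r, int w, int rf) { return r + w > rf; }

    public static void main(String[] args) {
        int rf = 3;
        assert quorum(rf) == 2;
        assert overlaps(quorum(rf), quorum(rf), rf);  // QUORUM read + QUORUM write
        assert !overlaps(1, quorum(rf), rf);          // ONE read + QUORUM write
    }
}
```

For RF=3 the fixed level TWO happens to coincide with QUORUM (both read 2 replicas), so there is no consistency level strictly between ONE and QUORUM here; "return the first non-tombstone answer" would have to be implemented as client-side speculation rather than a consistency level.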