Side effects of hinted handoff lead to a consistency problem

2013-10-08 Thread Jason Tang
I have a 3-node cluster, with replication_factor also 3. Consistency level is
QUORUM for both writes and reads.
Traffic has three major steps:
Create:
 Rowkey:
 Column: status=new, requests=x
Update:
 Rowkey:
 Column: status=executing, requests=x
Delete:
 Rowkey:

When one node is down, the cluster can still work according to the consistency
configuration, and the final state is that all requests are finished and deleted.

So when running the cassandra client to list the result (also at consistency
QUORUM), it shows empty (only the rowkey left), which is correct.

But when we start the dead node again, the hinted handoff mechanism writes the
data back to this node, so there are lots of creates, updates and deletes being
replayed.

I don't know whether it is due to GC or compaction, but the delete records on
the other two nodes seem not to take effect, and when using the cassandra
client to list the data (also at consistency QUORUM), the deleted rows show up
again with column values.

And when checking the data several times with the client, you can see the data
changing; as hinted handoff replays the operations, the deleted data shows up
and then disappears again.

So the hinted handoff mechanism speeds up repair, but the temporary data is
visible externally (even where the data has been deleted).

Is there a way to keep this procedure invisible externally until the hinted
handoff has finished?

What I want is final-state synchronization; the temporary state is out of date
and incorrect, and should never be seen externally.

Is it due to row delete instead of column delete? Or compaction?
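[Editor's note: if the temporary visibility during replay is unacceptable, hint delivery can be disabled or bounded in cassandra.yaml. A hedged sketch of the relevant 1.2/2.0-era settings (verify the names and defaults against your version; the trade-off is that a recovered node must then rely on read repair or anti-entropy repair instead):]

```yaml
# cassandra.yaml (1.2/2.0-era settings; an assumption-laden sketch, not advice)
hinted_handoff_enabled: false      # stop coordinators from storing/replaying hints
max_hint_window_in_ms: 10800000    # if enabled: stop hinting for nodes down > 3 h (default)
```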


Error during cleanup

2013-10-08 Thread Sameer Farooqui
Hi,

When running cleanup on a node with C* 2.0.1, I got the following error:

cassandra01 - Error during cleanup: javax.management.MBeanException:
java.util.concurrent.ExecutionException: java.lang.ClassCastException:
org.apache.cassandra.io.sstable.SSTableReader$EmptyCompactionScanner cannot
be cast to org.apache.cassandra.io.sstable.SSTableScanner


However, cleanup appears to have worked, since the size of the data on that
particular node has decreased.

Is this an error to worry about?


How to determine which node(s) an insert would go to in C* 2.0 with vnodes?

2013-10-08 Thread Sameer Farooqui
Hi,

When using C* 2.0 in a large 100-node cluster with the Murmur3 partitioner,
vnodes and 256 tokens assigned to each node, is it possible to find out where
a certain key is destined to go?

If the keyspace defined has replication factor = 3, then a specific key
like 'row-1' would be destined to go to 3 nodes, right? Is there a way I
can pre-determine which 3 of the 100 nodes the insert of 'row-1' would
go to?

Or alternatively, after I've already written the 'row-1', can I find out
which 3 nodes it went to?
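[Editor's note: the replica selection itself is simple to sketch. Assuming SimpleStrategy-style placement (NetworkTopologyStrategy adds rack/DC constraints, and the real token is the Murmur3 hash of the partition key, which this hypothetical sketch does not compute), the coordinator walks the sorted token ring clockwise from the first vnode token at or after the key's token and takes the first RF distinct nodes:]

```python
from bisect import bisect_left

def replicas_for_token(ring, token, rf):
    """ring: sorted list of (vnode_token, node_name) pairs.
    Walk clockwise from the first vnode token >= token and collect
    the first rf distinct nodes (SimpleStrategy-style sketch)."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token) % len(ring)  # wrap past the last token
    replicas = []
    for _ in range(len(ring)):                  # bounded: at most one full lap
        node = ring[i][1]
        if node not in replicas:
            replicas.append(node)
            if len(replicas) == rf:
                break
        i = (i + 1) % len(ring)
    return replicas

# Toy ring: 3 nodes, 2 vnodes each (hypothetical tokens, not real Murmur3 values)
ring = [(0, "A"), (10, "B"), (20, "C"), (30, "A"), (40, "B"), (50, "C")]
print(replicas_for_token(ring, 15, 3))  # ['C', 'A', 'B']
```

[In practice the answers in this thread (nodetool getendpoints, the CQL token() function) give the real placement without reimplementing any of this.]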


RE: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?

2013-10-08 Thread Christopher Wirt
In CQL there is a token() function you can use to find the result of your
partitioning scheme's hash function for any value.

e.g. select token(value) from column_family1 where partition_column = value;

You then need to find out which nodes are responsible for that value, using
nodetool ring or by looking at the system.peers table for tokens.

Not that straightforward, especially with 100 nodes and vnodes. Maybe someone
has written a script or something to do this already?

 

Or I suppose you could turn on tracing and repeat the query until you've
seen it hit three different end nodes?

i.e.

tracing on;

select * from column_family1 where partition_column = value;

 

 

 

From: Sameer Farooqui [mailto:sam...@blueplastic.com] 
Sent: 08 October 2013 10:20
To: user@cassandra.apache.org
Subject: How to determine which node(s) an insert would go to in C* 2.0 with
vnodes?

 

Hi,

When using C* 2.0 in a large 100 node cluster with Murmer3Hash, vnodes and
256 tokens assigned to each node, is it possible to find out where a certain
key is destined to go?

If the keyspace defined has replication factor = 3, then a specific key like
'row-1' would be destined to go to 3 nodes, right? Is there a way I can
pre-determine which of the 3 nodes out of 100 that that insert of 'row-1'
would go to?

Or alternatively, after I've already written the 'row-1', can I find out
which 3 nodes it went to?



Re: How to determine which node(s) an insert would go to in C* 2.0 with vnodes?

2013-10-08 Thread Kais Ahmed
hi,

you can try:

nodetool getendpoints <keyspace> <cf> <key> - prints the endpoints that
own the key


2013/10/8 Christopher Wirt chris.w...@struq.com




TimedOutException in CqlRecordWriter

2013-10-08 Thread Renat Gilfanov
 Hello,

I run Hadoop jobs which read data from Cassandra 1.2.8 and write results back
to other tables. One of my reduce tasks was killed twice by the job tracker
because it wasn't responding for more than 10 minutes; the third attempt was
successful.

The error message for killed reduce tasks is:

java.io.IOException: TimedOutException(acknowledged_by:0)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:245)
Caused by: TimedOutException(acknowledged_by:0)
at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:41884)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1689)
at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1674)
at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:229)

Task attempt_201310081258_0006_r_00_0 failed to report status for 600
seconds. Killing!

I'm wondering how it could happen that the task didn't report status for 600
seconds, and how that is related to the TimedOutException at the top of the
stack trace. write_request_timeout_in_ms is at its default of 10,000 ms, so
the write should fail much earlier.
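[Editor's note: the 600 seconds comes from Hadoop's task timeout, not from Cassandra; the job tracker kills any task attempt that goes that long without a progress report. If the CqlRecordWriter's internal retries can legitimately stall that long, one workaround is to raise the timeout in mapred-site.xml or the job config (sketch assuming classic MR1 property names; verify for your Hadoop version):]

```xml
<!-- mapred-site.xml: default is 600000 ms, the 600 s in the kill message -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- e.g. 20 minutes; 0 disables the timeout entirely -->
</property>
```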


Thanks.


Proper Use of PreparedStatements in DataStax driver

2013-10-08 Thread thn
Hello,

I'm trying to determine the proper way to use PreparedStatements with the
DataStax CQL Java driver.

The API is:

Session session = ...
PreparedStatement ps = session.prepare("INSERT INTO table-name (column,
...) VALUES (?, ...)");
BoundStatement bs = new BoundStatement(ps);

// bind variables

session.execute(bs);


1.  My question is: should I instantiate the PreparedStatement only once and
reuse *the same instance* every time I want to execute an INSERT?
Is that necessary to get the performance benefits of using
PreparedStatements?

2.  Or does session.prepare("INSERT INTO table-name (column, ...) VALUES
(?, ...)") return the same logical PreparedStatement each time (it need
not necessarily be the identical (as in ==) instance), and you still get the
same performance benefits? Note: this is how jdbc.PreparedStatement is used.

If the answer is #1, then that means PreparedStatements are effectively
singletons: they need to be thread-safe, and the app is responsible for
dealing with lifecycle issues if the original Session dies or whatever.

I've looked at the DataStax API docs
(http://www.datastax.com/drivers/java/1.0/apidocs/), chatted on IRC, and
found no answer. This seems to me like a pretty fundamental question.
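[Editor's note: the usual pattern is #1, prepare once per distinct query string per Session and reuse the statement, binding per execution. A language-agnostic sketch of that cache (stub classes only; this is not the DataStax driver API, just the shape of the pattern):]

```python
class StubSession:
    """Stand-in for a driver Session: prepare() costs a round trip to the
    cluster, so it should run once per distinct query string."""

    def __init__(self):
        self.prepare_calls = 0          # counts simulated round trips
        self._cache = {}                # query string -> prepared statement

    def prepare_cached(self, query):
        ps = self._cache.get(query)
        if ps is None:
            self.prepare_calls += 1     # the expensive part, done once
            ps = ("PREPARED", query)    # stand-in for a PreparedStatement
            self._cache[query] = ps
        return ps

session = StubSession()
insert = "INSERT INTO t (k, v) VALUES (?, ?)"
ps1 = session.prepare_cached(insert)
ps2 = session.prepare_cached(insert)    # cache hit: same instance, no round trip
```

[Whether calling prepare() twice on the real driver also dedupes server-side is exactly the poster's question; the safe client-side answer is to cache the statement yourself.]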









Re: Error during cleanup

2013-10-08 Thread Tyler Hobbs
Do you have a complete stacktrace available?







-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted

2013-10-08 Thread Tyler Hobbs
SizeTieredCompactionStrategy only compacts sstables that are a similar size
(by default, they basically need to be within 50% of each other).  Perhaps
your first SSTable was very large or small compared to the others?
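[Editor's note: Tyler's "within 50%" rule can be sketched. SizeTieredCompactionStrategy groups sstables into buckets whose sizes lie within bucket_low..bucket_high times the bucket's average (0.5 and 1.5 by default), and only buckets with at least min_threshold (default 4) members get compacted. A simplified greedy version of the bucketing, not the actual implementation:]

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Greedy sketch of STCS bucketing: an sstable joins the first bucket
    whose running average size it falls within [low*avg, high*avg] of."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])      # no similar bucket: start a new one
    return buckets

# A 400 KB sstable never shares a bucket with ~100 KB ones:
print(stcs_buckets([100, 110, 120, 400]))  # [[100, 110, 120], [400]]
```

[So an odd-sized sstable simply waits in its own bucket until enough similar-sized peers accumulate, which matches the "first SSTable never compacts" observation.]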







Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted

2013-10-08 Thread Sameer Farooqui
Thanks for the reply, Tyler. I thought that too, that maybe the SSTables
were mismatched in size, but upon closer inspection that doesn't appear
to be the case:

-rw-r--r-- 1 cassandra cassandra  227 Oct  7 23:26 demodb-users-jb-1-Data.db
-rw-r--r-- 1 cassandra cassandra  242 Oct  8 00:38 demodb-users-jb-6-Data.db


The two files look to be nearly the same size. There just appears to be
something special about that first SSTable and it not getting compacted.





Re: Error during cleanup

2013-10-08 Thread Sameer Farooqui
No, but I may be able to get one for you if the issue is reproducible when
I trigger another cleanup.

I originally issued the cleanup on the node via OpsCenter Community Edition
3.2.2.

If I issue another cleanup, can you give me the rough steps for how to get
the stacktrace?





Re: Error during cleanup

2013-10-08 Thread Tyler Hobbs
On Tue, Oct 8, 2013 at 2:02 PM, Sameer Farooqui sam...@blueplastic.com wrote:

 If I issue another cleanup, can you give me the rough steps for how to get
 the stacktrace?


I'm hoping it will show up in Cassandra's system.log, but since it's
triggered through JMX, it's possible that it will not.




Re: Any suggestions about running Cassandra on Windows servers for production use?

2013-10-08 Thread Robert Coli
On Mon, Oct 7, 2013 at 5:35 AM, Vassilis Bekiaris 
bekiar...@iconplatforms.com wrote:

 we are planning a Cassandra 1.2 installation at a client site; the client
 will run operations themselves and based on their IT team's experience they
 are more inclined towards running Cassandra nodes on Windows servers,
 however given proper arguments they would also consider using linux
 servers. On the other hand, our team has experience running Cassandra on
 Linux, so we have no idea what we might face on Windows.


If you benchmark the two against each other, I would be shocked if the
Windows version is not significantly slower.

The differences are often at quite a low level, for example the mmap handling,
or the fact that Cassandra uses fadvise on Linux. Various optimizations only
apply to the Linux version; even running Cassandra on Solaris would not make
use of some of them. There are also supplementary tools (Priam, tablesnap)
which may or may not work under Windows. It would be great if someone
enumerated the in-code differences; I have noted it on my extensive TODO list
with the optimism of a new week... :)

On the other hand, if you don't mind taking the performance hit and being
among a relatively small group of operators in Production on Windows, it
should work fine?

=Rob


Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted

2013-10-08 Thread Tyler Hobbs
Well, sstable 6 was created by the other sstables being compacted, correct?
If so, they were probably quite a bit smaller (~25% of the size). Once you
have two more sstables of roughly that size, they should be compacted
automatically.









Using cassandra-cli with Client-server encryption

2013-10-08 Thread Vivek Mishra
Hi,
I am trying to use cassandra-cli with client-server encryption enabled, but
I am getting a handshake failure (stack trace below):

org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at org.apache.cassandra.thrift.Cassandra$Client.send_describe_cluster_name(Cassandra.java:1095)
at org.apache.cassandra.thrift.Cassandra$Client.describe_cluster_name(Cassandra.java:1088)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:147)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:246)
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:1911)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1027)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1262)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:680)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:85)
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)



I am trying to get it working on local machine.

-Vivek


Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted

2013-10-08 Thread Sameer Farooqui
Hmm, good point. I'll test this out again and see whether the compaction
behavior is as expected given the relative sizes of the SSTables.







ArrayIndexOutOfBoundsException in StorageService.extractExpireTime

2013-10-08 Thread Viliam Holub

Hi,

after upgrading our cluster from 1.1 to 1.2.10 I'm seeing this exception in
system.log (on all nodes):

ERROR [GossipStage:1] 2013-10-08 21:03:41,906 CassandraDaemon.java (line 185) Exception in thread Thread[GossipStage:1,5,main]
java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1594)
at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1550)
at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1174)
at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1887)
at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:844)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:922)
at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

I assume that's why some nodes display the ring status incorrectly.
There's a ticket for it, a month old and with no reaction:
https://issues.apache.org/jira/browse/CASSANDRA-6082

Does anyone know how to fix it?

Thanks,
Viliam





cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?

2013-10-08 Thread John Lumby
I have been experimenting with using Hadoop for a map/reduce operation on
Cassandra, outputting to CqlOutputFormat.class. I based my first program
fairly closely on the famous WordCount example in
examples/hadoop_cql3_word_count, except that I set my output column family
to have a bigint primary key:

CREATE TABLE archive_recordids ( recordid bigint , count_num bigint, PRIMARY
KEY (recordid))

and simply tried setting this key as one of the keys in the output:
    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

java.io.IOException: InvalidRequestException(why:Key may not be empty)

Question about read consistency level

2013-10-08 Thread graham sanderson
Apologies if this is an obvious question; I have looked but not seen too much
(particularly about what exactly "latest version" means when there is no data
on a node for a key, though I'd assume it has to be treated as unknown, since
you couldn't tell whether the data had never been created or the tombstone had
been cleaned up).

We are moving some stuff from prototype towards production. We have a new 
Cassandra 2.0 instance; 6 nodes.

I'm using Astyanax 1.56.43 from Java via Thrift, though we see the same thing
with CQL 3 from the node client.

1) So the problem here is likely caused by one of our nodes maybe being in a
dead state but not recognized as such; that is something we need to figure
out. Any suggestions on determining the root cause here would be a help; I
only have access to OpsCenter, which tells me little (except that this one
node was growing in size whilst the others weren't, and NO keyspaces have
replication factor 1, so that seems odd even if the keys were skewed, which
they aren't). Anyway, that isn't the main question; I can follow up with our
ops guys when they are online.

2) I saw the problem when I fixed our read code to use QUORUM consistency
level instead of ONE (the Astyanax default); it only affects some keys. The
problem is a timeout exception; I assume it is not getting a response from
this one node. (I have read that 2.0.2 may have something that would help
this resolve itself quicker.) Note I had also seen some slow schema
resolution today.

Anyways, I can work around the problem with a ONE consistency level it seems, 
but I did have a more general question:

- Our writes are all idempotent, in so far as a keyspace/cf/(key/column) may
be written more than once, but only with the same value. We use QUORUM writes
(which will be LOCAL_QUORUM as soon as we configure the server to be
DC-aware), and that is fine.

The point being, I wonder if there is something between ONE and QUORUM read
consistency level that I could specify in my situation, to return as soon as
any node returns a non-tombstone value?
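[Editor's note: background on why nothing sits between ONE and QUORUM here. A read is guaranteed to overlap the latest acknowledged write only when read replicas + write replicas exceed RF. A quick sketch of that arithmetic (standard Dynamo-style math, not Cassandra-specific code):]

```python
def quorum(rf):
    """Replica count a QUORUM operation must reach."""
    return rf // 2 + 1

def overlaps(read_replicas, write_replicas, rf):
    """True when every read set must intersect every write set (R + W > N),
    i.e. the read is guaranteed to see the latest acked write."""
    return read_replicas + write_replicas > rf

rf = 3
print(quorum(rf))                            # 2
print(overlaps(quorum(rf), quorum(rf), rf))  # True: QUORUM write + QUORUM read
print(overlaps(1, quorum(rf), rf))           # False: ONE read after QUORUM write
```

[So with QUORUM writes, any read level below QUORUM, including a hypothetical "first non-tombstone wins", gives up the overlap guarantee, which is presumably why no such level exists.]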

Thanks,

Graham

P.S. If not, has this been discussed and dismissed? It would seem like a
common case that people write data only once, and because of upstream
replay/transactional behavior may want a Cassandra write to be an idempotent
side effect.
