Re: Consistency Level One Question

2014-02-21 Thread Duncan Sands

Hi Graham,

On 21/02/14 07:54, graham sanderson wrote:

Note also that when reading at ONE there will be no read repair, since the 
coordinator does not know that another replica has stale data (remember that at 
ONE, basically only one node is asked for the answer).


I don't think this is right.  My understanding is that while only one node will 
be sent a direct read request, all other replicas will (not on every query - it 
depends on the value of read_repair_chance) get a background read repair 
request.  You can test this experimentally using cqlsh and turning tracing on: 
issue a read request many times.  Most of the time you will see that the 
coordinator sends a message to one node, but from time to time (depending on 
read_repair_chance) you will see it sending messages to many nodes.
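To make that probabilistic behaviour concrete, here is a toy simulation in plain Python. This is not Cassandra code: the function name, replica list, and the 10% chance are all made up for illustration; only the shape of the logic (one direct data request, plus digest requests to the rest with probability `read_repair_chance`) follows the behaviour described above.

```python
import random

def coordinator_read(replicas, read_repair_chance, rng=random.random):
    # The direct data request always goes to the closest replica.
    targets = [replicas[0]]
    # With probability read_repair_chance, also send background digest
    # requests to the remaining replicas - this is what lets the
    # coordinator notice and repair stale replicas even at CL=ONE.
    if rng() < read_repair_chance:
        targets += replicas[1:]
    return targets

random.seed(1)
trials = [coordinator_read(["A", "B", "C"], 0.1) for _ in range(10000)]
wide = sum(len(t) == 3 for t in trials)
print(wide / len(trials))  # close to 0.1: roughly 1 in 10 reads touches all replicas
```

This matches what tracing shows: most reads hit one node, an occasional read fans out to all of them.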


Best wishes, Duncan.



In practice for our use cases, we always write at LOCAL_QUORUM (failing the whole 
update if that doesn’t work - stale data is OK if 1 node is down), and we read 
at LOCAL_QUORUM, but (because stale data is better than no data), we will fall 
back per read request to LOCAL_ONE if we detect that there were insufficient nodes 
- this lets us cope with 2 down nodes in a 3 replica environment (or more if the 
nodes are not consecutive in the ring).
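A driver-side sketch of that per-request fallback, in plain Python with a stand-in exception class (the real DataStax drivers expose this kind of behaviour through retry policies; every name here is illustrative):

```python
class Unavailable(Exception):
    """Stand-in for a driver's 'not enough live replicas' error."""

def read_with_fallback(execute, query):
    # Prefer consistent reads; if the coordinator reports too few live
    # replicas, retry the same read at LOCAL_ONE, since for this use
    # case stale data beats no data.
    try:
        return execute(query, "LOCAL_QUORUM")
    except Unavailable:
        return execute(query, "LOCAL_ONE")

# Toy executor simulating 2 of 3 local replicas being down, so only
# the LOCAL_ONE attempt can be satisfied.
def degraded_cluster(query, consistency):
    if consistency == "LOCAL_QUORUM":
        raise Unavailable()
    return ("some-row", consistency)

print(read_with_fallback(degraded_cluster, "SELECT ..."))
```

The key design point is that the downgrade is per read request, so a healthy cluster still gets quorum-consistent reads.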

On Feb 20, 2014, at 11:21 PM, Drew Kutcharian d...@venarc.com wrote:


Hi Guys,

I wanted to get some clarification on what happens when you write and read at 
consistency level 1. Say I have a keyspace with replication factor of 3 and a 
table which will contain write-once/read-only wide rows. If I write at 
consistency level 1 and the write happens on node A and I read back at 
consistency level 1 from another node other than A, say B, will C* return “not 
found” or will it trigger a read-repair before responding? In addition, what’s 
the best consistency level for reading/writing write-once/read-only wide rows?

Thanks,

Drew







Re: Intermittent long application pauses on nodes

2014-02-21 Thread Joel Samuelsson
What happens if a ParNew is triggered while CMS is running? Will it wait
for the CMS to finish? If so, that would be the explanation of our long
ParNew above.

Regards,
Joel


2014-02-20 16:29 GMT+01:00 Joel Samuelsson samuelsson.j...@gmail.com:

 Hi Frank,

 We got a (quite) long GC pause today on 2.0.5:
  INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line
 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is
 4253024256
  INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line
 116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used;
 max is 4253024256

 Unfortunately it's a production cluster so I have no additional GC-logging
 enabled. This may be an indication that upgrading is not the (complete)
 solution.

 Regards,
 Joel


 2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith 
 belliottsm...@datastax.com:

 Hi Ondrej,

 It's possible you were hit by the problems in this thread before, but it
 looks potentially like you may have other issues. Of course it may be that
 on G1 you have one issue and CMS another, but 27s is extreme even for G1,
 so it seems unlikely. If you're hitting these pause times in CMS and you
 get some more output from the safepoint tracing, please do contribute, as I
 would love to get to the bottom of that. However, is it possible you're
 experiencing paging activity? Have you made certain the VM memory is locked
 (and preferably that paging is entirely disabled, as the bloom filters and
 other memory won't be locked, although that shouldn't cause pauses during
 GC)?

 Note that mmapped file accesses and other native work shouldn't in any way
 inhibit GC activity or other safepoint pause times, unless there's a bug in
 the VM. These threads will simply enter a safepoint as they return to the
 VM execution context, and are considered safe for the duration they are
 outside.
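For anyone who wants that safepoint tracing output, flags along these lines can be appended in cassandra-env.sh. This is a sketch for HotSpot JVMs of the 7/8 era; verify the flags against your JVM version before relying on them:

```shell
# Log how long application threads were actually stopped at each safepoint
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# Per-safepoint breakdown: spin/block/sync/cleanup/vmop times, plus the VM
# operation that triggered the pause (GC, deoptimization, etc.)
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
```

The safepoint statistics are what distinguish a slow collection from threads that were merely slow to reach the safepoint.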




 On 17 February 2014 12:30, Ondřej Černoš cern...@gmail.com wrote:

 Hi,

 we tried to switch to G1 because we observed this behaviour on CMS too
 (a 27 second pause in G1 is a pretty strong argument not to use it). Pauses
 with CMS were not easily traceable - the JVM stopped even without a
 stop-the-world pause scheduled (defragmentation, remarking). We thought the
 go-to-safepoint waiting time might have been involved (we saw waiting for
 safepoint resolution) - especially because access to mmapped files is not
 preemptive, afaik - but that doesn't explain tens of seconds of waiting;
 even slow IO should read our sstables into memory in much less time. We
 switched to G1 out of desperation - and to try different code paths - not
 that we thought it was a great idea. So I think we were hit by the problem
 discussed in this thread; the G1 report just wasn't very clear, sorry.

 regards,
 ondrej



 On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Ondrej,

 It seems like your issue is much less difficult to diagnose: your
 collection times are long. At least, the pause you printed the time for is
 all attributable to the G1 pause.

 Note that G1 has not generally performed well with Cassandra in our
 testing. There are a number of changes going in soon that may change that,
 but for the time being it is advisable to stick with CMS. With tuning you
 can no doubt bring your pauses down considerably.
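For reference, the CMS configuration that cassandra-env.sh of this era ships with looks roughly like the following. Treat this as a sketch: the flags are the standard Cassandra defaults, and the occupancy fraction and tenuring settings are the usual knobs to adjust when tuning pauses down.

```shell
# Use CMS for the old generation, as Cassandra's defaults do
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
# Promote surviving objects quickly; Cassandra's long-lived data belongs
# in the old generation anyway
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
# Start CMS cycles early enough to avoid concurrent-mode failures
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```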


 On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote:

 Hi all,

 we are seeing the same kind of long pauses in Cassandra. We tried to
 switch from CMS to G1 without positive results. The stress test is read heavy: 2
 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in
 latency at the 99.99th percentile and higher, caused by threads being stopped
 in the JVM.

 The GC in G1 looks like this:

 {Heap before GC invocations=4073 (full 1):
 garbage-first heap   total 8388608K, used 3602914K
 [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen  total 28672K, used 27428K [0x0007f5c0,
 0x0007f780, 0x0008)
   the space 28672K,  95% used [0x0007f5c0, 0x0007f76c9108,
 0x0007f76c9200, 0x0007f780)
 No shared spaces configured.
 2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation
 Pause) (young)
 Desired survivor size 37748736 bytes, new threshold 15 (max 15)
 - age   1:   17213632 bytes,   17213632 total
 - age   2:   19391208 bytes,   36604840 total
 , 0.1664300 secs]
   [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max:
 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7,
 Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum:
 42.6]
 [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22,
 Sum: 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum:
 46.5]
  [Object Copy (ms): Min: 112.3, 

TSocket read 0 bytes cqlsh error

2014-02-21 Thread Kasper Middelboe Petersen
Hi,

I'm getting a TSocket read 0 bytes error in cqlsh when doing a SELECT *
FROM tbl.

Anyone else experienced this?

It's a single node cluster running locally. I've tried doing a nodetool
cleanup but that didn't solve the issue.

Version information:
 INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 487)
Cassandra version: 2.0.5
 INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 488) Thrift
API version: 19.39.0
 INFO [main] 2014-02-21 10:20:25,227 StorageService.java (line 489) CQL
supported versions: 2.0.0,3.1.4 (default: 3.1.4)


I get this error in the cassandra logs:

ERROR [Thrift:1] 2014-02-21 10:21:03,963 CustomTThreadPoolServer.java (line
212) Error occurred during processing of message.
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
at
org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:874)
at
org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:854)
at
org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:222)
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:202)
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:172)
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:58)
at
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:212)
at
org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1958)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn rkla...@gmail.com wrote:

 Hi Sylvain,

 I applied the patch to the cassandra-2.0 branch (this required some manual
 work since I could not figure out which commit it was supposed to apply
 for, and it did not apply to the head of cassandra-2.0).


Yeah, some commit yesterday made the patch not apply cleanly anymore. In
any case, it's now committed to the cassandra-2.0 branch and will be part
of 2.0.6.


 The benchmark now runs in pretty much identical time to the thrift based
 benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!


Glad that it helped.



 I still have some questions regarding the mapping. Please bear with me if
 these are stupid questions. I am quite new to Cassandra.

 The basic cassandra data model for a keyspace is something like this,
 right?

 SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
           ^ row key: determines which server(s) the rest is stored on
                               ^ column key
                                             ^ timestamp (latest one wins)
                                                          ^ value (can be size 0)


It's a reasonable way to think of how things are stored internally, yes.
Though, as DuyHai mentioned, the first map really sorts by token, so in
practice it's mostly the ordering of the second map that you use concretely.
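A minimal sketch of that storage model in plain Python. The `token` function is a stand-in for the real partitioner (Cassandra uses Murmur3 or RandomPartitioner); everything here is illustrative, including the timestamp conflict rule, which is the "latest one wins" from the model above.

```python
import hashlib

def token(row_key: bytes) -> int:
    # Stand-in partitioner: rows are ordered by token of the row key,
    # not by the raw key bytes themselves.
    return int.from_bytes(hashlib.md5(row_key).digest()[:8], "big")

store = {}  # token -> {column_key: (timestamp, value)}

def put(row_key, column_key, timestamp, value):
    columns = store.setdefault(token(row_key), {})
    current = columns.get(column_key)
    if current is None or timestamp >= current[0]:  # latest timestamp wins
        columns[column_key] = (timestamp, value)

put(b"row1", b"colA", 1, b"v1")
put(b"row1", b"colA", 2, b"v2")  # newer write shadows the older cell
put(b"row1", b"colA", 1, b"v1")  # stale write is ignored
print(store[token(b"row1")][b"colA"])  # (2, b'v2')
```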



 So if I have a table like the one in my benchmark (using blobs)

 CREATE TABLE IF NOT EXISTS test.wide (
   time blob,
   name blob,
   value blob,
   PRIMARY KEY (time,name))
   WITH COMPACT STORAGE

 From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
 that

 - time maps to the row key and name maps to the column key without any
 overhead
 - value directly maps to value in the model above without any prefix

 is that correct, or is there some overhead involved in CQL over the raw
 model as described above? If so, where exactly?


That's correct.
For completeness' sake, if you were to remove COMPACT STORAGE, there would
be some overhead in how the CQL row maps to the underlying column key, but
that overhead would buy you much more flexibility in how you could evolve
this table schema: you could add more CQL columns later if need be, have
collections, or have static columns (following CASSANDRA-6561, which comes
in 2.0.6) - none of which you can have with COMPACT STORAGE. It's perfectly
fine to use COMPACT STORAGE if you know you don't and won't need the
additional flexibility, but I generally advise people to first check that
using COMPACT STORAGE makes a concrete and meaningful difference for their
use case (be careful with premature optimization, really). The difference in
performance and storage space used is not always all that noticeable in
practice (note that I didn't say it's never noticeable!) and is narrowing as
Cassandra evolves (it's not impossible at all that we will get to "never
noticeable" someday, while COMPACT STORAGE tables will never get the
flexibility of normal tables, because there are backwards compatibility
issues). It's also my experience that, more often than not (again, not
always), flexibility turns out to be more important in the long run than
squeezing out every bit of performance you can (if it comes at the price of
that flexibility, that is). Do what you want with that advice :)
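For reference, the two variants being compared differ only in the final clause; the table names below are illustrative:

```sql
-- Thrift-compatible layout: no per-cell column-name overhead, but the
-- schema is frozen (no added columns, collections, or static columns).
CREATE TABLE test.wide_compact (
    time  blob,
    name  blob,
    value blob,
    PRIMARY KEY (time, name)
) WITH COMPACT STORAGE;

-- Regular CQL table: each cell's composite column key also carries the CQL
-- column name (the overhead mentioned above), but the schema can evolve
-- later with ALTER TABLE ... ADD.
CREATE TABLE test.wide_flexible (
    time  blob,
    name  blob,
    value blob,
    PRIMARY KEY (time, name)
);
```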

--
Sylvain



 kind regards and many thanks for your help,

 Rüdiger


 On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne sylv...@datastax.comwrote:




 On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn rkla...@gmail.comwrote:


 I have cloned the cassandra repo, applied the patch, and built it. But
 when I want to run the benchmark I get an exception. See below. I tried with
 a non-managed dependency to
 cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
 compiled from source because I read that that might help. But that did not
 make a difference.

 So currently I don't know how to give the patch a try. Any ideas?

 cheers,

 Rüdiger

 Exception in thread main java.lang.IllegalArgumentException:
 replicate_on_write is not a column defined in this metadata
 at
 com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
 at
 com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
 at com.datastax.driver.core.Row.getBool(Row.java:117)
 at
 com.datastax.driver.core.TableMetadata$Options.init(TableMetadata.java:474)
 at
 com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
 at
 com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
 at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
 at
 com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
 at
 com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
 at
 

Re: TSocket read 0 bytes cqlsh error

2014-02-21 Thread Kasper Middelboe Petersen
Looks like the problem is caused by:
https://issues.apache.org/jira/browse/CASSANDRA-5202



On Fri, Feb 21, 2014 at 10:26 AM, Kasper Middelboe Petersen 
kas...@sybogames.com wrote:

 Hi,

 I'm getting a TSocket read 0 bytes error in cqlsh when doing a SELECT *
 FROM tbl.

 Anyone else experienced this?

 It's a single node cluster running locally. I've tried doing a nodetool
 cleanup but that didn't solve the issue.

 Version information:
  INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 487)
 Cassandra version: 2.0.5
  INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 488) Thrift
 API version: 19.39.0
  INFO [main] 2014-02-21 10:20:25,227 StorageService.java (line 489) CQL
 supported versions: 2.0.0,3.1.4 (default: 3.1.4)


 I get this error in the cassandra logs:

 ERROR [Thrift:1] 2014-02-21 10:21:03,963 CustomTThreadPoolServer.java
 (line 212) Error occurred during processing of message.
 java.lang.IllegalArgumentException
 at java.nio.Buffer.limit(Buffer.java:267)
 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
  at
 org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
  at
 org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:874)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:854)
  at
 org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:222)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:202)
  at
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:172)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:58)
  at
 org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
 at
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
  at
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:212)
 at
 org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1958)
  at
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
 at
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)




Re: Consistency Level One Question

2014-02-21 Thread graham sanderson
My bad; should have checked the code:

/**
 * This function executes local and remote reads, and blocks for the 
results:
 *
 * 1. Get the replica locations, sorted by response time according to the 
snitch
 * 2. Send a data request to the closest replica, and digest requests to 
either
 *a) all the replicas, if read repair is enabled
 *b) the closest R-1 replicas, where R is the number required to 
satisfy the ConsistencyLevel
 * 3. Wait for a response from R replicas
 * 4. If the digests (if any) match the data return the data
 * 5. else carry out read repair by getting data from all the nodes.
 */

On Feb 21, 2014, at 3:10 AM, Duncan Sands duncan.sa...@gmail.com wrote:

 Hi Graham,
 
 On 21/02/14 07:54, graham sanderson wrote:
 Note also; that reading at ONE there will be no read repair, since the 
 coordinator does not know that another replica has stale data (remember at 
 ONE, basically only one node is asked for the answer).
 
 I don't think this is right.  My understanding is that while only one node 
 will be sent a direct read request, all other replicas will (not on every 
 query - it depends on the value of read_repair_chance) get a background read 
 repair request.  You can test this experimentally using cqlsh and turning 
 tracing on: issue a read request many times.  Most of the time you will see 
 that the coordinator sends a message to one node, but from time to time 
 (depending on read_repair_chance) you will see it sending messages to many 
 nodes.
 
 Best wishes, Duncan.
 
 
 In practice for our use cases, we always write at LOCAL_QUORUM (failing the 
 whole update if that doesn’t work - stale data is OK if 1 node is down), 
 and we read at LOCAL_QUORUM, but (because stale data is better than no 
 data), we will fall back per read request to LOCAL_ONE if we detect that 
 there were insufficient nodes - this lets us cope with 2 down nodes in a 3 
 replica environment (or more if the nodes are not consecutive in the ring).
 
 On Feb 20, 2014, at 11:21 PM, Drew Kutcharian d...@venarc.com wrote:
 
 Hi Guys,
 
 I wanted to get some clarification on what happens when you write and read 
 at consistency level 1. Say I have a keyspace with replication factor of 3 
 and a table which will contain write-once/read-only wide rows. If I write 
 at consistency level 1 and the write happens on node A and I read back at 
 consistency level 1 from another node other than A, say B, will C* return 
 “not found” or will it trigger a read-repair before responding? In 
 addition, what’s the best consistency level for reading/writing 
 write-once/read-only wide rows?
 
 Thanks,
 
 Drew
 
 
 





Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Edward Capriolo
The main issue is that Cassandra has two of everything: two access APIs,
two metadata systems, and two groups of users.

The group of users on the original systems - Thrift, CfMetaData - who
followed the advice of three years ago have been labeled obsolete (did you
ever see that Twilight Zone episode?).

If you suggest a Thrift-only feature, get ready for a fight. People seem
oblivious to the fact that you may have a 38-node cluster with 12 TB of
data under compact storage, and that you can't just snap your fingers and
adopt whatever new data-packing system someone comes up with.

Earlier in the thread I detailed a potential way to store collection-like
things in compact storage. You would just assume that, with all the
collective brain power in the project, somehow, some way, collections
could make their way into compact storage. Or that the new language would
offer similar features regardless of the storage chosen (say, like InnoDB
and MariaDB).

The shelf life of Codd's normal form has been what, 30 or 40 years, and
still going strong? I'm always rather pissed that three years after I
started using Cassandra everything has changed, that I'm not the future,
and that no one is really interested in supporting anything I used the
datastore for.


On Friday, February 21, 2014, Sylvain Lebresne sylv...@datastax.com wrote:
 On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn rkla...@gmail.com
wrote:

 Hi Sylvain,

 I applied the patch to the cassandra-2.0 branch (this required some
manual work since I could not figure out which commit it was supposed to
apply for, and it did not apply to the head of cassandra-2.0).

 Yeah, some commit yesterday made the patch not apply cleanly anymore. In
any case, it's now committed to the cassandra-2.0 branch and will be part
of 2.0.6.

 The benchmark now runs in pretty much identical time to the thrift based
benchmark. ~30s for 1000 inserts of 1 key/value pairs each. Great work!

 Glad that it helped.


 I still have some questions regarding the mapping. Please bear with me
if these are stupid questions. I am quite new to Cassandra.

 The basic cassandra data model for a keyspace is something like this,
right?

 SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
           ^ row key: determines which server(s) the rest is stored on
                               ^ column key
                                             ^ timestamp (latest one wins)
                                                          ^ value (can be size 0)

 It's a reasonable way to think of how things are stored internally, yes.
Though, as DuyHai mentioned, the first map really sorts by token, so in
practice it's mostly the ordering of the second map that you use concretely.


 So if I have a table like the one in my benchmark (using blobs)

 CREATE TABLE IF NOT EXISTS test.wide (
 time blob,
 name blob,
 value blob,
 PRIMARY KEY (time,name))
 WITH COMPACT STORAGE

 From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
that

 - time maps to the row key and name maps to the column key without any
overhead
 - value directly maps to value in the model above without any prefix

 is that correct, or is there some overhead involved in CQL over the raw
model as described above? If so, where exactly?

 That's correct.
 For completeness' sake, if you were to remove COMPACT STORAGE, there would
be some overhead in how the CQL row maps to the underlying column key, but
that overhead would buy you much more flexibility in how you could evolve
this table schema: you could add more CQL columns later if need be, have
collections, or have static columns (following CASSANDRA-6561, which comes
in 2.0.6) - none of which you can have with COMPACT STORAGE. It's perfectly
fine to use COMPACT STORAGE if you know you don't and won't need the
additional flexibility, but I generally advise people to first check that
using COMPACT STORAGE makes a concrete and meaningful difference for their
use case (be careful with premature optimization, really). The difference in
performance and storage space used is not always all that noticeable in
practice (note that I didn't say it's never noticeable!) and is narrowing as
Cassandra evolves (it's not impossible at all that we will get to "never
noticeable" someday, while COMPACT STORAGE tables will never get the
flexibility of normal tables, because there are backwards compatibility
issues). It's also my experience that, more often than not (again, not
always), flexibility turns out to be more important in the long run than
squeezing out every bit of performance you can (if it comes at the price of
that flexibility, that is). Do what you want with that advice :)
 --
 Sylvain


 kind regards and many thanks for your help,

 Rüdiger


 On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne sylv...@datastax.com
wrote:



 On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn rkla...@gmail.com
wrote:

 I have cloned the cassandra repo, applied the patch, and built it. But
 when I want to run the benchmark 

Re: Consistency Level One Question

2014-02-21 Thread Drew Kutcharian
Thanks, this clears things up. 

 On Feb 21, 2014, at 6:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 
 When you write at ONE, as soon as one node acknowledges the write, the ack
 is returned to the client. This means that if you quickly read from some
 other node:
 1) you may get the result, because by the time the read is processed the
 data may be on that node
 2) the node you read from may proxy the request to the node with the data,
 or not
 3) you may get a "column not found", because the read might hit a node
 where the data does not exist yet.
 
 Generally, even at level ONE, replication is fast. I have done an
 experiment on what you are asking: write at ONE, then read from another
 node as soon as the client gets an ack. Most of the time the data is
 replicated by the time the second request is received. However, "most of
 the time" is not a guarantee. If the nodes are geographically separated,
 who is to say whether the first request and the second route around the
 internet different ways, and the second action arrives on a node before
 the first? That is eventual consistency for you.
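The underlying rule can be stated in one line: a read is only guaranteed to overlap the latest acknowledged write when R + W > RF. A quick sanity check (plain Python, purely illustrative):

```python
def overlap_guaranteed(rf: int, w: int, r: int) -> bool:
    # A write is acked by at least w replicas and a read contacts at
    # least r; the two sets are forced to intersect only if r + w > rf.
    return r + w > rf

assert not overlap_guaranteed(3, 1, 1)  # write ONE + read ONE: no guarantee
assert overlap_guaranteed(3, 2, 2)      # QUORUM + QUORUM: always intersect
assert overlap_guaranteed(3, 1, 3)      # write ONE + read ALL also works
print("ok")
```

This is why Drew's write-at-ONE / read-at-ONE scenario can legitimately return "not found": with RF=3, the one replica written and the one replica read need not be the same node.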
 
 On Friday, February 21, 2014, graham sanderson gra...@vast.com wrote:
  My bad; should have checked the code:
 
  /**
   * This function executes local and remote reads, and blocks for the 
  results:
   *
   * 1. Get the replica locations, sorted by response time according to 
  the snitch
   * 2. Send a data request to the closest replica, and digest requests 
  to either
   *a) all the replicas, if read repair is enabled
   *b) the closest R-1 replicas, where R is the number required to 
  satisfy the ConsistencyLevel
   * 3. Wait for a response from R replicas
   * 4. If the digests (if any) match the data return the data
   * 5. else carry out read repair by getting data from all the nodes.
   */
 
  On Feb 21, 2014, at 3:10 AM, Duncan Sands duncan.sa...@gmail.com wrote:
 
  Hi Graham,
 
  On 21/02/14 07:54, graham sanderson wrote:
  Note also; that reading at ONE there will be no read repair, since the 
  coordinator does not know that another replica has stale data (remember 
  at ONE, basically only one node is asked for the answer).
 
  I don't think this is right.  My understanding is that while only one node 
  will be sent a direct read request, all other replicas will (not on every 
  query - it depends on the value of read_repair_chance) get a background 
  read repair request.  You can test this experimentally using cqlsh and 
  turning tracing on: issue a read request many times.  Most of the time you 
  will see that the coordinator sends a message to one node, but from time 
  to time (depending on read_repair_chance) you will see it sending messages 
  to many nodes.
 
  Best wishes, Duncan.
 
 
  In practice for our use cases, we always write at LOCAL_QUORUM (failing 
  the whole update if that doesn’t work - stale data is OK if 1 node is 
  down), and we read at LOCAL_QUORUM, but (because stale data is better 
  than no data), we will fall back per read request to LOCAL_ONE if we 
  detect that there were insufficient nodes - this lets us cope with 2 down 
  nodes in a 3 replica environment (or more if the nodes are not 
  consecutive in the ring).
 
  On Feb 20, 2014, at 11:21 PM, Drew Kutcharian d...@venarc.com wrote:
 
  Hi Guys,
 
  I wanted to get some clarification on what happens when you write and 
  read at consistency level 1. Say I have a keyspace with replication 
  factor of 3 and a table which will contain write-once/read-only wide 
  rows. If I write at consistency level 1 and the write happens on node A 
  and I read back at consistency level 1 from another node other than A, 
  say B, will C* return “not found” or will it trigger a read-repair 
  before responding? In addition, what’s the best consistency level for 
  reading/writing write-once/read-only wide rows?
 
  Thanks,
 
  Drew
 
 
 
 
 
 
 -- 
 Sorry this was sent from mobile. Will do less grammar and spell check than 
 usual.


Re: How do you remote backup your cassandra nodes ?

2014-02-21 Thread user 01
I want to back up my data to Amazon S3. Can anyone tell me which
directories I should copy to the remote location, so that I can restore the
entire Cassandra data set in the event of a failure?


On Fri, Feb 21, 2014 at 1:43 AM, user 01 user...@gmail.com wrote:

 What is your strategy/tools set to backup your Cassandra nodes, apart from
 from cluster replication/ snapshots within cluster?



Re: How do you remote backup your cassandra nodes ?

2014-02-21 Thread Colin Blower
You might want to use the Priam tool for backups.
https://github.com/Netflix/Priam

If you don't want to use Priam, you should read this Datastax entry on
backup and restore.
http://www.datastax.com/docs/1.0/operations/backup_restore

On 02/21/2014 11:19 AM, user 01 wrote:
 I'm wanting to back up my data to amazon S3. Can anyone please tell
 about which directories should I copy to the remote location for
 backup so as to restore the entire Cassandra data in the event of any
 failures?


 On Fri, Feb 21, 2014 at 1:43 AM, user 01 user...@gmail.com
 mailto:user...@gmail.com wrote:

 What is your strategy/tools set to backup your Cassandra nodes,
 apart from from cluster replication/ snapshots within cluster?



-- 
*Colin Blower*
/Software Engineer/
Barracuda Networks Inc.




Re: How do you remote backup your cassandra nodes ?

2014-02-21 Thread Robert Coli
On Thu, Feb 20, 2014 at 12:13 PM, user 01 user...@gmail.com wrote:

 What is your strategy/tools set to backup your Cassandra nodes, apart from
 from cluster replication/ snapshots within cluster?


https://github.com/synack/tablesnap

=Rob


Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Yogi Nerella
Sylvain,

I am using ccm to install, and it installs from the source directory. I
have tried 2.0.4/3/2/1 and 1.2.15, and all of them report the same failure
after 127 records are inserted.

I am using the 1.56.34 and 1.56.38 clients; both report the same issue.

Is something wrong with the client or the server? None of the server logs
show any error.

Thanks,
Yogi


On Wed, Feb 19, 2014 at 11:36 PM, Sylvain Lebresne sylv...@datastax.comwrote:




 On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn rkla...@gmail.com wrote:


 I have cloned the cassandra repo, applied the patch, and built it. But
 when I want to run the benchmark I get an exception. See below. I tried with
 a non-managed dependency to
 cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
 compiled from source because I read that that might help. But that did not
 make a difference.

 So currently I don't know how to give the patch a try. Any ideas?

 cheers,

 Rüdiger

 Exception in thread main java.lang.IllegalArgumentException:
 replicate_on_write is not a column defined in this metadata
 at
 com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
 at
 com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
 at com.datastax.driver.core.Row.getBool(Row.java:117)
 at
 com.datastax.driver.core.TableMetadata$Options.init(TableMetadata.java:474)
 at
 com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
 at
 com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
 at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
 at
 com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
 at
 com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
 at
 com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
 at
 com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
 at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
 at
 com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
 at
 com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
 at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
 at
 cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
 at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
 at
 scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
 at scala.App$$anonfun$main$1.apply(App.scala:71)
 at scala.App$$anonfun$main$1.apply(App.scala:71)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at
 scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
 at scala.App$class.main(App.scala:71)
 at
 cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
 at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)


 I believe you've tried the cassandra trunk branch? trunk is basically the
 future Cassandra 2.1, and the driver is currently unhappy because the
 replicate_on_write option has been removed in that version. I'm supposed to
 have fixed that on the driver 2.0 branch about 2 days ago, so maybe you're
 also using slightly old driver sources there? Or maybe I've screwed up my
 fix; I'll double check. Either way, it would be simpler overall to test
 with the cassandra-2.0 branch of Cassandra, with which you shouldn't run
 into that.

 --
 Sylvain



Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Yogi Nerella
I am using CCM to install the servers, and it is bringing in the source code.
Is there an option I can set so that CCM only downloads the binary, just to
make sure it is not pulling in a working copy of the code?

I am using the following statements to create Keyspace and table definition.

 create keyspace test1 with replication = { 'class':'SimpleStrategy',
'replication_factor':1};

 CREATE TABLE IF NOT EXISTS wide (
  time varchar,
  name varchar,
  value varchar,
  PRIMARY KEY (time,name))
  WITH COMPACT STORAGE;
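For the archives, a minimal sketch of the wide-row insert pattern against that schema: a fixed partition key (`time`) with a varying clustering column (`name`), so every INSERT adds one cell to the same physical row. The bucket and counts here are made up, and this only emits statements you could feed to cqlsh — it does not talk to a server:

```python
def wide_row_inserts(time_bucket, n):
    # One wide row: fixed partition key ("time"), varying clustering
    # column ("name") -- each INSERT adds one cell to the same row.
    for i in range(n):
        yield ("INSERT INTO test1.wide (time, name, value) "
               "VALUES ('%s', 'name-%06d', 'value-%d');" % (time_bucket, i, i))

stmts = list(wide_row_inserts("2014-02-21", 3))
print(stmts[0])
# INSERT INTO test1.wide (time, name, value) VALUES ('2014-02-21', 'name-000000', 'value-0');
```

With a driver you would of course prepare the statement once and bind `(time_bucket, name, value)` per insert rather than building strings.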







abusing cassandra's multi DC abilities

2014-02-21 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
changed data from DC1 shows up in DC2.

Full Story:
We're planning on adding data centers throughout the US.  Our platform is
used for business communications.  Each DC currently utilizes elastic
search and redis.  A message can be sent from one user to another, and the
intent is that it would be seen in near-real-time.  This means that 2
people may be using different data centers, and the messages need to
propagate from one to the other.

On the plus side, we know we get this with Cassandra (fist pump), but the
other pieces, not so much.  Even if they did work, there are all sorts of
race conditions that could pop up from having different pieces of our
architecture communicating over different channels.  From this, we've
arrived at the idea that since Cassandra is the authoritative data source,
we might be able to trigger events in DC2 based on activity coming through
either the commit log or some other means.  One idea was to use a CF with a
low gc time as a means of transporting messages between DCs, and watching
the commit logs for deletes to that CF in order to know when we need to do
things like reindex a document (or a new document), bust cache, etc.
 Facebook did something similar with their modifications to MySQL to
include cache keys in the replication log.

Assuming this is sane, I'd want to avoid having the same event register on
3 servers, thus registering 3 items in the queue when only one should be
there.  So, for any piece of data replicated from the other DC, I'd need a
way to determine if it was supposed to actually trigger the event or not.
 (Maybe it looks at the token and determines if the current server falls in
the token range?)  Or is there a better way?
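The token-range idea in the last paragraph can be sketched out: pick one deterministic "owner" per key so that only one of the three replicas enqueues the event. This is a rough illustration, not tied to Cassandra's internals — it mimics a RandomPartitioner-style md5 token and a tiny hand-built ring, with invented node names:

```python
import hashlib
from bisect import bisect_right

def token(key):
    # RandomPartitioner-style token: md5 of the key as a big integer.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def primary_owner(key, ring):
    # ring: sorted list of (token, node). The primary replica is the
    # first node whose token is >= the key's token, wrapping around.
    t = token(key)
    toks = [tok for tok, _ in ring]
    i = bisect_right(toks, t) % len(ring)
    return ring[i][1]

# A 3-node ring with made-up tokens.
ring = [(0, "node-a"), (2 ** 125, "node-b"), (2 ** 126, "node-c")]
owner = primary_owner("message:42", ring)
# Only `owner` enqueues the reindex/cache-bust event; the other two
# replicas see the same replicated write but skip the side effect.
```

Since every replica computes the same owner from the same key, no coordination channel is needed — each node just asks "am I the owner?" when the write lands.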

So, my questions to all ye Cassandra users:

1. Is this even sane?
2. Is anyone doing it?


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: How do you remote backup your cassandra nodes ?

2014-02-21 Thread user 01
Thanks for the links.

@Colin: The Datastax doc about backup/restore (the one you linked to) is
more about onsite backups, not remote backups. It does not describe
which directories I should copy for backup to a remote location. After
taking the snapshots, should I copy all the snapshot directories within the
table directories to the remote location, or do I need to copy the entire
data directory & commitlog directory?

Another way, rather than a snapshot, could be to flush all the keyspaces/CFs
& then copy the data & commitlog directories to the remote location, isn't it?

What directories/files need to be copied for a remote backup?
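For the archives, a rough sketch of the "what to copy" part: on 1.1+ the snapshot files live under `<data_dir>/<keyspace>/<table>/snapshots/<tag>` (check your own cassandra.yaml for the actual data directory — the layout here is an assumption about that version's defaults). This just walks a data directory and lists the snapshot files one would then ship to S3 with whatever upload tool you prefer:

```python
import os

def snapshot_files(data_dir, tag):
    # Collect every file under a .../snapshots/<tag> directory,
    # anywhere below data_dir. These are the files to upload; the
    # live sstables outside snapshots/ are skipped.
    found = []
    for root, _dirs, files in os.walk(data_dir):
        parts = root.split(os.sep)
        if "snapshots" in parts and parts[-1] == tag:
            found.extend(os.path.join(root, f) for f in files)
    return found
```

You would run `nodetool snapshot -t <tag>` first, upload what this returns, and clear the snapshot afterwards; the commitlog is not needed if you restore from a snapshot, since the snapshot is a flushed, consistent-on-disk copy.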



On Sat, Feb 22, 2014 at 1:12 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Feb 20, 2014 at 12:13 PM, user 01 user...@gmail.com wrote:

 What is your strategy/tools set to backup your Cassandra nodes, apart
 from from cluster replication/ snapshots within cluster?


 https://github.com/synack/tablesnap

 =Rob




Update multiple rows in a CQL lightweight transaction

2014-02-21 Thread Clint Kelly
Folks,

Does anyone know how I can modify multiple rows at once in a
lightweight transaction in CQL3?

I saw the following ticket:

https://issues.apache.org/jira/browse/CASSANDRA-5633

but it was not obvious to me from the comments how (or whether) this
got resolved.  I also couldn't find anything in the DataStax
documentation about how to perform these operations.

I'm in particular interested in how to perform a compare-and-set
operation that modifies multiple rows (with the same partition key)
using the DataStax Java driver.

Thanks!

Best regards,
Clint
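For the archives: if I read CASSANDRA-5633 correctly, it was resolved (in 2.0.6, I believe) by allowing IF conditions inside a BATCH, provided every statement targets the same partition — exactly the same-partition-key case asked about above. A sketch with a made-up `accounts` table, untested:

```cql
-- All statements share partition key owner = 'clint'; the whole
-- batch applies atomically only if the condition holds.
BEGIN BATCH
  UPDATE accounts SET balance = 90
    WHERE owner = 'clint' AND acct = 'checking' IF balance = 100;
  UPDATE accounts SET balance = 60
    WHERE owner = 'clint' AND acct = 'savings';
APPLY BATCH;
```

With the DataStax Java driver this should be executable as a plain statement string; the result set's `[applied]` column reports whether the conditions passed.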


cell-level security for cassandra ?

2014-02-21 Thread Frank Hsueh
has there been any thought about adding cell-level security to Cassandra ?

something similar to:

http://accumulo.apache.org/1.5/accumulo_user_manual.html#_security

?


-- 
Frank Hsueh | frank.hs...@gmail.com


Fwd: Delivery Status Notification (Failure)

2014-02-21 Thread Spencer Brown
I'm trying to get CQL going on my CentOS 5 Cassandra PHP platform.  I've
installed Thrift, but when I try to make cassandra-pdo (or YACassandraPDO,
for that matter), none of the tests pass.  And when I install it with PHP,
phpinfo still doesn't show it loading, and it doesn't work.

Any ideas would be appreciated.  There are pretty good instructions here -
https://code.google.com/a/apache-extras.org/p/cassandra-pdo/ - for other
platforms.  But I can't find anything devoted to CentOS.

Spencer


List support in Net::Async::CassandraCQL ?

2014-02-21 Thread Jacob Rhoden
This perl library has been extremely useful for scripting up data migrations. I
wonder if anyone knows the easiest way to use lists with this driver?
Throwing a Perl array in as a parameter doesn't work as is:

my $q = $cass->prepare("update contact set name=?, address=? where uuid=?")->get;
push @f, $q->execute([$name, @address, $uuid]);
Future->needs_all( @f )->get;

Returns the following:

Cannot encode address: not an ARRAY at 
/usr/local/share/perl/5.14.2/Net/Async/CassandraCQL/Query.pm line 182

In the mean time I could resort to inserting one list item at a time, but 
surely there is a nicer way (:

Thanks as always,
Jacob