new question ;-) // RE: understanding batch atomicity
Thanks DuyHai! Does anyone know whether BATCH provides atomicity for all mutations of a given partition key for a __single__ table, or for all mutations of a given partition key across __ALL__ tables mutated in the BATCH? That is, in the case of:

BEGIN BATCH
  UPDATE table_1 ... WHERE PartitionKey_table_1 = 1 ...   => (A) mutation
  UPDATE table_2 ... WHERE PartitionKey_table_2 = 1 ...   => (B) mutation
APPLY BATCH;

Here, both mutations occur for the same PartitionKey = 1 => are mutations (A) and (B) done in an atomic way (all or nothing)?

Thanks.
Dominique

[@@ THALES GROUP INTERNAL @@]

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Friday, September 29, 2017 17:10
To: user
Subject: Re: understanding batch atomicity

All updates here means all mutations == INSERT/UPDATE or DELETE

On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi,

About BATCH, the Apache doc https://cassandra.apache.org/doc/latest/cql/dml.html?highlight=atomicity says:

"The BATCH statement group multiple modification statements (insertions/updates and deletions) into a single statement. It serves several purposes: ... All updates in a BATCH belonging to a given partition key are performed in isolation"

Does "All updates" mean "all modifications (whatever their source: INSERT or UPDATE statements)"? Or does "updates" mean partition-level isolation only for the UPDATE statements in the batch (without extending isolation to the other, INSERT, statements in the batch)?

Thanks
Regards
Dominique
understanding batch atomicity
Hi,

About BATCH, the Apache doc https://cassandra.apache.org/doc/latest/cql/dml.html?highlight=atomicity says:

"The BATCH statement group multiple modification statements (insertions/updates and deletions) into a single statement. It serves several purposes: ... All updates in a BATCH belonging to a given partition key are performed in isolation"

Does "All updates" mean "all modifications (whatever their source: INSERT or UPDATE statements)"? Or does "updates" mean partition-level isolation only for the UPDATE statements in the batch (without extending isolation to the other, INSERT, statements in the batch)?

Thanks
Regards
Dominique
RE: Cassandra CPU performance
Hi,

A hint: depending on your data set size and your request rate per second, 8 GB of RAM may be too low. CPU usage might then be high due to overly frequent GC. More RAM may bring:

- More space for the OS filesystem cache to hold the SSTable files in memory.
- A larger heap size, and thus less frequent object promotion (young => old space) and fewer major GCs.

Dominique

From: D. Salvatore [mailto:dd.salvat...@gmail.com]
Sent: Wednesday, January 4, 2017 16:28
To: user@cassandra.apache.org
Subject: Cassandra CPU performance

Hi,

I deployed a Cassandra 2.2 ring composed of 4 nodes in the cloud, each with 8 vCPUs and 8 GB of RAM. I am now running some tests with the cassandra-stress and YCSB tools to measure its performance. I am mainly interested in read requests with a small share of write requests (95%/5%).

Running the experiments, I noticed that even with a high number of threads (or clients), the CPU (and disk) does not saturate, but stays around 60% utilisation. I am trying to figure out where the bottleneck in my system is. From the hardware point of view everything seems fine to me. I also looked into the Cassandra configuration file for tuning parameters to increase the system throughput. I increased the value of the concurrent_reads/concurrent_writes parameters, but it didn't improve performance. The log file also does not contain any warnings.

What could be limiting my system?

Thanks
RE: quick questions
> I keep hearing that the minimum number of Cassandra nodes required to achieve Quorum consensus is 4. I wonder why not 3? In fact, many container deployments seem to deploy 4 nodes by default. Can anyone shine some light on this?

I think it may be due to the following (note: I am assuming a "vnode" cluster here):

a) When using 3 nodes and QUORUM, the cluster can tolerate the loss of one node, but in that case each of the remaining nodes takes on +50% workload.
b) When using 4 nodes, after the same loss, each of the remaining nodes takes on (approximately) +33% workload.

Option (a) will impact cluster stability more than (b).

Dominique

From: Kant Kodali [mailto:k...@peernova.com]
Sent: Saturday, December 17, 2016 22:21
To: user@cassandra.apache.org
Subject: quick questions

I keep hearing that the minimum number of Cassandra nodes required to achieve Quorum consensus is 4. I wonder why not 3? In fact, many container deployments seem to deploy 4 nodes by default. Can anyone shine some light on this?

What happens if I have 3 nodes, a replication factor of 3, and consistency level QUORUM? I should be able to achieve quorum-level consensus, right?

If total nodes = 3, RF = 2 and consistency level = QUORUM, then I understand quorum-level consensus is not possible because there are only 2 replica nodes. This also brings up another question: does the number of replica nodes always have to be odd to achieve quorum-level consensus? If so, what happens when a replica node goes down? Would the cluster still serve requests even though quorum-level consensus is not possible?

Thanks
kant
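The arithmetic behind (a) and (b) can be checked with a quick sketch (an illustration of the reasoning above, not Cassandra code): a QUORUM needs floor(RF/2) + 1 replicas, and losing one of N equally loaded nodes (vnodes spread its data evenly) adds N/(N-1) - 1 extra load to each survivor.

```python
# Sketch of the quorum and workload arithmetic discussed above.

def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read/write."""
    return replication_factor // 2 + 1

def extra_load_after_one_loss(nodes: int) -> float:
    """Fractional workload increase on each survivor when one of
    `nodes` equally loaded nodes is lost (vnodes spread its data evenly)."""
    return nodes / (nodes - 1) - 1

print(quorum(3))   # 2 replicas needed with RF=3: one node may be down
print(quorum(2))   # 2 replicas needed with RF=2: no node may be down
print(round(extra_load_after_one_loss(3) * 100))  # ~50% more work per survivor
print(round(extra_load_after_one_loss(4) * 100))  # ~33% more work per survivor
```

This matches the reply: with 3 nodes each survivor absorbs half a node's worth of extra work, with 4 nodes only a third.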
RE: Cassandra running on top of NAS (RAIN storage) !?? anyone ?
> Don't do it [C* on NAS storage]

Yes, I know ;-) it's an anti-pattern. Let me give more info.

We expect to move from a high-end SAN (<= yes, I know!) to local hard drives. Due to lack of time (I won't dig into it here), we expect in fact to move like this: SAN => RAIN => local HDD (for storage).

So, the questions:
- From a C* storage point of view, is RAIN performance similar to a SAN?
- Or is C* on RAIN much, much worse than C* on SAN?

Any feedback/experience will be appreciated.

Thanks.
Dominique

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Friday, March 4, 2016 17:56
To: user@cassandra.apache.org
Subject: Re: Cassandra running on top of NAS (RAIN storage) !?? anyone ?

Don't do it

On Fri, Mar 4, 2016 at 8:39 AM DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi,

Is there any info about running C* on top of NAS storage (well, RAIN storage in fact, to be precise)? I expect C* on top of RAIN to behave like C* on top of a high-end SAN: that is, with a drop (-50%) in performance.

Is any feedback available?

Thanks.
Regards,
Dominique
Cassandra running on top of NAS (RAIN storage) !?? anyone ?
Hi,

Is there any info about running C* on top of NAS storage (well, RAIN storage in fact, to be precise)? I expect C* on top of RAIN to behave like C* on top of a high-end SAN: that is, with a drop (-50%) in performance.

Is any feedback available?

Thanks.
Regards,
Dominique
RE: Seed gossip version error
Hi Amlan,

We have the same problem with Cassandra 2.1.5. I have no lead (yet) to follow. Did you find the root cause of this problem?

Thanks.
Regards,
Dominique

From: Amlan Roy [mailto:amlan@cleartrip.com]
Sent: Wednesday, July 1, 2015 12:46
To: user@cassandra.apache.org
Subject: Seed gossip version error

Hi,

I have a cluster running version 2.1.7. Two of the machines went down and are not joining the cluster even after a restart. I see the following WARN message in system.log on all the nodes:

system.log:WARN [MessagingService-Outgoing-cassandra2.cleartrip.com/172.18.3.32] 2015-07-01 13:00:41,878 OutboundTcpConnection.java:414 - Seed gossip version is -2147483648; will not connect with that version

Please let me know if you have faced the same problem.

Regards,
Amlan
RE: Seed gossip version error
Thanks for your reply. Yes, I am sure all nodes are running the same version.

On second thought, I think my gossip problem is due to intense GC activity, leaving the node unable even to complete a gossip handshake!

Regards,
Dominique

From: Carlos Rolo [mailto:r...@pythian.com]
Sent: Tuesday, July 21, 2015 18:33
To: user@cassandra.apache.org
Subject: Re: Seed gossip version error

That error should only occur when you have a mismatch between the seed version and the new node version. Are you sure all your nodes are running the same version?

Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Tue, Jul 21, 2015 at 5:37 PM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi Amlan,

We have the same problem with Cassandra 2.1.5. I have no lead (yet) to follow. Did you find the root cause of this problem?

Thanks.
Regards,
Dominique

From: Amlan Roy [mailto:amlan@cleartrip.com]
Sent: Wednesday, July 1, 2015 12:46
To: user@cassandra.apache.org
Subject: Seed gossip version error

Hi,

I have a cluster running version 2.1.7. Two of the machines went down and are not joining the cluster even after a restart. I see the following WARN message in system.log on all the nodes:

system.log:WARN [MessagingService-Outgoing-cassandra2.cleartrip.com/172.18.3.32] 2015-07-01 13:00:41,878 OutboundTcpConnection.java:414 - Seed gossip version is -2147483648; will not connect with that version

Please let me know if you have faced the same problem.

Regards,
Amlan
RE: Cassandra Metrics
Hi,

One valuable (IMHO) entry point is "Guide to Cassandra Thread Pools": http://blackbird.io/guide-to-cassandra-thread-pools

Take a look.

Regards,
Dominique

From: pushdlim...@gmail.com [mailto:pushdlim...@gmail.com] On behalf of Saurabh Chandolia
Sent: Friday, June 19, 2015 11:42
To: user@cassandra.apache.org
Subject: Cassandra Metrics

Hi,

I have recently started using Cassandra. As of now, I am only using cfstats and cfhistograms to monitor individual CF stats. Which Cassandra metrics should I watch for stability and performance? Are there any tools for this? Is there any performance overhead if I start monitoring too many metrics?

Thanks
- Saurabh
RE: Log Slow Queries
Hi Carlos,

There are different possibilities to log slow queries:

1) A probabilistic way to catch slow queries (probabilistic, but with detailed info): look at "nodetool settraceprobability", as in http://www.datastax.com/dev/blog/advanced-request-tracing-in-cassandra-1-2

2) Catch slow queries in the driver (a recent feature, available for the newest drivers only): see http://datastax.github.io/java-driver/2.0.10/features/logging/#logging-query-latencies

3) Catch slow queries server-side (but only with C* 2.1): see slides 15-17 of "Lesser Known Features of Cassandra 2.1", http://fr.slideshare.net/planetcassandra/cassandra-summit-2014-lesser-known-features-of-cassandra-21

On our side, we are more keen to use (2), which has the best ROI (IMHO).

Regards,
Dominique

From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Thursday, June 18, 2015 12:33
To: user@cassandra.apache.org
Subject: Log Slow Queries

Hi guys.

I'm facing slow read requests from time to time. I've spotted the keyspace/cf where this is happening, but I can't see anything obvious (single partition slice query, no tombstones, ...). Anything else to look at?

I'd like to have the slow queries logged, either to a log or saved to a particular column family, to analyse them later. I've googled about this and the only 'easy' solution out there seems to be DataStax Enterprise. What are you guys using?

Thanks,
Carlos Alonso | Software Engineer | @calonso (https://twitter.com/calonso)
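The idea behind option (2), client-side latency logging, can be sketched independently of any driver. This is only a conceptual illustration in Python, not the Java driver's actual QueryLogger API; the names `execute_with_timing` and `SLOW_THRESHOLD_MS` are made up here.

```python
# Client-side slow-query logging sketch: wrap any session.execute-like
# callable and log statements whose latency exceeds a threshold.
import logging
import time

SLOW_THRESHOLD_MS = 500
logger = logging.getLogger("slow-queries")

def execute_with_timing(execute, statement):
    """Run `execute(statement)`, measure elapsed time, and log the
    statement if it was slower than SLOW_THRESHOLD_MS."""
    start = time.monotonic()
    result = execute(statement)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > SLOW_THRESHOLD_MS:
        logger.warning("slow query (%.0f ms): %s", elapsed_ms, statement)
    return result
```

The real driver feature works the same way conceptually: it measures per-statement latency at the client and emits a log line when a configurable threshold is crossed, which is why it catches slow queries without any server-side changes.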
is Thrift support, from Cassandra, really mandatory for OpsCenter monitoring ?
Hi,

While reading the OpsCenter 5.1 docs, it looks like OpsCenter can't work if Cassandra does not provide a Thrift interface (see [1] below).

Is that really the case? At first sight, it sounded weird to me, as CQL 3 has been available for months.

Just to know: is a future OpsCenter version that does not rely on a mandatory Thrift interface on the roadmap?

Thanks.
Regards,
Dominique

[1] In the OpsCenter 5.1 guide:

***
Modifying how OpsCenter connects to clusters

Cluster Connection settings define how OpsCenter connects to a cluster.

About this task
The Connection settings for a cluster define how OpsCenter connects to the cluster. For example, if you've enabled authentication or encryption on a cluster, you'll need to specify that information.

Procedure
1. Select the cluster you want to edit from the Cluster menu.
2. Click Settings > Cluster Connections. The Edit Cluster dialog appears.
3. Change the IP addresses of cluster nodes.
4. Change JMX and Thrift listen port numbers.
5. Edit the user credentials if the JMX or Thrift ports require authentication.
any nodetool-like showparameters to show loaded cassandra.yaml parameters ?
Hi,

I have not seen any available command like "nodetool showparameters" to show the loaded cassandra.yaml parameters of a node (to display them remotely, or to check whether the loaded parameters match the ones in cassandra.yaml).

Does anyone know if there is a command to display those parameters (I don't think there is one)?

Thanks.
Regards,
Dominique
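The check the post asks about can be scripted: dump the parameters the node is actually using (via JMX, for instance; not shown here) and diff them against cassandra.yaml. A minimal sketch, assuming a flat `key: value` subset of the yaml; `runtime` below is a stand-in for whatever you obtain from the node.

```python
# Diff on-disk cassandra.yaml parameters against runtime values.
# parse_flat_yaml handles only top-level `key: value` lines, which covers
# most scalar settings; nested structures and lists are ignored.

def parse_flat_yaml(text: str) -> dict:
    """Minimal parser for top-level `key: value` lines of cassandra.yaml
    (ignores comments, blank lines, and nested/indented structures)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].rstrip()
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            params[key.strip()] = value.strip()
    return params

def diff_params(on_disk: dict, running: dict) -> dict:
    """Return {key: (yaml_value, runtime_value)} for every mismatch."""
    return {k: (v, running[k]) for k, v in on_disk.items()
            if k in running and running[k] != v}

yaml_text = "concurrent_reads: 32\nconcurrent_writes: 32\n"
runtime = {"concurrent_reads": "32", "concurrent_writes": "64"}
print(diff_params(parse_flat_yaml(yaml_text), runtime))
```

Run against a real node, a non-empty diff would flag settings that were changed on disk but not yet picked up (or overridden at startup).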
RE: vnode and NetworkTopologyStrategy: not playing well together ?
> The discussion about racks and NTS is also mentioned in this recent article: planetcassandra.org/multi-data-center-replication-in-nosql-databases/ The last section may be of interest for you

Thanks DuyHai.

Note that this section is also part of the C* anti-patterns page: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html

But I think it's missing some advice for vnodes (something like: due to randomly-generated tokens, define one single rack when using vnodes??).

D.

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Tuesday, August 5, 2014 20:07
To: user@cassandra.apache.org
Subject: RE: vnode and NetworkTopologyStrategy: not playing well together ?

The discussion about racks and NTS is also mentioned in this recent article: planetcassandra.org/multi-data-center-replication-in-nosql-databases/

The last section may be of interest for you.

On August 5, 2014 at 18:14, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Jonathan wrote:
> Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 with a totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced.

Storage/load distribution = function(1st replica placement, other replica placement)

Vnodes solve the balancing problem for 1st replica placement // so, yes, I agree with you, but for 1st replica placement only.

But NetworkTopologyStrategy (NTS) influences other (2+) replica placement. As NTS's best behavior relies on the token distribution, and you have no control over tokens with vnodes, the best option I see with **vnodes** is to use only one rack with NTS.

Dominique

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, August 5, 2014 18:04
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 with a totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced.

On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

First, thanks for your answer.

> This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks.

IMHO, it's not a good enough condition. Let's use an example with RF=2:

N1/rack_1
N2/rack_1
N3/rack_1
N4/rack_2

Here, you have RF >= # of racks. And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens whether using nodes *or* vnodes.

As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring, *and* as tokens are randomly generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to the carefully-chosen-tokens vs randomly-generated-tokens clash.

I don't see other options left. Do you see other ones?

Regards,
Dominique

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, August 5, 2014 17:43
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi,

My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to:
- Vnodes = tokens are (usually) randomly generated (AFAIK)
- NetworkTopologyStrategy = requires carefully chosen tokens for all nodes in order not to get a VERY unbalanced ring
RE: vnode and NetworkTopologyStrategy: not playing well together ?
First, thanks for your answer.

> This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks.

IMHO, it's not a good enough condition. Let's use an example with RF=2:

N1/rack_1
N2/rack_1
N3/rack_1
N4/rack_2

Here, you have RF >= # of racks. And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens whether using nodes *or* vnodes.

As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring, *and* as tokens are randomly generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to the carefully-chosen-tokens vs randomly-generated-tokens clash.

I don't see other options left. Do you see other ones?

Regards,
Dominique

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, August 5, 2014 17:43
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi,

My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to:
- Vnodes = tokens are (usually) randomly generated (AFAIK)
- NetworkTopologyStrategy = requires carefully chosen tokens for all nodes in order not to get a VERY unbalanced ring, like in https://issues.apache.org/jira/browse/CASSANDRA-3810

When playing with vnodes, is the recommendation to define one rack for the entire cluster?

Thanks.
Regards,
Dominique

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
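The N1..N4 example from this thread can be simulated. The sketch below is a simplified model of NetworkTopologyStrategy's rack-aware placement for a single DC (an illustration, not Cassandra's actual code): walking the ring clockwise from each range, it prefers nodes in racks that do not yet hold a replica of that range.

```python
# Simplified single-DC NetworkTopologyStrategy placement simulation for
# the example in this thread: 3 nodes in rack_1, 1 node in rack_2, RF=2.

RING = [("N1", "rack_1"), ("N2", "rack_1"), ("N3", "rack_1"), ("N4", "rack_2")]
RF = 2

def replicas_for(start: int, ring, rf: int):
    """Pick rf replicas walking clockwise from ring position `start`,
    preferring nodes in racks not yet holding a replica."""
    chosen, racks = [], set()
    for i in range(len(ring)):                  # one replica per new rack first
        node, rack = ring[(start + i) % len(ring)]
        if rack not in racks:
            chosen.append(node)
            racks.add(rack)
        if len(chosen) == rf:
            return chosen
    for i in range(len(ring)):                  # fewer racks than rf: fill up
        node, _ = ring[(start + i) % len(ring)]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen

ownership = {}
for start in range(len(RING)):
    for node in replicas_for(start, RING, RF):
        ownership[node] = ownership.get(node, 0) + 1

print(ownership)  # N4, the lone rack_2 node, holds a replica of every range
```

Under this model N4 ends up holding a replica of all 4 ranges while N2 and N3 hold only 1 each, which is exactly the "completely imbalanced cluster" the example describes.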
RE: vnode and NetworkTopologyStrategy: not playing well together ?
Jonathan wrote:
> Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 with a totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced.

Storage/load distribution = function(1st replica placement, other replica placement)

Vnodes solve the balancing problem for 1st replica placement // so, yes, I agree with you, but for 1st replica placement only.

But NetworkTopologyStrategy (NTS) influences other (2+) replica placement. As NTS's best behavior relies on the token distribution, and you have no control over tokens with vnodes, the best option I see with **vnodes** is to use only one rack with NTS.

Dominique

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, August 5, 2014 18:04
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 with a totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced.

On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

First, thanks for your answer.

> This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks.

IMHO, it's not a good enough condition. Let's use an example with RF=2:

N1/rack_1
N2/rack_1
N3/rack_1
N4/rack_2

Here, you have RF >= # of racks. And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens whether using nodes *or* vnodes.

As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring, *and* as tokens are randomly generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to the carefully-chosen-tokens vs randomly-generated-tokens clash.

I don't see other options left. Do you see other ones?

Regards,
Dominique

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, August 5, 2014 17:43
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF >= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

Hi,

My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to:
- Vnodes = tokens are (usually) randomly generated (AFAIK)
- NetworkTopologyStrategy = requires carefully chosen tokens for all nodes in order not to get a VERY unbalanced ring, like in https://issues.apache.org/jira/browse/CASSANDRA-3810

When playing with vnodes, is the recommendation to define one rack for the entire cluster?

Thanks.
Regards,
Dominique

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
RE: question about commitlog segments and memlocking
Robert,

Thanks for your explanation!

Regards,
Dominique

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, August 1, 2014 19:50
To: user@cassandra.apache.org
Subject: Re: question about commitlog segments and memlocking

On Fri, Aug 1, 2014 at 2:53 AM, DE VITO Dominique <dominique.dev...@thalesgroup.com> wrote:

The instruction "CLibrary.tryMlockall();" is called at the very beginning of Cassandra's setup() method. So the heap space is memlocked (if the OS rights are set). mlockall() is called with MCL_CURRENT:

"MCL_CURRENT: Lock all pages currently mapped into the process's address space."

So, AFAIU(nderstand), the commitlog segments (or other off-heap structures) are NOT memlocked, and may be swapped. Is that also your understanding? If true, why not use mlockall(MCL_FUTURE) instead, or call mlockall() again after commitlog segment allocation?

The intent of the memlock patch at the time it was contributed was to ensure that stuff in the heap was memlocked, because Cassandra didn't do much off-heap allocation at the time. Your understanding is correct that off-heap allocation generally is not memlocked, though I'm not sure if this is on purpose or not. I have personally had some concern regarding the swapping of off-heap memory structures in Cassandra, though best practice is to run Cassandra on nodes without swap defined. In trunk, there is a bit more reporting of off-heap allocation, so at least you can monitor it via JMX.

That said, your question is really more appropriate for the cassandra-dev mailing list. Someone who actually knows if there's a rationale for this decision may or may not reply here. :)

=Rob
question about commitlog segments and memlocking
Hi,

The instruction "CLibrary.tryMlockall();" is called at the very beginning of Cassandra's setup() method. So the heap space is memlocked (if the OS rights are set). mlockall() is called with MCL_CURRENT:

"MCL_CURRENT: Lock all pages currently mapped into the process's address space."

So, AFAIU(nderstand), the commitlog segments (or other off-heap structures) are NOT memlocked, and may be swapped. Is that also your understanding? If true, why not use mlockall(MCL_FUTURE) instead, or call mlockall() again after commitlog segment allocation?

Thanks.
Regards,
Dominique
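The MCL_CURRENT vs MCL_FUTURE distinction at issue here can be sketched from user space. This is an illustrative ctypes sketch, not Cassandra's code; the flag values are the Linux ones from <sys/mman.h>, and actually locking memory requires appropriate privileges (e.g. CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK), so the call is shown but not exercised.

```python
# Sketch of the mlockall() flag distinction discussed above (Linux values).
import ctypes
import ctypes.util

MCL_CURRENT = 1  # lock pages currently mapped (what the post says setup() uses)
MCL_FUTURE = 2   # also lock pages mapped later, e.g. commitlog segments
                 # allocated after startup

def try_mlockall(flags: int) -> bool:
    """Attempt mlockall(flags); return True on success, False otherwise
    (it fails with EPERM/ENOMEM without sufficient privileges)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    return libc.mlockall(flags) == 0

# Locking both current and future mappings would also cover off-heap
# structures allocated after startup, which is the question raised here:
# try_mlockall(MCL_CURRENT | MCL_FUTURE)
print(MCL_CURRENT | MCL_FUTURE)  # 3
```

The question in the post then amounts to: why pass only MCL_CURRENT at startup rather than MCL_CURRENT | MCL_FUTURE, which would also pin later allocations.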
difference between AntiEntropySessions and AntiEntropyStage ?
Hi,

"nodetool tpstats" gives two lines for anti-entropy: one for AntiEntropySessions and one for AntiEntropyStage. What is the difference?

a) Is AntiEntropySessions counting repairs on a node acting as the primary node (the target node for the repair), while AntiEntropyStage counts repair tasks on a node participating as a secondary node (not the target node for the repair)?

b) Or is it something different? And then, what is the meaning of these two counter families?

Thanks.
Regards,
Dominique
RE: Which hector version is suitable for cassandra 2.0.6 ?
Hi,

-----Original Message-----
From: ssiv...@gmail.com [mailto:ssiv...@gmail.com]
Sent: Thursday, March 27, 2014 10:41
To: user@cassandra.apache.org
Subject: Re: Which hector version is suitable for cassandra 2.0.6 ?

On 03/27/2014 12:23 PM, user 01 wrote:
> Btw, both Hector and the Datastax java driver are maintained by Datastax, both for Java; this speaks for itself!

I'm not sure about the first statement. What do you mean by the second part of the sentence? They are Java-based, but have different APIs (and I find the DS and Astyanax APIs quite a bit more convenient than Hector's). They can also be configured in a fine-grained way. Astyanax performance is about the same as DS (Astyanax is now about 5-10% faster), which I cannot say about Hector. And the main difference is that DS supports C* v2 with CQL3, prepared statements, its native binary protocol and other features. For example, using Astyanax versus DS on C* v2 shows unstable results under high load.

Which one shows unstable results under high load? Astyanax? DS?

Thanks.
Dominique

Also, CQL is now deprecated and in the future the move is towards CQL3, and thus the DataStax driver is recommended for future development. Astyanax is working on a facade API so, I guess, it will be possible to change the raw-level driver in some cases.

--
Thanks, Serj
sending notifications through data replication on remote clusters
Hi,

I have the following use case: if I update a data item on DC1, I want apps connected to DC2 to be informed when this data becomes available on DC2 after replication.

When using Thrift, one way could be to modify the CassandraServer class to send notifications to apps according to data coming in to the coordinator node of DC2. Is that common (~ the way to do it)? Is there another way to do so?

When using CQL, is there a precise place in the source code to modify for the same purpose?

Thanks.
Regards,
Dominique
RE: sending notifications through data replication on remote clusters
> On 03/10/2014 07:49 AM, DE VITO Dominique wrote:
>> If I update a data item on DC1, I just want apps connected to DC2 to be informed when this data is available on DC2 after replication.
>
> If I run a SELECT, I'm going to receive the latest data per the read conditions (ONE, TWO, QUORUM), regardless of the location of the client connection. If using a network-aware topology, you'll get the most current data in that DC.
>
>> When using Thrift, one way could be to modify the CassandraServer class to send notifications to apps according to data coming in to the coordinator node of DC2. Is that common (~ the way to do it)? Is there another way to do so? When using CQL, is there a precise place in the source code to modify for the same purpose?
>
> Notifying connected clients about random INSERT or UPDATE statements that ran somewhere seems to be far, far outside the scope of storing data. Just configure your client to SELECT in the manner that you need. I may not fully understand your problem and could be simplifying things in my head, so feel free to expand.
>
> --
> Michael

First of all, thanks for your answer and your attention.

I know about SELECT. The idea here is to avoid doing POLLING regularly, as it could easily become a performance nightmare. The idea is to replace POLLING with PUSH, just as in many cases like a SEDA architecture, a CQRS architecture, or continuous querying with some data stores.

So, following this PUSH idea, it would be nice to inform apps connected to a preferred DC that some new data has been replicated and is now available.

I hope it's clearer.

Dominique
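The polling-to-push switch being argued for here can be sketched as a plain observer pattern. Nothing below is Cassandra API: `ReplicationNotifier` is a hypothetical stand-in for whatever server-side hook (a trigger, or a modified coordinator path) detects replicated data arriving in the local DC.

```python
# Conceptual push-model sketch: apps subscribe once and are called back
# when data arrives, instead of issuing periodic SELECTs (polling).

class ReplicationNotifier:
    """Hypothetical hook: fan replicated-write events out to subscribers."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback(partition_key, data) to run on arrival."""
        self._subscribers.append(callback)

    def on_replicated(self, partition_key, data):
        # Invoked by the (hypothetical) server-side hook when a replicated
        # mutation for partition_key lands in the local DC.
        for callback in self._subscribers:
            callback(partition_key, data)

notifier = ReplicationNotifier()
received = []
notifier.subscribe(lambda key, data: received.append((key, data)))
notifier.on_replicated("user:42", {"status": "updated"})
print(received)  # [('user:42', {'status': 'updated'})]
```

The design point is the one made in the email: with push, client work is proportional to actual data arrivals, whereas polling costs a SELECT per app per interval whether or not anything changed.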
about trigger execution ??? // RE: sending notifications through data replication on remote clusters
You should be able to achieve what you're looking for with a trigger vs. a modification to the core of Cassandra. http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support Well, good point. It leads to the question: (a) are triggers executed on all (local+remote) coordinator nodes (and then, N DC = N coordinator nodes = N executions of the triggers) ? (b) Or are triggers executed only on the first coordinator node, and not the (next/remote DC) coordinator nodes ? My opinion is (b), and in that case, triggers won't do the job. (b) would make sense, because the first coordinator node would augment original row mutations and propagate them towards other coordinator nodes. Then, no need to execute triggers on other (remote) coordinator nodes. Is there somebody knowing about trigger execution : is it (a) or (b) ? Thanks. Dominique On Mon, Mar 10, 2014 at 10:06 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: On 03/10/2014 07:49 AM, DE VITO Dominique wrote: If I update a data on DC1, I just want apps connected-first to DC2 to be informed when this data is available on DC2 after replication. If I run a SELECT, I'm going to receive the latest data per the read conditions (ONE, TWO, QUORUM), regardless of location of the client connection. If using network aware topology, you'll get the most current data in that DC. When using Thrift, one way could be to modify CassandraServer class, to send notification to apps according to data coming in into the coordinator node of DC2. Is it common (~ the way to do it) ? Is there another way to do so ? When using CQL, is there a precise src code place to modify for the same purpose ? Notifying connected clients about random INSERT or UPDATE statements that ran somewhere seems to be far, far outside the scope of storing data. Just configure your client to SELECT in the manner that you need. I may not fully understand your problem and could be simplifying things in my head, so feel free to expand. 
-- Michael First of all, thanks for your answer and your attention. I know about SELECT. The idea, here, is to avoid doing POLLING regularly, as it could easily become a performance nightmare. The idea is to replace POLLING with PUSH, just like in many cases such as SEDA architecture, CQRS architecture, or continuous querying with some data stores. So, following this PUSH idea, it would be nice to inform apps connected to a preferred DC that some new data has been replicated and is now available. I hope it's clearer. Dominique
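The two hypotheses above can be made concrete with a toy sketch. This is plain Python with made-up names (audit_trigger, first_coordinator, remote_coordinator are all hypothetical, not Cassandra's actual trigger API); it only illustrates the flow of hypothesis (b):

```python
# Toy sketch (NOT Cassandra's real trigger API -- every name here is made up)
# illustrating hypothesis (b): the first coordinator runs the trigger, merges the
# extra mutations into the batch, and replicates the augmented batch; remote-DC
# coordinators apply what they receive without running the trigger again.

def audit_trigger(mutation):
    """Example trigger: emit one audit-log mutation per original mutation."""
    return [{"table": "audit_log", "row": mutation["row"], "op": "insert"}]

def first_coordinator(mutations, trigger):
    # Augment the original mutations with the trigger's output...
    augmented = list(mutations)
    for m in mutations:
        augmented.extend(trigger(m))
    # ...and replicate the augmented set to all replicas / remote DCs.
    return augmented

def remote_coordinator(received):
    # A remote coordinator just applies what it received: no second trigger run,
    # hence no duplicated audit-log entries.
    return received

original = [{"table": "t1", "row": 1, "op": "update"}]
replicated = first_coordinator(original, audit_trigger)
applied_in_dc2 = remote_coordinator(replicated)
```

Under hypothesis (a), by contrast, remote_coordinator would call the trigger again, producing duplicated augmented mutations per DC.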
RE: about trigger execution ??? // RE: sending notifications through data replication on remote clusters
Thanks a lot. [@@ THALES GROUP INTERNAL @@] From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Monday, March 10, 2014 16:47 To: user@cassandra.apache.org Subject: Re: about trigger execution ??? // RE: sending notifications through data replication on remote clusters Just so you know, you should probably apply the "[jira] [Commented] (CASSANDRA-6790) Triggers are broken in trunk" patch (https://www.mail-archive.com/commits@cassandra.apache.org/msg80563.html) because triggers are currently only called on batch_mutate and will fail if called on insert.
RE: Cassandra book/tutorial
Hi Erwin, A few books are coming out these months: * October: Mastering Apache Cassandra http://www.packtpub.com/mastering-apache-cassandra/book * November: Cassandra High Performance Cookbook: Second Edition http://www.packtpub.com/cassandra-high-performance-cookbook/book * December: Practical Cassandra: A Developer's Approach http://www.amazon.com/Practical-Cassandra-Developers-Addison-Wesley-Analytics/dp/032193394X/ref=sr_1_5?s=books&ie=UTF8&qid=1382953729&sr=1-5&keywords=cassandra I expect them to be more up-to-date than the oldie Cassandra: The Definitive Guide by Eben Hewitt (Nov 29, 2010). This being said, there is also quite a bunch of great content online! Regards, Dominique [@@ THALES GROUP INTERNAL @@] From: erwin.karb...@gmail.com [mailto:erwin.karb...@gmail.com] On behalf of Erwin Karbasi Sent: Monday, October 28, 2013 07:16 To: user@cassandra.apache.org Subject: Re: Cassandra book/tutorial Thanks a lot to all for the information. I think that all the current Cassandra books are pretty old and outdated. On Oct 28, 2013 6:51 AM, Joe Stein crypt...@gmail.com wrote: Reading a previous version's documentation and related information from that time in the past (like books) has value! It helps to understand decisions that were made and changed, and some that are still the same, like Secondary Indexes, which were introduced in 0.7 when http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412 came out back in 2011. If you are really just getting started then I say go and start here: http://www.planetcassandra.org/ /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop ***/ On Mon, Oct 28, 2013 at 12:15 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: With a lot of enthusiasm I started reading it. It's outdated and error-prone. I could not even get Cassandra running from that book.
Eventually I could not get started with Cassandra. On Mon, Oct 28, 2013 at 9:41 AM, Joe Stein crypt...@gmail.com wrote: http://www.planetcassandra.org has a lot of great resources on it. Eben Hewitt's book is great, as are the other C* books like the High Performance Cookbook http://www.amazon.com/Cassandra-Performance-Cookbook-Edward-Capriolo/dp/1849515123 I would recommend reading both of those books. You can also read http://www.datastax.com/dev/blog/thrift-to-cql3 to aid understanding. From there, go with CQL http://cassandra.apache.org/doc/cql3/CQL.html /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop ***/ On Sun, Oct 27, 2013 at 11:58 PM, Mohan L l.mohan...@gmail.com wrote: And here is also a good intro: http://10kloc.wordpress.com/category/nosql-2/ Thanks Mohan L On Mon, Oct 28, 2013 at 8:02 AM, Danie Viljoen dav...@gmail.com wrote: Not a book, but I think this is a good start: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html On Mon, Oct 28, 2013 at 3:14 PM, Dave Brosius dbros...@mebigfatguy.com wrote: Unfortunately, as tech books tend to be, it's quite a bit out of date at this point. On 10/27/2013 09:54 PM, Mohan L wrote: On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote: Hey Guys, What is the best book to learn Cassandra from scratch? Thanks in advance, Erwin Hi, Buy: Cassandra: The Definitive Guide by Eben Hewitt: http://shop.oreilly.com/product/0636920010852.do Thanks Mohan L -- Deepak
RE: Cassandra book/tutorial
I don't know: most of these books are not out yet ;-) [@@ THALES GROUP INTERNAL @@] From: erwin.karb...@gmail.com [mailto:erwin.karb...@gmail.com] On behalf of Erwin Karbasi Sent: Monday, October 28, 2013 12:24 To: DE VITO Dominique Cc: user@cassandra.apache.org Subject: Re: Cassandra book/tutorial Dominique, Which of the books do you most recommend? IMO, Practical Cassandra is the best one. Erwin Karbasi ATT, Senior Software Architect On Mon, Oct 28, 2013 at 1:21 PM, Erwin Karbasi er...@optinity.com wrote: Thanks a lot Dominique for the great update, it helped me pretty much! Erwin Karbasi ATT, Senior Software Architect
about compression enabled by default in Cassandra 1.1.
Hi, Does compression work for any column value type, in all cases? For example, if my CF has columns with a value type of byte[] (or blob, when speaking CQL), is C* still doing compression? Thanks. Regards, Dominique
RE: cost estimate about some Cassandra patchs
-----Original Message----- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, May 7, 2013 10:22 To: user@cassandra.apache.org Subject: Re: cost estimate about some Cassandra patchs Use case = rows with rowkey like (folder id, file id) And operations read/write multiple rows with same folder id = so, it could make sense to have a partitioner putting rows with same folder id on the same replicas. The entire row key is the thing we use to make the token used to both locate the replicas and place the row in the node. I don't see that changing. Well, we can't do that, because of secondary indexes on rows. Only C* v2 will allow the row design you mention, with secondary indexes. So, this row design you mention is a no-go for us, with C* 1.1 or 1.2. Have you done any performance testing to see if this is a problem? Unfortunately, we have just some pieces, today, for doing performance testing. We are beginning. But still, I am investigating whether alternative designs are (at least) possible. Because if no alternative design is easy to develop, then there's no need to compare performance. The lesson I learnt here is that, if I were to restart our project from the beginning, I would start a more extensive performance testing project along with the business project development. It's a kind of must-have for a NoSQL database. So, the only tests we have done so far with our FolderPartitioner are with a one-machine cluster. As expected, due to the more important work of this FolderPartitioner, the CPU usage is a bit higher (~10%); memory and network consumption are the same as with RP, but I have strange results for I/O (average hard drive), for example, for a write-only test. I don't know why the I/O consumption could be much higher with our FolderPartitioner than with the RP. So, I am questioning my measurement methods, and my C* understanding. Well, the use of such a FolderPartitioner is quite a long way to go... Regards.
Dominique Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 7/05/2013, at 5:27 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: De : aaron morton [mailto:aa...@thelastpickle.com] Envoyé : dimanche 28 avril 2013 22:54 À : user@cassandra.apache.org Objet : Re: cost estimate about some Cassandra patchs Does anyone know enough of the inner working of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batch ? Assuming you mean have the coordinator send multiple row read/write requests in a single message to replicas Pretty sure this has been raised as a ticket before but I cannot find one now. It would be a significant change and I'm not sure how big the benefit is. To send the messages the coordinator places them in a queue, there is little delay sending. Then it waits on them async. So there may be some saving on networking but from the coordinators point of view I think the impact is minimal. What is your use case? Use case = rows with rowkey like (folder id, file id) And operations read/write multiple rows with same folder id = so, it could make sense to have a partitioner putting rows with same folder id on the same replicas. But so far, Cassandra is not able to exploit this locality as batch effect ends at the coordinator node. So, my question about the cost estimate for patching Cassandra. The closest (or exactly corresponding to my need ?) JIRA entries I have found so far are: CASSANDRA-166: Support batch inserts for more than one key at once https://issues.apache.org/jira/browse/CASSANDRA-166 = WON'T FIX status CASSANDRA-5034: Refactor to introduce Mutation Container in write path https://issues.apache.org/jira/browse/CASSANDRA-5034 = I am not very sure if it's related to my topic Thanks. 
Dominique Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 27/04/2013, at 4:04 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, We have created a new partitioner that groups some rows with **different** row keys on the same replicas. But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas. Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching? Thanks. Regards, Dominique
RE: cost estimate about some Cassandra patchs
From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, April 28, 2013 22:54 To: user@cassandra.apache.org Subject: Re: cost estimate about some Cassandra patchs Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching? Assuming you mean have the coordinator send multiple row read/write requests in a single message to replicas: pretty sure this has been raised as a ticket before, but I cannot find one now. It would be a significant change and I'm not sure how big the benefit is. To send the messages the coordinator places them in a queue, there is little delay sending. Then it waits on them async. So there may be some saving on networking, but from the coordinator's point of view I think the impact is minimal. What is your use case? Use case = rows with rowkey like (folder id, file id) And operations read/write multiple rows with same folder id = so, it could make sense to have a partitioner putting rows with same folder id on the same replicas. But so far, Cassandra is not able to exploit this locality, as the batch effect ends at the coordinator node. So, my question about the cost estimate for patching Cassandra. The closest (or exactly corresponding to my need ?) JIRA entries I have found so far are: CASSANDRA-166: Support batch inserts for more than one key at once https://issues.apache.org/jira/browse/CASSANDRA-166 => WON'T FIX status CASSANDRA-5034: Refactor to introduce Mutation Container in write path https://issues.apache.org/jira/browse/CASSANDRA-5034 => I am not very sure if it's related to my topic Thanks. Dominique Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 27/04/2013, at 4:04 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, We have created a new partitioner that groups some rows with **different** row keys on the same replicas.
But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas. Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching? Thanks. Regards, Dominique
cost estimate about some Cassandra patchs
Hi, We have created a new partitioner that groups some rows with **different** row keys on the same replicas. But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas. Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching? Thanks. Regards, Dominique
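The idea behind such a partitioner can be sketched in a few lines. This is a toy Python illustration, not Cassandra's IPartitioner interface (the function name and key shapes are made up): the token is computed from the folder_id part of the composite row key only, md5-style like RandomPartitioner, so rows sharing a folder_id map to the same token and thus the same replicas.

```python
# Toy sketch of a "folder-colocating" partitioner: only the folder part of the
# composite row key (folder_id, file_id) drives token computation, so all files
# of one folder land on the same replica set -- the locality that batch_mutate /
# multiget_slice could then exploit.
import hashlib

def folder_token(row_key):
    folder_id, _file_id = row_key  # only the folder part drives placement
    return int(hashlib.md5(str(folder_id).encode()).hexdigest(), 16)

# All rows of folder 42 get one token, hence one replica set:
tokens_folder_42 = {folder_token((42, f)) for f in ("a.txt", "b.txt", "c.txt")}
```

A real implementation would also have to keep the token space uniformly distributed across folders, which md5 over the folder id does.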
RE: data modeling from batch_mutate point of view
Thanks Aaron. It helped. Let me rephrase my questions a little bit. It's about the impact of data modeling on batch_mutate advantages. I have one CF for storing data, and ~10 (all different) CFs used for indexing that data. When adding a piece of data, I need to add indexes too, and then, I need to add columns to one row for each of the 10 indexing CFs => 2 main designs are possible for adding these new indexes. a) the 10 updated rows of the indexing CFs all have different rowkeys b) the 10 updated rows of the indexing CFs all have the same rowkey AFAIK, this has an effect on batch_mutate: a) the batch_mutate advantages stop at the coordinator node. The advantage appears for the client => coordinator node communication b) the batch_mutate advantages are better, for the client => coordinator node communication __and__ for the coordinator node => replicas communications. So, to sum up: a) CF with few data repeats (good), but the coordinator node needs to communicate with different replicas according to the different rowkeys b) CF with more denormalization, repeating some data again and again over composite columns, but batch_mutate performs better (good) up to the replicas, and not only up to the coordinator node. Each option has one pro and one con. Is there any experience out there about such data modeling (option_a vs option_b) from the batch_mutate perspective ? Thanks. Dominique From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, April 9, 2013 10:12 To: user@cassandra.apache.org Subject: Re: data modeling from batch_mutate point of view So, one alternative design for indexing CF could be: rowkey = folder_id colname = (indexed value, timestamp, file_id) colvalue = '' If you always search in a folder, what about: rowkey = folder_id : property_name : property_value colname = file_id (That's closer to secondary indexes in Cassandra, with the addition of the folder_id) According to pro vs con, is the alternative design more or less interesting ?
IMHO it's normally better to spread the rows and consider how they grow over time. You can send updates for multiple rows in the same batch mutation. Hope that helps. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 9/04/2013, at 3:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, I have a use case that sounds like storing data associated with files. So, I store them with the CF: rowkey = (folder_id, file_id) colname = property name (about the file corresponding to file_id) colvalue = property value And I have CFs for manual indexing: rowkey = (folder_id, indexed value) colname = (timestamp, file_id) colvalue = '' like rowkey = (folder_id, note_of_5) or (folder_id, some_status) colname = (some_date, some_filename) colvalue = '' I have many CFs for indexing, as I index according to different (file) properties. So, one alternative design for the indexing CFs could be: rowkey = folder_id colname = (indexed value, timestamp, file_id) colvalue = '' Alternative design: * pro: same rowkey for all indexing CFs => **all** indexing CFs could be updated through one batch_mutate * con: repeating the indexed value (1st colname part) again and again (= a string up to 20 chars) According to pro vs con, is the alternative design more or less interesting ? Thanks. Dominique
data modeling from batch_mutate point of view
Hi, I have a use case that sounds like storing data associated with files. So, I store them with the CF: rowkey = (folder_id, file_id) colname = property name (about the file corresponding to file_id) colvalue = property value And I have CFs for manual indexing: rowkey = (folder_id, indexed value) colname = (timestamp, file_id) colvalue = '' like rowkey = (folder_id, note_of_5) or (folder_id, some_status) colname = (some_date, some_filename) colvalue = '' I have many CFs for indexing, as I index according to different (file) properties. So, one alternative design for the indexing CFs could be: rowkey = folder_id colname = (indexed value, timestamp, file_id) colvalue = '' Alternative design: * pro: same rowkey for all indexing CFs => **all** indexing CFs could be updated through one batch_mutate * con: repeating the indexed value (1st colname part) again and again (= a string up to 20 chars) According to pro vs con, is the alternative design more or less interesting ? Thanks. Dominique
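The trade-off between the two designs can be sketched with made-up values: count how many distinct row keys (hence how many potentially distinct replica sets) one indexing batch touches under each design. Index names and values below are illustrative; only three of the ~10 index CFs are shown.

```python
# Design (a): rowkey = (folder_id, indexed value) -> one distinct rowkey per index
# CF, so one batch fans out to several replica sets.
# Design (b): rowkey = folder_id; the indexed value moves into the composite
# colname, so the whole batch targets a single partition / replica set.
folder_id = "folder-1"
indexed_values = {"note_of_5": "4", "some_status": "ok", "owner_idx": "bob"}

rowkeys_a = {(folder_id, v) for v in indexed_values.values()}  # design (a)
rowkeys_b = {folder_id for _ in indexed_values}                # design (b)
```

Design (b) trades extra storage (the repeated indexed value in each colname) for a batch that the coordinator can forward to one replica set.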
other questions about // RE: batch_mutate
When the coordinator node receives a batch_mutate with N different row keys (for different CFs): a) does it treat them as N independent requests to replicas, or b) does the coordinator node split the initial batch_mutate into M batch_mutate calls (M <= N) according to the rowkeys ? Thanks, Dominique From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Friday, July 6, 2012 01:21 To: user@cassandra.apache.org Subject: Re: batch_mutate Does it mean that the popular use case is when we need to update multiple column families using the same key? Yes. Shouldn't we design our space in such a way that those columns live in the same column family? Design a model where the data for common queries is stored in one row+cf. You can also take the workload into consideration, e.g. things that are updated frequently often live together, and things that are updated infrequently often live together. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/07/2012, at 3:16 AM, Leonid Ilyevsky wrote: I actually found an answer to my first question at http://wiki.apache.org/cassandra/API. So I got it wrong: actually the outer key is the key in the table, and the inner key is the table name (this was somewhat counter-intuitive). Does it mean that the popular use case is when we need to update multiple column families using the same key? Shouldn't we design our space in such a way that those columns live in the same column family? From: Leonid Ilyevsky [mailto:lilyev...@mooncapital.com] Sent: Thursday, July 05, 2012 10:39 AM To: 'user@cassandra.apache.org' Subject: batch_mutate My current way of inserting rows one by one is too slow (I use cql3 prepared statements), so I want to try batch_mutate. Could anybody give me more details about the interface?
In the javadoc it says: public void batch_mutate(java.util.Map<java.nio.ByteBuffer, java.util.Map<java.lang.String, java.util.List<Mutation>>> mutation_map, ConsistencyLevel consistency_level) throws InvalidRequestException, UnavailableException, TimedOutException, org.apache.thrift.TException Description copied from interface: Cassandra.Iface Mutate many columns or super columns for many row keys. See also: Mutation. mutation_map maps key to column family to a list of Mutation objects to take place at that scope. * I need to understand the meaning of the elements of the mutation_map parameter. My guess is, the key in the outer map is the columnfamily name, is this correct? The key in the inner map is, probably, a key into the columnfamily (it is somewhat confusing that it is String while the outer key is ByteBuffer; I wonder what the rationale is). If this is correct, how should I do it if my key is a composite one? Does anybody have an example? Thanks, Leonid
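The mutation_map shape that the thread settles on (outer key = row key, inner key = column family name) can be mirrored with plain dictionaries. This is a Python sketch with made-up table and column names; Mutation objects are stubbed as dicts, whereas the real Thrift API uses Mutation/ColumnOrSuperColumn structs.

```python
# Shape of the batch_mutate mutation_map: the OUTER key is the row key (a
# ByteBuffer in Thrift -- a composite key is still just its serialized bytes),
# the INNER key is the column family name (a String), mapping to a list of
# mutations to apply at that scope.
row_key = b"folder-1:file-9"  # serialized (possibly composite) row key

mutation_map = {
    row_key: {                                             # outer: row key
        "files":  [{"insert_column": ("size", b"1024")}],  # inner: CF -> mutations
        "index1": [{"insert_column": ("status", b"ok")}],
    },
}
cfs_touched = set(mutation_map[row_key])
```

This also makes the counter-intuitive part visible: one outer entry covers every CF mutated for that row key, which is why batching multiple CFs under the same key is the natural use case.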
RE: Cassandra 1.2 Atomic Batches and Thrift API
Is Cassandra 1.1 row-level isolation (a kind of batch-like behavior) related to the traditional batch_mutate or to the atomic_batch_mutate Thrift API ? Thanks for the answer. Dominique From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Tuesday, February 12, 2013 10:19 To: user@cassandra.apache.org Subject: Re: Cassandra 1.2 Atomic Batches and Thrift API Yes, it's called atomic_batch_mutate and is used like batch_mutate. If you don't use thrift directly (which would qualify as a very good idea), you'll need to refer to whatever client library you are using to see 1) if support for that new call has been added and 2) how to use it. If you are not sure what is the best way to contact the developers of your client library, then you may try the Cassandra client mailing list: client-...@cassandra.apache.org. -- Sylvain On Tue, Feb 12, 2013 at 4:44 AM, Drew Kutcharian d...@venarc.com wrote: Hey Guys, Is the new atomic batch feature in Cassandra 1.2 available via the thrift API? If so, how can I use it? -- Drew
about validity of recipe "A node join using external data copy methods"
Hi, Edward Capriolo described in his Cassandra book a faster way [1] to start new nodes if the cluster size doubles, from N to 2*N. It's about splitting each token range into two parts, so that after the split each range is taken in charge by 2 nodes: the existing one, and a new one. And for starting a new node, one needs to: - copy the data records from the corresponding node (without the system records) - start the new node with auto_bootstrap: false This raises 2 questions: A) is this recipe still valid with v1.1 and v1.2 ? B) do we still need to start the new node with auto_bootstrap: false ? My guess is yes, as the fact that the bootstrap phase happened is not recorded in the data records. Thanks. Dominique [1] see the recipe "A node join using external data copy methods", page 165
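The token arithmetic behind this "double the cluster" recipe can be sketched as follows. This is an illustrative Python calculation, not a Cassandra tool: each new node takes the midpoint of an existing token range, splitting it in two. Tokens here are RandomPartitioner-style integers in [0, 2**127); the 4-node ring is a made-up example.

```python
# Compute the tokens for the new nodes when a cluster doubles from N to 2*N:
# each new token is the midpoint of an existing token range.
RING = 2**127  # RandomPartitioner token space

def midpoints(tokens):
    tokens = sorted(tokens)
    mids = []
    for i, t in enumerate(tokens):
        nxt = tokens[(i + 1) % len(tokens)]
        span = (nxt - t) % RING or RING  # handle the wrap-around range
        mids.append((t + span // 2) % RING)
    return mids

old_tokens = [0, RING // 4, RING // 2, 3 * RING // 4]  # existing N=4 nodes
new_tokens = midpoints(old_tokens)                      # tokens for the 4 new nodes
```

Each new node then receives a copy of the data files from the node whose range it splits, which is why the recipe pairs the token assignment with an external data copy.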
RE: about validity of recipe "A node join using external data copy methods"
"Now streaming is very efficient, rarely fails, and there is no need to do it this way anymore" I guess it's true in v1.2. Is it also true in v1.1 ? Thanks. Dominique From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Tuesday, January 8, 2013 16:01 To: user@cassandra.apache.org Subject: Re: about validity of recipe "A node join using external data copy methods" Basically this recipe is from the old days when we had anti-compaction. Now streaming is very efficient, rarely fails, and there is no need to do it this way anymore. This recipe will be abolished from the second edition. It still likely works, except when using counters. Edward On Tue, Jan 8, 2013 at 7:27 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, Edward Capriolo described in his Cassandra book a faster way [1] to start new nodes if the cluster size doubles, from N to 2*N. It's about splitting each token range into two parts, so that after the split each range is taken in charge by 2 nodes: the existing one, and a new one. And for starting a new node, one needs to: - copy the data records from the corresponding node (without the system records) - start the new node with auto_bootstrap: false This raises 2 questions: A) is this recipe still valid with v1.1 and v1.2 ? B) do we still need to start the new node with auto_bootstrap: false ? My guess is yes, as the fact that the bootstrap phase happened is not recorded in the data records. Thanks. Dominique [1] see the recipe "A node join using external data copy methods", page 165
replace_token versus nodetool repair
Hi,

Is nodetool repair only usable if the node to repair has a valid (= up-to-date with its neighbors) schema?

If the data records are completely broken on a node owning a given token, is it valid to wipe the (data) records and to run replace_token with that same token on the *same* node?

Thanks.
Regards,
Dominique
RE: property 'disk_access_mode' not found in cassandra.yaml
That's what I guessed too (as this property can be found in the source code). But... I am still a bit surprised it's not, at least, mentioned in the doc (in some dedicated section).

Dominique

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Friday, January 4, 2013 3:57 PM
To: user@cassandra.apache.org
Subject: Re: property 'disk_access_mode' not found in cassandra.yaml

"Is the default 'auto' the best possible option so that no one has to worry about it anymore ?"

Yes, something like that I guess. You can add the disk_access_mode property to your cassandra.yaml and set it to standard to disable memory-mapped access; it will work. It's the same for the auto_bootstrap parameter and some other properties like these two: they are no longer written in the default conf, but are still read if they exist.
Alain

2013/1/4 DE VITO Dominique <dominique.dev...@thalesgroup.com>
Is the default 'auto' the best possible option so that no one has to worry about it anymore ?
RE: Force data to a specific node
Hi Everton,

AFAIK, the problem is not forcing data to a specific node, but forcing some kind of data locality. There are ways in CQL to do it: you define a composite key (K1, K2); the K1 part is used as the row key, and K2 is used within the column name. So, all rows with the same K1 are on the same node.

See also https://issues.apache.org/jira/browse/CASSANDRA-5054

Dominique

From: Everton Lima [mailto:peitin.inu...@gmail.com]
Sent: Wednesday, January 2, 2013 7:20 PM
To: user@cassandra.apache.org
Subject: Re: Force data to a specific node

We need to do this to minimize the network I/O. We have our own load balancing algorithm. We have some data that is best processed on a local machine. Is it possible? How?

2013/1/2 Edward Sargisson <edward.sargis...@globalrelay.net>
Why would you want to?

From: Everton Lima <peitin.inu...@gmail.com>
To: Cassandra-User <user@cassandra.apache.org>
Sent: Wed Jan 02 18:03:49 2013
Subject: Force data to a specific node
Is it possible to force data to stay on a specific node?

--
Everton Lima Aleixo
Bachelor in Computer Science (UFG), Master's student in Computer Science (UFG), Programmer at LUPA
RE: Force data to a specific node
Hi Sávio,

There is no definitive answer: it depends on your business model ;-) I just guess it should be something like the id of some data root.

Take also a look at http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 and look for "partition key", if you want to go through CQL.

From: Sávio Teles [mailto:savio.te...@lupa.inf.ufg.br]
Sent: Thursday, January 3, 2013 2:58 PM
To: user@cassandra.apache.org
Subject: Re: Force data to a specific node

Hi Dominique, I have the same problem! I would like to place an object on a specific node, because I'm working on a spatial application. How should I choose the K1 part to force a given object to go to a node?

2013/1/3 DE VITO Dominique <dominique.dev...@thalesgroup.com>
Hi Everton, AFAIK, the problem is not forcing data to a specific node, but forcing some kind of data locality. There are ways in CQL to do it: you define a composite key (K1, K2); the K1 part is used as the row key, and K2 is used within the column name. So, all rows with the same K1 are on the same node. See also https://issues.apache.org/jira/browse/CASSANDRA-5054 Dominique

From: Everton Lima [mailto:peitin.inu...@gmail.com]
Sent: Wednesday, January 2, 2013 7:20 PM
To: user@cassandra.apache.org
Subject: Re: Force data to a specific node
We need to do this to minimize the network I/O. We have our own load balancing algorithm. We have some data that is best processed on a local machine. Is it possible? How?

2013/1/2 Edward Sargisson <edward.sargis...@globalrelay.net>
Why would you want to?

From: Everton Lima <peitin.inu...@gmail.com>
To: Cassandra-User <user@cassandra.apache.org>
Sent: Wed Jan 02 18:03:49 2013
Subject: Force data to a specific node
Is it possible to force data to stay on a specific node?
--
Everton Lima Aleixo
Bachelor in Computer Science (UFG), Master's student in Computer Science (UFG), Programmer at LUPA

--
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Master's student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
RE: what happens while node is bootstrapping?
From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Tuesday, October 16, 2012 5:04 PM
To: user@cassandra.apache.org
Subject: Re: what happens while node is bootstrapping?

On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
"Does it mean that during the bootstrapping process only replicas serve read requests for the new node's range? In other words, the replication factor is RF-1?"

No. The bootstrapping node will receive writes for its new range while bootstrapping, as a consistency optimization (more or less), but does not contribute to the replication factor or consistency level; all of the original replicas for that range still receive writes, serve reads, and are the nodes that count for consistency level. Basically, the bootstrapping node has no effect on the existing replicas in terms of RF or CL until the bootstrap completes.
--
Tyler Hobbs
DataStax

For the purposes of the consistency optimization, I would have written that not only the (new) bootstrapping node should receive writes, but also its own replicas (!).

In the case of SimpleStrategy, it's obvious that the new node's replicas are included among the original replicas. So, it's valid to say "The bootstrapping node will receive writes for its new range while bootstrapping as a consistency optimization" without mentioning the new node's replicas.

In the case of NetworkTopologyStrategy, after having played with some examples to support the following claim, I suspect the new node's replicas are also included among the original replicas. So, I am inclined to say it's valid too to say "The bootstrapping node will receive writes for its new range while bootstrapping as a consistency optimization" without mentioning the new node's replicas.

1) In the case of NetworkTopologyStrategy, for a bootstrap, is it correct to say that the new node's replicas are *always* included among the original replicas ? (I think Cassandra devs have already proved it).
2) In case of bootstrapping multiple nodes at the same time, the replicas of a new node are *not* always included among the original replicas. Is it a problem for Cassandra (and then, do we need to bootstrap nodes one by one ?), or is Cassandra able to detect that multiple nodes are bootstrapping and to deal with it to fetch data from the right nodes ? Thanks. Regards, Dominique
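For the SimpleStrategy case, the containment claim above can be checked with a toy ring walk. This is a hedged sketch only; `ReplicaWalk` and its helpers are illustrative names, not Cassandra code:

```java
import java.util.*;

public class ReplicaWalk {
    // SimpleStrategy placement: walk the ring clockwise from the key's token
    // and take the first rf distinct nodes.
    static List<String> replicas(TreeMap<Integer, String> ring, int keyToken, int rf) {
        List<String> result = new ArrayList<>();
        for (String node : ring.tailMap(keyToken).values()) {
            if (result.size() < rf) result.add(node);
        }
        for (String node : ring.values()) { // wrap around the ring
            if (result.size() < rf && !result.contains(node)) result.add(node);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> before = new TreeMap<>(Map.of(0, "A", 100, "B", 200, "C", 300, "D"));
        TreeMap<Integer, String> after = new TreeMap<>(before);
        after.put(80, "N"); // bootstrapping node

        List<String> orig = replicas(before, 50, 2); // [B, C]
        List<String> next = replicas(after, 50, 2);  // [N, B]

        // Apart from the new node itself, the post-bootstrap replicas are a
        // subset of the original ones, supporting the containment claim.
        List<String> others = new ArrayList<>(next);
        others.remove("N");
        System.out.println(orig.containsAll(others)); // prints true
    }
}
```

With several bootstrapping nodes inserted at once into `after`, the same check can indeed fail, which is the point of question 2.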
what happens while node is decommissioning ?
From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Tuesday, October 16, 2012 5:04 PM
To: user@cassandra.apache.org
Subject: Re: what happens while node is bootstrapping?

On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
"Does it mean that during the bootstrapping process only replicas serve read requests for the new node's range? In other words, the replication factor is RF-1?"

No. The bootstrapping node will receive writes for its new range while bootstrapping, as a consistency optimization (more or less), but does not contribute to the replication factor or consistency level; all of the original replicas for that range still receive writes, serve reads, and are the nodes that count for consistency level. Basically, the bootstrapping node has no effect on the existing replicas in terms of RF or CL until the bootstrap completes.
--
Tyler Hobbs
DataStax

Is it symmetric for decommission ? Well, is it correct that:
- during a decommission, all of the original replicas for that range still receive writes, serve reads, and are the nodes that count for consistency level ?
- and so, basically, the decommissioning node has no effect on the existing replicas in terms of RF or CL until the end of the decommission ?
- as a consistency optimization, all the new replicas will also receive the writes ?

Thanks.
Regards,
Dominique
question about config leading to an unbalanced ring
Hi,

Let's imagine a cluster of 6 nodes, 5 on rack1 and 1 on rack2, with RF=3 and NetworkTopologyStrategy. "The first replica per data center is placed according to the partitioner (same as with SimpleStrategy). Additional replicas in the same data center are then determined by walking the ring clockwise until a node in a different rack from the previous replica is found."

So, if I understand correctly, the data of rack1's 5 nodes will be replicated on the single node of rack2. And then, the node of rack2 will host all the data of the cluster. Is that correct?

Thanks.
Best regards,
Dominique
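The quoted placement rule can be simulated on this 6-node layout. This is a simplified single-DC sketch of the rack-aware walk as described in the quoted doc, with illustrative names only (not Cassandra's actual NetworkTopologyStrategy code):

```java
import java.util.ArrayList;
import java.util.List;

public class RackAwareWalk {
    // Simplified walk per the quoted doc: first replica per the partitioner,
    // then each further replica is the next node clockwise whose rack differs
    // from the previous replica's rack (falling back to a same-rack node only
    // if no other rack is reachable).
    static List<Integer> replicas(int[] rackOf, int firstNode, int rf) {
        int n = rackOf.length;
        List<Integer> picked = new ArrayList<>();
        picked.add(firstNode);
        int pos = firstNode;
        while (picked.size() < rf) {
            int prevRack = rackOf[picked.get(picked.size() - 1)];
            int candidate = -1, fallback = -1;
            for (int s = 1; s <= n; s++) {
                int j = (pos + s) % n;
                if (picked.contains(j)) continue;
                if (rackOf[j] != prevRack) { candidate = j; break; }
                if (fallback == -1) fallback = j;
            }
            candidate = (candidate != -1) ? candidate : fallback;
            picked.add(candidate);
            pos = candidate;
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] rackOf = {1, 1, 1, 1, 1, 2}; // nodes 0..4 on rack1, node 5 on rack2
        boolean rack2Everywhere = true;
        for (int first = 0; first < rackOf.length; first++) {
            rack2Everywhere &= replicas(rackOf, first, 3).contains(5);
        }
        // with this walk, the lone rack2 node ends up in every replica set,
        // i.e. it hosts a copy of all the data of the cluster
        System.out.println(rack2Everywhere); // prints true
    }
}
```

Under this simplified model the simulation supports the concern in the question: the single rack2 node becomes a replica for every range.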
what about an hybrid partitioner for CF with composite row key ?
Hi,

We have defined a CF with a composite row key that looks like (folder id, doc id). For our app, one very common pattern is accessing, through one UI action, a bunch of data with the following row keys: (id, id_1), (id, id_2), (id, id_3)... So, multiple rows are accessed, but all row keys share the same 1st part, the folder id.

* with the BOP: for one UI action, a single node is requested (on average), but it's much harder to balance the cluster nodes
* with the RP: for one UI action, many nodes may be requested, but it's simpler to balance the cluster

One sweeter(?) partitioner would be one that distributes a row according only to the first part of its key (= according to the folder id only). Is it doable to implement such a partitioner ?

Thanks.
Regards,
Dominique
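Such a partitioner would hash only the first composite component. Here is a minimal sketch of the idea; it is illustrative only (a real implementation would plug into Cassandra's IPartitioner interface, and the class and method names here are hypothetical):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FirstPartToken {
    // Token computed from the first composite component (folder id) only,
    // so every (folderId, docId) row of a given folder maps to one node,
    // while folder ids themselves stay randomly distributed across the ring.
    static BigInteger token(String folderId, String docId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // docId is deliberately ignored: only folderId drives placement
            byte[] digest = md5.digest(folderId.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // all rows of folder "id" get the same token, regardless of doc id
        System.out.println(token("id", "id_1").equals(token("id", "id_2"))); // prints true
    }
}
```

This gives BOP-like locality per folder with RP-like balancing across folders, which is essentially the trade-off the question is after.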
question about updates internal work in case of cache
Hi,

Let's suppose a column (name+value) is cached in memory, with timestamp T.

1) An update for this column arrives with exactly the *same* timestamp, and the *same* value. Is the commitlog updated ?
2) An update for this column arrives with a timestamp < T. Is the commitlog updated ?

Thanks for your help.
Regards,
Dominique
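For context, here is a hedged sketch of the last-write-wins reconciliation rule as commonly described for Cassandra: the higher timestamp wins, and on a timestamp tie the greater value wins. Note that, independently of reconciliation, each mutation reaching a node is appended to the commitlog before it touches the memtable, since Cassandra does no read-before-write. `Reconcile` is an illustrative name, not Cassandra code:

```java
public class Reconcile {
    // Last-write-wins reconciliation (sketch): higher timestamp wins;
    // on a tie, the lexically greater value wins, so the outcome is
    // deterministic regardless of arrival order.
    static String reconcile(long ts1, String v1, long ts2, String v2) {
        if (ts1 != ts2) return ts1 > ts2 ? v1 : v2;
        return v1.compareTo(v2) >= 0 ? v1 : v2;
    }

    public static void main(String[] args) {
        // same timestamp, same value: the stored column is unchanged
        System.out.println(reconcile(10, "a", 10, "a")); // prints a
    }
}
```

So in both cases the visible column may not change, but the question of whether the commitlog is appended is separate from reconciliation.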
some tests with Composite (Composite as column names=OK, as row keys=KO)
Hi,

I have tried a few experiments with Composite (first as column names, and next as row keys). I have followed the paths described in http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1

My composite is (UTF8, UTF8): (folderId, filename). And I have inserted, for all tests, the following data: ("FID", "AT2"), ("FID", "BT2")... ("FID", "FT2")... ("FID", "ZT2"). I am using Hector + Cassandra 1.0.7 with the ByteOrderedPartitioner.

a) using a Composite for column names

See [a] for the CF definition and the algorithms used. All data is in 1 row, with key "1", holding the columns with the following Composite names: ("FID", "AT2"), ("FID", "BT2")... ("FID", "FT2")... ("FID", "ZT2"). I wanted to get only the filenames (2nd component of the Composite) starting with "FT". If the algorithm is correct, I should get one single result: "FT2".

I used a SliceQuery following the DataStax algo above: OK. I also used a RangeSlicesQuery following the DataStax algo above: OK. In both cases, I got only one result: "FT2". Perfect.

b) using a Composite for row keys

See [b] for the CF definition and the algorithms used. All data is in rows with the following Composite row keys: ("FID", "AT2"), ("FID", "BT2")... ("FID", "FT2")... ("FID", "ZT2"). I wanted to get only the filenames (2nd component of the Composite) starting with "FT". If the algorithm is correct, I should get one single result: "FT2".

I used a RangeSlicesQuery following the DataStax algo above: KO. Instead of getting a single result "FT2", I got 6 items: "AT2", "BT2", "CT2", "DT2", "ET2", "FT2".

Well, I am still wondering why I didn't get the same result with Composite as row keys as with Composite as column names. Does anyone have an idea ? And a solution for fetching all the rows with a key like ("FID", "FT*") - that is, the 1st component is fixed to "FID", and the 2nd component is expected to start with "FT" ?

Thanks.
Regards,
Dominique

[a] create column family CF_COL
    with comparator='CompositeType(UTF8Type, UTF8Type)'
    and default_validation_class=UTF8Type
    and key_validation_class=UTF8Type;

[a.1] using SliceQuery

SliceQuery<String, Composite, String> q = hector.createSliceQuery(getKeyspace(), stringSerializer, new CompositeSerializer(), stringSerializer);
q.setColumnFamily("CF_COL");
Composite startRange = new Composite();
startRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
startRange.addComponent(1, "FT", AbstractComposite.ComponentEquality.EQUAL);
Composite endRange = new Composite();
endRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
endRange.addComponent(1, "FT" + Character.MAX_VALUE, AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL);
q.setKey("1");
q.setRange(startRange, endRange, false, 1000);

[a.2] using RangeSlicesQuery

RangeSlicesQuery<String, Composite, String> q = hector.createRangeSlicesQuery(getKeyspace(), stringSerializer, new CompositeSerializer(), stringSerializer);
q.setColumnFamily("CF_COL");
Composite startRange = new Composite();
startRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
startRange.addComponent(1, "FT", AbstractComposite.ComponentEquality.EQUAL);
Composite endRange = new Composite();
endRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
endRange.addComponent(1, "FT" + Character.MAX_VALUE, AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL);
q.setKeys("1", "1");
q.setRange(startRange, endRange, false, 1000);
q.setRowCount(1);

[b] create column family CF_ROW
    with comparator=UTF8Type
    and default_validation_class=UTF8Type
    and key_validation_class='CompositeType(UTF8Type, UTF8Type)';

RangeSlicesQuery<Composite, String, String> q = hector.createRangeSlicesQuery(getKeyspace(), new CompositeSerializer(), stringSerializer, stringSerializer);
q.setColumnFamily("CF_ROW");
Composite startRange = new Composite();
startRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
startRange.addComponent(1, "FT", AbstractComposite.ComponentEquality.EQUAL);
Composite endRange = new Composite();
endRange.addComponent(0, "FID", AbstractComposite.ComponentEquality.EQUAL);
is there a no disk storage mode ?
Hi,

I want to use Cassandra for (fast) unit testing with a small amount of data. So, I imagined the embedded Cassandra server I plan to use would start faster, and would be more portable (no file paths depending on the OS), without disk storage (so, diskless if you want).

Is there some no-disk storage mode for Cassandra ?

Thanks.
Regards,
Dominique