Re: Re: Data export with consistency problem

2016-03-25 Thread xutom
Thanks for your reply! I am sorry for my poor English. My keyspace replication factor is 3 and the client read and write CL are both QUORUM. If we remove the network cable of one node, import 30 million rows of data into that table, and then reconnect the network cable, we export the data immediately and
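The scenario in this thread (RF=3, read and write CL both QUORUM, one node offline during the import) can be reasoned about with the standard replica-overlap rule: reads see the latest write whenever R + W > RF. A minimal sketch of that arithmetic, with illustrative function names (not driver code):

```python
def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation must reach."""
    return rf // 2 + 1

def reads_see_latest_write(rf: int, write_cl: int, read_cl: int) -> bool:
    """Strong consistency holds when any read replica set must
    intersect any write replica set, i.e. R + W > RF."""
    return read_cl + write_cl > rf

rf = 3
w = quorum(rf)  # 2
r = quorum(rf)  # 2
print(reads_see_latest_write(rf, w, r))  # True: 2 + 2 > 3
```

Even with one node disconnected during the import, each QUORUM write still reached 2 of the 3 replicas, so a subsequent QUORUM read must overlap at least one up-to-date replica; the disconnected node catches up later via hinted handoff or repair.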

Re: How is the coordinator node in LOCAL_QUORUM chosen?

2016-03-25 Thread Robert Coli
On Fri, Mar 25, 2016 at 1:04 PM, X. F. Li wrote: > Suppose I have replication factor 3. If one of the node fails, will > queries with ALL consistency fail if the queried partition is on the failed > node? Or would they continue to work with 2 replicas during the time while >

How is the coordinator node in LOCAL_QUORUM chosen?

2016-03-25 Thread X. F. Li
Hello, Local quorum works in the same data center as the coordinator node, but when an app server executes a write query, how is the coordinator node chosen? I use the node.js driver. How does the driver client determine which Cassandra nodes are in the same DC as the client node? Does it
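The DataStax drivers answer this with a DC-aware load-balancing policy: the coordinator is picked round-robin from hosts in the driver's configured local data center (in the node.js driver the local DC is typically set explicitly in the client options, not auto-detected). A simplified model of that selection logic, as a sketch rather than the driver's actual code:

```python
from itertools import cycle

class DCAwareRoundRobin:
    """Simplified model of a DC-aware round-robin load-balancing policy:
    coordinators are only ever chosen from the client's local data center."""
    def __init__(self, hosts, local_dc):
        # hosts: list of (address, datacenter) pairs, e.g. from cluster metadata
        self._local = cycle([h for h, dc in hosts if dc == local_dc])

    def next_coordinator(self):
        return next(self._local)

hosts = [("10.0.0.1", "DC1"), ("10.0.0.2", "DC1"), ("10.1.0.1", "DC2")]
policy = DCAwareRoundRobin(hosts, local_dc="DC1")
print(policy.next_coordinator())  # 10.0.0.1
print(policy.next_coordinator())  # 10.0.0.2
print(policy.next_coordinator())  # 10.0.0.1 (wraps around; DC2 is never chosen)
```

The host addresses and DC names above are made up for illustration; in a real client the host list comes from the cluster's gossip metadata.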

Re: What is the best way to model my time series?

2016-03-25 Thread K. Lawson
Sorry Gerard, I'm afraid I'm not familiar with that project. The time series I've described is a relatively minor component of an application which is already powered by Cassandra, so you can see why I'd prefer a viable way (which I'm quickly learning may not exist) to model it in Cassandra. On

Re: apache cassandra for trading system

2016-03-25 Thread Jonathan Haddad
You can use keyspaces with multiple data centers to get what you want. That said, if you're going to use only 1 node, I don't think Cassandra is the right fit for you. http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/ On Fri, Mar 25, 2016 at 11:09 AM Vero Kato

Re: apache cassandra for trading system

2016-03-25 Thread Russell Bradberry
One option could be to set up two data centers and have two separate keyspaces, one for today data and the other for historical data. You can write to the today_data keyspace with a TTL of 24 hours then write the same data to the historical_data keyspace. You then set up your replication to
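The dual-write pattern described here (same row into a TTL'd "today" keyspace and an untouched historical keyspace) can be sketched by building the two INSERT statements side by side. Keyspace, table, and column names below are illustrative, not from the thread:

```python
def dual_write_statements(table: str, ttl_seconds: int = 86400):
    """Build the pair of INSERTs for the dual-keyspace pattern: one into a
    short-lived 'today' keyspace with a 24h TTL, one into the historical
    keyspace with no TTL. All identifiers here are hypothetical examples."""
    cols = "(id, ts, payload)"
    vals = "(?, ?, ?)"
    today = f"INSERT INTO today_data.{table} {cols} VALUES {vals} USING TTL {ttl_seconds}"
    hist = f"INSERT INTO historical_data.{table} {cols} VALUES {vals}"
    return today, hist

today, hist = dual_write_statements("trades")
print(today)
print(hist)
```

The point of the pattern is that the two keyspaces can then carry different replication settings per data center, which is what the rest of this message goes on to describe.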

Re: How many nodes do we require

2016-03-25 Thread Jonathan Haddad
Why would using CL ONE make your cluster fragile? This isn't obvious to me. It's the most practical setting for high availability, which very much says "not fragile". On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet < jacques-henri.berthe...@genesys.com> wrote: > I found this calculator

apache cassandra for trading system

2016-03-25 Thread Vero Kato
hi, we are building a trading system and we want to use Cassandra as our database. We want this set-up: one node which stores one day's set of data, which will be running on the same host as the trading application; two nodes which store all data (for the current date and historical) which will be running

Re: What is the best way to model my time series?

2016-03-25 Thread Gerard Maas
Hi, It sounds to me like Apache Kafka would be a better fit for your requirements. Have you considered that option? kr, Gerard Datastax MVP for Apache Cassandra (so, I'm not suggesting other tech for any other reason than seeing it as a better fit) On Fri, Mar 25, 2016 at 1:31 PM, K. Lawson

Re: Understanding Cassandra tuning

2016-03-25 Thread Giampaolo Trapasso
Yes, RF=1 as https://gist.github.com/giampaolotrapasso/9f0242fc60144ada458c#file-stress-yaml and there you can also find my stress schema. In any case, with such a low throughput I think I should move to other types of AWS instances before repeating the test and investigating tuning further. Thanks,

Re: What is the best way to model my time series?

2016-03-25 Thread K. Lawson
Hi Jack, thanks for the interest in my inquiry. Let me see if I can answer your questions. 1. The growth rate of the time series is expected to be relatively constant throughout a given day, while processing is expected to be carried out in bursts, several times a day. 2. I'm not sure what you

RE: How many nodes do we require

2016-03-25 Thread Jacques-Henri Berthemet
I found this calculator very convenient: http://www.ecyrd.com/cassandracalculator/ Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM, RF=2 if you write/read at ONE. Obviously using ONE as CL makes your cluster very fragile. -- Jacques-Henri Berthemet -Original
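The RF/CL trade-off the calculator linked above walks through can be captured in a few lines: how many replicas must respond at a given consistency level, and therefore how many replica failures a request can survive. A sketch of that arithmetic (simplified to single-DC consistency levels):

```python
def replicas_required(cl: str, rf: int) -> int:
    """Replicas that must respond for a request at this CL to succeed."""
    if cl == "ONE":
        return 1
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unhandled consistency level: {cl}")

def survivable_failures(cl: str, rf: int) -> int:
    """How many replicas of a partition can be down while requests still succeed."""
    return rf - replicas_required(cl, rf)

print(survivable_failures("QUORUM", 3))  # 1: the point made in this thread
print(survivable_failures("ONE", 2))     # 1
print(survivable_failures("ALL", 3))     # 0
```

This also shows the "fragile" point debated in the thread: ONE maximizes availability per request, but with a low RF it leaves no margin for consistency once a replica diverges.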

Re: How many nodes do we require

2016-03-25 Thread Rakesh Kumar
On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky wrote: > It depends on how much data you have. A single node can store a lot of data, > but the more data you have the longer a repair or node replacement will > take. How long can you tolerate for a full repair or node

Re: Understanding Cassandra tuning

2016-03-25 Thread Jack Krupansky
Your IOPS and throughput seem to be below the AWS limits, but... I wonder if replication is doubling those numbers and then a little write amplification may then bump you into the AWS limits. See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html What RF are you
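The "replication is doubling those numbers" suspicion can be roughed out quickly: every client write lands on RF replicas, and compaction adds write amplification on top before you compare against the EBS volume limits. A back-of-the-envelope sketch, with the amplification factor as a stated assumption rather than a measured value:

```python
def server_side_iops(client_write_rate: float, rf: int,
                     write_amplification: float = 1.0) -> float:
    """Estimate cluster-wide write load: each client write is applied on RF
    replicas, and compaction multiplies it by some write-amplification factor.
    The 1.5x used below is an illustrative guess, not a benchmark result."""
    return client_write_rate * rf * write_amplification

# e.g. 5,000 client writes/s, RF=3, assumed 1.5x amplification from compaction
print(server_side_iops(5000, 3, 1.5))  # 22500.0
```

Comparing the resulting per-node share against the volume's provisioned IOPS (per the AWS EBS I/O characteristics page linked above) shows whether the bottleneck is plausibly the storage layer rather than Cassandra itself.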

Re: How many nodes do we require

2016-03-25 Thread Jack Krupansky
It depends on how much data you have. A single node can store a lot of data, but the more data you have the longer a repair or node replacement will take. How long can you tolerate for a full repair or node replacement? Generally, RF=3 is both sufficient and recommended. -- Jack Krupansky On

Understanding Cassandra tuning

2016-03-25 Thread Giampaolo Trapasso
Hi to all, I want to better understand Cassandra 2.2.5 tuning for my app (and C* tuning in general). In the app I'm developing, the typical scenario is the upload to the cluster of a large binary file (on the order of GBs). Long story short, after many failed tries to get a specific upload throughput

How many nodes do we require

2016-03-25 Thread Rakesh Kumar
We have two data centers. Our requirement is simple. Assuming that we have an equal number of nodes in each DC, we should be able to run with the loss of one DC and the loss of at most one node in the surviving DC. Can this be achieved with 6 nodes (3 in each)? Obviously for that all data must be
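The requirement can be checked mechanically: after losing one whole DC, reads and writes at LOCAL_QUORUM only depend on the replicas inside the surviving DC. A sketch of that check, assuming RF=3 per DC (so with 3 nodes per DC, every node holds all data):

```python
def local_quorum_ok(rf_per_dc: int, nodes_down_in_dc: int) -> bool:
    """Can LOCAL_QUORUM still be served in one DC after losing some of its
    replicas? Assumes every partition has rf_per_dc replicas in that DC."""
    alive = rf_per_dc - nodes_down_in_dc
    return alive >= rf_per_dc // 2 + 1

# 6 nodes total, RF=3 in each DC; the other DC is entirely down,
# then one more node fails in the surviving DC:
print(local_quorum_ok(rf_per_dc=3, nodes_down_in_dc=1))  # True: 2 of 3 remain
print(local_quorum_ok(rf_per_dc=3, nodes_down_in_dc=2))  # False
```

So 6 nodes with RF=3 per DC meets the stated requirement, provided the clients use LOCAL_QUORUM (an EACH_QUORUM write would fail as soon as the other DC is lost).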

Re: What is the best way to model my time series?

2016-03-25 Thread Jack Krupansky
Still trying to get a handle on the magnitude of the problem... 1. You said that the rate of growth is a max of a few hundred, but no mention of the rate of processing (removal). 2. Are these numbers per item or for all items? In any case, how many items are you anticipating? Ballpark - dozens,

RE: What is the best way to model my time series?

2016-03-25 Thread SEAN_R_DURITY
I think this one is better…

Re: What is the best way to model my time series?

2016-03-25 Thread K. Lawson
Sean, the link you have supplied does not seem to work. On Fri, Mar 25, 2016 at 9:43 AM, wrote: > You might take a look at this previous conversation on queue-type > applications and Cassandra. Generally this is an anti-pattern for a > distributed system like

RE: What is the best way to model my time series?

2016-03-25 Thread SEAN_R_DURITY
You might take a look at this previous conversation on queue-type applications and Cassandra. Generally this is an anti-pattern for a distributed system like Cassandra.

What is the best way to model my time series?

2016-03-25 Thread K. Lawson
While adhering to best practices, I am trying to model a time series in Cassandra that is compliant with the following access pattern directives: - Is to be both read and shrunk by a single party, grown by multiple parties - Is to be read as a queue (in other words, its entries, from first to
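The usual mitigation for the queue anti-pattern raised later in this thread is time bucketing: derive a composite partition key of (source, time bucket) so queue-style reads scan one bounded partition at a time, and fully consumed buckets are skipped entirely instead of being scanned past their tombstones. A sketch of the bucketing function, with names and bucket size chosen for illustration:

```python
from datetime import datetime, timezone

def bucket_key(source_id: str, ts: datetime, bucket_minutes: int = 60) -> tuple:
    """Composite partition key (source, hour bucket) for a queue-like series.
    Entries written in the same hour land in the same bounded partition."""
    epoch_min = int(ts.timestamp()) // 60
    return (source_id, epoch_min - epoch_min % bucket_minutes)

ts = datetime(2016, 3, 25, 10, 45, tzinfo=timezone.utc)
print(bucket_key("feed-1", ts))
```

This does not make Cassandra a queue, but it caps the tombstone cost of each read to one bucket, which is often enough when the queue is a minor component as described here.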

Re: Data export with consistency problem

2016-03-25 Thread Alain RODRIGUEZ
Hi Jerry, It is all a matter of replication server side and consistency level client side. The minimal setup to ensure availability and strong consistency is RF = 3 and CL = (LOCAL_)QUORUM. This way, one node can go down and you can still reach the 2 nodes needed to validate your reads & writes

Re: Scenarios when a node can be missing writes

2016-03-25 Thread Alain RODRIGUEZ
Hi, 1) and 2) I understand it the same way you do :-). > 3) Node is up and receives the write but is too overloaded to handle it > and drops the mutation. This should be visible in tpstats as dropped > mutation. Does the write still stay in the hinted handoff table of the > coordinator and if

Re: disk space used vs nodetool status

2016-03-25 Thread Alain RODRIGUEZ
Hi Anishek, they were created more than a couple of months ago You then probably freed a fair amount of data :-). We didn't do any actions that would create a snapshot You shouldn't have any snapshots unless you drop or truncate a table, call them through "nodetool snapshot" or run repair

Re: Pending compactions not going down on some nodes of the cluster

2016-03-25 Thread Alain RODRIGUEZ
Hi, Any improvement on this? 2 ideas coming to my mind: Yes, we are storing timeseries-like binary blobs where data is heavily > TTLed (essentially the entire column family is incrementally refreshed with > completely new data every few days) This looks to me like a good fit for TWCS

Re: datastax java driver Batch vs BatchStatement

2016-03-25 Thread Alexandre Dutra
Hi, Query builder's Batch simply sends a QUERY message through the wire where the query string is a CQL batch statement : "BEGIN BATCH ... APPLY BATCH". BatchStatement actually sends a BATCH message
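The distinction described here is about what goes over the wire: the query builder's Batch produces one CQL string sent as a QUERY message, while BatchStatement sends a BATCH protocol message carrying the statements individually. A toy Python model of the two shapes (the dict is a stand-in for the protocol message, not real driver internals):

```python
def as_cql_batch(statements):
    """What the query builder's Batch effectively sends: a single QUERY
    message whose body is one CQL batch string."""
    return "BEGIN BATCH " + "; ".join(statements) + "; APPLY BATCH"

def as_batch_message(statements):
    """What BatchStatement sends instead: a BATCH protocol message carrying
    each statement (and its values) separately, modelled here as a dict."""
    return {"opcode": "BATCH", "queries": list(statements)}

stmts = ["INSERT INTO t (k, v) VALUES (1, 'a')",
         "INSERT INTO t (k, v) VALUES (2, 'b')"]
print(as_cql_batch(stmts))
print(as_batch_message(stmts)["opcode"])  # BATCH
```

Practically, the BATCH message form is what lets the driver attach prepared statements and per-statement bound values, which a single concatenated CQL string cannot do.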

Re: cqlsh problem

2016-03-25 Thread Alain RODRIGUEZ
Hi Joseph. As I can't reproduce here, I believe you are having network issue of some kind. MacBook-Pro:~ alain$ cqlsh --version cqlsh 5.0.1 MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh --connect-timeout=5 --request-timeout=10 system_traces system MacBook-Pro:~ alain$ It's been a few