Re: Realtime data and (C)AP

2015-10-09 Thread Graham Sanderson
Most of our writes are not user facing so local_quorum is good... We also read at local_quorum because we prefer guaranteed consistency... But we very quickly fall back to local_one in the cases where some data fast is better than a failure. Currently we do that on a per read basis but we could
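The per-read fallback described above can be sketched roughly as follows. This is a minimal, driver-independent illustration, not the poster's actual code: `read_with_fallback`, `ReadFailed`, and the `read_at` callable are hypothetical names, and a real application would catch its driver's timeout and unavailable errors instead.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class ReadFailed(Exception):
    """Hypothetical stand-in for a driver timeout or unavailable error."""


def read_with_fallback(
    read_at: Callable[[str], T],
    levels: Sequence[str] = ("LOCAL_QUORUM", "LOCAL_ONE"),
) -> Tuple[T, str]:
    """Try each consistency level in order; return (result, level that succeeded)."""
    last_error = None
    for level in levels:
        try:
            return read_at(level), level
        except ReadFailed as exc:
            # Remember the failure and retry at the next, weaker level.
            last_error = exc
    raise ReadFailed("all levels failed: %s" % list(levels)) from last_error


def flaky_read(level: str) -> str:
    """Simulated read (assumption): quorum times out, a single replica answers."""
    if level == "LOCAL_QUORUM":
        raise ReadFailed("quorum not reached in time")
    return "row-data"


result, used = read_with_fallback(flaky_read)
```

Returning the level that actually succeeded lets the caller know when the data came from the weaker read and may be stale.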

Re: Realtime data and (C)AP

2015-10-09 Thread Graham Sanderson
Actually maybe I'll open a JIRA issue for a (local)quorum_or_one consistency level... It should be trivial to implement on the server side with existing timeouts... I'll need to check the CQL protocol to see if there is a good place to indicate you didn't reach quorum (in time)

Post mortem of a large Cassandra datacenter migration.

2015-10-09 Thread Kevin Burton
We just finished up a pretty large migration of about 30 Cassandra boxes to a new datacenter. We'll be migrating to about 60 boxes here in the next month so scalability (and being able to do so cleanly) is important. We also completed an Elasticsearch migration at the same time. The ES

Re: CLUSTERING ORDER BY importance with ssd's

2015-10-09 Thread Nate McCall
> If I am selecting a range from the bottom of the partition, does it make much of a difference (considering I only use ssd's) if the clustering order is ASC or DESC.
The only impact is that there is an extra seek to the bottom of the partition.

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Nazario Parsacala
So the trace is varying a lot, and it does not seem to correlate with the data returned to the client. Maybe it is DataStax Java driver related (not likely)? Just check out the results. Below is the one that I took when, from the client (Java application) perspective, it was returning data in

Re: CLUSTERING ORDER BY importance with ssd's

2015-10-09 Thread Ricardo Sancho
this probably depends on the number of rows we have but should one worry performance wise about this seek? or from how many rows should we worry about this? On 9 October 2015 at 21:26, Nate McCall wrote: > >> If I am selecting a range from the bottom of the partition,

Re: Is replication possible with already existing data?

2015-10-09 Thread anuja jain
Hi Ajay, On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg wrote: > On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg wrote: > > Thanks Eric for the reply. > > > > > > On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens wrote: > >> If you're at 1

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Jonathan Haddad
I'd be curious to see GC logs. jstat -gccause On Fri, Oct 9, 2015 at 2:16 PM Tyler Hobbs wrote: > Hmm, it seems off to me that the merge step is taking 1 to 2 seconds, > especially when there are only ~500 cells from one sstable and the > memtables. Can you open a ticket

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Nazario Parsacala
I will send the jstat output later. I have created the ticket: https://issues.apache.org/jira/browse/CASSANDRA-10502 > On Oct 9, 2015, at 5:20 PM, Jonathan Haddad wrote: > > I'd be curious to see GC logs. > > jstat -gccause > > On Fri, Oct 9, 2015 at 2:16 PM Tyler

Re: Realtime data and (C)AP

2015-10-09 Thread Steve Robenalt
Hi Brice, I agree with your nit-picky comment, particularly with respect to the OP's emphasis, but there are many cases where read at ONE is sufficient and performance is "better enough" to justify the possibility of a wrong result. As with anything Cassandra, it's highly dependent on the nature

OpsCenter issue with DCE 2.1.9

2015-10-09 Thread Kai Wang
Hi, OpsCenter/Agent works sporadically for me. I am testing with DCE 2.1.9 on Win7 x64. I seem to have narrowed it down to the following log messages. When it works: INFO [Initialization] 2015-10-01 08:49:02,016 New JMX connection ( 127.0.0.1:7199) ERROR [Initialization] 2015-10-01 08:49:02,344 Error

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Tyler Hobbs
Hmm, it seems off to me that the merge step is taking 1 to 2 seconds, especially when there are only ~500 cells from one sstable and the memtables. Can you open a ticket ( https://issues.apache.org/jira/browse/CASSANDRA) with your schema, details on your data layout, and these traces? On Fri,

Re: Realtime data and (C)AP

2015-10-09 Thread Steve Robenalt
Hi Graham, I've used the Java driver's DowngradingConsistencyRetryPolicy for that in cases where it makes sense. Ref: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html Steve On Fri, Oct 9, 2015 at 6:06 PM, Graham Sanderson

Re: compaction with LCS

2015-10-09 Thread Anishek Agarwal
Looks like some of the nodes have a higher number of sstables on L0 and compaction is running there, so only a few nodes run compaction at a time and preference is given to lower level nodes for compaction before going to higher levels? So is compaction cluster-aware then? On Fri, Oct 9, 2015 at 5:17

compaction with LCS

2015-10-09 Thread Anishek Agarwal
Hello, on running cfstats for the column family I see SSTables in each level: [1, 10, 109/100, 1, 0, 0, 0, 0, 0]. I thought compaction would trigger since the 3rd level has more tables than the expected number, but compactionstats shows "n/a" -- any reason why it's not triggering, should
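As a rough illustration of reading that cfstats line, the sketch below parses the per-level list and flags levels whose count exceeds the limit shown. It assumes, as the post does, that an entry like `109/100` means 109 sstables against an expected maximum of 100; `parse_levels` and `over_limit` are hypothetical helper names, not part of nodetool.

```python
def parse_levels(text):
    """Parse '[1, 10, 109/100, ...]' into (level, count, limit) tuples.

    limit is None when the entry shows only a count.
    """
    entries = []
    for level, item in enumerate(text.strip().strip("[]").split(",")):
        item = item.strip()
        if "/" in item:
            count, limit = (int(part) for part in item.split("/"))
        else:
            count, limit = int(item), None
        entries.append((level, count, limit))
    return entries


def over_limit(entries):
    """Return only the levels whose count exceeds the displayed limit."""
    return [
        (level, count, limit)
        for level, count, limit in entries
        if limit is not None and count > limit
    ]


levels = parse_levels("[1, 10, 109/100, 1, 0, 0, 0, 0, 0]")
flagged = over_limit(levels)
```

Levels are numbered from zero here, so the third entry in the list is reported as level 2.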

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-09 Thread sai krishnam raju potturi
Thanks Jonathan. I see an advantage in doing it one AZ or rack at a time. On Thu, Oct 8, 2015 at 6:41 PM, Jonathan Haddad wrote: > My hunch is the bigger your cluster the less impact it will have, as each > node takes part in smaller and smaller % of total queries.

Re: Spark and intermediate results

2015-10-09 Thread Jonathan Haddad
You can run spark against your Cassandra data directly without using a shared filesystem. https://github.com/datastax/spark-cassandra-connector On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) < mvallemil...@bloomberg.net> wrote: > Hello, > > I saw this nice link from an event:

Re: Spark and intermediate results

2015-10-09 Thread Marcelo Valle (BLOOMBERG/ LONDON)
I know the connector, but having the connector only means it will take *input* data from Cassandra, right? What about intermediate results? If it stores intermediate results on Cassandra, could you please clarify how data locality is handled? Will it store in other keyspace? I could not find

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Nazario Parsacala
So I upgraded to 2.2.2 and changed the compaction strategy from DateTieredCompactionStrategy to LeveledCompactionStrategy. But the problem still exists. At the start we were getting responses of around 80 to a couple of hundred ms. But after 1.5 hours of running, it is now hitting 1447 ms. I

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Tyler Hobbs
That looks like CASSANDRA-10478, which will probably result in 2.2.3 being released shortly. I'm not sure how that affects performance, but as mentioned in the ticket, you can add "disk_access_mode: standard" to cassandra.yaml to avoid it.

RE: Why can't nodetool status include a hostname?

2015-10-09 Thread SEAN_R_DURITY
I ended up writing some of my own utilities and aliases to make output more useful for me (and reduce some typing, too). Resolving host names was a big one for me, too. IP addresses are almost useless. Up time in seconds is useless. The -r in nodetool is a nice addition, but I like the short

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Carlos Alonso
Yeah, I was about to suggest the compaction strategy too. Leveled compaction sounds like a better fit when records are being updated. Carlos Alonso | Software Engineer | @calonso On 8 October 2015 at 22:35, Tyler Hobbs wrote: > Upgrade to 2.2.2.

Spark and intermediate results

2015-10-09 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Hello, I saw this nice link from an event: http://www.datastax.com/dev/blog/zen-art-spark-maintenance?mkt_tok=3RkMMJWWfF9wsRogvqzIZKXonjHpfsX56%2B8uX6GylMI%2F0ER3fOvrPUfGjI4GTcdmI%2BSLDwEYGJlv6SgFSrXMMblswLgIXBY%3D I would like to test using Spark to perform some operations on a column family,

Re: Cassandra query degradation with high frequency updated tables.

2015-10-09 Thread Nazario Parsacala
Compaction did not help either. > On Oct 9, 2015, at 1:01 PM, Nazario Parsacala wrote: > > So I upgraded to 2.2.2 and change the compaction strategy from > DateTieredCompactionStrategy to LeveledCompactionStrategy. But the problem > still exists. > At the start we were

Re: Spark and intermediate results

2015-10-09 Thread karthik prasad
Spark's core module uses this connector to read data from Cassandra and create RDDs or DataFrames in its workspace (in memory or on disk, depending on the Spark configuration). Then transformations or queries are applied on RDDs or DataFrames respectively. The end results are stored back into

CLUSTERING ORDER BY importance with ssd's

2015-10-09 Thread Ricardo Sancho
If I have a table CREATE TABLE status ( user text, time timestamp, status text, PRIMARY KEY (user, time)) WITH CLUSTERING ORDER BY (time ASC); adapted from http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1 This means at the top of the partition the oldest date
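To make the question concrete, here is a toy model of one partition of the `status` table, with plain Python lists standing in for the on-disk row order. It only illustrates where the newest rows sit under each clustering order and says nothing about real seek costs; the sample data and the `newest` helper are invented for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2015, 10, 9)

# One partition (a single user), stored oldest-first as with CLUSTERING ORDER BY (time ASC).
rows_asc = [(start + timedelta(minutes=i), "status-%d" % i) for i in range(5)]

# The same rows stored newest-first, as with CLUSTERING ORDER BY (time DESC).
rows_desc = list(reversed(rows_asc))


def newest(rows, n, order):
    """Return the n newest rows, newest first, from rows stored in the given order."""
    if order == "ASC":
        # Newest rows sit at the bottom of the partition: skip to the end, read backwards.
        return list(reversed(rows[-n:]))
    # DESC: newest rows sit at the top and are read directly.
    return rows[:n]


latest_from_asc = newest(rows_asc, 2, "ASC")
latest_from_desc = newest(rows_desc, 2, "DESC")
```

Both calls return the same two rows; only the position they are read from differs.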

Re: Realtime data and (C)AP

2015-10-09 Thread Brice Dutheil
On Fri, Oct 9, 2015 at 2:27 AM, Steve Robenalt wrote:
> In general, if you write at QUORUM and read at ONE (or LOCAL variants thereof if you have multiple data centers), your apps will work well despite the theoretical consistency issues.
Nit-picky comment: if

SSTableWriter error: incorrect row data size

2015-10-09 Thread Eiti Kimura
Hello Guys, Have a cluster with 6 nodes using Cassandra 1.2. I have my Keyspace and tables created using thrift cassandra-cli. Now I just created a new table using cqlsh as follows: CREATE TABLE idx_conf ( conf_id int, ref_id text, subs_key text, data text, enabled boolean,