Re: Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-18 Thread daemeon reiydelle
Given you only have 16 columns vs. over 200 ... I would expect a substantial improvement in writes, but not 5x. Ditto reads. I would be interested to understand where that 5x comes from. On Thu, Feb 18, 2016

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Jan, thanks! It makes perfect sense to run a second time before stopping Cassandra. I'll add that in when I do the production cluster.
On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten wrote:
> Hi Branton,
>
> two cents from me - I didn't look through the script, but for the

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Here's what I ended up doing on a test cluster. It seemed to work well. I'm running a full repair on the production cluster, probably over the weekend, then I'll have a go at the test cluster again and go for broke.
# sync to temporary directory on original volume
rsync -azvuiP

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Jan Kesten
Hi Branton, two cents from me - I didn't look through the script, but for the rsyncs I do pretty much the same when moving them. Since they are immutable, I do a first sync to the new location while everything is up and running, which runs really long. Meanwhile new ones are created and I sync
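
The repeated-sync idea described above can be sketched as follows. This is only an illustration, not the script discussed in the thread: the paths, the helper names `rsync_pass` and `plan_passes`, and the choice of plain `rsync -a` are all assumptions, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def rsync_pass(src_dir: str, dst_dir: str) -> List[str]:
    """Build one rsync invocation that copies the contents of src_dir into dst_dir.

    The trailing slash on the source makes rsync copy the directory's contents
    rather than the directory itself. "-a" (archive mode) is an assumed flag set.
    """
    return ["rsync", "-a", src_dir.rstrip("/") + "/", dst_dir.rstrip("/") + "/"]


def plan_passes(src_dir: str, dst_dir: str, passes: int = 2) -> List[List[str]]:
    """Plan several identical passes; later passes only transfer files created since the previous one."""
    if passes < 1:
        raise ValueError("at least one pass is required")
    return [rsync_pass(src_dir, dst_dir) for _ in range(passes)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in plan_passes("/mnt/disk2/cassandra/data", "/mnt/disk1/cassandra/tmp-disk2"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because existing SSTables do not change once written, each later pass should have far less to copy than the first, which is what keeps the final pass short.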

Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-18 Thread Chandra Sekar KR
Hi, I'm looking for help in arriving at the pros & cons of using MAP, UDT & JSON (Text) data types in Cassandra, and their ease of use/impact across other DSE products - Spark & Solr. We are migrating an OLTP database from RDBMS to Cassandra which has 200+ columns and an average daily volume of

Re: High Bloom filter false ratio

2016-02-18 Thread Anishek Agarwal
Hey all, @Jaydeep here is the cfstats output from one node.
Read Count: 1721134722
Read Latency: 0.04268825050756254 ms.
Write Count: 56743880
Write Latency: 0.014650376727851532 ms.
Pending Tasks: 0
Table: user_stay_points
SSTable count: 1289
Space used (live), bytes: 122141272262
Space

Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-18 Thread Sotirios Delimanolis
We have a Cassandra cluster with 24 nodes. These nodes were running 2.0.16. While the nodes are in the ring and handling queries, we perform the upgrade to 2.1.12 as follows (more or less) one node at a time:
- Stop the Cassandra process
- Deploy jars, scripts, binaries, etc.
-

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Alain, thanks for sharing! I'm confused why you do so many repetitive rsyncs. Just being cautious or is there another reason? Also, why do you have --delete-before when you're copying data to a temp (assumed empty) directory? On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ

Re: High Bloom filter false ratio

2016-02-18 Thread daemeon reiydelle
The bloom filter buckets the values in a small number of buckets. I have been surprised by how many cases I see with large cardinality where a few values populate a given bloom leaf, resulting in high false positives, and a surprising impact on latencies! Are you seeing 2:1 ranges between mean

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Anuj Wadehra
What's the GC overhead? Can you share your GC collector and settings? What's your query pattern? Do you use secondary indexes, batches, IN clauses, etc.? Anuj On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner wrote: Alain, Thanks for the

Re: High Bloom filter false ratio

2016-02-18 Thread Tyler Hobbs
You can try slightly lowering the bloom_filter_fp_chance on your table. Otherwise, it's possible that you're repeatedly querying one or two partitions that always trigger a bloom filter false positive. You could try manually tracing a few queries on this table (for non-existent partitions) to
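
As a rough guide to what lowering `bloom_filter_fp_chance` costs, the sketch below uses the textbook Bloom filter sizing formulas (bits per key = -ln(p) / (ln 2)^2, hash count = log2(1/p)). It is an approximation under the assumption of an ideally sized filter; Cassandra's own sizing may differ, and the function names are illustrative.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized Bloom filter with the given false-positive chance."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_hash_count(fp_chance: float) -> int:
    """Number of hash functions that minimises false positives, rounded to a whole number."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return max(1, round(-math.log2(fp_chance)))


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, "
              f"{bloom_hash_count(p)} hash functions")
```

The point of the arithmetic is that each tenfold reduction in the false-positive chance adds a fixed amount of memory per key, so a slight lowering is cheap while an aggressive one is not.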

Re: „Using Timestamp“ Feature

2016-02-18 Thread Tyler Hobbs
2016-02-18 2:00 GMT-06:00 Matthias Niehoff:
> * is the 'using timestamp' feature (and providing statement timestamps)
> sufficiently robust and mature to build an application on?
Yes. It's been there since the start of CQL3.
> * In a BatchedStatement,
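
For readers unfamiliar with the feature, here is a minimal sketch of what a statement with a client-supplied timestamp looks like. It only builds the CQL text with bind markers and converts a time to microseconds since the epoch, which is the conventional unit for CQL write timestamps. The table name `ks.events`, the column names, and both helper functions are hypothetical; no driver is involved.

```python
import time
from typing import Sequence


def to_micros(epoch_seconds: float) -> int:
    """Convert seconds since the Unix epoch to microseconds."""
    return int(epoch_seconds * 1_000_000)


def insert_using_timestamp(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT with one bind marker per column plus one for the write timestamp."""
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TIMESTAMP ?"


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(insert_using_timestamp("ks.events", ["id", "payload"]))
    print("timestamp to bind:", to_micros(time.time()))
```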

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Mike Heffner
Alain, Thanks for the suggestions. Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the metrics across the ring, there were no blocked tasks nor dropped messages. Iowait metrics look fine, so it doesn't appear to be blocking on disk. Similarly, there are

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
Thanks a lot, Alain. We did rely on "unsafeassasinate" earlier, which worked. We were planning to upgrade from version 2.0.14 to 2.1.12 on all our clusters. But we are trying to figure out why decommissioned nodes are showing up in "nodetool describecluster" as "UNREACHABLE". thanks Sai On

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
Thank you, Ben. We are using Cassandra version 2.1.12. We did face the bug described in https://issues.apache.org/jira/browse/CASSANDRA-10371 in DSE 4.6.7, in another cluster. It's strange that we are seeing it even in Cassandra 2.1.12. The "nodetool describecluster" showing decommissioned

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Mike Heffner
Following up from our earlier post... We have continued to do exhaustive testing and measuring of the numerous hardware and configuration variables here. What we have uncovered is that on identical hardware (including the configuration we run in production), something between versions 2.0.17 and

Re: How Cassandra reduce the size of stored data ?

2016-02-18 Thread Alain RODRIGUEZ
I know of no paper, but here is some information that might be of interest: http://www.datastax.com/2015/12/storage-engine-30 Also, Cassandra uses standard compression (LZ4, Snappy, Deflate), depending on user choice - for data storage

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Alain RODRIGUEZ
I did the process a few weeks ago and ended up writing a runbook and a script. I have anonymised it and am sharing it, fwiw. https://github.com/arodrime/cassandra-tools/tree/master/remove_disk It is basic bash. I tried to have the shortest downtime possible, making this a bit more complex, but it allows

How Cassandra reduce the size of stored data ?

2016-02-18 Thread Thouraya TH
Hi all, Please, is there a scientific paper about the topic "How Cassandra reduces the size of data stored on nodes and exchanged between nodes"? Thank you so much for your help. Best Regards.

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Alain RODRIGUEZ
Hi Mike, What about the output of tpstats? I imagine you have dropped messages there. Any blocked threads? Could you paste this output here? Might this be due to some network hiccup in accessing the disks, as they are EBS? Can you think of any way of checking this? Do you have a lot of GC logs,

„Using Timestamp“ Feature

2016-02-18 Thread Matthias Niehoff
Hi, I have a few questions regarding the „Using timestamp“ feature. I would be glad if you can help me.
* is the 'using timestamp' feature (and providing statement timestamps) sufficiently robust and mature to build an application on?
* In a BatchedStatement, can different statements have