OOM and high SSTables count

2015-03-04 Thread Roni Balthazar
Hi there, We are running a C* 2.1.3 cluster with 2 DataCenters: DC1: 30 Servers / DC2: 10 Servers. DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. DC1 nodes have about 1.4TB of data and DC2 nodes 2.3TB. DC2 is used only for backup purposes. There are no

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread Mikhail Strebkov
We have observed the same issue in our production Cassandra cluster (5 nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to realize we shouldn’t use 2.1.x yet) on Amazon machines (created from the community AMI). In addition to count variations of 5 to 10% we observe

Re: OOM and high SSTables count

2015-03-04 Thread daemeon reiydelle
Are you finding a correlation between the shards on the OOM DC1 nodes and the OOM DC2 nodes? Does your monitoring tool indicate that the DC1 nodes are using significantly more CPU (and memory) than the nodes that are NOT failing? I am leading you down the path to suspect that your sharding is

Write timeout under load but Read is fine

2015-03-04 Thread Jaydeep Chovatia
Hi, In my test program, when I increase the load I keep getting a few write timeouts from Cassandra, say every 10~15 mins. My read:write ratio is 50:50. My reads are fine; only writes time out. Here are my Cassandra details: Version: 2.0.11. Ring of 3 nodes with RF=3. Node configuration: 24 core +

Re: OOM and high SSTables count

2015-03-04 Thread Patrick McFadin
What kind of disks are you running here? Are you getting a lot of GC before the OOM? Patrick On Wed, Mar 4, 2015 at 9:26 AM, Jan cne...@yahoo.com wrote: HI Roni; You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. Best practices would

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread Jens Rantil
Frens, What consistency are you querying with? It could be that you are simply receiving results from different nodes each time. Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov streb...@gmail.com wrote: We have observed the same issue in our production Cassandra
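
As background to the consistency question above: in a single datacenter, a read is only guaranteed to reach a replica that holds the latest acknowledged write when the read and write replica counts together exceed the replication factor. A minimal Python sketch of that arithmetic follows. The function names are hypothetical and not part of any Cassandra driver, and only the ONE, QUORUM and ALL levels are modelled.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single-DC assumption)."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"level not modelled in this sketch: {level}")


def reads_overlap_writes(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    for rf, read, write in [(3, "ONE", "ONE"), (3, "QUORUM", "QUORUM"), (1, "ONE", "ONE")]:
        print(rf, read, write, reads_overlap_writes(read, write, rf))
```

With a replication factor of 1 there is only one replica per partition, so the overlap condition holds at every level.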

Re: Write timeout under load but Read is fine

2015-03-04 Thread Jan
Hi Jaydeep; - look at the I/O on all three nodes - increase the write_request_timeout_in_ms: 1 - check the time-outs, if any, on the client inserting the writes - check the network for dropped/lost packets. Hope this helps. Jan/ On Wednesday, March 4,

Re: cassandra node jvm stall intermittently

2015-03-04 Thread Jan
Hi Jason; What's in the log files at the moment jstat shows 100%? What is the activity on the cluster and the node at that specific point in time (reads, writes, joins, etc.)? Jan/ On Wednesday, March 4, 2015 5:59 AM, Jason Wee peich...@gmail.com wrote: Hi, our cassandra node is using java 7

Re: Streaming failures during bulkloading data using CqlBulkOutputFormat

2015-03-04 Thread Yuki Morishita
Do you have a corresponding error on the other side of the stream (/192.168.56.11)? On Wed, Mar 4, 2015 at 9:11 AM, Aby Kuruvilla aby.kuruvi...@envisagesystems.com wrote: I am trying to use the CqlBulkOutputFormat in a Hadoop job to bulk load data into Cassandra. Was not able to find any

Re: Input/Output Error

2015-03-04 Thread Jens Rantil
Hi, Check your Cassandra and kernel (if on Linux) log files for errors. Cheers, Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 2:18 AM, 曹志富 cao.zh...@gmail.com wrote: Sometimes this error occurs during compaction or streaming on my C* 2.1.3 cluster. Is this because of the disk or filesystem

Re: OOM and high SSTables count

2015-03-04 Thread graham sanderson
We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously did not match our production ones in some critical way) We have about 20k sstables on each of 6 nodes right now; actually a quick glance shows 15k of those are from OpsCenter, which may have something to do with

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread DuyHai Doan
Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... to yield inconsistent results between executions even though the table at hand isn't written to? Actually, depending on the definition of your primary key, select count(*) and select distinct

Re: Howto remove currently assigned data directory from 2.0.12 nodes

2015-03-04 Thread Robert Coli
On Wed, Mar 4, 2015 at 3:28 PM, Steffen Winther cassandra.u...@siimnet.dk wrote: Howto remove already assigned data file directories from running nodes? 1) stop node 2) move sstables from no-longer-data-directories into still-data-directories 3) modify conf file 4) start node I wonder how
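
Step 2 of the procedure above (moving sstables out of a retired data directory while the node is stopped) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a tested operational tool: the function name `move_sstables` and the directory layout are hypothetical, the sketch assumes the keyspace/table subdirectory structure should be preserved, and it refuses to overwrite any file that already exists in the target.

```python
import shutil
from pathlib import Path


def move_sstables(retired_dir, target_dir, dry_run=True):
    """Plan (and optionally perform) moving every file under retired_dir
    to the same relative path under target_dir.

    Intended to run only while the node is stopped. Returns the list of
    (source, destination) pairs. Raises FileExistsError on a name collision.
    """
    retired, target = Path(retired_dir), Path(target_dir)
    planned = []
    for src in sorted(p for p in retired.rglob("*") if p.is_file()):
        dst = target / src.relative_to(retired)
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        planned.append((src, dst))
    if not dry_run:
        for src, dst in planned:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return planned
```

Calling it first with the default `dry_run=True` only lists what would move, which allows the plan to be reviewed before anything is touched. The retired directory still has to be removed from the conf file (step 3) before the node is started again.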

Does it makes sense to split Gossip from Thrift network

2015-03-04 Thread Steffen Winther
Hi, Wondering if it makes sense to split the network for client traffic vs Gossip/internode traffic (possibly with a larger MTU for storage traffic). So I tried this: - Gossip storage listener (port 700x) on one network - Thrift/CQL listeners (port 9160/9042) on another. Only I find it a bit confusing

Re: Does it makes sense to split Gossip from Thrift network

2015-03-04 Thread Steffen Winther
daemeon reiydelle daemeonr at gmail.com writes: If your cluster is typical, your most critical resource is your network bandwidth, if this is the case, I would not do this split you are proposing. One issue with large MTU's is that they are often split at the switch fabric. Got control of my

java consuming lot of cpu with lots of futex calls

2015-03-04 Thread Steffen Winther
Hi, Trying to make a test lab workable with cassandra 1.2.15 nodes on CentOS 6.6 kernel 2.6.32-504.8.1.el6.x86_64 on top of KVM nodes. But I am finding java perf very poor; it seems the JVM is doing a lot of futex sys calls which time out, thus spinning a lot of CPU cycles. Tried with both Oracle java

Re: Howto remove currently assigned data directory from 2.0.12 nodes

2015-03-04 Thread Steffen Winther
Robert Coli rcoli at eventbrite.com writes: 1) stop node 2) move sstables from no-longer-data-directories into still-data-directories Okay, just into any other random data dir? Few files here and there to spread amount of data between still-data-dirs? 3) modify conf file 4) start node I

Howto remove currently assigned data directory from 2.0.12 nodes

2015-03-04 Thread Steffen Winther
Hi, Got a cassandra 2.0.12 cluster with three nodes whose storage capacity I would like to reduce, as I would like to reuse some disks for a PoC cassandra 1.2.15 cluster on the same nodes. Howto remove already assigned data file directories from running nodes? f.ex. got: data_file_directories :

Re: Input/Output Error

2015-03-04 Thread 曹志富
thanks! -- Ranger Tsao 2015-03-05 3:40 GMT+08:00 Jens Rantil jens.ran...@tink.se: Hi, Check your Cassandra and kernel (if on Linux) log files for errors. Cheers, Jens – Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 4, 2015 at

Re: OOM and high SSTables count

2015-03-04 Thread J. Ryan Earl
We think it is this bug: https://issues.apache.org/jira/browse/CASSANDRA-8860 We're rolling a patch to beta before rolling it into production. On Wed, Mar 4, 2015 at 4:12 PM, graham sanderson gra...@vast.com wrote: We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously did

Cassandra Dead but pid file exists

2015-03-04 Thread Mohit Garg
I am a novice to cassandra and tried my hand at installing cassandra-2.1.2 on centos 7.0. After completing the installation I executed the cqlsh command and created a few keyspace(s) and a column family, so at first glance it seemed to be working perfectly. But later on I realized the below issues: 1 when i

Re: Issue restarting cassandra with a cluster running Cassandra 1.2.x and Cassandra 2.0.x

2015-03-04 Thread Fabrice Facorat
Upgrading a node from 1.2.13 to 2.0.10 works correctly, and we did run upgradesstables on the new 2.0.x node. The issue lies with the other nodes still running Cassandra 1.2.x, which fail to start if you just restart the node. Here is the describecluster output during the upgrade

Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread Rumph, Frens Jan
Hi, Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... to yield inconsistent results between executions even though the table at hand isn't written to? I have a table in a keyspace with replication_factor = 1 which is something like: CREATE

cassandra node jvm stall intermittently

2015-03-04 Thread Jason Wee
Hi, our cassandra node is using java 7 update 72 and we ran jstat on one of the nodes, and noticed some strange behaviour as indicated by the output below. Any idea why eden space stays the same for a few seconds, like 100% and 18.02% for a few seconds? We suspect such stalling causes timeouts in our cluster.