Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
I read that I shouldn't install a version ending in anything less than 6
(i.e. earlier than 2.1.6). But I started with 2.1.0 and then upgraded to 2.1.3.

As far as I know, I cannot downgrade it.

On Wed, Feb 25, 2015 at 12:05 PM, Carlos Rolo r...@pythian.com wrote:

 Your latency doesn't seem high enough to cause that problem. I suspect
 a problem with the Cassandra version (2.1.3) rather than with the
 hard drives. I didn't look deeply into the information provided, but for your
 reference, the only time I had serious issues (leading to OOM and all sorts of
 weird behavior) my hard drives were near 70ms latency.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Wed, Feb 25, 2015 at 11:19 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 I wrote some questions before about my problems with my C* cluster. My whole
 environment is described here:
 https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
 To sum up, I have thousands of SSTables in one DC and much, much fewer in
 the second. I write only to the first DC.

 Anyway, after reading a lot of posts/mails/Google results, I have started to
 think that the only reason for the above is disk problems.

 My OpsCenter with some stats is here:
 https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

 My iostats are like this:
 https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
 (dm-XX are the C* drives; dm-11 is for the commitlog)

 Could you be so kind as to validate the above and tell me whether my
 disks are the real problem or not? And could you give me a tip on what I
 should do with this cluster? Maybe I have a misconfiguration?

 Regards
 Piotrek




Re: how to scan all rows of cassandra using multiple threads

2015-02-25 Thread Clint Kelly
Hi Gaurav,

I recommend you just run a MapReduce job for this computation.

Alternatively, you can look at the code for the C* MapReduce input format:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java

That should give you what you need to iterate over independent token ranges.

If you want, you can also just divide up the total token range for the
partitioner you are using into equal chunks and have each of your threads
execute a separate scan.
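
A rough, untested sketch of that approach with the DataStax Java driver,
assuming Murmur3Partitioner: split the full token range into equal slices and
give each thread one slice. The keyspace/table/column names below (ks, items,
id) are placeholders; state/batch/available/batch1 come from your description.

import com.datastax.driver.core.*;
import java.math.BigInteger;
import java.util.*;
import java.util.concurrent.*;

public class TokenRangeScan {
    public static void main(String[] args) throws Exception {
        final Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        final Session session = cluster.connect("ks");

        final int threads = 8;
        // Full Murmur3Partitioner token range: [-2^63, 2^63 - 1]
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        BigInteger span = max.subtract(min).divide(BigInteger.valueOf(threads));

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int i = 0; i < threads; i++) {
            final long start = min.add(span.multiply(BigInteger.valueOf(i))).longValue();
            final long end = (i == threads - 1)
                    ? Long.MAX_VALUE
                    : min.add(span.multiply(BigInteger.valueOf(i + 1))).longValue();
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long count = 0;
                    // Each thread scans one disjoint token slice of the table.
                    // (A row hashing exactly to Long.MIN_VALUE, if any, is
                    // skipped by this simplified slicing.)
                    ResultSet rs = session.execute(
                            "SELECT state, batch FROM items"
                            + " WHERE token(id) > ? AND token(id) <= ?",
                            start, end);
                    for (Row row : rs) {
                        if ("available".equals(row.getString("state"))
                                && "batch1".equals(row.getString("batch"))) {
                            count++;
                        }
                    }
                    return count;
                }
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) {
            total += f.get();   // add up the per-thread partial sums
        }
        System.out.println("Total matching rows: " + total);
        pool.shutdown();
        cluster.close();
    }
}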

Best regards,
Clint


On Tue, Feb 24, 2015 at 9:50 AM, Gaurav Bhatnagar gauravb...@gmail.com
wrote:

 Hi,
  I have a Cassandra cluster of 3 nodes holding around 300 million rows
 of items. I have a replication factor of 3 with read/write consistency of
 QUORUM. I want to scan all rows of the database to generate a sum of the
 items that have the value "available" in the column named "state" and the
 value "batch1" in the column named "batch". The row key for an item is a
 15-digit random number.
  I want to do this processing in multiple threads, for instance one
 thread generating the sum for one portion of the data and another thread
 generating the sum for another, disjoint portion, and later I would add up
 the totals from these 2 threads to get the final sum.
  What would be a possible way to achieve this? Can I use the concept of
 virtual nodes here? Each node owns a set of virtual nodes.
  Can I get the data owned by a particular node and in this way generate
 the sum on different nodes, by iterating over the data from the virtual
 nodes, and later generate the total sum by summing the data from all
 virtual nodes?

 Regards,
 Gaurav



Unexplained query slowness

2015-02-25 Thread Robert Wille
Our Cassandra database just rolled to live last night. I’m looking at our query 
performance, and overall it is very good, but perhaps 1 in 10,000 queries takes 
several hundred milliseconds (up to a full second). I’ve grepped for GC in the 
system.log on all nodes, and there aren’t any recent GC events. I’m executing 
~500 queries per second, which produces negligible load and CPU utilization. I 
have very minimal writes (one every few minutes). The slow queries are across 
the board. There isn’t one particular query that is slow.

I’m running 2.0.12 with SSD’s. I’ve got a 10 node cluster with RF=3.

I have no idea where to even begin to look. Any thoughts on where to start 
would be greatly appreciated.

Robert



Re: Unexplained query slowness

2015-02-25 Thread Carlos Rolo
You can use query tracing to check what is happening. You can also fire up
jconsole/Java VisualVM and pull some metrics, like the 99th percentile read
latency MBeans for that column family.
A simpler check is to use cfstats and look for odd numbers (a high SSTable
count; if you are deleting, check how many tombstones are scanned per read,
etc.).

Another check is whether compactions are running while you query.
OpsCenter can provide some graphs and help out.
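
For example (a minimal sketch with the DataStax Java driver; the query,
keyspace and table below are placeholders, not your schema), you can enable
tracing on a single statement and dump where the time went:

import com.datastax.driver.core.*;

public class TraceSlowQuery {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        Statement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.my_table WHERE id = 42").enableTracing();
        ResultSet rs = session.execute(stmt);

        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        System.out.printf("total: %d us, coordinator: %s%n",
                trace.getDurationMicros(), trace.getCoordinator());
        for (QueryTrace.Event e : trace.getEvents()) {
            // Each event shows which node spent how long on which step
            // (merging sstables, reading from disk, waiting on replicas, ...).
            System.out.printf("%8d us | %-15s | %s%n",
                    e.getSourceElapsedMicros(), e.getSource(), e.getDescription());
        }
        cluster.close();
    }
}

The same information is available in cqlsh with TRACING ON, if that is easier.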

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 25, 2015 at 4:32 PM, Robert Wille rwi...@fold3.com wrote:

 Our Cassandra database just rolled to live last night. I’m looking at our
 query performance, and overall it is very good, but perhaps 1 in 10,000
 queries takes several hundred milliseconds (up to a full second). I’ve
 grepped for GC in the system.log on all nodes, and there aren’t any recent
 GC events. I’m executing ~500 queries per second, which produces negligible
 load and CPU utilization. I have very minimal writes (one every few
 minutes). The slow queries are across the board. There isn’t one particular
 query that is slow.

 I’m running 2.0.12 with SSD’s. I’ve got a 10 node cluster with RF=3.

 I have no idea where to even begin to look. Any thoughts on where to start
 would be greatly appreciated.

 Robert




Re: Possible problem with disk latency

2015-02-25 Thread Nate McCall

 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with
 above cluster? Maybe I have misconfiguration?



Your disks are effectively idle. What consistency level are you using for
reads and writes?

Actually, 'await' is sort of weirdly high for idle SSDs. Check your
interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
not being stacked on a single CPU.
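
A quick way to eyeball that (a throwaway, hypothetical helper, Linux only;
just looking at the columns of cat /proc/interrupts works equally well) is to
sum the per-CPU columns and see whether one CPU dominates:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;

public class InterruptBalance {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(
                Paths.get("/proc/interrupts"), StandardCharsets.UTF_8);
        String[] header = lines.get(0).trim().split("\\s+");   // CPU0 CPU1 ...
        long[] totals = new long[header.length];
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.trim().split("\\s+");
            // cols[0] is the IRQ label; the next columns are per-CPU counts.
            for (int cpu = 0; cpu < totals.length && cpu + 1 < cols.length; cpu++) {
                try {
                    totals[cpu] += Long.parseLong(cols[cpu + 1]);
                } catch (NumberFormatException ignored) {
                    // Rows with trailing text or missing counts are skipped.
                }
            }
        }
        for (int cpu = 0; cpu < totals.length; cpu++) {
            System.out.printf("%s: %d interrupts%n", header[cpu], totals[cpu]);
        }
    }
}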


Re: Unexplained query slowness

2015-02-25 Thread Marcelo Valle (BLOOMBERG/ LONDON)
I am sorry if this is too basic and you have already looked at it, but the first
thing I would ask about is the data model.

What data model are you using (how is your data partitioned)? What queries are
you running? If you are using ALLOW FILTERING, for instance, it will be very
easy to say why it's slow.

Most of the time when people get slow queries in Cassandra, they are using the
wrong data model.

[]s

From: user@cassandra.apache.org 
Subject: Re:Unexplained query slowness

Our Cassandra database just rolled to live last night. I’m looking at our query 
performance, and overall it is very good, but perhaps 1 in 10,000 queries takes 
several hundred milliseconds (up to a full second). I’ve grepped for GC in the 
system.log on all nodes, and there aren’t any recent GC events. I’m executing 
~500 queries per second, which produces negligible load and CPU utilization. I 
have very minimal writes (one every few minutes). The slow queries are across 
the board. There isn’t one particular query that is slow.

I’m running 2.0.12 with SSD’s. I’ve got a 10 node cluster with RF=3.

I have no idea where to even begin to look. Any thoughts on where to start 
would be greatly appreciated.

Robert




Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi Ja,

How are the pending compactions distributed between the nodes?
Run nodetool compactionstats on all of your nodes and check whether the
pending tasks are balanced or concentrated in only a few nodes.
You can also check whether the SSTable count is balanced by running
nodetool cfstats on your nodes.

Cheers,

Roni Balthazar



On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
 I do NOT have SSD. I have normal HDD group by JBOD.
 My CF have SizeTieredCompactionStrategy
 I am using local quorum for reads and writes. To be precise I have a lot of
 writes and almost 0 reads.
 I changed cold_reads_to_omit to 0.0 as someone suggest me. I used set
 compactionthrouput to 999.

 So if my disk are idle, my CPU is less then 40%, I have some free RAM - why
 SSTables count is growing? How I can speed up compactions?

 On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:



 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with above
 cluster? Maybe I have misconfiguration?



 You disks are effectively idle. What consistency level are you using for
 reads and writes?

 Actually, 'await' is sort of weirdly high for idle SSDs. Check your
 interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
 not being stacked on a single CPU.





Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi Roni,

It is not balanced. As I wrote to you last week, I have problems only in the DC
to which we write (on the screenshot it is named AGRAF:
https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The
problem is on ALL nodes in this DC.
In the second DC (ZETO) only one node has more than 30 SSTables, and pending
compactions are decreasing to zero.

In AGRAF the minimum number of pending compactions is 2500 and the maximum is
6000 (the average on the OpsCenter screenshot is less than 5000).


Regards
Piotrek.

p.s. I don't know why my mail client displays my name as Ja Sam instead of
Piotr Stapp, but this doesn't change anything :)


On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 Hi Ja,

 How are the pending compactions distributed between the nodes?
 Run nodetool compactionstats on all of your nodes and check if the
 pendings tasks are balanced or they are concentrated in only few
 nodes.
 You also can check the if the SSTable count is balanced running
 nodetool cfstats on your nodes.

 Cheers,

 Roni Balthazar



 On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
  I do NOT have SSD. I have normal HDD group by JBOD.
  My CF have SizeTieredCompactionStrategy
  I am using local quorum for reads and writes. To be precise I have a lot
 of
  writes and almost 0 reads.
  I changed cold_reads_to_omit to 0.0 as someone suggest me. I used set
  compactionthrouput to 999.
 
  So if my disk are idle, my CPU is less then 40%, I have some free RAM -
 why
  SSTables count is growing? How I can speed up compactions?
 
  On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com
 wrote:
 
 
 
  If You could be so kind and validate above and give me an answer is my
  disk are real problems or not? And give me a tip what should I do with
 above
  cluster? Maybe I have misconfiguration?
 
 
 
  You disks are effectively idle. What consistency level are you using for
  reads and writes?
 
  Actually, 'await' is sort of weirdly high for idle SSDs. Check your
  interrupt mappings (cat /proc/interrupts) and make sure the interrupts
 are
  not being stacked on a single CPU.
 
 
 



Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
I do NOT have SSDs. I have normal HDDs grouped as JBOD.
My CF uses SizeTieredCompactionStrategy.
I am using LOCAL_QUORUM for reads and writes. To be precise, I have a lot of
writes and almost zero reads.
I changed cold_reads_to_omit to 0.0 as someone suggested to me. I set the
compaction throughput to 999.

So if my disks are idle, my CPU is below 40%, and I have some free RAM - why
is the SSTable count growing? How can I speed up compactions?

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:



 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with
 above cluster? Maybe I have misconfiguration?



 You disks are effectively idle. What consistency level are you using for
 reads and writes?

 Actually, 'await' is sort of weirdly high for idle SSDs. Check your
 interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
 not being stacked on a single CPU.





Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread Garret Pick
Hello,

I'm having problems getting cassandra to start with the configuration
listed above.

Yum wants to install 3.2.4-2.el6 of the JNA along with several other
packages including java-1.7.0-openjdk

The documentation states that a JNA version earlier than 3.2.7 should not
be used, so the jar file should be downloaded and installed directly into
C*'s lib directory per

http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html

From /var/log/cassandra/system.log

all I see is

 INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
Classpath:
/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

and it never actually starts

Note that JNA is in the classpath above, and when I remove it, Cassandra
starts successfully.

I tried installing the DSE package and it looks like it wants to install
the older 3.2.4 JNA as a dependency so there seems to be a discrepancy in
documentation

Per

http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html

Note: JNA (Java Native Access) is automatically installed.

thanks for any help,
Garret


Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi Roni,
The repair result is the following (we ran it on Friday): Cannot proceed on
repair because a neighbor (/192.168.61.201) is dead: session failed

But to be honest, the neighbor did not die. It seemed to trigger a series
of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session
d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]
failed with error org.apache.cassandra.exceptions.RepairException: [repair
#d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session
eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
(68056473384187696470568107782069813248,85070591730234615865843651857942052874]
failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session
c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
(42535295865117307932921825928971026442,68056473384187696470568107782069813248]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
(127605887595351923798765477786913079306,136112946768375392941136215564139626496]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session
c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
(136112946768375392941136215564139626496,0] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session
c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
(102084710076281535261119195933814292480,127605887595351923798765477786913079306]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished


We tried to run the repair one more time. After 24 hours we had some streaming
errors. Moreover, 2-3 hours later, we had to stop it because we started to
get write timeouts on the client and our system started dying.
The iostats from that period plus tpstats are available here:
https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view



On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 Hi Piotr,

 Are your repairs finishing without errors?

 Regards,

 Roni Balthazar

 On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
  Hi, Roni,
  They aren't exactly balanced but as I wrote before they are in range from
  2500-6000.
  If you need exactly data I will check them tomorrow morning. But all
 nodes
  in AGRAF have small increase of pending compactions during last week,
 which
  is wrong direction
 
  I will check in the morning get compaction throuput, but my feeling about
  this parameter is that it doesn't change anything.
 
  Regards
  Piotr
 
 
 
 
  On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com
 
  wrote:
 
  Hi Piotr,
 
  What about the nodes on AGRAF? Are the pending tasks balanced between
  this DC nodes as well?
  You can check the pending compactions on each node.
 
  Also try to run nodetool getcompactionthroughput on all nodes and
  check if the compaction throughput is set to 999.
 
  Cheers,
 
  Roni Balthazar
 
  On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
   Hi Roni,
  
   It is not balanced. As I wrote you last week I have problems only in
 DC
   in
   which we writes (on screen it is named as AGRAF:
   https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view).
 The
   problem is on ALL nodes in this dc.
   In second DC (ZETO) only one node have more than 30 SSTables and
 pending
   compactions are decreasing to zero.
  
   In AGRAF the minimum pending compaction is 2500 , maximum is 6000 (avg
   on
   screen from opscenter is less 

Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
One more thing: Hinted Handoff for the last week on all nodes was less than 5.
For me every READ is a problem, because it must open too many files (3
SSTables), which shows up as errors in reads, repairs, etc.
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 It is not obvious, because data is replicated to second data center. We
 check it manually for random records we put into Cassandra and we find
 all of them in secondary DC.
 We know about every single GC failure, but this doesn't change anything.
 The problem with GC failure is only one: restart the node. For few days we
 do not have GC errors anymore. It looks for me like memory leaks.
 We use Chef.

 By MANUAL compaction you mean running nodetool compact?  What does it
 change to permanently running compactions?

 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
 wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the
 secondary data center (yeah, obvious). I would suspect the GC errors are
 (also obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the
 primary you are looking at (and can of course be much more, depending on
 e.g. vnode count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your
 DevOps team may have a favorite log shipping/monitoring tool, could use
 e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.





 *...*






 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming “Wow! What a Ride!” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs
 are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 

Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi Piotr,

Are your repairs finishing without errors?

Regards,

Roni Balthazar

On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
 Hi, Roni,
 They aren't exactly balanced but as I wrote before they are in range from
 2500-6000.
 If you need exactly data I will check them tomorrow morning. But all nodes
 in AGRAF have small increase of pending compactions during last week, which
 is wrong direction

 I will check in the morning get compaction throuput, but my feeling about
 this parameter is that it doesn't change anything.

 Regards
 Piotr




 On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 Hi Piotr,

 What about the nodes on AGRAF? Are the pending tasks balanced between
 this DC nodes as well?
 You can check the pending compactions on each node.

 Also try to run nodetool getcompactionthroughput on all nodes and
 check if the compaction throughput is set to 999.

 Cheers,

 Roni Balthazar

 On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
  Hi Roni,
 
  It is not balanced. As I wrote you last week I have problems only in DC
  in
  which we writes (on screen it is named as AGRAF:
  https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The
  problem is on ALL nodes in this dc.
  In second DC (ZETO) only one node have more than 30 SSTables and pending
  compactions are decreasing to zero.
 
  In AGRAF the minimum pending compaction is 2500 , maximum is 6000 (avg
  on
  screen from opscenter is less then 5000)
 
 
  Regards
  Piotrek.
 
  p.s. I don't know why my mail client display my name as Ja Sam instead
  of
  Piotr Stapp, but this doesn't change anything :)
 
 
  On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar
  ronibaltha...@gmail.com
  wrote:
 
  Hi Ja,
 
  How are the pending compactions distributed between the nodes?
  Run nodetool compactionstats on all of your nodes and check if the
  pendings tasks are balanced or they are concentrated in only few
  nodes.
  You also can check the if the SSTable count is balanced running
  nodetool cfstats on your nodes.
 
  Cheers,
 
  Roni Balthazar
 
 
 
  On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
   I do NOT have SSD. I have normal HDD group by JBOD.
   My CF have SizeTieredCompactionStrategy
   I am using local quorum for reads and writes. To be precise I have a
   lot
   of
   writes and almost 0 reads.
   I changed cold_reads_to_omit to 0.0 as someone suggest me. I used
   set
   compactionthrouput to 999.
  
   So if my disk are idle, my CPU is less then 40%, I have some free RAM
   -
   why
   SSTables count is growing? How I can speed up compactions?
  
   On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com
   wrote:
  
  
  
   If You could be so kind and validate above and give me an answer is
   my
   disk are real problems or not? And give me a tip what should I do
   with
   above
   cluster? Maybe I have misconfiguration?
  
  
  
   You disks are effectively idle. What consistency level are you using
   for
   reads and writes?
  
   Actually, 'await' is sort of weirdly high for idle SSDs. Check your
   interrupt mappings (cat /proc/interrupts) and make sure the
   interrupts
   are
   not being stacked on a single CPU.
  
  
  
 
 




Re: Possible problem with disk latency

2015-02-25 Thread daemeon reiydelle
I think you may have a vicious circle of errors: because your data is not
properly replicated to the neighbour, it is not replicating to the
secondary data center (yeah, obvious). I would suspect the GC errors are
(also obviously) the result of a backlog of compactions that take out the
neighbour (assuming a replication factor of 3, that means each neighbour is
participating in compaction for at least one other node besides the
primary you are looking at, and can of course be much more, depending on
e.g. vnode count if used).

What happens is that when a node fails due to a GC error (can't reclaim
space), that causes a cascade of other errors, as you see. Might I suggest
you have someone in devops with monitoring experience install a monitoring
tool that will notify you of EVERY SINGLE Java GC failure event? Your
DevOps team may have a favorite log shipping/monitoring tool, or you could
use e.g. Puppet.

I think you may have to go through a MANUAL, table-by-table compaction.





*...*






“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a series
 of full GC events on the initiating node. The results form logs are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 (136112946768375392941136215564139626496,0] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair session
 c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
 (102084710076281535261119195933814292480,127605887595351923798765477786913079306]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair command #2 finished


 We tried to run repair one more time. After 24 hour have some streaming
 errors. Moreover, 2-3 hours later, we have to stop it because we start to
 have write timeouts on client and our system starts to dying.
 The iostats from dying time plus tpstats are available here:
 https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view



 On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 Hi Piotr,

 Are your repairs finishing without errors?

 Regards,

 Roni Balthazar

 On 25 

Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
It is not obvious, because data is replicated to the second data center. We
check it manually for random records we put into Cassandra, and we find
all of them in the secondary DC.
We know about every single GC failure, but this doesn't change anything.
The only fix for a GC failure is to restart the node. For a few days now we
have not had any GC errors. It looks to me like a memory leak.
We use Chef.

By MANUAL compaction do you mean running nodetool compact? What does it
change compared to the compactions that are already running permanently?

Regards
Piotrek

On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the
 secondary data center (yeah, obvious). I would suspect the GC errors are
 (also obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the
 primary you are looking at (and can of course be much more, depending on
 e.g. vnode count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your
 DevOps team may have a favorite log shipping/monitoring tool, could use
 e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.





 *...*






 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming “Wow! What a Ride!” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs
 are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 (136112946768375392941136215564139626496,0] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair session
 c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
 

Re: Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread Carlos Rolo
Hello,

I always install JNA into the lib directory of Java itself.

Since I normally have Java in /opt/java, I put the JNA jar into /opt/java/lib.

~$ grep  JNA /var/log/cassandra/system.log
INFO  HH:MM:SS JNA mlockall successful

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 25, 2015 at 9:12 PM, Garret Pick pic...@whistle.com wrote:

 Hello,

 I'm having problems getting cassandra to start with the configuration
 listed above.

 Yum wants to install 3.2.4-2.el6 of the JNA along with several other
 packages including java-1.7.0-openjdk

 The documentation states that a JNA version earlier that 3.2.7 should not
 be used, so the jar file should be downloaded and installed directly into
 C*'s lib directory per


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html

 From /var/log/cassandra/system.log

 all I see is

  INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
 Classpath:
 /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

 and it never actually starts

 Note that JNA is in the classpath above and is when I remove it, cassandra
 starts successfully.

 I tried installing the DSE package and it looks like it wants to install
 the older 3.2.4 JNA as a dependency so there seems to be a discrepancy in
 documentation

 Per


 http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html

 Note: JNA (Java Native Access) is automatically installed.

 thanks for any help,
 Garret



Re: Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread Garret Pick
Hi,

On this page

http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaRHEL.html

it says

Cassandra requires JNA 3.2.7 or later. Some Yum repositories may provide
earlier versions

and at the bottom

If you can't install using Yum or it provides a version of the JNA earlier
than 3.2.7, install as described in Installing the JNA from the JAR file.

Which version of OS and Cassandra are you running?

thanks,
Garret



On Wed, Feb 25, 2015 at 10:46 AM, J. Ryan Earl o...@jryanearl.us wrote:

 We've been using jna-3.2.4-2.el6.x86_64 with the Sun/Oracle JDK for
probably 2-years now, and it works just fine.  Where are you seeing 3.2.7
required at?  I searched the pages you link and that string isn't even in
there.

 Regardless, I assure you the newest jna that ships in the EL6 repo works
without issues.

 On Wed, Feb 25, 2015 at 2:12 PM, Garret Pick pic...@whistle.com wrote:

 Hello,

 I'm having problems getting cassandra to start with the configuration
listed above.

 Yum wants to install 3.2.4-2.el6 of the JNA along with several other
packages including java-1.7.0-openjdk

 The documentation states that a JNA version earlier that 3.2.7 should
not be used, so the jar file should be downloaded and installed directly
into C*'s lib directory per


http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html

 From /var/log/cassandra/system.log

 all I see is

  INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
Classpath:
/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

 and it never actually starts

 Note that JNA is in the classpath above and is when I remove it,
cassandra starts successfully.

 I tried installing the DSE package and it looks like it wants to install
the older 3.2.4 JNA as a dependency so there seems to be a discrepancy in
documentation

 Per


http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html

 Note: JNA (Java Native Access) is automatically installed.

 thanks for any help,
 Garret




Re: Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread Carlos Rolo
Also, I always install JNA from the JNA page.

I did the installation for this blog post on CentOS 6.5:
http://www.pythian.com/blog/from-0-to-cassandra-an-exhaustive-approach-to-installing-cassandra/

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 25, 2015 at 9:53 PM, Garret Pick pic...@whistle.com wrote:

 Hi,

 On this page


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaRHEL.html

 it says

 Cassandra requires JNA 3.2.7 or later. Some Yum repositories may provide
 earlier versions

 and at the bottom

 If you can't install using Yum or it provides a version of the JNA
 earlier than 3.2.7, install as described in Installing the JNA from the JAR
 file.

 Which version of OS and Cassandra are you running?

 thanks,
 Garret




 On Wed, Feb 25, 2015 at 10:46 AM, J. Ryan Earl o...@jryanearl.us wrote:
 
  We've been using jna-3.2.4-2.el6.x86_64 with the Sun/Oracle JDK for
 probably 2-years now, and it works just fine.  Where are you seeing 3.2.7
 required at?  I searched the pages you link and that string isn't even in
 there.
 
  Regardless, I assure you the newest jna that ships in the EL6 repo works
 without issues.
 
  On Wed, Feb 25, 2015 at 2:12 PM, Garret Pick pic...@whistle.com wrote:
 
  Hello,
 
  I'm having problems getting cassandra to start with the configuration
 listed above.
 
  Yum wants to install 3.2.4-2.el6 of the JNA along with several other
 packages including java-1.7.0-openjdk
 
  The documentation states that a JNA version earlier that 3.2.7 should
 not be used, so the jar file should be downloaded and installed directly
 into C*'s lib directory per
 
 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html
 
  From /var/log/cassandra/system.log
 
  all I see is
 
   INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
 Classpath:
 /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
 
  and it never actually starts
 
  Note that JNA is in the classpath above and is when I remove it,
 cassandra starts successfully.
 
  I tried installing the DSE package and it looks like it wants to
 install the older 3.2.4 JNA as a dependency so there seems to be a
 discrepancy in documentation
 
  Per
 
 
 http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html
 
  Note: JNA (Java Native Access) is automatically installed.
 
  thanks for any help,
  Garret
 
 



Re: Setting up JNA on CentOS 6.6. with cassandra20-2.0.12 and Oracle Java 1.7.0_75

2015-02-25 Thread J. Ryan Earl
We've been using jna-3.2.4-2.el6.x86_64 with the Sun/Oracle JDK for
probably 2 years now, and it works just fine.  Where are you seeing 3.2.7
required?  I searched the pages you linked and that string isn't even in
there.

Regardless, I assure you the newest jna that ships in the EL6 repo works
without issues.

On Wed, Feb 25, 2015 at 2:12 PM, Garret Pick pic...@whistle.com wrote:

 Hello,

 I'm having problems getting cassandra to start with the configuration
 listed above.

 Yum wants to install 3.2.4-2.el6 of the JNA along with several other
 packages including java-1.7.0-openjdk

 The documentation states that a JNA version earlier that 3.2.7 should not
 be used, so the jar file should be downloaded and installed directly into
 C*'s lib directory per


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJnaTar.html

 From /var/log/cassandra/system.log

 all I see is

  INFO [main] 2015-02-25 20:06:10,202 CassandraDaemon.java (line 191)
 Classpath:
 /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.12.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.12.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

 and it never actually starts

 Note that JNA is in the classpath above and is when I remove it, cassandra
 starts successfully.

 I tried installing the DSE package and it looks like it wants to install
 the older 3.2.4 JNA as a dependency so there seems to be a discrepancy in
 documentation

 Per


 http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installRHELdse.html

 Note: JNA (Java Native Access) is automatically installed.

 thanks for any help,
 Garret



Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi,

Check how many active CompactionExecutors are showing in nodetool tpstats.
Maybe your concurrent_compactors is too low. Set it to 1 per CPU core,
even though that is the default value on 2.1.
Some of our nodes were running with 2 compactors, but we have an 8-core CPU...
After that, monitor your nodes to be sure that the value is not too
high. You may get too much IO if you increase concurrent_compactors
when using spinning disks.

Regards,

Roni Balthazar

On 25 February 2015 at 16:37, Ja Sam ptrstp...@gmail.com wrote:
 Hi,
 One more thing. Hinted Handoff for last week for all nodes was less than 5.
 For me every READ is a problem because it must open too many files (3
 SSTables), which occurs as an error in reads, repairs, etc.
 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 It is not obvious, because data is replicated to second data center. We
 check it manually for random records we put into Cassandra and we find all
 of them in secondary DC.
 We know about every single GC failure, but this doesn't change anything.
 The problem with GC failure is only one: restart the node. For few days we
 do not have GC errors anymore. It looks for me like memory leaks.
 We use Chef.

 By MANUAL compaction you mean running nodetool compact?  What does it
 change to permanently running compactions?

 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
 wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the secondary
 data center (yeah, obvious). I would suspect the GC errors are (also
 obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the primary
 you are looking at (and can of course be much more, depending on e.g. vnode
 count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your DevOps
 team may have a favorite log shipping/monitoring tool, could use e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.




 ...
 “Life should not be a journey to the grave with the intention of arriving
 safely in a
 pretty and well preserved body, but rather to skid in broadside in a
 cloud of smoke,
 thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a
 Ride!”
 - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor
 (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor 

Turning on internal security with no downtime

2015-02-25 Thread SEAN_R_DURITY
Cassandra 1.2.19

We would like to turn on Cassandra's internal security (PasswordAuthenticator 
and CassandraAuthorizer) on the ring (away from AllowAll). (Clients are already 
passing credentials in their connections.) However, I know all nodes have to be 
switched to those before the basic security objects (system_auth) are created. 
So, an outage would be required to change all the nodes, let system_auth get 
created, alter system_auth for replication strategy, create all the 
users/permissions, repair system_auth.

For DataStax Enterprise, there is a TransitionalAuthorizer that allows system_auth to 
get created but doesn't really require passwords. So, with a double rolling bounce, 
you can implement security with no downtime. Is there anything like that for open 
source? Any other ways you have activated security without downtime?
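
For reference, the outage-requiring switch on open-source Cassandra described above would look roughly like this sketch; the yaml path, DC names, replication factors and user names are assumptions:

  # on every node, in /etc/cassandra/cassandra.yaml:
  #   authenticator: org.apache.cassandra.auth.PasswordAuthenticator
  #   authorizer: org.apache.cassandra.auth.CassandraAuthorizer
  # restart the nodes, then log in with the default superuser and run:
  echo "ALTER KEYSPACE system_auth WITH replication =
          {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
        CREATE USER appuser WITH PASSWORD 'changeme' NOSUPERUSER;
        GRANT SELECT ON KEYSPACE mykeyspace TO appuser;" | cqlsh -u cassandra -p cassandra
  # finally, repair the auth keyspace on every node
  nodetool repair system_auth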



Sean R. Durity







Re: Why and How I didn't get the result back in cqlsh

2015-02-25 Thread Duncan Sands

Hi,

On 26/02/15 01:24, java8964 wrote:
...

select * from myTable;
   59 |  336 | 1100390163336 | A |
[{updated_at:1424844362530,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]
   59 |  336 | 1100390163336 | D |
[{updated_at:1424844365062,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]

Obviously, the table has lots of data. Now the problem is I cannot get any data
back in my query using key of existing data. Why?

cqlsh:mykeyspace select * from myTable where key=59 and key2=336;
cqlsh:mykeyspace select * from myTable where key=59 and key2=336;


try at a higher consistency level, eg first do this in cqlsh:
  CONSISTENCY ALL;
then try your queries.  If that works then the issue is that some replicas are 
missing data.  The default cqlsh consistency level is ONE.


Best wishes, Duncan.


RE: Why and How I didn't get the result back in cqlsh

2015-02-25 Thread java8964
Hi, Duncan:
Thanks for your reply, but it didn't help.
yzhang@yzhangmac1:~/dse/bin$ ./cqlsh hostname 9160 -u user -p password
Connected to P2 QA Cluster at xxx:9160.
[cqlsh 3.1.2 | Cassandra 1.2.18.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use myKeyspace;
cqlsh:myKeyspace> consistency all;
Consistency level set to ALL.
cqlsh:myKeyspace> select * from myTable where key=59 and key2=336;
cqlsh:myKeyspace> select * from myTable where key=59 and key2=336;
cqlsh:myKeyspace>
This table was in fact created the old column-family way in Cassandra 1.1, using a 
composite key and composite column names. After we upgraded to Cassandra 1.2, the 
column names you see in CQL come from Cassandra. So this table was NOT created in CQL.
I think it may be due to the column name key being a reserved word, but even when I 
quote it as "key" in the CQL query, it still didn't help.
 Date: Thu, 26 Feb 2015 03:55:04 +0100
 From: duncan.sa...@gmail.com
 To: user@cassandra.apache.org
 Subject: Re: Why and How I didn't get the result back in cqlsh
 
 Hi,
 
 On 26/02/15 01:24, java8964 wrote:
 ...
  select * from myTable;
 59 |  336 | 1100390163336 | A |
  [{updated_at:1424844362530,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]
 59 |  336 | 1100390163336 | D |
  [{updated_at:1424844365062,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]
 
  Obviously, the table has lots of data. Now the problem is I cannot get any 
  data
  back in my query using key of existing data. Why?
 
  cqlsh:mykeyspace select * from myTable where key=59 and key2=336;
  cqlsh:mykeyspace select * from myTable where key=59 and key2=336;
 
 try at a higher consistency level, eg first do this in cqlsh:
CONSISTENCY ALL;
 then try your queries.  If that works then the issue is that some replicas 
 are 
 missing data.  The default cqlsh consistency level is ONE.
 
 Best wishes, Duncan.
  

Re: Node stuck in joining the ring

2015-02-25 Thread Robert Coli
On Wed, Feb 25, 2015 at 3:38 PM, Batranut Bogdan batra...@yahoo.com wrote:

 I have a new node that I want to add to the ring. The problem is that
 nodetool says UJ I have left it for several days and the status has not
 changed. In Opscenter it is seen as in an unknown cluster.


If I were you, I would do the following [1] (a rough shell sketch follows the list):

1) stop the joining node
2) make sure that the other nodes no longer see it joining
3) wipe the joining node's data directory
4) verify cluster name is correct in cassandra.yaml, and matches the other
nodes
5) re-join the node
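
A rough shell sketch of those steps (service name and paths assume a package install; adjust for your layout):

  # 1-3: stop the joining node and wipe its partial data
  sudo service cassandra stop
  nodetool status      # run on a live node; the joiner should no longer be listed
  sudo rm -rf /var/lib/cassandra/data/* \
              /var/lib/cassandra/commitlog/* \
              /var/lib/cassandra/saved_caches/*
  # 4: cluster_name must match the existing nodes
  grep cluster_name /etc/cassandra/cassandra.yaml
  # 5: start it again and watch the bootstrap streams
  sudo service cassandra start
  nodetool netstats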

What version of Cassandra?

=Rob
[1] Which, jeesh, I should put into a dealing with failed bootstrap blog
post one of these days...


Why and How I didn't get the result back in cqlsh

2015-02-25 Thread java8964
Here is the version of the cqlsh and Cassandra I am using:
yzhang@yzhangmac1:~/dse/bin$ ./cqlsh hostname 9160 -u username -p password
Connected to P2 QA Cluster at c1-cass01.roving.com:9160.
[cqlsh 3.1.2 | Cassandra 1.2.18.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use mykeyspace;
cqlsh:mykeyspace>

cqlsh:automation_d1> describe table myTable;

CREATE TABLE myTable (
  key varint,
  key2 varint,
  column1 bigint,
  column2 ascii,
  value text,
  PRIMARY KEY ((key, key2), column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

select * from myTable;
   59 |  336 | 1100390163336 | A | [{updated_at:1424844362530,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]
   59 |  336 | 1100390163336 | D | [{updated_at:1424844365062,ids:668e5520-bb71-11e4-aecd-00163e56be7c}]
Obviously, the table has lots of data. Now the problem is that I cannot get any
data back in my query using the key of existing data. Why?

cqlsh:mykeyspace> select * from myTable where key=59 and key2=336;
cqlsh:mykeyspace> select * from myTable where key=59 and key2=336;

As you can see, I know key=59 and key2=336 exist, but no matter what I try, I
cannot query them out. The last two queries didn't return any result to me.
Now I tried the cassandra-cli:
./cassandra-cli -h hostname -u username -pw password
[user1@unknown] use mykeyspace;
Authenticated to keyspace: mykeyspace
[user1@unknown] list myTable;
... lots of data ...
-------------------
RowKey: 51:855
=> (name=1100393052855:D, value=[{updated_at:1424269592866,id:fc31b6d0-5479-11e4-8a79-00163e56be7c}], timestamp=1424269592866000, ttl=2764800)
-------------------
RowKey: 59:336
=> (name=1100390163336:A, value=[{updated_at:1424844362530,id:668e5520-bb71-11e4-aecd-00163e56be7c}], timestamp=1424844362533000, ttl=2764800)
=> (name=1100390163336:D, value=[{updated_at:1424844365062,id:668e5520-bb71-11e4-aecd-00163e56be7c}], timestamp=1424844365063000, ttl=2764800)

[default@mykeyspace] get myTable['59:336'];
Returned 0 results.
Elapsed time: 62 msec(s).
Why can't I get the data by key, in either cqlsh or cassandra-cli?
Thanks
Yong
  

Node stuck in joining the ring

2015-02-25 Thread Batranut Bogdan
Hello all,
I have a new node that I want to add to the ring. The problem is that nodetool
says UJ; I have left it for several days and the status has not changed. In
Opscenter it is seen as being in an unknown cluster.

From the time that I started it, it has been streaming data, and the data size is
now 5.9 TB. This is very strange, since all the other nodes in the cluster have
about 3.3 TB of data. Also, tonight I saw that it stopped receiving streams while
the status in nodetool was still UJ. So I thought I would decommission the node,
delete the data and start again. Nodetool throws: unsupported operation: local
node is not a member of the token ring yet. So I have just restarted the node.
Now streaming begins again. At this rate, I'll run out of disk space on that
node. One idea that comes to mind is to stop, clear the data and restart, but I
am not sure about the implications of that. Also, I have tried nodetool join. I
got: This node has already joined the ring.
So nodetool status says UJ but nodetool join says otherwise, or am I not
understanding something here.
Any ideas?
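
A few commands that can show whether the join is actually making progress (a sketch):

  nodetool netstats          # active and pending streams on the joining node
  nodetool compactionstats   # pending compactions that may be inflating the on-disk size
  df -h                      # how much space the partially joined data is using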

TTL

2015-02-25 Thread Parth Setya
Hi

I am adding a new expiring column to an existing column family in
Cassandra. I want this new column to expire at the same time as all the
other expiring columns in the column family.
One way of doing this is to read the TTL of the existing expiring columns in
that CF and set that value on my new column, but I want to avoid querying
the database.
Is there any other way of achieving the same thing without querying the row?
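
One possible workaround (a sketch, assuming the application already knows when the original columns were written and with what TTL; both values below are hypothetical) is to compute the remaining lifetime client-side and use that as the new column's TTL:

  ORIGINAL_TTL=2764800      # TTL the existing columns were written with, in seconds
  WRITTEN_AT=1424844362     # epoch seconds when the row was originally written
  NOW=$(date +%s)
  REMAINING=$((ORIGINAL_TTL - (NOW - WRITTEN_AT)))
  echo "Set the new column's TTL to ${REMAINING} seconds so it expires with the rest of the row"

The same arithmetic can of course be done directly in the Hector client code before setting the TTL on the new column.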


API: HECTOR
Version: Cassandra 2.0.3


Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
I wrote some questions before about my problems with my C* cluster. My whole
environment is described here:
https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
To sum up, I have thousands of SSTables in one DC and much, much fewer in the
second. I write only to the first DC.

Anyway, after reading a lot of posts/mails/googling, I have started to think that
the only reason for the above is disk problems.

My OpsCenter with some stats is following:
https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

My iostats are like this:
https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
(dm-XX are C* drives. dm-11 is for commitlog)
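
For anyone who wants to reproduce the raw numbers behind those screenshots, a minimal sketch (the interval is an assumption):

  # extended per-device statistics, refreshed every 5 seconds;
  # watch await (average I/O wait in ms) and %util for the dm-XX devices
  iostat -x -d 5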

If you could be so kind as to validate the above and give me an answer: are my
disks the real problem or not? And give me a tip on what I should do with the
above cluster? Maybe I have a misconfiguration?

Regards
Piotrek


Re: Schema changes: where in Java code are they sent?

2015-02-25 Thread Richard Dawe
Good morning,

Sorry for the slow reply here. I finally had some time to test cqlsh tracing on 
a ccm cluster with 2 of 3 nodes down, to see if the unavailable error was due 
to cqlsh or my query. Reply inline below.

On 15/01/2015 12:46, Tyler Hobbs ty...@datastax.com wrote:

On Thu, Jan 15, 2015 at 6:30 AM, Richard Dawe rich.d...@messagesystems.com wrote:

I thought it might be quorum consistency level, because of the behaviour I was 
seeing with cqlsh. I was testing with ccm with C* 2.0.8, 3 nodes, vnodes enabled 
(“ccm create test -v 2.0.8 -n 3 --vnodes -s”). With all three nodes up, my schema 
operations were working fine. When I took down two nodes using “ccm node2 stop”, 
“ccm node3 stop”, I found that schema operations through “ccm node1 cqlsh” were 
failing like this:

  cqlsh ALTER TABLE test.test3 ADD fred text;
  Unable to complete request: one or more nodes were unavailable.

That’s the full output — I had enabled tracing, but only that error came back.

After reading your reply, I went back and re-ran my tests with cqlsh, and it 
seems like the “one or more nodes were unavailable” may be due to cqlsh’s error 
handling.

If I wait a bit, and re-run my schema operations, they work fine with only one 
node up. I can see in the tracing that it’s only talking to node1 (127.0.0.1) 
to make the schema modifications.

Is this a known issue in cqlsh? If it helps I can send the full command-line 
session log.

That Unavailable error may actually be from the tracing-related queries failing 
(that's what I suspect, at least).  Starting cqlsh with --debug might show you 
a stacktrace in that case, but I'm not 100% sure.

Yes, it does seem to be cqlsh tracing. The debug output below was generated 
with:

 * A 3 node ccm cluster, running Cassandra 2.0.8 on Ubuntu 14.10 x86_64.
 * I took down 2 of the 3 nodes.
 * Table test5 has a replication factor of 3, primary key is “id text”.
 * cqlsh session was started after 2 of the 3 nodes had been shut down.

Debug output:

rdawe@cstar:~$ ccm node1 cqlsh --debug
Using CQL driver: module 'cql' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'
Using thrift lib: module 'thrift' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'
Connected to test at 127.0.0.1:9160.
[cqlsh 4.1.1 | Cassandra 2.0.8-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 
19.39.0]
Use HELP for help.
cqlsh USE test;
cqlsh:test TRACING ON
Now tracing requests.
cqlsh:test SELECT * FROM test5;

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

Traceback (most recent call last):
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 827, in onecmd
self.handle_statement(st, statementtext)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 865, in 
handle_statement
return custom_handler(parsed)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 901, in do_select
with_default_limit=with_default_limit)
  File /home/rdawe/.ccm/repository/2.0.8/bin/cqlsh, line 910, in 
perform_statement
print_trace_session(self, self.cursor, session_id)
  File /home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py, 
line 26, in print_trace_session
rows  = fetch_trace_session(cursor, session_id)
  File /home/rdawe/.ccm/repository/2.0.8/bin/../pylib/cqlshlib/tracing.py, 
line 47, in fetch_trace_session
consistency_level='ONE')
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/cursor.py,
 line 80, in execute
response = self.get_response(prepared_q, cl)
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py,
 line 77, in get_response
return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
  File 
/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/thrifteries.py,
 line 102, in handle_cql_execution_errors
raise cql.OperationalError(Unable to complete request: one or 
OperationalError: Unable to complete request: one or more nodes were 
unavailable.

Sometimes I get a different error:

rdawe@cstar:~$ echo -e 'TRACING ON\nSELECT * FROM test.test5;\n' | ccm node1 
cqlsh --debug
Using CQL driver: module 'cql' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/cql-internal-only-1.4.1.zip/cql-1.4.1/cql/__init__.py'
Using thrift lib: module 'thrift' from 
'/home/rdawe/.ccm/repository/2.0.8/bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/__init__.py'
Now tracing requests.

 id| foo
---+---
 blarg |  ness
 hello | world

(2 rows)

stdin:3:Session edc8c010-bcd5-11e4-a008-1dd7f4de70a1 wasn't found.

I notice that the system_traces keyspace has replication factor 2. Since 2 
nodes are down, perhaps sometimes the tracing session would be stored on nodes 
that are down. And other times one of the two replicas for 
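
For reference, checking and raising the replication of system_traces would look roughly like this (a sketch; the target replication factor is an assumption):

  echo "SELECT keyspace_name, strategy_options FROM system.schema_keyspaces
        WHERE keyspace_name = 'system_traces';" | ccm node1 cqlsh
  echo "ALTER KEYSPACE system_traces WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 3};" | ccm node1 cqlsh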

Re: Possible problem with disk latency

2015-02-25 Thread Carlos Rolo
Your latency doesn't seem that high that can cause that problem. I suspect
more of a problem with the Cassandra version (2.1.3) than that with the
hard drives. I didn't look deep into the information provided but for your
reference, the only time I had serious (leading to OOM and all sort of
weird behavior) my hard drives where near 70ms latency.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
http://linkedin.com/in/carlosjuzarterolo*
Tel: 1649
www.pythian.com

On Wed, Feb 25, 2015 at 11:19 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 I write some question before about my problems with C* cluster. All my
 environment is described here:
 https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
 To sum up I have thousands SSTables in one DC and much much less in
 second. I write only to first DC.

 Anyway after reading a lot of post/mails/google I start to think that the
 only reason of above is disk problems.

 My OpsCenter with some stats is following:
 https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

 My iostats are like this:
 https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
 (dm-XX are C* drives. dm-11 is for commitlog)

 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with
 above cluster? Maybe I have misconfiguration?

 Regards
 Piotrek

