Re: Possible problem with disk latency

2015-02-26 Thread Ja Sam
We ran this check; most of our files are less than 100MB.

Our heap settings are as follows (they are calculated by the script in
cassandra-env.sh):
MAX_HEAP_SIZE=8GB
HEAP_NEWSIZE=2GB
which is the maximum recommended by DataStax.

What values do you think we should try?





On Thu, Feb 26, 2015 at 10:06 AM, Roland Etzenhammer 
r.etzenham...@t-online.de wrote:

 Hi Piotrek,

 your disks are mostly idle as far as I can see (the one at 17% busy
 isn't under that much load). One thing that came to my mind: did you look
 at the sizes of your SSTables? I did this with something like

 find /var/lib/cassandra/data -type f -size -1k -name '*Data.db' | wc -l
 find /var/lib/cassandra/data -type f -size -10k -name '*Data.db' | wc -l
 find /var/lib/cassandra/data -type f -size -100k -name '*Data.db' | wc -l
 ...
 find /var/lib/cassandra/data -type f -size -1M -name '*Data.db' | wc -l

 Your count is growing according to OpsCenter - and if there are many really
 small SSTables I would guess you are running low on heap. If memory
 pressure is high, it is likely that there will be many flushes of
 memtables to disk, producing many small files - I had this once. You can
 increase the heap in cassandra-env.sh, but be careful.

 Best regards,
 Roland




Re: Possible problem with disk latency

2015-02-26 Thread Roland Etzenhammer

Hi,

8GB heap is a good value already - going above 8GB will often result in 
noticeable GC pause times in Java, but you can give 12GB a try just to 
see if that helps (and turn it back down again). You can add a Heap 
Used graph in OpsCenter to get a quick overview of your heap state.
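
For a quick check outside OpsCenter, the pauses Cassandra itself logs via
GCInspector can be grepped on each node (assuming the default packaged log
location):

grep -i 'GCInspector' /var/log/cassandra/system.log | tail -20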


Best regards,
Roland




Re: Possible problem with disk latency

2015-02-26 Thread Ja Sam
Hi Roland,
I looked deeper into my Cassandra data files, and the SSTables created during
the last day are all less than 20MB.

Piotrek

P.S. Your tips are really useful - at least I am starting to find out where
exactly the problem is.

On Thu, Feb 26, 2015 at 3:11 PM, Ja Sam ptrstp...@gmail.com wrote:

 We ran this check; most of our files are less than 100MB.

 Our heap settings are as follows (they are calculated by the script in
 cassandra-env.sh):
 MAX_HEAP_SIZE=8GB
 HEAP_NEWSIZE=2GB
 which is the maximum recommended by DataStax.

 What values do you think we should try?





 On Thu, Feb 26, 2015 at 10:06 AM, Roland Etzenhammer 
 r.etzenham...@t-online.de wrote:

 Hi Piotrek,

 your disks are mostly idle as far as I can see (the one at 17% busy
 isn't under that much load). One thing that came to my mind: did you look
 at the sizes of your SSTables? I did this with something like

 find /var/lib/cassandra/data -type f -size -1k -name '*Data.db' | wc -l
 find /var/lib/cassandra/data -type f -size -10k -name '*Data.db' | wc -l
 find /var/lib/cassandra/data -type f -size -100k -name '*Data.db' | wc -l
 ...
 find /var/lib/cassandra/data -type f -size -1M -name '*Data.db' | wc -l

 Your count is growing according to OpsCenter - and if there are many really
 small SSTables I would guess you are running low on heap. If memory
 pressure is high, it is likely that there will be many flushes of
 memtables to disk, producing many small files - I had this once. You can
 increase the heap in cassandra-env.sh, but be careful.

 Best regards,
 Roland





Re: Possible problem with disk latency

2015-02-26 Thread Roland Etzenhammer

Hi Piotrek,

your disks are mostly idle as far as I can see (the one at 17% busy
isn't under that much load). One thing that came to my mind: did you look
at the sizes of your SSTables? I did this with something like

find /var/lib/cassandra/data -type f -size -1k -name '*Data.db' | wc -l
find /var/lib/cassandra/data -type f -size -10k -name '*Data.db' | wc -l
find /var/lib/cassandra/data -type f -size -100k -name '*Data.db' | wc -l
...
find /var/lib/cassandra/data -type f -size -1M -name '*Data.db' | wc -l
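
A rough one-pass alternative, assuming GNU find, that buckets every Data.db
file by decimal order of magnitude (in bytes) instead of re-running find per
size:

find /var/lib/cassandra/data -type f -name '*Data.db' -printf '%s\n' \
  | awk '$1 > 0 { n[10 ^ int(log($1)/log(10))]++ } END { for (b in n) print b, n[b] }' \
  | sort -n

The first column is the lower bound of each size bucket in bytes, the second
the number of SSTables in it.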

Your count is growing according to OpsCenter - and if there are many really
small SSTables I would guess you are running low on heap. If memory
pressure is high, it is likely that there will be many flushes of
memtables to disk, producing many small files - I had this once. You can
increase the heap in cassandra-env.sh, but be careful.
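
For reference, a minimal sketch of overriding it by hand (assuming the stock
2.1 cassandra-env.sh, which only auto-calculates the heap when these variables
are left unset; the numbers are purely illustrative, not a recommendation):

# in cassandra-env.sh - set both together, then restart the node
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="3G"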

Best regards,
Roland



Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
I read that I shouldn't install a version ending in less than 6 (i.e. anything
before 2.1.6). But I started with 2.1.0 and then upgraded to 2.1.3.

As far as I know, I cannot downgrade it.

On Wed, Feb 25, 2015 at 12:05 PM, Carlos Rolo r...@pythian.com wrote:

 Your latency doesn't seem high enough to cause that problem. I suspect
 more of a problem with the Cassandra version (2.1.3) than with the
 hard drives. I didn't look deeply into the information provided, but for your
 reference, the only time I had serious problems (leading to OOM and all sorts
 of weird behavior) my hard drives were near 70ms latency.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
 Tel: 1649
 www.pythian.com

 On Wed, Feb 25, 2015 at 11:19 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 I asked some questions before about my problems with my C* cluster. All my
 environment is described here:
 https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
 To sum up: I have thousands of SSTables in one DC and far fewer in the
 second. I write only to the first DC.

 Anyway, after reading a lot of posts/mails/Google results, I am starting to
 think that the only explanation for the above is a disk problem.

 My OpsCenter with some stats is following:
 https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

 My iostats are like this:
 https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
 (dm-XX are C* drives. dm-11 is for commitlog)

 Could you be so kind as to validate the above and tell me whether my
 disks are a real problem or not? And give me a tip on what I should do with
 the above cluster? Maybe I have a misconfiguration?

 Regards
 Piotrek









Re: Possible problem with disk latency

2015-02-25 Thread Nate McCall

 Could you be so kind as to validate the above and tell me whether my
 disks are a real problem or not? And give me a tip on what I should do with
 the above cluster? Maybe I have a misconfiguration?



Your disks are effectively idle. What consistency level are you using for
reads and writes?

Actually, 'await' is sort of weirdly high for idle SSDs. Check your
interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
not being stacked on a single CPU.
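
Roughly along these lines (the driver names in the grep and the IRQ number are
only illustrative, they depend on your storage controller):

grep -iE 'ahci|nvme|mpt|megasas' /proc/interrupts   # per-CPU interrupt counts per device
cat /proc/irq/51/smp_affinity                       # 51 = example IRQ taken from the first column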


Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi Ja,

How are the pending compactions distributed between the nodes?
Run nodetool compactionstats on all of your nodes and check whether the
pending tasks are balanced or concentrated on only a few nodes.
You can also check whether the SSTable count is balanced by running
nodetool cfstats on your nodes.
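
Something like this minimal loop works for a quick overview (host names are
placeholders; assumes SSH access and nodetool on each node's PATH):

for h in node1 node2 node3; do                      # placeholder host names
  echo "== $h =="
  ssh "$h" "nodetool compactionstats | grep 'pending tasks'"
  ssh "$h" "nodetool cfstats | grep 'SSTable count'"
done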

Cheers,

Roni Balthazar



On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
 I do NOT have SSD. I have normal HDD group by JBOD.
 My CF have SizeTieredCompactionStrategy
 I am using local quorum for reads and writes. To be precise I have a lot of
 writes and almost 0 reads.
 I changed cold_reads_to_omit to 0.0 as someone suggest me. I used set
 compactionthrouput to 999.

 So if my disk are idle, my CPU is less then 40%, I have some free RAM - why
 SSTables count is growing? How I can speed up compactions?

 On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:



 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with above
 cluster? Maybe I have misconfiguration?



 You disks are effectively idle. What consistency level are you using for
 reads and writes?

 Actually, 'await' is sort of weirdly high for idle SSDs. Check your
 interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
 not being stacked on a single CPU.





Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi Roni,

It is not balanced. As I wrote you last week, I have problems only in the DC
we write to (on the screenshot it is named AGRAF:
https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The
problem is on ALL nodes in this DC.
In the second DC (ZETO) only one node has more than 30 SSTables, and pending
compactions are decreasing to zero.

In AGRAF the minimum pending compaction count is 2500, the maximum is 6000
(the average on the OpsCenter screenshot is less than 5000).


Regards
Piotrek.

P.S. I don't know why my mail client displays my name as Ja Sam instead of
Piotr Stapp, but this doesn't change anything :)


On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 Hi Ja,

 How are the pending compactions distributed between the nodes?
 Run nodetool compactionstats on all of your nodes and check if the
 pendings tasks are balanced or they are concentrated in only few
 nodes.
 You also can check the if the SSTable count is balanced running
 nodetool cfstats on your nodes.

 Cheers,

 Roni Balthazar



 On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
  I do NOT have SSD. I have normal HDD group by JBOD.
  My CF have SizeTieredCompactionStrategy
  I am using local quorum for reads and writes. To be precise I have a lot
 of
  writes and almost 0 reads.
  I changed cold_reads_to_omit to 0.0 as someone suggest me. I used set
  compactionthrouput to 999.
 
  So if my disk are idle, my CPU is less then 40%, I have some free RAM -
 why
  SSTables count is growing? How I can speed up compactions?
 
  On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com
 wrote:
 
 
 
  If You could be so kind and validate above and give me an answer is my
  disk are real problems or not? And give me a tip what should I do with
 above
  cluster? Maybe I have misconfiguration?
 
 
 
  You disks are effectively idle. What consistency level are you using for
  reads and writes?
 
  Actually, 'await' is sort of weirdly high for idle SSDs. Check your
  interrupt mappings (cat /proc/interrupts) and make sure the interrupts
 are
  not being stacked on a single CPU.
 
 
 



Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
I do NOT have SSDs. I have normal HDDs grouped as JBOD.
My CFs use SizeTieredCompactionStrategy.
I am using LOCAL_QUORUM for reads and writes. To be precise, I have a lot of
writes and almost no reads.
I changed cold_reads_to_omit to 0.0 as someone suggested, and I set the
compaction throughput to 999.

So if my disks are idle, my CPU is below 40%, and I have some free RAM - why
is the SSTable count growing? How can I speed up compactions?
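
For reference, this is roughly how those two settings are applied (the
keyspace/table names below are placeholders, and 0 removes the throughput cap
entirely):

nodetool setcompactionthroughput 999   # or 0 to disable compaction throttling completely
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};"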

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:



 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with
 above cluster? Maybe I have misconfiguration?



 You disks are effectively idle. What consistency level are you using for
 reads and writes?

 Actually, 'await' is sort of weirdly high for idle SSDs. Check your
 interrupt mappings (cat /proc/interrupts) and make sure the interrupts are
 not being stacked on a single CPU.





Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi Roni,
The repair result is the following (we ran it on Friday): Cannot proceed on
repair because a neighbor (/192.168.61.201) is dead: session failed

But to be honest the neighbor did not die. The repair seemed to trigger a series
of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session
d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]
failed with error org.apache.cassandra.exceptions.RepairException: [repair
#d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session
eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
(68056473384187696470568107782069813248,85070591730234615865843651857942052874]
failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session
c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
(42535295865117307932921825928971026442,68056473384187696470568107782069813248]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
(127605887595351923798765477786913079306,136112946768375392941136215564139626496]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session
c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
(136112946768375392941136215564139626496,0] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/
192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session
c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
(102084710076281535261119195933814292480,127605887595351923798765477786913079306]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished
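
For reference, when those Lost notification messages appear, the repair status
can still be followed in the server log on the initiating node, e.g. (assuming
the default packaged log location):

grep -i 'repair' /var/log/cassandra/system.log | tail -50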


We tried to run the repair one more time. After 24 hours we had some streaming
errors. Moreover, 2-3 hours later, we had to stop it because we started to
get write timeouts on the client and our system started dying.
The iostats from that time plus tpstats are available here:
https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view



On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 Hi Piotr,

 Are your repairs finishing without errors?

 Regards,

 Roni Balthazar

 On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
  Hi, Roni,
  They aren't exactly balanced but as I wrote before they are in range from
  2500-6000.
  If you need exactly data I will check them tomorrow morning. But all
 nodes
  in AGRAF have small increase of pending compactions during last week,
 which
  is wrong direction
 
  I will check in the morning get compaction throuput, but my feeling about
  this parameter is that it doesn't change anything.
 
  Regards
  Piotr
 
 
 
 
  On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com
 
  wrote:
 
  Hi Piotr,
 
  What about the nodes on AGRAF? Are the pending tasks balanced between
  this DC nodes as well?
  You can check the pending compactions on each node.
 
  Also try to run nodetool getcompactionthroughput on all nodes and
  check if the compaction throughput is set to 999.
 
  Cheers,
 
  Roni Balthazar
 
  On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
   Hi Roni,
  
   It is not balanced. As I wrote you last week I have problems only in
 DC
   in
   which we writes (on screen it is named as AGRAF:
   https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view).
 The
   problem is on ALL nodes in this dc.
   In second DC (ZETO) only one node have more than 30 SSTables and
 pending
   compactions are decreasing to zero.
  
   In AGRAF the minimum pending compaction is 2500 , maximum is 6000 (avg
   on
   screen from opscenter is less than 5000)

Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
One more thing: Hinted Handoff over the last week was less than 5 for all nodes.
For me every READ is a problem, because it must open too many files (3
SSTables), which shows up as errors in reads, repairs, etc.
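
The SSTables-per-read suspicion can be checked directly per table; the keyspace
and table names below are placeholders:

nodetool cfhistograms my_keyspace my_table   # the SSTables column shows how many SSTables a read touches
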
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 It is not obvious, because data is replicated to second data center. We
 check it manually for random records we put into Cassandra and we find
 all of them in secondary DC.
 We know about every single GC failure, but this doesn't change anything.
 The problem with GC failure is only one: restart the node. For few days we
 do not have GC errors anymore. It looks for me like memory leaks.
 We use Chef.

 By MANUAL compaction you mean running nodetool compact?  What does it
 change to permanently running compactions?

 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
 wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the
 secondary data center (yeah, obvious). I would suspect the GC errors are
 (also obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the
 primary you are looking at (and can of course be much more, depending on
 e.g. vnode count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your
 DevOps team may have a favorite log shipping/monitoring tool, could use
 e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.





 ...

 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming “Wow! What a Ride!” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs
 are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 

Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi Piotr,

Are your repairs finishing without errors?

Regards,

Roni Balthazar

On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
 Hi, Roni,
 They aren't exactly balanced, but as I wrote before they are in the range
 2500-6000.
 If you need exact data I will check them tomorrow morning. But all nodes
 in AGRAF have seen a small increase in pending compactions during the last
 week, which is the wrong direction.

 I will check getcompactionthroughput in the morning, but my feeling about
 this parameter is that it doesn't change anything.

 Regards
 Piotr




 On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 Hi Piotr,

 What about the nodes on AGRAF? Are the pending tasks balanced between
 this DC nodes as well?
 You can check the pending compactions on each node.

 Also try to run nodetool getcompactionthroughput on all nodes and
 check if the compaction throughput is set to 999.

 Cheers,

 Roni Balthazar

 On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
  Hi Roni,
 
  It is not balanced. As I wrote you last week I have problems only in DC
  in
  which we writes (on screen it is named as AGRAF:
  https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The
  problem is on ALL nodes in this dc.
  In second DC (ZETO) only one node have more than 30 SSTables and pending
  compactions are decreasing to zero.
 
  In AGRAF the minimum pending compaction is 2500 , maximum is 6000 (avg
  on
  screen from opscenter is less then 5000)
 
 
  Regards
  Piotrek.
 
  p.s. I don't know why my mail client display my name as Ja Sam instead
  of
  Piotr Stapp, but this doesn't change anything :)
 
 
  On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar
  ronibaltha...@gmail.com
  wrote:
 
  Hi Ja,
 
  How are the pending compactions distributed between the nodes?
  Run nodetool compactionstats on all of your nodes and check if the
  pendings tasks are balanced or they are concentrated in only few
  nodes.
  You also can check the if the SSTable count is balanced running
  nodetool cfstats on your nodes.
 
  Cheers,
 
  Roni Balthazar
 
 
 
  On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
   I do NOT have SSD. I have normal HDD group by JBOD.
   My CF have SizeTieredCompactionStrategy
   I am using local quorum for reads and writes. To be precise I have a
   lot
   of
   writes and almost 0 reads.
   I changed cold_reads_to_omit to 0.0 as someone suggest me. I used
   set
   compactionthrouput to 999.
  
   So if my disk are idle, my CPU is less then 40%, I have some free RAM
   -
   why
   SSTables count is growing? How I can speed up compactions?
  
   On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com
   wrote:
  
  
  
   If You could be so kind and validate above and give me an answer is
   my
   disk are real problems or not? And give me a tip what should I do
   with
   above
   cluster? Maybe I have misconfiguration?
  
  
  
   You disks are effectively idle. What consistency level are you using
   for
   reads and writes?
  
   Actually, 'await' is sort of weirdly high for idle SSDs. Check your
   interrupt mappings (cat /proc/interrupts) and make sure the
   interrupts
   are
   not being stacked on a single CPU.
  
  
  
 
 




Re: Possible problem with disk latency

2015-02-25 Thread daemeon reiydelle
I think you may have a vicious circle of errors: because your data is not
properly replicated to the neighbour, it is not replicating to the
secondary data center (yeah, obvious). I would suspect the GC errors are
(also obviously) the result of a backlog of compactions that take out the
neighbour (assuming a replication factor of 3, each neighbour is
participating in compaction from at least one other node besides the
primary you are looking at, and can of course be much more, depending on
e.g. vnode count if used).

What happens is that when a node fails due to a GC error (can't reclaim
space), that causes a cascade of other errors, as you see. Might I suggest
you have someone in devops with monitoring experience install a monitoring
tool that will notify you of EVERY SINGLE Java GC failure event? Your
DevOps team may have a favorite log shipping/monitoring tool; they could
roll it out with e.g. Puppet.

I think you may have to go through a MANUAL, table by table compaction.





...

“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a series
 of full GC events on the initiating node. The results form logs are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 (136112946768375392941136215564139626496,0] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair session
 c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
 (102084710076281535261119195933814292480,127605887595351923798765477786913079306]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair command #2 finished


 We tried to run repair one more time. After 24 hour have some streaming
 errors. Moreover, 2-3 hours later, we have to stop it because we start to
 have write timeouts on client and our system starts to dying.
 The iostats from dying time plus tpstats are available here:
 https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view



 On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 Hi Piotr,

 Are your repairs finishing without errors?

 Regards,

 Roni Balthazar

 On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:

Re: Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
It is not obvious, because data is replicated to the second data center. We
checked it manually for random records we put into Cassandra, and we found
all of them in the secondary DC.
We know about every single GC failure, but this doesn't change anything.
The only fix for a GC failure is to restart the node. For a few days now we
have not had GC errors anymore. It looks to me like a memory leak.
We use Chef.

By MANUAL compaction do you mean running nodetool compact? How does that
differ from the compactions that are running permanently anyway?
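
For reference, a major compaction would be triggered per table roughly like
this (keyspace/table names are placeholders; with SizeTieredCompactionStrategy
this merges everything into one huge SSTable, so it should be used with care):

nodetool compact my_keyspace my_table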

Regards
Piotrek

On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the
 secondary data center (yeah, obvious). I would suspect the GC errors are
 (also obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the
 primary you are looking at (and can of course be much more, depending on
 e.g. vnode count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your
 DevOps team may have a favorite log shipping/monitoring tool, could use
 e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.





 ...

 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming “Wow! What a Ride!” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs
 are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
 (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,619] Repair session
 c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
 (136112946768375392941136215564139626496,0] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor (/
 192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,620] Repair session
 c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
 

Re: Possible problem with disk latency

2015-02-25 Thread Roni Balthazar
Hi,

Check how many active CompactionExecutors are showing in nodetool tpstats.
Maybe your concurrent_compactors is too low. Set it to 1 per CPU core,
even though that is supposed to be the default on 2.1.
Some of our nodes were running with 2 compactors, but we have an 8-core CPU...
After that, monitor your nodes to be sure that the value is not too
high. You may get too much IO if you increase concurrent compactors
when using spinning disks.
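
A quick sketch of both checks (the config path assumes a package install, and
the yaml value is only an example, not a recommendation):

nodetool tpstats | grep -i compaction               # active/pending CompactionExecutor
grep -n 'concurrent_compactors' /etc/cassandra/cassandra.yaml
# cassandra.yaml example:
# concurrent_compactors: 8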

Regards,

Roni Balthazar

On 25 February 2015 at 16:37, Ja Sam ptrstp...@gmail.com wrote:
 Hi,
 One more thing. Hinted Handoff for last week for all nodes was less than 5.
 For me every READ is a problem because it must open too many files (3
 SSTables), which occurs as an error in reads, repairs, etc.
 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 It is not obvious, because data is replicated to second data center. We
 check it manually for random records we put into Cassandra and we find all
 of them in secondary DC.
 We know about every single GC failure, but this doesn't change anything.
 The problem with GC failure is only one: restart the node. For few days we
 do not have GC errors anymore. It looks for me like memory leaks.
 We use Chef.

 By MANUAL compaction you mean running nodetool compact?  What does it
 change to permanently running compactions?

 Regards
 Piotrek

 On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com
 wrote:

 I think you may have a vicious circle of errors: because your data is not
 properly replicated to the neighbour, it is not replicating to the secondary
 data center (yeah, obvious). I would suspect the GC errors are (also
 obviously) the result of a backlog of compactions that take out the
 neighbour (assuming replication of 3, that means each neighbour is
 participating in compaction from at least one other node besides the primary
 you are looking at (and can of course be much more, depending on e.g. vnode
 count if used).

 What happens is that when a node fails due to a GC error (can't reclaim
 space), that causes a cascade of other errors, as you see. Might I suggest
 you have someone in devops with monitoring experience install a monitoring
 tool that will notify you of EVERY SINGLE java GC failure event? Your DevOps
 team may have a favorite log shipping/monitoring tool, could use e.g. Puppet

 I think you may have to go through a MANUAL, table by table compaction.




 ...
 “Life should not be a journey to the grave with the intention of arriving
 safely in a
 pretty and well preserved body, but rather to skid in broadside in a
 cloud of smoke,
 thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a
 Ride!”
 - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi Roni,
 The repair results is following (we run it Friday): Cannot proceed on
 repair because a neighbor (/192.168.61.201) is dead: session failed

 But to be honest the neighbor did not died. It seemed to trigger a
 series of full GC events on the initiating node. The results form logs are:

 [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
 for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
 [2015-02-21 02:21:55,640] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:22:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:23:55,642] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 02:24:55,644] Lost notification. You should check server log
 for repair status of keyspace prem_maelstrom_2
 [2015-02-21 04:41:08,607] Repair session
 d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
 failed with error org.apache.cassandra.exceptions.RepairException: [repair
 #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
 (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
 Sync failed between /192.168.71.196 and /192.168.61.199
 [2015-02-21 04:41:08,608] Repair session
 eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
 (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
 failed with error java.io.IOException: Endpoint /192.168.61.199 died
 [2015-02-21 04:41:08,608] Repair session
 c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
 java.io.IOException: Cannot proceed on repair because a neighbor
 (/192.168.61.201) is dead: session failed
 [2015-02-21 04:41:08,609] Repair session
 c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
 (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
 failed with error java.io.IOException: Cannot proceed on repair because a
 neighbor (/192.168.61.201) is dead: session failed

Possible problem with disk latency

2015-02-25 Thread Ja Sam
Hi,
I asked some questions before about my problems with my C* cluster. All my
environment is described here:
https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
To sum up: I have thousands of SSTables in one DC and far fewer in the second.
I write only to the first DC.

Anyway, after reading a lot of posts/mails/Google results, I am starting to
think that the only explanation for the above is a disk problem.

My OpsCenter with some stats is following:
https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

My iostats are like this:
https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
(dm-XX are C* drives. dm-11 is for commitlog)
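
For context, extended per-device statistics like those in the screenshot can be
collected with, for example:

iostat -x -d -k 5   # await and %util per device, repeating every 5 seconds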

Could you be so kind as to validate the above and tell me whether my disks
are a real problem or not? And give me a tip on what I should do with the
above cluster? Maybe I have a misconfiguration?

Regards
Piotrek


Re: Possible problem with disk latency

2015-02-25 Thread Carlos Rolo
Your latency doesn't seem high enough to cause that problem. I suspect
more of a problem with the Cassandra version (2.1.3) than with the
hard drives. I didn't look deeply into the information provided, but for your
reference, the only time I had serious problems (leading to OOM and all sorts
of weird behavior) my hard drives were near 70ms latency.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 25, 2015 at 11:19 AM, Ja Sam ptrstp...@gmail.com wrote:

 Hi,
 I write some question before about my problems with C* cluster. All my
 environment is described here:
 https://www.mail-archive.com/user@cassandra.apache.org/msg40982.html
 To sum up I have thousands SSTables in one DC and much much less in
 second. I write only to first DC.

 Anyway after reading a lot of post/mails/google I start to think that the
 only reason of above is disk problems.

 My OpsCenter with some stats is following:
 https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view

 My iostats are like this:
 https://drive.google.com/file/d/0B4N_AbBPGGwLTTZEeG1SYkF0cXc/view
 (dm-XX are C* drives. dm-11 is for commitlog)

 If You could be so kind and validate above and give me an answer is my
 disk are real problems or not? And give me a tip what should I do with
 above cluster? Maybe I have misconfiguration?

 Regards
 Piotrek

