Re: IO scheduler for SSDs on EC2?
Hi Ali,

The best practice is to use the noop scheduler when an array of SSDs sits behind your block device (hardware RAID controller). If you are using a single SSD, the deadline scheduler is the best choice to reduce IO latency. Setting cfq on SSDs is not recommended.

Regards,

Roni Balthazar

On 15 March 2015 at 09:03, Ali Akhtar ali.rac...@gmail.com wrote:
I was watching a talk recently on Elasticsearch performance in EC2, and they recommended setting the IO scheduler to noop for SSDs. Is that the case for Cassandra as well, or is it recommended to keep the default 'deadline' scheduler for Cassandra? Thanks.
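For reference, a minimal sketch of checking and switching the scheduler at runtime (sda is a placeholder for your device; the change does not survive a reboot unless you also set it via the elevator= kernel parameter or a udev rule):

    # show the available schedulers; the active one is in brackets
    cat /sys/block/sda/queue/scheduler
    # switch to noop (deadline works the same way)
    echo noop > /sys/block/sda/queue/scheduler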
Downgrade Cassandra from 2.1.x to 2.0.x
Hi there,

What is the best way to downgrade a C* 2.1.3 cluster to the stable 2.0.12? I know it's not supported, but we are running into too many issues with the 2.1.x series... They are leading us to think that the best solution is to fall back to the stable version. Is there a safe way to do that?

Cheers,

Roni
OOM and high SSTables count
$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.1.3.jar:2.1.3]

So I am asking how to debug this issue, and what are the best practices in this situation?

Regards,

Roni
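One starting point, in line with the advice elsewhere in this archive, is to check whether the per-table SSTable count is exploding; a small sketch (keyspace and table names are placeholders):

    # a count in the thousands under size-tiered compaction usually
    # means compaction is not keeping up
    nodetool cfstats mykeyspace.mytable | grep "SSTable count"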
Re: Possible problem with disk latency
Hi Ja,

How are the pending compactions distributed between the nodes? Run nodetool compactionstats on all of your nodes and check whether the pending tasks are balanced or concentrated in only a few nodes. You can also check whether the SSTable count is balanced by running nodetool cfstats on your nodes.

Cheers,

Roni Balthazar

On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
I do NOT have SSDs. I have normal HDDs grouped by JBOD. My CF uses SizeTieredCompactionStrategy. I am using local quorum for reads and writes. To be precise, I have a lot of writes and almost 0 reads. I changed cold_reads_to_omit to 0.0 as someone suggested. I used setcompactionthroughput 999. So if my disks are idle, my CPU is less than 40%, and I have some free RAM - why is the SSTable count growing? How can I speed up compactions?

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:
> If you could be so kind as to validate the above and give me an answer: are my disks a real problem or not? And give me a tip on what I should do with the above cluster? Maybe I have a misconfiguration?
Your disks are effectively idle. What consistency level are you using for reads and writes? Actually, 'await' is sort of weirdly high for idle SSDs. Check your interrupt mappings (cat /proc/interrupts) and make sure the interrupts are not being stacked on a single CPU.
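To spell out the interrupt check Nate mentions (eth0 is a placeholder; substitute whichever device's IRQs you care about):

    # one column per CPU, one row per IRQ source; look for a single
    # CPU column absorbing nearly all the counts
    cat /proc/interrupts
    # watch one device's counters over time
    watch -n 1 'grep eth0 /proc/interrupts'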
Re: Possible problem with disk latency
Hi Piotr,

Are your repairs finishing without errors?

Regards,

Roni Balthazar

On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, They aren't exactly balanced, but as I wrote before, they are in the range 2500-6000. If you need exact data, I will check them tomorrow morning. But all nodes in AGRAF have had a small increase of pending compactions during the last week, which is the wrong direction. I will check the compaction throughput in the morning, but my feeling about this parameter is that it doesn't change anything.
Regards
Piotr

On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi Piotr, What about the nodes on AGRAF? Are the pending tasks balanced between this DC's nodes as well? You can check the pending compactions on each node. Also try to run nodetool getcompactionthroughput on all nodes and check if the compaction throughput is set to 999.
Cheers, Roni Balthazar

On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, It is not balanced. As I wrote last week, I have problems only in the DC we write to (on the screenshot it is named AGRAF: https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The problem is on ALL nodes in this DC. In the second DC (ZETO) only one node has more than 30 SSTables, and pending compactions are decreasing to zero. In AGRAF the minimum pending compactions is 2500 and the maximum is 6000 (the average on the screenshot from OpsCenter is less than 5000).
Regards
Piotrek
p.s. I don't know why my mail client displays my name as Ja Sam instead of Piotr Stapp, but this doesn't change anything :)

On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi Ja, How are the pending compactions distributed between the nodes? Run nodetool compactionstats on all of your nodes and check whether the pending tasks are balanced or concentrated in only a few nodes. You can also check whether the SSTable count is balanced by running nodetool cfstats on your nodes.
Cheers, Roni Balthazar

On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
I do NOT have SSDs. I have normal HDDs grouped by JBOD. My CF uses SizeTieredCompactionStrategy. I am using local quorum for reads and writes. To be precise, I have a lot of writes and almost 0 reads. I changed cold_reads_to_omit to 0.0 as someone suggested. I used setcompactionthroughput 999. So if my disks are idle, my CPU is less than 40%, and I have some free RAM - why is the SSTable count growing? How can I speed up compactions?

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:
> If you could be so kind as to validate the above and give me an answer: are my disks a real problem or not? And give me a tip on what I should do with the above cluster? Maybe I have a misconfiguration?
Your disks are effectively idle. What consistency level are you using for reads and writes? Actually, 'await' is sort of weirdly high for idle SSDs. Check your interrupt mappings (cat /proc/interrupts) and make sure the interrupts are not being stacked on a single CPU.
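Since "run nodetool getcompactionthroughput on all nodes" comes up several times in this thread, a sketch of doing it in one pass (the hostnames are placeholders; this assumes passwordless SSH to each node):

    for h in node1 node2 node3; do
        echo -n "$h: "; ssh "$h" nodetool getcompactionthroughput
    done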
Re: Possible problem with disk latency
Hi,

Check how many active CompactionExecutors nodetool tpstats is showing. Maybe your concurrent_compactors is too low. Enforce 1 per CPU core, even though that is the default value on 2.1. Some of our nodes were running with 2 compactors, but we have an 8-core CPU... After that, monitor your nodes to be sure that the value is not too high. You may get too much IO if you increase concurrent compactors when using spinning disks.

Regards,

Roni Balthazar

On 25 February 2015 at 16:37, Ja Sam ptrstp...@gmail.com wrote:
Hi, One more thing. Hinted Handoff for the last week for all nodes was less than 5. For me every READ is a problem, because it must open too many files (3 SSTables), which shows up as errors in reads, repairs, etc.
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:
Hi, It is not obvious, because data is replicated to the second data center. We checked it manually for random records we put into Cassandra, and we found all of them in the secondary DC. We know about every single GC failure, but this doesn't change anything. The problem with a GC failure is only one: restart the node. For the past few days we have not had GC errors anymore. It looks to me like memory leaks. We use Chef. By MANUAL compaction do you mean running nodetool compact? What does it change compared to the permanently running compactions?
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com wrote:
I think you may have a vicious circle of errors: because your data is not properly replicated to the neighbour, it is not replicating to the secondary data center (yeah, obvious). I would suspect the GC errors are (also obviously) the result of a backlog of compactions that take out the neighbour (assuming replication of 3, that means each neighbour is participating in compaction from at least one other node besides the primary you are looking at, and it can of course be much more, depending on e.g. vnode count if used). What happens is that when a node fails due to a GC error (can't reclaim space), it causes a cascade of other errors, as you see. Might I suggest you have someone in devops with monitoring experience install a monitoring tool that will notify you of EVERY SINGLE Java GC failure event? Your DevOps team may have a favorite log shipping/monitoring tool; they could use e.g. Puppet. I think you may have to go through a MANUAL, table by table compaction.
...
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, The repair result is the following (we ran it Friday): "Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed". But to be honest, the neighbor did not die. It seemed to trigger a series of full GC events on the initiating node. The results from the logs are:
[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (85070591730234615865843651857942052874,102084710076281535261119195933814292480] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events, (85070591730234615865843651857942052874,102084710076281535261119195933814292480]] Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (68056473384187696470568107782069813248,85070591730234615865843651857942052874] failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range (42535295865117307932921825928971026442,68056473384187696470568107782069813248] failed with error java.io.IOException: Cannot proceed on repair because a neighbor
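A sketch of the check and the setting discussed above (the value 8 is just an example for an 8-core box, not a recommendation):

    # active/pending compaction threads on this node
    nodetool tpstats | grep CompactionExecutor

    # cassandra.yaml - set explicitly instead of relying on the default;
    # requires a node restart to take effect
    concurrent_compactors: 8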
Re: Many pending compactions
Try repair -pr on all nodes. If after that you still have issues, you can try to rebuild the SSTables using nodetool upgradesstables or scrub.

Regards,

Roni Balthazar

On 18/02/2015, at 14:13, Ja Sam ptrstp...@gmail.com wrote:
ad 3) I did this already yesterday (setcompactionthroughput also). But SSTables are still increasing.
ad 1) What do you think I should use: -pr, or should I try incremental?

On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
You are right... Repair makes the data consistent between nodes. I understand that you have 2 issues going on. You need to run repair periodically without errors, and you need to decrease the number of pending compactions. So I suggest:
1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes.
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0 and increase setcompactionthroughput for some time, and see if the number of SSTables is going down.
Let us know what errors you are getting when running repairs.
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:
Can you explain to me what the correlation is between growing SSTables and repair? I was sure, until your mail, that repair is only to make data consistent between nodes.
Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats.
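For reference, a sketch of the cold_reads_to_omit change being suggested (keyspace and table names are placeholders; on 2.1 the option is part of the STCS compaction map, so any other compaction options you had set must be restated in the same statement):

    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};"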
Re: Many pending compactions
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly, with idle IO. We have a lot of Data files in Cassandra. In DC_A it is about ~12 (counting only xxx-Data.db files); DC_B has only ~4000. I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.
Do you think the above flag will help us?

On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam ptrstp...@gmail.com wrote:
I set setcompactionthroughput 999 permanently and it doesn't change anything. IO is still the same. CPU is idle.

On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
Cheers, Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote
Re: Many pending compactions
You are right... Repair makes the data consistent between nodes. I understand that you have 2 issues going on. You need to run repair periodically without errors, and you need to decrease the number of pending compactions. So I suggest:
1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes.
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0 and increase setcompactionthroughput for some time, and see if the number of SSTables is going down.
Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:
Can you explain to me what the correlation is between growing SSTables and repair? I was sure, until your mail, that repair is only to make data consistent between nodes.
Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar
Re: Many pending compactions
Hi,

You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly, with idle IO. We have a lot of Data files in Cassandra. In DC_A it is about ~12 (counting only xxx-Data.db files); DC_B has only ~4000. I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.
Do you think the above flag will help us?

On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam ptrstp...@gmail.com wrote:
I set setcompactionthroughput 999 permanently and it doesn't change anything. IO is still the same. CPU is idle.

On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
Cheers, Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote:
Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is available from http://cassci.datastax.com/job/cassandra-2.1/
I read about cold_reads_to_omit. It looks promising. Should I also set the compaction throughput?
p.s. I am really sad that I didn't read this before: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

On Monday, February 16, 2015, Carlos Rolo r...@pythian.com wrote:
Hi, 100% in agreement with Roland; the 2.1.x series is a pain! I would never recommend the current 2.1.x series for production. Clocks are a pain, and check your connectivity! Also check tpstats to see if your threadpools are being overrun.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer r.etzenham...@t-online.de wrote:
Hi,
> 1) Current Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al Tobey from DataStax)
> 7) minimal reads (usually none, sometimes few)
Those two points keep me repeating an answer I got. First, where did you get 2.1.3 from? Maybe I missed it; I will have a look. But if it is 2.1.2, which is the latest released version, that version has many bugs - I got kicked by most of them while testing 2.1.2. I got many problems with compactions not being triggered on column families not being read, and compactions and repairs not being completed. See https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A
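Carlos's tpstats suggestion, spelled out (run it on each node; no arguments needed):

    # Pending should trend toward zero; anything nonzero in the
    # Dropped section means the node is shedding work
    nodetool tpstats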
Re: Many pending compactions
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly
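A sketch of scheduling the weekly repair -pr suggested above, in the same cron style as the compaction-throughput example elsewhere in this thread (the day and hour are arbitrary; stagger the start time across nodes so they don't all repair at once):

    # every Sunday at 01:00, repair this node's primary ranges only
    0 1 * * 0 root nodetool -h `hostname` repair -pr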
Re: Many pending compactions
Hi,

You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16

Cheers,

Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote:
Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is available from http://cassci.datastax.com/job/cassandra-2.1/
I read about cold_reads_to_omit. It looks promising. Should I also set the compaction throughput?
p.s. I am really sad that I didn't read this before: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

On Monday, February 16, 2015, Carlos Rolo r...@pythian.com wrote:
Hi, 100% in agreement with Roland; the 2.1.x series is a pain! I would never recommend the current 2.1.x series for production. Clocks are a pain, and check your connectivity! Also check tpstats to see if your threadpools are being overrun.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer r.etzenham...@t-online.de wrote:
Hi,
> 1) Current Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al Tobey from DataStax)
> 7) minimal reads (usually none, sometimes few)
Those two points keep me repeating an answer I got. First, where did you get 2.1.3 from? Maybe I missed it; I will have a look. But if it is 2.1.2, which is the latest released version, that version has many bugs - I got kicked by most of them while testing 2.1.2. I got many problems with compactions not being triggered on column families not being read, and compactions and repairs not being completed. See:
https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
Apart from that, how are those two datacenters connected? Maybe there is a bottleneck. Also, do you have ntp up and running on all nodes to keep all clocks in tight sync?
Note: I'm no expert (yet) - just sharing my 2 cents.
Cheers,
Roland
--
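Roland's clock question is easy to check from the shell (ntpq ships with the ntp daemon he refers to; run it on every node):

    # the * marks the selected time source; offset is in milliseconds
    # and should stay in the low single digits
    ntpq -p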
Re: High read latency after data volume increased
Hi there,

Compaction keeps running under our workload. We are using SATA HDD RAIDs. When trying to run cfhistograms on our user_data table, we are getting this message: "nodetool: Unable to compute when histogram overflowed". Please see what happens when running some queries on this CF: http://pastebin.com/jbAgDzVK

Thanks,

Roni Balthazar

On Fri, Jan 9, 2015 at 12:03 PM, datastax jlacefi...@datastax.com wrote:
Hello, you may not be experiencing versioning issues. Do you know if compaction is keeping up with your workload? The behavior described in the subject is typically associated with compaction falling behind or with a suboptimal compaction strategy being configured. What does the output of nodetool cfhistograms keyspace table look like for a table that is experiencing this issue? Also, what type of disks are you using on the nodes?
Sent from my iPad

On Jan 9, 2015, at 8:55 AM, Brian Tarbox briantar...@gmail.com wrote:
C* seems to have more than its share of "version x doesn't work, use version y" type issues.

On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
We are using C* 2.1.2 with 2 DCs. 30 nodes in DC1 and 10 nodes in DC2.
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...
=Rob
--
http://about.me/BrianTarbox
Re: High read latency after data volume increased
Hi Robert,

We downgraded to 2.1.1 but got the very same result. The read latency is still high, but we figured out that it happens only when using a specific keyspace. Please see the graphs below... Trying another keyspace with 600+ reads/sec, we are getting an acceptable ~30ms read latency. Let me know if I need to provide more information.

Thanks,

Roni Balthazar

On Thu, Jan 8, 2015 at 5:23 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
We are using C* 2.1.2 with 2 DCs. 30 nodes in DC1 and 10 nodes in DC2.
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...
=Rob
High read latency after data volume increased
Hi there,

We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2. While our data volume is increasing (34 TB now), we are running into some problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1, CL.LOCAL_ONE). At the same time the load average is about 20-30 on all DC1 nodes (8-core CPU, 32 GB RAM). C* starts timing out connections. In this scenario OpsCenter has some issues as well: it resets the graph layout and goes back to the default layout on every refresh, and it doesn't return to normal after the load decreases. I only managed to restore OpsCenter's normal behavior by reinstalling it. Just for reference, we are using SATA HDDs on all nodes. Running hdparm to check disk performance under this load, some nodes report very low read rates (under 10 MB/sec) while others are above 100 MB/sec. Under a low load average this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete. The last repair was 20 days ago. Running repair under high load is bringing some nodes down with the exception:
JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space

Any hints?

Regards,

Roni Balthazar
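For anyone reproducing the disk check described above, a sketch of the hdparm timing test (the device name is a placeholder; hdparm needs root, and single runs are noisy, so repeat a few times):

    # sequential read throughput, bypassing the page cache
    hdparm -t /dev/sda
    # add -T to also report cached reads as a baseline
    hdparm -tT /dev/sda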
Re: Operating on large cluster
Hi,

We use Puppet to manage our Cassandra configuration (http://puppetlabs.com). You can use Cluster SSH to send commands to the servers as well. Another good choice is Saltstack.

Regards,

Roni

On Thu, Oct 23, 2014 at 5:18 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
Hi, I was wondering how you guys handle a large cluster (50+ machines). I mean, sometimes you need to change configuration (cassandra.yaml) or send a command to one, some, or all nodes (cleanup, upgradesstables, setstreamthroughput or whatever). So far we have been using things like custom scripts for repairs or any routine maintenance, and cssh for specific, one-shot actions on the cluster. But I guess this doesn't really scale; we could use pssh instead. For configuration changes we use Capistrano, which might scale properly. So I would like to know: what methods do operators out there use on large clusters? Have some of you built open-sourced cluster management interfaces or scripts that could make things easier while operating large Cassandra clusters?
Alain
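A sketch of the pssh approach mentioned above (the hosts file is a placeholder containing one hostname per line):

    # run a one-shot command on every node in parallel; -t 0 disables
    # the per-host timeout (cleanup can run for a long time), -i prints
    # each host's output inline
    pssh -h cluster_hosts.txt -t 0 -i "nodetool cleanup"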
What will be the steps for adding new nodes
I have a 0.6.4 Cassandra cluster of two nodes in full replica (replication factor 2). I want to add two more nodes and rebalance the cluster (keeping replication factor 2). I want all of them to be seeds. What should the simple steps be:
1. Add <AutoBootstrap>true</AutoBootstrap> to all the nodes, or only the new ones?
2. Add <Seed>[new_node]</Seed> to the config file of the old nodes before adding the new ones?
3. Do the old nodes need to be restarted (if no change is needed in their config file)?
TX,
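For context, this is roughly what the relevant block of storage-conf.xml looked like in the 0.6 line (the hostnames are placeholders; this sketch shows the syntax only, not an answer to the seed question itself):

    <AutoBootstrap>true</AutoBootstrap>
    <Seeds>
        <Seed>10.0.0.1</Seed>
        <Seed>10.0.0.2</Seed>
    </Seeds>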