
Re: Many pending compactions

2015-02-24 Thread Ja Sam

The repair result is as follows (we ran it on Friday): Cannot proceed on
repair because a neighbor (/192.168.61.201) is dead: session failed

But to be honest, the neighbor did not die. The repair seemed to trigger a
series of full GC events on the initiating node. The results from the logs are:

[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log
for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session
d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]
failed with error org.apache.cassandra.exceptions.RepairException: [repair
#d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
(85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session
eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
(68056473384187696470568107782069813248,85070591730234615865843651857942052874]
failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session
c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor
(/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
(42535295865117307932921825928971026442,68056473384187696470568107782069813248]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session
c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
(127605887595351923798765477786913079306,136112946768375392941136215564139626496]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session
c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
(136112946768375392941136215564139626496,0] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor
(/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair session
c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
(102084710076281535261119195933814292480,127605887595351923798765477786913079306]
failed with error java.io.IOException: Cannot proceed on repair because a
neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,620] Repair command #2 finished
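
A minimal sketch of how repair output like the above could be tallied by failure cause and endpoint. The sample holds four entries copied from the log above with the wrapped lines joined; the regular expressions, the cause labels, and the function name `summarize_repair_failures` are assumptions made for illustration, not part of nodetool.

```python
import re
from collections import Counter

# Four entries from the log above, with wrapped lines joined.
SAMPLE = """\
[2015-02-21 04:41:08,607] Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (85070591730234615865843651857942052874,102084710076281535261119195933814292480] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events, (85070591730234615865843651857942052874,102084710076281535261119195933814292480]] Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (68056473384187696470568107782069813248,85070591730234615865843651857942052874] failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,619] Repair session c48d6000-b971-11e4-bc97-e9a66e5b2124 for range (136112946768375392941136215564139626496,0] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
"""

# Hypothetical cause labels mapped to patterns for the three messages seen above.
PATTERNS = {
    "sync failed": re.compile(r"Sync failed between /([\d.]+) and /([\d.]+)"),
    "endpoint died": re.compile(r"Endpoint /([\d.]+) died"),
    "neighbor dead": re.compile(r"neighbor \(/([\d.]+)\) is dead"),
}


def summarize_repair_failures(text):
    """Count failed repair sessions per (cause, endpoints) pair."""
    counts = Counter()
    for line in text.splitlines():
        if "failed with error" not in line:
            continue  # skip progress and notification lines
        for cause, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(cause, " and ".join(match.groups()))] += 1
                break
        else:
            counts[("other", "")] += 1  # a failure none of the patterns covers
    return counts


for (cause, endpoints), n in summarize_repair_failures(SAMPLE).items():
    print(f"{n} x {cause}: {endpoints}")
```

Grouping by endpoint makes it easier to see whether the failures point at one node (here 192.168.61.201 is reported dead most often) or are spread across the cluster.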


We tried to run repair one more time. After 24 hours we got some streaming
errors. Moreover, we had to stop it because we started to get write timeouts
on the client :(

We checked iostat while we had write timeouts. An example from one node in
DC_A is here:
The file also contains tpstats from all nodes. Nodes starting with z are
in DC_B, the rest are in DC_A.
Cassandra data and commit log are on disk dm-XX.

I also read
http://jonathanhui.com/cassandra-performance-tuning-and-monitoring and I
am thinking about:
1) memtable configuration - do you have any suggestions?
2) running INSERTs in batch statements - I am not sure whether this reduces
IO; again, do you have experience with this?

Any tips will be helpful

Regards
Piotrek

On Thu, Feb 19, 2015 at 10:34 AM, Roland Etzenhammer 
r.etzenham...@t-online.de wrote:

 Hi,

 2.1.3 is now the official latest release - I checked this morning and got
 this good surprise. Now it's update time - thanks to all the guys involved;
 if I meet any of you, one beer from me :-)

 The changelist is rather long:
 https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3

 Hopefully that will solve many of those oddities and not introduce too many
 new ones :-)

 Cheers,
 Roland

Re: Many pending compactions

2015-02-19 Thread Roland Etzenhammer


Hi,

2.1.3 is now the official latest release - I checked this morning and
got this good surprise. Now it's update time - thanks to all the guys
involved; if I meet any of you, one beer from me :-)


The changelist is rather long:
https://git1-us-west.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.1.3

Hopefully that will solve many of those oddities and not introduce too many
new ones :-)


Cheers,
Roland

Re: Many pending compactions

2015-02-18 Thread Roni Balthazar

Try repair -pr on all nodes.

If after that you still have issues, you can try to rebuild the SSTables using
nodetool upgradesstables or scrub.

Regards,

Roni Balthazar

On 18/02/2015, at 14:13, Ja Sam ptrstp...@gmail.com wrote:

ad 3) I already did this yesterday (setcompactionthroughput as well). But
the number of SSTables is still increasing.

ad 1) What do you think: should I use -pr or try incremental repair?

On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:
You are right... Repair makes the data consistent between nodes.

I understand that you have 2 issues going on.

You need to run repair periodically without errors and need to decrease the
number of pending compactions.

So I suggest:

1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use
incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0, and
increase setcompactionthroughput for some time and see if the number of
SSTables is going down.
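
The three steps above might be scripted roughly as follows. This is only a sketch: the keyspace `prem_maelstrom_2` and table `customer_events` are taken from the repair log elsewhere in this thread, the throughput value 64 is an arbitrary placeholder, and the helper `maintenance_commands` is hypothetical. It only builds and prints the commands; it does not run them. `cold_reads_to_omit` is assumed to be available as an STCS compaction sub-option in the Cassandra version in use.

```python
def maintenance_commands(keyspace, table, throughput_mb_per_s):
    """Build the CQL statement and nodetool commands for the steps above."""
    # Step 3a: stop STCS from skipping cold SSTables on this table.
    cql = (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        "{'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};"
    )
    nodetool = [
        f"nodetool repair -pr {keyspace}",  # step 1, run on every node
        "nodetool cleanup",  # step 2, run on every node
        # Step 3b: raise the compaction throughput cap (value in MB/s).
        f"nodetool setcompactionthroughput {throughput_mb_per_s}",
    ]
    return cql, nodetool


cql, commands = maintenance_commands("prem_maelstrom_2", "customer_events", 64)
print(cql)
for command in commands:
    print(command)
```

Since `repair -pr` only covers each node's primary ranges, it has to be run on every node in turn to cover the whole ring.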

Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:
Can you explain to me what the correlation is between the growing number of
SSTables and repair?
Until your mail, I was sure that repair only makes data consistent
between nodes.

Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:
Which error are you getting when running repairs?
You need to run repair on your nodes within gc_grace_seconds (eg:
weekly). They have data that are not read frequently. You can run
repair -pr on all nodes. Since you do not have deletes, you will not
have trouble with that. If you have deletes, it's better to increase
gc_grace_seconds before the repair.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup.

Check if the number of SSTables goes down after that... Pending
compactions must decrease as well...

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. But we had
Leveled compaction before. Last week we ALTERed the tables to STCS, because
the guys from DataStax suggested that we should not use Leveled and should
switch the tables to STCS, since we don't have SSDs. After this change we
did not run any repair. Anyway, I don't think it will change anything in the
SSTable count - if I am wrong, please let me know.

2) I did this. My tables are 99% write-only. It is an audit system.

3) Yes, I am using default values.

4) In both operations I am using LOCAL_QUORUM.

I am almost sure that the READ timeouts happen because of too many SSTables.
Anyway, first I would like to fix the too many pending compactions. I still
don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

Are you running repairs within gc_grace_seconds? (default is 10 days)

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
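
As a rough illustration of the question above, the check can be written as simple arithmetic. The default of 10 days corresponds to gc_grace_seconds = 864000; the function name `repair_is_within_gc_grace` and the example intervals are assumptions for this sketch.

```python
from datetime import timedelta

# 10 days expressed in seconds, the default mentioned above.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_is_within_gc_grace(repair_interval,
                              gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if every node is repaired more often than gc_grace_seconds."""
    return repair_interval.total_seconds() < gc_grace_seconds


# A weekly schedule fits inside the default window, a two-week one does not.
print(repair_is_within_gc_grace(timedelta(days=7)))
print(repair_is_within_gc_grace(timedelta(days=14)))
```

The interval that matters is the time between two successful repairs of the same node, so a repair that takes a day or fails partway shortens the real margin.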

Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
that you do not read often.

Are you using default values for the properties
min_compaction_threshold(4) and max_compaction_threshold(32)?
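
To show why these two thresholds matter for the SSTable count, here is a simplified sketch of size-tiered bucketing. It is not Cassandra's actual implementation: it ignores min_sstable_size and read hotness, and the bucket_low/bucket_high values of 0.5 and 1.5 are assumed defaults. The function names and the sample sizes are made up for illustration.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes so each bucket holds files of similar size."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            # A file joins a bucket if it is close to the bucket's average size.
            if average * bucket_low <= size <= average * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found, start a new one
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Return buckets big enough to compact, capped at max_threshold files."""
    return [
        bucket[:max_threshold]
        for bucket in size_tiered_buckets(sizes_mb)
        if len(bucket) >= min_threshold
    ]


sizes = [10, 11, 12, 13, 100, 110, 1000]  # hypothetical SSTable sizes in MB
print(size_tiered_buckets(sizes))
print(compaction_candidates(sizes))
```

In this model only buckets with at least min_compaction_threshold similar-sized files are eligible, so many SSTables of widely varying sizes can sit uncompacted even when compaction has spare capacity.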

Which Consistency Level are you using for reading operations? Check if
you are not reading from DC_B due to your Replication Factor and CL.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
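
For reference, a quorum is a majority of the replicas, and LOCAL_QUORUM counts only the replicas in the coordinator's own data center. The sketch below shows the arithmetic; the replication factors are hypothetical, since the thread does not state them.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum (a strict majority)."""
    return replication_factor // 2 + 1


# Hypothetical per-data-center replication factors, not taken from the thread.
replication = {"DC_A": 3, "DC_B": 3}

for dc, rf in replication.items():
    print(f"LOCAL_QUORUM in {dc}: {quorum(rf)} of {rf} replicas")

# A plain QUORUM counts replicas across all data centers instead.
total_rf = sum(replication.values())
print(f"QUORUM across both: {quorum(total_rf)} of {total_rf} replicas")
```

With a local consistency level, reads from a client connected to DC_A should be answered by DC_A replicas only, which is why the consistency level and the per-DC replication factor are worth checking together.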

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (replica); only in DC_A (my system writes
only to it) do I have read timeouts.

I checked the SSTable count in OpsCenter and I have:
1) in DC_A, the same +-10% for the last week, with a small increase over the
last 24h (it is more than 15000-2 SSTables, depending on the node)
2) in DC_B, the last 24h show up to a 50% decrease, which gives a nice
prognosis. Now I have fewer than 1000 SSTables

What did you measure during system optimizations? Or do you have an idea
what more I should check?
1) I look at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are
spikes
3) system RAM usage is almost full
4) In Total Bytes Compacted most lines are below 3MB/s. For the whole of
DC_A it is less than 10MB/s; in DC_B it looks much better (avg is around
17MB/s)

Anything else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
ronibaltha...@gmail.com
wrote:

Hi,

You can check if the number of SSTables is decreasing. Look for the
SSTable count information of your tables using nodetool

Re: Many pending compactions

2015-02-18 Thread Ja Sam

As Al Tobey suggested, I upgraded my 2.1.0 to a snapshot version of 2.1.3. I
have now installed exactly this build:
https://cassci.datastax.com/job/cassandra-2.1/912/
I see many compactions that complete, but some of them are really slow.
Maybe I should send some stats from OpsCenter or the servers? But it is
difficult for me to choose what is important.

Regards

On Wed, Feb 18, 2015 at 6:11 PM, Jake Luciani jak...@gmail.com wrote:

Ja, please upgrade to the official 2.1.3; we've fixed many things related to
compaction. Are you seeing the compaction % complete progress at all?

On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar ronibaltha...@gmail.com
wrote:

Try repair -pr on all nodes.

If after that you still have issues, you can try to rebuild the SSTables
using nodetool upgradesstables or scrub.

Regards,

Roni Balthazar

On 18/02/2015, at 14:13, Ja Sam ptrstp...@gmail.com wrote:

ad 3) I already did this yesterday (setcompactionthroughput as well). But
the number of SSTables is still increasing.

ad 1) What do you think: should I use -pr or try incremental repair?

On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

You are right... Repair makes the data consistent between nodes.

I understand that you have 2 issues going on.

You need to run repair periodically without errors and need to decrease
the number of pending compactions.

So I suggest:

1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes
3) Since you have too many cold SSTables, set cold_reads_to_omit to
0.0, and increase setcompactionthroughput for some time and see if the
number of SSTables is going down.

Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:

Can you explain to me what the correlation is between the growing number of
SSTables and repair?
Until your mail, I was sure that repair only makes data consistent
between nodes.

Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar
ronibaltha...@gmail.com wrote:

Which error are you getting when running repairs?
You need to run repair on your nodes within gc_grace_seconds (eg:
weekly). They have data that are not read frequently. You can run
repair -pr on all nodes. Since you do not have deletes, you will not
have trouble with that. If you have deletes, it's better to increase
gc_grace_seconds before the repair.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup.

Check if the number of SSTables goes down after that... Pending
compactions must decrease as well...

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. But we had
Leveled compaction before. Last week we ALTERed the tables to STCS, because
the guys from DataStax suggested that we should not use Leveled and should
switch the tables to STCS, since we don't have SSDs. After this change we
did not run any repair. Anyway, I don't think it will change anything in the
SSTable count - if I am wrong, please let me know.

2) I did this. My tables are 99% write-only. It is an audit system.

3) Yes, I am using default values.

4) In both operations I am using LOCAL_QUORUM.

I am almost sure that the READ timeouts happen because of too many SSTables.
Anyway, first I would like to fix the too many pending compactions. I still
don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar
ronibaltha...@gmail.com
wrote:

Are you running repairs within gc_grace_seconds? (default is 10
days)

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

Double check if you set cold_reads_to_omit to 0.0 on tables with
STCS
that you do not read often.

Are you using default values for the properties
min_compaction_threshold(4) and max_compaction_threshold(32)?

Which Consistency Level are you using for reading operations? Check
if
you are not reading from DC_B due to your Replication Factor and CL.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com
wrote:
I don't have problems with DC_B (replica); only in DC_A (my system writes
only to it) do I have read timeouts.

I checked the SSTable count in OpsCenter and I have:
1) in DC_A, the same +-10% for the last week, with a small increase over the
last 24h (it is more than 15000-2 SSTables, depending on the node)
2) in DC_B, the last 24h show up to a 50% decrease, which gives a nice
prognosis. Now I have fewer than 1000 SSTables

What did you measure during system optimizations? Or do you have
an idea

Re: Many pending compactions

2015-02-18 Thread Jake Luciani

Ja, please upgrade to the official 2.1.3; we've fixed many things related to
compaction. Are you seeing the compaction % complete progress at all?

On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar ronibaltha...@gmail.com
wrote:

Try repair -pr on all nodes.

If after that you still have issues, you can try to rebuild the SSTables
using nodetool upgradesstables or scrub.

Regards,

Roni Balthazar

On 18/02/2015, at 14:13, Ja Sam ptrstp...@gmail.com wrote:

ad 3) I already did this yesterday (setcompactionthroughput as well). But
the number of SSTables is still increasing.

ad 1) What do you think: should I use -pr or try incremental repair?

On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

You are right... Repair makes the data consistent between nodes.

I understand that you have 2 issues going on.

You need to run repair periodically without errors and need to decrease
the number of pending compactions.

So I suggest:

1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
and increase setcompactionthroughput for some time and see if the number
of SSTables is going down.

Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:

Can you explain to me what the correlation is between the growing number of
SSTables and repair?
Until your mail, I was sure that repair only makes data consistent
between nodes.

Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup.

Check if the number of SSTables goes down after that... Pending
compactions must decrease as well...

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. But we had
Leveled compaction before. Last week we ALTERed the tables to STCS, because
the guys from DataStax suggested that we should not use Leveled and should
switch the tables to STCS, since we don't have SSDs. After this change we
did not run any repair. Anyway, I don't think it will change anything in the
SSTable count - if I am wrong, please let me know.