[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-11-16 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10515:
--
Fix Version/s: 3.0.1

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>  Labels: commitlog, triage
> Fix For: 3.0.1, 3.1, 2.1.x, 2.2.x
>
> Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, 
> CommitLogProblem.jpg, CommitLogSize.jpg, 
> MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, 
> cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-11-11 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10515:
--
Fix Version/s: 2.2.x
   2.1.x
   3.1

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>  Labels: commitlog, triage
> Fix For: 3.1, 2.1.x, 2.2.x
>
> Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, 
> CommitLogProblem.jpg, CommitLogSize.jpg, 
> MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, 
> cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-11-11 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10515:
--
Priority: Major  (was: Critical)

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>  Labels: commitlog, triage
> Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, 
> CommitLogProblem.jpg, CommitLogSize.jpg, 
> MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, 
> cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-11-11 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10515:
--
Component/s: Streaming and Messaging

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>  Labels: commitlog, triage
> Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, 
> CommitLogProblem.jpg, CommitLogSize.jpg, 
> MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, 
> cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-29 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: CASSANDRA-19579.jpg

[~blambov] [~benedict]

Killed two birds with one stone here it seems. Looking at he logs before the 
commit log growth, it looks like the IndexOutOfBounds exceptions affected all 
nodes in this small cluster of 3 at the same time, with with RF=3 that probably 
makes sense, doesn't it?


> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, 
> CommitLogProblem.jpg, CommitLogSize.jpg, 
> MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, 
> cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-23 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: MultinodeCommitLogGrowth-node1.tar.gz

[~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G 
setting but with different behavior. I have captured the whole thing with 
thread dumps and tpstats every two minutes. I've embedded pending numbers in 
the filenames for your convenience to make it easy to see where the backup 
starts. *-node1.tar.gz is the only one i uploaded since the files were so 
large, but note in the Dashboard.jpg file that all three nodes break the limit 
at about the same time. I can upload the others if it is useful. This case 
seems different from the previous case where there were lots of L0 files 
causing thread blocking, but even here it seems like the MemtablePostFlush is 
stopping on a countdownlatch.

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, 
> CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, 
> cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-16 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: C5commitLogIncrease.jpg

[~krummas] I checked a different cluster for sstable counts. 
(C5commitLogIncrease.jpg) Here they all decided to break the limit at the same 
time. The largest sstable count in each is about 5K.

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, 
> CommitLogSize.jpg, RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, 
> stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-16 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: cassandra.yaml

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-16 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: cfstats-clean.txt

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: RUN3tpstats.jpg

[~tjake] I monitored live for a few hours to capture the behavior. See 
RUN3tpstats.jpg in the attachments:

Overview is:
Monitoring threads began to block before the memtable flushing did.
Memtable flushing seemed to be progressing slowly and then post flushing 
operations began to pile up. The primary things blocked were:
1. MemtableFlushWriter/handleNotif
2. CompactionExec/getNextBGTask
3. ServiceThread/getEstimatedRemTask

Those three blocked and never came unblocked so assume (?) the locker never 
completed:

{code}
"CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 
nid=0x728b runnable [0x7fda4ae0b000]
   java.lang.Thread.State: RUNNABLE
at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
at 
org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
at 
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
- locked <0x0004a8bc5038> (a 
org.apache.cassandra.db.compaction.LeveledManifest)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
- locked <0x0004a8af17d0> (a 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
- locked <0x0004a894df10> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, 
> RUN3tpstats.jpg, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also d

[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-15 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: stacktrace.txt

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, 
> system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-14 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: CommitLogSize.jpg

This matches the period in the log file system.log.clean

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-14 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Attachment: system.log.clean

I cleaned up a log and attached it for a period that begin already in trouble, 
but suddenly started (flushing?) and compacting again. At 11:25 there was a 
huge compaction. Things seemed normal for a while but the commit logs began to 
grow again. Note that I have cleaned out a bunch of warnings like this because 
we see them everywhere:

WARN  [SharedPool-Worker-4] 2015-10-14 11:21:22,940 BatchStatement.java:252 - 
Batch of prepared statements for [xx] is of size 5253, exceeding specified 
threshold of 5120 by 133.

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-13 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-10515:

Labels: commitlog triage  (was: )

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
>  Labels: commitlog, triage
> Attachments: CommitLogProblem.jpg
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-13 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-10515:

 Assignee: Branimir Lambov
Reproduced In: 2.1.10

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Assignee: Branimir Lambov
>Priority: Critical
> Attachments: CommitLogProblem.jpg
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-13 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Description: 
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where 
some nodes break the 12G commit log max we configured and go as high as 65G or 
more before it restarts. Once it reaches the state of more than 12G commit log 
files, "nodetool compactionstats" hangs. Eventually C* restarts without errors 
(not sure yet whether it is crashing but I'm checking into it) and the cleanup 
occurs and the commit logs shrink back down again. Here is the nodetool 
compactionstats immediately after restart.

{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace  table completed  
totalunit   progress
Compaction   SyncCore  *cf1*   61251208033   
170643574558   bytes 35.89%
Compaction   SyncCore  *cf2*   19262483904
19266079916   bytes 99.98%
Compaction   SyncCore  *cf3*6592197093 
6592316682   bytes100.00%
Compaction   SyncCore  *cf4*3411039555 
3411039557   bytes100.00%
Compaction   SyncCore  *cf5*2879241009 
2879487621   bytes 99.99%
Compaction   SyncCore  *cf6*   21252493623
21252635196   bytes100.00%
Compaction   SyncCore  *cf7*   81009853587
81009854438   bytes100.00%
Compaction   SyncCore  *cf8*3005734580 
3005768582   bytes100.00%
Active compaction remaining time :n/a
{code}

I was also doing periodic "nodetool tpstats" which were working but not being 
logged in system.log on the StatusLogger thread until after the compaction 
started working again.


  was:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where 
some nodes break the 12G commit log max we configured and go as high as 65G or 
more. Once it reaches this state, "nodetool compactionstats" hangs. Eventually 
C* restarts without errors and the cleanup occurs and the commit logs shrink 
back down again.

{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace  table completed  
totalunit   progress
Compaction   SyncCore  *cf1*   61251208033   
170643574558   bytes 35.89%
Compaction   SyncCore  *cf2*   19262483904
19266079916   bytes 99.98%
Compaction   SyncCore  *cf3*6592197093 
6592316682   bytes100.00%
Compaction   SyncCore  *cf4*3411039555 
3411039557   bytes100.00%
Compaction   SyncCore  *cf5*2879241009 
2879487621   bytes 99.99%
Compaction   SyncCore  *cf6*   21252493623
21252635196   bytes100.00%
Compaction   SyncCore  *cf7*   81009853587
81009854438   bytes100.00%
Compaction   SyncCore  *cf8*3005734580 
3005768582   bytes100.00%
Active compaction remaining time :n/a
{code}

I was also doing periodic "nodetool tpstats" which were working but not being 
logged in system.log on the StatusLogger thread until after the compaction 
started working again.



> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Priority: Critical
> Attachments: CommitLogProblem.jpg
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more before it restarts. Once it reaches the state of more than 12G 
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts 
> without errors (not sure yet whether it is crashing but I'm checking into it) 
> and the cleanup occurs and the commit logs shrink back down again. Here is 
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *

[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-13 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Description: 
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where 
some nodes break the 12G commit log max we configured and go as high as 65G or 
more. Once it reaches this state, "nodetool compactionstats" hangs. Eventually 
C* restarts without errors and the cleanup occurs and the commit logs shrink 
back down again.

{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace  table completed  
totalunit   progress
Compaction   SyncCore  *cf1*   61251208033   
170643574558   bytes 35.89%
Compaction   SyncCore  *cf2*   19262483904
19266079916   bytes 99.98%
Compaction   SyncCore  *cf3*6592197093 
6592316682   bytes100.00%
Compaction   SyncCore  *cf4*3411039555 
3411039557   bytes100.00%
Compaction   SyncCore  *cf5*2879241009 
2879487621   bytes 99.99%
Compaction   SyncCore  *cf6*   21252493623
21252635196   bytes100.00%
Compaction   SyncCore  *cf7*   81009853587
81009854438   bytes100.00%
Compaction   SyncCore  *cf8*3005734580 
3005768582   bytes100.00%
Active compaction remaining time :n/a
{code}

I was also doing periodic "nodetool tpstats" which were working but not being 
logged in system.log on the StatusLogger thread until after the compaction 
started working again.


  was:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where 
some nodes break the 12G commit log max we configured and go as high as 65G or 
more. Once it reaches this state, "nodetool compactionstats" hangs. I watched 
the recovery live when compactions begin happening again. the "nodetool 
compactionstats" suddenly completed to show the outstanding jobs most in 100% 
completion state:

{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace  table completed  
totalunit   progress
Compaction   SyncCore  *cf1*   61251208033   
170643574558   bytes 35.89%
Compaction   SyncCore  *cf2*   19262483904
19266079916   bytes 99.98%
Compaction   SyncCore  *cf3*6592197093 
6592316682   bytes100.00%
Compaction   SyncCore  *cf4*3411039555 
3411039557   bytes100.00%
Compaction   SyncCore  *cf5*2879241009 
2879487621   bytes 99.99%
Compaction   SyncCore  *cf6*   21252493623
21252635196   bytes100.00%
Compaction   SyncCore  *cf7*   81009853587
81009854438   bytes100.00%
Compaction   SyncCore  *cf8*3005734580 
3005768582   bytes100.00%
Active compaction remaining time :n/a
{code}

I was also doing periodic "nodetool tpstats" which were working but not being 
logged in system.log on the StatusLogger thread until after the compaction 
started working again.



> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Priority: Critical
> Attachments: CommitLogProblem.jpg
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. 
> Eventually C* restarts without errors and the cleanup occurs and the commit 
> logs shrink back down again.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
>   

[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10

2015-10-13 Thread Jeff Griffith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Griffith updated CASSANDRA-10515:
--
Summary: Commit logs back up with move to 2.1.10  (was: Compaction hangs 
with move to 2.1.10)

> Commit logs back up with move to 2.1.10
> ---
>
> Key: CASSANDRA-10515
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: redhat 6.5, cassandra 2.1.10
>Reporter: Jeff Griffith
>Priority: Critical
> Attachments: CommitLogProblem.jpg
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems 
> where some nodes break the 12G commit log max we configured and go as high as 
> 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. I 
> watched the recovery live when compactions begin happening again. the 
> "nodetool compactionstats" suddenly completed to show the outstanding jobs 
> most in 100% completion state:
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>compaction type   keyspace  table completed
>   totalunit   progress
> Compaction   SyncCore  *cf1*   61251208033   
> 170643574558   bytes 35.89%
> Compaction   SyncCore  *cf2*   19262483904
> 19266079916   bytes 99.98%
> Compaction   SyncCore  *cf3*6592197093
>  6592316682   bytes100.00%
> Compaction   SyncCore  *cf4*3411039555
>  3411039557   bytes100.00%
> Compaction   SyncCore  *cf5*2879241009
>  2879487621   bytes 99.99%
> Compaction   SyncCore  *cf6*   21252493623
> 21252635196   bytes100.00%
> Compaction   SyncCore  *cf7*   81009853587
> 81009854438   bytes100.00%
> Compaction   SyncCore  *cf8*3005734580
>  3005768582   bytes100.00%
> Active compaction remaining time :n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being 
> logged in system.log on the StatusLogger thread until after the compaction 
> started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)