[jira] [Commented] (CASSANDRA-15152) Batch Log - Mutation too large while bootstrapping a newly added node
[ https://issues.apache.org/jira/browse/CASSANDRA-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859724#comment-16859724 ]

Avraham Kalvo commented on CASSANDRA-15152:
-------------------------------------------

Switching the log level to TRACE revealed the following line just before the error we’re getting:

`TRACE [BatchlogTasks:1] 2019-06-10 05:45:40,251 BatchlogManager.java:309 - Replaying batch 5694cca0-8834-11e9-b262-b3ace0831935`

How should one query the `system.batches` table to see the actual list of mutations (blob to text? casting?)
Would this table disclose the exact keyspace.table the mutations relate to?

Thanks.

> Batch Log - Mutation too large while bootstrapping a newly added node
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-15152
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15152
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Batch Log
>            Reporter: Avraham Kalvo
>            Priority: Normal
>
>
> While scaling our six-node cluster by three more nodes, we came upon behavior in
> which bootstrap appears hung under `UJ` (the two nodes added previously joined
> within approximately 2.5 hours).
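The batch id in the TRACE line looks like a version-1 (time-based) UUID, so its creation time can be read directly from the id, without querying `system.batches`. A minimal Python sketch, assuming the id is a standard time UUID; the helper name `timeuuid_to_datetime` is made up for illustration:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version-1 UUID timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    """Return the UTC creation time embedded in a version-1 UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError(f"{value} is not a version-1 (time-based) UUID")
    # parsed.time counts 100-nanosecond intervals since the Gregorian epoch.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


if __name__ == "__main__":
    # Batch id copied from the TRACE line in the comment.
    batch_id = "5694cca0-8834-11e9-b262-b3ace0831935"
    print(timeuuid_to_datetime(batch_id).isoformat())
```

Comparing that time with the timestamps of the errors in the log may help establish when the oversized batch was written relative to the bootstrap and the forced node removal.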
[jira] [Created] (CASSANDRA-15152) Batch Log - Mutation too large while bootstrapping a newly added node
Avraham Kalvo created CASSANDRA-15152:
-----------------------------------------

             Summary: Batch Log - Mutation too large while bootstrapping a newly added node
                 Key: CASSANDRA-15152
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15152
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Batch Log
            Reporter: Avraham Kalvo


While scaling our six-node cluster by three more nodes, we came upon behavior in which bootstrap appears hung under `UJ` (the two nodes added previously joined within approximately 2.5 hours).

Examining the logs, the following became apparent shortly after the bootstrap process commenced for this node:
```
ERROR [BatchlogTasks:1] 2019-06-05 14:43:46,508 CassandraDaemon.java:207 - Exception in thread Thread[BatchlogTasks:1,5,main]
java.lang.IllegalArgumentException: Mutation of 108035175 bytes is too large for the maximum size of 16777216
	at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:520) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:399) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.db.Mutation.apply(Mutation.java:213) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendSingleReplayMutation(BatchlogManager.java:427) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendReplays(BatchlogManager.java:402) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.replay(BatchlogManager.java:318) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(BatchlogManager.java:238) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches(BatchlogManager.java:207) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.0.10.jar:3.0.10]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_201]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_201]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_201]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_201]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_201]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_201]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
```
The error has been repeating in the logs ever since.

We decided to discard the newly added, apparently still-joining node by doing the following:
1. At first, we simply restarted it, after which it started up apparently normally.
2. Then we decommissioned it by issuing `nodetool decommission`; this took a long time (over 2.5 hours) and was eventually terminated by issuing `nodetool removenode`.
3. The node removal hung on a specific token, which led us to complete it by force.
4. Forcing the node removal corrupted one of the `system.batches` table SSTables, which was removed (backed up) from its underlying data directory as mitigation (78MB worth).
5. A cluster-wide repair was run.
6. The `Mutation too large` error is now repeating in three different permutations (reported sizes) on three different nodes (our standard replication factor is three).

We're not sure whether we're hitting https://issues.apache.org/jira/browse/CASSANDRA-11670 or not, as it is said to be resolved in our current version, 3.0.10.
We would still like to verify the root cause, as we need to establish whether to expect this in production environments.

How would you recommend verifying which keyspace.table this mutation belongs to?

Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
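To put the two numbers from the exception in perspective, here is a small arithmetic sketch in Python. The only inputs are the byte counts from the stack trace. The remark that the limit is typically half of the commit log segment size (`commitlog_segment_size_in_mb`) reflects my understanding of the defaults and is an assumption to check against the node's `cassandra.yaml`; the `describe` helper is hypothetical.

```python
MIB = 1024 * 1024

MUTATION_BYTES = 108035175  # mutation size reported in the exception
LIMIT_BYTES = 16777216      # maximum size reported in the exception


def describe(mutation_bytes: int, limit_bytes: int) -> str:
    """Summarize how far a mutation exceeds the commit log limit."""
    ratio = mutation_bytes / limit_bytes
    return (
        f"mutation: {mutation_bytes / MIB:.2f} MiB, "
        f"limit: {limit_bytes / MIB:.2f} MiB, "
        f"ratio: {ratio:.2f}x"
    )


if __name__ == "__main__":
    # Assumption: the limit is half of a 32 MiB commit log segment.
    print(describe(MUTATION_BYTES, LIMIT_BYTES))
```

The limit is exactly 16 MiB, and the replayed mutation is more than six times that, which suggests the batch could not have passed the same check as a single mutation on a normal write path.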
[jira] [Comment Edited] (CASSANDRA-10190) Python 3 support for cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859468#comment-16859468 ]

Patrick Bannister edited comment on CASSANDRA-10190 at 6/9/19 2:15 PM:
-----------------------------------------------------------------------

[~djoshi3], thanks for the feedback. I've incorporated all of your recommendations on the branch. I meant to delete FrozenType completely after testing with it commented out, but I forgot to follow through.

I wanted to remove the FrozenType class from cqlsh.py because it carried the comment "Needed until the bundled python driver adds FrozenType.", and I noticed that [the Python driver has included FrozenType since version 2.5.0|https://github.com/datastax/python-driver/blob/master/CHANGELOG.rst#250]. It looks like the bundled driver is at least version 3.7.0 (CASSANDRA-12736), so I think we should be able to remove FrozenType completely. I've made this change on my branch too.


was (Author: ptbannister):
[~djoshi3], thanks for the feedback. I've incorporated all of your recommendations on the branch. I meant to completely delete FrozenType after testing with it commented out, but I forgot to follow through.

The reason I wanted to remove the FrozenType class from cqlsh.py is because it was commented "Needed until the bundled python driver adds FrozenType.", and I noticed that [the Python driver includes FrozenType since version 2.5.0|[https://github.com/datastax/python-driver/blob/master/CHANGELOG.rst#250].] It looks like the bundled driver is at least version 3.7.0 (CASSANDRA-12736), so I think we should be able to remove FrozenType completely. I've made this change on my branch too.

> Python 3 support for cqlsh
> --------------------------
>
>                 Key: CASSANDRA-10190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10190
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Tools
>            Reporter: Andrew Pennebaker
>            Assignee: Patrick Bannister
>            Priority: Normal
>              Labels: cqlsh
>         Attachments: coverage_notes.txt
>
>
> Users who operate in a Python 3 environment may have trouble launching cqlsh.
> Could we please update cqlsh's syntax to run in Python 3?
> As a workaround, users can set up pyenv, and cd to a directory with a
> .python-version containing "2.7". But it would be nice if cqlsh supported
> modern Python versions out of the box.
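The case for removing the local FrozenType class rests on a version comparison: the driver gained FrozenType in 2.5.0, and the bundled driver is believed to be at least 3.7.0. A minimal Python sketch of that check using plain tuples; `parse_version` and `has_frozen_type` are hypothetical helpers, and a real check would read the bundled driver's version instead of a hard-coded string:

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '3.7.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# First driver release that ships FrozenType, per the changelog linked in the comment.
FROZEN_TYPE_SINCE = parse_version("2.5.0")


def has_frozen_type(driver_version: str) -> bool:
    """Return True if a driver of this version should already provide FrozenType."""
    return parse_version(driver_version) >= FROZEN_TYPE_SINCE


if __name__ == "__main__":
    # 3.7.0 is the minimum bundled driver version suggested in the comment.
    print(has_frozen_type("3.7.0"))
```

This only handles purely numeric versions; pre-release suffixes would need a proper version parser.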
[jira] [Updated] (CASSANDRA-14494) Investigate possibility of a cqlsh terminfo
[ https://issues.apache.org/jira/browse/CASSANDRA-14494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Bannister updated CASSANDRA-14494:
------------------------------------------
    Resolution: Won't Do
        Status: Resolved  (was: Open)

This is unnecessary at this time.

> Investigate possibility of a cqlsh terminfo
> -------------------------------------------
>
>                 Key: CASSANDRA-14494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14494
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Legacy/Tools
>         Environment: This behavior has been observed in xterm on CentOS 7.5
> platforms. The test_cqlsh_output.py unit tests
> (pylib/cqlshlib/test/test_cqlsh_output.py) are a good place to see it in
> action.
>            Reporter: Patrick Bannister
>            Assignee: Patrick Bannister
>            Priority: Normal
>              Labels: Python, cqlsh, test
>             Fix For: 4.x
>
>
> Summary: investigate whether we could use a cqlsh-specific terminfo file to
> prevent use of the set-meta-mode escape sequence in xterm without breaking
> colors. If it works, see if we can install it in an appropriate place using
> Python distutils. If yes to both, generate a cqlsh terminfo file and work it
> into the install process.
> Long detailed explanation:
> In some more recent environments, in Python REPL applications that use the
> readline module, the set meta mode escape sequence is output before each
> prompt. This escape sequence has caused problems for some applications, and
> in our case, some of our cqlsh unit tests
> (pylib/cqlshlib/test/test_cqlsh_output.py) choke on this output because of
> the way our tests are designed to detect the cqlsh prompt. This behavior was
> observed on a CentOS 7.5 platform.
> The set-meta-mode escape sequence normally appears as "[?1034h" in output;
> it's normally defined as the bytes 1b 5b 3f 31 30 33 34 68. The exact value
> of the escape sequence is configurable and may be found on a GNU/Linux
> platform by running the command:
> {code:java}
> tput smm | hexdump{code}
> If this command gives no output, then the set meta mode sequence is not
> defined on this platform for the terminal in use. Refer to the xterm and
> terminfo man pages for more information on this sequence.
> There are easier ways to solve this problem for the sake of the unit test,
> but if time allows, I'd like to look into this to achieve a more consistent
> output behavior for cqlsh on GNU/Linux platforms.
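The byte sequence quoted in the issue can be decoded to confirm that it is ESC followed by `[?1034h`. One of the "easier ways" mentioned for the unit tests could be to strip the sequence from captured output before looking for the prompt. A minimal Python sketch of that idea; `strip_smm` and the sample prompt string are hypothetical and not part of cqlshlib:

```python
# Bytes quoted in the issue for the set-meta-mode (smm) escape sequence.
SMM_HEX = "1b 5b 3f 31 30 33 34 68"

# bytes.fromhex ignores the spaces between the byte pairs.
SMM = bytes.fromhex(SMM_HEX).decode("ascii")


def strip_smm(output: str) -> str:
    """Remove every set-meta-mode escape sequence from captured terminal output."""
    return output.replace(SMM, "")


if __name__ == "__main__":
    captured = SMM + "cqlsh> "  # hypothetical output captured by a test
    print(repr(SMM))
    print(repr(strip_smm(captured)))
```

This hard-codes the common xterm value; since the sequence is configurable per terminal, a sturdier variant would take the value from `tput smm` on the platform under test.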
[jira] [Assigned] (CASSANDRA-14494) Investigate possibility of a cqlsh terminfo
[ https://issues.apache.org/jira/browse/CASSANDRA-14494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Bannister reassigned CASSANDRA-14494:
---------------------------------------------

    Assignee: Patrick Bannister

> Investigate possibility of a cqlsh terminfo
> -------------------------------------------
>
>                 Key: CASSANDRA-14494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14494