[jira] [Updated] (CASSANDRA-11272) NullPointerException (NPE) during bootstrap startup in StorageService.java
[ https://issues.apache.org/jira/browse/CASSANDRA-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Knighton updated CASSANDRA-11272:
--------------------------------------
    Fix Version/s:     (was: 3.7)
                   3.x

> NullPointerException (NPE) during bootstrap startup in StorageService.java
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11272
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>         Environment: debian jessie, up to date
>            Reporter: Jason Kania
>            Assignee: Alex Petrov
>             Fix For: 2.2.x, 3.0.x, 3.x
>
> After bootstrapping fails due to a stream closed error, the following error results:
> {code}
> Feb 27, 2016 8:06:38 PM com.google.common.util.concurrent.ExecutionList executeListener
> SEVERE: RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@3d61813b with executor INSTANCE
> java.lang.NullPointerException
>         at org.apache.cassandra.service.StorageService$2.onFailure(StorageService.java:1284)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:525)
>         at org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:645)
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:70)
>         at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
>         at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
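The trace shows the NPE originating inside the bootstrap failure callback (`StorageService$2.onFailure`) when the stream fails early. One common cause of an NPE in a failure callback is dereferencing state that was never initialized because the operation failed before setup completed. A minimal sketch of the defensive pattern (hypothetical names, not Cassandra's actual fix):

```python
# Hypothetical sketch: a failure callback that guards against state
# that may not have been initialized when streaming failed early.
class BootstrapMonitor:
    def __init__(self):
        self.stream_state = None  # set only once streaming actually starts

    def on_failure(self, exc):
        # Guard before dereferencing; the unguarded version is what NPEs.
        if self.stream_state is None:
            return "bootstrap failed before streaming state was set: %s" % exc
        return "streaming plan %s failed: %s" % (self.stream_state, exc)

monitor = BootstrapMonitor()
message = monitor.on_failure(IOError("stream closed"))
```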
[jira] [Updated] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefania updated CASSANDRA-11731:
---------------------------------
    Assignee: Philip Thompson  (was: Stefania)

> dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Russ Hatch
>            Assignee: Philip Thompson
>              Labels: dtest
>
> One recent failure (no-vnode job):
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366
[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292727#comment-15292727 ]

Stefania commented on CASSANDRA-11731:
--------------------------------------

CI results are good, assigning back to you for further testing [~philipthompson].
[jira] [Created] (CASSANDRA-11851) Table alias not supported
Prajakta Bhosale created CASSANDRA-11851:
--------------------------------------------

             Summary: Table alias not supported
                 Key: CASSANDRA-11851
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11851
             Project: Cassandra
          Issue Type: Bug
          Components: CQL
         Environment: [cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
            Reporter: Prajakta Bhosale
            Priority: Minor

Table aliases are not supported in CQL. The following error is returned when one is used:

cqlsh:test> select e.emp_id from emp e;
Bad Request: line 1:25 no viable alternative at input 'e'

The same query works without a table alias, and column aliases work.

Version details (show version):
[cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
[jira] [Commented] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292649#comment-15292649 ]

Yuki Morishita commented on CASSANDRA-11750:
--------------------------------------------

You are right. Here is the 3.0 version:

||branch||testall||dtest||
|[11750-3.0|https://github.com/yukim/cassandra/tree/11750-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11750-3.0-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11750-3.0-dtest/lastCompletedBuild/testReport/]|

> Offline scrub should not abort when it hits corruption
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11750
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Adam Hattrell
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Tools
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
> Hit a failure on startup due to corruption of some sstables in the system keyspace. Deleted the listed file and restarted - the node came down again with another file.
> Figured that I may as well run scrub to clean up all the files. Got the following error:
> {noformat}
> sstablescrub system compaction_history
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> Caused by: java.io.EOFException: null
>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_79]
>         at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79]
>         at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79]
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         ... 14 common frames omitted
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the option to continue and let it do its thing. I'd prefer that sstablescrub ignored the disk failure policy.
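The behavior the reporter asks for, skipping corrupt files and continuing rather than honoring the "stop" disk failure policy, can be sketched as follows (a hypothetical helper for illustration, not the actual sstablescrub implementation):

```python
class CorruptSSTableError(Exception):
    """Stand-in for org.apache.cassandra.io.sstable.CorruptSSTableException."""

def scrub(tables, abort_on_corruption=False):
    """tables: list of (name, is_readable) pairs. Returns (scrubbed, skipped).

    abort_on_corruption=True models the reported behavior (policy "stop");
    the default models the requested behavior: record the file and continue.
    """
    scrubbed, skipped = [], []
    for name, is_readable in tables:
        try:
            if not is_readable:
                raise CorruptSSTableError(name)
            scrubbed.append(name)
        except CorruptSSTableError:
            if abort_on_corruption:
                raise          # current offline scrub: exit forcefully
            skipped.append(name)  # requested: note the corruption, keep going
    return scrubbed, skipped

tables = [("ka-1935-Data.db", True), ("ka-1936-CompressionInfo.db", False)]
result = scrub(tables)
```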
[jira] [Updated] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-11750:
---------------------------------------
    Fix Version/s: 3.0.x
[jira] [Commented] (CASSANDRA-11569) Track message latency across DCs
[ https://issues.apache.org/jira/browse/CASSANDRA-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292648#comment-15292648 ]

Chris Lohfink commented on CASSANDRA-11569:
-------------------------------------------

You just described why averages are a terrible statistic for tracking latencies. This metric is an "all time" average, so if there's suddenly a spike in latency, the average won't suddenly change, since it is averaged with all the previous data. See CASSANDRA-11752.

> Track message latency across DCs
> --------------------------------
>
>                 Key: CASSANDRA-11569
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11569
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>            Priority: Minor
>         Attachments: CASSANDRA-11569.patch, CASSANDRA-11569v2.txt, nodeLatency.PNG
>
> Since we have the timestamp when a message is created and when it arrives, we can get an approximate time it took relatively easily, which would remove the need for more complex hacks to determine latency between DCs.
> Although this is not going to be very meaningful when NTP is not set up, it is pretty common to have NTP set up, and even with clock drift nothing is really hurt except the metric becoming wacky.
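The comment's point can be seen numerically: an all-time average barely registers even a large latency spike, which is why windowed statistics or percentiles are preferred. A quick illustration (illustrative numbers only):

```python
def running_average(samples):
    """All-time running average after each sample, as a naive metric keeps it."""
    total, averages = 0.0, []
    for count, sample in enumerate(samples, start=1):
        total += sample
        averages.append(total / count)
    return averages

# 1000 quiet samples at 1 ms, then a sustained spike at 100 ms.
samples = [1.0] * 1000 + [100.0] * 10
averages = running_average(samples)
# Latency has jumped 100x, yet the all-time average moves
# from 1.0 ms to only about 1.98 ms.
```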
[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292613#comment-15292613 ]

Stefania commented on CASSANDRA-11731:
--------------------------------------

I've created a dtest patch [here|https://github.com/stef1927/cassandra-dtest/commits/11731] and a C* patch for trunk [here|https://github.com/stef1927/cassandra/commits/11731]. If the tests are fine, we will need to backport it to 2.2. I've started a CI run:

|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11731-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11731-dtest/]|

[~philipthompson] if the CI results are OK, can you start another batch of repeated tests for the entire {{TestPushedNotifications}} class? We should also consider adding one more test that checks that when a node joins, all other nodes send a NEW_NODE, and that when one node leaves, all other nodes send NODE_LEFT. See the comment [here|https://github.com/stef1927/cassandra-dtest/commit/bae01dee9bd399981799c8d17ac671af0ca964e2#diff-2e73564535f1538fb660a5df5635f887R97] for more details.

I've also lowered some timeouts, since they should be sufficient now that we have changed when NEW_NODE is sent; I hope they are not too low, though.

[~beobal] this patch should also cover CASSANDRA-11038.
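The extra test suggested above, checking that every other node sends NEW_NODE on a join and NODE_LEFT on a leave, could be sketched roughly like this (a hypothetical helper; real dtests collect these notifications through the Python driver's event callbacks):

```python
def missing_senders(events, expected_type, expected_nodes):
    """events: (source_node, notification_type) pairs observed by the client.

    Returns the set of nodes that never sent the expected notification type;
    an empty set means the check passes.
    """
    senders = {source for source, kind in events if kind == expected_type}
    return set(expected_nodes) - senders

# Both remaining nodes announced the join; neither has announced a leave yet.
events = [("node2", "NEW_NODE"), ("node3", "NEW_NODE")]
new_node_missing = missing_senders(events, "NEW_NODE", ["node2", "node3"])
node_left_missing = missing_senders(events, "NODE_LEFT", ["node2", "node3"])
```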
[jira] [Created] (CASSANDRA-11850) cannot use cql since upgrading python to 2.7.11+
Andrew Madison created CASSANDRA-11850:
------------------------------------------

             Summary: cannot use cql since upgrading python to 2.7.11+
                 Key: CASSANDRA-11850
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11850
             Project: Cassandra
          Issue Type: Bug
          Components: CQL
         Environment: Development
            Reporter: Andrew Madison
             Fix For: 3.5

OS: Debian GNU/Linux stretch/sid
Kernel: 4.5.0-2-amd64 #1 SMP Debian 4.5.4-1 (2016-05-16) x86_64 GNU/Linux
Python version: 2.7.11+ (default, May 9 2016, 15:54:33) [GCC 5.3.1 20160429]
cqlsh --version: cqlsh 5.0.1
cassandra -v: 3.5 (also occurs with 3.0.6)

Issue: when running cqlsh, it returns the following error:

cqlsh -u dbarpt_usr01
Password: *
Connection error: ('Unable to connect to any servers', {'odbasandbox1': TypeError('ref() does not take keyword arguments',)})

I cleared PYTHONPATH:

python -c "import json; print dir(json); print json.__version__"
['JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', '_default_decoder', '_default_encoder', 'decoder', 'dump', 'dumps', 'encoder', 'load', 'loads', 'scanner']
2.0.9

Java-based clients can connect to Cassandra with no issue; only cqlsh and Python clients cannot. nodetool status also works.

Thank you for your help.
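The `TypeError('ref() does not take keyword arguments',)` points at CPython's C-implemented `weakref.ref`, whose parameters are positional-only: passing the callback as a keyword raises exactly this class of error, which would reproduce if driver code (or a stricter point release of 2.7) started hitting that path. A minimal demonstration of the constraint (the `Session` class here is just a stand-in, since plain `object()` instances are not weak-referenceable):

```python
import weakref

class Session(object):
    """Stand-in referent for whatever object the driver weak-references."""

target = Session()
ok = weakref.ref(target)       # positional callback slot: accepted

try:
    # Keyword argument: rejected by the C-level ref(); in Python 2.7 the
    # message was the reported "ref() does not take keyword arguments".
    weakref.ref(target, callback=None)
    raised = False
except TypeError:
    raised = True
```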
[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292477#comment-15292477 ]

Stefania commented on CASSANDRA-11731:
--------------------------------------

I'm going to take a look server side. I know there is an erroneous NEW_NODE when a node restarts (CASSANDRA-11038), but that should be unrelated to MOVED_NODE.
[jira] [Assigned] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefania reassigned CASSANDRA-11731:
------------------------------------
    Assignee: Stefania  (was: Philip Thompson)
[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time
[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292449#comment-15292449 ]

Dikang Gu commented on CASSANDRA-11709:
---------------------------------------

[~jkni] Thanks a lot for looking at this!
1. The jstack is taken from the nodes that did not have gossip disabled, and I attached the full jstack.
2. I will send the logs to your email.
3. The latency increased several minutes after I re-enabled gossip. It could not recover by itself; I fixed it with a rolling restart of the cluster.

> Lock contention when large number of dead nodes come back within short time
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11709
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Dikang Gu
>            Assignee: Joel Knighton
>             Fix For: 2.2.x, 3.x
>
>         Attachments: lock.jstack
>
> We have a few hundred nodes across 3 data centers, and we are doing a few million writes per second into the cluster.
> We were trying to simulate a data center failure by disabling gossip on all the nodes in one data center. After ~20 mins, I re-enabled gossip on those nodes, doing 5 nodes in each batch and sleeping 5 seconds between batches.
> After that, I saw the latency of read/write requests increase a lot, and client requests started to time out.
> On the node, I can see there is a huge number of pending tasks in GossipStage.
> =====
> 2016-05-02_23:55:08.99515 WARN 23:55:08 Gossip stage has 36337 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO 23:55:09 Node /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO 23:55:09 Node /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN 23:55:10 Gossip stage has 36421 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO 23:55:10 Node /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN 23:55:11 Gossip stage has 36558 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO 23:55:11 Node /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO 23:55:12 Node /2401:db00:2020:7189:face:0:7:0 state jump to normal
>
> And I took a jstack of the node; I found the read/write threads are blocked by a lock.
> ====== read thread ======
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for monitor entry [0x7fde6f8a1000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
>         - waiting to lock <0x7fe4faef4398> (a org.apache.cassandra.locator.TokenMetadata)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
>         at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
>         at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
>         at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
>         at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
>         at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
>         at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
>         at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
>         at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
>         at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
>         at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
>         at org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
>         at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
>         at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
>         at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> ====== writer thread ======
> "Thrift:7668" daemon prio=10 tid=0x7fde90d91000 nid=0x50e9 waiting for monitor entry [0x7fde78bbc000]
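The blocked frame, `TokenMetadata.cachedOnlyTokenMap`, follows a cache-under-lock pattern: ring updates invalidate a cached copy of the token map, and readers rebuild it under the same monitor on a miss. When gossip floods in after re-enabling a whole data center, the cache is invalidated constantly, so every read serializes on the lock. A rough sketch of the pattern (a hypothetical simplification, not Cassandra's code):

```python
import threading

class TokenMetadata:
    """Simplified model of a cached, lock-protected token map."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ring_version = 0
        self._cached_map = None

    def update(self):
        with self._lock:
            self._ring_version += 1
            self._cached_map = None   # gossip churn invalidates the cache

    def cached_only_token_map(self):
        with self._lock:              # readers serialize here on every miss
            if self._cached_map is None:
                # Rebuilding is expensive in the real system; while one
                # reader rebuilds, all other readers block on the monitor.
                self._cached_map = {"ring_version": self._ring_version}
            return self._cached_map

tm = TokenMetadata()
first = tm.cached_only_token_map()    # builds the cache
tm.update()                           # a gossip update throws it away
second = tm.cached_only_token_map()   # forces a rebuild under the lock
```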
[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time
[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292423#comment-15292423 ]

Jeremiah Jordan commented on CASSANDRA-11709:
---------------------------------------------

It falls back so you can do a rolling upgrade from PFS to GPFS.
[jira] [Updated] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time
[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-11709: -- Attachment: lock.jstack > Lock contention when large number of dead nodes come back within short time > --- > > Key: CASSANDRA-11709 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11709 > Project: Cassandra > Issue Type: Improvement >Reporter: Dikang Gu >Assignee: Joel Knighton > Fix For: 2.2.x, 3.x > > Attachments: lock.jstack > > > We have a few hundreds nodes across 3 data centers, and we are doing a few > millions writes per second into the cluster. > We were trying to simulate a data center failure, by disabling the gossip on > all the nodes in one data center. After ~20mins, I re-enabled the gossip on > those nodes, was doing 5 nodes in each batch, and sleep 5 seconds between the > batch. > After that, I saw the latency of read/write requests increased a lot, and > client requests started to timeout. > On the node, I can see there are huge number of pending tasks in GossipStage. 
> = > 2016-05-02_23:55:08.99515 WARN 23:55:08 Gossip stage has 36337 pending > tasks; skipping status check (no nodes will be marked down) > 2016-05-02_23:55:09.36009 INFO 23:55:09 Node > /2401:db00:2020:717a:face:0:41:0 state jump to normal > 2016-05-02_23:55:09.99057 INFO 23:55:09 Node > /2401:db00:2020:717a:face:0:43:0 state jump to normal > 2016-05-02_23:55:10.09742 WARN 23:55:10 Gossip stage has 36421 pending > tasks; skipping status check (no nodes will be marked down) > 2016-05-02_23:55:10.91860 INFO 23:55:10 Node > /2401:db00:2020:717a:face:0:45:0 state jump to normal > 2016-05-02_23:55:11.20100 WARN 23:55:11 Gossip stage has 36558 pending > tasks; skipping status check (no nodes will be marked down) > 2016-05-02_23:55:11.57893 INFO 23:55:11 Node > /2401:db00:2030:612a:face:0:49:0 state jump to normal > 2016-05-02_23:55:12.23405 INFO 23:55:12 Node /2401:db00:2020:7189:face:0:7:0 > state jump to normal > > And I took jstack of the node, I found the read/write threads are blocked by > a lock, > read thread == > "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for > monitor entry [0x7fde6f8a1000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546) > - waiting to lock <0x7fe4faef4398> (a > org.apache.cassandra.locator.TokenMetadata) > at > org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111) > at > org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155) > at > org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526) > at > org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521) > at > org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155) > at > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328) > at > 
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270) > at > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195) > at > org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118) > at > org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275) > at > org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457) > at > org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346) > at > org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325) > at > org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659) > at > org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > = writer === > "Thrift:7668" daemon prio=10 tid=0x7fde90d91000 nid=0x50e9 waiting for > monitor entry [0x7fde78bbc000] >
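The jstack output above shows Thrift request threads queued on the TokenMetadata monitor while the cached token map is rebuilt. A minimal, hypothetical sketch of that cache-on-invalidate pattern (class and field names are illustrative, not Cassandra's actual implementation) shows why a storm of gossip state changes turns the lock-free fast path into a lock queue:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the caching pattern implicated in the stack trace:
// readers normally return a cached immutable snapshot without locking, but
// every gossip update invalidates the cache, so under heavy churn all
// reader threads pile up on the same monitor waiting for a rebuild.
class CachedTokenMap {
    private volatile Map<String, String> cachedMap = null; // null = invalidated
    private final Map<String, String> liveMap = new HashMap<>();

    void update(String token, String endpoint) {
        synchronized (this) {
            liveMap.put(token, endpoint);
            cachedMap = null; // every gossip change drops the snapshot
        }
    }

    Map<String, String> cachedOnlyTokenMap() {
        Map<String, String> snapshot = cachedMap;
        if (snapshot != null)
            return snapshot; // fast path: no lock taken
        synchronized (this) { // slow path: all readers queue here under churn
            if (cachedMap == null)
                cachedMap = new HashMap<>(liveMap);
            return cachedMap;
        }
    }
}
```

When thousands of "state jump to normal" events arrive in a short window, the invalidation rate outpaces the rebuild rate and reads/writes block, matching the BLOCKED threads in the jstack.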
[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time
[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292415#comment-15292415 ] Dikang Gu commented on CASSANDRA-11709: --- [~jjordan], yes, it's definitely possible. I'm wondering what the reason is that GPFS would fall back to PFS? > Lock contention when large number of dead nodes come back within short time > --- > > Key: CASSANDRA-11709 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11709 > Project: Cassandra > Issue Type: Improvement >Reporter: Dikang Gu >Assignee: Joel Knighton > Fix For: 2.2.x, 3.x 
[jira] [Issue Comment Deleted] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted
[ https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton updated CASSANDRA-11742: -- Comment: was deleted (was: I think this second patch is an improvement - I traced this issue to determine exactly why it worked on 2.1. This behavior was introduced by [CASSANDRA-8049] which centralized Cassandra startup checks. Prior to this change, we inserted cluster name directly after checking the health of the system keyspace, so if an sstable for the system keyspace was flushed, we could guarantee that some sstable contained cluster name. After [CASSANDRA-8049], we insert cluster name with the rest of the local metadata in {{SystemKeyspace.finishStartup()}}. [~beobal] - I couldn't find a reason for the change as to when cluster name is inserted other than that it didn't seem like a good idea to mutate anything in a startup check. Can you think of any reason we can't just call {{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the system keyspace in {{CassandraDaemon}}? The root cause of this problem is that we need the data persisted before any truncate/schema logic, since these will write to the system keyspace, so we can have flushed sstables with this data but no sstable with cluster name, which breaks the logic of the system keyspace health check. I ran full unit tests/dtests on a branch that moved {{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of the system keyspace and the results looked good.) 
> Failed bootstrap results in exception when node is restarted > > > Key: CASSANDRA-11742 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11742 > Project: Cassandra > Issue Type: Bug >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl >Priority: Minor > Fix For: 2.2.x, 3.0.x, 3.x > > Attachments: 11742-2.txt, 11742.txt > > > Since 2.2 a failed bootstrap results in a > {{org.apache.cassandra.exceptions.ConfigurationException: Found system > keyspace files, but they couldn't be loaded!}} exception when the node is > restarted. This did not happen in 2.1; it just tried to bootstrap again. I > know that the workaround is relatively easy, just delete the system keyspace > in the data folder on disk and try again, but it's a bit annoying that you > have to do that. > The problem seems to be that the creation of the {{system.local}} table has > been moved to just before the bootstrap begins (in 2.1 it was done much > earlier) and as a result it's still in the memtable and commitlog if the > bootstrap fails. Still, a few values are inserted into the {{system.local}} > table at an earlier point in the startup, and they have been flushed from the > memtable to an sstable. When the node is restarted, > {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed > and therefore only sees the sstable with an incomplete {{system.local}} table > and throws an exception. > I think we could fix this very easily by calling forceFlush on the system keyspace in > the {{StorageServiceShutdownHook}}; I have included a patch that does this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted
[ https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292309#comment-15292309 ] Joel Knighton commented on CASSANDRA-11742: --- I think this second patch is an improvement - I traced this issue to determine exactly why it worked on 2.1. This behavior was introduced by [CASSANDRA-8049] which centralized Cassandra startup checks. Prior to this change, we inserted cluster name directly after checking the health of the system keyspace, so if an sstable for the system keyspace was flushed, we could guarantee that some sstable contained cluster name. After [CASSANDRA-8049], we insert cluster name with the rest of the local metadata in {{SystemKeyspace.finishStartup}}. [~beobal] - I couldn't find a reason for the change as to when cluster name is inserted other than that it didn't seem like a good idea to mutate anything in a startup check. Can you think of any reason we can't just call {{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the system keyspace in {{CassandraDaemon}}? The root cause of this problem is that we need the data persisted before any truncate/schema logic, since these will write to the system keyspace, so we can have flushed sstables with this data but no sstable with cluster name, which breaks the logic of the system keyspace health check. I ran full unit tests/dtests on a branch that moved {{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of the system keyspace and the results looked good. 
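The ordering problem discussed in this ticket can be modeled with a toy sketch (all names hypothetical; this is not Cassandra code): the health check only sees flushed sstables, so values still sitting in the memtable at crash time are invisible until the commitlog is replayed later, which happens after the check runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the startup ordering described above. checkHealth()
// runs before commitlog replay, so it only sees what was flushed to
// sstables. If cluster_name was still in the memtable when a bootstrap
// failed, the check finds system sstables without it and aborts.
class SystemLocalModel {
    final Map<String, String> memtable = new HashMap<>(); // lost on crash (until replay)
    final Map<String, String> sstables = new HashMap<>(); // visible to checkHealth

    void insert(String key, String value) { memtable.put(key, value); }

    void forceFlush() { // models the proposed shutdown-hook flush
        sstables.putAll(memtable);
        memtable.clear();
    }

    // Models SystemKeyspace.checkHealth(): sstables exist but lack cluster_name
    boolean checkHealth() {
        if (sstables.isEmpty())
            return true; // fresh node, nothing to validate
        return sstables.containsKey("cluster_name");
    }
}
```

Both proposed fixes fit this model: flushing in the shutdown hook guarantees cluster_name reaches an sstable, and persisting local metadata right after the system-keyspace snapshot guarantees it is on disk before any other system writes can be flushed.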
[jira] [Commented] (CASSANDRA-11719) Add bind variables to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292237#comment-15292237 ] Mahdi Mohammadi commented on CASSANDRA-11719: - [~snazy] The test file TraceCqlTest exists only on the trunk branch. Should I create my branch off trunk? > Add bind variables to trace > --- > > Key: CASSANDRA-11719 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11719 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp >Assignee: Mahdi Mohammadi >Priority: Minor > Labels: lhf > Fix For: 3.x > > Attachments: 11719-2.1.patch > > > {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a > _TODO_ saying "we don't have [typed] access to CQL bind variables here". > In fact, we now have typed access to CQL bind variables there. So, it > is now possible to show the bind variables in the trace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292221#comment-15292221 ] Jeremiah Jordan edited comment on CASSANDRA-11750 at 5/19/16 9:53 PM: -- [~yukim] is there a reason for not putting this in 3.0 as well? Seems strange to not merge the change all the way forward and only have it in 2.1/2.2/3.8? was (Author: jjordan): [~yukim] is there a reason for not putting this in 3.0 as well? Seems strange to not merge the change all the way forward and have it in 2.1/2.2/3.8? > Offline scrub should not abort when it hits corruption > -- > > Key: CASSANDRA-11750 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11750 > Project: Cassandra > Issue Type: Bug >Reporter: Adam Hattrell >Assignee: Yuki Morishita >Priority: Minor > Labels: Tools > Fix For: 2.1.x, 2.2.x > > > Hit a failure on startup due to corruption of some sstables in the system > keyspace. Deleted the listed file and restarted - came down again with > another file. > Figured that I may as well run scrub to clean up all the files. 
Got > following error: > {noformat} > sstablescrub system compaction_history > ERROR 17:21:34 Exiting forcefully due to file system exception on startup, > disk failure policy "stop" > org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: > /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db > > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at > org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > [na:1.7.0_79] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_79] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_79] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] > Caused by: java.io.EOFException: null > at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) > ~[na:1.7.0_79] > at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] > at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] > at > org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) > ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] > ... 14 common frames omitted > {noformat} > I guess it might be by design - but I'd argue that I should at least have the > option to continue and let it do its thing. I'd prefer that sstablescrub > ignored the disk failure policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption
[ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292221#comment-15292221 ] Jeremiah Jordan commented on CASSANDRA-11750: - [~yukim] is there a reason for not putting this in 3.0 as well? Seems strange to not merge the change all the way forward and have it in 2.1/2.2/3.8? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
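A sketch of the behavior the reporter is asking for (hypothetical types, not the real sstablescrub code): treat corruption as a per-sstable failure that is reported and skipped, instead of letting the disk failure policy abort the whole offline run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: catch corruption per sstable and keep scrubbing the
// rest, collecting the paths that failed rather than exiting forcefully.
class OfflineScrubSketch {
    static class CorruptSSTableException extends RuntimeException {
        CorruptSSTableException(String path) { super(path); }
    }

    interface SSTable {
        String path();
        void scrub();
    }

    // Returns the paths that could not be scrubbed instead of aborting.
    static List<String> scrubAll(List<SSTable> sstables) {
        List<String> skipped = new ArrayList<>();
        for (SSTable table : sstables) {
            try {
                table.scrub();
            } catch (CorruptSSTableException e) {
                skipped.add(table.path()); // report and continue with the rest
            }
        }
        return skipped;
    }
}
```

The design question the ticket raises is exactly this boundary: an offline tool meant to repair corruption arguably should not be governed by the node's disk_failure_policy at all.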
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292145#comment-15292145 ] Paulo Motta commented on CASSANDRA-11845: - Unfortunately it's not possible to track down the cause from the logs you posted. You'll need to [enable DEBUG logging|https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html] on the {{org.apache.cassandra.streaming}} and {{org.apache.cassandra.repair}} packages and attach the full debug.log to this ticket (you should use the attach files functionality of JIRA instead of pasting logs in the comments). Please note that to cancel the hung repair you'll probably need to restart the involved nodes before starting a new repair (stop repair functionality will be provided by CASSANDRA-3486). > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, I was able > to avoid the socketTimeout errors I was getting earlier > (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. 
> current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And it's 10:46:25 now, almost 5 hours since it has been stuck right there. > Earlier I could see repair sessions going on in system.log, but there are no > logs coming in right now; all I get in the logs is regular index summary > redistribution logs. > Last repair logs I saw :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > It's an incremental repair, and in "nodetool netstats" output I can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. 
Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
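For reference, the DEBUG logging suggested in the comment above can be enabled by adding logger entries to conf/logback.xml (standard logback syntax; the package names are taken from the comment):

```xml
<!-- Added inside the <configuration> element of conf/logback.xml.
     With scan="true" on <configuration>, logback picks this up
     without a node restart. -->
<logger name="org.apache.cassandra.streaming" level="DEBUG"/>
<logger name="org.apache.cassandra.repair" level="DEBUG"/>
```

On 2.1+ the level can also be changed at runtime with nodetool, e.g. {{nodetool setlogginglevel org.apache.cassandra.streaming DEBUG}}, though a runtime change does not persist across restarts.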
[jira] [Updated] (CASSANDRA-11849) Potential data directory problems due to CFS getDirectories logic
[ https://issues.apache.org/jira/browse/CASSANDRA-11849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-11849: --- Description: CASSANDRA-8671 added the ability to change the data directory based on the compaction strategy. Since nothing uses this yet we haven't hit any issues but reading the code I see potential bugs for things like Transaction log cleanup and CFS initialization since these all use the default {{Directories}} location from the yaml. * {{Directories}} is passed into CFS constructor then possibly disregarded. * Startup checks like scrubDataDirectories are all using default Directories locations. * StandaloneSSTableUtil was: CASSANDRA-8671 added the ability to change the data directory based on the compaction strategy. Since nothing uses this yet we haven't hit any issues but reading the code I see potential bugs for things like Transaction log cleanup and CFA initialization since these all use the default {{Directories}} location from the yaml. * {{Directories}} is passed into CFS constructor then possibly disregarded. * Startup checks like scrubDataDirectories are all using default Directories locations. * StandaloneSSTableUtil > Potential data directory problems due to CFS getDirectories logic > - > > Key: CASSANDRA-11849 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11849 > Project: Cassandra > Issue Type: Bug >Reporter: T Jake Luciani >Assignee: Blake Eggleston > > CASSANDRA-8671 added the ability to change the data directory based on the > compaction strategy. > Since nothing uses this yet we haven't hit any issues but reading the code I > see potential bugs for things like Transaction log cleanup and CFS > initialization since these all use the default {{Directories}} location from > the yaml. > * {{Directories}} is passed into CFS constructor then possibly disregarded. > * Startup checks like scrubDataDirectories are all using default Directories > locations. 
> * StandaloneSSTableUtil -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11849) Potential data directory problems due to CFS getDirectories logic
T Jake Luciani created CASSANDRA-11849: -- Summary: Potential data directory problems due to CFS getDirectories logic Key: CASSANDRA-11849 URL: https://issues.apache.org/jira/browse/CASSANDRA-11849 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Blake Eggleston CASSANDRA-8671 added the ability to change the data directory based on the compaction strategy. Since nothing uses this yet we haven't hit any issues but reading the code I see potential bugs for things like Transaction log cleanup and CFA initialization since these all use the default {{Directories}} location from the yaml. * {{Directories}} is passed into CFS constructor then possibly disregarded. * Startup checks like scrubDataDirectories are all using default Directories locations. * StandaloneSSTableUtil -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything
[ https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Jordan updated CASSANDRA-11848:
Assignee: Paulo Motta

> replace address can "succeed" without actually streaming anything
> -
>
> Key: CASSANDRA-11848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jeremiah Jordan
> Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
> When you do a replace address and the new node has the same IP as the node it is replacing, then the following check can let the replace be successful even if we think all the other nodes are down:
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> Since the FailureDetectorSourceFilter will exclude the other nodes, an empty stream plan gets executed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything
[ https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Jordan updated CASSANDRA-11848:
Description:
When you do a replace address and the new node has the same IP as the node it is replacing, then the following check can let the replace be successful even if we think all the other nodes are down:
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
Since the FailureDetectorSourceFilter will exclude the other nodes, an empty stream plan gets executed.

was:
When you do a replace address and the new node has the same IP as the node it is replacing, then the following check can let the replace be successful even if we think all the other nodes are down:
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271

> replace address can "succeed" without actually streaming anything
> -
>
> Key: CASSANDRA-11848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jeremiah Jordan
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
> When you do a replace address and the new node has the same IP as the node it is replacing, then the following check can let the replace be successful even if we think all the other nodes are down:
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> Since the FailureDetectorSourceFilter will exclude the other nodes, an empty stream plan gets executed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything
Jeremiah Jordan created CASSANDRA-11848:
---
Summary: replace address can "succeed" without actually streaming anything
Key: CASSANDRA-11848
URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
Project: Cassandra
Issue Type: Bug
Components: Streaming and Messaging
Reporter: Jeremiah Jordan
Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x

When you do a replace address and the new node has the same IP as the node it is replacing, then the following check can let the replace be successful even if we think all the other nodes are down:
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
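The mechanism behind the report — every candidate source is excluded by the failure-detector filter, leaving an empty stream plan whose execution vacuously succeeds — can be sketched as follows. This is an illustrative model with hypothetical names, not the actual RangeStreamer code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model of the reported failure mode: when the failure detector
// considers every peer down, the source filter removes all candidates, and
// "executing" the resulting empty stream plan trivially reports success.
public class EmptyStreamPlanSketch {
    /** Mimics a FailureDetectorSourceFilter: keep only sources deemed alive. */
    public static List<String> filterSources(List<String> candidates, Predicate<String> isAlive) {
        List<String> accepted = new ArrayList<>();
        for (String source : candidates)
            if (isAlive.test(source))
                accepted.add(source);
        return accepted;
    }

    /** Succeeds when every per-source stream succeeds -- vacuously true for zero sources. */
    public static boolean executePlan(List<String> sources) {
        boolean allSucceeded = true;
        for (String source : sources)
            allSucceeded &= streamFrom(source);
        return allSucceeded;
    }

    private static boolean streamFrom(String source) {
        return true; // stand-in for an actual streaming session
    }
}
```

With all peers filtered out, nothing is streamed yet the plan reports success, which is exactly how a replace could "succeed" without receiving any data.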
[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939 ] vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:13 PM:

Yeah, it's still stuck at 55%. No new streams are getting created; netstats shows the same output again and again. The only thing that changes in its output is:

Small messages    n/a    0    14760878
Gossip messages   n/a    0    151698

Here is a longer snippet of netstats output which shows the repair session as well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-3
    Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db 326558/326558 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db 1484827/1484827 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db 393636/393636 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db 825459/825459 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db 3568782/3568782 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db 271222/271222 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db 4315497/4315497 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db 19775/19775 bytes(100%) received from idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db 355293/355293 bytes(100%) received from idx:0/Node-3
    Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 1796825/1796825 bytes(100%) sent to idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 4549996/4549996 bytes(100%) sent to idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 1658881/1658881 bytes(100%) sent to idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 1418335/1418335 bytes(100%) sent to idx:0/Node-3
        /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 20064/20064 bytes(100%) sent to idx:0/Node-3
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name         Active   Pending   Completed
Large messages    n/a      0         779
Small messages    n/a      0         14760878
Gossip messages   n/a      0         151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /Node-2 and /Node-1 on TABLE_NAME
INFO [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished
INFO [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5df00-1d99-11e6-9d63-b717b380ffdd between /Node-2 and /Node-1 on
[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment
[ https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291965#comment-15291965 ] Jeff Jirsa commented on CASSANDRA-11847: It definitely looks a lot like a hardware problem, but even if it weren't Cassandra 2.0 isn't supported anymore. Not even critical fixes. You'd need to re-open if you can replicate the problem in 2.1+ > Cassandra dies on a specific node in a multi-DC environment > --- > > Key: CASSANDRA-11847 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11847 > Project: Cassandra > Issue Type: Bug > Components: Compaction, Core > Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15 >Reporter: Rajesh Babu > Attachments: java_error19030.log, java_error2912.log, > java_error4571.log, java_error7539.log, java_error9552.log > > > We've a customer who runs a 16 node 2 DC (8 nodes each) environment where > Cassandra pid dies randomly but on a specific node. > Whenever Cassandra dies, admin has to manually restart Cassandra only on that > node. > I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 > (patch 79) but it still seems to be an issue. > Is this a known hardware related bug or should is this issue fixed in later > Cassandra versions? > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build > 1.7.0_79-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [libjava.so+0xe027f] _fini+0xbd5f7 > # > # Core dump written. 
Default location: /tmp/core or core.19030 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > --- T H R E A D --- > Current thread (0x7f453c89f000): JavaThread "COMMIT-LOG-WRITER" > [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)] > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), > si_addr=0x7f4542d5a27f > Registers: > RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, > RDX=0x0020 > RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, > RDI=0x0001 > R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, > R11=0x0006fae57068 > R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, > R15=0x7f453c89f000 > RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, > ERR=0x0014 > TRAPNO=0x000e > - > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build > 1.7.0_79-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C 0x7f28e08787a4 > # > # Core dump written. Default location: /tmp/core or core.2912 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > --- T H R E A D --- > Current thread (0x7f2640008000): JavaThread "ValidationExecutor:15" > daemon [_thread_in_Java, id=7393, > stack(0x7f256fdf8000,0x7f256fe39000)] > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), > si_addr=0x7f28e08787a4 > Registers: > RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, > RDX=0xc770409a > RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, > RDI=0x > R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, > R11=0x3f8bb878 > R12=0x, R13=0x00063b820530, R14=0x000b, > R15=0x7f2640008000 > RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, > ERR=0x0015 > TRAPNO=0x000e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291224#comment-15291224 ] Philip Thompson edited comment on CASSANDRA-11760 at 5/19/16 7:04 PM: -- I'm re-running the tests that found this, to see if it comes up again. They take about 3-4 hours. EDIT: Re-re-running the tests. They ran against the old sha, not the one with the fix from 11613. was (Author: philipthompson): I'm re-running the tests that found this, to see if it comes up again. They take about 3-4 hours. > dtest failure in > TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test > > > Key: CASSANDRA-11760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11760 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Tyler Hobbs > Labels: dtest > Fix For: 3.6 > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, > node3.log, node3_debug.log > > > example failure: > http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/ > I've attached the logs. The test upgrades from 2.2.5 to 3.6. 
The relevant failure stack trace is extracted here:
> {code}
> ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,334 CassandraDaemon.java:185 - Exception in thread Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177) ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91) ~[apache-cassandra-2.2.6.jar:2.2.6]
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291946#comment-15291946 ] Philip Thompson commented on CASSANDRA-11731: - So, it still fails: http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/105/testReport/ just less frequently. I'm working off this branch: https://github.com/riptano/cassandra-dtest/tree/fix-11731 I think the reduced flake rate is just from the longer waiting, but this didn't fix the root issue. > dtest failure in > pushed_notifications_test.TestPushedNotifications.move_single_node_test > > > Key: CASSANDRA-11731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11731 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Philip Thompson > Labels: dtest > > one recent failure (no vnode job) > {noformat} > 'MOVED_NODE' != u'NEW_NODE' > {noformat} > http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test > Failed on CassCI build trunk_novnode_dtest #366 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939 ] vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:02 PM:

Yeah, it's still stuck at 55%. No new streams are getting created; netstats shows the same output again and again. The only thing that changes in its output is:

Small messages    n/a    0    14760878
Gossip messages   n/a    0    151698

Here is a longer snippet of the netstats output, which shows the repair session as well. It has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
    /Node-1
        Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db 326558/326558 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db 1484827/1484827 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db 393636/393636 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db 825459/825459 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db 3568782/3568782 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db 271222/271222 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db 4315497/4315497 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db 19775/19775 bytes(100%) received from idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db 355293/355293 bytes(100%) received from idx:0/Node-1
        Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 1796825/1796825 bytes(100%) sent to idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 4549996/4549996 bytes(100%) sent to idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 1658881/1658881 bytes(100%) sent to idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 1418335/1418335 bytes(100%) sent to idx:0/Node-1
            /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 20064/20064 bytes(100%) sent to idx:0/Node-1
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name         Active   Pending   Completed
Large messages    n/a      0         779
Small messages    n/a      0         14760878
Gossip messages   n/a      0         151698

Snippet from system.log, using grep -iE "repair|valid|sync" system.log:

INFO [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 and /192.168.200.151 on TABLE_NAME
INFO [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished
INFO [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5df00-1d99-11e6-9d63-b717b380ffdd between
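The system.log snippet above comes from a filter like the one below; note the flag is `-iE` (case-insensitive, extended regex) written with no space, and the log path shown is the default package-install location, which may differ on other setups:

```shell
# Case-insensitive filter for repair-, validation-, and sync-related
# entries in the Cassandra system log (default package install path):
grep -iE "repair|valid|sync" /var/log/cassandra/system.log
```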
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939 ] vin01 commented on CASSANDRA-11845: --- Yeah, its still stuck at 55 % . No new streams are getting created, netstats shows the same output again n again. Only thing that changes in its output is :- Small messages n/a 0 14760878 Gossip messages n/a 0 151698 Here is a longer snippet of netstats output which shows the repair session as well, it has been the same for last 8 or so hrs :- Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd /Node-1 Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db 326558/326558 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db 1484827/1484827 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db 393636/393636 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db 825459/825459 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db 3568782/3568782 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db 271222/271222 bytes(100%) received from idx:0/Node-1 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db 4315497/4315497 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db 19775/19775 bytes(100%) received from idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db 355293/355293 bytes(100%) received from idx:0/Node-1 Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 1796825/1796825 bytes(100%) sent to idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 4549996/4549996 bytes(100%) sent to idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 1658881/1658881 bytes(100%) sent to idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 1418335/1418335 bytes(100%) sent to idx:0/Node-1 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 20064/20064 bytes(100%) sent to idx:0/Node-1 Read Repair Statistics: Attempted: 1142 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Large messages n/a 0779 Small messages n/a 0 14760878 Gossip messages n/a 0 151698 Snippet for system.log using grep -iE "repair|valid|sync" system.log :- INFO [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 and /192.168.200.151 on TABLE_NAME INFO [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced INFO [RepairJobTask:5] 2016-05-19 
05:53:27,541 RepairSession.java:279 - [repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully INFO [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished INFO [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session a0e5df00-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 a nd /192.168.200.151 on TABLE_NAME INFO
[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment
[ https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291882#comment-15291882 ] Rajesh Babu commented on CASSANDRA-11847:

It is physical hardware (private cloud): Manufacturer: Quanta Computer Inc, Product Name: QuantaPlex T41S-2U. I initially thought it was a RAM-related issue and swapped the RAM on that node with "SAMSUNG 16GB 288-Pin DDR4 SDRAM ECC Registered DDR4 2133 (PC4 17000) Server Memory Model M393A2G40DB0-CPB", but that didn't help either. The server was stable for 3 days or so and then Cassandra died again. I just wanted to see if this issue is caused by Cassandra software (maybe fixed in later versions, maybe 2.0.17?).

> Cassandra dies on a specific node in a multi-DC environment > --- > > Key: CASSANDRA-11847 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11847 > Project: Cassandra > Issue Type: Bug > Components: Compaction, Core > Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15 >Reporter: Rajesh Babu > Attachments: java_error19030.log, java_error2912.log, > java_error4571.log, java_error7539.log, java_error9552.log > > > We've a customer who runs a 16 node 2 DC (8 nodes each) environment where > Cassandra pid dies randomly but on a specific node. > Whenever Cassandra dies, admin has to manually restart Cassandra only on that > node. > I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 > (patch 79) but it still seems to be an issue. > Is this a known hardware related bug, or is this issue fixed in later > Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build > 1.7.0_79-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [libjava.so+0xe027f] _fini+0xbd5f7 > # > # Core dump written. Default location: /tmp/core or core.19030 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > --- T H R E A D --- > Current thread (0x7f453c89f000): JavaThread "COMMIT-LOG-WRITER" > [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)] > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), > si_addr=0x7f4542d5a27f > Registers: > RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, > RDX=0x0020 > RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, > RDI=0x0001 > R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, > R11=0x0006fae57068 > R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, > R15=0x7f453c89f000 > RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, > ERR=0x0014 > TRAPNO=0x000e > - > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build > 1.7.0_79-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C 0x7f28e08787a4 > # > # Core dump written. 
Default location: /tmp/core or core.2912 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > --- T H R E A D --- > Current thread (0x7f2640008000): JavaThread "ValidationExecutor:15" > daemon [_thread_in_Java, id=7393, > stack(0x7f256fdf8000,0x7f256fe39000)] > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), > si_addr=0x7f28e08787a4 > Registers: > RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, > RDX=0xc770409a > RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, > RDI=0x > R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, > R11=0x3f8bb878 > R12=0x, R13=0x00063b820530, R14=0x000b, > R15=0x7f2640008000 > RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, > ERR=0x0015 > TRAPNO=0x000e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)
[ https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-8844: --- Status: Patch Available (was: Open) Setting back to Patch Available. There is now an implemented solution for the size tracking problems listed above. The branch is post-rebase of the addition of lower/upper bound to segments, and tests are in a mostly complete place. Have 3 failed dtests and 7 failed in testall that I believe are unrelated (read: flakey) but I'm going to track down each locally to confirm. I've fixed the CreateTest and CommitLogStressTest since the last CI run. No sense in paying for another run until I've confirmed these final 10 tests aren't a problem from the branch. > Change Data Capture (CDC) > - > > Key: CASSANDRA-8844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8844 > Project: Cassandra > Issue Type: New Feature > Components: Coordination, Local Write-Read Paths >Reporter: Tupshin Harper >Assignee: Joshua McKenzie >Priority: Critical > Fix For: 3.x > > > "In databases, change data capture (CDC) is a set of software design patterns > used to determine (and track) the data that has changed so that action can be > taken using the changed data. Also, Change data capture (CDC) is an approach > to data integration that is based on the identification, capture and delivery > of the changes made to enterprise data sources." > -Wikipedia > As Cassandra is increasingly being used as the Source of Record (SoR) for > mission critical data in large enterprises, it is increasingly being called > upon to act as the central hub of traffic and data flow to other systems. In > order to try to address the general need, we (cc [~brianmhess]), propose > implementing a simple data logging mechanism to enable per-table CDC patterns. > h2. 
The goals: > # Use CQL as the primary ingestion mechanism, in order to leverage its > Consistency Level semantics, and in order to treat it as the single > reliable/durable SoR for the data. > # To provide a mechanism for implementing good and reliable > (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) > continuous semi-realtime feeds of mutations going into a Cassandra cluster. > # To eliminate the developmental and operational burden of users so that they > don't have to do dual writes to other systems. > # For users that are currently doing batch export from a Cassandra system, > give them the opportunity to make that realtime with a minimum of coding. > h2. The mechanism: > We propose a durable logging mechanism that functions similar to a commitlog, > with the following nuances: > - Takes place on every node, not just the coordinator, so RF number of copies > are logged. > - Separate log per table. > - Per-table configuration. Only tables that are specified as CDC_LOG would do > any logging. > - Per DC. We are trying to keep the complexity to a minimum to make this an > easy enhancement, but most likely use cases would prefer to only implement > CDC logging in one (or a subset) of the DCs that are being replicated to > - In the critical path of ConsistencyLevel acknowledgment. Just as with the > commitlog, failure to write to the CDC log should fail that node's write. If > that means the requested consistency level was not met, then clients *should* > experience UnavailableExceptions. > - Be written in a Row-centric manner such that it is easy for consumers to > reconstitute rows atomically. > - Written in a simple format designed to be consumed *directly* by daemons > written in non JVM languages > h2. Nice-to-haves > I strongly suspect that the following features will be asked for, but I also > believe that they can be deferred to a subsequent release, to gauge > actual interest. > - Multiple logs per table. 
This would make it easy to have multiple > "subscribers" to a single table's changes. A workaround would be to create a > forking daemon listener, but that's not a great answer. > - Log filtering. Being able to apply filters, including UDF-based filters > would make Cassandra a much more versatile feeder into other systems, and > again, reduce complexity that would otherwise need to be built into the > daemons. > h2. Format and Consumption > - Cassandra would only write to the CDC log, and never delete from it. > - Cleaning up consumed logfiles would be the client daemon's responsibility > - Logfile size should probably be configurable. > - Logfiles should be named with a predictable naming schema, making it > trivial to process them in order. > - Daemons should be able to checkpoint their work, and resume from where they > left off. This means they would have to leave some file artifact in the CDC > log's directory. > - A
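The checkpoint-and-resume behaviour described for consumer daemons can be sketched as a small shell loop. This is only an illustration of the proposal's "file artifact in the CDC log's directory" idea: `process` is a hypothetical placeholder for the real consumption step, and the `.log` suffix and name-sorted segment ordering are assumptions, not part of the ticket:

```shell
# Hypothetical CDC consumer: handle segment files in name order and record
# the last fully consumed file, so a restart resumes where it left off.
# `process` is a stub; segment names are assumed to sort chronologically.
process() { echo "consumed $1"; }

consume_cdc() {
  dir=$1
  ckpt="$dir/.checkpoint"
  last=$(cat "$ckpt" 2>/dev/null || true)
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue          # no segments present
    name=$(basename "$f")
    if [ -n "$last" ]; then          # still skipping already-consumed files
      [ "$name" = "$last" ] && last=""
      continue
    fi
    process "$f"
    printf '%s\n' "$name" > "$ckpt"  # checkpoint after each segment
  done
}
```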
[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment
[ https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291694#comment-15291694 ] Jeff Jirsa commented on CASSANDRA-11847: Your crashes are all over Cassandra (commitlog, mutation, compaction) - the most likely cause is bad hardware (bad memory, for example). Physical hardware / home grown VM or public cloud? ECC RAM?
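As a quick first check on the bad-memory theory, Linux exposes corrected/uncorrected ECC error counters through the EDAC sysfs interface. This assumes the EDAC kernel driver is loaded, and the exact path layout can vary by kernel version:

```shell
# Print corrected (ce) and uncorrected (ue) ECC error counts per memory
# controller. Requires the EDAC kernel driver; paths vary by kernel.
edac_counts() {
  dir=${1:-/sys/devices/system/edac/mc}
  for f in "$dir"/mc*/ce_count "$dir"/mc*/ue_count; do
    [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
  done
  return 0
}
edac_counts   # prints nothing if EDAC is not loaded on this host
```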
[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment
[ https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291687#comment-15291687 ] Rajesh Babu commented on CASSANDRA-11847:

The Cassandra system log shows the following before the Cassandra process dies:

INFO [CompactionExecutor:49] 2016-05-10 14:06:37,074 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-266-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-267-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-268-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-265-Data.db')]
INFO [CompactionExecutor:49] 2016-05-10 14:06:37,191 CompactionTask.java (line 287) Compacted 4 sstables to [/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-269,]. 742,551 bytes to 256,142 (~34% of original) in 116ms = 2.105828MB/s. 7,348 total partitions merged to 2,845. Partition merge counts were {1:7348, }
INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,693 ThriftServer.java (line 141) Stop listening to thrift clients
INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,749 Server.java (line 182) Stop listening for CQL clients
INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,749 Gossiper.java (line 1307) Announcing shutdown
INFO [main] 2016-05-10 14:24:30,997 CassandraDaemon.java (line 135) Logging initialized
INFO [main] 2016-05-10 14:24:31,028 YamlConfigurationLoader.java (line 80) Loading settings from file:/opt/cloudian-packages/apache-cassandra-2.0.11/conf/cassandra.yaml
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291686#comment-15291686 ] Paulo Motta commented on CASSANDRA-11845: - [~vin01] so, {{nodetool netstats}} no longer shows ongoing stream sessions? Is the repair still hanging at 55%, or has it progressed? If it is still hanging, you'll probably need to attach your system.log for further investigation, since it's not possible to detect at which stage the repair is hanging from the data you provided so far. You may want to use grep to filter the log with {{grep -i 'repair\|valid\|sync' logs/system.log}} > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, I was able > to avoid the socketTimeout errors I was getting earlier > (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. 
> current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. > Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. 
Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db > 399865/399865 bytes(100%) sent to
[jira] [Created] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment
Rajesh Babu created CASSANDRA-11847: --- Summary: Cassandra dies on a specific node in a multi-DC environment Key: CASSANDRA-11847 URL: https://issues.apache.org/jira/browse/CASSANDRA-11847 Project: Cassandra Issue Type: Bug Components: Compaction, Core Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15 Reporter: Rajesh Babu Attachments: java_error19030.log, java_error2912.log, java_error4571.log, java_error7539.log, java_error9552.log We have a customer who runs a 16 node, 2 DC (8 nodes each) environment where the Cassandra process dies randomly, but always on one specific node. Whenever Cassandra dies, an admin has to manually restart Cassandra on that node. I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 (patch 79) but it still seems to be an issue. Is this a known hardware-related bug, or is this issue fixed in later Cassandra versions? # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896 # # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [libjava.so+0xe027f] _fini+0xbd5f7 # # Core dump written. 
Default location: /tmp/core or core.19030 # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x7f453c89f000): JavaThread "COMMIT-LOG-WRITER" [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)] siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), si_addr=0x7f4542d5a27f Registers: RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, RDX=0x0020 RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, RDI=0x0001 R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, R11=0x0006fae57068 R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, R15=0x7f453c89f000 RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, ERR=0x0014 TRAPNO=0x000e - # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712 # # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15) # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops) # Problematic frame: # C 0x7f28e08787a4 # # Core dump written. Default location: /tmp/core or core.2912 # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x7f2640008000): JavaThread "ValidationExecutor:15" daemon [_thread_in_Java, id=7393, stack(0x7f256fdf8000,0x7f256fe39000)] siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), si_addr=0x7f28e08787a4 Registers: RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, RDX=0xc770409a RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, RDI=0x R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, R11=0x3f8bb878 R12=0x, R13=0x00063b820530, R14=0x000b, R15=0x7f2640008000 RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, ERR=0x0015 TRAPNO=0x000e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11690) dtest failure in test_rf_collapse_gossiping_property_file_snitch_multi_dc
[ https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291633#comment-15291633 ] Russ Hatch commented on CASSANDRA-11690: looks to be the same as 11686 > dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc > --- > > Key: CASSANDRA-11690 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11690 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > looks like a possible resource constraint issue: > {noformat} > [Errno 12] Cannot allocate memory > {noformat} > more than one failure in recent history. > http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-11690) dtest failure in test_rf_collapse_gossiping_property_file_snitch_multi_dc
[ https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch resolved CASSANDRA-11690. Resolution: Duplicate > dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc > --- > > Key: CASSANDRA-11690 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11690 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > looks like a possible resource constraint issue: > {noformat} > [Errno 12] Cannot allocate memory > {noformat} > more than one failure in recent history. > http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-11690) dtest failure in test_rf_collapse_gossiping_property_file_snitch_multi_dc
[ https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch reassigned CASSANDRA-11690: -- Assignee: Russ Hatch (was: DS Test Eng) > dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc > --- > > Key: CASSANDRA-11690 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11690 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > looks like a possible resource constraint issue: > {noformat} > [Errno 12] Cannot allocate memory > {noformat} > more than one failure in recent history. > http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291564#comment-15291564 ] vin01 edited comment on CASSANDRA-11845 at 5/19/16 5:24 PM: [-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats pending tasks: 0 (output of compactionstats is same on all 3 nodes) Its still stuck at same point. nodetool netstats output summary :- Read Repair Statistics: Attempted: 1142 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Large messages n/a 0779 Small messages n/a 0 14758741 Gossip messages n/a 0 135056 was (Author: vin01): [-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats pending tasks: 0 Its still stuck at same point. nodetool netstats output summary :- Read Repair Statistics: Attempted: 1142 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Large messages n/a 0779 Small messages n/a 0 14758741 Gossip messages n/a 0 135056 > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. 
> current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. > Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. 
Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > >
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291578#comment-15291578 ] Olivier Michallat commented on CASSANDRA-10786: --- +1, it's an elegant way to avoid an extra roundtrip when the schema changed. And since it does affect the format of protocol messages, there's no risk of "forgetting" to cover it when implementing protocol v5, like [~adutra] suggested above. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. 
But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291564#comment-15291564 ] vin01 commented on CASSANDRA-11845: --- [-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats pending tasks: 0 Its still stuck at same point. nodetool netstats output summary :- Read Repair Statistics: Attempted: 1142 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Large messages n/a 0779 Small messages n/a 0 14758741 Gossip messages n/a 0 135056 > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. > current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. 
> Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > >
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291536#comment-15291536 ] Paulo Motta commented on CASSANDRA-11845: - [~vin01] can you check the output of {{nodetool compactionstats}} on the receiving node, and check if there are secondary indexes being rebuilt? > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. > current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. > Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. 
> Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db > 399865/399865 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db > 149066/149066 bytes(100%) sent to idx:0/Node-2 > >
[jira] [Commented] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291485#comment-15291485 ] Tyler Hobbs commented on CASSANDRA-11846: - This may be related to CASSANDRA-8779 > Invalid QueryBuilder.insert is not invalidated which causes OOM > --- > > Key: CASSANDRA-11846 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11846 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: cassandra-2.1.14 >Reporter: ZhaoYang >Priority: Minor > Fix For: 2.1.15 > > > create table test( key text primary key, value list ); > When using QueryBuilder.Insert() to bind column `value` with a blob, > Cassandra didn't consider it to be an invalid query and then lead to OOM and > crashed. > the same plain query(String) can be invalidated by Cassandra and C* responds > InvalidQuery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11687) dtest failure in rebuild_test.TestRebuild.simple_rebuild_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291459#comment-15291459 ] Russ Hatch commented on CASSANDRA-11687: 1 test did fail out of 100 trials, so this test does need some repair. > dtest failure in rebuild_test.TestRebuild.simple_rebuild_test > - > > Key: CASSANDRA-11687 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11687 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > single failure on most recent run (3.0 no-vnode) > {noformat} > concurrent rebuild should not be allowed, but one rebuild command should have > succeeded. > {noformat} > http://cassci.datastax.com/job/cassandra-3.0_novnode_dtest/217/testReport/rebuild_test/TestRebuild/simple_rebuild_test > Failed on CassCI build cassandra-3.0_novnode_dtest #217 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291457#comment-15291457 ] vin01 commented on CASSANDRA-11845: --- Because of '-XX:+PerfDisableSharedMem' its not possible to use jstack or any similar tools i guess. Also debug logging is not enabled.. so nothing in debug.log, i don't think log level can be changed at runtime.. And yes there are secondary indices in that table. > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, i was able > to avoid the socketTimeout errors i was getting earlier > (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue > is repair just stays stuck. > current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And its 10:46:25 Now, almost 5 hours since it has been stuck right there. 
> Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db > 399865/399865 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db > 149066/149066 bytes(100%) sent to
[jira] [Commented] (CASSANDRA-11686) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
[ https://issues.apache.org/jira/browse/CASSANDRA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291447#comment-15291447 ] Russ Hatch commented on CASSANDRA-11686: There were no failures when running a multiplex job on larger instances, so the problem does look like ccm node crowding, as I mentioned. I'll get these tests moved to the large test job. > dtest failure in > replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc > -- > > Key: CASSANDRA-11686 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11686 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > intermittent failure. this test also fails on windows but looks to be for > another reason (CASSANDRA-11439) > http://cassci.datastax.com/job/cassandra-3.0_dtest/682/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc/ > {noformat} > Nodetool command '/home/automaton/cassandra/bin/nodetool -h localhost -p 7400 > getendpoints testing rf_test dummy' failed; exit status: 1; stderr: nodetool: > Failed to connect to 'localhost:7400' - ConnectException: 'Connection > refused'. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-11846: - Description: create table test( key text primary key, value list ); When using QueryBuilder.Insert() to bind column `value` with a blob, Cassandra didn't consider it to be an invalid query and then lead to OOM and crashed. the same plain query(String) can be invalidated by Cassandra and C* responds InvalidQuery. was: create table test{ key text primary key, value list }; When using QueryBuilder.Insert() to bind column `value` with a blob, Cassandra didn't consider it to be an invalid query and then lead to OOM and crashed. the same plain query(String) can be invalidated by Cassandra and C* responds InvalidQuery. > Invalid QueryBuilder.insert is not invalidated which causes OOM > --- > > Key: CASSANDRA-11846 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11846 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: cassandra-2.1.14 >Reporter: ZhaoYang >Priority: Minor > Fix For: 2.1.15 > > > create table test( key text primary key, value list ); > When using QueryBuilder.Insert() to bind column `value` with a blob, > Cassandra didn't consider it to be an invalid query and then lead to OOM and > crashed. > the same plain query(String) can be invalidated by Cassandra and C* responds > InvalidQuery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM
ZhaoYang created CASSANDRA-11846: Summary: Invalid QueryBuilder.insert is not invalidated which causes OOM Key: CASSANDRA-11846 URL: https://issues.apache.org/jira/browse/CASSANDRA-11846 Project: Cassandra Issue Type: Bug Components: CQL Environment: cassandra-2.1.14 Reporter: ZhaoYang Priority: Minor Fix For: 2.1.15 create table test( key text primary key, value list ); When using QueryBuilder.Insert() to bind column `value` with a blob, Cassandra did not reject the query as invalid, which then led to OOM and a crash. The same query submitted as a plain string is rejected by Cassandra, which responds with InvalidQuery.
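Until the server-side validation gap is fixed, a defensive client-side type check before binding is one workaround. The sketch below is illustrative only, in Python rather than the Java driver, and the type table and helper are hypothetical, not part of any driver API; it just demonstrates the failure mode (a blob bound where a list is expected) being caught before it reaches the node:

```python
# Expected Python-side types for the schema: key text, value list.
# This mapping is a hypothetical stand-in for real driver metadata.
EXPECTED_TYPES = {"key": str, "value": list}

def validate_bound_values(values):
    """Reject a bind whose value type does not match the column type,
    instead of shipping it to the server unchecked."""
    for column, value in values.items():
        expected = EXPECTED_TYPES[column]
        if not isinstance(value, expected):
            raise TypeError(
                "column %r expects %s, got %s"
                % (column, expected.__name__, type(value).__name__))

# The case from the report: a blob (bytes) bound to the list column.
try:
    validate_bound_values({"key": "k1", "value": b"\x00\x01"})
    caught = False
except TypeError:
    caught = True
assert caught  # the invalid bind is caught client-side
```

The real fix belongs on the server (validating bound values the same way plain-string queries are validated), but a check like this keeps a mis-typed bind from taking a 2.1.x node down in the meantime.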
[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291428#comment-15291428 ] Benedict commented on CASSANDRA-11521: -- I would also like to voice my support for a separate path. The two needs are really quite distinct, and while optimising the normal read path is definitely something we should be exploring in general, complicating it with system behaviour that is harder to reason about on the normal path (wrt memory usage, reclaim, abort detection etc) _and_ extra implementation details (leading to bugs around those things, for more critical use cases), while still being unlikely to yield the same performance, suggests it isn't the best approach for this goal. However, I would caveat that evaluating the entire query to an off-heap memory region is not what I would have in mind - there's a sliding scale, starting from a small buffer (or pair of buffers) kept just ahead of the client, refilled from a persistent server-side cursor that simply avoids repeating the work of seeking into files. The ideal would be as close to this as possible, with a potential time-bound on the lifespan of the cursor, after which it can be reinitialised to permit cleanup of sstables. A configurable time limit on isolation could be provided as an option to define this period. These streams can be arbitrarily large, so we certainly don't want to evaluate the entire query just to permit releasing the sstables. Note that the OpOrder should not be used by these queries - actual references should be taken so that long lifespans have no impact. The code that takes these references also really needs to be fixed, so that the races to update the data tracker don't cause temporary "infinite" loops - like we see for range queries today.
> Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Stefania >Assignee: Stefania > Fix For: 3.x > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
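The sliding-buffer idea in the comment above can be sketched roughly as follows. This is purely illustrative (not Cassandra code; all names are invented): a cursor that keeps only a small window of rows buffered ahead of the client, refilled on demand from the underlying persistent iterator, so the full result is never materialised in memory at once:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a server-side cursor keeping a small buffer just ahead
// of the client, refilled from a persistent iterator (the "server-side cursor"
// that avoids repeating the work of seeking into files).
public class WindowedCursor<T> {
    private final Iterator<T> source;        // the persistent server-side cursor
    private final int window;                // how far ahead of the client to read
    private final Deque<T> buffer = new ArrayDeque<>();

    public WindowedCursor(Iterator<T> source, int window) {
        this.source = source;
        this.window = window;
        refill();
    }

    private void refill() {
        while (buffer.size() < window && source.hasNext())
            buffer.addLast(source.next());
    }

    public boolean hasNext() { return !buffer.isEmpty(); }

    public T next() {
        T row = buffer.removeFirst(); // hand the next row to the client...
        refill();                     // ...then top the window back up
        return row;
    }

    public static void main(String[] args) {
        WindowedCursor<Integer> c =
            new WindowedCursor<>(List.of(1, 2, 3, 4, 5).iterator(), 2);
        while (c.hasNext())
            System.out.println(c.next()); // rows stream through a 2-element window
    }
}
```

A real implementation would additionally need the time-bound on cursor lifespan (to allow sstable cleanup) and explicit sstable references rather than OpOrder, as the comment notes.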
[jira] [Commented] (CASSANDRA-11489) DynamicCompositeType failures during 2.1 to 3.0 upgrade.
[ https://issues.apache.org/jira/browse/CASSANDRA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291396#comment-15291396 ] Tyler Hobbs commented on CASSANDRA-11489: - No, unfortunately nothing obvious comes to mind. The code for serializing range tombstones in the "legacy" format is fairly complex, so there's definitely a possibility that there's a bug there. I would put a bunch of debug statements in {{LegacyLayout.fromUnfilteredRowIterator()}} to check what kinds of deletions are present in 3.0 and what the 3.0 node _thinks_ it's serializing. > DynamicCompositeType failures during 2.1 to 3.0 upgrade. > > > Key: CASSANDRA-11489 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11489 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jeremiah Jordan >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.x > > > When upgrading from 2.1.13 to 3.0.4+some (hash > 70eab633f289eb1e4fbe47b3e17ff3203337f233) we are seeing the following > exceptions on 2.1 nodes after other nodes have been upgraded. With tables > using DynamicCompositeType in use. The workload runs fine once everything is > upgraded. 
> {code} > ERROR [MessagingService-Incoming-/10.200.182.2] 2016-04-03 21:49:10,531 > CassandraDaemon.java:229 - Exception in thread > Thread[MessagingService-Incoming-/10.200.182.2,5,main] > java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input > length = 1 > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:181) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:200) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.cql3.ColumnIdentifier.(ColumnIdentifier.java:54) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.SimpleSparseCellNameType.fromByteBuffer(SimpleSparseCellNameType.java:83) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:398) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.RangeTombstoneList$Serializer.deserialize(RangeTombstoneList.java:843) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.DeletionInfo$Serializer.deserialize(DeletionInfo.java:407) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:105) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:89) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:73) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:116) > 
~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:88) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > Caused by: java.nio.charset.MalformedInputException: Input length = 1 > at java.nio.charset.CoderResult.throwException(CoderResult.java:281) > ~[na:1.8.0_40] > at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816) > ~[na:1.8.0_40] > at > org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:109) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:169) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > ... 16 common frames omitted > {code} -- This message was sent by Atlassian JIRA
[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291390#comment-15291390 ] Paulo Motta commented on CASSANDRA-11845: - Can you post debug.log from the c0c8af20-1d9c-11e6-9d63-b717b380ffdd and e3055fb0-1d9d-11e6-9d63-b717b380ffdd stream sessions? Do you have secondary indexes on these tables? Also it would be nice if you could provide a thread dump of the process with {{jstack <pid> >> dump.log}}. > Hanging repair in cassandra 2.2.4 > - > > Key: CASSANDRA-11845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Centos 6 >Reporter: vin01 >Priority: Minor > > So after increasing the streaming_timeout_in_ms value to 3 hours, I was able > to avoid the socketTimeout errors I was getting earlier > (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue > is that repair just stays stuck. > Current status :- > [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd > for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) > [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd > for range (8149151263857514385,8181801084802729407] finished (progress: 55%) > [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd > for range (3372779397996730299,3381236471688156773] finished (progress: 55%) > [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd > for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) > [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd > for range (6499366179019889198,6523760493740195344] finished (progress: 55%) > And it's 10:46:25 now, almost 5 hours since it has been stuck right there. 
> Earlier i could see repair session going on in system.log but there are no > logs coming in right now, all i get in logs is regular index summary > redistribution logs. > Last logs for repair i saw in logs :- > INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair > #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - > [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully > INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - > Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range > (6499366179019889198,6523760493740195344] finished > Its an incremental repair, and in "nodetool netstats" output i can see logs > like :- > Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd > /Node-2 > Receiving 8 files, 1093461 bytes total. Already received 8 files, > 1093461 bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db > 399475/399475 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db > 53809/53809 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db > 89955/89955 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db > 168790/168790 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db > 107785/107785 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db > 52889/52889 bytes(100%) received from idx:0/Node-2 > > 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db > 148882/148882 bytes(100%) received from idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db > 71876/71876 bytes(100%) received from idx:0/Node-2 > Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 > bytes total > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db > 161895/161895 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db > 399865/399865 bytes(100%) sent to idx:0/Node-2 > > /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db > 149066/149066
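The thread-dump request above can be scripted. A minimal sketch, assuming `pgrep` and `jstack` are on the PATH and that the Cassandra JVM is findable by its main class name (the `CassandraDaemon` pattern is an assumption - adjust it to match your installation):

```shell
#!/bin/sh
# Sketch: append a thread dump of the Cassandra JVM to dump.log, as suggested
# in the comment above. The process pattern is an assumption; adjust as needed.
dump_threads() {
    pid=$(pgrep -f "$1" | head -n 1)
    if [ -z "$pid" ]; then
        echo "no process matching '$1' found" >&2
        return 1
    fi
    jstack "$pid" >> dump.log
}
```

Usage would be `dump_threads CassandraDaemon`; repeating it a few times while the repair is stuck makes it easier to spot threads that never progress.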
[jira] [Updated] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vin01 updated CASSANDRA-11845: -- Description: So after increasing the streaming_timeout_in_ms value to 3 hours, I was able to avoid the socketTimeout errors I was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is that repair just stays stuck. Current status :- [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range (8149151263857514385,8181801084802729407] finished (progress: 55%) [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range (3372779397996730299,3381236471688156773] finished (progress: 55%) [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished (progress: 55%) And it's 10:46:25 now, almost 5 hours since it has been stuck right there. Earlier I could see the repair session going on in system.log but there are no logs coming in right now; all I get in the logs is regular index summary redistribution logs. 
Last logs for repair i saw in logs :- INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished Its an incremental repair, and in "nodetool netstats" output i can see logs like :- Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd /Node-2 Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db 399475/399475 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db 53809/53809 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db 89955/89955 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db 168790/168790 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db 107785/107785 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db 52889/52889 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db 148882/148882 bytes(100%) received from idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db 71876/71876 bytes(100%) received from idx:0/Node-2 Sending 5 
files, 863321 bytes total. Already sent 5 files, 863321 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 161895/161895 bytes(100%) sent to idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 399865/399865 bytes(100%) sent to idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 149066/149066 bytes(100%) sent to idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 126000/126000 bytes(100%) sent to idx:0/Node-2 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 26495/26495 bytes(100%) sent to idx:0/Node-2 Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd /Node-3 Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/Node-3 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291371#comment-15291371 ] Tyler Hobbs commented on CASSANDRA-10786: - I like the idea of using a separate hash/ID for the statement and the result set metadata if we want to fix the "prepare storm" problem at the same time. Overall, it seems like it would work like this: * In response to a PREPARE message, the server returns a statement ID and a result set metadata ID. * When performing an EXECUTE, the driver sends both IDs. * If the prepared statement ID isn't found, the server responds with an "unprepared" error, and the driver needs to reprepare as usual. * If the statement ID is found, but the metadata ID doesn't match, the server executes the query and responds with a special results message. This message contains the correct result set metadata and its ID, the prepared statement ID, and a flag to indicate that it's doing this. * When the driver receives this special response, it replaces its internal result set metadata with the new one from the response. In the scenario Robert describes above (some nodes have seen a schema change, others haven't), this would avoid repreparation of statements. The driver might end up swapping its internal result set metadata for the statement several times, but that's relatively inexpensive. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
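The EXECUTE-handling flow Tyler outlines above can be sketched as a small decision on the server side. This is purely illustrative - the real protocol change defines its own message shapes, and every name below is invented for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposed EXECUTE handling (invented names, not
// the actual native-protocol implementation):
//  - unknown statement id           -> UNPREPARED (client re-prepares)
//  - known id, stale metadata id    -> RESULT_WITH_NEW_METADATA (response
//                                      carries the new metadata and its id,
//                                      which the client swaps in locally)
//  - known id, matching metadata id -> plain RESULT
public class ExecuteFlow {
    enum Response { UNPREPARED, RESULT_WITH_NEW_METADATA, RESULT }

    private final Map<String, String> metadataIdByStatement = new HashMap<>();

    void prepare(String statementId, String metadataId) {
        metadataIdByStatement.put(statementId, metadataId);
    }

    Response execute(String statementId, String clientMetadataId) {
        String current = metadataIdByStatement.get(statementId);
        if (current == null)
            return Response.UNPREPARED;
        if (!current.equals(clientMetadataId))
            return Response.RESULT_WITH_NEW_METADATA;
        return Response.RESULT;
    }

    public static void main(String[] args) {
        ExecuteFlow server = new ExecuteFlow();
        server.prepare("abc123", "meta-v1");
        System.out.println(server.execute("abc123", "meta-v1"));
        // Schema change: the metadata id moves on, the statement id stays.
        server.prepare("abc123", "meta-v2");
        System.out.println(server.execute("abc123", "meta-v1"));
        System.out.println(server.execute("nope", "meta-v1"));
    }
}
```

The key property, as described, is that a stale metadata id never forces a full reprepare: the statement still executes, and the client only swaps its cached result-set metadata.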
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291370#comment-15291370 ] Tyler Hobbs commented on CASSANDRA-10786: - I like the idea of using a separate hash/ID for the statement and the result set metadata if we want to fix the "prepare storm" problem at the same time. Overall, it seems like it would work like this: * In response to a PREPARE message, the server returns a statement ID and a result set metadata ID. * When performing an EXECUTE, the driver sends both IDs. * If the prepared statement ID isn't found, the server responds with an "unprepared" error, and the driver needs to reprepare as usual. * If the statement ID is found, but the metadata ID doesn't match, the server executes the query and responds with a special results message. This message contains the correct result set metadata and its ID, the prepared statement ID, and a flag to indicate that it's doing this. * When the driver receives this special response, it replaces its internal result set metadata with the new one from the response. In the scenario Robert describes above (some nodes have seen a schema change, others haven't), this would avoid repreparation of statements. The driver might end up swapping its internal result set metadata for the statement several times, but that's relatively inexpensive. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-10786: Comment: was deleted (was: I like the idea of using a separate hash/ID for the statement and the result set metadata if we want to fix the "prepare storm" problem at the same time. Overall, it seems like it would work like this: * In response to a PREPARE message, the server returns a statement ID and a result set metadata ID. * When performing an EXECUTE, the driver sends both IDs. * If the prepared statement ID isn't found, the server responds with an "unprepared" error, and the driver needs to reprepare as usual. * If the statement ID is found, but the metadata ID doesn't match, the server executes the query and responds with a special results message. This message contains the correct result set metadata and its ID, the prepared statement ID, and a flag to indicate that it's doing this. * When the driver receives this special response, it replaces its internal result set metadata with the new one from the response. In the scenario Robert describes above (some nodes have seen a schema change, others haven't), this would avoid repreparation of statements. The driver might end up swapping its internal result set metadata for the statement several times, but that's relatively inexpensive.) > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vin01 updated CASSANDRA-11845: -- Description: So after increasing the streaming_timeout_in_ms value to 3 hours, i was able to avoid the socketTimeout errors i was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is repair just stays stuck. current status :- [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range (8149151263857514385,8181801084802729407] finished (progress: 55%) [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range (3372779397996730299,3381236471688156773] finished (progress: 55%) [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished (progress: 55%) And its 10:46:25 Now, almost 5 hours since it has been stuck right there. Earlier i could see repair session going on in system.log but there are no logs coming in right now, all i get in logs is regular index summary redistribution logs. 
Last logs for repair i saw in logs :- INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished Its an incremental repair, and in "nodetool netstats" output i can see logs like :- Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd /192.168.100.138 Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db 399475/399475 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db 53809/53809 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db 89955/89955 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db 168790/168790 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db 107785/107785 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db 52889/52889 bytes(100%) received from idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db 148882/148882 bytes(100%) received from idx:0/192.168.100.138 
/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db 71876/71876 bytes(100%) received from idx:0/192.168.100.138 Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 161895/161895 bytes(100%) sent to idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 399865/399865 bytes(100%) sent to idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 149066/149066 bytes(100%) sent to idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 126000/126000 bytes(100%) sent to idx:0/192.168.100.138 /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 26495/26495 bytes(100%) sent to idx:0/192.168.100.138 Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd /192.168.100.147 Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/192.168.100.147
[jira] [Created] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4
vin01 created CASSANDRA-11845: - Summary: Hanging repair in cassandra 2.2.4 Key: CASSANDRA-11845 URL: https://issues.apache.org/jira/browse/CASSANDRA-11845 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Environment: Centos 6 Reporter: vin01 Priority: Minor So after increasing the streaming_timeout_in_ms value to 3 hours, i was able to avoid the socketTimeout errors i was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is repair just stays stuck. current status :- [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%) [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range (8149151263857514385,8181801084802729407] finished (progress: 55%) [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range (3372779397996730299,3381236471688156773] finished (progress: 55%) [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%) [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished (progress: 55%) And its 10:46:25 Now, almost 5 hours since it has been stuck right there. Earlier i could see repair session going on in system.log but there are no logs coming in right now, all i get in logs is regular index summary redistribution logs. 
The last repair entries I saw in the logs:
INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished
It's an incremental repair, and in "nodetool netstats" output I can see entries like:
Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
/192.168.100.138 Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db 399475/399475 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db 53809/53809 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db 89955/89955 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db 168790/168790 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db 107785/107785 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db 52889/52889 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db 148882/148882 bytes(100%) received from idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db 71876/71876 bytes(100%) received from idx:0/192.168.100.138
Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 161895/161895 bytes(100%) sent to idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 399865/399865 bytes(100%) sent to idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 149066/149066 bytes(100%) sent to idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 126000/126000 bytes(100%) sent to idx:0/192.168.100.138
/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 26495/26495 bytes(100%) sent to idx:0/192.168.100.138
Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/192.168.100.147 Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total
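As a quick sanity check of the timeout bump mentioned above, the setting is expressed in milliseconds, so "3 hours" becomes the following yaml value (a sketch; the exact setting name used is the one the reporter quotes):

```python
# streaming_timeout_in_ms is in milliseconds; three hours works out to:
three_hours_ms = 3 * 60 * 60 * 1000
assert three_hours_ms == 10_800_000  # the value to put in cassandra.yaml
```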
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291273#comment-15291273 ] Robert Stupp commented on CASSANDRA-10786: -- Well, leaving the {{id}} (which is the {{MD5Digest}} for the pstmt) as is allows backwards compatibility. The purpose of a _fingerprint_ is to provide a hash over {{ResultSet.ResultMetadata}} - something like a _prepared statement version_. Imagine that a (reasonable) amount of time can elapse until all cluster nodes have processed the schema change. Nodes can be down for whatever reason and get the schema change late. Some nodes can be unreachable from other nodes but still be available to clients. (Network partitions occur when you don't need them.) Additionally, a client probably talks to all nodes "simultaneously" and therefore gets different results from nodes that have processed the schema change and those that have not. Different results means: some nodes will say "I don't know that pstmt ID - please re-prepare" while others respond as expected. We should not make such situations (schema disagreement) worse than they already are by causing a _prepare storm_. For example, say you have an application that runs 100,000 queries per second for a prepared statement. At time=0, an {{ALTER TABLE foo ADD bar text}} is run. The schema migration takes, for example, 500ms (just a random number) until all nodes have "switched" their schema. This means that 50,000 queries may hit a node that has the new schema and re-prepare, but then hit another node during the next request that does not have the new schema. Also, the information a driver gets via the _control connection_ is not "just in time" - unlucky driver instances may get the schema change notification via the control connection quite late. For the same reasons, I'm not a fan of changing the way we compute the pstmt {{id}} as we please between versions (either C* releases or protocol versions).
I agree that we should probably not specify the algorithm for computing such IDs in the native protocol specification - but we should keep that algorithm consistent. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
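The behavior discussed above, an id that only changes when the result-set metadata changes, can be sketched in a few lines. This is illustrative Python, not Cassandra's actual Java implementation; the column-list fingerprint and the nested-MD5 scheme are assumptions made for the sketch:

```python
import hashlib

def prepared_stmt_id(query, result_columns):
    # Fingerprint the result-set metadata, then fold it into the statement id,
    # so the id changes whenever the metadata changes (e.g. after ALTER TABLE).
    fingerprint = hashlib.md5(",".join(result_columns).encode()).hexdigest()
    return hashlib.md5((query + fingerprint).encode()).hexdigest()

# Steps 1-2 of the scenario: statement prepared while foo has columns (b, c)
old_id = prepared_stmt_id("SELECT * FROM foo", ["b", "c"])
# After "ALTER TABLE foo ADD a", the metadata differs, hence so does the id
new_id = prepared_stmt_id("SELECT * FROM foo", ["a", "b", "c"])
assert old_id != new_id  # a client holding old_id now gets UNPREPARED and re-prepares
```

With the metadata folded in, clientB in the scenario above cannot silently keep stale metadata: its old id no longer exists on the server, so it is forced to re-prepare.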
[jira] [Commented] (CASSANDRA-11799) dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error
[ https://issues.apache.org/jira/browse/CASSANDRA-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291271#comment-15291271 ] Michael Shuler commented on CASSANDRA-11799: I'll check these out in the same environments we test on - they are UTF8 locale boxes and I think there are other UTF8 tests, but I'll see what I can come up with. > dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error > > > Key: CASSANDRA-11799 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11799 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Tyler Hobbs > Labels: cqlsh, dtest > Fix For: 2.2.x, 3.0.x, 3.x > > > example failure: > http://cassci.datastax.com/job/cassandra-3.0_dtest/703/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_unicode_syntax_error > Failed on CassCI build cassandra-3.0_dtest #703 > Also failing is > cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_invalid_request_error > The relevant failure is > {code} > 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in > range(128) > {code} > These are failing on 2.2, 3.0 and trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291267#comment-15291267 ] Philip Thompson commented on CASSANDRA-11731: - Testing that change here: http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/105/ > dtest failure in > pushed_notifications_test.TestPushedNotifications.move_single_node_test > > > Key: CASSANDRA-11731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11731 > Project: Cassandra > Issue Type: Test >Reporter: Russ Hatch >Assignee: Philip Thompson > Labels: dtest > > one recent failure (no vnode job) > {noformat} > 'MOVED_NODE' != u'NEW_NODE' > {noformat} > http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test > Failed on CassCI build trunk_novnode_dtest #366 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup
[ https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-11743: --- Reviewer: Branimir Lambov > Race condition in CommitLog.recover can prevent startup > --- > > Key: CASSANDRA-11743 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11743 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 2.2.x, 3.0.x, 3.x > > > In {{CommitLog::recover}} the list of segments to recover from is determined > by removing the files managed by the {{CommitLogSegmentManager}} from the > list of files present in the commit log directory. Unfortunately, due to the > way the creation of segments is done, there is a time window where a segment > file has been created but has not yet been added to the list of segments > managed by the {{CommitLogSegmentManager}}. If the filtering occurs during > that time window the commit log might try to recover from that new segment > and crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
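The filtering race described in the report comes down to set subtraction over an inconsistent snapshot. A minimal sketch (illustrative Python; the file names and the two-step create-then-register sequence are assumptions, not Cassandra's actual code):

```python
commitlog_dir = {"CommitLog-6-1.log"}   # files present in the commit log directory
managed = {"CommitLog-6-1.log"}         # segments the CommitLogSegmentManager tracks

def segments_to_recover():
    # recover() replays every file on disk that the manager does not own
    return commitlog_dir - managed

# Segment creation is not atomic with registration:
commitlog_dir.add("CommitLog-6-2.log")  # step 1: file appears on disk
# race window: a recover() running here sees the half-initialised segment
assert segments_to_recover() == {"CommitLog-6-2.log"}
managed.add("CommitLog-6-2.log")        # step 2: only now registered with the manager
assert segments_to_recover() == set()   # window closed
```

Any fix has to make the two steps appear atomic to the filter, e.g. by registering the segment before (or under the same lock as) making its file visible.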
[jira] [Commented] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup
[ https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291265#comment-15291265 ] Benjamin Lerer commented on CASSANDRA-11743: ||Branch||utests||dtests|| |[2.2|https://github.com/blerer/cassandra/tree/11743-2.2]|[2.2|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-2.2-testall/]|[2.2|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-2.2-dtest/]| |[3.0|https://github.com/blerer/cassandra/tree/11743-3.0]|[3.0|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.0-testall/]|[3.0|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.0-dtest/]| |[3.7|https://github.com/blerer/cassandra/tree/11743-3.7]|[3.7|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.7-testall/]|[3.7|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.7-dtest/]| The new patch does as suggested. > Race condition in CommitLog.recover can prevent startup > --- > > Key: CASSANDRA-11743 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11743 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 2.2.x, 3.0.x, 3.x > > > In {{CommitLog::recover}} the list of segments to recover from is determined > by removing the files managed by the {{CommitLogSegmentManager}} from the > list of files present in the commit log directory. Unfortunately, due to the > way the creation of segments is done, there is a time window where a segment > file has been created but has not yet been added to the list of segments > managed by the {{CommitLogSegmentManager}}. If the filtering occurs during > that time window the commit log might try to recover from that new segment > and crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup
[ https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-11743: --- Fix Version/s: (was: 2.1.x) Status: Patch Available (was: Open) > Race condition in CommitLog.recover can prevent startup > --- > > Key: CASSANDRA-11743 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11743 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 2.2.x, 3.0.x, 3.x > > > In {{CommitLog::recover}} the list of segments to recover from is determined > by removing the files managed by the {{CommitLogSegmentManager}} from the > list of files present in the commit log directory. Unfortunately, due to the > way the creation of segments is done, there is a time window where a segment > file has been created but has not yet been added to the list of segments > managed by the {{CommitLogSegmentManager}}. If the filtering occurs during > that time window the commit log might try to recover from that new segment > and crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11647) Don't use static dataDirectories field in Directories instances
[ https://issues.apache.org/jira/browse/CASSANDRA-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-11647: -- Resolution: Fixed Fix Version/s: (was: 3.x) 3.7 Status: Resolved (was: Patch Available) Committed as [f294750f535f2a73c71eba589dcaf19074f91bbf|https://github.com/apache/cassandra/commit/f294750f535f2a73c71eba589dcaf19074f91bbf] to 3.7 and merged into trunk, thanks. > Don't use static dataDirectories field in Directories instances > --- > > Key: CASSANDRA-11647 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11647 > Project: Cassandra > Issue Type: Improvement >Reporter: Blake Eggleston >Assignee: Blake Eggleston > Fix For: 3.7 > > > Some of the changes to Directories by CASSANDRA-6696 use the static > {{dataDirectories}} field, instead of the instance field {{paths}}. This > complicates things for external code creating their own Directories instances. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-11837) dtest failure in topology_test.TestTopology.simple_decommission_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson resolved CASSANDRA-11837. - Resolution: Fixed https://github.com/riptano/cassandra-dtest/pull/979 > dtest failure in topology_test.TestTopology.simple_decommission_test > > > Key: CASSANDRA-11837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11837 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Philip Thompson > Labels: dtest > > example failure: > http://cassci.datastax.com/job/trunk_dtest/1223/testReport/topology_test/TestTopology/simple_decommission_test > Failed on CassCI build trunk_dtest #1223 > The problem is that node3 detected node2 as down before the stop call was > made, so the wait_other_notice check fails. The fix here is almost certainly > as simple as just changing that line to {{node2.stop()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[3/3] cassandra git commit: Merge branch 'cassandra-3.7' into trunk
Merge branch 'cassandra-3.7' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/74f41c9c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/74f41c9c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/74f41c9c Branch: refs/heads/trunk Commit: 74f41c9cc61959748aa2cb6b42186ecfd8587796 Parents: da9bb03 f294750 Author: Aleksey Yeschenko Authored: Thu May 19 16:09:23 2016 +0100 Committer: Aleksey Yeschenko Committed: Thu May 19 16:09:23 2016 +0100 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/Directories.java | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/74f41c9c/CHANGES.txt -- diff --cc CHANGES.txt index 854ae53,4c50980..4a5dbf4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,13 -1,5 +1,14 @@@ +3.8 + * Support older ant versions (CASSANDRA-11807) + * Estimate compressed on disk size when deciding if sstable size limit reached (CASSANDRA-11623) + * cassandra-stress profiles should support case sensitive schemas (CASSANDRA-11546) + * Remove DatabaseDescriptor dependency from FileUtils (CASSANDRA-11578) + * Faster streaming (CASSANDRA-9766) + * Add prepared query parameter to trace for "Execute CQL3 prepared query" session (CASSANDRA-11425) + + 3.7 + * Don't use static dataDirectories field in Directories instances (CASSANDRA-11647) Merged from 3.0: * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705) * Allow compaction strategies to disable early open (CASSANDRA-11754)
[1/3] cassandra git commit: Don't use static dataDirectories field in Directories instances
Repository: cassandra Updated Branches: refs/heads/cassandra-3.7 b1cf0fe6b -> f294750f5 refs/heads/trunk da9bb0306 -> 74f41c9cc Don't use static dataDirectories field in Directories instances patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11647 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f294750f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f294750f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f294750f Branch: refs/heads/cassandra-3.7 Commit: f294750f535f2a73c71eba589dcaf19074f91bbf Parents: b1cf0fe Author: Blake Eggleston Authored: Mon Apr 25 13:06:30 2016 -0700 Committer: Aleksey Yeschenko Committed: Thu May 19 16:09:00 2016 +0100 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/Directories.java | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index f96c31a..4c50980 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.7 + * Don't use static dataDirectories field in Directories instances (CASSANDRA-11647) Merged from 3.0: * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705) * Allow compaction strategies to disable early open (CASSANDRA-11754) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/src/java/org/apache/cassandra/db/Directories.java -- diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java index 3898180..7876959 100644 --- a/src/java/org/apache/cassandra/db/Directories.java +++ b/src/java/org/apache/cassandra/db/Directories.java @@ -296,7 +296,7 @@ public class Directories { if (directory != null) { -for (DataDirectory dataDirectory : dataDirectories) +for (DataDirectory dataDirectory : paths) { if (directory.getAbsolutePath().startsWith(dataDirectory.location.getAbsolutePath())) return
dataDirectory; @@ -464,7 +464,7 @@ public class Directories public DataDirectory[] getWriteableLocations() { List<DataDirectory> nonBlacklistedDirs = new ArrayList<>(); -for (DataDirectory dir : dataDirectories) +for (DataDirectory dir : paths) { if (!BlacklistedDirectories.isUnwritable(dir.location)) nonBlacklistedDirs.add(dir);
[2/3] cassandra git commit: Don't use static dataDirectories field in Directories instances
Don't use static dataDirectories field in Directories instances patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11647 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f294750f Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f294750f Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f294750f Branch: refs/heads/trunk Commit: f294750f535f2a73c71eba589dcaf19074f91bbf Parents: b1cf0fe Author: Blake Eggleston Authored: Mon Apr 25 13:06:30 2016 -0700 Committer: Aleksey Yeschenko Committed: Thu May 19 16:09:00 2016 +0100 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/Directories.java | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index f96c31a..4c50980 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.7 + * Don't use static dataDirectories field in Directories instances (CASSANDRA-11647) Merged from 3.0: * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705) * Allow compaction strategies to disable early open (CASSANDRA-11754) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/src/java/org/apache/cassandra/db/Directories.java -- diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java index 3898180..7876959 100644 --- a/src/java/org/apache/cassandra/db/Directories.java +++ b/src/java/org/apache/cassandra/db/Directories.java @@ -296,7 +296,7 @@ public class Directories { if (directory != null) { -for (DataDirectory dataDirectory : dataDirectories) +for (DataDirectory dataDirectory : paths) { if (directory.getAbsolutePath().startsWith(dataDirectory.location.getAbsolutePath())) return dataDirectory; @@ -464,7 +464,7 @@ public class Directories public DataDirectory[] getWriteableLocations() { List<DataDirectory> nonBlacklistedDirs = new ArrayList<>(); -for (DataDirectory dir : dataDirectories) +for (DataDirectory dir : paths) { if (!BlacklistedDirectories.isUnwritable(dir.location)) nonBlacklistedDirs.add(dir);
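The bug pattern fixed by this commit, iterating a class-wide default instead of the per-instance field, has a direct analogue in most languages. A minimal Python sketch (class and attribute names are invented for illustration and only loosely mirror the Java fields in the diff above):

```python
class Directories:
    # class-level default, analogous to the static dataDirectories field
    data_directories = ["/var/lib/cassandra/data"]

    def __init__(self, paths=None):
        # per-instance locations, analogous to the instance field `paths`
        self.paths = list(paths) if paths is not None else list(self.data_directories)

    def writeable_locations_buggy(self):
        # iterates the shared default, silently ignoring this instance's paths
        return list(Directories.data_directories)

    def writeable_locations_fixed(self):
        # iterates the instance field, as the patch does
        return list(self.paths)

d = Directories(paths=["/mnt/disk1", "/mnt/disk2"])
assert d.writeable_locations_buggy() == ["/var/lib/cassandra/data"]   # wrong dirs
assert d.writeable_locations_fixed() == ["/mnt/disk1", "/mnt/disk2"]  # honors the instance
```

This is exactly why external code constructing its own Directories instances broke: any method reading the static field ignored the caller's configuration.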
[jira] [Commented] (CASSANDRA-11799) dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error
[ https://issues.apache.org/jira/browse/CASSANDRA-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291242#comment-15291242 ] Philip Thompson commented on CASSANDRA-11799: - We think we use utf8 everywhere, Michael is looking into it. > dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error > > > Key: CASSANDRA-11799 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11799 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Tyler Hobbs > Labels: cqlsh, dtest > Fix For: 2.2.x, 3.0.x, 3.x > > > example failure: > http://cassci.datastax.com/job/cassandra-3.0_dtest/703/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_unicode_syntax_error > Failed on CassCI build cassandra-3.0_dtest #703 > Also failing is > cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_invalid_request_error > The relevant failure is > {code} > 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in > range(128) > {code} > These are failing on 2.2, 3.0 and trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
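The {{'ascii' codec can't encode character u'\xe4'}} failure quoted in the ticket is the classic default-codec trap: a unicode string hits an implicit ASCII encode. A minimal reproduction and the usual fix, explicit UTF-8 encoding, sketched in Python (the sample text is invented; the actual failing string lives in the cqlsh tests):

```python
msg = u"Ung\xfcltige Anfrage: \xe4"   # text containing non-ASCII characters

try:
    msg.encode("ascii")               # what an implicit ASCII default codec does
    raised = False
except UnicodeEncodeError as err:
    raised = True
    assert "ascii" in str(err)        # same error class the dtest reports

assert raised
data = msg.encode("utf-8")            # the fix: encode explicitly as UTF-8
assert data.decode("utf-8") == msg    # round-trips losslessly
```

Whether the failure reproduces on CI also depends on the test boxes' locale, which is why the environment is being checked in the comments above.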
[jira] [Updated] (CASSANDRA-11705) clearSnapshots using Directories.dataDirectories instead of CFS.initialDirectories
[ https://issues.apache.org/jira/browse/CASSANDRA-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-11705: -- Resolution: Fixed Fix Version/s: (was: 3.0.x) (was: 3.x) 3.0.7 3.7 Status: Resolved (was: Patch Available) > clearSnapshots using Directories.dataDirectories instead of > CFS.initialDirectories > -- > > Key: CASSANDRA-11705 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11705 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 3.7, 3.0.7 > > > An oversight in CASSANDRA-10518 prevents snapshots created in data > directories defined outside of cassandra.yaml from being cleared by > {{Keyspace.clearSnapshots}}. {{ColumnFamilyStore.initialDirectories}} should > be used when finding snapshots to clear, not {{Directories.dataDirectories}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11705) clearSnapshots using Directories.dataDirectories instead of CFS.initialDirectories
[ https://issues.apache.org/jira/browse/CASSANDRA-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291234#comment-15291234 ] Aleksey Yeschenko commented on CASSANDRA-11705: --- The new version LGTM. Committed to 3.0 as [6663c5ff898ff502fc3c69b9f36328c1d9f517e8|https://github.com/apache/cassandra/commit/6663c5ff898ff502fc3c69b9f36328c1d9f517e8] and merged with 3.7 and trunk. Thanks. > clearSnapshots using Directories.dataDirectories instead of > CFS.initialDirectories > -- > > Key: CASSANDRA-11705 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11705 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 3.0.x, 3.x > > > An oversight in CASSANDRA-10518 prevents snapshots created in data > directories defined outside of cassandra.yaml from being cleared by > {{Keyspace.clearSnapshots}}. {{ColumnFamilyStore.initialDirectories}} should > be used when finding snapshots to clear, not {{Directories.dataDirectories}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11795) cassandra-stress legacy mode fails - time to remove it?
[ https://issues.apache.org/jira/browse/CASSANDRA-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291230#comment-15291230 ] T Jake Luciani commented on CASSANDRA-11795: Yes I totally agree... > cassandra-stress legacy mode fails - time to remove it? > --- > > Key: CASSANDRA-11795 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11795 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Michael Shuler >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.x > > > {noformat} > (trunk)mshuler@hana:~/git/cassandra$ cassandra-stress legacy -o INSERT > Running in legacy support mode. Translating command to: > stress write n=100 -col n=fixed(5) size=fixed(34) data=repeat(1) -rate > threads=50 -log interval=10 -mode thrift > Invalid parameter data=repeat(1) > Usage: cassandra-stress [options] > Help usage: cassandra-stress help > ---Commands--- > read : Multiple concurrent reads - the cluster must first be > populated by a write test > write: Multiple concurrent writes against the cluster > <...> > {noformat} > I tried legacy mode as a one-off, since someone provided a 2.0 stress option > command line to duplicate. Is it time to remove legacy, perhaps? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[6/6] cassandra git commit: Merge branch 'cassandra-3.7' into trunk
Merge branch 'cassandra-3.7' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/da9bb030 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/da9bb030 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/da9bb030 Branch: refs/heads/trunk Commit: da9bb03067e8f11c933e1a04dccf13a1f5a131c7 Parents: beb6464 b1cf0fe Author: Aleksey Yeschenko Authored: Thu May 19 15:58:01 2016 +0100 Committer: Aleksey Yeschenko Committed: Thu May 19 15:58:01 2016 +0100 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++ src/java/org/apache/cassandra/db/Directories.java | 10 -- src/java/org/apache/cassandra/db/Keyspace.java | 2 +- 4 files changed, 16 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/da9bb030/CHANGES.txt -- diff --cc CHANGES.txt index adadefd,f96c31a..854ae53 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,14 -1,6 +1,15 @@@ +3.8 + * Support older ant versions (CASSANDRA-11807) + * Estimate compressed on disk size when deciding if sstable size limit reached (CASSANDRA-11623) + * cassandra-stress profiles should support case sensitive schemas (CASSANDRA-11546) + * Remove DatabaseDescriptor dependency from FileUtils (CASSANDRA-11578) + * Faster streaming (CASSANDRA-9766) + * Add prepared query parameter to trace for "Execute CQL3 prepared query" session (CASSANDRA-11425) + + 3.7 Merged from 3.0: + * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705) * Allow compaction strategies to disable early open (CASSANDRA-11754) * Refactor Materialized View code (CASSANDRA-11475) * Update Java Driver (CASSANDRA-11615)
[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.7
Merge branch 'cassandra-3.0' into cassandra-3.7 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1cf0fe6 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1cf0fe6 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1cf0fe6 Branch: refs/heads/trunk Commit: b1cf0fe6bbd3c2cf75cd6b9586a9bd1e9e632e8b Parents: 326a263 6663c5f Author: Aleksey Yeschenko Authored: Thu May 19 15:57:49 2016 +0100 Committer: Aleksey Yeschenko Committed: Thu May 19 15:57:49 2016 +0100 -- CHANGES.txt | 2 ++ src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++ src/java/org/apache/cassandra/db/Directories.java | 10 -- src/java/org/apache/cassandra/db/Keyspace.java | 2 +- 4 files changed, 17 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1cf0fe6/CHANGES.txt -- diff --cc CHANGES.txt index d029c7b,27398db..f96c31a --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,85 -1,14 +1,87 @@@ -3.0.7 +3.7 +Merged from 3.0: + * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705) * Allow compaction strategies to disable early open (CASSANDRA-11754) * Refactor Materialized View code (CASSANDRA-11475) * Update Java Driver (CASSANDRA-11615) Merged from 2.2: * Add seconds to cqlsh tracing session duration (CASSANDRA-11753) + * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669) * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395) + * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626) + -3.0.6 +3.6 + * Correctly migrate schema for frozen UDTs during 2.x -> 3.x upgrades + (does not affect any released versions) (CASSANDRA-11613) + * Allow server startup if JMX is configured directly (CASSANDRA-11725) + * Prevent direct memory OOM on buffer pool allocations (CASSANDRA-11710) + * Enhanced Compaction Logging (CASSANDRA-10805) + * Make prepared statement cache size configurable (CASSANDRA-11555) + *
Integrated JMX authentication and authorization (CASSANDRA-10091) + * Add units to stress output (CASSANDRA-11352) + * Fix PER PARTITION LIMIT for single and multi partitions queries (CASSANDRA-11603) + * Add uncompressed chunk cache for RandomAccessReader (CASSANDRA-5863) + * Clarify ClusteringPrefix hierarchy (CASSANDRA-11213) + * Always perform collision check before joining ring (CASSANDRA-10134) + * SSTableWriter output discrepancy (CASSANDRA-11646) + * Fix potential timeout in NativeTransportService.testConcurrentDestroys (CASSANDRA-10756) + * Support large partitions on the 3.0 sstable format (CASSANDRA-11206) + * Add support to rebuild from specific range (CASSANDRA-10406) + * Optimize the overlapping lookup by calculating all the + bounds in advance (CASSANDRA-11571) + * Support json/yaml output in nodetool tablestats (CASSANDRA-5977) + * (stress) Add datacenter option to -node options (CASSANDRA-11591) + * Fix handling of empty slices (CASSANDRA-11513) + * Make number of cores used by cqlsh COPY visible to testing code (CASSANDRA-11437) + * Allow filtering on clustering columns for queries without secondary indexes (CASSANDRA-11310) + * Refactor Restriction hierarchy (CASSANDRA-11354) + * Eliminate allocations in R/W path (CASSANDRA-11421) + * Update Netty to 4.0.36 (CASSANDRA-11567) + * Fix PER PARTITION LIMIT for queries requiring post-query ordering (CASSANDRA-11556) + * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818) + * Support UDT in CQLSSTableWriter (CASSANDRA-10624) + * Support for non-frozen user-defined types, updating + individual fields of user-defined types (CASSANDRA-7423) + * Make LZ4 compression level configurable (CASSANDRA-11051) + * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017) + * Make custom filtering more extensible with UserExpression (CASSANDRA-11295) + * Improve field-checking and error reporting in cassandra.yaml (CASSANDRA-10649) + * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507) + * More
user friendly error when providing an invalid token to nodetool (CASSANDRA-9348) + * Add static column support to SASI index (CASSANDRA-11183) + * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization (CASSANDRA-11434) + * Support LIKE operator in prepared statements (CASSANDRA-11456) + * Add a command to see if a Materialized View has finished building (CASSANDRA-9967) + * Log endpoint and port associated with streaming operation (CASSANDRA-8777) + * Print sensible units for all log messages (CASSANDRA-9692) + * Upgrade Netty to version 4.0.34 (CASSANDRA-11096)
[2/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots
Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11705

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff
Branch: refs/heads/cassandra-3.7
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko
Committed: Thu May 19 15:54:19 2016 +0100
--
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++
 src/java/org/apache/cassandra/db/Directories.java       | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java          | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b3e7d5e..27398db 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.7
+ * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
 * Allow compaction strategies to disable early open (CASSANDRA-11754)
 * Refactor Materialized View code (CASSANDRA-11475)
 * Update Java Driver (CASSANDRA-11615)
@@ -6,6 +7,7 @@ Merged from 2.2:
 * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
 * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
+
 3.0.6
 * Disallow creating view with a static column (CASSANDRA-11602)
 * Reduce the amount of object allocations caused by the getFunctions methods (CASSANDRA-11593)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index a6d5c17..f340b0a 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -116,6 +116,12 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
         initialDirectories = replacementArray;
     }

+    public static Directories.DataDirectory[] getInitialDirectories()
+    {
+        Directories.DataDirectory[] src = initialDirectories;
+        return Arrays.copyOf(src, src.length);
+    }
+
     private static final Logger logger = LoggerFactory.getLogger(ColumnFamilyStore.class);

     private static final ExecutorService flushExecutor = new JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java
index e00c8b9..f7bb390 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -903,11 +903,17 @@ public class Directories
         return visitor.getAllocatedSize();
     }

-    // Recursively finds all the sub directories in the KS directory.
     public static List<File> getKSChildDirectories(String ksName)
     {
+        return getKSChildDirectories(ksName, dataDirectories);
+    }
+
+    // Recursively finds all the sub directories in the KS directory.
+    public static List<File> getKSChildDirectories(String ksName, DataDirectory[] directories)
+    {
         List<File> result = new ArrayList<>();
-        for (DataDirectory dataDirectory : dataDirectories)
+        for (DataDirectory dataDirectory : directories)
         {
             File ksDir = new File(dataDirectory.location, ksName);
             File[] cfDirs = ksDir.listFiles();
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Keyspace.java
--
diff --git a/src/java/org/apache/cassandra/db/Keyspace.java b/src/java/org/apache/cassandra/db/Keyspace.java
index 273946e..5865364 100644
--- a/src/java/org/apache/cassandra/db/Keyspace.java
+++ b/src/java/org/apache/cassandra/db/Keyspace.java
@@ -276,7 +276,7 @@ public class Keyspace
      */
     public static void clearSnapshot(String snapshotName, String keyspace)
     {
-        List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace);
+        List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace,
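The new `getInitialDirectories()` accessor above reads the static array once into a local and returns a copy rather than the array itself. A minimal, self-contained sketch of that defensive-copy pattern (class and field names simplified; `DataDirectory` here is a stand-in for `Directories.DataDirectory`, and the `volatile` modifier is an assumption of this sketch, not taken from the patch):

```java
import java.util.Arrays;

class InitialDirectoriesHolder
{
    static class DataDirectory
    {
        final String location;
        DataDirectory(String location) { this.location = location; }
    }

    // The field can be replaced wholesale at startup; volatile makes the
    // single read below observe one consistent array reference.
    private static volatile DataDirectory[] initialDirectories =
            new DataDirectory[] { new DataDirectory("/var/lib/cassandra/data") };

    public static DataDirectory[] getInitialDirectories()
    {
        DataDirectory[] src = initialDirectories; // read the field exactly once
        return Arrays.copyOf(src, src.length);    // hand callers a copy
    }
}
```

Reading the field into `src` before copying matters: two separate reads of `initialDirectories` could observe different arrays if another thread swaps the reference in between, and returning the copy keeps callers from mutating internal state.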
[1/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots
Repository: cassandra

Updated Branches:
  refs/heads/cassandra-3.0 5a5d0a1eb -> 6663c5ff8
  refs/heads/cassandra-3.7 326a263f4 -> b1cf0fe6b
  refs/heads/trunk beb6464c0 -> da9bb0306

Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11705

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff
Branch: refs/heads/cassandra-3.0
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko
Committed: Thu May 19 15:54:19 2016 +0100
--
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++
 src/java/org/apache/cassandra/db/Directories.java       | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java          | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
[3/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots
Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11705

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff
Branch: refs/heads/trunk
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko
Committed: Thu May 19 15:54:19 2016 +0100
--
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++
 src/java/org/apache/cassandra/db/Directories.java       | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java          | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.7
Merge branch 'cassandra-3.0' into cassandra-3.7

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1cf0fe6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1cf0fe6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1cf0fe6
Branch: refs/heads/cassandra-3.7
Commit: b1cf0fe6bbd3c2cf75cd6b9586a9bd1e9e632e8b
Parents: 326a263 6663c5f
Author: Aleksey Yeschenko
Authored: Thu May 19 15:57:49 2016 +0100
Committer: Aleksey Yeschenko
Committed: Thu May 19 15:57:49 2016 +0100
--
 CHANGES.txt                                             | 2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java | 6 ++
 src/java/org/apache/cassandra/db/Directories.java       | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java          | 2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1cf0fe6/CHANGES.txt
--
diff --cc CHANGES.txt
index d029c7b,27398db..f96c31a
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,85 -1,14 +1,87 @@@
-3.0.7
+3.7
+Merged from 3.0:
+ * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
 * Allow compaction strategies to disable early open (CASSANDRA-11754)
 * Refactor Materialized View code (CASSANDRA-11475)
 * Update Java Driver (CASSANDRA-11615)
Merged from 2.2:
 * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
+ * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669)
 * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
+ * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626)
+
-3.0.6
+3.6
+ * Correctly migrate schema for frozen UDTs during 2.x -> 3.x upgrades
+   (does not affect any released versions) (CASSANDRA-11613)
+ * Allow server startup if JMX is configured directly (CASSANDRA-11725)
+ * Prevent direct memory OOM on buffer pool allocations (CASSANDRA-11710)
+ * Enhanced Compaction Logging (CASSANDRA-10805)
+ * Make prepared statement cache size configurable (CASSANDRA-11555)
+ * Integrated JMX authentication and authorization (CASSANDRA-10091)
+ * Add units to stress output (CASSANDRA-11352)
+ * Fix PER PARTITION LIMIT for single and multi partitions queries (CASSANDRA-11603)
+ * Add uncompressed chunk cache for RandomAccessReader (CASSANDRA-5863)
+ * Clarify ClusteringPrefix hierarchy (CASSANDRA-11213)
+ * Always perform collision check before joining ring (CASSANDRA-10134)
+ * SSTableWriter output discrepancy (CASSANDRA-11646)
+ * Fix potential timeout in NativeTransportService.testConcurrentDestroys (CASSANDRA-10756)
+ * Support large partitions on the 3.0 sstable format (CASSANDRA-11206)
+ * Add support to rebuild from specific range (CASSANDRA-10406)
+ * Optimize the overlapping lookup by calculating all the
+   bounds in advance (CASSANDRA-11571)
+ * Support json/yaml output in nodetool tablestats (CASSANDRA-5977)
+ * (stress) Add datacenter option to -node options (CASSANDRA-11591)
+ * Fix handling of empty slices (CASSANDRA-11513)
+ * Make number of cores used by cqlsh COPY visible to testing code (CASSANDRA-11437)
+ * Allow filtering on clustering columns for queries without secondary indexes (CASSANDRA-11310)
+ * Refactor Restriction hierarchy (CASSANDRA-11354)
+ * Eliminate allocations in R/W path (CASSANDRA-11421)
+ * Update Netty to 4.0.36 (CASSANDRA-11567)
+ * Fix PER PARTITION LIMIT for queries requiring post-query ordering (CASSANDRA-11556)
+ * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
+ * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
+ * Support for non-frozen user-defined types, updating
+   individual fields of user-defined types (CASSANDRA-7423)
+ * Make LZ4 compression level configurable (CASSANDRA-11051)
+ * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017)
+ * Make custom filtering more extensible with UserExpression (CASSANDRA-11295)
+ * Improve field-checking and error reporting in cassandra.yaml (CASSANDRA-10649)
+ * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507)
+ * More user friendly error when providing an invalid token to nodetool (CASSANDRA-9348)
+ * Add static column support to SASI index (CASSANDRA-11183)
+ * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization (CASSANDRA-11434)
+ * Support LIKE operator in prepared statements (CASSANDRA-11456)
+ * Add a command to see if a Materialized View has finished building (CASSANDRA-9967)
+ * Log endpoint and port associated with streaming operation (CASSANDRA-8777)
+ * Print sensible units for all log messages (CASSANDRA-9692)
+ * Upgrade Netty to version 4.0.34
[jira] [Commented] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291224#comment-15291224 ] Philip Thompson commented on CASSANDRA-11760: - I'm re-running the tests that found this, to see if it comes up again. They take about 3-4 hours. > dtest failure in > TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test > > > Key: CASSANDRA-11760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11760 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Tyler Hobbs > Labels: dtest > Fix For: 3.6 > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, > node3.log, node3_debug.log > > > example failure: > http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/ > I've attached the logs. The test upgrades from 2.2.5 to 3.6. The relevant > failure stack trace extracted here: > {code} > ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,33 > 4 CassandraDaemon.java:185 - Exception in thread Thread[MessagingSe > rvice-Incoming-/127.0.0.1,5,main] > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109) > 
~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91) > ~[apache-cassandra-2.2.6.jar:2.2.6] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
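The `ArrayIndexOutOfBoundsException` in `AbstractCompoundCellNameType.fromByteBuffer` above is the typical symptom of a node decoding a compound name that carries more components than its local schema knows about. A hypothetical, simplified model (not Cassandra's real classes, and only one way such an overrun can occur) of that failure shape:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of compound-name decoding: each component
// is a 16-bit length followed by that many bytes. The decoder indexes into
// the locally known component types; if the peer serialized more components
// than this node's schema expects, the index runs past the array and throws
// ArrayIndexOutOfBoundsException, matching the stack trace above in shape.
class CompoundNameModel
{
    private final String[] componentTypes; // component types the local schema knows

    CompoundNameModel(String... componentTypes)
    {
        this.componentTypes = componentTypes;
    }

    List<String> deserialize(ByteBuffer in)
    {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (in.hasRemaining())
        {
            short len = in.getShort();
            byte[] raw = new byte[len];
            in.get(raw);
            // Throws ArrayIndexOutOfBoundsException once i exceeds the
            // locally known component count.
            parts.add(componentTypes[i++] + ":" + new String(raw, StandardCharsets.UTF_8));
        }
        return parts;
    }
}
```

The point of the sketch is that the reading side trusts the byte stream to agree with its own schema, so a 2.2 node handed a 3.x-shaped payload can fail deep inside deserialization rather than with a clean version error.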
[jira] [Updated] (CASSANDRA-11844) Create compaction-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-11844: --- Description: A tool like cassandra-stress that works with stress yaml but: * writes directly to a specified dir using CQLSSTableWriter. * lets you run just compaction on that directory and generates a report on compaction throughput. was: A tool like cassandra-stress that works with stress yaml but: 1. writes directly to a specified dir using CQLSSTableWriter. 2 lets you run just compaction on that directory and generates a report on compaction throughput. > Create compaction-stress > > > Key: CASSANDRA-11844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11844 > Project: Cassandra > Issue Type: Sub-task > Components: Compaction >Reporter: T Jake Luciani >Assignee: T Jake Luciani > > A tool like cassandra-stress that works with stress yaml but: > * writes directly to a specified dir using CQLSSTableWriter. > * lets you run just compaction on that directory and generates a report on > compaction throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11844) Create compaction-stress
T Jake Luciani created CASSANDRA-11844: -- Summary: Create compaction-stress Key: CASSANDRA-11844 URL: https://issues.apache.org/jira/browse/CASSANDRA-11844 Project: Cassandra Issue Type: Sub-task Reporter: T Jake Luciani A tool like cassandra-stress that works with stress yaml but: 1. writes directly to a specified dir using CQLSSTableWriter. 2 lets you run just compaction on that directory and generates a report on compaction throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-11844) Create compaction-stress
[ https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani reassigned CASSANDRA-11844: -- Assignee: T Jake Luciani > Create compaction-stress > > > Key: CASSANDRA-11844 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11844 > Project: Cassandra > Issue Type: Sub-task > Components: Compaction >Reporter: T Jake Luciani >Assignee: T Jake Luciani > > A tool like cassandra-stress that works with stress yaml but: > 1. writes directly to a specified dir using CQLSSTableWriter. > 2 lets you run just compaction on that directory and generates a report on > compaction throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval
[ https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291211#comment-15291211 ] Stefan Podkowinski commented on CASSANDRA-11349:

I've been debugging the latest mentioned error case using the following cql/ccm statements and a local 2 node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 (
    c1 text,
    c2 text,
    c3 text,
    c4 float,
    PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};

DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the debugger. ColumnIndex.Builder.buildForCompaction() will add tombstones in the following order to the tracker:

*Node1*
{{1463656272792: c1 = 'a' AND c2 = 'b'}} First RT, added to unwritten + opened tombstones.
{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}} Overshadowed by the RT added before while also being older. Will not be added and is simply ignored.
{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} Overshadowed by the first and only RT added to opened so far, but newer, and will thus be added to unwritten + opened.
We end up with 2 unwritten tombstones (..92 + ..93) passed to the serializer for the message digest.

*Node2*
{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START) First RT, added to unwritten + opened tombstones.
{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END) Comparison of the EOC flag (Tracker:251) of the previously added RT will cause it to be removed from the opened list (Tracker:258). Afterwards the current RT will be added to unwritten + opened.
{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color}) Gets compared with prev. added RT, which supersedes the current one and thus stays in the list. Will again be added to unwritten + opened list. We end up with 3 unwritten RTs, including 1463656272792 twice. I still haven't been able to exactly pinpoint why the reducer will be called twice with the same TS, but since [~blambov] explicitly mentioned that possibility, I guess it's intended behavior (but why? :)). > MerkleTree mismatch when multiple range tombstones exists for the same > partition and interval > - > > Key: CASSANDRA-11349 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11349 > Project: Cassandra > Issue Type: Bug >Reporter: Fabien Rousseau >Assignee: Stefan Podkowinski > Labels: repair > Fix For: 2.1.x, 2.2.x > > Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch > > > We observed that repair, for some of our clusters, streamed a lot of data and > many partitions were "out of sync". > Moreover, the read repair mismatch ratio is around 3% on those clusters, > which is really high. > After investigation, it appears that, if two range tombstones exists for a > partition for the same range/interval, they're both included in the merkle > tree computation. > But, if for some reason, on another node, the two range tombstones were > already compacted into a single range tombstone, this will result in a merkle > tree difference. > Currently, this is clearly bad because MerkleTree differences are dependent > on compactions (and if a partition is deleted and created multiple times, the > only way to ensure that repair "works correctly"/"don't overstream data" is > to major compact before each repair... which is not really feasible). 
> Below is a list of steps allowing to easily reproduce this case: > {noformat} > ccm create test -v 2.1.13 -n 2 -s > ccm node1 cqlsh > CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 2}; > USE test_rt; > CREATE TABLE IF NOT EXISTS table1 ( > c1 text, > c2 text, > c3 float, > c4 float, > PRIMARY KEY ((c1), c2) > ); > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > # now flush only one of the two nodes > ccm node1 flush > ccm node1 cqlsh > USE test_rt; > INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3); > DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b'; > ctrl ^d > ccm node1 repair > # now grep the log and observe that there was some inconstencies detected > between nodes (while it shouldn't have detected any) > ccm node1 showlog | grep "out of sync" > {noformat} > Consequences
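The net effect described in the report (one node digesting two overlapping range tombstones for the same interval while the other node digests the single compacted one) can be illustrated with a toy digest model. This is an assumed simplification, not Cassandra's actual serialization: each range tombstone is folded into the hash as a `"start|end|timestamp"` string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Toy model of the validation digest: every range tombstone fed to the
// hash contributes its bytes, so two logically-equivalent tombstone sets
// that differ only in how far compaction has merged them produce
// different digests, and repair flags the partition as out of sync.
class RtDigestModel
{
    static byte[] digest(List<String> rangeTombstones)
    {
        try
        {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String rt : rangeTombstones)
                md.update(rt.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new AssertionError(e); // MD5 is always available
        }
    }
}
```

Under this model, a node holding both `b|b|..92` and `b|b|..93` hashes to a different value than a node that already compacted them down to `b|b|..93` alone, which is exactly why the MerkleTrees mismatch even though both nodes return the same query results.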
[jira] [Resolved] (CASSANDRA-11678) cassandra crush when enable hints_compression
[ https://issues.apache.org/jira/browse/CASSANDRA-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko resolved CASSANDRA-11678. --- Resolution: Cannot Reproduce Couldn't reproduce this one, sorry. Feel free to reopen if you can provide a hints file that reliably trigger the issue. Thank you. > cassandra crush when enable hints_compression > - > > Key: CASSANDRA-11678 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11678 > Project: Cassandra > Issue Type: Bug > Components: Core, Local Write-Read Paths > Environment: Centos 7 >Reporter: Weijian Lin >Assignee: Blake Eggleston >Priority: Critical > > When I enable hints_compression and set the compression class to > LZ4Compressor,the > cassandra (v3.05, V3.5.0) will crush。That is a bug, or any conf is wrong? > *Exception in V 3.5.0 * > {code} > ERROR [HintsDispatcher:2] 2016-04-26 15:02:56,970 > HintsDispatchExecutor.java:225 - Failed to dispatch hints file > abc4dda2-b551-427e-bb0b-e383d4a392e1-1461654138963-1.hints: file is > corrupted ({}) > org.apache.cassandra.io.FSReadError: java.io.EOFException > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:284) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:254) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) > 
~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259) > [apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242) > [apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220) > [apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199) > [apache-cassandra-3.5.0.jar:3.5.0] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_65] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_65] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_65] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_65] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65] > Caused by: java.io.EOFException: null > at > org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:146) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.io.util.RebufferingInputStream.readInt(RebufferingInputStream.java:188) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:297) > ~[apache-cassandra-3.5.0.jar:3.5.0] > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:280) > ~[apache-cassandra-3.5.0.jar:3.5.0] > ... 
15 common frames omitted > {code} > *Exception in V 3.0.5 * > {code} > ERROR [HintsDispatcher:2] 2016-04-26 15:54:46,294 > HintsDispatchExecutor.java:225 - Failed to dispatch hints file > 8603be13-6878-4de3-8bc3-a7a7146b0376-1461657251205-1.hints: file is > corrupted ({}) > org.apache.cassandra.io.FSReadError: java.io.EOFException > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) >
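Both stack traces above bottom out in {{RebufferingInputStream.readInt}} throwing {{EOFException}}: the dispatcher runs out of bytes partway through reading a record from the compressed hints file, which is then reported as {{FSReadError ... file is corrupted}}. A minimal sketch of that failure mode, with hypothetical class and method names (this is not Cassandra's actual hints code), for a length-prefixed record format:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch: a reader for length-prefixed records, conceptually like
// what a hints reader does per record. A truncated or corrupted file runs out
// of bytes mid-prefix or mid-payload and surfaces as EOFException.
public class LengthPrefixedReader {
    public static byte[] readRecord(DataInputStream in) throws IOException {
        int length = in.readInt();   // EOFException if the 4-byte prefix is cut short
        byte[] payload = new byte[length];
        in.readFully(payload);       // EOFException if the payload is truncated
        return payload;
    }

    public static boolean isTruncated(byte[] fileBytes) {
        try {
            readRecord(new DataInputStream(new ByteArrayInputStream(fileBytes)));
            return false;
        } catch (EOFException e) {
            return true;             // what the dispatcher wraps as FSReadError
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A length prefix of 8 followed by only two payload bytes, for example, trips the second {{EOFException}}, matching the {{readPrimitiveSlowly}}/{{readInt}} frames in the traces.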
[jira] [Commented] (CASSANDRA-11489) DynamicCompositeType failures during 2.1 to 3.0 upgrade.
[ https://issues.apache.org/jira/browse/CASSANDRA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291188#comment-15291188 ] Aleksey Yeschenko commented on CASSANDRA-11489: --- [~thobbs] I initially assumed that the problem here was with DCT, but apparently a 2.1 node, when decoding a read response from a 3.0 node, is trying to deserialise some range tombstones that just cannot be there (this trace is from reading a CFS table, and those use only cell-level tombstones and whole-partition deletions, exclusively). Since you wrote the 3.0-2.1 upgrade compat code, does anything obvious come to mind re: how this got here? > DynamicCompositeType failures during 2.1 to 3.0 upgrade. > > > Key: CASSANDRA-11489 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11489 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jeremiah Jordan >Assignee: Aleksey Yeschenko > Fix For: 3.0.x, 3.x > > > When upgrading from 2.1.13 to 3.0.4+some (hash > 70eab633f289eb1e4fbe47b3e17ff3203337f233) we are seeing the following > exceptions on 2.1 nodes after other nodes have been upgraded, with tables > using DynamicCompositeType in use. The workload runs fine once everything is > upgraded. 
> {code} > ERROR [MessagingService-Incoming-/10.200.182.2] 2016-04-03 21:49:10,531 > CassandraDaemon.java:229 - Exception in thread > Thread[MessagingService-Incoming-/10.200.182.2,5,main] > java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input > length = 1 > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:181) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:200) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.cql3.ColumnIdentifier.(ColumnIdentifier.java:54) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.SimpleSparseCellNameType.fromByteBuffer(SimpleSparseCellNameType.java:83) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:398) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.RangeTombstoneList$Serializer.deserialize(RangeTombstoneList.java:843) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.DeletionInfo$Serializer.deserialize(DeletionInfo.java:407) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:105) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:89) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:73) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:116) > 
~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:88) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > Caused by: java.nio.charset.MalformedInputException: Input length = 1 > at java.nio.charset.CoderResult.throwException(CoderResult.java:281) > ~[na:1.8.0_40] > at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816) > ~[na:1.8.0_40] > at > org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:109) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:169) > ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131] > ... 16 common frames
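The root cause frame is a strict UTF-8 decode: {{DynamicCompositeType.getAndAppendComparator}} expects a comparator class name at that position, but the bytes handed to it (from the range tombstone that should not be there) are not a valid string, so {{ByteBufferUtil.string}} propagates {{MalformedInputException}}. A hedged illustration of that decoding behaviour (helper names here are illustrative, not Cassandra's):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: a fresh CharsetDecoder defaults to REPORT semantics,
// so invalid UTF-8 throws (e.g. MalformedInputException: Input length = 1)
// instead of being silently replaced with U+FFFD. This is why arbitrary
// binary data cannot be decoded where a comparator name is expected.
public class StrictDecode {
    public static String decodeUtf8(ByteBuffer bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder().decode(bytes).toString();
    }

    public static boolean isValidUtf8(byte[] raw) {
        try {
            decodeUtf8(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A lone continuation byte such as {{0x80}} fails with exactly the "Input length = 1" message seen above.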
[jira] [Commented] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291175#comment-15291175 ] Robert Stupp commented on CASSANDRA-11738: -- Just thinking that any measured latency is basically aged out by the time it's computed. And something like a "15 minute load" (as the other extreme) cannot reflect recent spikes. Also, a measured latency can be influenced by a badly timed GC (e.g. G1 running with a 500ms goal that sometimes has "valid" STW phases of up to 300/400ms). Maybe I don't see the point, but I think all nodes (assuming they have the same hardware and the cluster is balanced) should have (nearly) equal response times. Compactions and GCs can kick in at any time anyway. Just as an idea: a node can request a _ping-response_ from a node it sends a request to (could be requested by setting a flag in the verbs' payload). For example, node "A" sends a request to node "B". The request contains the timestamp at node "A". "B" sends a _ping-response_ including the request timestamp back to "A" as soon as it deserializes the request. "A" can now decide whether to use the calculated latency ({{currentTime() - requestTimestamp}}). It could, for example, ignore that number, which is legitimate when it itself hit a longer GC (say, >100ms or so). "A" could also decide that "B" is "slow" because it didn't get the _ping-response_ within a certain time. Too complicated? > Re-think the use of Severity in the DynamicEndpointSnitch calculation > - > > Key: CASSANDRA-11738 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11738 > Project: Cassandra > Issue Type: Improvement >Reporter: Jeremiah Jordan > Fix For: 3.x > > > CASSANDRA-11737 was opened to allow completely disabling the use of severity > in the DynamicEndpointSnitch calculation, but that is a pretty big hammer. > There is probably something we can do to better use the score. 
> The issue seems to be that severity is given equal weight with latency in the > current code, and that severity is based only on disk IO. If you have a > node that is CPU bound on something (say catching up on LCS compactions > because of bootstrap/repair/replace) the IO wait can be low, but the latency > to the node is high. > Some ideas I had are: > 1. Allowing a yaml parameter to tune how much impact the severity score has > in the calculation. > 2. Taking CPU load into account as well as IO wait (this would probably help > in the cases I have seen things go sideways) > 3. Move the -D from CASSANDRA-11737 to being a yaml-level setting > 4. Go back to just relying on latency and get rid of severity altogether. > Now that we have rapid read protection, maybe just using latency is enough, > as it can help where the predictive nature of IO wait would have been useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
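Idea 1 from the list above can be sketched as a single tunable weight applied to the severity term; everything here (names, the normalization assumptions) is hypothetical, not the actual DynamicEndpointSnitch code:

```java
// Hypothetical sketch of idea 1: make severity's contribution to the dynamic
// snitch score configurable instead of weighting it equally with latency.
public class WeightedSnitchScore {
    /**
     * @param latencyScore   normalized latency-based score, assumed in [0,1]
     * @param severity       reported severity (currently IO-wait based), assumed in [0,1]
     * @param severityWeight yaml-tunable weight in [0,1]: 0 ignores severity
     *                       entirely (the CASSANDRA-11737 behaviour), 1 keeps
     *                       today's equal weighting
     */
    public static double score(double latencyScore, double severity, double severityWeight) {
        return latencyScore + severityWeight * severity; // lower is better
    }
}
```

This would let operators dial severity down for CPU-bound nodes where IO wait understates the real latency, without removing it entirely.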
[jira] [Resolved] (CASSANDRA-11843) Improve test coverage for conditional deletes
[ https://issues.apache.org/jira/browse/CASSANDRA-11843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov resolved CASSANDRA-11843. - Resolution: Invalid Will be solved in the scope of the original issue. > Improve test coverage for conditional deletes > - > > Key: CASSANDRA-11843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11843 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov > > Follow-up ticket for #9842 to cover conditional deletes for non-existing > columns or columns containing null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291148#comment-15291148 ] Alex Petrov commented on CASSANDRA-10786: - I mostly cannot see how splitting the "long hash" into {{id}} and {{fingerprint}} improves anything. We still use the "short" version internally, for the reasons stated above. The "long" hash is the only thing we communicate to the client. It also bears no semantic meaning; we may change it as we please, as long as we respect the protocol. Also, the client would still have to come back with both {{id}} and {{fingerprint}} when executing the prepared message. So I'm not sure how {{fingerprint}} is useful without the {{id}}. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. 
For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
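The suggested solution can be sketched as folding the result set metadata into the digest that identifies the prepared statement, so the id changes at step 3 and any client holding the old id gets UNPREPARED. The names and the string encoding of the metadata are hypothetical; only the use of MD5 matches how prepared statement ids are actually derived:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the proposed fix: hash query text AND result set
// metadata together, so an ALTER TABLE that changes the metadata also changes
// the statement id, and clients caching the old id re-prepare.
public class StatementId {
    public static String id(String query, String resultSetMetadata) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(query.getBytes(StandardCharsets.UTF_8));
            md5.update(resultSetMetadata.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest())
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is guaranteed to be available
        }
    }
}
```

With this scheme, re-preparing at step 4 yields a different id than step 1, so clientB's EXECUTE with the stale id fails with UNPREPARED and forces a metadata refresh.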
[jira] [Commented] (CASSANDRA-11824) If repair fails no way to run repair again
[ https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291124#comment-15291124 ] Marcus Eriksson commented on CASSANDRA-11824: - pushed and new builds triggered > If repair fails no way to run repair again > -- > > Key: CASSANDRA-11824 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11824 > Project: Cassandra > Issue Type: Bug >Reporter: T Jake Luciani >Assignee: Marcus Eriksson > Labels: fallout > Fix For: 3.0.x > > > I have a test that disables gossip and runs repair at the same time. > {quote} > WARN [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 > StorageService.java:384 - Stopping gossip by operator request > INFO [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 > Gossiper.java:1463 - Announcing shutdown > INFO [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 > StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown > INFO [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 > OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32 > INFO [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 > OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76 > INFO [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting > repair command #1, repairing keyspace keyspace1 with repair options > (parallelism: parallel, primary range: false, incremental: true, job threads: > 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3) > INFO [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting > repair command #2, repairing keyspace stresscql with repair options > (parallelism: parallel, primary range: false, incremental: true, job threads: > 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3) > INFO [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting > repair command #3, repairing keyspace system_traces with repair options > (parallelism: 
parallel, primary range: false, incremental: true, job threads: > 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2) > {quote} > This ends up failing: > {quote} > 16:54:44.844 INFO serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] > Starting repair command #1, repairing keyspace keyspace1 with repair options > (parallelism: parallel, primary range: false, incremental: true, job threads: > 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3) > [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. > List of failed endpoint(s): [172.31.24.76, 172.31.17.32] > [2016-05-17 16:57:21,945] null > {quote} > Subsequent calls to repair with all nodes up still fails: > {quote} > ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 > CompactionManager.java:1193 - Cannot start multiple repair sessions over the > same sstables > ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - > Failed creating a merkle tree for [repair > #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291106#comment-15291106 ] Robert Stupp edited comment on CASSANDRA-10786 at 5/19/16 1:48 PM: --- Oh, right. We invalidate a pstmt when one of its dependencies changes - so, I thought, too complicated. Another possible way to solve the opt-in/long-hash problem would be to just add another identifier, which is the hash over the result set metadata. So, the current ID would stay as it is and we add a _fingerprint_ to the _Prepared_ response and _Execute_ request. For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain: {code} - <id> is [short bytes] representing the prepared query ID. - <fingerprint> is [short bytes] representing the metadata hash. - <metadata> is composed of: {code} And the body for _4.1.6 Execute_ would be {{<id><fingerprint><query_parameters>}}. To handle the situation when that result-set-metadata fingerprint does not match, there are two options IMO. # The coordinator could reply with a new error code (near to 0x2500, Unprepared) telling the client that the result set metadata no longer matches and the statement needs to be prepared again. # We just send out the result set metadata with the _Rows_ response in case the metadata has changed / does not match the fingerprint. The second option would also work around a race condition that could arise with a new error code during schema changes. That is: some nodes may already use the new result set metadata while others still use the old one. It would also save one round trip. It makes the code on the client probably a bit more complex, but I think it's worth paying that price in order to prevent this race condition (and a _prepare storm_). was (Author: snazy): Oh, right. We invalidate a pstmt when one of its dependencies changes - so, I thought, too complicated. Another possible way to solve the opt-in/long-hash problem would be to just add another identifier, which is the hash over the result set metadata. 
So, the current ID would stay as it is and we add a _fingerprint_ to the _Prepared_ response and _Execute_ request. For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain: {code} - <id> is [short bytes] representing the prepared query ID. - <fingerprint> is [short bytes] representing the metadata hash. - <metadata> is composed of: {code} And the body for _4.1.6 Execute_ would be {{<id><fingerprint><query_parameters>}}. To handle the situation when that result-set-metadata fingerprint does not match, there are two options IMO. # The coordinator could reply with a new error code (near to 0x2500, Unprepared) telling the client that the result set metadata no longer matches and the statement needs to be prepared again. # We just send out the result set metadata with the _Rows_ response in case it has changed. The second option would also work around a race condition that could arise with a new error code during schema changes. That is: some nodes may already use the new result set metadata while others still use the old one. It would also save one round trip. It makes the code on the client probably a bit more complex, but I think it's worth paying that price in order to prevent this race condition (and a _prepare storm_). > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) >
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291106#comment-15291106 ] Robert Stupp commented on CASSANDRA-10786: -- Oh, right. We invalidate a pstmt when one of its dependencies changes - so, I thought, too complicated. Another possible way to solve the opt-in/long-hash problem would be to just add another identifier, which is the hash over the result set metadata. So, the current ID would stay as it is and we add a _fingerprint_ to the _Prepared_ response and _Execute_ request. For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain: {code} - <id> is [short bytes] representing the prepared query ID. - <fingerprint> is [short bytes] representing the metadata hash. - <metadata> is composed of: {code} And the body for _4.1.6 Execute_ would be {{<id><fingerprint><query_parameters>}}. To handle the situation when that result-set-metadata fingerprint does not match, there are two options IMO. # The coordinator could reply with a new error code (near to 0x2500, Unprepared) telling the client that the result set metadata no longer matches and the statement needs to be prepared again. # We just send out the result set metadata with the _Rows_ response in case it has changed. The second option would also work around a race condition that could arise with a new error code during schema changes. That is: some nodes may already use the new result set metadata while others still use the old one. It would also save one round trip. It makes the code on the client probably a bit more complex, but I think it's worth paying that price in order to prevent this race condition (and a _prepare storm_). 
> Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
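Option 2 in the comment above reduces to a simple server-side check on EXECUTE: compare the fingerprint sent by the client with the fingerprint of the current result set metadata and, on mismatch, include the fresh metadata in the _Rows_ response instead of replying with an error. A minimal sketch, all names hypothetical:

```java
// Hypothetical sketch of option 2: no new error code; the coordinator just
// decides per EXECUTE whether the Rows response must carry the current result
// set metadata so a stale client can resync without a re-prepare round trip.
public class ExecuteMetadataCheck {
    public static boolean includeMetadataInRows(String clientFingerprint, String currentFingerprint) {
        return !clientFingerprint.equals(currentFingerprint); // mismatch == stale client
    }
}
```

Because the decision is made per response, nodes holding different schema versions during a rolling change each answer consistently with their own metadata, which is what avoids the race condition and the prepare storm.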
[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
[ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291098#comment-15291098 ] Alex Petrov commented on CASSANDRA-10786: - [~adutra] has a very good point about SHA changes and driver implementers. I'm not sure if every driver would deadlock the same way; it might depend on the implementation, although the Python driver seems to have the [same behaviour|https://github.com/ifesdjeen/cassandra-dtest/tree/10786-trunk] - I just checked. I like the idea with {{OPTIONS}}/{{SUPPORTED}} over the rest so far. > Include hash of result set metadata in prepared statement id > > > Key: CASSANDRA-10786 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10786 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Olivier Michallat >Assignee: Alex Petrov >Priority: Minor > Labels: client-impacting, protocolv5 > Fix For: 3.x > > > This is a follow-up to CASSANDRA-7910, which was about invalidating a > prepared statement when the table is altered, to force clients to update > their local copy of the metadata. > There's still an issue if multiple clients are connected to the same host. > The first client to execute the query after the cache was invalidated will > receive an UNPREPARED response, re-prepare, and update its local metadata. > But other clients might miss it entirely (the MD5 hasn't changed), and they > will keep using their old metadata. For example: > # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, > clientA and clientB both have a cache of the metadata (columns b and c) > locally > # column a gets added to the table, C* invalidates its cache entry > # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, > re-prepares on the fly and updates its local metadata to (a, b, c) > # prepared statement is now in C*’s cache again, with the same md5 abc123 > # clientB sends an EXECUTE request for id abc123. 
Because the cache has been > populated again, the query succeeds. But clientB still has not updated its > metadata, it’s still (b,c) > One solution that was suggested is to include a hash of the result set > metadata in the md5. This way the md5 would change at step 3, and any client > using the old md5 would get an UNPREPARED, regardless of whether another > client already reprepared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)