[jira] [Created] (CASSANDRA-13754) FastThreadLocal leaks memory
Eric Evans created CASSANDRA-13754: -- Summary: FastThreadLocal leaks memory Key: CASSANDRA-13754 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 Project: Cassandra Issue Type: Bug Components: Core Environment: OpenJDK 8u141-b15 Reporter: Eric Evans Fix For: 3.11.1 After a chronic bout of {{OutOfMemoryError}} in our development environment, a heap analysis is showing that more than 10G of our 12G heaps are consumed by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. Reverting [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13630) support large internode messages with netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121770#comment-16121770 ] Jason Brown commented on CASSANDRA-13630: - The core idea here is that if the outgoing message is large/huge, we don't want to naively allocate a huge buffer just for serialization. For example, if it's a large mutation (say 16MB), we don't want to allocate a 16MB buffer for each of the n replicas on the coordinator. A safer approach is to allocate standard-sized buffers (currently 64k), serialize into them via the {{DataOutputPlus}} interface, write each buffer to the netty channel when it is full, and allocate another buffer for further serialization. The outbound side, which splits serialization across multiple buffers, is implemented in {{MessageOutHandler.ByteBufDataOutputStreamPlus}}. At the same time, I've made it so that all messages are written into a shared buffer (via {{MessageOutHandler.ByteBufDataOutputStreamPlus}}), whether it's a large message being chunked across multiple buffers, or multiple small messages being aggregated into one buffer (think mutation ACKs). The upside here is that we don't need to go to the netty allocator for each individual small message, and can just send the single 'aggregation' buffer downstream in the channel when we need to flush. As I implemented this behavior, I discovered that the 'aggregating buffer' could be a problem wrt {{MessageOutHandler#channelWritabilityChanged}}: when that method gets the signal that the channel is writable, it attempts to drain any backlog from {{OutboundMessagingConnection}} (via the {{MessageOutHandler#backlogSupplier}}). If I had retained the current code, it is quite likely that I would start to serialize a backlogged message while in the middle of a message already being serialized (from {{MessageOutHandler#write}}) that happened to fill the buffer and write it to the channel.
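Below is a minimal JDK-only sketch of the chunked outbound serialization described above. `ChunkingOutputStream` is a hypothetical class and is not Cassandra's `MessageOutHandler.ByteBufDataOutputStreamPlus`. It fills fixed-size `ByteBuffer`s and hands each full one to a `Consumer`, which stands in for a netty channel write. It then allocates a fresh buffer for further serialization. The 64 KiB chunk size mirrors the default quoted in the comment. Everything else is an assumption.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration (not Cassandra code): serialize into fixed-size
// chunks instead of one buffer sized for the whole message.
public class ChunkingOutputStream extends OutputStream {
    private final int chunkSize;
    private final Consumer<ByteBuffer> sink; // stands in for "write to the netty channel"
    private ByteBuffer current;

    public ChunkingOutputStream(int chunkSize, Consumer<ByteBuffer> sink) {
        this.chunkSize = chunkSize;
        this.sink = sink;
        this.current = ByteBuffer.allocate(chunkSize);
    }

    @Override
    public void write(int b) {
        if (!current.hasRemaining())
            emit();
        current.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) {
        while (len > 0) {
            if (!current.hasRemaining())
                emit(); // buffer full: hand it off and start a new one
            int n = Math.min(len, current.remaining());
            current.put(src, off, n);
            off += n;
            len -= n;
        }
    }

    // Sends the partially filled last chunk when the caller flushes.
    @Override
    public void flush() {
        if (current.position() > 0)
            emit();
    }

    private void emit() {
        current.flip();
        sink.accept(current);
        current = ByteBuffer.allocate(chunkSize);
    }

    public static void main(String[] args) throws IOException {
        List<ByteBuffer> sent = new ArrayList<>();
        try (DataOutputStream out = new DataOutputStream(new ChunkingOutputStream(64 * 1024, sent::add))) {
            out.write(new byte[150_000]); // a "large message" payload
            out.flush();
        }
        for (ByteBuffer b : sent)
            System.out.println(b.remaining()); // prints 65536, 65536, 18928 (one per line)
    }
}
```

Serializing a 150,000-byte payload this way never needs a single 150,000-byte buffer. The consumer receives two full 64 KiB chunks and then a final partial chunk on flush. A real channel would release each chunk after writing it. The demo only collects the chunks so it can show their sizes.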
Further, I noticed I needed to forward-port more of CASSANDRA-13265 in order to handle expiring messages from the backlog. (FTR, {{MessageOutHandler#userEventTriggered}} handles closing the channel when we make no progress, but nothing else purges or removes items from the backlog queue. Closing the channel will fail any messages in the channel, but not those in the backlog.) Thus, I added the backlog-expiring behavior to {{OutboundMessagingConnection#expireMessages}}, and now drain messages from the backlog in {{MessageOutHandler#write}}. Trying to send the backlogged messages before the incoming message gives us a better shot at sending messages in the order in which they came into the {{OutboundMessagingConnection}}. I updated jctools to 2.0.2. Instead of using a {{LinkedBlockingQueue}} in {{OutboundMessagingConnection}} for the backlog, I decided to use something without locks from jctools. Even though the queue still needs to be an unbounded multi-producer/multi-consumer queue (at least, to replicate existing behaviors), the jctools queue should be a bit more efficient than an LBQ. Fixing the outbound side is only half of the problem, as we don't want to naively allocate a huge buffer on the receiving node, either. This is a bit trickier due to the blocking IO style of our deserializers. Thus, similar to what I've done in CASSANDRA-12229, I need to add incoming {{ByteBuf}}s to a {{RebufferingByteBufDataInputPlus}} and spin up a background thread for performing the deserialization. Since we only need to spin up the thread when we have large message payloads, this will only happen in a minority of use cases: - we are actually transmitting a message larger than {{OutboundMessagingPool#LARGE_MESSAGE_THRESHOLD}}, which defaults to 64k. 
At that point we're sending all of those messages over the outbound large message queue anyway, so all messages on that channel/socket will be over the threshold and require the background deserialization. So this won't apply to the small messages channel, where we can still handle all those messages inline on the inbound netty event loop. - If you are operating a very large cluster (I'm guessing at least 500 nodes, haven't done the math, tbh), large gossip messages might trigger the receiving gossip channel to switch to the background deserialization mode. This is especially likely for ACK/ACK2 messages after a bounce, as they will contain all the {{ApplicationState}}s for all the peers in the cluster. I do not think this will be a problem in practice. I want to add more comments/documentation before committing, but that should not hold up a review. Also, this code is based on the current CASSANDRA-12229 branch. The currently failing tests for this branch seem to be race conditions only in the streaming code, so I'll fix them on the CASSANDRA-12229 branch.
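The receive-side idea can be sketched in the same way, again using only the JDK and hypothetical names. `QueueBackedInputStream` is loosely inspired by the role `RebufferingByteBufDataInputPlus` plays in the comment, but it is not that class. In this sketch the network thread only enqueues incoming chunks and never blocks. A background thread runs an ordinary blocking `DataInputStream` deserializer that waits for more bytes when it needs them. The length-prefixed message format is an assumption made for the demo.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch (not Cassandra code): the receiving thread enqueues chunks,
// a background thread runs a blocking-style deserializer against them.
public class QueueBackedInputStream extends InputStream {
    private static final byte[] EOF = new byte[0]; // sentinel instance marking end of input
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();
    private byte[] current = new byte[0];
    private int pos;

    // Called from the (non-blocking) receiving thread.
    public void append(byte[] chunk) { chunks.add(chunk); }

    public void markEndOfInput() { chunks.add(EOF); }

    @Override
    public int read() throws IOException {
        while (pos == current.length) {
            if (current == EOF)
                return -1;
            try {
                current = chunks.take(); // blocks the deserializer thread, not the producer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
            pos = 0;
        }
        return current[pos++] & 0xFF;
    }

    public static void main(String[] args) throws Exception {
        QueueBackedInputStream in = new QueueBackedInputStream();
        ExecutorService deserializer = Executors.newSingleThreadExecutor();
        // Background "deserializer": a length-prefixed payload read with blocking calls.
        Future<Integer> result = deserializer.submit(() -> {
            DataInputStream data = new DataInputStream(in);
            int length = data.readInt();
            byte[] payload = new byte[length];
            data.readFully(payload);
            return payload.length;
        });
        // "Event loop": the message arrives in pieces; the prefix encodes 100_000.
        in.append(new byte[] {0, 1, (byte) 0x86, (byte) 0xA0});
        in.append(new byte[60_000]);
        in.append(new byte[40_000]);
        in.markEndOfInput();
        System.out.println(result.get()); // prints 100000
        deserializer.shutdown();
    }
}
```

The byte-at-a-time `read()` keeps the sketch short. A real implementation would also override `read(byte[], int, int)` to copy whole chunks. Per the comment, it would only take this path for payloads above the large-message threshold.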
cassandra-dtest git commit: Update regex for expected digest mismatch log message
Repository: cassandra-dtest Updated Branches: refs/heads/master 959208749 -> 459943a35 Update regex for expected digest mismatch log message patch by Zhao Yang; reviewed by Stefan Podkowinski for CASSANDRA-13723 Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/459943a3 Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/459943a3 Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/459943a3 Branch: refs/heads/master Commit: 459943a35e7ea9ef49791b47bebaacc0b5af6e04 Parents: 9592087 Author: Zhao Yang Authored: Mon Aug 7 15:49:04 2017 +0800 Committer: Stefan Podkowinski Committed: Thu Aug 10 08:30:39 2017 +0200 -- materialized_views_test.py | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/459943a3/materialized_views_test.py -- diff --git a/materialized_views_test.py b/materialized_views_test.py index 77b20e6..79679ca 100644 --- a/materialized_views_test.py +++ b/materialized_views_test.py @@ -228,7 +228,6 @@ class TestMaterializedViews(Tester): debug("wait that all batchlogs are replayed") self._replay_batchlogs() - for i in xrange(5): for j in xrange(1): assert_one(session, "SELECT * FROM t_by_v WHERE id = {} AND v = {}".format(i, j), [j, i]) @@ -1064,8 +1063,8 @@ class TestMaterializedViews(Tester): # execution happening # Look for messages like: -# Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey -regex = r"Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey" +# Digest mismatch: Mismatch for key DecoratedKey +regex = r"Digest mismatch: Mismatch for key DecoratedKey" for event in trace.events: desc = event.description match = re.match(regex, desc)
cassandra-dtest git commit: Restore <4.0 compatibility for digest mismatch log message matching
Repository: cassandra-dtest Updated Branches: refs/heads/master 459943a35 -> 61cbd5cdc Restore <4.0 compatibility for digest mismatch log message matching Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/61cbd5cd Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/61cbd5cd Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/61cbd5cd Branch: refs/heads/master Commit: 61cbd5cdcb435503bcb828249cce60ca779995e0 Parents: 459943a Author: Stefan Podkowinski Authored: Thu Aug 10 09:02:24 2017 +0200 Committer: Stefan Podkowinski Committed: Thu Aug 10 09:02:24 2017 +0200 -- materialized_views_test.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/61cbd5cd/materialized_views_test.py -- diff --git a/materialized_views_test.py b/materialized_views_test.py index 79679ca..574d90f 100644 --- a/materialized_views_test.py +++ b/materialized_views_test.py @@ -1063,8 +1063,9 @@ class TestMaterializedViews(Tester): # execution happening # Look for messages like: -# Digest mismatch: Mismatch for key DecoratedKey -regex = r"Digest mismatch: Mismatch for key DecoratedKey" +# 4.0+Digest mismatch: Mismatch for key DecoratedKey +# <4.0 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey +regex = r"Digest mismatch: ([a-zA-Z.]+:\s)?Mismatch for key DecoratedKey" for event in trace.events: desc = event.description match = re.match(regex, desc)
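The dtest itself is written in Python, but the pattern is easy to sanity-check in Java too. Python's `re.match` anchors only at the start of the string, so the Java equivalent is `Matcher.lookingAt()`, not `matches()`. The two sample strings come from the comments in the diff above. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class DigestMismatchRegex {
    // Same pattern as the updated dtest: the optional "<exception class>: " group
    // accepts the pre-4.0 message format as well as the 4.0+ one.
    static final Pattern DIGEST_MISMATCH =
        Pattern.compile("Digest mismatch: ([a-zA-Z.]+:\\s)?Mismatch for key DecoratedKey");

    // Python's re.match() anchors only at the start of the string, which
    // corresponds to Matcher.lookingAt() rather than Matcher.matches().
    public static boolean isDigestMismatch(String description) {
        return DIGEST_MISMATCH.matcher(description).lookingAt();
    }

    public static void main(String[] args) {
        String v40 = "Digest mismatch: Mismatch for key DecoratedKey";
        String pre40 = "Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey";
        System.out.println(isDigestMismatch(v40));                                // true
        System.out.println(isDigestMismatch(pre40));                              // true
        System.out.println(isDigestMismatch("Digest mismatch: {} on 127.0.0.1")); // false
    }
}
```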
[jira] [Updated] (CASSANDRA-13723) fix exception logging that should be consumed by placeholder to 'getMessage()' for new slf4j version
[ https://issues.apache.org/jira/browse/CASSANDRA-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13723: --- Resolution: Fixed Reviewer: Stefan Podkowinski Status: Resolved (was: Patch Available) Merged as ba87ab4e954ad2 Thanks! > fix exception logging that should be consumed by placeholder to > 'getMessage()' for new slf4j version > > > Key: CASSANDRA-13723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13723 > Project: Cassandra > Issue Type: Bug >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Trivial > Fix For: 4.0 > > Attachments: CASSANDRA-13723.patch > > > The wrong tracing log will fail > {{materialized_views_test.py:TestMaterializedViews.view_tombstone_test}} and > impact clients. > Current log: {{Digest mismatch: {} on 127.0.0.1}} > Expected log: {{Digest mismatch: > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey... on 127.0.0.1}}
[jira] [Commented] (CASSANDRA-13615) Include 'ppc64le' library for sigar-1.6.4.jar
[ https://issues.apache.org/jira/browse/CASSANDRA-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121107#comment-16121107 ] Amitkumar Ghatwal commented on CASSANDRA-13615: --- thanks a the merge ...[~mshuler] , [~jjirsa] > Include 'ppc64le' library for sigar-1.6.4.jar > - > > Key: CASSANDRA-13615 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13615 > Project: Cassandra > Issue Type: Improvement > Components: Libraries > Environment: # arch > ppc64le >Reporter: Amitkumar Ghatwal >Assignee: Michael Shuler > Labels: easyfix > Fix For: 4.0 > > Attachments: libsigar-ppc64le-linux.so > > > Hi All, > sigar-1.6.4.jar does not include a ppc64le library, so we had to install > libsigar-ppc64le-linux.so. As the sigar community has been inactive for a long time > (https://github.com/hyperic/sigar), I am requesting that the ppc64le library be > included directly here. > Attaching the ppc64le library ( *.so) file to be included under > "/lib/sigar-bin". Let me know of any issues/dependencies. > FYI - [~ReiOdaira],[~jjirsa], [~mshuler] > Regards, > Amit
[jira] [Comment Edited] (CASSANDRA-13615) Include 'ppc64le' library for sigar-1.6.4.jar
[ https://issues.apache.org/jira/browse/CASSANDRA-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121107#comment-16121107 ] Amitkumar Ghatwal edited comment on CASSANDRA-13615 at 8/10/17 5:58 AM: thanks for the merge ...[~mshuler] , [~jjirsa] was (Author: amitkumar_ghatwal): thanks a the merge ...[~mshuler] , [~jjirsa]
cassandra git commit: Explicitly use e.getMessage() for log message formatting
Repository: cassandra Updated Branches: refs/heads/trunk bcdbee5cd -> ba87ab4e9 Explicitly use e.getMessage() for log message formatting patch by Zhao Yang; reviewed by Stefan Podkowinski for CASSANDRA-13723 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ba87ab4e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ba87ab4e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ba87ab4e Branch: refs/heads/trunk Commit: ba87ab4e954ad2e537f6690953bd7ebaa069f5cd Parents: bcdbee5 Author: Zhao Yang Authored: Mon Jul 24 18:13:14 2017 +0800 Committer: Stefan Podkowinski Committed: Thu Aug 10 08:13:45 2017 +0200 -- src/java/org/apache/cassandra/auth/CassandraAuthorizer.java| 6 -- src/java/org/apache/cassandra/batchlog/BatchlogManager.java| 2 +- .../concurrent/AbstractLocalAwareExecutorService.java | 2 +- src/java/org/apache/cassandra/concurrent/SEPWorker.java| 4 ++-- src/java/org/apache/cassandra/db/Directories.java | 2 +- src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java | 4 +++- src/java/org/apache/cassandra/io/util/FileUtils.java | 2 +- src/java/org/apache/cassandra/service/StorageProxy.java| 2 +- .../apache/cassandra/streaming/DefaultConnectionFactory.java | 2 +- src/java/org/apache/cassandra/utils/NativeLibrary.java | 2 +- 10 files changed, 16 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ba87ab4e/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java -- diff --git a/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java b/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java index e95a1fd..d760dce 100644 --- a/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java +++ b/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java @@ -129,7 +129,9 @@ public class CassandraAuthorizer implements IAuthorizer } catch (RequestExecutionException | RequestValidationException e) { 
-logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", revokee.getRoleName(), e); +logger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", +revokee.getRoleName(), +e.getMessage()); } } @@ -166,7 +168,7 @@ public class CassandraAuthorizer implements IAuthorizer } catch (RequestExecutionException | RequestValidationException e) { -logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", droppedResource, e); +logger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", droppedResource, e.getMessage()); return; } } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ba87ab4e/src/java/org/apache/cassandra/batchlog/BatchlogManager.java -- diff --git a/src/java/org/apache/cassandra/batchlog/BatchlogManager.java b/src/java/org/apache/cassandra/batchlog/BatchlogManager.java index 9ca7acf..9d2867f 100644 --- a/src/java/org/apache/cassandra/batchlog/BatchlogManager.java +++ b/src/java/org/apache/cassandra/batchlog/BatchlogManager.java @@ -272,7 +272,7 @@ public class BatchlogManager implements BatchlogManagerMBean } catch (IOException e) { -logger.warn("Skipped batch replay of {} due to {}", id, e); +logger.warn("Skipped batch replay of {} due to {}", id, e.getMessage()); remove(id); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/ba87ab4e/src/java/org/apache/cassandra/concurrent/AbstractLocalAwareExecutorService.java -- diff --git a/src/java/org/apache/cassandra/concurrent/AbstractLocalAwareExecutorService.java b/src/java/org/apache/cassandra/concurrent/AbstractLocalAwareExecutorService.java index 530f46e..97dbe86 100644 --- a/src/java/org/apache/cassandra/concurrent/AbstractLocalAwareExecutorService.java +++ b/src/java/org/apache/cassandra/concurrent/AbstractLocalAwareExecutorService.java @@ -164,7 +164,7 @@ public abstract class AbstractLocalAwareExecutorService implements LocalAwareExe catch (Throwable t) { JVMStabilityInspector.inspectThrowable(t); -logger.warn("Uncaught 
exception on thread {}: {}", Thread.currentThread(), t); +logger.warn("Uncaught exception on thread {}: {}", Thread.currentThread(), t.getMessage());
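The reason this commit switches to `e.getMessage()` can be illustrated with a deliberately simplified model of the behaviour the ticket title describes. With the newer SLF4J, a trailing exception argument is treated as the logged exception and is not consumed by a `{}` placeholder. `PlaceholderModel.format` is hypothetical. It is not SLF4J's `MessageFormatter` and models only that one rule.

```java
import java.util.Arrays;

public class PlaceholderModel {
    // Simplified model (an assumption, not SLF4J's code): a trailing Throwable is
    // set aside as the event's exception and is NOT substituted into a {}.
    public static String format(String pattern, Object... args) {
        Object[] params = args;
        if (args.length > 0 && args[args.length - 1] instanceof Throwable)
            params = Arrays.copyOf(args, args.length - 1);

        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object param : params) {
            int at = pattern.indexOf("{}", from);
            if (at < 0)
                break; // more parameters than placeholders
            out.append(pattern, from, at).append(param);
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        Exception e = new java.io.IOException("boom");
        // Before the fix: the exception was meant to fill the second {}.
        System.out.println(format("Skipped batch replay of {} due to {}", "id-1", e));
        // prints: Skipped batch replay of id-1 due to {}
        // After the fix: pass e.getMessage() explicitly.
        System.out.println(format("Skipped batch replay of {} due to {}", "id-1", e.getMessage()));
        // prints: Skipped batch replay of id-1 due to boom
    }
}
```

In real SLF4J the extracted exception is still logged alongside the message, usually with its stack trace. The model drops it so the example can focus on the unfilled placeholder.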
[jira] [Commented] (CASSANDRA-11483) Enhance sstablemetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121238#comment-16121238 ] Marcus Eriksson commented on CASSANDRA-11483: - I ran the dtests, but it seems the results have been rotated out now; I must have missed this failure. > Enhance sstablemetadata > --- > > Key: CASSANDRA-11483 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11483 > Project: Cassandra > Issue Type: Improvement > Components: Observability >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Fix For: 4.0 > > Attachments: CASSANDRA-11483.txt, CASSANDRA-11483v2.txt, > CASSANDRA-11483v3.txt, CASSANDRA-11483v4.txt, CASSANDRA-11483v5.txt, Screen > Shot 2016-04-03 at 11.40.32 PM.png > > > sstablemetadata provides quite a bit of useful information, but there are a few > hiccups I would like to see addressed: > * Does not use client mode > * Units are not provided (or anything for that matter). There is data in > micros, millis, seconds as durations and timestamps from epoch. But there is > no way to tell which is which without a non-trivial code dive > * in general pretty frustrating to parse
[jira] [Updated] (CASSANDRA-13630) support large internode messages with netty
[ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13630: Status: Patch Available (was: Open) > support large internode messages with netty > --- > > Key: CASSANDRA-13630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13630 > Project: Cassandra > Issue Type: Task > Components: Streaming and Messaging >Reporter: Jason Brown >Assignee: Jason Brown > Fix For: 4.0 > > > As part of CASSANDRA-8457, we decided to punt on large messages to reduce the > scope of that ticket. However, we still need that functionality to ship a > correctly operating internode messaging subsystem.
[jira] [Commented] (CASSANDRA-12884) Batch logic can lead to unbalanced use of system.batches
[ https://issues.apache.org/jira/browse/CASSANDRA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1614#comment-1614 ] Jeff Jirsa commented on CASSANDRA-12884: [~iamaleksey] will have a more comprehensive review, I'm sure, but a few notes from a very cursory glance: 1) I don't see the purpose of stubbing out {{BatchlogManager::shuffle}} as a helper function here. 2) In the case where {{validated.keySet().size() == 1}} , shuffling all of the IPs in a given rack may not be all that efficient - may be quicker to just pick 2 random ints, and grab the IPs at those offsets (like we do for the case where we have more than 2 racks, {{result.add(rackMembers.get(getRandomInt(rackMembers.size()))));}} ) > Batch logic can lead to unbalanced use of system.batches > > > Key: CASSANDRA-12884 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12884 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Adam Hattrell >Assignee: Daniel Cranford > Fix For: 3.0.x, 3.11.x > > Attachments: 0001-CASSANDRA-12884.patch > > > It looks as though there are some odd edge cases in how we distribute the > copies in system.batches. > The main issue is in the filter method for > org.apache.cassandra.batchlog.BatchlogManager > {code:java} > if (validated.size() - validated.get(localRack).size() >= 2) > { > // we have enough endpoints in other racks > validated.removeAll(localRack); > } > if (validated.keySet().size() == 1) > { >// we have only 1 `other` rack >Collection otherRack = > Iterables.getOnlyElement(validated.asMap().values()); > > return Lists.newArrayList(Iterables.limit(otherRack, 2)); > } > {code} > So with one or two racks we just return the first 2 entries in the list. > There's no shuffle or randomisation here.
[jira] [Comment Edited] (CASSANDRA-12884) Batch logic can lead to unbalanced use of system.batches
[ https://issues.apache.org/jira/browse/CASSANDRA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1614#comment-1614 ] Jeff Jirsa edited comment on CASSANDRA-12884 at 8/10/17 8:34 PM: - [~iamaleksey] will have a more comprehensive review, I'm sure, but a few notes from a very cursory glance: -1) I don't see the purpose of stubbing out {{BatchlogManager::shuffle}} as a helper function here.- (You're overriding it for deterministic testing) 2) In the case where {{validated.keySet().size() == 1}} , shuffling all of the IPs in a given rack may not be all that efficient - may be quicker to just pick 2 random ints, and grab the IPs at those offsets (like we do for the case where we have more than 2 racks, {{result.add(rackMembers.get(getRandomInt(rackMembers.size()))));}} ) was (Author: jjirsa): [~iamaleksey] will have a more comprehensive review, I'm sure, but a few notes from a very cursory glance: 1) I don't see the purpose of stubbing out {{BatchlogManager::shuffle}} as a helper function here. 2) In the case where {{validated.keySet().size() == 1}} , shuffling all of the IPs in a given rack may not be all that efficient - may be quicker to just pick 2 random ints, and grab the IPs at those offsets (like we do for the case where we have more than 2 racks, {{result.add(rackMembers.get(getRandomInt(rackMembers.size()))));}} )
[jira] [Commented] (CASSANDRA-12884) Batch logic can lead to unbalanced use of system.batches
[ https://issues.apache.org/jira/browse/CASSANDRA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122288#comment-16122288 ] Daniel Cranford commented on CASSANDRA-12884: - 1) BatchlogManager::shuffle is stubbed out so the unit test can provide a deterministic override. The unit test has been expanded to provide a test which catches this regression. (The existing code uses the same pattern for getRandomInt, which is overridden to be non-random in the unit test.) 2) getRandomInt could return the same value twice (sampling with replacement), resulting in the same replica being chosen twice. The existing code uses the shuffle+take head pattern to perform sampling without replacement, e.g. in BatchlogManager.java line 545 {{shuffle((List) racks);}} and line 550 {{for (String rack : Iterables.limit(racks, 2))}}.
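The difference between the two sampling strategies discussed above can be shown with the JDK alone. `PickTwo` and its methods are hypothetical. `Random.nextInt` stands in for `getRandomInt`. `Collections.shuffle` plus `subList` stands in for Cassandra's `shuffle` + `Iterables.limit(racks, 2)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PickTwo {
    // Sampling WITH replacement: independent draws may hit the same index,
    // i.e. the same replica can be chosen twice.
    public static <T> List<T> pickWithReplacement(List<T> members, int k, Random rnd) {
        List<T> out = new ArrayList<>(k);
        for (int i = 0; i < k; i++)
            out.add(members.get(rnd.nextInt(members.size())));
        return out;
    }

    // Shuffle + take head: sampling WITHOUT replacement, so the picks are
    // distinct as long as the input elements are distinct.
    public static <T> List<T> pickWithoutReplacement(List<T> members, int k, Random rnd) {
        List<T> copy = new ArrayList<>(members);
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    public static void main(String[] args) {
        List<String> rack = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Random rnd = new Random();
        System.out.println(pickWithReplacement(rack, 2, rnd));
        System.out.println(pickWithoutReplacement(rack, 2, rnd));
    }
}
```

With three replicas in a rack, two independent draws pick the same replica with probability 1/3. The shuffle-based version rules out exactly that duplicate.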
[jira] [Commented] (CASSANDRA-12884) Batch logic can lead to unbalanced use of system.batches
[ https://issues.apache.org/jira/browse/CASSANDRA-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122335#comment-16122335 ] Daniel Cranford commented on CASSANDRA-12884: - Technically, if efficiency is key, we could implement something like a Durstenfeld/Knuth shuffle, eg https://stackoverflow.com/a/35278327
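A partial Durstenfeld (Fisher-Yates) shuffle only randomizes the first k positions, and that is enough for a uniform sample of k distinct elements. The sketch below is a generic JDK version with a hypothetical `PartialShuffle.sample` helper. It is not code from the patch or from the linked Stack Overflow answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PartialShuffle {
    // Partial Durstenfeld shuffle: after i iterations, positions [0, i) hold a
    // uniformly random sample of i distinct elements; stop after k.
    public static <T> List<T> sample(List<T> members, int k, Random rnd) {
        List<T> copy = new ArrayList<>(members); // O(n) copy; leaves the input untouched
        int n = copy.size();
        int limit = Math.min(k, n);
        for (int i = 0; i < limit; i++) {
            int j = i + rnd.nextInt(n - i);      // choose from the not-yet-picked suffix [i, n)
            Collections.swap(copy, i, j);
        }
        return new ArrayList<>(copy.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> rack = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        System.out.println(sample(rack, 2, new Random()));
    }
}
```

For k = 2 this costs two random draws and two swaps instead of a full shuffle. Copying the input is still O(n), though. Avoiding that cost would require shuffling, in place, a list the caller already owns.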
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122390#comment-16122390 ] Eric Evans commented on CASSANDRA-13754: Cassandra 3.11.0, Netty 4.0.44.Final
[jira] [Updated] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Evans updated CASSANDRA-13754: --- Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 (was: OpenJDK 8u141-b15)
[jira] [Commented] (CASSANDRA-13752) Corrupted SSTables created in 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122413#comment-16122413 ] Hannu Kröger commented on CASSANDRA-13752: -- Background information: - Incremental repairs are being run regularly - The same cluster also suffers from this: https://issues.apache.org/jira/browse/CASSANDRA-13718 - To mitigate the previous bug, we have run full repairs on the whole cluster for the problematic tables - Lucene index plugin is installed but not in use in the keyspace in question - Cassandra version was 2.2.8 but was upgraded to 3.11.0 - 4 nodes in DC1 (DC2 not connected atm.), RF=3 - Upgrade to 3.11 was done maybe 1.5 weeks ago - Cluster has been running since May '17 > Corrupted SSTables created in 3.11 > -- > > Key: CASSANDRA-13752 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13752 > Project: Cassandra > Issue Type: Bug >Reporter: Hannu Kröger >Priority: Blocker > > We have discovered issues with corrupted SSTables. > {code} > ERROR [SSTableBatchOpen:22] 2017-08-03 20:19:53,195 SSTableReader.java:577 - > Cannot read sstable > /cassandra/data/mykeyspace/mytable-7a4992800d5611e7b782cb90016f2d17/mc-35556-big=[Data.db, > Statistics.db, Summary.db, Digest.crc32, CompressionInfo.db, TOC.txt, > Index.db, Filter.db]; other IO error, skipping table > java.io.EOFException: EOF after 1898 bytes out of 21093 > at > org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > 
org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:325) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:231) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:122) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:93) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:488) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:396) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > org.apache.cassandra.io.sstable.format.SSTableReader$5.run(SSTableReader.java:561) > ~[apache-cassandra-3.11.0.jar:3.11.0] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_111] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [na:1.8.0_111] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_111] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_111] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > [apache-cassandra-3.11.0.jar:3.11.0] > {code} > Files look like this: > {code} > -rw-r--r--. 1 cassandra cassandra 3899251 Aug 7 08:37 > mc-6166-big-CompressionInfo.db > -rw-r--r--. 1 cassandra cassandra 16874421686 Aug 7 08:37 mc-6166-big-Data.db > -rw-r--r--. 1 cassandra cassandra 10 Aug 7 08:37 > mc-6166-big-Digest.crc32 > -rw-r--r--. 1 cassandra cassandra 2930904 Aug 7 08:37 > mc-6166-big-Filter.db > -rw-r--r--. 1 cassandra cassandra 75880 Aug 7 08:37 > mc-6166-big-Index.db > -rw-r--r--. 
1 cassandra cassandra 13762 Aug 7 08:37 > mc-6166-big-Statistics.db > -rw-r--r--. 1 cassandra cassandra 882008 Aug 7 08:37 > mc-6166-big-Summary.db > -rw-r--r--. 1 cassandra cassandra 92 Aug 7 08:37 mc-6166-big-TOC.txt > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
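The trace shows the failure inside `ByteBufferUtil.readWithShortLength` while `StatsMetadataSerializer` deserializes the SSTable's statistics metadata: a length prefix announced 21093 bytes, but only 1898 could be read. The following is a minimal illustration, assuming a simplified layout of an unsigned 16-bit length followed by that many bytes. `ShortLengthReader` is a hypothetical class built on `java.io.DataInputStream`, not Cassandra's reader; it only shows how a truncated or corrupt length-prefixed field surfaces as an `EOFException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical reader for a "short length"-prefixed field. It mirrors the read
// pattern seen in the stack trace but is not Cassandra's ByteBufferUtil.
final class ShortLengthReader {

    // Reads an unsigned 16-bit length, then exactly that many bytes.
    static byte[] readWithShortLength(DataInputStream in) throws IOException {
        int length = in.readUnsignedShort();
        byte[] value = new byte[length];
        in.readFully(value); // throws EOFException if fewer than `length` bytes remain
        return value;
    }

    // Writes the same layout: a 2-byte length prefix followed by the payload.
    static byte[] writeWithShortLength(byte[] payload) throws IOException {
        if (payload.length > 0xFFFF)
            throw new IllegalArgumentException("payload too large for a 16-bit length");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(payload.length);
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = writeWithShortLength(new byte[21093]);
        // Simulate a truncated file: the 2-byte prefix plus only 1898 payload bytes.
        byte[] truncated = Arrays.copyOf(encoded, 2 + 1898);
        try {
            readWithShortLength(new DataInputStream(new ByteArrayInputStream(truncated)));
        } catch (EOFException e) {
            System.out.println("EOFException: prefix promised 21093 bytes, only 1898 were available");
        }
    }
}
```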
[jira] [Created] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
Blake Eggleston created CASSANDRA-13755: --- Summary: dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test Key: CASSANDRA-13755 URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Blake Eggleston
[jira] [Updated] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-13755: Status: Patch Available (was: Open) A change in the sstablemetadata output format broke the test. This branch fixes it: https://github.com/bdeggleston/cassandra-dtest/tree/13755 > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston
[jira] [Issue Comment Deleted] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-13755: Comment: was deleted (was: A change in the sstablemetadata output format broke the test. This branch fixes it: https://github.com/bdeggleston/cassandra-dtest/tree/13755) > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston
[jira] [Commented] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122445#comment-16122445 ] Blake Eggleston commented on CASSANDRA-13755: - patch by [~jkni] here: https://github.com/jkni/cassandra-dtest/commit/f55f78b093fc668dc5cc9d1fc72f66dc5a9bf3a6 > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston
[jira] [Assigned] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston reassigned CASSANDRA-13755: --- Assignee: Joel Knighton (was: Blake Eggleston) Reviewer: Blake Eggleston > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Joel Knighton
cassandra-dtest git commit: Handle difference in sstablemetadata output for pending repairs following CASSANDRA-11483
Repository: cassandra-dtest
Updated Branches:
  refs/heads/master 61cbd5cdc -> 013efa11f

Handle difference in sstablemetadata output for pending repairs following CASSANDRA-11483

Patch by Joel Knighton; reviewed by Blake Eggleston for CASSANDRA-13755

Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/013efa11
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/013efa11
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/013efa11

Branch: refs/heads/master
Commit: 013efa11f3d7bd2e3f64a4a5a865ff5dad565552
Parents: 61cbd5c
Author: Joel Knighton
Authored: Wed Aug 9 13:03:21 2017 -0500
Committer: Blake Eggleston
Committed: Thu Aug 10 15:34:00 2017 -0700

--
 repair_tests/incremental_repair_test.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/013efa11/repair_tests/incremental_repair_test.py
--
diff --git a/repair_tests/incremental_repair_test.py b/repair_tests/incremental_repair_test.py
index a447d56..b081d44 100644
--- a/repair_tests/incremental_repair_test.py
+++ b/repair_tests/incremental_repair_test.py
@@ -34,7 +34,7 @@ class TestIncRepair(Tester):
     def _get_repaired_data(cls, node, keyspace):
         _sstable_name = compile('SSTable: (.+)')
         _repaired_at = compile('Repaired at: (\d+)')
-        _pending_repair = compile('Pending repair: (null|[a-f0-9\-]+)')
+        _pending_repair = compile('Pending repair: (\-\-|null|[a-f0-9\-]+)')
         _sstable_data = namedtuple('_sstabledata', ('name', 'repaired', 'pending_id'))
         out = node.run_sstablemetadata(keyspace=keyspace).stdout
@@ -45,7 +45,7 @@ class TestIncRepair(Tester):
         repaired_times = [int(m.group(1)) for m in matches(_repaired_at)]
         def uuid_or_none(s):
-            return None if s == 'null' else UUID(s)
+            return None if s == 'null' or s == '--' else UUID(s)
         pending_repairs = [uuid_or_none(m.group(1)) for m in matches(_pending_repair)]
         assert names
         assert repaired_times
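The patched regex accepts `--` as an alternative to `null`, and `uuid_or_none` maps both to "no pending repair". The dtest itself is Python; to keep the examples here in one language, this is a hypothetical Java port of the same parsing logic. `PendingRepairParser` and its methods are illustrative names only; the pattern and the `null`/`--` handling are taken from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the dtest's pending-repair parsing; not part of Cassandra.
final class PendingRepairParser {
    // Same alternation as the patched Python regex: "--", "null", or a UUID-like token.
    private static final Pattern PENDING_REPAIR =
            Pattern.compile("Pending repair: (--|null|[a-f0-9\\-]+)");

    // Mirrors uuid_or_none(): both "null" and "--" mean "no pending repair".
    static UUID uuidOrNull(String s) {
        return ("null".equals(s) || "--".equals(s)) ? null : UUID.fromString(s);
    }

    // Returns one entry per "Pending repair:" line, in output order (null = none).
    static List<UUID> pendingRepairs(String sstableMetadataOutput) {
        List<UUID> result = new ArrayList<>();
        Matcher m = PENDING_REPAIR.matcher(sstableMetadataOutput);
        while (m.find()) {
            result.add(uuidOrNull(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String output = "SSTable: a\nPending repair: --\n"
                + "SSTable: b\nPending repair: 3f2504e0-4f89-11d3-9a0c-0305e82c3301\n";
        System.out.println(pendingRepairs(output));
    }
}
```

Putting `--` first in the alternation keeps the intent explicit, although the UUID character class would also accept it.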
[jira] [Updated] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-13755: Resolution: Fixed Status: Resolved (was: Patch Available) committed as {{013efa11f3d7bd2e3f64a4a5a865ff5dad565552}} thanks! > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Joel Knighton
cassandra git commit: Fix digest calculation for counter cells
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 1a70dede3 -> eb6f03c89

Fix digest calculation for counter cells

Patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-13750

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/eb6f03c8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/eb6f03c8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/eb6f03c8

Branch: refs/heads/cassandra-3.0
Commit: eb6f03c8928e913cb6f9eaa7c9ea9f4501039112
Parents: 1a70ded
Author: Blake Eggleston
Authored: Tue Aug 8 13:45:41 2017 -0700
Committer: Blake Eggleston
Committed: Thu Aug 10 15:42:31 2017 -0700

--
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +-
 test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++--
 3 files changed, 12 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 1f42c70..0b92a7e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.15
+ * Fix digest calculation for counter cells (CASSANDRA-13750)
  * Fix ColumnDefinition.cellValueType() for non-frozen collection and change SSTabledump to use type.toJSONString() (CASSANDRA-13573)
  * Skip materialized view addition if the base table doesn't exist (CASSANDRA-13737)
  * Drop table should remove corresponding entries in dropped_columns table (CASSANDRA-13730)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/src/java/org/apache/cassandra/db/rows/AbstractCell.java
--
diff --git a/src/java/org/apache/cassandra/db/rows/AbstractCell.java b/src/java/org/apache/cassandra/db/rows/AbstractCell.java
index 7e93c2e..576351e 100644
--- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java
+++ b/src/java/org/apache/cassandra/db/rows/AbstractCell.java
@@ -42,7 +42,15 @@ public abstract class AbstractCell extends Cell
     public void digest(MessageDigest digest)
     {
-        digest.update(value().duplicate());
+        if (isCounterCell())
+        {
+            CounterContext.instance().updateDigest(digest, value());
+        }
+        else
+        {
+            digest.update(value().duplicate());
+        }
+
         FBUtilities.updateWithLong(digest, timestamp());
         FBUtilities.updateWithInt(digest, ttl());
         FBUtilities.updateWithBoolean(digest, isCounterCell());

http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/test/unit/org/apache/cassandra/db/CounterCellTest.java
--
diff --git a/test/unit/org/apache/cassandra/db/CounterCellTest.java b/test/unit/org/apache/cassandra/db/CounterCellTest.java
index 08e0b25..a8ddfcc 100644
--- a/test/unit/org/apache/cassandra/db/CounterCellTest.java
+++ b/test/unit/org/apache/cassandra/db/CounterCellTest.java
@@ -276,8 +276,8 @@ public class CounterCellTest
         ColumnDefinition cDef = cfs.metadata.getColumnDefinition(col);
         Cell cleared = BufferCell.live(cfs.metadata, cDef, 5, CounterContext.instance().clearAllLocal(state.context));
-        CounterContext.instance().updateDigest(digest1, original.value());
-        CounterContext.instance().updateDigest(digest2, cleared.value());
+        original.digest(digest1);
+        cleared.digest(digest2);
         assert Arrays.equals(digest1.digest(), digest2.digest());
     }
[2/2] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e018bec8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e018bec8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e018bec8 Branch: refs/heads/cassandra-3.11 Commit: e018bec8ad482a1892b97b5f829ff5fa5801190a Parents: 303dba6 eb6f03c Author: Blake Eggleston Authored: Thu Aug 10 15:43:45 2017 -0700 Committer: Blake Eggleston Committed: Thu Aug 10 15:47:22 2017 -0700 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +- test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++-- 3 files changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e018bec8/CHANGES.txt -- diff --cc CHANGES.txt index 145a746,0b92a7e..3308287 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,5 +1,10 @@@ -3.0.15 +3.11.1 + * "ignore" option is ignored in sstableloader (CASSANDRA-13721) + * Deadlock in AbstractCommitLogSegmentManager (CASSANDRA-13652) + * Duplicate the buffer before passing it to analyser in SASI operation (CASSANDRA-13512) + * Properly evict pstmts from prepared statements cache (CASSANDRA-13641) +Merged from 3.0: + * Fix digest calculation for counter cells (CASSANDRA-13750) * Fix ColumnDefinition.cellValueType() for non-frozen collection and change SSTabledump to use type.toJSONString() (CASSANDRA-13573) * Skip materialized view addition if the base table doesn't exist (CASSANDRA-13737) * Drop table should remove corresponding entries in dropped_columns table (CASSANDRA-13730) http://git-wip-us.apache.org/repos/asf/cassandra/blob/e018bec8/src/java/org/apache/cassandra/db/rows/AbstractCell.java -- diff --cc src/java/org/apache/cassandra/db/rows/AbstractCell.java index 54c8f24,576351e..744d113 --- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java +++
b/src/java/org/apache/cassandra/db/rows/AbstractCell.java @@@ -44,84 -40,17 +44,92 @@@ public abstract class AbstractCell exte super(column); } +public boolean isCounterCell() +{ +return !isTombstone() && column.isCounterColumn(); +} + +public boolean isLive(int nowInSec) +{ +return localDeletionTime() == NO_DELETION_TIME || (ttl() != NO_TTL && nowInSec < localDeletionTime()); +} + +public boolean isTombstone() +{ +return localDeletionTime() != NO_DELETION_TIME && ttl() == NO_TTL; +} + +public boolean isExpiring() +{ +return ttl() != NO_TTL; +} + +public Cell markCounterLocalToBeCleared() +{ +if (!isCounterCell()) +return this; + +ByteBuffer value = value(); +ByteBuffer marked = CounterContext.instance().markLocalToBeCleared(value); +return marked == value ? this : new BufferCell(column, timestamp(), ttl(), localDeletionTime(), marked, path()); +} + +public Cell purge(DeletionPurger purger, int nowInSec) +{ +if (!isLive(nowInSec)) +{ +if (purger.shouldPurge(timestamp(), localDeletionTime())) +return null; + +// We slightly hijack purging to convert expired but not purgeable columns to tombstones. The reason we do that is +// that once a column has expired it is equivalent to a tombstone but actually using a tombstone is more compact since +// we don't keep the column value. The reason we do it here is that 1) it's somewhat related to dealing with tombstones +// so hopefully not too surprising and 2) we want to this and purging at the same places, so it's simpler/more efficient +// to do both here. +if (isExpiring()) +{ +// Note that as long as the expiring column and the tombstone put together live longer than GC grace seconds, +// we'll fulfil our responsibility to repair. 
See discussion at +// http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html +return BufferCell.tombstone(column, timestamp(), localDeletionTime() - ttl(), path()).purge(purger, nowInSec); +} +} +return this; +} + +public Cell copy(AbstractAllocator allocator) +{ +CellPath path = path(); +return new BufferCell(column, timestamp(), ttl(), localDeletionTime(), allocator.clone(value()), path == null
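The merged `AbstractCell` methods derive a cell's state from `ttl()` and `localDeletionTime()`, and `purge` converts an expired but not yet purgeable cell into a tombstone whose deletion time is `localDeletionTime() - ttl()`. Below is a simplified, hypothetical model of that logic. It assumes the sentinels `NO_TTL = 0` and `NO_DELETION_TIME = Integer.MAX_VALUE`, and it replaces `DeletionPurger.shouldPurge` with a plain `localDeletionTime < gcBefore` check; both are assumptions for illustration, not Cassandra's API.

```java
// Simplified, hypothetical model of the cell-state predicates in the merged AbstractCell.
// Sentinel values and the gcBefore-based purge rule are assumptions for illustration.
final class SketchCell {
    static final int NO_TTL = 0;
    static final int NO_DELETION_TIME = Integer.MAX_VALUE;

    final long timestamp;
    final int ttl;
    final int localDeletionTime; // for expiring cells: the expiration time, in seconds

    SketchCell(long timestamp, int ttl, int localDeletionTime) {
        this.timestamp = timestamp;
        this.ttl = ttl;
        this.localDeletionTime = localDeletionTime;
    }

    boolean isTombstone() { return localDeletionTime != NO_DELETION_TIME && ttl == NO_TTL; }

    boolean isExpiring() { return ttl != NO_TTL; }

    // Live if never deleted, or expiring but not yet expired.
    boolean isLive(int nowInSec) {
        return localDeletionTime == NO_DELETION_TIME || (ttl != NO_TTL && nowInSec < localDeletionTime);
    }

    // Returns null when the cell can be dropped, a tombstone when an expired cell is not yet
    // purgeable, or the cell itself otherwise. gcBefore stands in for DeletionPurger.
    SketchCell purge(int gcBefore, int nowInSec) {
        if (!isLive(nowInSec)) {
            if (localDeletionTime < gcBefore)
                return null;
            if (isExpiring()) {
                // Expired cell becomes a tombstone dated to when the TTL was applied,
                // then gets the same purge check as any other tombstone.
                SketchCell tombstone = new SketchCell(timestamp, NO_TTL, localDeletionTime - ttl);
                return tombstone.purge(gcBefore, nowInSec);
            }
        }
        return this;
    }

    public static void main(String[] args) {
        // Written at t=80 with a 10s TTL, so it expires at t=90.
        SketchCell expired = new SketchCell(1L, 10, 90);
        SketchCell result = expired.purge(0, 100);
        System.out.println("tombstone=" + result.isTombstone() + ", deletionTime=" + result.localDeletionTime);
    }
}
```

Dating the tombstone to `localDeletionTime - ttl` is what the in-code comment relies on: the expiring cell plus the tombstone together cover at least the GC grace period.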
[1/2] cassandra git commit: Fix digest calculation for counter cells
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 303dba650 -> e018bec8a Fix digest calculation for counter cells Patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-13750 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/eb6f03c8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/eb6f03c8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/eb6f03c8 Branch: refs/heads/cassandra-3.11 Commit: eb6f03c8928e913cb6f9eaa7c9ea9f4501039112 Parents: 1a70ded Author: Blake Eggleston Authored: Tue Aug 8 13:45:41 2017 -0700 Committer: Blake Eggleston Committed: Thu Aug 10 15:42:31 2017 -0700 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +- test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++-- 3 files changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 1f42c70..0b92a7e 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.15 + * Fix digest calculation for counter cells (CASSANDRA-13750) * Fix ColumnDefinition.cellValueType() for non-frozen collection and change SSTabledump to use type.toJSONString() (CASSANDRA-13573) * Skip materialized view addition if the base table doesn't exist (CASSANDRA-13737) * Drop table should remove corresponding entries in dropped_columns table (CASSANDRA-13730) http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/src/java/org/apache/cassandra/db/rows/AbstractCell.java -- diff --git a/src/java/org/apache/cassandra/db/rows/AbstractCell.java b/src/java/org/apache/cassandra/db/rows/AbstractCell.java index 7e93c2e..576351e 100644 --- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java +++ b/src/java/org/apache/cassandra/db/rows/AbstractCell.java @@ -42,7 +42,15 @@ public abstract class AbstractCell extends Cell public void digest(MessageDigest
digest) { -digest.update(value().duplicate()); +if (isCounterCell()) +{ +CounterContext.instance().updateDigest(digest, value()); +} +else +{ +digest.update(value().duplicate()); +} + FBUtilities.updateWithLong(digest, timestamp()); FBUtilities.updateWithInt(digest, ttl()); FBUtilities.updateWithBoolean(digest, isCounterCell()); http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/test/unit/org/apache/cassandra/db/CounterCellTest.java -- diff --git a/test/unit/org/apache/cassandra/db/CounterCellTest.java b/test/unit/org/apache/cassandra/db/CounterCellTest.java index 08e0b25..a8ddfcc 100644 --- a/test/unit/org/apache/cassandra/db/CounterCellTest.java +++ b/test/unit/org/apache/cassandra/db/CounterCellTest.java @@ -276,8 +276,8 @@ public class CounterCellTest ColumnDefinition cDef = cfs.metadata.getColumnDefinition(col); Cell cleared = BufferCell.live(cfs.metadata, cDef, 5, CounterContext.instance().clearAllLocal(state.context)); -CounterContext.instance().updateDigest(digest1, original.value()); -CounterContext.instance().updateDigest(digest2, cleared.value()); +original.digest(digest1); +cleared.digest(digest2); assert Arrays.equals(digest1.digest(), digest2.digest()); }
[2/3] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e018bec8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e018bec8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e018bec8 Branch: refs/heads/trunk Commit: e018bec8ad482a1892b97b5f829ff5fa5801190a Parents: 303dba6 eb6f03c Author: Blake Eggleston Authored: Thu Aug 10 15:43:45 2017 -0700 Committer: Blake Eggleston Committed: Thu Aug 10 15:47:22 2017 -0700 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +- test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++-- 3 files changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e018bec8/CHANGES.txt -- diff --cc CHANGES.txt index 145a746,0b92a7e..3308287 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,5 +1,10 @@@ -3.0.15 +3.11.1 + * "ignore" option is ignored in sstableloader (CASSANDRA-13721) + * Deadlock in AbstractCommitLogSegmentManager (CASSANDRA-13652) + * Duplicate the buffer before passing it to analyser in SASI operation (CASSANDRA-13512) + * Properly evict pstmts from prepared statements cache (CASSANDRA-13641) +Merged from 3.0: + * Fix digest calculation for counter cells (CASSANDRA-13750) * Fix ColumnDefinition.cellValueType() for non-frozen collection and change SSTabledump to use type.toJSONString() (CASSANDRA-13573) * Skip materialized view addition if the base table doesn't exist (CASSANDRA-13737) * Drop table should remove corresponding entries in dropped_columns table (CASSANDRA-13730) http://git-wip-us.apache.org/repos/asf/cassandra/blob/e018bec8/src/java/org/apache/cassandra/db/rows/AbstractCell.java -- diff --cc src/java/org/apache/cassandra/db/rows/AbstractCell.java index 54c8f24,576351e..744d113 --- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java +++ b/src/java/org/apache/cassandra/db/rows/AbstractCell.java @@@
-44,84 -40,17 +44,92 @@@ public abstract class AbstractCell exte super(column); } +public boolean isCounterCell() +{ +return !isTombstone() && column.isCounterColumn(); +} + +public boolean isLive(int nowInSec) +{ +return localDeletionTime() == NO_DELETION_TIME || (ttl() != NO_TTL && nowInSec < localDeletionTime()); +} + +public boolean isTombstone() +{ +return localDeletionTime() != NO_DELETION_TIME && ttl() == NO_TTL; +} + +public boolean isExpiring() +{ +return ttl() != NO_TTL; +} + +public Cell markCounterLocalToBeCleared() +{ +if (!isCounterCell()) +return this; + +ByteBuffer value = value(); +ByteBuffer marked = CounterContext.instance().markLocalToBeCleared(value); +return marked == value ? this : new BufferCell(column, timestamp(), ttl(), localDeletionTime(), marked, path()); +} + +public Cell purge(DeletionPurger purger, int nowInSec) +{ +if (!isLive(nowInSec)) +{ +if (purger.shouldPurge(timestamp(), localDeletionTime())) +return null; + +// We slightly hijack purging to convert expired but not purgeable columns to tombstones. The reason we do that is +// that once a column has expired it is equivalent to a tombstone but actually using a tombstone is more compact since +// we don't keep the column value. The reason we do it here is that 1) it's somewhat related to dealing with tombstones +// so hopefully not too surprising and 2) we want to this and purging at the same places, so it's simpler/more efficient +// to do both here. +if (isExpiring()) +{ +// Note that as long as the expiring column and the tombstone put together live longer than GC grace seconds, +// we'll fulfil our responsibility to repair. 
See discussion at +// http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html +return BufferCell.tombstone(column, timestamp(), localDeletionTime() - ttl(), path()).purge(purger, nowInSec); +} +} +return this; +} + +public Cell copy(AbstractAllocator allocator) +{ +CellPath path = path(); +return new BufferCell(column, timestamp(), ttl(), localDeletionTime(), allocator.clone(value()), path == null ? null :
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4b736366 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4b736366 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4b736366 Branch: refs/heads/trunk Commit: 4b736366c2a958e67dffa12ad776d850ba370752 Parents: 9c3354e e018bec Author: Blake Eggleston Authored: Thu Aug 10 15:48:09 2017 -0700 Committer: Blake Eggleston Committed: Thu Aug 10 15:49:20 2017 -0700 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +- test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++-- 3 files changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4b736366/CHANGES.txt -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4b736366/src/java/org/apache/cassandra/db/rows/AbstractCell.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4b736366/test/unit/org/apache/cassandra/db/CounterCellTest.java -- diff --cc test/unit/org/apache/cassandra/db/CounterCellTest.java index 8c1347d,74599c3..b10a9c7 --- a/test/unit/org/apache/cassandra/db/CounterCellTest.java +++ b/test/unit/org/apache/cassandra/db/CounterCellTest.java @@@ -272,11 -272,11 +272,11 @@@ public class CounterCellTes Cell original = createCounterCellFromContext(cfs, col, state, 5); -ColumnDefinition cDef = cfs.metadata.getColumnDefinition(col); +ColumnMetadata cDef = cfs.metadata().getColumn(col); Cell cleared = BufferCell.live(cDef, 5, CounterContext.instance().clearAllLocal(state.context)); - CounterContext.instance().updateDigest(digest1, original.value()); - CounterContext.instance().updateDigest(digest2, cleared.value()); + original.digest(digest1); + cleared.digest(digest2); assert Arrays.equals(digest1.digest(), digest2.digest()); }
[1/3] cassandra git commit: Fix digest calculation for counter cells
Repository: cassandra Updated Branches: refs/heads/trunk 9c3354e32 -> 4b736366c Fix digest calculation for counter cells Patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-13750 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/eb6f03c8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/eb6f03c8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/eb6f03c8 Branch: refs/heads/trunk Commit: eb6f03c8928e913cb6f9eaa7c9ea9f4501039112 Parents: 1a70ded Author: Blake Eggleston Authored: Tue Aug 8 13:45:41 2017 -0700 Committer: Blake Eggleston Committed: Thu Aug 10 15:42:31 2017 -0700 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/db/rows/AbstractCell.java | 10 +- test/unit/org/apache/cassandra/db/CounterCellTest.java | 4 ++-- 3 files changed, 12 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 1f42c70..0b92a7e 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.15 + * Fix digest calculation for counter cells (CASSANDRA-13750) * Fix ColumnDefinition.cellValueType() for non-frozen collection and change SSTabledump to use type.toJSONString() (CASSANDRA-13573) * Skip materialized view addition if the base table doesn't exist (CASSANDRA-13737) * Drop table should remove corresponding entries in dropped_columns table (CASSANDRA-13730) http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/src/java/org/apache/cassandra/db/rows/AbstractCell.java -- diff --git a/src/java/org/apache/cassandra/db/rows/AbstractCell.java b/src/java/org/apache/cassandra/db/rows/AbstractCell.java index 7e93c2e..576351e 100644 --- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java +++ b/src/java/org/apache/cassandra/db/rows/AbstractCell.java @@ -42,7 +42,15 @@ public abstract class AbstractCell extends Cell public void digest(MessageDigest digest) {
-digest.update(value().duplicate()); +if (isCounterCell()) +{ +CounterContext.instance().updateDigest(digest, value()); +} +else +{ +digest.update(value().duplicate()); +} + FBUtilities.updateWithLong(digest, timestamp()); FBUtilities.updateWithInt(digest, ttl()); FBUtilities.updateWithBoolean(digest, isCounterCell()); http://git-wip-us.apache.org/repos/asf/cassandra/blob/eb6f03c8/test/unit/org/apache/cassandra/db/CounterCellTest.java -- diff --git a/test/unit/org/apache/cassandra/db/CounterCellTest.java b/test/unit/org/apache/cassandra/db/CounterCellTest.java index 08e0b25..a8ddfcc 100644 --- a/test/unit/org/apache/cassandra/db/CounterCellTest.java +++ b/test/unit/org/apache/cassandra/db/CounterCellTest.java @@ -276,8 +276,8 @@ public class CounterCellTest ColumnDefinition cDef = cfs.metadata.getColumnDefinition(col); Cell cleared = BufferCell.live(cfs.metadata, cDef, 5, CounterContext.instance().clearAllLocal(state.context)); -CounterContext.instance().updateDigest(digest1, original.value()); -CounterContext.instance().updateDigest(digest2, cleared.value()); +original.digest(digest1); +cleared.digest(digest2); assert Arrays.equals(digest1.digest(), digest2.digest()); }
[jira] [Updated] (CASSANDRA-13750) Counter digests include local data
[ https://issues.apache.org/jira/browse/CASSANDRA-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-13750: Resolution: Fixed Status: Resolved (was: Patch Available) Committed as {{eb6f03c8928e913cb6f9eaa7c9ea9f4501039112}}. Opened, reviewed, and committed CASSANDRA-13755 to fix the only non-flaky test failure. > Counter digests include local data > -- > > Key: CASSANDRA-13750 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13750 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0, 3.0.x, 3.11.x > > > In 3.x+, the raw counter value bytes are used when hashing counters for reads > and repair, including local shard data, which is removed when streamed. This > leads to constant digest mismatches and repair overstreaming.
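The fix routes counter cells through `CounterContext.updateDigest` instead of hashing the raw value bytes. A minimal sketch of why that matters, assuming a simplified context layout (a 2-byte count, that many 2-byte header entries marking local shards, then the shard bytes): clearing the local markers, as happens when data is streamed, changes the raw bytes but not the shard body, so only a digest that skips the header stays stable across replicas. `CounterDigestSketch` and its layout are hypothetical; Cassandra's real context format is defined in `CounterContext`.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical, simplified counter-context layout for illustration only:
//   [short headerCount][headerCount x short localShardIndex][shard body bytes]
final class CounterDigestSketch {

    static ByteBuffer context(short[] localShardIndexes, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 2 * localShardIndexes.length + body.length);
        buf.putShort((short) localShardIndexes.length);
        for (short idx : localShardIndexes)
            buf.putShort(idx);
        buf.put(body);
        buf.flip();
        return buf;
    }

    static int headerLength(ByteBuffer ctx) {
        return 2 + 2 * ctx.getShort(ctx.position()); // absolute read, position unchanged
    }

    // Drops the local-shard markers but keeps the shard body (similar in spirit to clearAllLocal).
    static ByteBuffer clearAllLocal(ByteBuffer ctx) {
        ByteBuffer body = ctx.duplicate();
        body.position(ctx.position() + headerLength(ctx));
        byte[] bodyBytes = new byte[body.remaining()];
        body.get(bodyBytes);
        return context(new short[0], bodyBytes);
    }

    // Pre-fix behaviour: hash every byte of the value, header included.
    static byte[] rawDigest(ByteBuffer ctx) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(ctx.duplicate());
        return md.digest();
    }

    // Post-fix behaviour: skip the node-local header and hash only the shards.
    static byte[] bodyDigest(ByteBuffer ctx) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        ByteBuffer dup = ctx.duplicate();
        dup.position(ctx.position() + headerLength(ctx));
        md.update(dup);
        return md.digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        ByteBuffer original = context(new short[] {0}, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        ByteBuffer cleared = clearAllLocal(original);
        System.out.println("raw digests equal:  " + Arrays.equals(rawDigest(original), rawDigest(cleared)));
        System.out.println("body digests equal: " + Arrays.equals(bodyDigest(original), bodyDigest(cleared)));
    }
}
```

Running `main` prints `false` for the raw digests and `true` for the body-only digests, which is the property the updated `CounterCellTest` asserts with `Arrays.equals(digest1.digest(), digest2.digest())`.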
[jira] [Commented] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
[ https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122535#comment-16122535 ] Joel Knighton commented on CASSANDRA-13755: --- Thanks! > dtest failure: > repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test > > > Key: CASSANDRA-13755 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13755 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Joel Knighton
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121857#comment-16121857 ]

Aleksey Yeschenko commented on CASSANDRA-11748:
---

bq. But we should also not forget to look at the receiver side for incoming pull requests. Joining the cluster with a schema mismatch should not cause a node to answer each of those in parallel.

Good observation, though maybe there is a better solution.

I think we shouldn't pull schema immediately from a node that just went up (and is potentially missing updates). Schedule that pull with a delay instead, to give the new node a chance to pull the new schema from one of the nodes in the cluster. It'll most likely converge by the time the delay has passed, so we'd just abort the request if the schema versions now match.

> Schema version mismatch may leads to Casandra OOM at bootstrap during a
> rolling upgrade process
> ---
>
> Key: CASSANDRA-11748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
> Project: Cassandra
> Issue Type: Bug
> Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
> Reporter: Michael Fong
> Assignee: Matt Byrd
> Priority: Critical
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran
> into OOM at bootstrap during a rolling upgrade from 1.2.19 to 2.0.17.
> Here are the basic steps of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes agree on the schema version (via nodetool describecluster).
> 2. Restart a Cassandra node.
> 3. After the restart, there is a chance that the restarted node has a different schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any node could run into OOM.
> The following is the system.log from one of our 2-node cluster test beds:
> --
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2:
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task to the other node (100+ times).
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... (100+ times)
> --
> On the other hand, Node 1 keeps updating its gossip information, then receives and submits migration tasks:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> ... (100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> ... (50+ times)
> As a side note, we have 200+ column families defined in the Cassandra database, which may be related to this amount of RPC traffic.
> P.S.2 The excess schema migration tasks eventually cause InternalResponseStage to perform schema merge operations. Each merge requires a compaction and is much slower to consume, so the backlog of incoming schema migration content objects consumes all of the heap space and ultimately ends in OOM!
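Aleksey's suggestion is to delay the pull from a node that just came up, then abort it if the schema versions already match. It could look roughly like the sketch below. `DelayedSchemaPull`, `onPeerUp` and the version suppliers are hypothetical names, not `MigrationManager` APIs, and the delay is an arbitrary caller-supplied assumption.

```java
import java.util.UUID;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of "delay, then re-check"; none of these names are Cassandra APIs.
final class DelayedSchemaPull
{
    private final ScheduledExecutorService scheduler;
    private final Supplier<UUID> localSchemaVersion;
    private final long delayMillis;

    DelayedSchemaPull(ScheduledExecutorService scheduler, Supplier<UUID> localSchemaVersion, long delayMillis)
    {
        this.scheduler = scheduler;
        this.localSchemaVersion = localSchemaVersion;
        this.delayMillis = delayMillis;
    }

    /**
     * Called when a peer that just came up advertises a different schema version.
     * Instead of pulling immediately, wait, then compare again: if the versions
     * converged in the meantime (e.g. the new node pulled from someone else),
     * skip the request. The future completes with true only if a pull was issued.
     */
    ScheduledFuture<Boolean> onPeerUp(Supplier<UUID> peerSchemaVersion, Runnable pullSchema)
    {
        return scheduler.schedule(() -> {
            if (localSchemaVersion.get().equals(peerSchemaVersion.get()))
                return false; // converged while we waited: abort the request
            pullSchema.run();
            return true;
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

The version is re-read when the task fires rather than when it is scheduled, which is what lets a converged cluster skip the pull entirely. This sketch does not address the receiver-side concern quoted above, such as how many concurrent pull requests a node should answer.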
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122012#comment-16122012 ]

Jeff Jirsa commented on CASSANDRA-13754:


What version are you on [~urandom] (or really, which version of netty is in the classpath)?

> FastThreadLocal leaks memory
> 
>
> Key: CASSANDRA-13754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: OpenJDK 8u141-b15
> Reporter: Eric Evans
> Fix For: 3.11.1
[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: (was: Marcus Eriksson)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
> Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out-of-date replicas is blocking. This means that if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas that drop writes fall further out of
> sync, so they will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not
> be blocking, or we should return success for the read even if the write times
> out.
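The description offers two options. The second, returning success for the read even if the repair writes time out, might look like the sketch below. It assumes, purely for illustration, that the repair writes are exposed as `CompletableFuture`s. `ReadRepairSketch` and its `Result` type are hypothetical, not Cassandra classes.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; not Cassandra's read repair code.
final class ReadRepairSketch
{
    // The resolved row plus whether every repair write was acknowledged in time.
    static final class Result<T>
    {
        final T value;
        final boolean repairsAcked;

        Result(T value, boolean repairsAcked)
        {
            this.value = value;
            this.repairsAcked = repairsAcked;
        }
    }

    /**
     * Waits up to timeoutMillis for the repair writes, but never fails the read
     * because of them: a timeout or failed write is recorded, and the already
     * resolved value is still returned to the client.
     */
    static <T> Result<T> readWithRepair(T resolved,
                                        List<CompletableFuture<Void>> repairWrites,
                                        long timeoutMillis) throws InterruptedException
    {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(repairWrites.toArray(new CompletableFuture<?>[0]));
        boolean acked;
        try
        {
            all.get(timeoutMillis, TimeUnit.MILLISECONDS);
            acked = true;
        }
        catch (TimeoutException | ExecutionException e)
        {
            // A slow or failing replica no longer turns the read into a timeout.
            acked = false;
        }
        return new Result<>(resolved, acked);
    }
}
```

The bounded wait keeps some of the back-pressure the quoted code comment asks for. The other option in the description, not blocking at all, would simply drop the `get` call and let the writes complete in the background.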
[jira] [Commented] (CASSANDRA-13752) Corrupted SSTables created in 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122020#comment-16122020 ]

Jeff Jirsa commented on CASSANDRA-13752:


Any additional context you can provide - how old is the cluster? Have you changed anything recently? When did you upgrade to 3.11.0? How long before you saw those errors? Do you run repairs? Incremental repairs or full? Anything else in the logs that looks atypical?

> Corrupted SSTables created in 3.11
> --
>
> Key: CASSANDRA-13752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13752
> Project: Cassandra
> Issue Type: Bug
> Reporter: Hannu Kröger
> Priority: Blocker
>
>
> We have discovered issues with corrupted SSTables.
> {code}
> ERROR [SSTableBatchOpen:22] 2017-08-03 20:19:53,195 SSTableReader.java:577 - Cannot read sstable /cassandra/data/mykeyspace/mytable-7a4992800d5611e7b782cb90016f2d17/mc-35556-big=[Data.db, Statistics.db, Summary.db, Digest.crc32, CompressionInfo.db, TOC.txt, Index.db, Filter.db]; other IO error, skipping table
> java.io.EOFException: EOF after 1898 bytes out of 21093
> at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:325) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:231) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:122) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:93) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:488) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:396) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at org.apache.cassandra.io.sstable.format.SSTableReader$5.run(SSTableReader.java:561) ~[apache-cassandra-3.11.0.jar:3.11.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
> {code}
> Files look like this:
> {code}
> -rw-r--r--. 1 cassandra cassandra 3899251 Aug 7 08:37 mc-6166-big-CompressionInfo.db
> -rw-r--r--. 1 cassandra cassandra 16874421686 Aug 7 08:37 mc-6166-big-Data.db
> -rw-r--r--. 1 cassandra cassandra 10 Aug 7 08:37 mc-6166-big-Digest.crc32
> -rw-r--r--. 1 cassandra cassandra 2930904 Aug 7 08:37 mc-6166-big-Filter.db
> -rw-r--r--. 1 cassandra cassandra 75880 Aug 7 08:37 mc-6166-big-Index.db
> -rw-r--r--. 1 cassandra cassandra 13762 Aug 7 08:37 mc-6166-big-Statistics.db
> -rw-r--r--. 1 cassandra cassandra 882008 Aug 7 08:37 mc-6166-big-Summary.db
> -rw-r--r--. 1 cassandra cassandra 92 Aug 7 08:37 mc-6166-big-TOC.txt
> {code}
cassandra git commit: Fix race / ref leak in PendingRepairManager
Repository: cassandra
Updated Branches:
  refs/heads/trunk ba87ab4e9 -> 9c3354e32


Fix race / ref leak in PendingRepairManager

Patch by Blake Eggleston; Reviewed by Marcus Eriksson for CASSANDRA-13751


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9c3354e3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9c3354e3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9c3354e3

Branch: refs/heads/trunk
Commit: 9c3354e3211c6a3f3982e87477e156c29cd9b7ea
Parents: ba87ab4
Author: Blake Eggleston
Authored: Tue Aug 8 10:32:35 2017 -0700
Committer: Blake Eggleston
Committed: Thu Aug 10 12:01:00 2017 -0700

--
 CHANGES.txt | 1 +
 .../compaction/AbstractCompactionStrategy.java | 29 ++---
 .../compaction/CompactionStrategyManager.java | 25 +++---
 .../compaction/LeveledCompactionStrategy.java | 10 +-
 .../db/compaction/PendingRepairManager.java | 34 +---
 .../cassandra/io/sstable/ISSTableScanner.java | 34
 .../db/compaction/PendingRepairManagerTest.java | 24 ++
 7 files changed, 113 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c3354e3/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 849848f..e997b50 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Fix race / ref leak in PendingRepairManager (CASSANDRA-13751)
  * Enable ppc64le runtime as unsupported architecture (CASSANDRA-13615)
  * Improve sstablemetadata output (CASSANDRA-11483)
  * Support for migrating legacy users to roles has been dropped (CASSANDRA-13371)


http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c3354e3/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 5333683..f1f42a7 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -293,15 +293,7 @@ public abstract class AbstractCompactionStrategy
 }
 catch (Throwable t)
 {
-try
-{
-new ScannerList(scanners).close();
-}
-catch (Throwable t2)
-{
-t.addSuppressed(t2);
-}
-throw t;
+ISSTableScanner.closeAllAndPropagate(scanners, t);
 }
 return new ScannerList(scanners);
 }
@@ -385,24 +377,7 @@ public abstract class AbstractCompactionStrategy
 
 public void close()
 {
-Throwable t = null;
-for (ISSTableScanner scanner : scanners)
-{
-try
-{
-scanner.close();
-}
-catch (Throwable t2)
-{
-JVMStabilityInspector.inspectThrowable(t2);
-if (t == null)
-t = t2;
-else
-t.addSuppressed(t2);
-}
-}
-if (t != null)
-throw Throwables.propagate(t);
+ISSTableScanner.closeAllAndPropagate(scanners, null);
 }
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9c3354e3/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
index e58ccc2..6342a1b 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
@@ -21,7 +21,6 @@ package org.apache.cassandra.db.compaction;
 import java.util.*;
 import java.util.concurrent.Callable;
 import java.util.concurrent.locks.ReentrantReadWriteLock;
-import java.util.function.Predicate;
 import java.util.stream.Collectors;
 import java.util.stream.Stream;
 import java.util.function.Supplier;
@@ -735,7 +734,7 @@ public class CompactionStrategyManager implements INotificationConsumer
 * @return
 */
 @SuppressWarnings("resource")
-public AbstractCompactionStrategy.ScannerList getScanners(Collection sstables, Collection ranges)
+public
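The commit folds two copies of the same "close everything, then rethrow" loop into `ISSTableScanner.closeAllAndPropagate(scanners, t)`. The new method's body is not shown above, so the following is only a standalone sketch of that pattern over plain `AutoCloseable`s. It is modeled on the removed `ScannerList.close()` loop, leaves out the `JVMStabilityInspector.inspectThrowable` call, and replaces Guava's `Throwables.propagate` with a local equivalent. `CloseAll` is a hypothetical name.

```java
import java.util.Collection;

// Hypothetical helper modeled on the removed ScannerList.close() loop;
// not the actual ISSTableScanner.closeAllAndPropagate body.
final class CloseAll
{
    /**
     * Closes every resource even if some close() calls fail. The caller's pending
     * Throwable (if non-null) or else the first close failure becomes the primary
     * exception; later failures are attached as suppressed. Throws if anything failed.
     */
    static void closeAllAndPropagate(Collection<? extends AutoCloseable> resources, Throwable pending)
    {
        Throwable t = pending;
        for (AutoCloseable resource : resources)
        {
            try
            {
                resource.close();
            }
            catch (Throwable t2)
            {
                if (t == null)
                    t = t2;
                else
                    t.addSuppressed(t2);
            }
        }
        if (t != null)
            throw propagate(t);
    }

    // Stand-in for Guava's Throwables.propagate: rethrow unchecked throwables
    // as-is and wrap checked ones in a RuntimeException.
    private static RuntimeException propagate(Throwable t)
    {
        if (t instanceof Error)
            throw (Error) t;
        if (t instanceof RuntimeException)
            return (RuntimeException) t;
        return new RuntimeException(t);
    }
}
```

Passing the caller's pending `Throwable` in, as the `catch (Throwable t)` site does, keeps the original failure primary while still releasing every scanner that was already opened. Releasing those scanners is the ref-leak half of CASSANDRA-13751.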
[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: Marcus Eriksson (was: Xiaolong Jiang)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
> Assignee: Marcus Eriksson
> Fix For: 3.0.x
[jira] [Updated] (CASSANDRA-13363) java.lang.ArrayIndexOutOfBoundsException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-13363:
--
Reviewer: Aleksey Yeschenko

> java.lang.ArrayIndexOutOfBoundsException: null
> --
>
> Key: CASSANDRA-13363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13363
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6, Cassandra 3.10
> Reporter: Artem Rokhin
> Assignee: zhaoyan
> Priority: Critical
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We constantly see this error in the log without any additional information or a
> stack trace.
> {code}
> Exception in thread Thread[MessagingService-Incoming-/10.0.1.26,5,main]
> {code}
> {code}
> java.lang.ArrayIndexOutOfBoundsException: null
> {code}
> Logger: org.apache.cassandra.service.CassandraDaemon
> Thread: MessagingService-Incoming-/10.0.1.12
> Method: uncaughtException
> File: CassandraDaemon.java
> Line: 229
[jira] [Updated] (CASSANDRA-13751) Race / ref leak in PendingRepairManager
[ https://issues.apache.org/jira/browse/CASSANDRA-13751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Blake Eggleston updated CASSANDRA-13751:
Resolution: Fixed
Status: Resolved (was: Patch Available)

Got the unit test passing. The dtest failures were flaky or passed when run locally.

Committed as {{9c3354e3211c6a3f3982e87477e156c29cd9b7ea}}

> Race / ref leak in PendingRepairManager
> ---
>
> Key: CASSANDRA-13751
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13751
> Project: Cassandra
> Issue Type: Bug
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Minor
> Fix For: 4.0
>
>
> PendingRepairManager#getScanners has an assertion that confirms an sstable
> is, in fact, marked as pending repair. Since validation compactions don't use
> the same concurrency controls as proper compactions, they can race with
> promotion/demotion compactions and end up getting assertion errors when the
> pending repair id is changed while the scanners are being acquired. Also,
> error handling in PendingRepairManager and CompactionStrategyManager leaks
> refs when this happens.
[jira] [Updated] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sankalp kohli updated CASSANDRA-10726:
--
Reviewer: Marcus Eriksson (was: Blake Eggleston)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
> Assignee: Xiaolong Jiang
> Fix For: 3.0.x
[jira] [Assigned] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sankalp kohli reassigned CASSANDRA-10726:
-

Assignee: Xiaolong Jiang

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
> Assignee: Xiaolong Jiang
> Fix For: 3.0.x
[jira] [Created] (CASSANDRA-13752) Corrupted SSTables created in 3.11
Hannu Kröger created CASSANDRA-13752:


Summary: Corrupted SSTables created in 3.11
Key: CASSANDRA-13752
URL: https://issues.apache.org/jira/browse/CASSANDRA-13752
Project: Cassandra
Issue Type: Bug
Reporter: Hannu Kröger
Priority: Blocker


We have discovered issues with corrupted SSTables.

{code}
ERROR [SSTableBatchOpen:22] 2017-08-03 20:19:53,195 SSTableReader.java:577 - Cannot read sstable /cassandra/data/mykeyspace/mytable-7a4992800d5611e7b782cb90016f2d17/mc-35556-big=[Data.db, Statistics.db, Summary.db, Digest.crc32, CompressionInfo.db, TOC.txt, Index.db, Filter.db]; other IO error, skipping table
java.io.EOFException: EOF after 1898 bytes out of 21093
    at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:325) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:231) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:122) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:93) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:488) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:396) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.SSTableReader$5.run(SSTableReader.java:561) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
{code}
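The message `EOF after 1898 bytes out of 21093` is thrown beneath `ByteBufferUtil.readWithShortLength`. It reads as though a 16-bit length prefix (as the method name suggests) promised 21093 bytes, but the stream ended after 1898. The sketch below reproduces only that failure shape with plain `java.io`. It is not Cassandra's `RebufferingInputStream`: `ShortLengthReader` is a hypothetical name and the message format is copied from the log. It says nothing about what truncated or corrupted the file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch of a short-length-prefixed read; not Cassandra's code.
final class ShortLengthReader
{
    /**
     * Reads a field encoded as [unsigned 16-bit length][length bytes].
     * If the stream ends early, fails with a message shaped like the one in the log.
     */
    static byte[] readWithShortLength(DataInputStream in) throws IOException
    {
        int length = in.readUnsignedShort(); // 0..65535
        byte[] buf = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = in.read(buf, read, length - read);
            if (n < 0)
                throw new EOFException("EOF after " + read + " bytes out of " + length);
            read += n;
        }
        return buf;
    }
}
```

Feeding it a prefix of 21093 followed by only 1898 bytes yields exactly `EOF after 1898 bytes out of 21093`. That is consistent with a truncated or overwritten metadata component, but does not prove either.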
[jira] [Commented] (CASSANDRA-13752) Corrupted SSTables created in 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121388#comment-16121388 ]

Hannu Kröger commented on CASSANDRA-13752:
--

This has happened on 2 servers, for a total of at least 3 sstables so far. I can read those files with Unix tools like cat, so it doesn't seem to be a filesystem or hardware issue.

> Corrupted SSTables created in 3.11
> --
>
> Key: CASSANDRA-13752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13752
> Project: Cassandra
> Issue Type: Bug
> Reporter: Hannu Kröger
> Priority: Blocker
[jira] [Created] (CASSANDRA-13753) The documentation website can be fitted well on device width.
Ashish Tomer created CASSANDRA-13753:


Summary: The documentation website can be fitted well on device width.
Key: CASSANDRA-13753
URL: https://issues.apache.org/jira/browse/CASSANDRA-13753
Project: Cassandra
Issue Type: Improvement
Components: Documentation and Website
Environment: *Operating System : *Ubuntu
*Browsers: *
* Firefox
* Google Chrome
Reporter: Ashish Tomer
Fix For: 4.x


The following shortcomings/issues were noticed on the pages of the Cassandra documentation website ([http://cassandra.apache.org/doc/latest/]):

*1.* On a laptop screen with a resolution of 1366x768, the web page is wider than the screen. The content overflows the viewport, so the user has to scroll horizontally to read the lines. The horizontal scrollbar at the bottom needs to be removed.

*2.* When some pages are scrolled down, the whole page jumps around and scrolls back to the top. {color:red}Example link - {color}[http://cassandra.apache.org/doc/latest/architecture/overview.html]

*3.* The website is not mobile-friendly and could be made responsive.
[jira] [Commented] (CASSANDRA-12728) Handling partially written hint files
[ https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121904#comment-16121904 ]

Hansey Chen commented on CASSANDRA-12728:
-

I was looking at this issue and could not understand one of its reported effects. Garvit Juniwal mentioned in [one|https://issues.apache.org/jira/browse/CASSANDRA-12728?focusedCommentId=15576548=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15576548] of his comments that this bug will "put cassandra in a crash loop", and Harikrishnan said in [a related issue|https://issues.apache.org/jira/browse/CASSANDRA-12844] that it crashed many nodes. However, I cannot figure out how an {{EOFException}} during hinted handoff can crash a Cassandra node. Does it only kill the hints dispatcher thread? And how could it affect other nodes? Could anyone explain this in a bit more detail? Many thanks in advance.

> Handling partially written hint files
> -
>
> Key: CASSANDRA-12728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12728
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sharvanath Pathak
> Assignee: Garvit Juniwal
> Labels: lhf
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: CASSANDRA-12728.patch
>
>
> {noformat}
> ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397 HintsDispatchExecutor.java:225 - Failed to dispatch hints file d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199) [apache-cassandra-3.0.6.jar:3.0.6]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.EOFException: null
>     at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     ... 15 common frames omitted
> {noformat}
> We've found out that the hint file was truncated because there was a hard
> reboot
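The sketch below makes the truncation scenario concrete. It is a reader for length-prefixed, checksummed records that treats a short read at the tail as a partially written final record rather than as an error. Both the record layout (4-byte big-endian length, payload, 4-byte CRC32 of the payload) and the names {{write_record}}, {{read_records}} and {{CorruptRecord}} are hypothetical. They do not describe Cassandra's actual hints file format or {{HintsReader}}. The sketch only illustrates the difference between a truncated tail and a complete record that fails its checksum.

```python
import io
import struct
import zlib

# Hypothetical record layout for illustration only (not Cassandra's hints format):
# [4-byte big-endian payload length][payload][4-byte big-endian CRC32 of payload]
LENGTH = struct.Struct(">I")
CRC = struct.Struct(">I")


class CorruptRecord(Exception):
    """Raised when a fully present record fails its checksum."""


def write_record(out, payload):
    out.write(LENGTH.pack(len(payload)))
    out.write(payload)
    out.write(CRC.pack(zlib.crc32(payload) & 0xFFFFFFFF))


def read_records(stream):
    """Yield payloads, stopping quietly if the last record was only partially written."""
    while True:
        header = stream.read(LENGTH.size)
        if len(header) < LENGTH.size:
            return  # clean end of file, or a length field cut off mid-write
        (length,) = LENGTH.unpack(header)
        payload = stream.read(length)
        crc_bytes = stream.read(CRC.size)
        if len(payload) < length or len(crc_bytes) < CRC.size:
            return  # truncated tail (e.g. after a hard reboot): keep what was read
        (expected,) = CRC.unpack(crc_bytes)
        if zlib.crc32(payload) & 0xFFFFFFFF != expected:
            raise CorruptRecord(f"checksum mismatch in {length}-byte record")
        yield payload


# Example: three 14-byte records, with the file cut off 5 bytes before its end.
buf = io.BytesIO()
for p in (b"hint-1", b"hint-2", b"hint-3"):
    write_record(buf, p)
print(list(read_records(io.BytesIO(buf.getvalue()[:-5]))))  # [b'hint-1', b'hint-2']
```

Stopping at the truncated tail keeps every fully written record. Whether a real reader should then skip, delete, or quarantine the damaged file is a separate policy decision that this sketch does not address.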
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122194#comment-16122194 ]

Matt Byrd commented on CASSANDRA-11748:
---

{quote}
But we should at least take the schema Ids and/or endpoints into account as well. It just doesn't make sense to queue 50 requests for the same schema Id and potentially drop requests for a different schema afterwards.
{quote}
Yes, I also had a patch with an expiring map of schema version to counter, which limited requests per schema version. I decided to keep it simple, though, since the single limit sufficed for a particular scenario. Less relevant, but the per-version limit also provides some protection in the rather strange case where there really are lots of different schema versions in the cluster. I could resurrect the schema version patch, but it sounds like we're considering a slightly different approach.

{quote}
Schedule that pull with a delay instead, give the new node a chance to pull the new schema from one of the nodes in the cluster. It'll most likely converge by the time the delay has passed, so we'd just abort the request if schema versions now match.
{quote}
Once a node has been up for MIGRATION_DELAY_IN_MS and doesn't have an empty schema, it will always schedule the pull task with a delay of MIGRATION_DELAY_IN_MS. The task itself then checks again whether the schema versions still differ before asking for the schema.
Admittedly, this problem still exists if two nodes start up at the same time, since they may pull from each other. I suppose we'd also schedule a pull from a newer node, so assuming we successively merge the schemas together, we should end up at the final desired state? In the interim, a node might come into play with a slightly older schema, but that can happen anyway whenever a DOWN node comes up with an out-of-date schema.
It's also possible that a node overwhelmed by the reverse problem won't have reached the correct schema version within MIGRATION_DELAY_IN_MS. It will then start sending its old schema back to all the other nodes in the cluster. Fortunately, the sending happens on the migration stage, so it is single-threaded and less likely to cause OOMs.

> Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
> ---
>
> Key: CASSANDRA-11748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
> Project: Cassandra
> Issue Type: Bug
> Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
> Reporter: Michael Fong
> Assignee: Matt Byrd
> Priority: Critical
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17.
> Here is the simple guideline of our rolling upgrade process:
> 1. Update schema on a node, and wait until all nodes are in schema version agreement (via nodetool describecluster)
> 2. Restart a Cassandra node
> 3. After restart, there is a chance that the restarted node has a different schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any node could run into OOM.
> The following is the system.log from one of our 2-node cluster test beds
> --
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task over 100+ times to the other node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... ( over 100+ times)
> --
> On the other hand, Node 1 keeps updating its gossip information, followed by receiving and submitting migrationTask afterwards:
> INFO [RequestResponseStage:3] 2016-04-19
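The comment above combines two ideas. The first caps pulls per remote schema version using an expiring map. The second re-checks the versions when a delayed pull finally runs and skips it if they already match. The Python sketch below shows both. It is illustrative only, and every name in it is hypothetical ({{SchemaPullThrottle}}, {{delayed_pull}}, {{max_per_version}}, {{window_seconds}}). It is not the patch mentioned in the comment. The actual scheduling, for example a timer firing after MIGRATION_DELAY_IN_MS, is left to the caller.

```python
import time


class SchemaPullThrottle:
    """Cap schema pulls per remote schema version within an expiring time window.

    Hypothetical sketch: class name, parameters and defaults are illustrative.
    """

    def __init__(self, max_per_version=3, window_seconds=60.0, clock=time.monotonic):
        self.max_per_version = max_per_version
        self.window = window_seconds
        self.clock = clock
        self._pulls = {}  # schema version -> timestamps of pulls still inside the window

    def _expire(self, now):
        for version in list(self._pulls):
            fresh = [t for t in self._pulls[version] if now - t < self.window]
            if fresh:
                self._pulls[version] = fresh
            else:
                del self._pulls[version]  # drop versions with no recent pulls entirely

    def try_acquire(self, schema_version):
        now = self.clock()
        self._expire(now)
        recent = self._pulls.setdefault(schema_version, [])
        if len(recent) >= self.max_per_version:
            return False
        recent.append(now)
        return True


def delayed_pull(local_version, remote_version, throttle, pull):
    """Body of a pull that was scheduled to run after a delay.

    local_version is a callable, so the node's schema version is read when the
    task runs rather than when it was scheduled.
    """
    if local_version() == remote_version:
        return "converged"  # schema agreed while we waited: skip the request
    if not throttle.try_acquire(remote_version):
        return "throttled"  # this version already used its budget in the current window
    pull(remote_version)
    return "pulled"
```

Each schema version gets its own budget. This avoids the case raised in the first quote, where many requests for one version crowd out a request for a different version.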
[jira] [Updated] (CASSANDRA-13655) Range deletes in a CAS batch are ignored
[ https://issues.apache.org/jira/browse/CASSANDRA-13655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13655:
---
Priority: Blocker (was: Critical)

> Range deletes in a CAS batch are ignored
>
>
> Key: CASSANDRA-13655
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13655
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Priority: Blocker
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Range deletes in a CAS batch are ignored
[jira] [Comment Edited] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122718#comment-16122718 ]

Michael Fong edited comment on CASSANDRA-11748 at 8/11/17 2:34 AM:
---

Hi, guys,

Thanks for putting some time into this issue; this is an awesome discussion thread.

When we reported this issue a year ago, we ended up patching C* (v2.0) with an approach similar to CASSANDRA-13569. Later, we found that it did not address the root problem and only stacked more patches on top of one another as time went by.

In my humble opinion, I am not sure we want to add many more types of soft/hard caps to reduce the risk of running into OOM. Instead, we could look deeper into the causes behind the current working model, such as:
1. Migration checks and requests are fired asynchronously. All the messages finally stack up at the receiver end, which merges the schema one by one at {code:java}Schema.instance.mergeAndAnnounceVersion(){code}
2. The receiver is sent a complete copy of the schema, instead of a delta based on the diff between the two nodes.
3. Last but not least, the most mysterious problem leading to OOM, which we could not figure out back then: hundreds of migration tasks were all fired nearly simultaneously, within 2 s. The number of RPCs does not match the number of nodes in the cluster, but is close to the number of seconds the node took to reboot.

Maybe there are other tickets already addressing these items that I am not aware of.

Thanks.

Michael Fong
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122718#comment-16122718 ]

Michael Fong commented on CASSANDRA-11748:
--

Hi, guys,

Thanks for putting some time into this issue; this is an awesome discussion thread.

When we reported this issue a year ago, we ended up patching C* (v2.0) with an approach similar to CASSANDRA-13569. Later, we found that it did not address the root problem and only stacked more patches on top of one another as time went by.

In my humble opinion, I am not sure we want to add many more types of soft/hard caps to reduce the risk of running into OOM. Instead, we could look deeper into the causes behind the current working model, such as:
1. Migration checks and requests are fired asynchronously. All the messages finally stack up at the receiver end, which merges the schema one by one at
{code:java}
Schema.instance.mergeAndAnnounceVersion()
{code}
2. The receiver is sent a complete copy of the schema, instead of a delta based on the diff between the two nodes.
3. Last but not least, the most mysterious problem leading to OOM, which we could not figure out back then: hundreds of migration tasks were all fired nearly simultaneously, within 2 s. The number of RPCs does not match the number of nodes in the cluster, but is close to the number of seconds the node took to reboot.

Maybe there are other tickets already addressing these items that I am not aware of.

Thanks.

Michael Fong

> Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
> ---
>
> Key: CASSANDRA-11748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
> Project: Cassandra
> Issue Type: Bug
> Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
> Reporter: Michael Fong
> Assignee: Matt Byrd
> Priority: Critical
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17.
> Here is the simple guideline of our rolling upgrade process:
> 1. Update schema on a node, and wait until all nodes are in schema version agreement (via nodetool describecluster)
> 2. Restart a Cassandra node
> 3. After restart, there is a chance that the restarted node has a different schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any node could run into OOM.
> The following is the system.log from one of our 2-node cluster test beds
> --
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2,
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task over 100+ times to the other node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... ( over 100+ times)
> --
> On the other hand, Node 1 keeps updating its gossip information, followed by receiving and submitting migrationTask afterwards:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> …… ( over 100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> . (over 50+ times)
> As a side note, we have over 200+ column families defined in the Cassandra database, which may be related to this amount of RPC traffic.
> P.S.2 The over-requested schema migration tasks will eventually have InternalResponseStage performing the schema merge operation. Since this operation
> requires a compaction for each merge and is much slower to consume. Thus, the
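Point 3 describes hundreds of migration tasks firing within about two seconds. One way to de-duplicate them is a coalescer that allows at most one outstanding pull per endpoint and ignores further requests until that pull completes. The sketch below implements this. The class and parameter names are hypothetical. It illustrates the general technique and does not describe how {{MigrationManager}} behaves.

```python
import threading


class MigrationTaskCoalescer:
    """Allow at most one outstanding schema pull per endpoint.

    Hypothetical sketch: 'pull' performs one schema request and merge for an
    endpoint; 'submit' hands work to an executor as submit(fn, *args).
    """

    def __init__(self, pull, submit):
        self._pull = pull
        self._submit = submit
        self._in_flight = set()
        self._lock = threading.Lock()

    def request_pull(self, endpoint):
        with self._lock:
            if endpoint in self._in_flight:
                return False  # a pull for this endpoint is already queued or running
            self._in_flight.add(endpoint)
        try:
            self._submit(self._run, endpoint)
        except Exception:
            with self._lock:
                self._in_flight.discard(endpoint)  # submission failed: don't block future pulls
            raise
        return True

    def _run(self, endpoint):
        try:
            self._pull(endpoint)
        finally:
            with self._lock:
                self._in_flight.discard(endpoint)  # a later request may pull again


# Example: 100 near-simultaneous requests for one endpoint queue a single pull.
queued = []
coalescer = MigrationTaskCoalescer(pull=print, submit=lambda fn, *args: queued.append((fn, args)))
accepted = sum(coalescer.request_pull("/192.168.88.33") for _ in range(100))
print(accepted, len(queued))  # 1 1
```

With {{concurrent.futures.ThreadPoolExecutor}}, passing {{submit=executor.submit}} fits the {{submit(fn, *args)}} shape assumed here.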
[jira] [Comment Edited] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122718#comment-16122718 ]

Michael Fong edited comment on CASSANDRA-11748 at 8/11/17 2:35 AM:
---

Hi, guys,

Thanks for putting some time into this issue; this is an awesome discussion thread.

When we reported this issue a year ago, we ended up patching C* (v2.0) with an approach similar to CASSANDRA-13569. Later, we found that it did not address the root problem and only stacked more patches on top of one another as time went by.

In my humble opinion, I am not sure we want to add many more types of soft/hard caps to reduce the risk of running into OOM. Instead, we could look deeper into the causes behind the current working model, such as:
1. Migration checks and requests are fired asynchronously. All the messages finally stack up at the receiver end, which merges the schema one by one at {code:java}Schema.instance.mergeSchemaAndAnnounceVersion(){code}
2. The receiver is sent a complete copy of the schema, instead of a delta based on the diff between the two nodes.
3. Last but not least, the most mysterious problem leading to OOM, which we could not figure out back then: hundreds of migration tasks were all fired nearly simultaneously, within 2 s. The number of RPCs does not match the number of nodes in the cluster, but is close to the number of seconds the node took to reboot.

Maybe there are other tickets already addressing these items that I am not aware of.

Thanks.

Michael Fong
[jira] [Assigned] (CASSANDRA-13743) CAPTURE not easilly usable with PAGING
[ https://issues.apache.org/jira/browse/CASSANDRA-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa reassigned CASSANDRA-13743:
--
Assignee: Corentin Chary

> CAPTURE not easilly usable with PAGING
> --
>
> Key: CASSANDRA-13743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13743
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Reporter: Corentin Chary
> Assignee: Corentin Chary
> Fix For: 4.x
>
>
> See https://github.com/iksaif/cassandra/commit/7ed56966a7150ced44c375af307685517d7e09a3 for a patch fixing that.
[jira] [Commented] (CASSANDRA-13744) Better bootstrap failure message when blocked by (potential) range movement
[ https://issues.apache.org/jira/browse/CASSANDRA-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122628#comment-16122628 ]

Jeff Jirsa commented on CASSANDRA-13744:

I'll claim it, not only because I'm first, but because I'm not sure whether Jason's +1 carries over to the new tests.

> Better bootstrap failure message when blocked by (potential) range movement
> ---
>
> Key: CASSANDRA-13744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13744
> Project: Cassandra
> Issue Type: Bug
> Reporter: mck
> Assignee: mck
> Priority: Trivial
> Fix For: 3.11.x, 4.x
>
>
> The UnsupportedOperationException thrown from {{StorageService.joinTokenRing(..)}} when it detects that other nodes are bootstrapping|leaving|moving gives no information about which nodes those are.
> In a large cluster this might be neither obvious nor easy to discover, and gossipinfo can hold information that takes a bit of effort to uncover. Even when it is easily seen, it's helpful to have it confirmed.
> Attached is a patch that provides a more thorough exception message for the failed bootstrap attempt.
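The sketch below shows the kind of message the ticket asks for: collect the endpoints in each range-movement state and name them in the failure. The function and exception names are hypothetical stand-ins. The real check lives in {{StorageService.joinTokenRing(..)}} and throws {{UnsupportedOperationException}}, and the message wording here is illustrative.

```python
class RangeMovementInProgress(Exception):
    """Hypothetical stand-in for the UnsupportedOperationException raised on join."""


def check_no_range_movement(bootstrapping, leaving, moving):
    """Refuse to bootstrap while other nodes are moving ranges, naming those nodes."""
    details = []
    for state, nodes in (("bootstrapping", bootstrapping), ("leaving", leaving), ("moving", moving)):
        if nodes:
            details.append(f"{state}: {', '.join(sorted(nodes))}")
    if details:
        raise RangeMovementInProgress(
            "Other nodes are changing ranges, cannot bootstrap concurrently ("
            + "; ".join(details) + ")"
        )


# Example
try:
    check_no_range_movement(["/10.0.0.2", "/10.0.0.1"], [], ["/10.0.0.9"])
except RangeMovementInProgress as e:
    print(e)
# prints: Other nodes are changing ranges, cannot bootstrap concurrently (bootstrapping: /10.0.0.1, /10.0.0.2; moving: /10.0.0.9)
```

The states with no nodes are left out of the message, so an operator sees only the endpoints that actually block the join.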
[jira] [Updated] (CASSANDRA-13744) Better bootstrap failure message when blocked by (potential) range movement
[ https://issues.apache.org/jira/browse/CASSANDRA-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13744:
---
Reviewer: Jeff Jirsa
cassandra git commit: cqlsh: don't pause when capturing data
Repository: cassandra
Updated Branches:
  refs/heads/trunk 4b736366c -> ed0243954


cqlsh: don't pause when capturing data

Patch by Corentin Chary; Reviewed by Chris Lohfink for CASSANDRA-13473


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ed024395
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ed024395
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ed024395

Branch: refs/heads/trunk
Commit: ed0243954f9ab9c5c68a4516a836ab3710891d5b
Parents: 4b73636
Author: Corentin Chary
Authored: Fri Aug 4 10:19:57 2017 +0200
Committer: Jeff Jirsa
Committed: Thu Aug 10 18:02:31 2017 -0700

--
 CHANGES.txt  | 1 +
 bin/cqlsh.py | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed024395/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 808665a..5c6994a 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -107,6 +107,7 @@
  * Nodetool repair can hang forever if we lose the notification for the repair completing/failing (CASSANDRA-13480)
  * Anticompaction can cause noisy log messages (CASSANDRA-13684)
  * Switch to client init for sstabledump (CASSANDRA-13683)
+ * CQLSH: Don't pause when capturing data (CASSANDRA-13473)
 
 
 3.11.1


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ed024395/bin/cqlsh.py
--
diff --git a/bin/cqlsh.py b/bin/cqlsh.py
index 4e634ca..2e10490 100644
--- a/bin/cqlsh.py
+++ b/bin/cqlsh.py
@@ -1084,7 +1084,9 @@ class Shell(cmd.Cmd):
                 num_rows += len(result.current_rows)
                 self.print_static_result(result, table_meta)
                 if result.has_more_pages:
-                    raw_input("---MORE---")
+                    if self.shunted_query_out is None:
+                        # Only pause when not capturing.
+                        raw_input("---MORE---")
                     result.fetch_next_page()
                 else:
                     break
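The change amounts to this: keep paging, but only block on the ---MORE--- prompt when output is not being captured. Below is a standalone sketch of that loop. Pages are modelled as a list of row lists, and the capture state is passed in as a flag. cqlsh itself checks {{self.shunted_query_out}} and uses Python 2's {{raw_input}}. The function name and parameters are hypothetical.

```python
def print_paged(pages, emit, capturing, prompt=input):
    """Emit rows page by page; pause between pages only when not capturing.

    pages: list of pages, each a list of rows. emit: called once per row.
    capturing: True when output is redirected (cqlsh checks self.shunted_query_out).
    prompt: blocks for the user; cqlsh calls raw_input("---MORE---") on Python 2.
    """
    num_rows = 0
    for i, page in enumerate(pages):
        num_rows += len(page)
        for row in page:
            emit(row)
        has_more_pages = i + 1 < len(pages)
        if has_more_pages and not capturing:
            prompt("---MORE---")  # nobody reads the screen while capturing, so don't wait
    return num_rows


# Example: while capturing, all pages are emitted without waiting for input.
print(print_paged([["a", "b"], ["c"]], print, capturing=True))  # prints a, b, c, then 3
```

Passing {{prompt}} as a parameter keeps the sketch testable without a terminal. The decision itself is the same one the diff makes: prompt only when {{has_more_pages}} is true and nothing is capturing the output.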
[jira] [Updated] (CASSANDRA-13743) CAPTURE not easilly usable with PAGING
[ https://issues.apache.org/jira/browse/CASSANDRA-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13743:
---
Resolution: Fixed
Reviewer: Chris Lohfink
Fix Version/s: 4.0 (was: 4.x)
Status: Resolved (was: Ready to Commit)

Thanks guys, committed as {{ed0243954f9ab9c5c68a4516a836ab3710891d5b}}
[jira] [Updated] (CASSANDRA-13743) CAPTURE not easilly usable with PAGING
[ https://issues.apache.org/jira/browse/CASSANDRA-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13743:
---
Status: Ready to Commit (was: Patch Available)
[jira] [Commented] (CASSANDRA-13664) RangeFetchMapCalculator should not try to optimise 'trivial' ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122640#comment-16122640 ]

Jeff Jirsa commented on CASSANDRA-13664:

This is marked ready to commit. Is it good to go? That dtest run has already expired.

> RangeFetchMapCalculator should not try to optimise 'trivial' ranges
> ---
>
> Key: CASSANDRA-13664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13664
> Project: Cassandra
> Issue Type: Bug
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> RangeFetchMapCalculator (CASSANDRA-4650) tries to make the number of streams out of each node as even as possible.
> In a typical multi-dc ring, the nodes in the dcs are set up using token + 1, which creates many tiny ranges. If we only optimise over the number of streams, the amount of data streamed out of each node is likely to be unbalanced.
> We should ignore those trivial ranges and only optimise the big ones, then share the tiny ones over the nodes.
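The greedy sketch below illustrates separating trivial ranges from the ones worth optimising. Non-trivial ranges are assigned largest first to the candidate source with the fewest bytes assigned so far. Trivial ranges are then spread by count. This simple heuristic is swapped in for illustration and is not the optimisation {{RangeFetchMapCalculator}} performs. The size threshold, the {{(range_id, size_bytes, candidate_sources)}} layout and the function name are assumptions.

```python
def assign_sources(ranges, trivial_threshold):
    """Pick a source node for each range to fetch (hypothetical greedy sketch).

    ranges: list of (range_id, size_bytes, candidate_sources).
    Ranges smaller than trivial_threshold are treated as trivial.
    """
    bytes_out, ranges_out, assignment = {}, {}, {}

    def take(range_id, size, src):
        assignment[range_id] = src
        bytes_out[src] = bytes_out.get(src, 0) + size
        ranges_out[src] = ranges_out.get(src, 0) + 1

    big = sorted((r for r in ranges if r[1] >= trivial_threshold), key=lambda r: -r[1])
    tiny = [r for r in ranges if r[1] < trivial_threshold]

    # Balance the ranges that matter by bytes: largest first, least-loaded source.
    for range_id, size, sources in big:
        take(range_id, size, min(sources, key=lambda s: (bytes_out.get(s, 0), s)))
    # Then share the trivial ranges out by count; their bytes barely matter.
    for range_id, size, sources in tiny:
        take(range_id, size, min(sources, key=lambda s: (ranges_out.get(s, 0), s)))
    return assignment


example = [("r1", 1000, ["a", "b"]), ("r2", 900, ["a", "b"]),
           ("t1", 1, ["a", "b"]), ("t2", 1, ["a", "b"])]
print(assign_sources(example, trivial_threshold=10))
# {'r1': 'a', 'r2': 'b', 't1': 'a', 't2': 'b'}
```

Because the two big ranges are placed first, they land on different sources. The tiny ranges only even out the per-node stream count and cannot tip the byte balance.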
[3/6] cassandra git commit: sstabledump reports incorrect usage for argument order
sstabledump reports incorrect usage for argument order

Patch by Varun Barala; Reviewed by ZhaoYang for CASSANDRA-13532


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fab38456
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fab38456
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fab38456

Branch: refs/heads/trunk
Commit: fab384560311ec1f3043fbf6137093ea129afa68
Parents: eb6f03c
Author: Jeff Jirsa
Authored: Thu Aug 10 18:11:56 2017 -0700
Committer: Jeff Jirsa
Committed: Thu Aug 10 18:11:56 2017 -0700

--
 CHANGES.txt | 1 +
 src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 0b92a7e..2e9e8ad 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -17,6 +17,7 @@
  * Allow native function calls in CQLSSTableWriter (CASSANDRA-12606)
  * Fix secondary index queries on COMPACT tables (CASSANDRA-13627)
  * Nodetool listsnapshots output is missing a newline, if there are no snapshots (CASSANDRA-13568)
+ * sstabledump reports incorrect usage for argument order (CASSANDRA-13532)
 Merged from 2.2:
  * Prevent integer overflow on exabyte filesystems (CASSANDRA-13067)
  * Fix queries with LIMIT and filtering on clustering columns (CASSANDRA-11223)


http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/src/java/org/apache/cassandra/tools/SSTableExport.java
--
diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java
index cff1516..ac8ea61 100644
--- a/src/java/org/apache/cassandra/tools/SSTableExport.java
+++ b/src/java/org/apache/cassandra/tools/SSTableExport.java
@@ -254,7 +254,7 @@ public class SSTableExport
 
     private static void printUsage()
     {
-        String usage = String.format("sstabledump %n");
+        String usage = String.format("sstabledump %n");
         String header = "Dump contents of given SSTable to standard output in JSON format.";
         new HelpFormatter().printHelp(usage, header, options, "");
     }
[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1884dbe2 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1884dbe2 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1884dbe2 Branch: refs/heads/trunk Commit: 1884dbe288bf53af9359e4e1ad9e1cfc0d212f0c Parents: e018bec fab3845 Author: Jeff Jirsa Authored: Thu Aug 10 18:12:49 2017 -0700 Committer: Jeff Jirsa Committed: Thu Aug 10 18:13:24 2017 -0700 -- CHANGES.txt| 1 + src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1884dbe2/CHANGES.txt -- diff --cc CHANGES.txt index 3308287,2e9e8ad..c672675 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -18,9 -14,11 +18,10 @@@ Merged from 3.0 * Make concat work with iterators that have different subsets of columns (CASSANDRA-13482) * Set test.runners based on cores and memory size (CASSANDRA-13078) * Allow different NUMACTL_ARGS to be passed in (CASSANDRA-13557) - * Allow native function calls in CQLSSTableWriter (CASSANDRA-12606) * Fix secondary index queries on COMPACT tables (CASSANDRA-13627) * Nodetool listsnapshots output is missing a newline, if there are no snapshots (CASSANDRA-13568) + * sstabledump reports incorrect usage for argument order (CASSANDRA-13532) - Merged from 2.2: +Merged from 2.2: * Prevent integer overflow on exabyte filesystems (CASSANDRA-13067) * Fix queries with LIMIT and filtering on clustering columns (CASSANDRA-11223) * Fix potential NPE when resume bootstrap fails (CASSANDRA-13272) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1884dbe2/src/java/org/apache/cassandra/tools/SSTableExport.java --
[2/6] cassandra git commit: sstabledump reports incorrect usage for argument order
sstabledump reports incorrect usage for argument order Patch by Varun Barala; Reviewed by ZhaoYang for CASSANDRA-13532 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fab38456 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fab38456 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fab38456 Branch: refs/heads/cassandra-3.11 Commit: fab384560311ec1f3043fbf6137093ea129afa68 Parents: eb6f03c Author: Jeff Jirsa Authored: Thu Aug 10 18:11:56 2017 -0700 Committer: Jeff Jirsa Committed: Thu Aug 10 18:11:56 2017 -0700 -- CHANGES.txt| 1 + src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 0b92a7e..2e9e8ad 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -17,6 +17,7 @@ * Allow native function calls in CQLSSTableWriter (CASSANDRA-12606) * Fix secondary index queries on COMPACT tables (CASSANDRA-13627) * Nodetool listsnapshots output is missing a newline, if there are no snapshots (CASSANDRA-13568) + * sstabledump reports incorrect usage for argument order (CASSANDRA-13532) Merged from 2.2: * Prevent integer overflow on exabyte filesystems (CASSANDRA-13067) * Fix queries with LIMIT and filtering on clustering columns (CASSANDRA-11223) http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/src/java/org/apache/cassandra/tools/SSTableExport.java -- diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java index cff1516..ac8ea61 100644 --- a/src/java/org/apache/cassandra/tools/SSTableExport.java +++ b/src/java/org/apache/cassandra/tools/SSTableExport.java @@ -254,7 +254,7 @@ public class SSTableExport private static void printUsage() { -String usage = String.format("sstabledump %n"); +String usage = String.format("sstabledump %n"); String header = "Dump contents of given SSTable to standard output in JSON 
format."; new HelpFormatter().printHelp(usage, header, options, ""); }
[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d68357a4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d68357a4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d68357a4 Branch: refs/heads/trunk Commit: d68357a447a5296e8c5dfd097fbc092e586819fe Parents: ed02439 1884dbe Author: Jeff Jirsa Authored: Thu Aug 10 18:13:45 2017 -0700 Committer: Jeff Jirsa Committed: Thu Aug 10 18:14:14 2017 -0700 -- CHANGES.txt| 1 + src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d68357a4/CHANGES.txt -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d68357a4/src/java/org/apache/cassandra/tools/SSTableExport.java --
[1/6] cassandra git commit: sstabledump reports incorrect usage for argument order
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 eb6f03c89 -> fab384560 refs/heads/cassandra-3.11 e018bec8a -> 1884dbe28 refs/heads/trunk ed0243954 -> d68357a44 sstabledump reports incorrect usage for argument order Patch by Varun Barala; Reviewed by ZhaoYang for CASSANDRA-13532 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fab38456 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fab38456 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fab38456 Branch: refs/heads/cassandra-3.0 Commit: fab384560311ec1f3043fbf6137093ea129afa68 Parents: eb6f03c Author: Jeff Jirsa Authored: Thu Aug 10 18:11:56 2017 -0700 Committer: Jeff Jirsa Committed: Thu Aug 10 18:11:56 2017 -0700 -- CHANGES.txt| 1 + src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 0b92a7e..2e9e8ad 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -17,6 +17,7 @@ * Allow native function calls in CQLSSTableWriter (CASSANDRA-12606) * Fix secondary index queries on COMPACT tables (CASSANDRA-13627) * Nodetool listsnapshots output is missing a newline, if there are no snapshots (CASSANDRA-13568) + * sstabledump reports incorrect usage for argument order (CASSANDRA-13532) Merged from 2.2: * Prevent integer overflow on exabyte filesystems (CASSANDRA-13067) * Fix queries with LIMIT and filtering on clustering columns (CASSANDRA-11223) http://git-wip-us.apache.org/repos/asf/cassandra/blob/fab38456/src/java/org/apache/cassandra/tools/SSTableExport.java -- diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java index cff1516..ac8ea61 100644 --- a/src/java/org/apache/cassandra/tools/SSTableExport.java +++ b/src/java/org/apache/cassandra/tools/SSTableExport.java @@ -254,7 +254,7 @@ public class SSTableExport private static void printUsage() { -String usage = 
String.format("sstabledump %n"); +String usage = String.format("sstabledump %n"); String header = "Dump contents of given SSTable to standard output in JSON format."; new HelpFormatter().printHelp(usage, header, options, ""); }
[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1884dbe2 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1884dbe2 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1884dbe2 Branch: refs/heads/cassandra-3.11 Commit: 1884dbe288bf53af9359e4e1ad9e1cfc0d212f0c Parents: e018bec fab3845 Author: Jeff Jirsa Authored: Thu Aug 10 18:12:49 2017 -0700 Committer: Jeff Jirsa Committed: Thu Aug 10 18:13:24 2017 -0700 -- CHANGES.txt| 1 + src/java/org/apache/cassandra/tools/SSTableExport.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1884dbe2/CHANGES.txt -- diff --cc CHANGES.txt index 3308287,2e9e8ad..c672675 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -18,9 -14,11 +18,10 @@@ Merged from 3.0 * Make concat work with iterators that have different subsets of columns (CASSANDRA-13482) * Set test.runners based on cores and memory size (CASSANDRA-13078) * Allow different NUMACTL_ARGS to be passed in (CASSANDRA-13557) - * Allow native function calls in CQLSSTableWriter (CASSANDRA-12606) * Fix secondary index queries on COMPACT tables (CASSANDRA-13627) * Nodetool listsnapshots output is missing a newline, if there are no snapshots (CASSANDRA-13568) + * sstabledump reports incorrect usage for argument order (CASSANDRA-13532) - Merged from 2.2: +Merged from 2.2: * Prevent integer overflow on exabyte filesystems (CASSANDRA-13067) * Fix queries with LIMIT and filtering on clustering columns (CASSANDRA-11223) * Fix potential NPE when resume bootstrap fails (CASSANDRA-13272) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1884dbe2/src/java/org/apache/cassandra/tools/SSTableExport.java --
[jira] [Updated] (CASSANDRA-13532) sstabledump reports incorrect usage for argument order
[ https://issues.apache.org/jira/browse/CASSANDRA-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13532: --- Resolution: Fixed Fix Version/s: 4.0 3.11.1 3.0.15 Status: Resolved (was: Ready to Commit) Thanks all! Committed as {{fab384560311ec1f3043fbf6137093ea129afa68}} > sstabledump reports incorrect usage for argument order > -- > > Key: CASSANDRA-13532 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13532 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Ian Ilsley >Assignee: Varun Barala >Priority: Minor > Labels: lhf > Fix For: 3.0.15, 3.11.1, 4.0 > > Attachments: sstabledump#printUsage.patch > > > sstabledump usage reports > {{usage: sstabledump }} > However the actual usage is > {{sstabledump }} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13756) StreamingHistogram is not thread safe
xiangzhou xia created CASSANDRA-13756: - Summary: StreamingHistogram is not thread safe Key: CASSANDRA-13756 URL: https://issues.apache.org/jira/browse/CASSANDRA-13756 Project: Cassandra Issue Type: Bug Reporter: xiangzhou xia An optimization in CASSANDRA-13038 causes the spool to be flushed every time sum() is called. Since TreeMap is not thread safe, threads can get stuck when multiple threads call sum() at the same time, and eventually the CPU is pinned at 100% in that function. I think this issue is not limited to sum(): update() and merge() have the same problem, since they also modify the TreeMap. Adding a lock on bin solved the issue, but it also introduced extra overhead.
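The locking fix the reporter describes can be sketched roughly as follows. This is a hypothetical, heavily simplified class named LockedHistogram, not Cassandra's StreamingHistogram: it keeps a spool of raw points and a TreeMap of bins, flushes the spool on every sum() call as the report describes, and makes every method synchronized so that no two threads ever mutate the TreeMap at once. It does not merge bins or interpolate like the real histogram; it only illustrates the locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified histogram (not Cassandra's StreamingHistogram).
// Points are collected in a spool and moved into the TreeMap of bins when
// sum() is called. Every entry point is synchronized, so the non-thread-safe
// TreeMap is never modified by two threads at the same time.
final class LockedHistogram {
    private final TreeMap<Double, Long> bin = new TreeMap<>();
    private final Map<Double, Long> spool = new HashMap<>();

    // Records one occurrence of a point in the spool.
    synchronized void update(double point) {
        spool.merge(point, 1L, Long::sum);
    }

    // Moves all spooled points into the bins. Callers must hold this object's lock.
    private void flushSpool() {
        for (Map.Entry<Double, Long> e : spool.entrySet()) {
            bin.merge(e.getKey(), e.getValue(), Long::sum);
        }
        spool.clear();
    }

    // Counts points <= b. Exact here because bins are never merged,
    // so no interpolation is needed (unlike the real histogram).
    synchronized long sum(double b) {
        flushSpool();
        long total = 0;
        for (long count : bin.headMap(b, true).values()) {
            total += count;
        }
        return total;
    }
}
```

The overhead the reporter mentions is visible in this design: every sum() now contends for the same monitor as every update(), so a busy read path also slows down writers.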
[jira] [Updated] (CASSANDRA-13752) Corrupted SSTables created in 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannu Kröger updated CASSANDRA-13752: - Description: We have discovered issues with corrupted SSTables. {code} ERROR [SSTableBatchOpen:22] 2017-08-03 20:19:53,195 SSTableReader.java:577 - Cannot read sstable /cassandra/data/mykeyspace/mytable-7a4992800d5611e7b782cb90016f2d17/mc-35556-big=[Data.db, Statistics.db, Summary.db, Digest.crc32, CompressionInfo.db, TOC.txt, Index.db, Filter.db]; other IO error, skipping table java.io.EOFException: EOF after 1898 bytes out of 21093 at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:325) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:231) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:122) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:93) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:488) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:396) ~[apache-cassandra-3.11.0.jar:3.11.0] at 
org.apache.cassandra.io.sstable.format.SSTableReader$5.run(SSTableReader.java:561) ~[apache-cassandra-3.11.0.jar:3.11.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0] {code} Files look like this: {code} -rw-r--r--. 1 cassandra cassandra 3899251 Aug 7 08:37 mc-6166-big-CompressionInfo.db -rw-r--r--. 1 cassandra cassandra 16874421686 Aug 7 08:37 mc-6166-big-Data.db -rw-r--r--. 1 cassandra cassandra 10 Aug 7 08:37 mc-6166-big-Digest.crc32 -rw-r--r--. 1 cassandra cassandra 2930904 Aug 7 08:37 mc-6166-big-Filter.db -rw-r--r--. 1 cassandra cassandra 75880 Aug 7 08:37 mc-6166-big-Index.db -rw-r--r--. 1 cassandra cassandra 13762 Aug 7 08:37 mc-6166-big-Statistics.db -rw-r--r--. 1 cassandra cassandra 882008 Aug 7 08:37 mc-6166-big-Summary.db -rw-r--r--. 1 cassandra cassandra 92 Aug 7 08:37 mc-6166-big-TOC.txt {code} was: We have discovered issues with corrupted SSTables. 
{code} ERROR [SSTableBatchOpen:22] 2017-08-03 20:19:53,195 SSTableReader.java:577 - Cannot read sstable /cassandra/data/mykeyspace/mytable-7a4992800d5611e7b782cb90016f2d17/mc-35556-big=[Data.db, Statistics.db, Summary.db, Digest.crc32, CompressionInfo.db, TOC.txt, Index.db, Filter.db]; other IO error, skipping table java.io.EOFException: EOF after 1898 bytes out of 21093 at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:325) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.StatsMetadata$StatsMetadataSerializer.deserialize(StatsMetadata.java:231) ~[apache-cassandra-3.11.0.jar:3.11.0] at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:122) ~[apache-cassandra-3.11.0.jar:3.11.0] at
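The trace fails in ByteBufferUtil.readWithShortLength while StatsMetadata is being deserialized from Statistics.db: a length prefix asks for 21093 bytes, but the stream ends after 1898. The sketch below reproduces that failure mode with plain java.io classes. It is a stdlib analogue, not Cassandra's RebufferingInputStream, and ShortLengthRead is a hypothetical name. It only shows why a truncated Statistics.db surfaces as an EOFException; it says nothing about how the corruption in this ticket arose.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stdlib analogue (not Cassandra code) of reading a value that is
// prefixed by an unsigned 2-byte length.
final class ShortLengthRead {
    static byte[] readWithShortLength(DataInputStream in) throws IOException {
        int length = in.readUnsignedShort(); // length prefix, 0..65535
        byte[] value = new byte[length];
        in.readFully(value);                 // throws EOFException if the stream ends early
        return value;
    }

    // Builds a buffer whose prefix declares `declared` bytes but that holds only `actual`.
    static byte[] truncated(int declared, int actual) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(declared);
        out.write(new byte[actual]);
        out.flush();
        return bytes.toByteArray();
    }
}
```

Reading `truncated(21093, 1898)` through `readWithShortLength` throws `java.io.EOFException`, while a buffer whose prefix matches its payload reads back normally.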
[jira] [Commented] (CASSANDRA-13723) fix exception logging that should be consumed by placeholder to 'getMessage()' for new slf4j version
[ https://issues.apache.org/jira/browse/CASSANDRA-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121385#comment-16121385 ] ZhaoYang commented on CASSANDRA-13723: -- Thank you. > fix exception logging that should be consumed by placeholder to > 'getMessage()' for new slf4j version > > > Key: CASSANDRA-13723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13723 > Project: Cassandra > Issue Type: Bug >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Trivial > Fix For: 4.0 > > Attachments: CASSANDRA-13723.patch > > > The wrong tracing log will fail > {{materialized_views_test.py:TestMaterializedViews.view_tombstone_test}} and > impact clients. > Current log: {{Digest mismatch: {} on 127.0.0.1}} > Expected log: {{Digest mismatch: > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey... on 127.0.0.1}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13717) INSERT statement fails when Tuple type is used as clustering column with default DESC order
[ https://issues.apache.org/jira/browse/CASSANDRA-13717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121472#comment-16121472 ] Stavros Kontopoulos edited comment on CASSANDRA-13717 at 8/10/17 11:14 AM: --- [~jjirsa] I fixed it for trunk (version 4). I could backport it to 3.11 (version reported) as soon as it is verified that this fix is ok. Good to know about the test procedure, thanx a lot! I will check the unit tests. was (Author: skonto): [~jjirsa] I fixed it for trunk (version 4). I could backport it to 3.11 (version reported) as soon as it is verified that this fix is ok. Good to know about the test procedure thanx a lot. I will check the unit tests. > INSERT statement fails when Tuple type is used as clustering column with > default DESC order > --- > > Key: CASSANDRA-13717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13717 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.11 >Reporter: Anastasios Kichidis >Assignee: Stavros Kontopoulos >Priority: Critical > Attachments: example_queries.cql, fix_13717 > > > When a column family is created and a Tuple is used on clustering column with > default clustering order DESC, then the INSERT statement fails. > For example, the following table will make the INSERT statement fail with > error message "Invalid tuple type literal for tdemo of type > frozen>" , although the INSERT statement is correct > (works as expected when the default order is ASC) > {noformat} > create table test_table ( > id int, > tdemo tuple , > primary key (id, tdemo) > ) with clustering order by (tdemo desc); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13717) INSERT statement fails when Tuple type is used as clustering column with default DESC order
[ https://issues.apache.org/jira/browse/CASSANDRA-13717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121472#comment-16121472 ] Stavros Kontopoulos edited comment on CASSANDRA-13717 at 8/10/17 11:13 AM: --- [~jjirsa] I fixed it for trunk (version 4). I could backport it to 3.11 (version reported) as soon as it is verified that this fix is ok. Good to know about the test procedure thanx a lot. I will check the unit tests. was (Author: skonto): [~jjirsa] I fixed for trunk (version 4). I could backport it to 3.11 (version reported) as soon as it is verified that this fix is ok. Good to know about the test procedure thanx a lot. I will check the unit tests. > INSERT statement fails when Tuple type is used as clustering column with > default DESC order > --- > > Key: CASSANDRA-13717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13717 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.11 >Reporter: Anastasios Kichidis >Assignee: Stavros Kontopoulos >Priority: Critical > Attachments: example_queries.cql, fix_13717 > > > When a column family is created and a Tuple is used on clustering column with > default clustering order DESC, then the INSERT statement fails. > For example, the following table will make the INSERT statement fail with > error message "Invalid tuple type literal for tdemo of type > frozen>" , although the INSERT statement is correct > (works as expected when the default order is ASC) > {noformat} > create table test_table ( > id int, > tdemo tuple , > primary key (id, tdemo) > ) with clustering order by (tdemo desc); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13717) INSERT statement fails when Tuple type is used as clustering column with default DESC order
[ https://issues.apache.org/jira/browse/CASSANDRA-13717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121472#comment-16121472 ] Stavros Kontopoulos commented on CASSANDRA-13717: - [~jjirsa] I fixed for trunk (version 4). I could backport it to 3.11 (version reported) as soon as it is verified that this fix is ok. Good to know about the test procedure thanx a lot. I will check the unit tests. > INSERT statement fails when Tuple type is used as clustering column with > default DESC order > --- > > Key: CASSANDRA-13717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13717 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.11 >Reporter: Anastasios Kichidis >Assignee: Stavros Kontopoulos >Priority: Critical > Attachments: example_queries.cql, fix_13717 > > > When a column family is created and a Tuple is used on clustering column with > default clustering order DESC, then the INSERT statement fails. > For example, the following table will make the INSERT statement fail with > error message "Invalid tuple type literal for tdemo of type > frozen>" , although the INSERT statement is correct > (works as expected when the default order is ASC) > {noformat} > create table test_table ( > id int, > tdemo tuple , > primary key (id, tdemo) > ) with clustering order by (tdemo desc); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121506#comment-16121506 ] Stefan Podkowinski commented on CASSANDRA-11748: I'm not sure introducing a hard cap on pending outgoing pull requests and simply dropping anything beyond the cap is the way to go here. The good thing about the approach is that it's pretty much stateless, except for the atomic counter. But we should at least take the schema Ids and/or endpoints into account as well. It just doesn't make sense to queue 50 requests for the same schema Id and potentially drop requests for a different schema afterwards. Also, as already noted, issuing pulls in parallel is probably not what we want, as this could lead to the described OOM issue when too many responses get queued and applied at the same time. So I don't think we can get around managing some more state, such as schema Ids, endpoints, last request time, delay, etc., that we can use to schedule pulls more efficiently, by doing one request after another. But we should also not forget to look at the receiver side for incoming pull requests. Joining the cluster with a schema mismatch should not cause a node to answer each of those in parallel. If we keep track of pending incoming schema requests, we could introduce a delay before responding and create the schema mutations just once as a payload to be used for all of them. We might have to bump up the MIGRATION_REQUEST timeout in that case, but otherwise just delaying a few seconds should make a notable difference for nodes joining the cluster and having to answer many migration requests in a short time frame. > Schema version mismatch may leads to Casandra OOM at bootstrap during a > rolling upgrade process > --- > > Key: CASSANDRA-11748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11748 > Project: Cassandra > Issue Type: Bug > Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
> CentOS 6.6 > Occurred on different C* nodes in deployments of different scale (2G ~ 5G) >Reporter: Michael Fong >Assignee: Matt Byrd >Priority: Critical > Fix For: 3.0.x, 3.11.x, 4.x > > > We have observed multiple times that a multi-node C* (v2.0.17) cluster ran > into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. > Here is a brief outline of our rolling upgrade process: > 1. Update the schema on a node, and wait until all nodes are in schema version > agreement, via nodetool describecluster > 2. Restart a Cassandra node > 3. After restart, there is a chance that the restarted node has a different > schema version. > 4. All nodes in the cluster start to rapidly exchange schema information, and any > node could run into OOM. > The following is the system.log from one of our 2-node cluster test > beds > -- > Before rebooting node 2: > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 > MigrationManager.java (line 328) Gossiping my schema version > 4cb463f8-5376-3baf-8e88-a5cc6a94f58f > After rebooting node 2, > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) > Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b > Node 2 keeps submitting the migration task over 100+ times to the other > node. > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node > /192.168.88.33 has restarted, now UP > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) > Updating topology for /192.168.88.33 > ... > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line > 102) Submitting migration task for /192.168.88.33 > ...
( over 100+ times) > -- > On the other hand, Node 1 keeps updating its gossip information, followed by > receiving and submitting migrationTask afterwards: > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line > 978) InetAddress /192.168.88.34 is now UP > ... > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 > MigrationRequestVerbHandler.java (line 41) Received migration request from > /192.168.88.34. > …… ( over 100+ times) > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line > 127) submitting migration task for /192.168.88.34 > . (over 50+ times) > On a side note, we have over 200+ column families defined in the Cassandra > database, which may be related to this amount of rpc traffic. > P.S.2 The over requested schema migration task will eventually have >
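Stefan's suggestion of tracking schema versions and endpoints, so that pulls are deduplicated and issued one after another instead of in parallel, can be sketched as below. Everything here is hypothetical: SchemaPullScheduler, submit() and runNext() are illustrative names, not Cassandra's MigrationManager API, and the actual pull is an injected callback. Repeated submissions for a version that is already pending only record another endpoint that could serve it, so 50 notifications for the same schema version produce a single request. The receiver-side delay he mentions is not covered.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.BiConsumer;

// Hypothetical sketch (not Cassandra code): deduplicate outgoing schema pulls
// by schema version and issue them strictly one at a time.
final class SchemaPullScheduler {
    private final Deque<UUID> pending = new ArrayDeque<>();
    // For each pending version, the endpoints known to advertise it, in arrival order.
    private final Map<UUID, Set<String>> sources = new HashMap<>();
    private final BiConsumer<UUID, String> pull;

    SchemaPullScheduler(BiConsumer<UUID, String> pull) {
        this.pull = pull;
    }

    // Returns true only if this call enqueued a new pull for the version.
    synchronized boolean submit(UUID version, String endpoint) {
        Set<String> known = sources.get(version);
        if (known != null) {
            known.add(endpoint); // remember an alternative source, no extra request
            return false;
        }
        known = new LinkedHashSet<>();
        known.add(endpoint);
        sources.put(version, known);
        pending.addLast(version);
        return true;
    }

    // Issues at most one pull, to the first endpoint seen for the oldest pending
    // version. The entry is then forgotten, so a later submit() for the same
    // version (e.g. after a failed pull) enqueues it again.
    synchronized boolean runNext() {
        UUID version = pending.pollFirst();
        if (version == null) {
            return false;
        }
        String endpoint = sources.remove(version).iterator().next();
        pull.accept(version, endpoint);
        return true;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A caller would drive runNext() from a single thread, waiting for each response to be applied before issuing the next pull, which is the "one request after another" behaviour described in the comment.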
[jira] [Commented] (CASSANDRA-13717) INSERT statement fails when Tuple type is used as clustering column with default DESC order
[ https://issues.apache.org/jira/browse/CASSANDRA-13717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121536#comment-16121536 ] Stavros Kontopoulos commented on CASSANDRA-13717: - I added a test there in TupleTypeTest, updated the branch. > INSERT statement fails when Tuple type is used as clustering column with > default DESC order > --- > > Key: CASSANDRA-13717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13717 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.11 >Reporter: Anastasios Kichidis >Assignee: Stavros Kontopoulos >Priority: Critical > Attachments: example_queries.cql, fix_13717 > > > When a column family is created and a Tuple is used on clustering column with > default clustering order DESC, then the INSERT statement fails. > For example, the following table will make the INSERT statement fail with > error message "Invalid tuple type literal for tdemo of type > frozen>" , although the INSERT statement is correct > (works as expected when the default order is ASC) > {noformat} > create table test_table ( > id int, > tdemo tuple , > primary key (id, tdemo) > ) with clustering order by (tdemo desc); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13717) INSERT statement fails when Tuple type is used as clustering column with default DESC order
[ https://issues.apache.org/jira/browse/CASSANDRA-13717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121536#comment-16121536 ] Stavros Kontopoulos edited comment on CASSANDRA-13717 at 8/10/17 12:29 PM: --- [~jjirsa] I added a test there in TupleTypeTest, updated the branch. How can I update the patch? Should I cancel it and add a new one? was (Author: skonto): I added a test there in TupleTypeTest, updated the branch. > INSERT statement fails when Tuple type is used as clustering column with > default DESC order > --- > > Key: CASSANDRA-13717 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13717 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.11 >Reporter: Anastasios Kichidis >Assignee: Stavros Kontopoulos >Priority: Critical > Attachments: example_queries.cql, fix_13717 > > > When a column family is created and a Tuple is used on clustering column with > default clustering order DESC, then the INSERT statement fails. > For example, the following table will make the INSERT statement fail with > error message "Invalid tuple type literal for tdemo of type > frozen>" , although the INSERT statement is correct > (works as expected when the default order is ASC) > {noformat} > create table test_table ( > id int, > tdemo tuple , > primary key (id, tdemo) > ) with clustering order by (tdemo desc); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views
[ https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta resolved CASSANDRA-13162. - Resolution: Fixed > Batchlog replay is throttled during bootstrap, creating conditions for > incorrect query results on materialized views > > > Key: CASSANDRA-13162 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13162 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Wei Deng >Assignee: Andrés de la Peña >Priority: Critical > Labels: bootstrap, materializedviews > > I've tested this in a C* 3.0 cluster with a couple of Materialized Views > defined (one base table and two MVs on that base table). The data volume is > not very high per node (about 80GB of data per node total, and that > particular base table has about 25GB of data uncompressed with one MV taking > 18GB compressed and the other MV taking 3GB), and the cluster is using decent > hardware (EC2 C4.8XL with 18 cores + 60GB RAM + 18K IOPS RAID0 from two 3TB > gp2 EBS volumes). > This was originally a 9-node cluster. It appears that after adding 3 more > nodes to the DC, the system.batches table accumulated a lot of data on the 3 > new nodes (each having around 20GB under system.batches directory), and in > the subsequent week the batchlog on the 3 new nodes got slowly replayed back > to the rest of the nodes in the cluster. The bottleneck seems to be the > throttling defined in this cassandra.yaml setting: > batchlog_replay_throttle_in_kb, which by default is set to 1MB/s. > Given that it is taking almost a week (and still hasn't finished) for the > batchlog (from MV) to be replayed after the bootstrap finishes, it seems only > reasonable to unthrottle (or at least give it a much higher throttle rate) > during the initial bootstrap, and hence I'd consider this a bug for our > current MV implementation.
> Also as far as I understand, the bootstrap logic won't wait for the > backlogged batchlog to be fully replayed before changing the new > bootstrapping node to "UN" state, and if batchlog for the MVs got stuck in > this state for a long time, we basically will get wrong answers on the MVs > during that whole duration (until batchlog is fully played to the cluster), > which adds even more criticality to this bug. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
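As a rough sanity check on the timeline in the report, the sketch below estimates how long a node would need to drain its batchlog backlog under the throttle. It assumes, as the stock cassandra.yaml describes batchlog_replay_throttle_in_kb, that the configured value is a cluster-wide total reduced proportionally to the number of nodes; the integer-KB split, the helper names (`per_node_rate_bytes`, `estimate_replay_seconds`), and the use of the ~20GB on-disk size of system.batches as the replayed byte count are illustrative assumptions, not Cassandra's actual code.

```python
"""Back-of-the-envelope batchlog replay estimate (illustrative sketch only).

Assumption: the total batchlog_replay_throttle_in_kb is split evenly
across every node in the cluster using integer KB division. The helper
names below are hypothetical and are not Cassandra APIs.
"""

GIB = 1024 ** 3  # bytes per GiB


def per_node_rate_bytes(throttle_kb: int, node_count: int) -> int:
    """Per-node replay rate in bytes/s under the even-split assumption."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (throttle_kb // node_count) * 1024


def estimate_replay_seconds(backlog_bytes: int, throttle_kb: int, node_count: int) -> float:
    """Lower bound on the time needed to replay backlog_bytes at the throttled rate."""
    rate = per_node_rate_bytes(throttle_kb, node_count)
    if rate <= 0:
        # A zero per-node rate (no usable throttle, or one that rounds
        # down to 0 KB/s per node) gives no bound we can compute here.
        raise ValueError("per-node rate must be positive")
    return backlog_bytes / rate


if __name__ == "__main__":
    backlog = 20 * GIB  # ~20GB under system.batches per new node, from the report
    # 1 node = no split; 12 nodes = the 9 original nodes plus the 3 new ones.
    for nodes in (1, 12):
        secs = estimate_replay_seconds(backlog, 1024, nodes)
        print(f"{nodes:>2} node(s): {secs / 3600:.1f} h ({secs / 86400:.2f} days)")
```

Under these assumptions, splitting the 1MB/s total across at least 12 nodes cuts the per-node rate to 85 KB/s and stretches the lower bound from under 6 hours to almost 3 days. Because on-disk size is only a proxy for the bytes actually replayed, and a multi-DC cluster would divide the throttle further, the real duration can be longer still.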
[jira] [Commented] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views
[ https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122833#comment-16122833 ] Paulo Motta commented on CASSANDRA-13162: - Closing as this was superseded by CASSANDRA-13614 and CASSANDRA-13065.