[jira] [Updated] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths
[ https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Roth updated CASSANDRA-13065:
--------------------------------------
    Fix Version/s: 4.0

> Consistent range movements to not require MV updates to go through write paths
> ------------------------------------------------------------------------------
>
>          Key: CASSANDRA-13065
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13065
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Benjamin Roth
>     Assignee: Benjamin Roth
>     Priority: Critical
>      Fix For: 4.0
>
> Bootstrapping or decommissioning nodes with MVs is unbearably slow, as all
> streams go through the regular write path. This causes read-before-writes for
> every mutation, and during bootstrap it additionally sends mutations to the
> batchlog. This makes it virtually impossible to boot a new node in an
> acceptable amount of time.
> Using the regular streaming behaviour for consistent range movements works
> much better in this case and does not break the MV local consistency contract.
> Already tested on our own cluster.
> The bootstrap case is easy to handle; the decommission case requires
> CASSANDRA-13064.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths
[ https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889675#comment-15889675 ]

Benjamin Roth commented on CASSANDRA-13065:
-------------------------------------------

[~pauloricardomg] This is the follow-up to CASSANDRA-13064. I also optimized
the behaviour for CDC when no write path is required due to MVs. This will
allow incremental repairs for CFs with CDC but without MVs.
[jira] [Updated] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths
[ https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Roth updated CASSANDRA-13065:
--------------------------------------
    Status: Patch Available  (was: Open)

https://github.com/Jaumo/cassandra/commit/95a215e4f9c46e62580dcd4f638c80d3cf9716db
[jira] [Commented] (CASSANDRA-13064) Add stream type or purpose to stream plan / stream
[ https://issues.apache.org/jira/browse/CASSANDRA-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889672#comment-15889672 ]

Benjamin Roth commented on CASSANDRA-13064:
-------------------------------------------

[~pauloricardomg] Would you like to take a look at my patch? For a start I
only replaced stream descriptions by a discrete enum. It's the easiest
refactoring that does not break compatibility with existing serialization.
If you want, you can also take a look at the next commit, which belongs to
CASSANDRA-13065.

> Add stream type or purpose to stream plan / stream
> --------------------------------------------------
>
>          Key: CASSANDRA-13064
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13064
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Benjamin Roth
>     Assignee: Benjamin Roth
>      Fix For: 4.0
>
> It would be very good to know the type or purpose of a certain stream on the
> receiver side. It should be available in both a stream request and a stream
> task.
> Why? It would be helpful to distinguish the purpose to allow different
> handling of streams and requests. Examples:
> - In a stream request a global flush is done. This is not necessary for all
>   types of streams. A repair stream (plan) does not require a flush, as one
>   has been done shortly before in validation compaction, and only the
>   sstables that have been validated have to be streamed.
> - In StreamReceiveTask, streams for MVs go through the regular write path.
>   This is painfully slow, especially on bootstrap and decommission, and in
>   both cases it is not necessary: sstables can be streamed down directly.
>   Handling bootstrap is no problem as it relies on a local state, but during
>   decommission the decom-state is bound to the sender and not the receiver,
>   so the receiver has to know that it is safe to stream that sstable
>   directly, not through the write path. That's why we have to know the
>   purpose of the stream.
> I'd love to implement this on my own but I am not sure how not to break the
> streaming protocol for backwards compat, or if it is ok to do so.
> Furthermore I'd love to get some feedback on that idea and some proposals
> for what stream types to distinguish. I could imagine:
> - bootstrap
> - decommission
> - repair
> - replace node
> - remove node
> - range relocation
> Comments like this support my idea; knowing the purpose could avoid this:
> {quote}
> // TODO each call to transferRanges re-flushes, this is
> potentially a lot of waste
> streamPlan.transferRanges(newEndpoint, preferred,
> keyspaceName, ranges);
> {quote}
> An alternative to passing the purpose of the stream would be to pass flags
> like:
> - requiresFlush
> - requiresWritePathForMaterializedView
> ...
> I guess passing the purpose will make the streaming protocol more robust for
> future changes and leaves decisions up to the receiver. But an additional
> "requiresFlush" would also avoid putting too much logic into the streaming
> code. The streaming code should not care about purposes; the caller or
> receiver should. So the decision whether a stream requires a flush before
> streaming should be up to the stream requester and the stream request
> receiver, depending on the purpose of the stream.
> I'm excited about your feedback :)
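The discrete enum proposed above, combined with the alternative per-purpose flags, could be sketched roughly as follows. This is illustrative only: the operation names come from the list in the ticket, but the flag mappings and the enum/method names are assumptions, not the actual patch.

```java
/**
 * Sketch of a stream-purpose enum as proposed in CASSANDRA-13064.
 * The requiresFlush / requiresWritePathForMaterializedView mappings
 * below are illustrative assumptions, not the committed behaviour.
 */
enum StreamOperation
{
    BOOTSTRAP       (true,  false),  // consistent range movement: direct sstable stream (CASSANDRA-13065)
    DECOMMISSION    (true,  false),  // same, but needs CASSANDRA-13064 so the receiver knows
    REPAIR          (false, true),   // already flushed during validation compaction
    REPLACE_NODE    (true,  true),
    REMOVE_NODE     (true,  true),
    RANGE_RELOCATION(true,  true);

    private final boolean requiresFlush;
    private final boolean requiresWritePathForMaterializedView;

    StreamOperation(boolean requiresFlush, boolean requiresWritePathForMaterializedView)
    {
        this.requiresFlush = requiresFlush;
        this.requiresWritePathForMaterializedView = requiresWritePathForMaterializedView;
    }

    public boolean requiresFlush() { return requiresFlush; }
    public boolean requiresWritePathForMaterializedView() { return requiresWritePathForMaterializedView; }

    public static void main(String[] args)
    {
        for (StreamOperation op : values())
            System.out.println(op + " flush=" + op.requiresFlush()
                               + " mvWritePath=" + op.requiresWritePathForMaterializedView());
    }
}
```

This keeps the purpose on the wire (robust for future changes) while letting the requester and receiver derive the flags locally, as the ticket suggests.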
[jira] [Updated] (CASSANDRA-13064) Add stream type or purpose to stream plan / stream
[ https://issues.apache.org/jira/browse/CASSANDRA-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Roth updated CASSANDRA-13064:
--------------------------------------
    Fix Version/s: 4.0
           Status: Patch Available  (was: Open)

https://github.com/Jaumo/cassandra/commit/4189c949336f3c7e4ba25da80fdd7da5faa2ea65
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889664#comment-15889664 ]

Marcus Eriksson commented on CASSANDRA-13153:
---------------------------------------------

Makes sense, but let's add this if/when we do that change?

> Reappeared Data when Mixing Incremental and Full Repairs
> --------------------------------------------------------
>
>          Key: CASSANDRA-13153
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>      Project: Cassandra
>   Issue Type: Bug
>   Components: Compaction, Tools
>  Environment: Apache Cassandra 2.2
>     Reporter: Amanda Debrot
>     Assignee: Stefan Podkowinski
>       Labels: Cassandra
>  Attachments: log-Reappeared-Data.txt,
> Step-by-Step-Simulate-Reappeared-Data.txt
>
> This happens for both LeveledCompactionStrategy and
> SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2,
> but it most likely also affects all Cassandra versions after 2.2 that have
> anticompaction with full repair.
> When mixing incremental and full repairs, there are a few scenarios where
> the data SSTable is marked as unrepaired and the tombstone SSTable is marked
> as repaired. Then, if it is past gc_grace and the tombstone and data have
> been compacted out on the other replicas, the next incremental repair will
> push the data to the other replicas without the tombstone.
> Simplified scenario: 3 node cluster with RF=3.
> Initial config:
>   Node 1 has data and tombstone in separate SSTables.
>   Node 2 has data and no tombstone.
>   Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day, so now we have
> the tombstone on each node. Some minor compactions have happened since, so
> data and tombstone get merged into one SSTable on nodes 1 and 3:
>   Node 1 had a minor compaction that merged data with tombstone: 1 SSTable
> with tombstone.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 had a minor compaction that merged data with tombstone: 1 SSTable
> with tombstone.
> Incremental repairs keep running every day. Full repairs run weekly
> (nodetool repair -full -pr). Now there are 2 scenarios where the data
> SSTable will get marked as "Unrepaired" while the tombstone SSTable will get
> marked as "Repaired".
> Scenario 1:
>   Since the data and tombstone SSTables have been marked as "Repaired" and
> anticompacted, they have had minor compactions with other SSTables
> containing keys from other ranges. During full repair, if the last node to
> run it doesn't own this particular key in its partitioner range, the data
> and tombstone SSTables will get anticompacted and marked as "Unrepaired".
> Now, in the next incremental repair, if the data SSTable is involved in a
> minor compaction during the repair but the tombstone SSTable is not, the
> resulting compacted SSTable will be marked "Unrepaired" while the tombstone
> SSTable is marked "Repaired".
> Scenario 2:
>   Only the data SSTable had a minor compaction with other SSTables
> containing keys from other ranges after being marked as "Repaired". The
> tombstone SSTable was never involved in a minor compaction, so all keys in
> that SSTable belong to one particular partitioner range. During full repair,
> if the last node to run it doesn't own this particular key in its
> partitioner range, the data SSTable will get anticompacted and marked as
> "Unrepaired". The tombstone SSTable stays marked as "Repaired".
> Then it's past gc_grace. Since nodes 1 and 3 only have 1 SSTable for that
> key, the tombstone will get compacted out:
>   Node 1 has nothing.
>   Node 2 has data (in an unrepaired SSTable) and tombstone (in a repaired
> SSTable) in separate SSTables.
>   Node 3 has nothing.
> Now, when the next incremental repair runs, it will only use the data
> SSTable to build the merkle tree, since the tombstone SSTable is flagged as
> repaired and the data SSTable is marked as unrepaired. So the data will get
> repaired against the other two nodes:
>   Node 1 has data.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 has data.
> If a read request hits nodes 1 and 3, it will return data. If it hits 1 and
> 2, or 2 and 3, however, it will return no data.
> Tested this with single-range tokens for simplicity.
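The resurrection mechanism above hinges on one rule: incremental repair only considers unrepaired SSTables when building merkle trees. A toy model of node 2's final state makes this visible (the class and method names are hypothetical, not Cassandra code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of the CASSANDRA-13153 scenario: incremental repair only
 * hashes unrepaired SSTables, so a tombstone stranded in a repaired
 * SSTable is invisible to it while the matching data is not.
 * Illustrative only.
 */
class RepairModel
{
    static class SSTable
    {
        final String contents;
        final boolean repaired;
        SSTable(String contents, boolean repaired) { this.contents = contents; this.repaired = repaired; }
    }

    /** What incremental repair would feed into the merkle tree on this node. */
    static List<String> merkleInput(List<SSTable> sstables)
    {
        List<String> input = new ArrayList<>();
        for (SSTable s : sstables)
            if (!s.repaired)            // repaired SSTables are skipped entirely
                input.add(s.contents);
        return input;
    }

    public static void main(String[] args)
    {
        // Node 2 from the scenario: data unrepaired, tombstone stranded as repaired.
        List<SSTable> node2 = Arrays.asList(new SSTable("data", false),
                                            new SSTable("tombstone", true));
        // Only "data" is compared -- and hence re-streamed -- against nodes 1 and 3.
        System.out.println(merkleInput(node2));
    }
}
```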
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889653#comment-15889653 ]

Benjamin Roth commented on CASSANDRA-13241:
-------------------------------------------

I thought of 2 arrays because a semantic meaning (position vs chunk size) and
a single alignment (8, 3, or 2 bytes) for each could be easier to understand
and to maintain. Of course it works either way. With 2 arrays you could still
"pull sections"; it's just a single extra fetch to get the 8-byte absolute
offset.

Loop summing vs. "relative-absolute offset": in the end this is always a
tradeoff between memory and CPU. I personally am not the one who fights for
every single byte in this case, but I also think some extra CPU cycles to sum
a bunch of ints are still bearable. If I had to decide, I'd give "loop
summing" a try. Any different opinions?

Do you mean a ChunkCache cache miss? Sorry for that kind of question, I never
came across this part of the code.

> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
>          Key: CASSANDRA-13241
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
>      Project: Cassandra
>   Issue Type: Wish
>   Components: Core
>     Reporter: Benjamin Roth
>
> Having too low a chunk size may result in some wasted disk space. Too high a
> chunk size may lead to massive overreads and can have a critical impact on
> overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and
> average reads of 200MB/s. After lowering the chunk size (of course aligned
> with read ahead), the average read IO went below 20MB/s, more like 10-15MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) /
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but
> if the model consists rather of small rows or small result sets, the read
> overhead with a 64kb chunk size is insanely high. This applies, for example,
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460GB data,
> 128GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic
>   snitch magic": https://cl.ly/3E0t1T1z2c0J
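The "loop summing" variant under discussion could look roughly like this. It is a sketch, not Cassandra code: the plain `long[]`/`int[]` fields stand in for the packed 8/3/2-byte on-disk encoding, and the section size of 64 chunks is an assumption.

```java
/**
 * Sketch of the "loop summing" chunk-offset layout discussed above:
 * one absolute 8-byte offset per section of chunks, plus a compact
 * per-chunk compressed size. Resolving a chunk's position sums the
 * sizes of the preceding chunks in its section.
 */
class CompactChunkOffsets
{
    static final int CHUNKS_PER_SECTION = 64;   // assumption for illustration

    private final long[] sectionBase;  // absolute file offset of the first chunk in each section
    private final int[] chunkSize;     // compressed size per chunk (2-3 bytes each on disk)

    CompactChunkOffsets(long[] sectionBase, int[] chunkSize)
    {
        this.sectionBase = sectionBase;
        this.chunkSize = chunkSize;
    }

    /** Absolute offset of chunk i: section base + sizes of earlier chunks in the section. */
    long offsetOf(int i)
    {
        int section = i / CHUNKS_PER_SECTION;
        long offset = sectionBase[section];
        for (int j = section * CHUNKS_PER_SECTION; j < i; j++)
            offset += chunkSize[j];
        return offset;
    }

    public static void main(String[] args)
    {
        CompactChunkOffsets idx = new CompactChunkOffsets(new long[]{ 0L },
                                                          new int[]{ 100, 200, 300 });
        System.out.println(idx.offsetOf(2)); // 0 + 100 + 200 = 300
    }
}
```

The worst-case lookup sums 63 small values per section; the competing single-array layout trades those additions for wider absolute-within-section entries.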
[jira] [Updated] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13265:
-----------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Awaiting Feedback)

Closing

> Communication breakdown in OutboundTcpConnection
> ------------------------------------------------
>
>          Key: CASSANDRA-13265
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>      Project: Cassandra
>   Issue Type: Bug
>  Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version
> 1.8.0_112-b15)
> Linux 3.16
>     Reporter: Christian Esken
>     Assignee: Christian Esken
>  Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz,
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
> I observed that sometimes a single node in a Cassandra cluster fails to
> communicate with the other nodes. This can happen at any time, during peak
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the
> situation and am already developing a possible fix. Here is the analysis so
> far:
> - A thread dump in this situation showed 324 threads in the
>   OutboundTcpConnection class that want to lock the backlog queue for doing
>   expiration.
> - A class histogram shows 262508 instances of
>   OutboundTcpConnection$QueuedMessage.
> What is the effect? As soon as the Cassandra node has reached a certain
> number of queued messages, it starts thrashing itself to death. Each of the
> threads fully locks the queue for reading and writing by calling
> iterator.next(), making the situation worse and worse.
> - Writing: only after 262508 locking operations can a thread progress to
>   actually writing to the queue.
> - Reading: also blocked, as 324 threads try to do iterator.next() and fully
>   lock the queue.
> This means writing blocks the queue for reading, and readers might even be
> starved, which makes the situation even worse.
> The setup is:
> - 3-node cluster
> - replication factor 2
> - consistency LOCAL_ONE
> - no remote DCs
> - high write throughput (10 INSERT statements per second and more during
>   peak times)
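One direction that follows from the analysis above is to ensure at most one thread at a time walks the queue for expiration, e.g. guarded by a CAS, so writers never pile up on the iterator. The sketch below illustrates that idea only; the class and method names are hypothetical and this is not the actual fix developed for the ticket.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch: avoid the thrashing described above by letting at most one
 * thread expire the backlog at a time. Every other writer just enqueues
 * and moves on instead of contending for the queue's iterator.
 */
class Backlog
{
    static class QueuedMessage
    {
        final long timestampNanos;
        QueuedMessage(long timestampNanos) { this.timestampNanos = timestampNanos; }
        boolean isTimedOut(long nowNanos, long timeoutNanos) { return nowNanos - timestampNanos > timeoutNanos; }
    }

    private final Queue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    void enqueue(QueuedMessage m, long nowNanos, long timeoutNanos)
    {
        backlog.add(m);
        // Only the thread that wins the CAS performs expiration.
        if (expiring.compareAndSet(false, true))
        {
            try { expire(nowNanos, timeoutNanos); }
            finally { expiring.set(false); }
        }
    }

    private void expire(long nowNanos, long timeoutNanos)
    {
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
            if (it.next().isTimedOut(nowNanos, timeoutNanos))
                it.remove();
    }

    int size() { return backlog.size(); }

    public static void main(String[] args)
    {
        Backlog b = new Backlog();
        b.enqueue(new QueuedMessage(0), 500, 1000);      // fresh: kept
        b.enqueue(new QueuedMessage(1900), 2000, 1000);  // first message now expired
        System.out.println("backlog size after expiry: " + b.size());
    }
}
```

A lock-free queue plus a single-expirer guard removes the "324 threads in iterator.next()" pile-up, at the cost of expiration being best-effort rather than immediate.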
[jira] [Updated] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment
[ https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13282:
-----------------------------------
    Description:

Following CASSANDRA-9749, stricter correctness checks on commitlog replay can
incorrectly detect "corrupt segments" and stop commitlog replay (and
potentially stop cassandra, depending on the configured policy).

In {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int
{{serializedSize}}, and if it's 0 (which will happen due to zeroing when the
segment was created), we continue on to the next segment. However, it appears
that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining
in the segment, we'll pass the {{isEOF}} check on the while loop but fail to
read the {{serializedSize}} int, and fail.

  (was: Following CASSANDRA-9749, stricter correctness checks on commitlog
replay can incorrectly detect "corrupt segments" and stop commitlog replay
(and potentially stop cassandra, depending on the configured policy). In
{{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int
{{serializedSize}}, and if it's 0 (which will happen due to zeroing when the
segment was created), we continue on to the next segment. However, it appears
that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining
in the segment, we'll hit pass the {{isEOF}} on the while loop but fail to
read the {{serializedSize}} int, and fail.)

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> ------------------------------------------------------------------------------
>
>          Key: CASSANDRA-13282
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
>      Project: Cassandra
>   Issue Type: Bug
>   Components: Core
>     Reporter: Jeff Jirsa
>     Assignee: Jeff Jirsa
>      Fix For: 3.0.x, 3.11.x, 4.x
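The boundary condition described above amounts to treating "fewer than 4 bytes left" the same as end-of-segment before attempting to read {{serializedSize}}. A minimal sketch of that check, on a plain ByteBuffer rather than Cassandra's actual reader types:

```java
import java.nio.ByteBuffer;

/**
 * Sketch of the CASSANDRA-13282 boundary condition on a plain ByteBuffer.
 * If fewer than 4 bytes remain, there cannot be another serializedSize
 * int, so replay should treat it like end-of-segment instead of failing.
 */
class SegmentScan
{
    /** Returns -1 when the segment is exhausted, including the 1-3 trailing-byte case. */
    static int nextSerializedSize(ByteBuffer segment)
    {
        if (segment.remaining() < Integer.BYTES)  // the case the fix guards against
            return -1;
        int size = segment.getInt();
        return size == 0 ? -1 : size;             // zeroed tail: no more mutations
    }

    public static void main(String[] args)
    {
        ByteBuffer segment = ByteBuffer.allocate(7); // one int plus 3 trailing zero bytes
        segment.putInt(42);
        segment.position(0);
        System.out.println(nextSerializedSize(segment)); // 42
        System.out.println(nextSerializedSize(segment)); // -1: only 3 bytes remain
    }
}
```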
[jira] [Updated] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment
[ https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-13282:
-----------------------------------
    Component/s: Core
[jira] [Created] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment
Jeff Jirsa created CASSANDRA-13282:
-----------------------------------

             Summary: Commitlog replay may fail if last mutation is within 4 bytes of end of segment
                 Key: CASSANDRA-13282
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jeff Jirsa
            Assignee: Jeff Jirsa
             Fix For: 3.0.x, 3.11.x, 4.x

Following CASSANDRA-9749, stricter correctness checks on commitlog replay can
incorrectly detect "corrupt segments" and stop commitlog replay (and
potentially stop cassandra, depending on the configured policy). In
{{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int
{{serializedSize}}, and if it's 0 (which will happen due to zeroing when the
segment was created), we continue on to the next segment. However, it appears
that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining
in the segment, we'll hit pass the {{isEOF}} on the while loop but fail to
read the {{serializedSize}} int, and fail.
[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888953#comment-15888953 ]

Ariel Weisberg edited comment on CASSANDRA-13241 at 3/1/17 2:50 AM:
--------------------------------------------------------------------

[~brstgt] That is basically what I was thinking, but don't keep two separate
arrays. Do it in a single array, so that when you cache miss you pull in the
entire section you are looking for. Assuming 128-byte alignment you would get
one 8-byte value and then 60 2-byte values. It could also be 40 3-byte values
that are not relative to each other but just to the one absolute offset; then
you don't have to do loop summing.

was (Author: aweisberg):
[~brstgt] That is basically what I was thinking but don't keep two separate
arrays. Do it in a single array so that when you cache miss and you pull in
the entire section you are looking for. Assuming 128 byte alignment you would
get one 8 byte value and then 60 2-byte values. It could also be 40 3-byte
values that are not relative to each other but just the one absolute offset.
Then you don't have do a loop summing.
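The single-array layout suggested in this comment, in its 40 x 3-byte form, could be sketched as follows. Illustrative assumptions: 128-byte sections, big-endian packing, and that the first chunk of each section is addressed by the 8-byte absolute offset itself.

```java
/**
 * Sketch of the single-array section layout from the comment above:
 * each 128-byte section holds one 8-byte absolute offset followed by
 * 40 3-byte offsets relative to that base, so a lookup is one addition
 * instead of a summing loop.
 */
class SectionedChunkOffsets
{
    static final int SECTION_BYTES = 128;
    static final int ENTRIES_PER_SECTION = 41;  // 1 absolute + 40 relative: (128 - 8) / 3 = 40

    private final byte[] packed;

    SectionedChunkOffsets(byte[] packed) { this.packed = packed; }

    long offsetOf(int chunk)
    {
        int section = chunk / ENTRIES_PER_SECTION;
        int index = chunk % ENTRIES_PER_SECTION;
        int base = section * SECTION_BYTES;
        long absolute = readLong(base);
        if (index == 0)
            return absolute;
        return absolute + readUnsigned3(base + 8 + (index - 1) * 3);
    }

    private long readLong(int p)
    {
        long v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | (packed[p + i] & 0xFFL);
        return v;
    }

    private int readUnsigned3(int p)
    {
        return ((packed[p] & 0xFF) << 16) | ((packed[p + 1] & 0xFF) << 8) | (packed[p + 2] & 0xFF);
    }

    public static void main(String[] args)
    {
        byte[] packed = new byte[SECTION_BYTES];
        java.nio.ByteBuffer.wrap(packed).putLong(1000L);        // section base
        packed[8] = 0; packed[9] = 1; packed[10] = (byte) 0xF4; // chunk 1 at base + 500
        SectionedChunkOffsets idx = new SectionedChunkOffsets(packed);
        System.out.println(idx.offsetOf(0) + " " + idx.offsetOf(1)); // 1000 1500
    }
}
```

Pulling one 128-byte section per cache miss yields the whole neighbourhood of offsets, which is the point of the suggestion.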
[07/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/56d3f932
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/56d3f932
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/56d3f932

Branch: refs/heads/trunk
Commit: 56d3f9324909be7b59ea057fc280faec28532f84
Parents: df28bcf aa66c99
Author: Ariel Weisberg
Authored: Tue Feb 28 19:49:52 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:49:52 2017 -0500

----------------------------------------------------------------------
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

diff --cc CHANGES.txt
index 5cdc2e4,c27c2b1..a63bd12
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,10 -1,7 +1,11 @@@
-2.2.10
+3.0.12
+ * Faster StreamingHistogram (CASSANDRA-13038)
+ * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237)
+ * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070)
+ * Fix cqlsh COPY for dates before 1900 (CASSANDRA-13185)
+Merged from 2.2
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
- * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)
  * Fix failing COPY TO STDOUT (CASSANDRA-12497)
  * Fix ColumnCounter::countAll behaviour for reverse queries (CASSANDRA-13222)

diff --cc build.xml
index 2d2d313,d815ede..69b6bdf
--- a/build.xml
+++ b/build.xml
@@@ -159,8 -152,13 +160,9 @@@
[hunk garbled in transit; the XML element content was stripped]
@@@ -1131,10 -1181,9 +1133,10 @@@
           debug="true"
           debuglevel="${debuglevel}"
           destdir="${test.classes}"
-          includeantruntime="false"
+          includeantruntime="true"
           source="${source.version}"
-          target="${target.version}">
+          target="${target.version}"
+          encoding="utf-8">
[03/10] cassandra git commit: Fix "multiple versions of ant detected..." when running ant test
Fix "multiple versions of ant detected..." when running ant test

Patch by Michael Kjellman; Reviewed by Ariel Weisberg for CASSANDRA-13232

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/aa66c999
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/aa66c999
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/aa66c999

Branch: refs/heads/cassandra-3.11
Commit: aa66c999ad18e63f9c6b53a2da0750099ec7132c
Parents: 3748bf7
Author: Ariel Weisberg
Authored: Tue Feb 28 19:43:06 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:43:06 2017 -0500

----------------------------------------------------------------------
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

diff --git a/CHANGES.txt b/CHANGES.txt
index e7e0367..c27c2b1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.10
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)

diff --git a/build.xml b/build.xml
index 53a1b27..d815ede 100644
--- a/build.xml
+++ b/build.xml
@@ -147,10 +147,12 @@
[hunk garbled in transit; the XML element content was stripped]
@@ -1179,7 +1181,7 @@
          debug="true"
          debuglevel="${debuglevel}"
          destdir="${test.classes}"
-         includeantruntime="false"
+         includeantruntime="true"
          source="${source.version}"
          target="${target.version}">
[05/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/56d3f932
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/56d3f932
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/56d3f932

Branch: refs/heads/cassandra-3.11
Commit: 56d3f9324909be7b59ea057fc280faec28532f84
Parents: df28bcf aa66c99
Author: Ariel Weisberg
Authored: Tue Feb 28 19:49:52 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:49:52 2017 -0500

 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

(diff identical to the [07/10] message for commit 56d3f932)
[01/10] cassandra git commit: Fix "multiple versions of ant detected..." when running ant test
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.2 3748bf7c2 -> aa66c999a
  refs/heads/cassandra-3.0 df28bcfaa -> 56d3f9324
  refs/heads/cassandra-3.11 91c8d9157 -> 760d6c33d
  refs/heads/trunk 895e6ce11 -> 4bb5ada53

Fix "multiple versions of ant detected..." when running ant test

Patch by Michael Kjellman; Reviewed by Ariel Weisberg for CASSANDRA-13232

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/aa66c999
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/aa66c999
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/aa66c999

Branch: refs/heads/cassandra-2.2
Commit: aa66c999ad18e63f9c6b53a2da0750099ec7132c
Parents: 3748bf7
Author: Ariel Weisberg
Authored: Tue Feb 28 19:43:06 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:43:06 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e7e0367..c27c2b1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.10
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/build.xml
--
diff --git a/build.xml b/build.xml
index 53a1b27..d815ede 100644
--- a/build.xml
+++ b/build.xml
@@ -147,10 +147,12 @@ + +
@@ -1179,7 +1181,7 @@
       debug="true"
       debuglevel="${debuglevel}"
       destdir="${test.classes}"
-      includeantruntime="false"
+      includeantruntime="true"
       source="${source.version}"
       target="${target.version}">
[08/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/760d6c33
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/760d6c33
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/760d6c33

Branch: refs/heads/cassandra-3.11
Commit: 760d6c33d5727d94e89afb2e43d551f10a2721b7
Parents: 91c8d91 56d3f93
Author: Ariel Weisberg
Authored: Tue Feb 28 19:54:59 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:54:59 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/760d6c33/CHANGES.txt
--
diff --cc CHANGES.txt
index 1cced71,a63bd12..497f5c4
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,14 -1,4 +1,15 @@@
-3.0.12
+3.11.0
+ * Fix equality comparisons of columns using the duration type (CASSANDRA-13174)
+ * Obfuscate password in stress-graphs (CASSANDRA-12233)
+ * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034)
+ * nodetool stopdaemon errors out (CASSANDRA-13030)
+ * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954)
+ * Fix primary index calculation for SASI (CASSANDRA-12910)
+ * More fixes to the TokenAllocator (CASSANDRA-12990)
+ * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983)
+Merged from 3.0:
++ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
+ * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Faster StreamingHistogram (CASSANDRA-13038)
  * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237)
  * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/760d6c33/build.xml
--
diff --cc build.xml
index 0eef700,69b6bdf..dcaa780
--- a/build.xml
+++ b/build.xml
@@@ -156,12 -153,14 +156,14 @@@ - - - + + ++ - - - + + ++
[10/10] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4bb5ada5
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4bb5ada5
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4bb5ada5

Branch: refs/heads/trunk
Commit: 4bb5ada5358cb4287f445d9d49145e76f2bf3a07
Parents: 895e6ce 760d6c3
Author: Ariel Weisberg
Authored: Tue Feb 28 19:57:57 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:57:57 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4bb5ada5/CHANGES.txt
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/4bb5ada5/build.xml
--
diff --cc build.xml
index 8fde61b,dcaa780..e2bbe94
--- a/build.xml
+++ b/build.xml
@@@ -1057,8 -1198,8 +1059,8 @@@
       debug="true"
       debuglevel="${debuglevel}"
       destdir="${test.classes}"
-      includeantruntime="false"
+      includeantruntime="true"
-      source="${source.version}"
+      source="${source.version}"
       target="${target.version}"
       encoding="utf-8">
[09/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/760d6c33
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/760d6c33
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/760d6c33

Branch: refs/heads/trunk
Commit: 760d6c33d5727d94e89afb2e43d551f10a2721b7
Parents: 91c8d91 56d3f93
Author: Ariel Weisberg
Authored: Tue Feb 28 19:54:59 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:54:59 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/760d6c33/CHANGES.txt
--
diff --cc CHANGES.txt
index 1cced71,a63bd12..497f5c4
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,14 -1,4 +1,15 @@@
-3.0.12
+3.11.0
+ * Fix equality comparisons of columns using the duration type (CASSANDRA-13174)
+ * Obfuscate password in stress-graphs (CASSANDRA-12233)
+ * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034)
+ * nodetool stopdaemon errors out (CASSANDRA-13030)
+ * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954)
+ * Fix primary index calculation for SASI (CASSANDRA-12910)
+ * More fixes to the TokenAllocator (CASSANDRA-12990)
+ * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983)
+Merged from 3.0:
++ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
+ * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Faster StreamingHistogram (CASSANDRA-13038)
  * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237)
  * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/760d6c33/build.xml
--
diff --cc build.xml
index 0eef700,69b6bdf..dcaa780
--- a/build.xml
+++ b/build.xml
@@@ -156,12 -153,14 +156,14 @@@ - - - + + ++ - - - + + ++
[06/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/56d3f932
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/56d3f932
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/56d3f932

Branch: refs/heads/cassandra-3.0
Commit: 56d3f9324909be7b59ea057fc280faec28532f84
Parents: df28bcf aa66c99
Author: Ariel Weisberg
Authored: Tue Feb 28 19:49:52 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:49:52 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/56d3f932/CHANGES.txt
--
diff --cc CHANGES.txt
index 5cdc2e4,c27c2b1..a63bd12
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,10 -1,7 +1,11 @@@
-2.2.10
+3.0.12
+ * Faster StreamingHistogram (CASSANDRA-13038)
+ * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237)
+ * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070)
+ * Fix cqlsh COPY for dates before 1900 (CASSANDRA-13185)
+Merged from 2.2
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
- * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)
  * Fix failing COPY TO STDOUT (CASSANDRA-12497)
  * Fix ColumnCounter::countAll behaviour for reverse queries (CASSANDRA-13222)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/56d3f932/build.xml
--
diff --cc build.xml
index 2d2d313,d815ede..69b6bdf
--- a/build.xml
+++ b/build.xml
@@@ -159,8 -152,13 +160,9 @@@ + - - - -
@@@ -1131,10 -1181,9 +1133,10 @@@
       debug="true"
       debuglevel="${debuglevel}"
       destdir="${test.classes}"
-      includeantruntime="false"
+      includeantruntime="true"
       source="${source.version}"
-      target="${target.version}">
+      target="${target.version}"
+      encoding="utf-8">
[04/10] cassandra git commit: Fix "multiple versions of ant detected..." when running ant test
Fix "multiple versions of ant detected..." when running ant test

Patch by Michael Kjellman; Reviewed by Ariel Weisberg for CASSANDRA-13232

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/aa66c999
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/aa66c999
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/aa66c999

Branch: refs/heads/trunk
Commit: aa66c999ad18e63f9c6b53a2da0750099ec7132c
Parents: 3748bf7
Author: Ariel Weisberg
Authored: Tue Feb 28 19:43:06 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:43:06 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e7e0367..c27c2b1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.10
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/build.xml
--
diff --git a/build.xml b/build.xml
index 53a1b27..d815ede 100644
--- a/build.xml
+++ b/build.xml
@@ -147,10 +147,12 @@ + +
@@ -1179,7 +1181,7 @@
       debug="true"
       debuglevel="${debuglevel}"
       destdir="${test.classes}"
-      includeantruntime="false"
+      includeantruntime="true"
       source="${source.version}"
       target="${target.version}">
[02/10] cassandra git commit: Fix "multiple versions of ant detected..." when running ant test
Fix "multiple versions of ant detected..." when running ant test

Patch by Michael Kjellman; Reviewed by Ariel Weisberg for CASSANDRA-13232

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/aa66c999
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/aa66c999
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/aa66c999

Branch: refs/heads/cassandra-3.0
Commit: aa66c999ad18e63f9c6b53a2da0750099ec7132c
Parents: 3748bf7
Author: Ariel Weisberg
Authored: Tue Feb 28 19:43:06 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 19:43:06 2017 -0500
--
 CHANGES.txt | 1 +
 build.xml   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e7e0367..c27c2b1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.10
+ * Fix "multiple versions of ant detected..." when running ant test (CASSANDRA-13232)
  * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/aa66c999/build.xml
--
diff --git a/build.xml b/build.xml
index 53a1b27..d815ede 100644
--- a/build.xml
+++ b/build.xml
@@ -147,10 +147,12 @@ + +
@@ -1179,7 +1181,7 @@
       debug="true"
       debuglevel="${debuglevel}"
       destdir="${test.classes}"
-      includeantruntime="false"
+      includeantruntime="true"
       source="${source.version}"
       target="${target.version}">
[jira] [Assigned] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa reassigned CASSANDRA-11748:
--

    Assignee: Matt Byrd  (was: Jeff Jirsa)

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a
> rolling upgrade process
> ---
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>          Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
>                       CentOS 6.6
>                       Occurred in different C* nodes at different scales of deployment (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Matt Byrd
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x
>
> We have observed, multiple times, a multi-node C* (v2.0.17) cluster run
> into OOM at bootstrap during a rolling upgrade from 1.2.19 to 2.0.17.
> Here is the outline of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes reach schema version
> agreement - via nodetool describecluster.
> 2. Restart a Cassandra node.
> 3. After the restart, there is a chance that the restarted node has a different
> schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and any
> node could run into OOM.
> The following is the system.log from one of our 2-node cluster test beds.
> --
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326
> MigrationManager.java (line 328) Gossiping my schema version
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122
> MigrationManager.java (line 328) Gossiping my schema version
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2:
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328)
> Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task, over 100 times, to the other node:
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node
> /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414)
> Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line
> 102) Submitting migration task for /192.168.88.33
> ... (over 100 times)
> --
> On the other hand, Node 1 keeps updating its gossip information, then
> receiving and submitting migration tasks:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line
> 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496
> MigrationRequestVerbHandler.java (line 41) Received migration request from
> /192.168.88.34.
> ... (over 100 times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line
> 127) submitting migration task for /192.168.88.34
> ... (over 50 times)
> As a side note, we have over 200 column families defined in the Cassandra
> database, which may be related to this amount of RPC traffic.
> P.S. The excess schema migration tasks eventually have
> InternalResponseStage perform a schema merge operation. Since this operation
> requires a compaction for each merge, it is much slower to consume; the
> back-pressure of incoming schema migration content objects consumes all of
> the heap space and ultimately ends in OOM!

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
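The storm described in the report is essentially many duplicate migration tasks queued for the same endpoint. As a hedged illustration only (not the actual Cassandra patch; the class and method names below are hypothetical), one mitigation is to deduplicate in-flight schema-pull requests per endpoint so a gossip burst cannot enqueue hundreds of tasks for the same node:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: track endpoints with an in-flight schema pull and
// drop duplicate submissions until the first one completes.
public class MigrationThrottle {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true only for the first request to a given endpoint;
    // duplicates are rejected until complete() is called for that endpoint.
    public boolean trySubmit(String endpoint) {
        return inFlight.add(endpoint);
    }

    public void complete(String endpoint) {
        inFlight.remove(endpoint);
    }

    public static void main(String[] args) {
        MigrationThrottle throttle = new MigrationThrottle();
        System.out.println(throttle.trySubmit("/192.168.88.33")); // prints true
        System.out.println(throttle.trySubmit("/192.168.88.33")); // prints false
        throttle.complete("/192.168.88.33");
        System.out.println(throttle.trySubmit("/192.168.88.33")); // prints true
    }
}
```

With a guard like this, the "Submitting migration task ... (over 100 times)" pattern in the log would collapse to a single outstanding request per peer.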
[jira] [Updated] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-11748:
--

    Fix Version/s: 4.x
                   3.11.x
                   3.0.x

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a
> rolling upgrade process
> ---
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>          Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
>                       CentOS 6.6
>                       Occurred in different C* nodes at different scales of deployment (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Jeff Jirsa
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Assigned] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process
[ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa reassigned CASSANDRA-11748:
--

    Assignee: Jeff Jirsa

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a
> rolling upgrade process
> ---
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>          Environment: Rolling upgrade process from 1.2.19 to 2.0.17.
>                       CentOS 6.6
>                       Occurred in different C* nodes at different scales of deployment (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Jeff Jirsa
>            Priority: Critical

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13232) "multiple versions of ant detected in path for junit" printed for every junit test case spawned by "ant test"
[ https://issues.apache.org/jira/browse/CASSANDRA-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-13232:
--

    Status: Ready to Commit  (was: Patch Available)

> "multiple versions of ant detected in path for junit" printed for every junit
> test case spawned by "ant test"
> ---
>
>                 Key: CASSANDRA-13232
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13232
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Build
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>             Fix For: 4.x
>
>         Attachments: 673.diff
>
> There is a super annoying junit warning logged before every junit test case
> when you run "ant test". This is because the ant junit task configured
> in our build.xml sources the system class path and, most
> importantly, what's in ant.library.dir.
> [junit] WARNING: multiple versions of ant detected in path for junit
> [junit]  jar:file:/usr/local/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
> [junit]  and jar:file:/Users/mkjellman/Documents/mkjellman-cie-cassandra-trunk/build/lib/jars/ant-1.9.6.jar!/org/apache/tools/ant/Project.class
> The fix here is to explicitly exclude the ant jar downloaded by the maven
> tasks that ends up in ${build.lib} and ${build.dir.lib}, so only the ant
> libraries from the system class path are used.
> I played around with excluding the ant classes/jars from the system class
> path in favor of the ones we copy into ${build.lib} and
> ${build.dir.lib}, with no success. After reading the documentation, it seems
> you always want to use the libs that shipped with whatever is in $ANT_HOME, so
> I believe excluding the jars from the build lib directories is the correct
> change anyway.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
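The exclusion described in the ticket can be sketched as a build.xml fragment. This is a hypothetical shape for illustration only, not the committed patch (the actual build.xml hunks were lost from the diffs in this archive):

```xml
<!-- Sketch only: filter the maven-downloaded ant jars out of the jar fileset
     so the junit task sees only the ant libraries shipped with $ANT_HOME. -->
<fileset dir="${build.lib}">
  <include name="**/*.jar" />
  <exclude name="**/ant-*.jar" />
</fileset>
```

With the downloaded ant jar excluded, only /usr/local/ant/lib/ant.jar remains on the junit classpath and the warning disappears.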
[jira] [Commented] (CASSANDRA-13090) Coalescing strategy sleeps too much
[ https://issues.apache.org/jira/browse/CASSANDRA-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889083#comment-15889083 ] Ariel Weisberg commented on CASSANDRA-13090: CHANGES.txt fixed in [3748bf7c2a135baab33ccd3b79db5f3fb9132995|https://github.com/apache/cassandra/tree/3748bf7c2a135baab33ccd3b79db5f3fb9132995] > Coalescing strategy sleeps too much > --- > > Key: CASSANDRA-13090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13090 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 2.2.9, 3.0.11, 3.11.0, 4.0 > > Attachments: 0001-Fix-wait-time-coalescing-CASSANDRA-13090-2.patch, > 0001-Fix-wait-time-coalescing-CASSANDRA-13090.patch > > > With the current code maybeSleep is called even if we managed to take > maxItems out of the backlog. In this case we should really avoid sleeping > because it means that backlog is building up. > I'll send a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
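The bug described in the ticket can be illustrated with a small sketch (hypothetical names, not the actual CoalescingStrategies code): sleeping is only useful when the drain did not fill a whole batch, because draining a full batch means the backlog is building up and sleeping just adds latency:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the behavior fixed in CASSANDRA-13090: only sleep
// to coalesce more messages when the drain came up short of maxItems.
public class CoalesceSketch {
    // Drains up to maxItems from the backlog; returns true if the caller
    // should sleep to coalesce more, false if a full batch was taken.
    static boolean drainAndShouldSleep(Queue<Integer> backlog, int maxItems) {
        int drained = 0;
        while (drained < maxItems && backlog.poll() != null) {
            drained++;
        }
        // Before the fix, the strategy could sleep even when drained == maxItems.
        return drained < maxItems;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) q.add(i);
        System.out.println(drainAndShouldSleep(q, 10)); // full batch -> prints false
        System.out.println(drainAndShouldSleep(q, 10)); // backlog empty -> prints true
    }
}
```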
[jira] [Updated] (CASSANDRA-13090) Coalescing strategy sleeps too much
[ https://issues.apache.org/jira/browse/CASSANDRA-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13090: --- Resolution: Fixed Status: Resolved (was: Ready to Commit) > Coalescing strategy sleeps too much > --- > > Key: CASSANDRA-13090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13090 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.0, 4.0, 3.0.11, 2.2.9 > > Attachments: 0001-Fix-wait-time-coalescing-CASSANDRA-13090-2.patch, > 0001-Fix-wait-time-coalescing-CASSANDRA-13090.patch > > > With the current code maybeSleep is called even if we managed to take > maxItems out of the backlog. In this case we should really avoid sleeping > because it means that backlog is building up. > I'll send a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13090) Coalescing strategy sleeps too much
[ https://issues.apache.org/jira/browse/CASSANDRA-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13090: --- Status: Ready to Commit (was: Patch Available) > Coalescing strategy sleeps too much > --- > > Key: CASSANDRA-13090 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13090 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 2.2.9, 3.0.11, 3.11.0, 4.0 > > Attachments: 0001-Fix-wait-time-coalescing-CASSANDRA-13090-2.patch, > 0001-Fix-wait-time-coalescing-CASSANDRA-13090.patch > > > With the current code maybeSleep is called even if we managed to take > maxItems out of the backlog. In this case we should really avoid sleeping > because it means that backlog is building up. > I'll send a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[02/10] cassandra git commit: Fix CHANGES.txt versions for CASSANDRA-13090.
Fix CHANGES.txt versions for CASSANDRA-13090.

Patch by Ariel Weisberg; Reviewed by Jason Brown for CASSANDRA-13090.

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3748bf7c
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3748bf7c
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3748bf7c

Branch: refs/heads/cassandra-3.0
Commit: 3748bf7c2a135baab33ccd3b79db5f3fb9132995
Parents: dffb1a6
Author: Ariel Weisberg
Authored: Tue Feb 28 17:56:24 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 17:56:24 2017 -0500
--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3748bf7c/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b565acb..e7e0367 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.10
+ * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100)
  * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202)
  * Fix failing COPY TO STDOUT (CASSANDRA-12497)
@@ -8,7 +9,6 @@ Merged from 2.1:
  * Log stacktrace of uncaught exceptions (CASSANDRA-13108)
 2.2.9
- * Coalescing strategy sleeps too much (CASSANDRA-13090)
  * Fix negative mean latency metric (CASSANDRA-12876)
  * Use only one file pointer when creating commitlog segments (CASSANDRA-12539)
  * Fix speculative retry bugs (CASSANDRA-13009)
[10/10] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/895e6ce1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/895e6ce1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/895e6ce1

Branch: refs/heads/trunk
Commit: 895e6ce11c4467011df106631d0f818fb2298d73
Parents: d24f4c6 91c8d91
Author: Ariel Weisberg
Authored: Tue Feb 28 18:01:31 2017 -0500
Committer: Ariel Weisberg
Committed: Tue Feb 28 18:02:48 2017 -0500
--
 CHANGES.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/895e6ce1/CHANGES.txt
--
[09/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/91c8d915 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/91c8d915 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/91c8d915 Branch: refs/heads/trunk Commit: 91c8d915742a8dfcd1ef28b1e326ec201ee68c9c Parents: 942b83c df28bcf Author: Ariel WeisbergAuthored: Tue Feb 28 18:00:07 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 18:01:06 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/91c8d915/CHANGES.txt -- diff --cc CHANGES.txt index 1810cae,5cdc2e4..1cced71 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,13 -1,4 +1,14 @@@ -3.0.12 +3.11.0 + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) +Merged from 3.0: ++ * Coalescing strategy sleeps too much (CASSANDRA-13090) * Faster StreamingHistogram (CASSANDRA-13038) * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) @@@ -227,21 -115,6 +228,20 @@@ Merged from 3.0 * Correct log message for statistics of offheap memtable flush (CASSANDRA-12776) * Explicitly set locale for string validation (CASSANDRA-12541,CASSANDRA-12542,CASSANDRA-12543,CASSANDRA-12545) Merged from 2.2: - * Coalescing strategy sleeps too much (CASSANDRA-13090) + * Fix speculative 
retry bugs (CASSANDRA-13009) + * Fix handling of nulls and unsets in IN conditions (CASSANDRA-12981) + * Fix race causing infinite loop if Thrift server is stopped before it starts listening (CASSANDRA-12856) + * CompactionTasks now correctly drops sstables out of compaction when not enough disk space is available (CASSANDRA-12979) + * Remove support for non-JavaScript UDFs (CASSANDRA-12883) + * Fix DynamicEndpointSnitch noop in multi-datacenter situations (CASSANDRA-13074) + * cqlsh copy-from: encode column names to avoid primary key parsing errors (CASSANDRA-12909) + * Temporarily fix bug that creates commit log when running offline tools (CASSANDRA-8616) + * Reduce granuality of OpOrder.Group during index build (CASSANDRA-12796) + * Test bind parameters and unset parameters in InsertUpdateIfConditionTest (CASSANDRA-12980) + * Use saved tokens when setting local tokens on StorageService.joinRing (CASSANDRA-12935) + * cqlsh: fix DESC TYPES errors (CASSANDRA-12914) + * Fix leak on skipped SSTables in sstableupgrade (CASSANDRA-12899) + * Avoid blocking gossip during pending range calculation (CASSANDRA-12281) * Fix purgeability of tombstones with max timestamp (CASSANDRA-12792) * Fail repair if participant dies during sync or anticompaction (CASSANDRA-12901) * cqlsh COPY: unprotected pk values before converting them if not using prepared statements (CASSANDRA-12863)
[08/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/91c8d915 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/91c8d915 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/91c8d915 Branch: refs/heads/cassandra-3.11 Commit: 91c8d915742a8dfcd1ef28b1e326ec201ee68c9c Parents: 942b83c df28bcf Author: Ariel WeisbergAuthored: Tue Feb 28 18:00:07 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 18:01:06 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/91c8d915/CHANGES.txt -- diff --cc CHANGES.txt index 1810cae,5cdc2e4..1cced71 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,13 -1,4 +1,14 @@@ -3.0.12 +3.11.0 + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) +Merged from 3.0: ++ * Coalescing strategy sleeps too much (CASSANDRA-13090) * Faster StreamingHistogram (CASSANDRA-13038) * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) @@@ -227,21 -115,6 +228,20 @@@ Merged from 3.0 * Correct log message for statistics of offheap memtable flush (CASSANDRA-12776) * Explicitly set locale for string validation (CASSANDRA-12541,CASSANDRA-12542,CASSANDRA-12543,CASSANDRA-12545) Merged from 2.2: - * Coalescing strategy sleeps too much (CASSANDRA-13090) + * Fix 
speculative retry bugs (CASSANDRA-13009) + * Fix handling of nulls and unsets in IN conditions (CASSANDRA-12981) + * Fix race causing infinite loop if Thrift server is stopped before it starts listening (CASSANDRA-12856) + * CompactionTasks now correctly drops sstables out of compaction when not enough disk space is available (CASSANDRA-12979) + * Remove support for non-JavaScript UDFs (CASSANDRA-12883) + * Fix DynamicEndpointSnitch noop in multi-datacenter situations (CASSANDRA-13074) + * cqlsh copy-from: encode column names to avoid primary key parsing errors (CASSANDRA-12909) + * Temporarily fix bug that creates commit log when running offline tools (CASSANDRA-8616) + * Reduce granuality of OpOrder.Group during index build (CASSANDRA-12796) + * Test bind parameters and unset parameters in InsertUpdateIfConditionTest (CASSANDRA-12980) + * Use saved tokens when setting local tokens on StorageService.joinRing (CASSANDRA-12935) + * cqlsh: fix DESC TYPES errors (CASSANDRA-12914) + * Fix leak on skipped SSTables in sstableupgrade (CASSANDRA-12899) + * Avoid blocking gossip during pending range calculation (CASSANDRA-12281) * Fix purgeability of tombstones with max timestamp (CASSANDRA-12792) * Fail repair if participant dies during sync or anticompaction (CASSANDRA-12901) * cqlsh COPY: unprotected pk values before converting them if not using prepared statements (CASSANDRA-12863)
[05/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/df28bcfa Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/df28bcfa Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/df28bcfa Branch: refs/heads/cassandra-3.11 Commit: df28bcfaaa3df95bcd06a399e487b4174ff96462 Parents: a5ce963 3748bf7 Author: Ariel WeisbergAuthored: Tue Feb 28 17:58:33 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:59:37 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/df28bcfa/CHANGES.txt -- diff --cc CHANGES.txt index 1100bfd,e7e0367..5cdc2e4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,6 +1,10 @@@ -2.2.10 +3.0.12 + * Faster StreamingHistogram (CASSANDRA-13038) + * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) + * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) + * Fix cqlsh COPY for dates before 1900 (CASSANDRA-13185) +Merged from 2.2 + * Coalescing strategy sleeps too much (CASSANDRA-13090) - * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) * Fix ColumnCounter::countAll behaviour for reverse queries (CASSANDRA-13222) @@@ -11,48 -8,7 +12,47 @@@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) -2.2.9 +3.0.11 + * Use keyspace replication settings on system.size_estimates table (CASSANDRA-9639) + * Add vm.max_map_count StartupCheck (CASSANDRA-13008) + * Hint related logging should include the IP address of the destination in addition to + host ID (CASSANDRA-13205) + * Reloading logback.xml does not work (CASSANDRA-13173) + * Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0 
(CASSANDRA-13109) + * Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9 (CASSANDRA-13125) + * Fix UPDATE queries with empty IN restrictions (CASSANDRA-13152) + * Abort or retry on failed hints delivery (CASSANDRA-13124) + * Fix handling of partition with partition-level deletion plus + live rows in sstabledump (CASSANDRA-13177) + * Provide user workaround when system_schema.columns does not contain entries + for a table that's in system_schema.tables (CASSANDRA-13180) + * Dump threads when unit tests time out (CASSANDRA-13117) + * Better error when modifying function permissions without explicit keyspace (CASSANDRA-12925) + * Indexer is not correctly invoked when building indexes over sstables (CASSANDRA-13075) + * Read repair is not blocking repair to finish in foreground repair (CASSANDRA-13115) + * Stress daemon help is incorrect (CASSANDRA-12563) + * Remove ALTER TYPE support (CASSANDRA-12443) + * Fix assertion for certain legacy range tombstone pattern (CASSANDRA-12203) + * Set javac encoding to utf-8 (CASSANDRA-11077) + * Replace empty strings with null values if they cannot be converted (CASSANDRA-12794) + * Fixed flacky SSTableRewriterTest: check file counts before calling validateCFS (CASSANDRA-12348) + * Fix deserialization of 2.x DeletedCells (CASSANDRA-12620) + * Add parent repair session id to anticompaction log message (CASSANDRA-12186) + * Improve contention handling on failure to acquire MV lock for streaming and hints (CASSANDRA-12905) + * Fix DELETE and UPDATE queries with empty IN restrictions (CASSANDRA-12829) + * Mark MVs as built after successful bootstrap (CASSANDRA-12984) + * Estimated TS drop-time histogram updated with Cell.NO_DELETION_TIME (CASSANDRA-13040) + * Nodetool compactionstats fails with NullPointerException (CASSANDRA-13021) + * Thread local pools never cleaned up (CASSANDRA-13033) + * Set RPC_READY to false when draining or if a node is marked as shutdown (CASSANDRA-12781) + * Make sure sstables only get committed when 
it's safe to discard commit log records (CASSANDRA-12956) + * Reject default_time_to_live option when creating or altering MVs (CASSANDRA-12868) + * Nodetool should use a more sane max heap size (CASSANDRA-12739) + * LocalToken ensures token values are cloned on heap (CASSANDRA-12651) + * AnticompactionRequestSerializer serializedSize is incorrect (CASSANDRA-12934) + * Prevent reloading of logback.xml from UDF sandbox (CASSANDRA-12535) + * Reenable HeapPool (CASSANDRA-12900) +Merged from 2.2: - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when
[01/10] cassandra git commit: Fix CHANGES.txt versions for CASSANDRA-13090.
Repository: cassandra Updated Branches: refs/heads/cassandra-2.2 dffb1a6da -> 3748bf7c2 refs/heads/cassandra-3.0 a5ce96311 -> df28bcfaa refs/heads/cassandra-3.11 942b83ca9 -> 91c8d9157 refs/heads/trunk d24f4c68d -> 895e6ce11 Fix CHANGES.txt versions for CASSANDRA-13090. Patch by Ariel Weisberg; Reviewed by Jason Brown for CASSANDRA-13090. Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3748bf7c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3748bf7c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3748bf7c Branch: refs/heads/cassandra-2.2 Commit: 3748bf7c2a135baab33ccd3b79db5f3fb9132995 Parents: dffb1a6 Author: Ariel WeisbergAuthored: Tue Feb 28 17:56:24 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:56:24 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/3748bf7c/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b565acb..e7e0367 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.2.10 + * Coalescing strategy sleeps too much (CASSANDRA-13090) * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) @@ -8,7 +9,6 @@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) 2.2.9 - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when creating commitlog segments (CASSANDRA-12539) * Fix speculative retry bugs (CASSANDRA-13009)
[04/10] cassandra git commit: Fix CHANGES.txt versions for CASSANDRA-13090.
Fix CHANGES.txt versions for CASSANDRA-13090. Patch by Ariel Weisberg; Reviewed by Jason Brown for CASSANDRA-13090. Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3748bf7c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3748bf7c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3748bf7c Branch: refs/heads/trunk Commit: 3748bf7c2a135baab33ccd3b79db5f3fb9132995 Parents: dffb1a6 Author: Ariel WeisbergAuthored: Tue Feb 28 17:56:24 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:56:24 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/3748bf7c/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b565acb..e7e0367 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.2.10 + * Coalescing strategy sleeps too much (CASSANDRA-13090) * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) @@ -8,7 +9,6 @@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) 2.2.9 - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when creating commitlog segments (CASSANDRA-12539) * Fix speculative retry bugs (CASSANDRA-13009)
[07/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/df28bcfa Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/df28bcfa Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/df28bcfa Branch: refs/heads/trunk Commit: df28bcfaaa3df95bcd06a399e487b4174ff96462 Parents: a5ce963 3748bf7 Author: Ariel WeisbergAuthored: Tue Feb 28 17:58:33 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:59:37 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/df28bcfa/CHANGES.txt -- diff --cc CHANGES.txt index 1100bfd,e7e0367..5cdc2e4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,6 +1,10 @@@ -2.2.10 +3.0.12 + * Faster StreamingHistogram (CASSANDRA-13038) + * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) + * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) + * Fix cqlsh COPY for dates before 1900 (CASSANDRA-13185) +Merged from 2.2 + * Coalescing strategy sleeps too much (CASSANDRA-13090) - * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) * Fix ColumnCounter::countAll behaviour for reverse queries (CASSANDRA-13222) @@@ -11,48 -8,7 +12,47 @@@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) -2.2.9 +3.0.11 + * Use keyspace replication settings on system.size_estimates table (CASSANDRA-9639) + * Add vm.max_map_count StartupCheck (CASSANDRA-13008) + * Hint related logging should include the IP address of the destination in addition to + host ID (CASSANDRA-13205) + * Reloading logback.xml does not work (CASSANDRA-13173) + * Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0 
(CASSANDRA-13109) + * Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9 (CASSANDRA-13125) + * Fix UPDATE queries with empty IN restrictions (CASSANDRA-13152) + * Abort or retry on failed hints delivery (CASSANDRA-13124) + * Fix handling of partition with partition-level deletion plus + live rows in sstabledump (CASSANDRA-13177) + * Provide user workaround when system_schema.columns does not contain entries + for a table that's in system_schema.tables (CASSANDRA-13180) + * Dump threads when unit tests time out (CASSANDRA-13117) + * Better error when modifying function permissions without explicit keyspace (CASSANDRA-12925) + * Indexer is not correctly invoked when building indexes over sstables (CASSANDRA-13075) + * Read repair is not blocking repair to finish in foreground repair (CASSANDRA-13115) + * Stress daemon help is incorrect (CASSANDRA-12563) + * Remove ALTER TYPE support (CASSANDRA-12443) + * Fix assertion for certain legacy range tombstone pattern (CASSANDRA-12203) + * Set javac encoding to utf-8 (CASSANDRA-11077) + * Replace empty strings with null values if they cannot be converted (CASSANDRA-12794) + * Fixed flacky SSTableRewriterTest: check file counts before calling validateCFS (CASSANDRA-12348) + * Fix deserialization of 2.x DeletedCells (CASSANDRA-12620) + * Add parent repair session id to anticompaction log message (CASSANDRA-12186) + * Improve contention handling on failure to acquire MV lock for streaming and hints (CASSANDRA-12905) + * Fix DELETE and UPDATE queries with empty IN restrictions (CASSANDRA-12829) + * Mark MVs as built after successful bootstrap (CASSANDRA-12984) + * Estimated TS drop-time histogram updated with Cell.NO_DELETION_TIME (CASSANDRA-13040) + * Nodetool compactionstats fails with NullPointerException (CASSANDRA-13021) + * Thread local pools never cleaned up (CASSANDRA-13033) + * Set RPC_READY to false when draining or if a node is marked as shutdown (CASSANDRA-12781) + * Make sure sstables only get committed when 
it's safe to discard commit log records (CASSANDRA-12956) + * Reject default_time_to_live option when creating or altering MVs (CASSANDRA-12868) + * Nodetool should use a more sane max heap size (CASSANDRA-12739) + * LocalToken ensures token values are cloned on heap (CASSANDRA-12651) + * AnticompactionRequestSerializer serializedSize is incorrect (CASSANDRA-12934) + * Prevent reloading of logback.xml from UDF sandbox (CASSANDRA-12535) + * Reenable HeapPool (CASSANDRA-12900) +Merged from 2.2: - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when creating
[06/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/df28bcfa Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/df28bcfa Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/df28bcfa Branch: refs/heads/cassandra-3.0 Commit: df28bcfaaa3df95bcd06a399e487b4174ff96462 Parents: a5ce963 3748bf7 Author: Ariel WeisbergAuthored: Tue Feb 28 17:58:33 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:59:37 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/df28bcfa/CHANGES.txt -- diff --cc CHANGES.txt index 1100bfd,e7e0367..5cdc2e4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,6 +1,10 @@@ -2.2.10 +3.0.12 + * Faster StreamingHistogram (CASSANDRA-13038) + * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) + * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) + * Fix cqlsh COPY for dates before 1900 (CASSANDRA-13185) +Merged from 2.2 + * Coalescing strategy sleeps too much (CASSANDRA-13090) - * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) * Fix ColumnCounter::countAll behaviour for reverse queries (CASSANDRA-13222) @@@ -11,48 -8,7 +12,47 @@@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) -2.2.9 +3.0.11 + * Use keyspace replication settings on system.size_estimates table (CASSANDRA-9639) + * Add vm.max_map_count StartupCheck (CASSANDRA-13008) + * Hint related logging should include the IP address of the destination in addition to + host ID (CASSANDRA-13205) + * Reloading logback.xml does not work (CASSANDRA-13173) + * Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0 
(CASSANDRA-13109) + * Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9 (CASSANDRA-13125) + * Fix UPDATE queries with empty IN restrictions (CASSANDRA-13152) + * Abort or retry on failed hints delivery (CASSANDRA-13124) + * Fix handling of partition with partition-level deletion plus + live rows in sstabledump (CASSANDRA-13177) + * Provide user workaround when system_schema.columns does not contain entries + for a table that's in system_schema.tables (CASSANDRA-13180) + * Dump threads when unit tests time out (CASSANDRA-13117) + * Better error when modifying function permissions without explicit keyspace (CASSANDRA-12925) + * Indexer is not correctly invoked when building indexes over sstables (CASSANDRA-13075) + * Read repair is not blocking repair to finish in foreground repair (CASSANDRA-13115) + * Stress daemon help is incorrect (CASSANDRA-12563) + * Remove ALTER TYPE support (CASSANDRA-12443) + * Fix assertion for certain legacy range tombstone pattern (CASSANDRA-12203) + * Set javac encoding to utf-8 (CASSANDRA-11077) + * Replace empty strings with null values if they cannot be converted (CASSANDRA-12794) + * Fixed flacky SSTableRewriterTest: check file counts before calling validateCFS (CASSANDRA-12348) + * Fix deserialization of 2.x DeletedCells (CASSANDRA-12620) + * Add parent repair session id to anticompaction log message (CASSANDRA-12186) + * Improve contention handling on failure to acquire MV lock for streaming and hints (CASSANDRA-12905) + * Fix DELETE and UPDATE queries with empty IN restrictions (CASSANDRA-12829) + * Mark MVs as built after successful bootstrap (CASSANDRA-12984) + * Estimated TS drop-time histogram updated with Cell.NO_DELETION_TIME (CASSANDRA-13040) + * Nodetool compactionstats fails with NullPointerException (CASSANDRA-13021) + * Thread local pools never cleaned up (CASSANDRA-13033) + * Set RPC_READY to false when draining or if a node is marked as shutdown (CASSANDRA-12781) + * Make sure sstables only get committed when 
it's safe to discard commit log records (CASSANDRA-12956) + * Reject default_time_to_live option when creating or altering MVs (CASSANDRA-12868) + * Nodetool should use a more sane max heap size (CASSANDRA-12739) + * LocalToken ensures token values are cloned on heap (CASSANDRA-12651) + * AnticompactionRequestSerializer serializedSize is incorrect (CASSANDRA-12934) + * Prevent reloading of logback.xml from UDF sandbox (CASSANDRA-12535) + * Reenable HeapPool (CASSANDRA-12900) +Merged from 2.2: - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when
[03/10] cassandra git commit: Fix CHANGES.txt versions for CASSANDRA-13090.
Fix CHANGES.txt versions for CASSANDRA-13090. Patch by Ariel Weisberg; Reviewed by Jason Brown for CASSANDRA-13090. Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3748bf7c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3748bf7c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3748bf7c Branch: refs/heads/cassandra-3.11 Commit: 3748bf7c2a135baab33ccd3b79db5f3fb9132995 Parents: dffb1a6 Author: Ariel WeisbergAuthored: Tue Feb 28 17:56:24 2017 -0500 Committer: Ariel Weisberg Committed: Tue Feb 28 17:56:24 2017 -0500 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/3748bf7c/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b565acb..e7e0367 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 2.2.10 + * Coalescing strategy sleeps too much (CASSANDRA-13090) * Make sure compaction stats are updated when compaction is interrupted (Backport from 3.0, CASSANDRA-12100) * Fix flaky LongLeveledCompactionStrategyTest (CASSANDRA-12202) * Fix failing COPY TO STDOUT (CASSANDRA-12497) @@ -8,7 +9,6 @@ Merged from 2.1: * Log stacktrace of uncaught exceptions (CASSANDRA-13108) 2.2.9 - * Coalescing strategy sleeps too much (CASSANDRA-13090) * Fix negative mean latency metric (CASSANDRA-12876) * Use only one file pointer when creating commitlog segments (CASSANDRA-12539) * Fix speculative retry bugs (CASSANDRA-13009)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888953#comment-15888953 ] Ariel Weisberg commented on CASSANDRA-13241: [~brstgt] That is basically what I was thinking, but don't keep two separate arrays. Do it in a single array, so that when you cache miss you pull in the entire section you are looking for. Assuming 128-byte alignment you would get one 8-byte value and then 60 2-byte values. It could also be 40 3-byte values that are relative not to each other but to the one absolute offset. Then you don't have to do a loop summing. > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size led to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads increases with a lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads per request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. 
> Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)
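Ariel's single-array layout above can be made concrete. The sketch below is an illustrative reading of his 128-byte-block idea (one 8-byte absolute offset followed by 60 2-byte deltas per block), not Cassandra's actual implementation; all function names are my own, and it assumes every compressed chunk fits in an unsigned 16-bit delta:

```python
import struct

ENTRIES_PER_BLOCK = 60                # 8-byte base + 60 * 2 bytes = 128 bytes
BLOCK_BYTES = 8 + 2 * ENTRIES_PER_BLOCK

def pack_offsets(offsets):
    """Pack ascending absolute chunk offsets into 128-byte blocks:
    one 8-byte base offset, then 2-byte deltas for the following chunks."""
    out = bytearray()
    for start in range(0, len(offsets), ENTRIES_PER_BLOCK + 1):
        group = offsets[start:start + ENTRIES_PER_BLOCK + 1]
        out += struct.pack("<q", group[0])
        for prev, cur in zip(group, group[1:]):
            out += struct.pack("<H", cur - prev)   # delta must be < 65536
        out += b"\x00" * (BLOCK_BYTES - 8 - 2 * (len(group) - 1))
    return bytes(out)

def chunk_offset(packed, idx):
    """Recover offsets[idx]: a single aligned block read, then a short
    loop summing at most 60 deltas (the loop Ariel's 3-byte absolute
    variant would avoid)."""
    block, pos = divmod(idx, ENTRIES_PER_BLOCK + 1)
    base = block * BLOCK_BYTES
    off = struct.unpack_from("<q", packed, base)[0]
    for i in range(pos):
        off += struct.unpack_from("<H", packed, base + 8 + 2 * i)[0]
    return off
```

Each lookup touches exactly one 128-byte-aligned region, which is the point: the miss that fetches the base offset also fetches every delta needed to finish the lookup.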
[jira] [Updated] (CASSANDRA-13179) Remove offheap_buffer as option for memtable_allocation_type in cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Vernon updated CASSANDRA-13179: Labels: configuration (was: ) Fix Version/s: 3.0.x Status: Patch Available (was: Open) > Remove offheap_buffer as option for memtable_allocation_type in cassandra.yaml > -- > > Key: CASSANDRA-13179 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13179 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Brad Vernon > Labels: configuration > Fix For: 3.0.x > > Attachments: cassandra-3.0-offheap_buffer_docs.patch > > > With [CASSANDRA-11039] disallowing offheap_buffers as an option for > memtable_allocation_type, the documentation included in cassandra.yaml should be > updated to match. > Patch included. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13179) Remove offheap_buffer as option for memtable_allocation_type in cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Vernon updated CASSANDRA-13179: Attachment: cassandra-3.0-offheap_buffer_docs.patch Adding updated patch with new wording recommended by [~snazy]
[jira] [Updated] (CASSANDRA-13179) Remove offheap_buffer as option for memtable_allocation_type in cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Vernon updated CASSANDRA-13179: Attachment: (was: 3.0-cass_yaml_offheap_doc.patch)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888911#comment-15888911 ] Benjamin Roth commented on CASSANDRA-13241: --- How about this: you create 2 chunk lookup tables. One with absolute pointers (long, 8 bytes). A second one with relative pointers or chunk sizes - 2 bytes are enough for chunks of up to 64kb. You store an absolute pointer for every $x chunks (1000 in this example). So you can get the absolute offset by looking up the absolute pointer at $idx = ($pos - ($pos % $x)) / $x and then iterating through the size lookup from ($pos - ($pos % $x)) to $pos - 1. A fallback can be provided for chunks >64kb: either relative pointers are completely avoided or are increased to 3 bytes. There you go.

Payload of 1 TB = 1024 * 1024 * 1024kb

CS 64 (NOW):
chunks = 1024 * 1024 * 1024kb / 64kb = 16777216 (16M)
compression = 1.99
compressed_size = 1024 * 1024 * 1024kb / 1.99 = 539568756kb
kernel_pages = 134892189
absolute_pointer_size = 8 * chunks = 134217728 (128MB)
kernel_page_size = 134892189 * 8 (1029MB)
total_size = 1157MB

CS 4 with relative positions:
chunks = 1024 * 1024 * 1024kb / 4kb = 268435456 (256M)
compression = 1.75
compressed_size = 1024 * 1024 * 1024kb / 1.75 = 613566757kb
kernel_pages = 153391689
absolute_pointer_size = 8 * chunks / 1000 = 2147484 (2MB)
relative_pointer_size = 2 * chunks = 536870912 (512MB)
kernel_page_size = 153391689 * 8 = 1227133512 (1170MB)
total_size = 1684MB
increase = 45%

=> This reduces the memory overhead of going from 64kb to 4kb chunks from the initially mentioned 800% to 45%, once you also take kernel structs into account - they are of a relevant size too, even more than the initially discussed "128M" for 64kb chunks. Pro: a lot less memory required. Con: some CPU overhead. But is this really relevant compared to decompressing 4kb or even 64kb? P.S.: The kernel memory calculation is based on the 8 bytes [~aweisberg] has researched. 
Compression ratios are taken from the percona blog. > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size led to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering the chunk size (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads increases with a lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads per request, but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example to > (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insight into what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)
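The two-level lookup Benjamin describes above can be sketched roughly like this (all names are invented for illustration; this is not Cassandra's actual CompressionMetadata code). One absolute 8-byte offset is stored per 1000 chunks, plus one 2-byte compressed size per chunk; a chunk's file offset is the nearest absolute pointer plus the sizes of the chunks in between:

```java
import java.util.Arrays;

public class ChunkOffsets
{
    static final int SYNC_INTERVAL = 1000; // one absolute pointer per 1000 chunks (the $x above)

    final long[] absolute; // absolute[i] = file offset of chunk (i * SYNC_INTERVAL)
    final char[] sizes;    // sizes[i] = compressed size of chunk i; char = unsigned 2 bytes, enough for <= 64kb

    ChunkOffsets(long[] absolute, char[] sizes)
    {
        this.absolute = absolute;
        this.sizes = sizes;
    }

    /** File offset of chunk #pos: nearest preceding absolute pointer plus the relative sizes in between. */
    long offsetOf(int pos)
    {
        long offset = absolute[pos / SYNC_INTERVAL];
        for (int i = pos - (pos % SYNC_INTERVAL); i < pos; i++)
            offset += sizes[i];
        return offset;
    }

    public static void main(String[] args)
    {
        // 2500 chunks, each pretending to compress to 100 bytes
        char[] sizes = new char[2500];
        Arrays.fill(sizes, (char) 100);
        long[] abs = { 0, 1000L * 100, 2000L * 100 };
        ChunkOffsets idx = new ChunkOffsets(abs, sizes);
        if (idx.offsetOf(0) != 0) throw new AssertionError();
        if (idx.offsetOf(1500) != 150000) throw new AssertionError();
        if (idx.offsetOf(2499) != 249900) throw new AssertionError();
    }
}
```

The worst case scans SYNC_INTERVAL - 1 two-byte entries per lookup, which is the CPU-for-memory trade the comment argues is cheap next to decompression.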
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888600#comment-15888600 ] Ariel Weisberg commented on CASSANDRA-13241: Based on http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/ and http://lxr.linux.no/linux+v2.6.28.1/arch/ia64/include/asm/page.h#L174 it seems like the kernel introduces its own 8 bytes of overhead per 4k page. I think it's worth doing something more efficient with the offsets and then reducing the chunk size to at least memory utilization parity with what we have today. We should at least push it to the free lunch point. I'm still researching integer compression options to see how cheap we can make offset storage. The algorithms are out there; it's the implementations that are a chore.
[jira] [Commented] (CASSANDRA-13053) GRANT/REVOKE on table without keyspace performs permissions check incorrectly
[ https://issues.apache.org/jira/browse/CASSANDRA-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888599#comment-15888599 ] Aleksey Yeschenko commented on CASSANDRA-13053: --- ||branch||testall||dtest|| |[13053-2.2|https://github.com/iamaleksey/cassandra/tree/13053-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-2.2-dtest]| |[13053-3.0|https://github.com/iamaleksey/cassandra/tree/13053-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-3.0-dtest]| |[13053-3.11|https://github.com/iamaleksey/cassandra/tree/13053-3.11]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-3.11-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-3.11-dtest]| |[13053-4.0|https://github.com/iamaleksey/cassandra/tree/13053-4.0]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-4.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-13053-4.0-dtest]| Simple patch attached (2.2 merges cleanly upwards). Will kick off a basic CI run and write up a quick unit test in the meantime. > GRANT/REVOKE on table without keyspace performs permissions check incorrectly > - > > Key: CASSANDRA-13053 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13053 > Project: Cassandra > Issue Type: Bug > Components: CQL >Reporter: Sam Tunnicliffe >Assignee: Aleksey Yeschenko >Priority: Minor > Fix For: 2.2.x, 3.0.x, 3.11.x > > > When a {{GRANT}} or {{REVOKE}} statement is executed on a table without > specifying the keyspace, we attempt to use the client session's keyspace to > qualify the resource. 
> This is done when validating the statement, which occurs after checking that > the user executing the statement has sufficient permissions. This means that > the permissions checking uses an incorrect resource, namely a table with a > null keyspace. If that user is a superuser, then no error is encountered as > superuser privs implicitly grants *all* permissions. If the user is not a > superuser, then the {{GRANT}} or {{REVOKE}} fails with an ugly error, > regardless of which keyspace the client session is bound to: > {code} > Unauthorized: Error from server: code=2100 [Unauthorized] message="User admin > has no AUTHORIZE permission on or any of its parents" > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
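The ordering problem described above can be modeled in a few lines (the class and method names below are made up for illustration, not Cassandra's actual CQL statement classes): the permission check runs against the resource before validation has filled in the client session's keyspace, so the checked resource contains a null keyspace.

```java
// Hypothetical model of the bug: the checkAccess() step runs before the
// validate() step, so the resource it inspects still has a null keyspace.
public class GrantStatementSketch
{
    String keyspace; // null when "GRANT ... ON TABLE t" omits the keyspace
    final String table;

    GrantStatementSketch(String keyspace, String table)
    {
        this.keyspace = keyspace;
        this.table = table;
    }

    String resource()
    {
        return "data/" + keyspace + "/" + table;
    }

    void validate(String sessionKeyspace)
    {
        if (keyspace == null)
            keyspace = sessionKeyspace; // qualification happens here - too late for the check
    }

    // Buggy order: the permission check sees "data/null/<table>"
    String buggyCheckedResource(String sessionKeyspace)
    {
        String checked = resource(); // stand-in for the checkAccess() step
        validate(sessionKeyspace);
        return checked;
    }

    // Fixed order: qualify the resource first, then check permissions
    String fixedCheckedResource(String sessionKeyspace)
    {
        validate(sessionKeyspace);
        return resource();
    }

    public static void main(String[] args)
    {
        if (!new GrantStatementSketch(null, "t").buggyCheckedResource("ks").equals("data/null/t"))
            throw new AssertionError();
        if (!new GrantStatementSketch(null, "t").fixedCheckedResource("ks").equals("data/ks/t"))
            throw new AssertionError();
    }
}
```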
[jira] [Updated] (CASSANDRA-13053) GRANT/REVOKE on table without keyspace performs permissions check incorrectly
[ https://issues.apache.org/jira/browse/CASSANDRA-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13053: -- Reviewer: Sam Tunnicliffe Reproduced In: 3.10, 3.0.11, 2.2.9 (was: 2.2.9, 3.0.11, 3.10) Status: Patch Available (was: In Progress)
[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888551#comment-15888551 ] Ariel Weisberg edited comment on CASSANDRA-13241 at 2/28/17 6:14 PM: - -So umm... struct page in the kernel is like more than 64-bytes. It's awful.- http://lxr.free-electrons.com/source/include/linux/mm_types.h#L45 -My understanding is that when you map a file it's going to create one of these entries for every 4k page. You can't use huge pages when mapping files.- -Should we even be concerned about the overhead of these offsets?- Turns out that this structure doesn't define the mapping for pages of a mapped file. Still looking into it. There might not actually be one per page.
[jira] [Updated] (CASSANDRA-13053) GRANT/REVOKE on table without keyspace performs permissions check incorrectly
[ https://issues.apache.org/jira/browse/CASSANDRA-13053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13053: -- Priority: Minor (was: Major)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888551#comment-15888551 ] Ariel Weisberg commented on CASSANDRA-13241: So umm... struct page in the kernel is like more than 64-bytes. It's awful. http://lxr.free-electrons.com/source/include/linux/mm_types.h#L45 My understanding is that when you map a file it's going to create one of these entries for every 4k page. You can't use huge pages when mapping files. Should we even be concerned about the overhead of these offsets?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888552#comment-15888552 ] Stefan Podkowinski commented on CASSANDRA-13153: Yes, that could be an option as well. But as we already discussed the possibility of actually doing anti-compaction on repaired sstables for the sake of tracking repairedAt more accurately, I was hoping someone someday would be able to make use of the method as is by providing a reasonable repairedAt value for both anti-compaction outputs. But I'm open to adding an assert instead, if you think I'm a bit too optimistic here. > Reappeared Data when Mixing Incremental and Full Repairs > > > Key: CASSANDRA-13153 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13153 > Project: Cassandra > Issue Type: Bug > Components: Compaction, Tools > Environment: Apache Cassandra 2.2 >Reporter: Amanda Debrot >Assignee: Stefan Podkowinski > Labels: Cassandra > Attachments: log-Reappeared-Data.txt, > Step-by-Step-Simulate-Reappeared-Data.txt > > > This happens for both LeveledCompactionStrategy and > SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2 > but it most likely also affects all Cassandra versions after 2.2, if they > have anticompaction with full repair. > When mixing incremental and full repairs, there are a few scenarios where the > Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as > repaired. Then if it is past gc_grace, and the tombstone and data have been > compacted out on other replicas, the next incremental repair will push the > Data to other replicas without the tombstone. > Simplified scenario: > 3 node cluster with RF=3 > Initial config: > Node 1 has data and tombstone in separate SSTables. > Node 2 has data and no tombstone. > Node 3 has data and tombstone in separate SSTables. > Incremental repair (nodetool repair -pr) is run every day so now we have > tombstone on each node. 
> Some minor compactions have happened since so data and tombstone get merged > to 1 SSTable on Nodes 1 and 3. > Node 1 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. > Node 2 has data and tombstone in separate SSTables. > Node 3 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. > Incremental repairs keep running every day. > Full repairs run weekly (nodetool repair -full -pr). > Now there are 2 scenarios where the Data SSTable will get marked as > "Unrepaired" while Tombstone SSTable will get marked as > "Repaired". > Scenario 1: > Since the Data and Tombstone SSTable have been marked as "Repaired" > and anticompacted, they have had minor compactions with other SSTables > containing keys from other ranges. During full repair, if the last node to > run it doesn't own this particular key in its partitioner range, the Data > and Tombstone SSTable will get anticompacted and marked as "Unrepaired". Now > in the next incremental repair, if the Data SSTable is involved in a minor > compaction during the repair but the Tombstone SSTable is not, the resulting > compacted SSTable will be marked "Unrepaired" and Tombstone SSTable is marked > "Repaired". > Scenario 2: > Only the Data SSTable had minor compaction with other SSTables > containing keys from other ranges after being marked as "Repaired". The > Tombstone SSTable was never involved in a minor compaction so therefore all > keys in that SSTable belong to 1 particular partitioner range. During full > repair, if the last node to run it doesn't own this particular key in its > partitioner range, the Data SSTable will get anticompacted and marked as > "Unrepaired". The Tombstone SSTable stays marked as Repaired. > Then it's past gc_grace. Since Nodes #1 and #3 only have 1 SSTable for that > key, the tombstone will get compacted out. > Node 1 has nothing. 
> Node 2 has data (in unrepaired SSTable) and tombstone (in repaired > SSTable) in separate SSTables. > Node 3 has nothing. > Now when the next incremental repair runs, it will only use the Data SSTable > to build the merkle tree since the tombstone SSTable is flagged as repaired > and data SSTable is marked as unrepaired. And the data will get repaired > against the other two nodes. > Node 1 has data. > Node 2 has data and tombstone in separate SSTables. > Node 3 has data. > If a read request hits Node 1 and 3, it will return data. If it hits 1 and > 2, or 2 and 3, however, it would return no data. > Tested this with single range tokens for simplicity. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CASSANDRA-13064) Add stream type or purpose to stream plan / stream
[ https://issues.apache.org/jira/browse/CASSANDRA-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Roth reassigned CASSANDRA-13064: - Assignee: Benjamin Roth > Add stream type or purpose to stream plan / stream > -- > > Key: CASSANDRA-13064 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13064 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Roth >Assignee: Benjamin Roth > > It would be very good to know the type or purpose of a certain stream on the > receiver side. It should be available both in a stream request and a stream > task. > Why? > It would be helpful to distinguish the purpose to allow different handling of > streams and requests. Examples: > - In a stream request a global flush is done. This is not necessary for all > types of streams. A repair stream(-plan) does not require a flush as this has > been done shortly before in validation compaction, and only the sstables that > have been validated also have to be streamed. > - In StreamReceiveTask, streams for MVs go through the regular write path; this > is painfully slow, especially on bootstrap and decommission. Both for bootstrap > and decommission this is not necessary. Sstables can be directly streamed > down in this case. Handling bootstrap is no problem as it relies on a local > state, but during decommission the decom-state is bound to the sender and not > the receiver, so the receiver has to know that it is safe to stream that > sstable directly, not through the write path. That's why we have to know the > purpose of the stream. > I'd love to implement this on my own but I am not sure how not to break the > streaming protocol for backwards compat, or if it is ok to do so. > Furthermore I'd love to get some feedback on that idea and some proposals > on what stream types to distinguish. 
I could imagine: > - bootstrap > - decommission > - repair > - replace node > - remove node > - range relocation > Comments like this support my idea; knowing the purpose could avoid this: > {quote} > // TODO each call to transferRanges re-flushes, this is > potentially a lot of waste > streamPlan.transferRanges(newEndpoint, preferred, > keyspaceName, ranges); > {quote} > An alternative to passing the purpose of the stream would be to pass flags like: > - requiresFlush > - requiresWritePathForMaterializedView > ... > I guess passing the purpose will make the streaming protocol more robust for > future changes and leaves decisions up to the receiver. > But an additional "requiresFlush" would also avoid putting too much logic > into the streaming code. The streaming code should not care about purposes; the caller or receiver should. So the decision whether a stream requires a flush > before streaming should be up to the stream requester and the stream request > receiver, depending on the purpose of the stream. > I'm excited about your feedback :) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
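The "purpose plus derived flags" idea in the ticket could look roughly like this (all names invented; only REPAIR's no-flush property and the direct-sstable streaming for BOOTSTRAP/DECOMMISSION follow from the ticket text - the flag values for the other purposes are placeholders, not decisions from this discussion):

```java
// Sketch of a stream-purpose enum with receiver-side decisions derived
// from the purpose. Values other than BOOTSTRAP/DECOMMISSION/REPAIR are
// illustrative placeholders.
public enum StreamPurposeSketch
{
    BOOTSTRAP(true, false),        // sstables can be streamed down directly, no MV write path
    DECOMMISSION(true, false),     // same, once CASSANDRA-13064/13065 make the purpose known
    REPAIR(false, true),           // validation compaction flushed shortly before, so no re-flush
    REPLACE_NODE(true, true),      // placeholder
    REMOVE_NODE(true, true),       // placeholder
    RANGE_RELOCATION(true, true);  // placeholder

    public final boolean requiresFlush;
    public final boolean requiresWritePathForMaterializedView;

    StreamPurposeSketch(boolean requiresFlush, boolean requiresWritePathForMaterializedView)
    {
        this.requiresFlush = requiresFlush;
        this.requiresWritePathForMaterializedView = requiresWritePathForMaterializedView;
    }
}
```

This keeps the streaming code itself purpose-agnostic, as the comment suggests: the requester and receiver look up the flags from the purpose rather than the stream layer hard-coding them.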
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888465#comment-15888465 ] Ariel Weisberg commented on CASSANDRA-13265: This bug was noticed recently and fixed as part of CASSANDRA-13159. > Communication breakdown in OutboundTcpConnection > > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > threads fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue. > This means: Writing blocks the Queue for reading, and readers might even be > starved, which makes the situation even worse. 
> - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888439#comment-15888439 ] Christian Esken commented on CASSANDRA-13265: - Here is one possibly very important observation. It looks like Coalescing is doing an infinite loop while doing maybeSleep(). I checked 10 Thread dumps, and in each of them the Thread was at the same location. Is it possible that averageGap is 0? This would lead to an infinite loop: sleep would start at 0, and the doubling while loop below could then never reach maxCoalesceWindow.
{code}
private static boolean maybeSleep(int messages, long averageGap, long maxCoalesceWindow, Parker parker)
{
    // only sleep if we can expect to double the number of messages we're sending in the time interval
    long sleep = messages * averageGap; // TODO can averageGap be 0 ?
    if (sleep > maxCoalesceWindow)
        return false;

    // assume we receive as many messages as we expect; apply the same logic to the future batch:
    // expect twice as many messages to consider sleeping for "another" interval; this basically translates
    // to doubling our sleep period until we exceed our max sleep window
    while (sleep * 2 < maxCoalesceWindow)
        sleep *= 2; // CoalescingStrategies:106

    parker.park(sleep);
    return true;
}
{code}
If sum is bigger than MEASURED_INTERVAL, then averageGap() returns 0 (integer division). I am aware that this is highly unlikely, but I cannot otherwise explain the apparent hang in maybeSleep() line 106. 
{code}
private long averageGap()
{
    if (sum == 0)
        return Integer.MAX_VALUE;
    return MEASURED_INTERVAL / sum;
}
{code}
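Christian's suspicion can be reproduced in isolation: if averageGap returns 0, then sleep starts at 0 and the doubling loop's condition "sleep * 2 < maxCoalesceWindow" is always true. Below is a rough re-creation of the doubling logic (simplified, not the actual CoalescingStrategies code) with a guard added so the degenerate case returns instead of spinning:

```java
// Re-creation of the doubling logic from maybeSleep(). Returns the computed
// park time, or -1 where the original code would spin forever.
public class CoalesceSleepSketch
{
    static long sleepTime(int messages, long averageGap, long maxCoalesceWindow)
    {
        long sleep = messages * averageGap;
        if (sleep > maxCoalesceWindow)
            return 0; // original code returns false here: don't sleep at all
        if (sleep == 0)
            return -1; // with averageGap == 0, "sleep * 2 < maxCoalesceWindow" never becomes false
        while (sleep * 2 < maxCoalesceWindow)
            sleep *= 2;
        return sleep;
    }

    public static void main(String[] args)
    {
        // averageGap == 0 (i.e. sum > MEASURED_INTERVAL under integer division) would hang
        if (sleepTime(10, 0, 200_000) != -1) throw new AssertionError();
        // a sane gap doubles up to just below the window: 3 * 10 = 30 -> 122880
        if (sleepTime(3, 10, 200_000) != 122880) throw new AssertionError();
    }
}
```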
[jira] [Commented] (CASSANDRA-13042) The two cassandra nodes suddenly encounter hints each other and failed replaying.
[ https://issues.apache.org/jira/browse/CASSANDRA-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888394#comment-15888394 ] Greg Doermann commented on CASSANDRA-13042: --- Ok, so I am also seeing this on the other nodes (this is from 1.1.1.2, the other seed node) when this is happening and the nodes are flapping: {code} WARN [MessagingService-Outgoing-/1.1.1.2] 2017-02-28 16:05:20,516 OutboundTcpConnection.java:427 - Seed gossip version is -2147483648; will not connect with that version {code} This message seems to be the root of all the evil. I was able to stabilize things (today) once I removed 1.1.1.1 from the seeds in its own config and restarted it. Hinted handoffs passed again and everything went back to normal. Once everything calmed down I added 1.1.1.1 back as a seed node in the 1.1.1.1 cassandra.yaml and did a restart (disable gossip, disable thrift, drain, restart). When it came back up there were no more errors, no more problems. Things seem to be stable again. Thinking back, this seems to be a problem every time a seed node goes down. If another node goes down while this is happening, that other node also has massive issues until we resolve the seed gossip version error. Let me know if you need anything more from me. > The two cassandra nodes suddenly encounter hints each other and failed > replaying. > - > > Key: CASSANDRA-13042 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13042 > Project: Cassandra > Issue Type: Bug >Reporter: YheonHo.Choi >Priority: Critical > Attachments: out_2.2.2.1.txt, out_2.2.2.2.txt > > > Although there are no changes to cassandra, two nodes suddenly encounter hints > and failed replaying. > Any commands like disablethrift, disablegossip could not solve the above > problem and the only way out was a restart. > When we check the status of the cluster, all nodes look UN but > describecluster shows them unreachable to each other. > Here's the state of Cassandra while the above problem occurred. 
> IP addresses in report anonymized: > cassandra version: 2.2.5 > node 1 = 1.1.1.1 > node 2 = 1.1.1.2 > others = x.x.x.x > system.log > {code} > ## result of nodetool status on 1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:15:07,969 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:15:09,969 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [HintedHandoff:2] 2016-11-24 06:25:09,736 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:25:11,738 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [MemtableFlushWriter:55270] 2016-11-24 06:25:12,625 > BigTableWriter.java:184 - Writing large partition > system/hints:d640677d-f354-aa8c-be89-d2a1648c24b2 (109029803 bytes) > WARN [CompactionExecutor:37908] 2016-11-24 06:35:23,682 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (250651758 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:35:23,727 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:35:25,728 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [CompactionExecutor:37909] 2016-11-24 06:45:53,615 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (340801514 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:45:53,718 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:45:55,719 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37912] 2016-11-24 06:56:20,884 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (472465093 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:56:20,966 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:56:22,967 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error :
[jira] [Commented] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888378#comment-15888378 ] Murukesh Mohanan commented on CASSANDRA-13001: -- Somewhat. Do you mean something like http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2? I considered that, and it looked like I'd have to move this out of MonitoringTask. I also tried using Timer objects, but then there's the problem that the actual CQL query isn't available here, only a reconstruction, so I wasn't really sure what I could track against. > pluggable slow query logging / handling > --- > > Key: CASSANDRA-13001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13001 > Project: Cassandra > Issue Type: New Feature >Reporter: Jon Haddad >Assignee: Murukesh Mohanan > Fix For: 4.0 > > Attachments: > 0001-Add-multiple-logging-methods-for-slow-queries-CASSAN.patch > > > Currently CASSANDRA-12403 logs slow queries at DEBUG to a file. It would be > better to have this behind an interface so we can log to alternative > locations, such as a table on the cluster or a remote location (statsd, > graphite, etc.). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888348#comment-15888348 ] Marcus Eriksson commented on CASSANDRA-13153: - My thinking was that the only time anyone could expect the sstable with the non-repaired ranges to be something other than UNREPAIRED would be if they passed in repaired sstables, so having the assert shows that this is not expected. > Reappeared Data when Mixing Incremental and Full Repairs > > > Key: CASSANDRA-13153 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13153 > Project: Cassandra > Issue Type: Bug > Components: Compaction, Tools > Environment: Apache Cassandra 2.2 >Reporter: Amanda Debrot >Assignee: Stefan Podkowinski > Labels: Cassandra > Attachments: log-Reappeared-Data.txt, > Step-by-Step-Simulate-Reappeared-Data.txt > > > This happens for both LeveledCompactionStrategy and > SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2, > but it most likely also affects all Cassandra versions after 2.2 that > have anticompaction with full repair. > When mixing incremental and full repairs, there are a few scenarios where the > Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as > repaired. Then, if it is past gc_grace and the tombstone and data have been > compacted out on other replicas, the next incremental repair will push the > Data to other replicas without the tombstone. > Simplified scenario: > 3-node cluster with RF=3 > Initial config: > Node 1 has data and tombstone in separate SSTables. > Node 2 has data and no tombstone. > Node 3 has data and tombstone in separate SSTables. > Incremental repair (nodetool repair -pr) is run every day, so now we have a > tombstone on each node. > Some minor compactions have happened since, so data and tombstone get merged > into 1 SSTable on Nodes 1 and 3. > Node 1 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. 
> Node 2 has data and tombstone in separate SSTables. > Node 3 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. > Incremental repairs keep running every day. > Full repairs run weekly (nodetool repair -full -pr). > Now there are 2 scenarios where the Data SSTable will get marked as > "Unrepaired" while the Tombstone SSTable will get marked as "Repaired". > Scenario 1: > Since the Data and Tombstone SSTables have been marked as "Repaired" > and anticompacted, they have had minor compactions with other SSTables > containing keys from other ranges. During full repair, if the last node to > run it doesn't own this particular key in its partitioner range, the Data > and Tombstone SSTables will get anticompacted and marked as "Unrepaired". Now, > in the next incremental repair, if the Data SSTable is involved in a minor > compaction during the repair but the Tombstone SSTable is not, the resulting > compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable is marked > "Repaired". > Scenario 2: > Only the Data SSTable had a minor compaction with other SSTables > containing keys from other ranges after being marked as "Repaired". The > Tombstone SSTable was never involved in a minor compaction, so all > keys in that SSTable belong to 1 particular partitioner range. During full > repair, if the last node to run it doesn't own this particular key in its > partitioner range, the Data SSTable will get anticompacted and marked as > "Unrepaired". The Tombstone SSTable stays marked as "Repaired". > Then it's past gc_grace. Since Nodes #1 and #3 only have 1 SSTable for that > key, the tombstone will get compacted out. > Node 1 has nothing. > Node 2 has data (in unrepaired SSTable) and tombstone (in repaired > SSTable) in separate SSTables. > Node 3 has nothing. 
> Now when the next incremental repair runs, it will only use the Data SSTable > to build the merkle tree since the tombstone SSTable is flagged as repaired > and data SSTable is marked as unrepaired. And the data will get repaired > against the other two nodes. > Node 1 has data. > Node 2 has data and tombstone in separate SSTables. > Node 3 has data. > If a read request hits Node 1 and 3, it will return data. If it hits 1 and > 2, or 2 and 3, however, it would return no data. > Tested this with single range tokens for simplicity. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Haddad updated CASSANDRA-13001: --- Reviewer: Jon Haddad
[jira] [Commented] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888343#comment-15888343 ] Jon Haddad commented on CASSANDRA-13001: So in Cassandra, when we talk about something being pluggable, we typically mean the following: 1. It's going to be Java code. 2. The pluggable thing implements an interface defined in Cassandra. 3. The class would be compiled and dropped in lib (loaded into the classpath automatically). 4. The class can be specified in the yaml and is loaded by {{Class.forName()}} to pull the interface in. We would need to convert the current slow query logger into a class implementing the defined interface and have it be the default if no class is specified in the yaml. Other classes could be written later to do things like implement the metrics library, use statsd, or send logs to the ELK stack. Does that help?
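A minimal sketch of the pattern Jon describes above: an interface defined in Cassandra, a default implementation used when nothing is configured in the yaml, and a configured class name loaded via {{Class.forName()}}. All names here ({{SlowQueryLogger}}, {{FileSlowQueryLogger}}, the format of the output) are illustrative assumptions, not Cassandra's actual API.

```java
class PluggableSketch
{
    // The interface Cassandra would define; plugins implement this.
    public interface SlowQueryLogger
    {
        void report(String query, long durationMillis);
    }

    // Default implementation, used when no class is specified in the yaml.
    public static class FileSlowQueryLogger implements SlowQueryLogger
    {
        public final StringBuilder out = new StringBuilder();

        public void report(String query, long durationMillis)
        {
            out.append(String.format("SLOW (%d ms): %s%n", durationMillis, query));
        }
    }

    // Mirrors how a yaml-configured class name would be turned into an instance.
    public static SlowQueryLogger load(String className)
    {
        if (className == null)
            return new FileSlowQueryLogger(); // fall back to the default
        try
        {
            return (SlowQueryLogger) Class.forName(className).getDeclaredConstructor().newInstance();
        }
        catch (ReflectiveOperationException e)
        {
            throw new RuntimeException("could not load slow query logger: " + className, e);
        }
    }

    public static void main(String[] args)
    {
        SlowQueryLogger logger = load(null); // nothing configured -> default
        logger.report("SELECT * FROM ks.tbl WHERE id = 1", 750);
        System.out.print(((FileSlowQueryLogger) logger).out);
    }
}
```

A statsd or ELK implementation would then just be another class on the classpath named in the yaml.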
[jira] [Commented] (CASSANDRA-13042) The two cassandra nodes suddenly encounter hints each other and failed replaying.
[ https://issues.apache.org/jira/browse/CASSANDRA-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888334#comment-15888334 ] Greg Doermann commented on CASSANDRA-13042: --- Yes, 1.1.1.1 is the IP address.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888289#comment-15888289 ] Stefan Podkowinski commented on CASSANDRA-13153: I'm not sure I really understand what the additional {{repairedAtNotContainedInRange}} parameter has to do with adding an assert for making sure "all sstables are unrepaired". Even if all sstables are, we still need to apply a repairedAt value for those ranges not successfully repaired.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888245#comment-15888245 ] Marcus Eriksson commented on CASSANDRA-13153: - Not sure I agree that adding the parameter helps, could we just add an assert in {{anticompactGroup}} that all sstables are unrepaired instead?
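A hedged sketch of the kind of invariant check discussed in this thread: refusing to anticompact a group that contains an sstable already marked repaired. The {{SSTable}} stand-in below is illustrative, not the real org.apache.cassandra class; the {{repairedAt == 0}} convention for UNREPAIRED is assumed here.

```java
import java.util.List;

class AnticompactionCheck
{
    // Assumption for this sketch: repairedAt == 0 marks an sstable as unrepaired.
    static final long UNREPAIRED = 0L;

    // Stand-in for the real sstable metadata holder.
    static class SSTable
    {
        final long repairedAt;
        SSTable(long repairedAt) { this.repairedAt = repairedAt; }
    }

    // The proposed guard: fail fast if a repaired sstable slipped into the
    // anticompaction group, since that is never expected.
    static void assertAllUnrepaired(List<SSTable> group)
    {
        for (SSTable s : group)
            if (s.repairedAt != UNREPAIRED)
                throw new IllegalStateException(
                    "anticompaction group contains repaired sstable (repairedAt=" + s.repairedAt + ")");
    }
}
```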
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888189#comment-15888189 ] Ariel Weisberg commented on CASSANDRA-13241: I was saying that the chunk offsets don't need to take up as much space as they do now. A simple relative offset encoding scheme could make it 3 bytes per offset instead of 8. There is also http://www.javadoc.io/doc/me.lemire.integercompression/JavaFastPFOR/0.1.10 which doesn't have an off-heap implementation as near as I can tell, but does demonstrate how you can have an even more compact encoding that supports random access. The performance/space efficiency may not be what we want; I can't really tell. You could decrease the chunk size by 1/4 with no impact on memory utilization. My question is: with density like this, how do the bloom filters fit in memory? How are the chunk offsets the high pole in the tent? > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth > > Too low a chunk size may result in some wasted disk space. Too high a > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size led to peak read IO of up to 1GB/s and > avg reads of 200MB/s. After lowering the chunk size (of course aligned with read > ahead), the avg read IO went below 20 MB/s, more like 10-15 MB/s. > The risk of (physical) overreads increases with a lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads per request, but > if the model consists rather of small rows or small result sets, the read > overhead with a 64kb chunk size is insanely high. This applies, for example, to > (small) skinny rows. 
> Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)
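One possible relative-offset scheme of the kind Ariel sketches above: store a full 8-byte base offset once per group of chunks, then 3 bytes per chunk holding the offset relative to its group's base. With 64 chunks per group this costs roughly 8/64 + 3 ≈ 3.1 bytes per chunk instead of 8, and lookup stays O(1). The group size and the assumption that a group spans less than 16 MB are illustrative choices; this is not Cassandra's actual on-disk format.

```java
class RelativeOffsets
{
    static final int GROUP = 64; // chunks per 8-byte base offset

    final long[] bases;  // one absolute file offset per group
    final byte[] deltas; // 3 bytes per chunk, relative to the group's base

    RelativeOffsets(long[] offsets)
    {
        bases = new long[(offsets.length + GROUP - 1) / GROUP];
        deltas = new byte[offsets.length * 3];
        for (int i = 0; i < offsets.length; i++)
        {
            if (i % GROUP == 0)
                bases[i / GROUP] = offsets[i];
            // assumes the offsets within one group span < 16 MB, so the
            // delta fits in 24 bits (true for small compressed chunks)
            long d = offsets[i] - bases[i / GROUP];
            deltas[i * 3]     = (byte) (d >>> 16);
            deltas[i * 3 + 1] = (byte) (d >>> 8);
            deltas[i * 3 + 2] = (byte) d;
        }
    }

    // O(1) random access: one base lookup plus a 24-bit delta
    long offset(int chunk)
    {
        long d = ((deltas[chunk * 3] & 0xFFL) << 16)
               | ((deltas[chunk * 3 + 1] & 0xFFL) << 8)
               |  (deltas[chunk * 3 + 2] & 0xFFL);
        return bases[chunk / GROUP] + d;
    }
}
```

Schemes like JavaFastPFOR trade a bit of decode work for even fewer bits per offset while keeping random access.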
[jira] [Created] (CASSANDRA-13281) testall failure in org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization
Sean McCarthy created CASSANDRA-13281: - Summary: testall failure in org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization Key: CASSANDRA-13281 URL: https://issues.apache.org/jira/browse/CASSANDRA-13281 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Sean McCarthy Attachments: TEST-org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.log example failure: http://cassci.datastax.com/job/cassandra-3.11_testall/96/testReport/org.apache.cassandra.io.sstable.metadata/MetadataSerializerTest/testSerialization {code} Error Message expected:but was: {code}{code} Stacktrace junit.framework.AssertionFailedError: expected: but was: at org.apache.cassandra.io.sstable.metadata.MetadataSerializerTest.testSerialization(MetadataSerializerTest.java:72) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13280) testall failure in org.apache.cassandra.cql3.ViewTest.testAlterMvWithTTL
Sean McCarthy created CASSANDRA-13280: - Summary: testall failure in org.apache.cassandra.cql3.ViewTest.testAlterMvWithTTL Key: CASSANDRA-13280 URL: https://issues.apache.org/jira/browse/CASSANDRA-13280 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Sean McCarthy Attachments: TEST-org.apache.cassandra.cql3.ViewTest.log example failure: http://cassci.datastax.com/job/cassandra-3.11_testall/96/testReport/org.apache.cassandra.cql3/ViewTest/testAlterMvWithTTL {code} Error Message No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename {code}{code} Stacktrace com.datastax.driver.core.exceptions.InvalidQueryException: No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50) at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:47) at org.apache.cassandra.cql3.CQLTester.executeNet(CQLTester.java:723) at org.apache.cassandra.cql3.ViewTest.createView(ViewTest.java:73) at org.apache.cassandra.cql3.ViewTest.testAlterMvWithTTL(ViewTest.java:1225) Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: No keyspace has been specified. 
USE a keyspace, or explicitly specify keyspace.tablename at com.datastax.driver.core.Responses$Error.asException(Responses.java:136) at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179) at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:177) at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43) at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:792) at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:611) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1013) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:936) at com.datastax.shaded.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) at com.datastax.shaded.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) at com.datastax.shaded.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) at com.datastax.shaded.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276) at 
com.datastax.shaded.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) at com.datastax.shaded.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) at com.datastax.shaded.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at
[jira] [Assigned] (CASSANDRA-13277) Duplicate results with secondary index on static column
[ https://issues.apache.org/jira/browse/CASSANDRA-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie reassigned CASSANDRA-13277: --- Assignee: Andrés de la Peña > Duplicate results with secondary index on static column > --- > > Key: CASSANDRA-13277 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13277 > Project: Cassandra > Issue Type: Bug >Reporter: Romain Hardouin >Assignee: Andrés de la Peña > Labels: 2i > > As a follow up of > http://www.mail-archive.com/user@cassandra.apache.org/msg50816.html > Duplicate results appear with secondary index on static column with RF > 1. > Number of results vary depending on consistency level. > Here is a CCM session to reproduce the issue: > {code} > romain@debian:~$ ccm create 39 -n 3 -v 3.9 -s > Current cluster is now: 39 > romain@debian:~$ ccm node1 cqlsh > Connected to 39 at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4] > Use HELP for help. > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 2}; > cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added > timestamp, source text static, dest text, primary key (id, added)); > cqlsh> CREATE index ON test.idx_static (id2); > cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values > ('id1', 22,'2017-01-28', 'src1', 'dst1'); > cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest > -+-+-++-- > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > (2 rows) > cqlsh> CONSISTENCY ALL > Consistency level set to ALL. 
> cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest > -+-+-++-- > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > (3 rows) > {code} > When RF matches the number of nodes, it works as expected. > Example with RF=3 and 3 nodes: > {code} > romain@debian:~$ ccm create 39 -n 3 -v 3.9 -s > Current cluster is now: 39 > romain@debian:~$ ccm node1 cqlsh > Connected to 39 at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4] > Use HELP for help. > cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 3}; > cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added > timestamp, source text static, dest text, primary key (id, added)); > cqlsh> CREATE index ON test.idx_static (id2); > cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values > ('id1', 22,'2017-01-28', 'src1', 'dst1'); > cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest > -+-+-++-- > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > (1 rows) > cqlsh> CONSISTENCY all > Consistency level set to ALL. > cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest > -+-+-++-- > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > (1 rows) > {code} > Example with RF = 2 and 2 nodes: > {code} > romain@debian:~$ ccm create 39 -n 2 -v 3.9 -s > Current cluster is now: 39 > romain@debian:~$ ccm node1 cqlsh > Connected to 39 at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4] > Use HELP for help. 
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 2}; > cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added > timestamp, source text static, dest text, primary key (id, added)); > cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values > ('id1', 22,'2017-01-28', 'src1', 'dst1'); > cqlsh> CREATE index ON test.idx_static (id2); > cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values > ('id1', 22,'2017-01-28', 'src1', 'dst1'); > cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest > -+-+-++-- > id1 | 2017-01-27 23:00:00.00+ | 22 | src1 | dst1 > (1 rows) > cqlsh> CONSISTENCY ALL > Consistency level set to ALL. > cqlsh> SELECT * FROM test.idx_static where id2=22; > id | added | id2 | source | dest >
[jira] [Updated] (CASSANDRA-13258) Rethink read-time defragmentation introduced in 1.1 (CASSANDRA-2503)
[ https://issues.apache.org/jira/browse/CASSANDRA-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13258: -- Issue Type: Task (was: Bug) > Rethink read-time defragmentation introduced in 1.1 (CASSANDRA-2503) > > > Key: CASSANDRA-13258 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13258 > Project: Cassandra > Issue Type: Task >Reporter: Nate McCall > > tl;dr: we issue a Mutation(!) on a read when using STCS and there are more > than minCompactedThreshold SSTables encountered by the iterator. (See > org/apache/cassandra/db/SinglePartitionReadCommand.java:782) > I can see a couple of use cases where this *might* be useful, but from a > practical standpoint, this is an excellent way to exacerbate compaction > falling behind. > With the introduction of other, purpose-built compaction strategies, I would > be interested to hear why anyone would consider this still a good idea. Note > that we only do it for STCS so at best, we are inconsistent. > There are some interesting comments on CASSANDRA-10342 regarding this as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
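For readers unfamiliar with the mechanism being questioned above, the read-time defragmentation check can be sketched roughly as follows. This is an illustrative Python stand-in, not Cassandra's Java code: the names (ReadResult, maybe_defragment, the strategy string) are invented for the sketch, and the real logic lives in SinglePartitionReadCommand.

```python
# Hypothetical sketch of the read-time defragmentation the ticket questions:
# after merging a row from several sstables on a read, re-issue it as a
# write so later reads touch fewer files. Class and parameter names are
# illustrative, not Cassandra's actual code.

class ReadResult:
    def __init__(self, row, sstables_iterated):
        self.row = row
        self.sstables_iterated = sstables_iterated  # files merged for this read

def maybe_defragment(result, strategy, min_compaction_threshold, write_fn):
    """Return True when the merged row was rewritten (defragmented)."""
    if strategy != "SizeTieredCompactionStrategy":
        return False  # only the STCS path triggers this, hence the inconsistency
    if result.sstables_iterated <= min_compaction_threshold:
        return False  # cheap reads are left alone
    write_fn(result.row)  # the surprising Mutation-on-read
    return True
```

The sketch makes the complaint concrete: the write happens inside the read path, so a read-heavy workload can generate extra mutations exactly when compaction is already behind.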
[jira] [Updated] (CASSANDRA-13258) Rethink read-time defragmentation introduced in 1.1 (CASSANDRA-2503)
[ https://issues.apache.org/jira/browse/CASSANDRA-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-13258: Issue Type: Improvement (was: Task) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888112#comment-15888112 ] Benjamin Roth commented on CASSANDRA-13279: --- Maybe I was a bit misleading. I am not defending a new source per se. I am simply 'pro' improving the docs by adding problem/solution-centric resources in a place that can easily be found by anyone. E.g. if I google for "Cassandra performance tuning", the first match should go to an official guide. I'd love to volunteer, but first I'd like to work on MVs, which I have been deferring since the end of '16. But if there is consensus on a possible structure and I have access to the docs, I am happy to add content whenever I feel like it. > Table default settings file > --- > > Key: CASSANDRA-13279 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13279 > Project: Cassandra > Issue Type: Wish > Components: Configuration >Reporter: Romain Hardouin >Priority: Minor > Labels: config, documentation > > Following CASSANDRA-13241 we often see that there is no one-size-fits-all > value for settings. We can't find a sweet spot for every use case. > It's true for settings in cassandra.yaml but as [~brstgt] said for > {{chunk_length_in_kb}}: "this is somewhat hidden for the average user". > Many table settings are somewhat hidden for the average user. Some people > will think RTFM but if a file - say tables.yaml - contains default values for > table settings, more people would pay attention to them. And of course this > file could contain useful comments and guidance. > Example with SSTable compression options: > {code} > # General comments about sstable compression > compression: > # First of all: explain what it is. We split each SSTable into chunks, > etc. > # Explain when users should lower this value (e.g. 4) or when a higher > value like 64 or 128 is recommended. > # Explain the trade-off between read latency and off-heap compression > metadata size. 
> chunk_length_in_kb: 16 > > # List of available compressors: LZ4Compressor, SnappyCompressor, and > DeflateCompressor > # Explain trade-offs, some specific use cases (e.g. archives), etc. > class: 'LZ4Compressor' > > # If you want to disable compression by default, uncomment the following > line > #enabled: false > {code} > So instead of hard-coded values we would end up with something like > TableConfig + TableDescriptor à la Config + DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
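The TableConfig idea in the ticket above — site-wide defaults read from a file, overlaid by per-table options — can be sketched as a recursive merge. This is a Python sketch under stated assumptions: the option names come from the tables.yaml example in the ticket, but the merge helper itself is an invention for illustration, not anything in Cassandra.

```python
# Sketch of the proposed TableConfig behaviour: defaults as they might be
# loaded from a tables.yaml, overlaid by per-table overrides, with the
# override winning at each leaf. Option names follow the ticket's example;
# the merge strategy is an assumption.

TABLE_DEFAULTS = {
    "compression": {
        "chunk_length_in_kb": 16,
        "class": "LZ4Compressor",
        "enabled": True,
    },
}

def effective_options(defaults, overrides):
    """Recursively overlay per-table overrides on the file defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = effective_options(merged[key], value)  # deep merge
        else:
            merged[key] = value  # leaf override wins
    return merged
```

A CREATE TABLE that only sets chunk_length_in_kb would then still inherit the file's compressor class, which is the convenience the ticket is after.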
[jira] [Commented] (CASSANDRA-13269) Snapshot support for custom secondary indices
[ https://issues.apache.org/jira/browse/CASSANDRA-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888103#comment-15888103 ] Andrés de la Peña commented on CASSANDRA-13269: --- Adding support for snapshots in secondary indexes seems to me a useful feature. I guess the idea is that index snapshots would be cleared via {{nodetool clearsnapshot}} only if the index implementation stores them in the same directories where the base table snapshots are stored, is that right? Regarding the implementation: - A default implementation of {{Index.getSnapshotWithoutFlushTask}} returning null (or maybe Optional.empty()) would make it backwards-compatible with existing implementations, as was done with {{Index.validate(ReadCommand)}}. - I think the addition of the set of indexes in {{SecondaryIndexManager}} is probably out of the scope of this ticket and could have its own ticket, don't you think? - It would be useful to add some tests. {{CustomIndexTest}} seems like a good place for them. - {{StubIndex.getSnapshotWithoutFlushTask}} has an autogenerated implementation with a TODO comment about it. - Braces placement doesn't satisfy the [code style|https://wiki.apache.org/cassandra/CodeStyle]. [~jjirsa], maybe someone more familiar with ColumnFamilyStore and snapshots, and with access to the CI system, should take a look at it. > Snapshot support for custom secondary indices > - > > Key: CASSANDRA-13269 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13269 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: vincent royer >Priority: Trivial > Labels: features > Fix For: 3.0.12, 3.11.0 > > Attachments: 0001-CASSANDRA-13269-custom-indices-snapshot.patch > > > Enhance the index API to support snapshots of custom secondary indices. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
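The backwards-compatibility suggestion in the review above — give the interface a default getSnapshotWithoutFlushTask returning null so existing indexes need no changes — can be sketched like this. Python stands in for the Java interface here; the method name follows the ticket, but the signature and surrounding helper are assumptions for illustration.

```python
# Sketch of the suggested backwards-compatible API: a default method that
# opts an index out of snapshots, overridden only by snapshot-aware
# implementations. Python stand-in for the Java Index interface; the
# snapshot "task" is a placeholder callable.

class Index:
    def get_snapshot_without_flush_task(self, snapshot_name):
        # default: this index does not participate in snapshots
        return None

class LegacyIndex(Index):
    pass  # pre-existing implementation keeps working untouched

class SnapshottingIndex(Index):
    def get_snapshot_without_flush_task(self, snapshot_name):
        return lambda: "snapshotted %s" % snapshot_name  # pretend task

def snapshot_tasks(indexes, name):
    """Collect tasks only from indexes that opted in."""
    tasks = [ix.get_snapshot_without_flush_task(name) for ix in indexes]
    return [t for t in tasks if t is not None]
```

The caller simply skips None results, which is what makes the addition non-breaking for implementations written before the method existed.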
[jira] [Commented] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888097#comment-15888097 ] Sylvain Lebresne commented on CASSANDRA-13279: -- I'm not sure about what you're trying to argue here. You seem to be saying that there is confusion due to having too many sources of information and you're seriously arguing adding a completely new one is going to solve the problem? bq. How about creating a structure in the official cassandra docs with use cases and Q&A for performance tuning? Sounds like a good idea. Are you volunteering? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CASSANDRA-13065) Consistent range movements to not require MV updates to go through write paths
[ https://issues.apache.org/jira/browse/CASSANDRA-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Roth reassigned CASSANDRA-13065: - Assignee: Benjamin Roth > Consistent range movements to not require MV updates to go through write > paths > --- > > Key: CASSANDRA-13065 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13065 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Roth >Assignee: Benjamin Roth >Priority: Critical > > Booting or decommissioning nodes with MVs is unbearably slow as all streams go > through the regular write paths. This causes read-before-writes for every > mutation and during bootstrap it causes them to be sent to the batchlog. > This makes it virtually impossible to boot a new node in an acceptable amount > of time. > Using the regular streaming behaviour for consistent range movements works > much better in this case and does not break the MV local consistency contract. > Already tested on our own cluster. > The bootstrap case is super easy to handle; the decommission case requires > CASSANDRA-13064 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888084#comment-15888084 ] Benjamin Roth commented on CASSANDRA-13279: --- I can understand your concerns about the deployment issues of centralized settings in a non-centralized settings file. But I have to contradict you on the second point. By "somewhat hidden" I don't mean that it does not exist, but that an average user won't come across the documentation or the valuable information related to it (why should I tweak that?). It is very difficult to find the right resource / doc in the Cassandra ecosystem. There is DataStax, there is the official Cassandra site (which contains a lot of TODOs and empty pages), there is wiki.apache.org (which looks very outdated), and there are zillions of scattered resources like blogs all over the net. Finding the right information (as a new user) is the famous needle in the haystack. You are a user / developer from the early days and know every corner of the Cassandra universe, but for new users it is hard to navigate and 'somewhat hidden'. To be honest: when I first installed and tested Cassandra, I was totally lost. I had to test a lot, read many different resources, go through the hell of trial and error, analyzing, debugging, compiling and testing again with a lot of pain to get the knowledge I have today. Tweaking chunk_size was quite the same. I tried a lot of stuff, posted on lists, ... and after some days I was like "Wait, there was this setting in DevCenter called 'chunk_size', what exactly does it do and what happens if ... AH, it works!". How about creating a structure in the official Cassandra docs with use cases and Q&A for performance tuning? Something like a structured version of Al Tobey's tuning guide with a Problem > Solution section. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888036#comment-15888036 ] Sylvain Lebresne commented on CASSANDRA-13279: -- I'm actually kind of -1 on this. Putting default table settings in a file is asking for trouble if the file ends up differing on each node. So you'd have to gossip something to make sure that doesn't happen, but the complexity of that isn't worth it imo. I also suspect this would likely be annoying code-wise as we'd probably end up with many places to update for every new setting (arguably you can avoid this by getting fancy, but that also feels like more complexity than it's worth). From the description, I really see 2 different "problems" being raised: # the fact that default settings are only so good and that having to set every setting on every new table to suit your needs can be annoying. # the argument that table settings would be "somewhat hidden". On the first point, I agree that having some convenience could be useful, especially for large installations. I think however that something along the lines of the "profile" idea that was suggested on CASSANDRA-11408 would be more flexible (I don't exactly buy the exact syntax on that ticket, but I like the general idea). In particular, we could very well provide a few sensible "profiles" by default. On the second point, I'm not really sure I agree. Table options are described in the documentation [here|http://cassandra.apache.org/doc/latest/cql/ddl.html#table-options], and while that can certainly be improved (patches to the doc are welcome), I don't think this is particularly hidden. I think we'd be better off improving the existing doc rather than trying to disperse information in too many places. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily
[ https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887961#comment-15887961 ] Benjamin Roth edited comment on CASSANDRA-13226 at 2/28/17 1:12 PM: Sorry for so many comments, just another thought: flushes can be optimized very easily, such that a flush is only executed if the memtable contains mutations for the requested range OR if the memtable exceeds a certain size, so that the check stays cheap. I implemented this just for fun some months ago but never created a ticket for it. See the patch here: https://github.com/Jaumo/cassandra/commit/983514b0d3e15cea042533273ead5ea33c00bacf Just saw that it also disables the pre-repair flush, as proposed before. was (Author: brstgt): Sorry for that many comments, just another thought: Flushes can be optimized very easily in that way that a flush is only executed if the memtable contains mutations for the requested range OR if the memtable exceeds a certain size, so that the check is still cheap. I implemented this just for fun some months ago but did never create a ticket for it. > StreamPlan for incremental repairs flushing memtables unnecessarily > --- > > Key: CASSANDRA-13226 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13226 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > Since incremental repairs are run against a fixed dataset, there's no need to > flush memtables when streaming for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Murukesh Mohanan updated CASSANDRA-13001: - Assignee: Murukesh Mohanan Fix Version/s: 4.0 Status: Patch Available (was: Open) In this patch, I add three new configuration options in {{cassandra.yaml}} to decide how slow queries are logged: - {{slow_query_log_methods}} - a string of space-separated logging methods ({{log}}, {{command}} and {{table}}). 1. {{log}} is what's already happening, 2. {{command}} will run an external command and send JSON-encoded slow-query data to its input. The script I wrote for CASSANDRA-13000 can be a consumer here. A script can parse the JSON and send data to external servers. 3. {{table}} will save the entries to a specified Cassandra table - {{slow_query_log_command}} - an array of strings, specifying the path to the command and any arguments - {{slow_query_log_table}} - the table where the queries should be logged (in {{keyspace.table}} format) Multiple methods can be specified. > pluggable slow query logging / handling > --- > > Key: CASSANDRA-13001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13001 > Project: Cassandra > Issue Type: New Feature >Reporter: Jon Haddad >Assignee: Murukesh Mohanan > Fix For: 4.0 > > Attachments: > 0001-Add-multiple-logging-methods-for-slow-queries-CASSAN.patch > > > Currently CASSANDRA-12403 logs slow queries as DEBUG to a file. It would be > better to have this as an interface which we can log to alternative > locations, such as to a table on the cluster or to a remote location (statsd, > graphite, etc). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
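The fan-out described in the patch summary above — parse a space-separated method list, then send each slow-query record to every configured handler — can be sketched like this. This is an illustrative Python sketch, not the actual patch: the handler bodies and the record shape are assumptions, and only the method names ({{log}}, {{command}}, {{table}}) come from the ticket.

```python
# Sketch of multi-method slow-query dispatch: build one handler per
# configured method and fan each record out to all of them. The "command"
# handler pipes JSON to an external consumer's stdin, as the patch
# describes; the "table" handler is a placeholder for a real INSERT.
import json
import subprocess

def make_handlers(methods, command=None, logger=print):
    handlers = []
    if "log" in methods:
        handlers.append(lambda rec: logger("SLOW QUERY: %s" % rec["query"]))
    if "command" in methods and command:
        handlers.append(lambda rec: subprocess.run(
            command, input=json.dumps(rec).encode(), check=False))
    if "table" in methods:
        # a real implementation would write to the configured keyspace.table
        handlers.append(lambda rec: rec.setdefault("persisted", True))
    return handlers

def report_slow_query(rec, handlers):
    for h in handlers:
        h(rec)  # every configured method sees every slow query
```

Configuring {{slow_query_log_methods: "log table"}} would then correspond to `make_handlers("log table".split())`.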
[jira] [Updated] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Murukesh Mohanan updated CASSANDRA-13001: - Attachment: 0001-Add-multiple-logging-methods-for-slow-queries-CASSAN.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily
[ https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887961#comment-15887961 ] Benjamin Roth commented on CASSANDRA-13226: --- Sorry for so many comments, just another thought: flushes can be optimized very easily, such that a flush is only executed if the memtable contains mutations for the requested range OR if the memtable exceeds a certain size, so that the check stays cheap. I implemented this just for fun some months ago but never created a ticket for it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
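The flush guard proposed in the comment above can be sketched as follows. This is a hedged Python stand-in for what would be Java in Cassandra: the class, the size cutoff, and the token-containment check are all assumptions made for illustration; the linked commit is the authoritative version.

```python
# Hypothetical sketch of the proposed optimization: skip the flush when the
# memtable holds no mutations for the requested token range and is still
# small enough that checking containment is cheap. All names and the
# threshold value are illustrative, not Cassandra's actual code.

SMALL_MEMTABLE_THRESHOLD_BYTES = 64 * 1024 * 1024  # assumed size cutoff

class Memtable:
    def __init__(self, live_data_size, partition_tokens):
        self.live_data_size = live_data_size      # bytes currently held
        self.partition_tokens = partition_tokens  # tokens of partitions present

    def intersects(self, token_range):
        lo, hi = token_range
        return any(lo <= t < hi for t in self.partition_tokens)

def should_flush_for_range(memtable, token_range):
    """Flush only when the memtable could contribute data to the range,
    or when it is too large for the containment check to stay cheap."""
    if memtable.live_data_size > SMALL_MEMTABLE_THRESHOLD_BYTES:
        return True  # scanning a huge memtable would defeat the purpose
    return memtable.intersects(token_range)
```

The size escape hatch is what keeps the check cheap, as the comment notes: for large memtables the sketch just flushes rather than scanning every partition.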
[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily
[ https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887945#comment-15887945 ] Benjamin Roth commented on CASSANDRA-13226: --- I am referring to this "stacktrace":
RepairMessageVerbHandler.doVerb (case VALIDATION_REQUEST)
CompactionManager.instance.submitValidation(store, validator)
CompactionManager.doValidationCompaction => StorageService.instance.forceKeyspaceFlush
After that, merkle trees are calculated and, based on them, streams are triggered. That's why all data that is eligible for transfer has already been flushed. Also, avoiding a flush locally is only half the solution: streams REQUESTED by a stream plan also cause a flush on the sender side. But that sender has also already validated (and therefore flushed) the requested data. Maybe I missed something, but from what I can see, a REPAIR stream never requires a flush. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13000) slow query log analysis tool
[ https://issues.apache.org/jira/browse/CASSANDRA-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Murukesh Mohanan updated CASSANDRA-13000: - Assignee: Murukesh Mohanan Fix Version/s: 4.0 Status: Patch Available (was: Open) I have wrapped the previously uploaded script in a patch, placing it in {{tools/bin}} and creating the bat/shell wrappers for it. I also added a {{-g/--grep}} option for filtering queries with regex. > slow query log analysis tool > > > Key: CASSANDRA-13000 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13000 > Project: Cassandra > Issue Type: New Feature > Components: Observability >Reporter: Jon Haddad >Assignee: Murukesh Mohanan > Fix For: 4.0 > > Attachments: > 0001-Adds-a-cqldumpslow-tool-which-analyses-the-debug-log.patch, > csqldumpslow.py > > > As a follow up to CASSANDRA-12403, it would be very helpful to have a tool to > process the slow queries that are logged. In the MySQL world, there's a tool > called mysqldumpslow, which processes a slow query log, abstracts the > parameters to prepared statements, and shows the queries which are causing > problems based on frequency. The {{mysqldumpslow}} utility shows > aggregated count & time statistics spent on slow queries. For instance: > {code}shell> mysqldumpslow > Reading mysql slow query log from > /usr/local/mysql/data/mysqld51-apple-slow.log > Count: 1 Time=4.32s (4s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost > insert into t2 select * from t1 > Count: 3 Time=2.53s (7s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost > insert into t2 select * from t1 limit N > Count: 3 Time=2.13s (6s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost > insert into t1 select * from t1{code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
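The mysqldumpslow-style aggregation described above — abstract the literals out of each query so textually different statements collapse into one template, then report counts and total time per template — can be sketched in a few lines. This is a rough Python sketch, not the attached csqldumpslow.py: the entry format is invented for illustration, and real Cassandra debug.log parsing is more involved.

```python
# Sketch of slow-query aggregation: normalize literals to placeholders,
# then aggregate count and total time per query template, in the spirit
# of mysqldumpslow's "limit N" output shown above.
import re
from collections import defaultdict

def normalize(query):
    query = re.sub(r"'[^']*'", "'S'", query)  # string literals -> 'S'
    query = re.sub(r"\b\d+\b", "N", query)    # numeric literals -> N
    return query

def aggregate(entries):
    """entries: iterable of (query, millis); returns {template: (count, total_ms)}."""
    stats = defaultdict(lambda: [0, 0])
    for query, millis in entries:
        key = normalize(query)
        stats[key][0] += 1
        stats[key][1] += millis
    return {k: tuple(v) for k, v in stats.items()}
```

Sorting the resulting dict by count or total time then gives exactly the "which templates hurt most" view the ticket asks for.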
[jira] [Updated] (CASSANDRA-13000) slow query log analysis tool
[ https://issues.apache.org/jira/browse/CASSANDRA-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Murukesh Mohanan updated CASSANDRA-13000: - Attachment: 0001-Adds-a-cqldumpslow-tool-which-analyses-the-debug-log.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13272) "nodetool bootstrap resume" does not exit
[ https://issues.apache.org/jira/browse/CASSANDRA-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13272: --- Labels: lhf (was: ) > "nodetool bootstrap resume" does not exit > - > > Key: CASSANDRA-13272 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13272 > Project: Cassandra > Issue Type: Bug > Components: Lifecycle, Streaming and Messaging >Reporter: Tom van der Woerdt > Labels: lhf > > I have a script that calls "nodetool bootstrap resume" after a failed join > (in my environment some streams sometimes fail due to mis-tuning of stream > bandwidth settings). However, if the streams fail again, nodetool won't exit. > Last lines before it just hangs forever : > {noformat} > [2017-02-26 07:02:42,287] received file > /var/lib/cassandra/data/keyspace/table-63d5d42009fa11e5879ebd9463bffdac/mc-12670-big-Data.db > (progress: 1112%) > [2017-02-26 07:02:42,287] received file > /var/lib/cassandra/data/keyspace/table-63d5d42009fa11e5879ebd9463bffdac/mc-12670-big-Data.db > (progress: 1112%) > [2017-02-26 07:02:59,843] received file > /var/lib/cassandra/data/keyspace/table-63d5d42009fa11e5879ebd9463bffdac/mc-12671-big-Data.db > (progress: 1112%) > [2017-02-26 09:25:51,000] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 09:33:45,017] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 09:39:27,216] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 09:53:33,084] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 09:55:07,115] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 10:06:49,557] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 10:40:55,880] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 11:09:21,025] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 12:44:35,755] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 12:49:18,867] session with /10.x.y.z complete (progress: 
1112%) > [2017-02-26 13:23:50,611] session with /10.x.y.z complete (progress: 1112%) > [2017-02-26 13:23:50,612] Stream failed > {noformat} > At that point ("Stream failed") I would expect nodetool to exit with a > non-zero exit code. Instead, it just wants me to ^C it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
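Until nodetool itself returns a non-zero status here, a script-side workaround can be sketched. This is a hypothetical wrapper, not part of Cassandra or nodetool; the "Stream failed" marker string is taken from the log excerpt above, and the invocation is an assumption to adapt to your installation:

```python
import subprocess
import sys

# Hypothetical workaround: run `nodetool bootstrap resume`, scan its output,
# and turn the terminal "Stream failed" line into a non-zero exit code
# instead of waiting forever for a process that never exits.
FAILURE_MARKER = "Stream failed"

def scan_stream_log(lines):
    """Return 1 as soon as a 'Stream failed' line appears, else 0 at EOF."""
    for line in lines:
        if FAILURE_MARKER in line:
            return 1
    return 0

def run_bootstrap_resume(argv=("nodetool", "bootstrap", "resume")):
    # Assumed invocation; adjust the nodetool path/host flags as needed.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    status = scan_stream_log(proc.stdout)
    if status:
        proc.kill()   # nodetool would otherwise hang here (this ticket)
        proc.wait()
        return status
    return proc.wait()

if __name__ == "__main__":
    sys.exit(run_bootstrap_resume())
```

A calling script can then retry the resume only while the wrapper keeps returning non-zero, instead of hanging on nodetool directly.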
[jira] [Updated] (CASSANDRA-13272) "nodetool bootstrap resume" does not exit
[ https://issues.apache.org/jira/browse/CASSANDRA-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13272: --- Component/s: Streaming and Messaging Lifecycle > "nodetool bootstrap resume" does not exit > - -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13153: --- Reproduced In: 2.2.8, 2.2.7 (was: 2.2.7, 2.2.8) Status: Patch Available (was: Open) > Reappeared Data when Mixing Incremental and Full Repairs > > > Key: CASSANDRA-13153 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13153 > Project: Cassandra > Issue Type: Bug > Components: Compaction, Tools > Environment: Apache Cassandra 2.2 >Reporter: Amanda Debrot >Assignee: Stefan Podkowinski > Labels: Cassandra > Attachments: log-Reappeared-Data.txt, > Step-by-Step-Simulate-Reappeared-Data.txt > > > This happens for both LeveledCompactionStrategy and > SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2 > but it most likely also affects all Cassandra versions after 2.2, if they > have anticompaction with full repair. > When mixing incremental and full repairs, there are a few scenarios where the > Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as > repaired. Then if it is past gc_grace, and the tombstone and data have been > compacted out on other replicas, the next incremental repair will push the > Data to other replicas without the tombstone. > Simplified scenario: > 3 node cluster with RF=3 > Initial config: > Node 1 has data and tombstone in separate SSTables. > Node 2 has data and no tombstone. > Node 3 has data and tombstone in separate SSTables. > Incremental repair (nodetool repair -pr) is run every day so now we have > a tombstone on each node. > Some minor compactions have happened since, so data and tombstone get merged > to 1 SSTable on Nodes 1 and 3. > Node 1 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. > Node 2 has data and tombstone in separate SSTables. > Node 3 had a minor compaction that merged data with tombstone. 1 > SSTable with tombstone. > Incremental repairs keep running every day. 
> Full repairs run weekly (nodetool repair -full -pr). > Now there are 2 scenarios where the Data SSTable will get marked as > "Unrepaired" while the Tombstone SSTable will get marked as "Repaired". > Scenario 1: > Since the Data and Tombstone SSTable have been marked as "Repaired" > and anticompacted, they have had minor compactions with other SSTables > containing keys from other ranges. During full repair, if the last node to > run it doesn't own this particular key in its partitioner range, the Data > and Tombstone SSTable will get anticompacted and marked as "Unrepaired". Now > in the next incremental repair, if the Data SSTable is involved in a minor > compaction during the repair but the Tombstone SSTable is not, the resulting > compacted SSTable will be marked "Unrepaired" and the Tombstone SSTable is marked > "Repaired". > Scenario 2: > Only the Data SSTable had a minor compaction with other SSTables > containing keys from other ranges after being marked as "Repaired". The > Tombstone SSTable was never involved in a minor compaction, so all > keys in that SSTable belong to 1 particular partitioner range. During full > repair, if the last node to run it doesn't own this particular key in its > partitioner range, the Data SSTable will get anticompacted and marked as > "Unrepaired". The Tombstone SSTable stays marked as Repaired. > Then it's past gc_grace. Since Nodes #1 and #3 only have 1 SSTable for that > key, the tombstone will get compacted out. > Node 1 has nothing. > Node 2 has data (in unrepaired SSTable) and tombstone (in repaired > SSTable) in separate SSTables. > Node 3 has nothing. > Now when the next incremental repair runs, it will only use the Data SSTable > to build the merkle tree, since the tombstone SSTable is flagged as repaired > and the data SSTable is marked as unrepaired. And the data will get repaired > against the other two nodes. > Node 1 has data. > Node 2 has data and tombstone in separate SSTables. > Node 3 has data. 
> If a read request hits Node 1 and 3, it will return data. If it hits 1 and > 2, or 2 and 3, however, it would return no data. > Tested this with single range tokens for simplicity. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
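The end state can be illustrated with a toy model (deliberately not Cassandra code) of why the next incremental repair resurrects the deleted row: incremental repair only compares unrepaired SSTables, so a tombstone stranded in a repaired SSTable is invisible to repair while the matching data gets streamed back out.

```python
# Toy model of the scenario above. Each node is a list of (content, repaired)
# tuples standing in for SSTables; `incremental_repair` unions the *unrepaired*
# contents of all replicas onto every replica, mimicking how incremental
# repair only builds merkle trees over unrepaired SSTables.

def incremental_repair(nodes):
    unrepaired = set()
    for sstables in nodes:
        for content, repaired in sstables:
            if not repaired:
                unrepaired.add(content)
    # Stream anything a replica is missing; the repaired tombstone on node 2
    # never participates, so it never travels with the data.
    for sstables in nodes:
        for content in unrepaired:
            if (content, False) not in sstables and (content, True) not in sstables:
                sstables.append((content, False))

# State after gc_grace: nodes 1 and 3 compacted data + tombstone away
# entirely; node 2 still holds the data (unrepaired) and tombstone (repaired).
node1, node3 = [], []
node2 = [("data", False), ("tombstone", True)]
incremental_repair([node1, node2, node3])
# The deleted row reappears on nodes 1 and 3, without its tombstone.
```

Under this model a read hitting nodes 1 and 3 sees only the data, matching the quorum-dependent results described above.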
[jira] [Commented] (CASSANDRA-13042) The two cassandra nodes suddenly encounter hints each other and failed replaying.
[ https://issues.apache.org/jira/browse/CASSANDRA-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887808#comment-15887808 ] Stefan Podkowinski commented on CASSANDRA-13042: Greg, is 1.1.1.1 the node of the provided logs? > The two cassandra nodes suddenly encounter hints each other and failed > replaying. > - > > Key: CASSANDRA-13042 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13042 > Project: Cassandra > Issue Type: Bug >Reporter: YheonHo.Choi >Priority: Critical > Attachments: out_2.2.2.1.txt, out_2.2.2.2.txt > > > Although there were no changes to Cassandra, two nodes suddenly accumulated > hints for each other and failed to replay them. > Commands like disablethrift and disablegossip could not resolve the problem; > the only remedy was a restart. > When we checked the cluster status, all nodes looked UN, but describecluster > showed them as unreachable to each other. > Here is the state of Cassandra while the problem occurred. > IP addresses in the report are anonymized: > cassandra version: 2.2.5 > node 1 = 1.1.1.1 > node 2 = 1.1.1.2 > others = x.x.x.x > system.log > {code} > ## result of nodetool status on 1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:15:07,969 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:15:09,969 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [HintedHandoff:2] 2016-11-24 06:25:09,736 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:25:11,738 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [MemtableFlushWriter:55270] 2016-11-24 06:25:12,625 > BigTableWriter.java:184 - Writing large partition > system/hints:d640677d-f354-aa8c-be89-d2a1648c24b2 (109029803 bytes) > WARN [CompactionExecutor:37908] 2016-11-24 06:35:23,682 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (250651758 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:35:23,727 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:35:25,728 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37909] 2016-11-24 06:45:53,615 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (340801514 bytes) > INFO [HintedHandoff:2] 2016-11-24 06:45:53,718 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 06:45:55,719 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > WARN [CompactionExecutor:37912] 2016-11-24 06:56:20,884 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (472465093 bytes) > INFO [HintedHandoff:1] 2016-11-24 06:56:20,966 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:56:22,967 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. 
> WARN [CompactionExecutor:37911] 2016-11-24 07:07:12,568 > BigTableWriter.java:184 - Writing large partition > system/hints:8caa54f3-7d67-40d6-b224-8c64a1d289be (577392172 bytes) > INFO [HintedHandoff:2] 2016-11-24 07:07:12,643 HintedHandOffManager.java:367 > - Started hinted handoff for host: 8caa54f3-7d67-40d6-b224-8c64a1d289be with > IP: /1.1.1.2 > INFO [HintedHandoff:2] 2016-11-24 07:07:14,643 HintedHandOffManager.java:486 > - Failed replaying hints to /1.1.1.2; aborting (0 delivered), error : > Operation timed out - received only 0 responses. > INFO [IndexSummaryManager:1] 2016-11-24 07:09:15,929 > IndexSummaryRedistribution.java:74 - Redistributing index summaries > ## result of nodetool status on 1.1.1.2 > INFO [HintedHandoff:1] 2016-11-24 06:11:37,300 HintedHandOffManager.java:367 > - Started hinted handoff for host: b79124e9-394c-4400-a8e7-a0c94aec6878 with > IP: /1.1.1.1 > INFO [HintedHandoff:1] 2016-11-24 06:11:39,301
[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887728#comment-15887728 ] Romain Hardouin edited comment on CASSANDRA-13241 at 2/28/17 10:25 AM: --- I created CASSANDRA-13279 because it's a broader problem IMHO. I don't say we should stay with 64KB. Maybe 8KB i.e. 1GB of compression metadata per TB would be a good trade-off. was (Author: rha): I created CASSANDRA-13279 because it's a broader problem IMHO. I don't say we should stay with 64KB. Maybe 8KB i.e. 1GB per TB would be a good trade-off. > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth > > Too low a chunk size may result in some wasted disk space. Too high a > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size led to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering the chunk size (of course aligned with read > ahead), the avg read IO went below 20 MB/s, typically 10-15MB/s. > The risk of (physical) overreads increases with a lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads per request, but > if the model consists mostly of small rows or small result sets, the read > overhead with a 64kb chunk size is insanely high. This applies for example to > (small) skinny rows. 
> Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v6.3.15#6346)
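Romain's "1GB of compression metadata per TB" estimate for 8KB chunks can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes roughly 8 bytes of chunk-offset metadata per chunk (an assumption about the metadata layout, not a measured value):

```python
def metadata_bytes(data_bytes, chunk_kb, bytes_per_chunk_entry=8):
    """Rough compression-metadata footprint: one offset entry per chunk."""
    chunks = data_bytes // (chunk_kb * 1024)
    return chunks * bytes_per_chunk_entry

# Compare chunk sizes for 1 TiB of compressed data.
TiB = 1024 ** 4
for chunk_kb in (64, 16, 8, 4):
    mib = metadata_bytes(TiB, chunk_kb) / 1024 ** 2
    print(f"{chunk_kb:>2} KiB chunks -> {mib:>6.0f} MiB metadata per TiB")
```

Under that assumption, halving the chunk size doubles the off-heap metadata, which is the trade-off behind CASSANDRA-13279's concern about a blanket 4KB default.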
[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887728#comment-15887728 ] Romain Hardouin edited comment on CASSANDRA-13241 at 2/28/17 10:24 AM: --- I created CASSANDRA-13279 because it's a broader problem IMHO. I don't say we should stay with 64KB. Maybe 8KB i.e. 1GB per TB would be a good trade-off. was (Author: rha): I created https://issues.apache.org/jira/browse/CASSANDRA-13279 because it's a broader problem IMHO. I don't say we should stay with 64KB. Maybe 8KB i.e. 1GB per TB would be a good trade-off. > Lower default chunk_length_in_kb from 64kb to 4kb > - -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887733#comment-15887733 ] Benjamin Roth commented on CASSANDRA-13279: --- Great idea! +1 > Table default settings file > --- > > Key: CASSANDRA-13279 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13279 > Project: Cassandra > Issue Type: Wish > Components: Configuration >Reporter: Romain Hardouin >Priority: Minor > Labels: config, documentation > > Following CASSANDRA-13241 we often see that there is no one-size-fits-all > value for settings. We can't find a sweet spot for every use case. > It's true for settings in cassandra.yaml but as [~brstgt] said for > {{chunk_length_in_kb}}: "this is somewhat hidden for the average user". > Many table settings are somewhat hidden for the average user. Some people > will think RTFM but if a file - say tables.yaml - contains default values for > table settings, more people would pay attention to them. And of course this > file could contain useful comments and guidance. > Example with SSTable compression options: > {code} > # General comments about sstable compression > compression: > # First of all: explain what it is. We split each SSTable into chunks, > etc. > # Explain when users should lower this value (e.g. 4) or when a higher > value like 64 or 128 is recommended. > # Explain the trade-off between read latency and off-heap compression > metadata size. > chunk_length_in_kb: 16 > > # List of available compressors: LZ4Compressor, SnappyCompressor, and > DeflateCompressor > # Explain trade-offs, some specific use cases (e.g. archives), etc. > class: 'LZ4Compressor' > > # If you want to disable compression by default, uncomment the following > line > #enabled: false > {code} > So instead of hard-coded values we would end up with something like > TableConfig + TableDescriptor à la Config + DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887728#comment-15887728 ] Romain Hardouin commented on CASSANDRA-13241: - I created https://issues.apache.org/jira/browse/CASSANDRA-13279 because it's a broader problem IMHO. I don't say we should stay with 64KB. Maybe 8KB i.e. 1GB per TB would be a good trade-off. > Lower default chunk_length_in_kb from 64kb to 4kb > - -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13279) Table default settings file
[ https://issues.apache.org/jira/browse/CASSANDRA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain Hardouin updated CASSANDRA-13279: Summary: Table default settings file (was: Table settings file) > Table default settings file > --- -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13279) Table settings file
Romain Hardouin created CASSANDRA-13279: --- Summary: Table settings file Key: CASSANDRA-13279 URL: https://issues.apache.org/jira/browse/CASSANDRA-13279 Project: Cassandra Issue Type: Wish Components: Configuration Reporter: Romain Hardouin Priority: Minor Following CASSANDRA-13241 we often see that there is no one-size-fits-all value for settings. We can't find a sweet spot for every use case. It's true for settings in cassandra.yaml but as [~brstgt] said for {{chunk_length_in_kb}}: "this is somewhat hidden for the average user". Many table settings are somewhat hidden for the average user. Some people will think RTFM but if a file - say tables.yaml - contains default values for table settings, more people would pay attention to them. And of course this file could contain useful comments and guidance. Example with SSTable compression options: {code} # General comments about sstable compression compression: # First of all: explain what it is. We split each SSTable into chunks, etc. # Explain when users should lower this value (e.g. 4) or when a higher value like 64 or 128 is recommended. # Explain the trade-off between read latency and off-heap compression metadata size. chunk_length_in_kb: 16 # List of available compressors: LZ4Compressor, SnappyCompressor, and DeflateCompressor # Explain trade-offs, some specific use cases (e.g. archives), etc. class: 'LZ4Compressor' # If you want to disable compression by default, uncomment the following line #enabled: false {code} So instead of hard-coded values we would end up with something like TableConfig + TableDescriptor à la Config + DatabaseDescriptor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
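The tables.yaml proposal amounts to a three-level settings resolution. A minimal sketch of the merge order (an assumed design, not an actual patch; names like `resolve_compression` are hypothetical):

```python
# Assumed resolution order for table settings under the tables.yaml proposal:
# hard-coded defaults < operator-wide tables.yaml < per-table DDL options.

HARD_CODED = {"chunk_length_in_kb": 64, "class": "LZ4Compressor", "enabled": True}

def resolve_compression(tables_yaml=None, ddl_options=None):
    opts = dict(HARD_CODED)
    opts.update(tables_yaml or {})   # site-wide defaults from tables.yaml
    opts.update(ddl_options or {})   # explicit WITH compression = {...} wins
    return opts

# tables.yaml lowers the site-wide default; one table overrides it again.
site = {"chunk_length_in_kb": 16}
print(resolve_compression(site))                              # tables.yaml default applies
print(resolve_compression(site, {"chunk_length_in_kb": 64}))  # DDL override wins
```

This mirrors the suggested TableConfig + TableDescriptor split: the YAML file feeds the middle layer, and CQL options stay authoritative per table.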
[jira] [Comment Edited] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887685#comment-15887685 ] Christian Esken edited comment on CASSANDRA-13265 at 2/28/17 10:10 AM: --- I see your argument. On larger clusters this may get problematic. I will try to summarize the alternative solutions: - Offload expiration to a "random" regular Thread, but only a single one. If one Thread already expires ... -- ... let the other Threads continue (1) -- ... let the other Threads wait (2) - Use an "Expiration Thread Pool" (3). I am not (currently) in favor of it, and if I understood you correctly, it is also not your preference. I will implement option (1) today. Please see the attached Thread Dump to see which Threads are blocking. Here are two examples from the Thread Dumps. Mainly they are SharedPool-Worker threads that either do iterator.remove() or iterator.next(). I think in the thread dump there is also a HintDispatcher Thread that is parking on the same lock. 
java.util.concurrent.LinkedBlockingQueue$Itr.remove: {code} "SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x7fb69b11e260 nid=0x6090 waiting on condition [0x7fb162c0e000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225) at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840) at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555) at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165) at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771) at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744) at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99) at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at java.lang.Thread.run(Thread.java:745) {code} java.util.concurrent.LinkedBlockingQueue$Itr.next: {code} "SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x7fb69b1135b0 nid=0x608d waiting on condition [0x7fb162cd1000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225) at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823) at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550) at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165) at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771) at
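Option (1) — a single thread expires while the others continue — can be sketched generically. The real change would live in Cassandra's `OutboundTcpConnection` in Java; the Python below is only a stand-in to illustrate the non-blocking try-lock idea, and all names in it are hypothetical:

```python
import threading

# Sketch of option (1): whichever caller wins a non-blocking try-lock performs
# expiration; everyone else skips it and keeps enqueueing instead of piling up
# on the queue's internal ReentrantLock as in the thread dumps above.

class OutboundQueue:
    def __init__(self):
        self.queue = []                      # stand-in for the backlog queue
        self._expiring = threading.Lock()    # at most one expirer at a time

    def expire_messages(self, now):
        """Drop expired entries; return False if another thread is already expiring."""
        if not self._expiring.acquire(blocking=False):
            return False                     # don't block: let this caller continue
        try:
            self.queue = [(payload, deadline)
                          for payload, deadline in self.queue
                          if deadline > now]
            return True
        finally:
            self._expiring.release()

    def enqueue(self, payload, deadline, now):
        self.queue.append((payload, deadline))
        self.expire_messages(now)            # opportunistic, never blocks peers
```

With this shape, only one caller at a time pays the cost of scanning the queue, and contending callers return immediately rather than parking on a shared lock.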
[jira] [Resolved] (CASSANDRA-13273) Test case : DatabaseDescriptorRefTest failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp resolved CASSANDRA-13273. -- Resolution: Invalid Fix Version/s: (was: 3.10) These mentioned classes are *not* part of the project's code base. > Test case : DatabaseDescriptorRefTest failing > - > > Key: CASSANDRA-13273 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13273 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: amit@p006n03:~/cassandra$ lscpu > Architecture: ppc64le > Byte Order:Little Endian > CPU(s):160 > On-line CPU(s) list: 0-159 > Thread(s) per core:8 > Core(s) per socket:5 > Socket(s): 4 > NUMA node(s): 4 > Model: 2.1 (pvr 004b 0201) > Model name:POWER8E (raw), altivec supported > CPU max MHz: 3690. > CPU min MHz: 2061. > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-39 > NUMA node1 CPU(s): 40-79 > NUMA node16 CPU(s):80-119 > NUMA node17 CPU(s):120-159 > amit@p006n03:~/cassandra$ cat /etc/os-release > NAME="Ubuntu" > VERSION="16.04.1 LTS (Xenial Xerus)" > ID=ubuntu > ID_LIKE=debian > PRETTY_NAME="Ubuntu 16.04.1 LTS" > VERSION_ID="16.04" > HOME_URL="http://www.ubuntu.com/; > SUPPORT_URL="http://help.ubuntu.com/; > BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/; > VERSION_CODENAME=xenial > UBUNTU_CODENAME=xenial > amit@p006n03:~/cassandra$ bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4] > Use HELP for help. 
> cqlsh> exit >Reporter: Amitkumar Ghatwal > > Hi All, > I am getting test case failures for "DatabaseDescriptorRefTest" > > amit@p006n03:~/cassandra$ ant test -Dtest.name=DatabaseDescriptorRefTest > Buildfile: /home/amit/cassandra/build.xml > init: > maven-ant-tasks-localrepo: > maven-ant-tasks-download: > maven-ant-tasks-init: > maven-declare-dependencies: > maven-ant-tasks-retrieve-build: > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies.xml > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies-sources.xml > [unzip] Expanding: > /home/amit/cassandra/build/lib/jars/org.jacoco.agent-0.7.5.201505241946.jar > into /home/amit/cassandra/build/lib/jars > check-gen-cql3-grammar: > gen-cql3-grammar: > generate-cql-html: > generate-jflex-java: > build-project: > [echo] apache-cassandra: /home/amit/cassandra/build.xml > createVersionPropFile: > [propertyfile] Updating property file: > /home/amit/cassandra/src/resources/org/apache/cassandra/config/version.properties > [copy] Copying 1 file to /home/amit/cassandra/build/classes/main > build: > build-test: > test: > [junit] WARNING: multiple versions of ant detected in path for junit > [junit] > jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class > [junit] and > jar:file:/home/amit/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.006 sec > [junit] > [junit] - Standard Output --- > [junit] ERROR [main] 2017-02-26 09:56:05,169 ?:? 
- SLF4J: stderr > [junit] - --- > [junit] - Standard Error - > [junit] > [junit] > [junit] VIOLATION: > org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType > [junit] java.lang.Exception > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest$1.findClass(DatabaseDescriptorRefTest.java:169) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > [junit] at java.lang.Class.getDeclaredMethods0(Native Method) > [junit] at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > [junit] at java.lang.Class.getDeclaredMethod(Class.java:2128) > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest.testDatabaseDescriptorRef(DatabaseDescriptorRefTest.java:215) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >
[jira] [Comment Edited] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887685#comment-15887685 ] Christian Esken edited comment on CASSANDRA-13265 at 2/28/17 9:53 AM:
--
I see your argument. On larger clusters this may get problematic. Let's evaluate different solutions:
- Offload expiration to a "random" regular Thread, but only a single one. If one Thread already expires ...
--- ... let the other Threads continue
--- ... let the other Threads wait
- Go with your idea of an "Expiration Thread Pool"

Please see the attached thread dump to see which threads are blocking. Here are two examples from the thread dumps. Mainly they are SharedPool-Worker threads that either do iterator.remove() or iterator.next(). I think there is also a HintDispatcher thread in the dump that is parking on the same lock.

java.util.concurrent.LinkedBlockingQueue$Itr.remove:
{code}
"SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x7fb69b11e260 nid=0x6090 waiting on condition [0x7fb162c0e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
	at java.lang.Thread.run(Thread.java:745)
{code}

java.util.concurrent.LinkedBlockingQueue$Itr.next:
{code}
"SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x7fb69b1135b0 nid=0x608d waiting on condition [0x7fb162cd1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at
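The first option discussed in the comment (a single thread performs expiration while the other enqueuing threads simply continue) can be sketched roughly as follows. This is an illustrative sketch only, not Cassandra's OutboundTcpConnection code; the class and method names are invented. It also swaps the LinkedBlockingQueue for a lock-free ConcurrentLinkedQueue, which sidesteps the fullyLock() contention visible in the dumps above:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: expiration is attempted on every enqueue, but only one thread may
// sweep at a time; any other thread skips the sweep instead of blocking
// (the "let the other Threads continue" variant from the comment).
class ExpiringQueue<T> {

    static final class Entry<T> {
        final T item;
        final long deadlineNanos;
        Entry(T item, long deadlineNanos) { this.item = item; this.deadlineNanos = deadlineNanos; }
    }

    private final ConcurrentLinkedQueue<Entry<T>> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    void enqueue(T item, long ttlNanos) {
        queue.add(new Entry<>(item, System.nanoTime() + ttlNanos));
        maybeExpire();
    }

    // Non-blocking: if another thread already holds the "expirer" role, return
    // immediately instead of parking on a shared lock.
    void maybeExpire() {
        if (!expiring.compareAndSet(false, true))
            return;
        try {
            long now = System.nanoTime();
            for (Iterator<Entry<T>> it = queue.iterator(); it.hasNext(); )
                if (it.next().deadlineNanos - now < 0)
                    it.remove();   // supported and lock-free on ConcurrentLinkedQueue
        } finally {
            expiring.set(false);
        }
    }

    int size() { return queue.size(); }
}
```

The AtomicBoolean plays the role of a tryLock: contended threads never wait, so a stall in the sweeping thread cannot cascade into hundreds of parked SharedPool workers as in the attached dump.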
[jira] [Reopened] (CASSANDRA-13273) Test case : DatabaseDescriptorRefTest failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reopened CASSANDRA-13273: -- > Test case : DatabaseDescriptorRefTest failing > - > > Key: CASSANDRA-13273 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13273 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: amit@p006n03:~/cassandra$ lscpu > Architecture: ppc64le > Byte Order:Little Endian > CPU(s):160 > On-line CPU(s) list: 0-159 > Thread(s) per core:8 > Core(s) per socket:5 > Socket(s): 4 > NUMA node(s): 4 > Model: 2.1 (pvr 004b 0201) > Model name:POWER8E (raw), altivec supported > CPU max MHz: 3690. > CPU min MHz: 2061. > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-39 > NUMA node1 CPU(s): 40-79 > NUMA node16 CPU(s):80-119 > NUMA node17 CPU(s):120-159 > amit@p006n03:~/cassandra$ cat /etc/os-release > NAME="Ubuntu" > VERSION="16.04.1 LTS (Xenial Xerus)" > ID=ubuntu > ID_LIKE=debian > PRETTY_NAME="Ubuntu 16.04.1 LTS" > VERSION_ID="16.04" > HOME_URL="http://www.ubuntu.com/; > SUPPORT_URL="http://help.ubuntu.com/; > BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/; > VERSION_CODENAME=xenial > UBUNTU_CODENAME=xenial > amit@p006n03:~/cassandra$ bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4] > Use HELP for help. 
> cqlsh> exit >Reporter: Amitkumar Ghatwal > Fix For: 3.10 > > > Hi All, > I am getting test case failures for "DatabaseDescriptorRefTest" > > amit@p006n03:~/cassandra$ ant test -Dtest.name=DatabaseDescriptorRefTest > Buildfile: /home/amit/cassandra/build.xml > init: > maven-ant-tasks-localrepo: > maven-ant-tasks-download: > maven-ant-tasks-init: > maven-declare-dependencies: > maven-ant-tasks-retrieve-build: > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies.xml > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies-sources.xml > [unzip] Expanding: > /home/amit/cassandra/build/lib/jars/org.jacoco.agent-0.7.5.201505241946.jar > into /home/amit/cassandra/build/lib/jars > check-gen-cql3-grammar: > gen-cql3-grammar: > generate-cql-html: > generate-jflex-java: > build-project: > [echo] apache-cassandra: /home/amit/cassandra/build.xml > createVersionPropFile: > [propertyfile] Updating property file: > /home/amit/cassandra/src/resources/org/apache/cassandra/config/version.properties > [copy] Copying 1 file to /home/amit/cassandra/build/classes/main > build: > build-test: > test: > [junit] WARNING: multiple versions of ant detected in path for junit > [junit] > jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class > [junit] and > jar:file:/home/amit/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.006 sec > [junit] > [junit] - Standard Output --- > [junit] ERROR [main] 2017-02-26 09:56:05,169 ?:? 
- SLF4J: stderr > [junit] - --- > [junit] - Standard Error - > [junit] > [junit] > [junit] VIOLATION: > org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType > [junit] java.lang.Exception > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest$1.findClass(DatabaseDescriptorRefTest.java:169) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > [junit] at java.lang.Class.getDeclaredMethods0(Native Method) > [junit] at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > [junit] at java.lang.Class.getDeclaredMethod(Class.java:2128) > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest.testDatabaseDescriptorRef(DatabaseDescriptorRefTest.java:215) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [junit] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887685#comment-15887685 ] Christian Esken commented on CASSANDRA-13265:
-
I see your argument. On larger clusters this may get problematic. Let's evaluate different solutions:
- Offload expiration to a "random" regular Thread, but only a single one. If one Thread already expires ...
--- ... let the other Threads continue
--- ... let the other Threads wait
- Go with your idea of an "Expiration Thread Pool"

Please see the attached thread dump to see which threads are blocking. Mainly they are SharedPool-Worker threads that either do iterator.remove() or iterator.next().
{code}
"SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x7fb69b11e260 nid=0x6090 waiting on condition [0x7fb162c0e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
	at java.lang.Thread.run(Thread.java:745)
{code}
{code}
"SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x7fb69b1135b0 nid=0x608d waiting on condition [0x7fb162cd1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at
[jira] [Resolved] (CASSANDRA-13273) Test case : DatabaseDescriptorRefTest failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amitkumar Ghatwal resolved CASSANDRA-13273. --- Resolution: Fixed > Test case : DatabaseDescriptorRefTest failing > - > > Key: CASSANDRA-13273 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13273 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: amit@p006n03:~/cassandra$ lscpu > Architecture: ppc64le > Byte Order:Little Endian > CPU(s):160 > On-line CPU(s) list: 0-159 > Thread(s) per core:8 > Core(s) per socket:5 > Socket(s): 4 > NUMA node(s): 4 > Model: 2.1 (pvr 004b 0201) > Model name:POWER8E (raw), altivec supported > CPU max MHz: 3690. > CPU min MHz: 2061. > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-39 > NUMA node1 CPU(s): 40-79 > NUMA node16 CPU(s):80-119 > NUMA node17 CPU(s):120-159 > amit@p006n03:~/cassandra$ cat /etc/os-release > NAME="Ubuntu" > VERSION="16.04.1 LTS (Xenial Xerus)" > ID=ubuntu > ID_LIKE=debian > PRETTY_NAME="Ubuntu 16.04.1 LTS" > VERSION_ID="16.04" > HOME_URL="http://www.ubuntu.com/; > SUPPORT_URL="http://help.ubuntu.com/; > BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/; > VERSION_CODENAME=xenial > UBUNTU_CODENAME=xenial > amit@p006n03:~/cassandra$ bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4] > Use HELP for help. 
> cqlsh> exit >Reporter: Amitkumar Ghatwal > Fix For: 3.10 > > > Hi All, > I am getting test case failures for "DatabaseDescriptorRefTest" > > amit@p006n03:~/cassandra$ ant test -Dtest.name=DatabaseDescriptorRefTest > Buildfile: /home/amit/cassandra/build.xml > init: > maven-ant-tasks-localrepo: > maven-ant-tasks-download: > maven-ant-tasks-init: > maven-declare-dependencies: > maven-ant-tasks-retrieve-build: > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies.xml > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies-sources.xml > [unzip] Expanding: > /home/amit/cassandra/build/lib/jars/org.jacoco.agent-0.7.5.201505241946.jar > into /home/amit/cassandra/build/lib/jars > check-gen-cql3-grammar: > gen-cql3-grammar: > generate-cql-html: > generate-jflex-java: > build-project: > [echo] apache-cassandra: /home/amit/cassandra/build.xml > createVersionPropFile: > [propertyfile] Updating property file: > /home/amit/cassandra/src/resources/org/apache/cassandra/config/version.properties > [copy] Copying 1 file to /home/amit/cassandra/build/classes/main > build: > build-test: > test: > [junit] WARNING: multiple versions of ant detected in path for junit > [junit] > jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class > [junit] and > jar:file:/home/amit/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.006 sec > [junit] > [junit] - Standard Output --- > [junit] ERROR [main] 2017-02-26 09:56:05,169 ?:? 
- SLF4J: stderr > [junit] - --- > [junit] - Standard Error - > [junit] > [junit] > [junit] VIOLATION: > org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType > [junit] java.lang.Exception > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest$1.findClass(DatabaseDescriptorRefTest.java:169) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > [junit] at java.lang.Class.getDeclaredMethods0(Native Method) > [junit] at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > [junit] at java.lang.Class.getDeclaredMethod(Class.java:2128) > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest.testDatabaseDescriptorRef(DatabaseDescriptorRefTest.java:215) > [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [junit] at >
[jira] [Commented] (CASSANDRA-13273) Test case : DatabaseDescriptorRefTest failing
[ https://issues.apache.org/jira/browse/CASSANDRA-13273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887680#comment-15887680 ] Amitkumar Ghatwal commented on CASSANDRA-13273:
---
Hi All, I managed to resolve the failing test case below. The issue was that in "test/unit/org/apache/cassandra/config/DatabaseDescriptorRefTest.java", the class "DatabaseDescriptorRefTest" was missing the following entries in "validClasses":
"org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType",
"org.apache.cassandra.config.Config$CAPIFlashCommitlogBufferAllocationStrategyType",
"org.apache.cassandra.config.Config$CommitLogType",
After adding the lines above, I reran the individual test and it passed successfully.
> Test case : DatabaseDescriptorRefTest failing > - > > Key: CASSANDRA-13273 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13273 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: amit@p006n03:~/cassandra$ lscpu > Architecture: ppc64le > Byte Order:Little Endian > CPU(s):160 > On-line CPU(s) list: 0-159 > Thread(s) per core:8 > Core(s) per socket:5 > Socket(s): 4 > NUMA node(s): 4 > Model: 2.1 (pvr 004b 0201) > Model name:POWER8E (raw), altivec supported > CPU max MHz: 3690. > CPU min MHz: 2061. > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-39 > NUMA node1 CPU(s): 40-79 > NUMA node16 CPU(s):80-119 > NUMA node17 CPU(s):120-159 > amit@p006n03:~/cassandra$ cat /etc/os-release > NAME="Ubuntu" > VERSION="16.04.1 LTS (Xenial Xerus)" > ID=ubuntu > ID_LIKE=debian > PRETTY_NAME="Ubuntu 16.04.1 LTS" > VERSION_ID="16.04" > HOME_URL="http://www.ubuntu.com/" > SUPPORT_URL="http://help.ubuntu.com/" > BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" > VERSION_CODENAME=xenial > UBUNTU_CODENAME=xenial > amit@p006n03:~/cassandra$ bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. 
> [cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4] > Use HELP for help. > cqlsh> exit >Reporter: Amitkumar Ghatwal > Fix For: 3.10 > > > Hi All, > I am getting test case failures for "DatabaseDescriptorRefTest" > > amit@p006n03:~/cassandra$ ant test -Dtest.name=DatabaseDescriptorRefTest > Buildfile: /home/amit/cassandra/build.xml > init: > maven-ant-tasks-localrepo: > maven-ant-tasks-download: > maven-ant-tasks-init: > maven-declare-dependencies: > maven-ant-tasks-retrieve-build: > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies.xml > init-dependencies: > [echo] Loading dependency paths from file: > /home/amit/cassandra/build/build-dependencies-sources.xml > [unzip] Expanding: > /home/amit/cassandra/build/lib/jars/org.jacoco.agent-0.7.5.201505241946.jar > into /home/amit/cassandra/build/lib/jars > check-gen-cql3-grammar: > gen-cql3-grammar: > generate-cql-html: > generate-jflex-java: > build-project: > [echo] apache-cassandra: /home/amit/cassandra/build.xml > createVersionPropFile: > [propertyfile] Updating property file: > /home/amit/cassandra/src/resources/org/apache/cassandra/config/version.properties > [copy] Copying 1 file to /home/amit/cassandra/build/classes/main > build: > build-test: > test: > [junit] WARNING: multiple versions of ant detected in path for junit > [junit] > jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class > [junit] and > jar:file:/home/amit/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > [junit] Testsuite: org.apache.cassandra.config.DatabaseDescriptorRefTest > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.006 sec > [junit] > [junit] - Standard Output --- > [junit] ERROR [main] 2017-02-26 09:56:05,169 ?:? 
- SLF4J: stderr > [junit] - --- > [junit] - Standard Error - > [junit] > [junit] > [junit] VIOLATION: > org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType > [junit] java.lang.Exception > [junit] at > org.apache.cassandra.config.DatabaseDescriptorRefTest$1.findClass(DatabaseDescriptorRefTest.java:169) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > [junit] at
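The fix described in the comment above amounts to extending the test's whitelist. A minimal sketch of that pattern follows; the helper class is invented for illustration (the real DatabaseDescriptorRefTest uses a custom ClassLoader whose findClass reports the VIOLATION entries seen in the output), but the three class names are exactly the ones the comment adds:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the whitelist idea: every class touched while loading the
// configuration must appear in a "validClasses" set, otherwise it is reported
// as a violation. Only the three entries from the comment are listed here.
class ValidClassesCheck {

    static final Set<String> VALID_CLASSES = new HashSet<>(Arrays.asList(
        "org.apache.cassandra.config.Config$CAPIFlashCommitlogChunkManagerType",
        "org.apache.cassandra.config.Config$CAPIFlashCommitlogBufferAllocationStrategyType",
        "org.apache.cassandra.config.Config$CommitLogType"));

    // Returns the class names that would be flagged as violations.
    static List<String> violations(Collection<String> loadedClassNames) {
        List<String> out = new ArrayList<>();
        for (String name : loadedClassNames)
            if (!VALID_CLASSES.contains(name))
                out.add(name);
        return out;
    }
}
```

Once `Config$CAPIFlashCommitlogChunkManagerType` is in the set, loading it no longer produces a violation, which matches the reported test failure going away.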
[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily
[ https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887654#comment-15887654 ] Benjamin Roth commented on CASSANDRA-13226:
---
Isn't this also true for non-incremental repairs? Merkle tree calculation also triggers a flush, and any repair begins with a Merkle tree. So there is no need to flush, as the inconsistent dataset to be streamed for repair is always contained in SSTables already flushed by the Merkle tree calculation.
> StreamPlan for incremental repairs flushing memtables unnecessarily > --- > > Key: CASSANDRA-13226 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13226 > Project: Cassandra > Issue Type: Bug >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > Since incremental repairs are run against a fixed dataset, there's no need to > flush memtables when streaming for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
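The argument in the comment above can be stated as a tiny predicate. This is an illustrative sketch only; the names are invented and this is not Cassandra's StreamPlan API:

```java
// Sketch: a flush before streaming is needed only when nothing guarantees the
// data to stream is already confined to SSTables on disk.
class StreamFlushPolicy {
    // incremental: incremental repair streams a fixed, already-flushed dataset
    // validationFlushed: a full repair's Merkle tree validation already flushed
    static boolean shouldFlush(boolean incremental, boolean validationFlushed) {
        return !(incremental || validationFlushed);
    }
}
```

Under this framing, both the incremental case (the ticket) and the full-repair case (the comment) fall out of the same rule: the flush is redundant whenever an earlier step has already fixed the dataset on disk.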
[jira] [Commented] (CASSANDRA-12965) StreamReceiveTask causing high CPU utilization during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-12965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887642#comment-15887642 ] liangsibin commented on CASSANDRA-12965: Maybe we can add -Dcassandra.available_processors=20 at Cassandra startup to lower the number of StreamReceiveTask threads. > StreamReceiveTask causing high CPU utilization during repair > > > Key: CASSANDRA-12965 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12965 > Project: Cassandra > Issue Type: Bug >Reporter: Randy Fradin > > During a full repair run, I observed one node in my cluster using 100% cpu > (100% of all cores on a 48-core machine). When I took a stack trace I found > exactly 48 running StreamReceiveTask threads. Each was in the same block of > code in StreamReceiveTask.OnCompletionRunnable: > {noformat} > "StreamReceiveTask:8077" #1511134 daemon prio=5 os_prio=0 > tid=0x7f01520a8800 nid=0x6e77 runnable [0x7f020dfae000] >java.lang.Thread.State: RUNNABLE > at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258) > at java.util.ComparableTimSort.sort(ComparableTimSort.java:203) > at java.util.Arrays.sort(Arrays.java:1312) > at java.util.Arrays.sort(Arrays.java:1506) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:141) > at > org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:257) > at > org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:280) > at > org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:590) > at > org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:584) > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:565) > at > org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:761) > at > org.apache.cassandra.db.DataTracker.addSSTablesToTracker(DataTracker.java:428) > at > 
org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:283) > at > org.apache.cassandra.db.ColumnFamilyStore.addSSTables(ColumnFamilyStore.java:1422) > at > org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:148) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > All 48 threads were in ColumnFamilyStore.addSSTables(), and specifically in > the IntervalNode constructor called from the IntervalTree constructor. > It stayed this way for maybe an hour before we restarted the node. The repair > was also generating thousands (20,000+) of tiny SSTables in a table that > previously had just 20. > I don't know enough about SSTables and ColumnFamilyStore to know if all this > CPU work is necessary or a bug, but I did notice that these tasks are run on > a thread pool constructed in StreamReceiveTask.java, so perhaps this pool > should have a thread count max less than the number of processors on the > machine, at least for machines with a lot of processors. Any reason not to do > that? Any ideas for a reasonable # or formula to cap the thread count? > Some additional info: We have never run incremental repair on this cluster, > so that is not a factor. All our tables use LCS. Unfortunately I don't have > the log files from the period saved. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
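The report above suggests capping the StreamReceiveTask pool below the processor count on many-core machines. A hedged sketch of that sizing rule follows; the class name and the cap of 8 are arbitrary illustrative choices, not Cassandra's actual pool configuration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the capping idea from the report: size the stream-receive pool as
// min(available processors, fixed cap) so that a 48-core box does not spin up
// 48 IntervalTree-rebuilding threads at once. The cap of 8 is illustrative.
class BoundedStreamReceivePool {
    static final int CAP = 8;

    static int poolSize(int availableProcessors) {
        // Never below 1, never above the cap.
        return Math.max(1, Math.min(availableProcessors, CAP));
    }

    static ExecutorService create() {
        return Executors.newFixedThreadPool(
            poolSize(Runtime.getRuntime().availableProcessors()));
    }
}
```

The -Dcassandra.available_processors=20 workaround from the comment achieves a similar effect externally, by lowering what Runtime.availableProcessors()-based sizing sees at startup.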