[jira] [Commented] (CASSANDRA-3829) make seeds *only* be seeds, not special in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206744#comment-13206744 ] Peter Schuller commented on CASSANDRA-3829: ---

I'm not sure what's making it sound like I want a free lunch :) Let me start with what I hope are the less controversial bits.

# If you apply the normal bootstrapping process when inserting a node into the cluster, and it happens to be a seed according to its own configuration, it will just jump into the cluster w/o streaming data.
# You currently have to do rolling restarts to change the seed list.

In order to make clusters easier to operate, and make it more difficult to shoot yourself in the foot, I propose that the behavior of (1) be removed. I think it makes more sense to require a special setting (such as a system property) when performing the very unusual (in production) task of setting up a new cluster from scratch. For single-node cases, we could support a mode where a node is alone and never tries to bootstrap, if we are concerned with maintaining simple ./bin/cassandra -f type running of lone nodes.

Fixing (2) so that the seed list is reloadable makes sense if seeds are kept relevant other than on start-up, and would in particular be even more important if we cannot agree on (1). Asking users for rolling restarts to do maintenance on a seed is IMO clearly not a good thing, even if we were to disagree about eliminating the behavior in (1).

Ok - so far what I've said in this comment doesn't change the notion of seeds as something which is continually used throughout the life-time of a node. Now, if we make seeds dynamic during runtime (minor changes in the code are needed to support this cleanly, but it's not a big deal), everyone is of course free to do whatever they want in terms of seed sources. I described an example zookeeper/serversets based case in the original filing of this ticket.
For someone with infrastructure in place for multiple clusters, where these things aren't manually maintained, it's not really an issue once we reach the point of never having to do rolling restarts. But I would really like to go further and make the seed concept simpler for *everyone*. I am not proposing to *remove* seeds; only to make them *seeds only*, in the sense of initially seeding a node with information about its cluster when it starts up for the first time (*not* as a list of special nodes that are always gossiped to).

Even if we make seeds reloadable, and provide an out-of-the-box implementation that e.g. loads from a property file, it still means operators (or their tools) have to actively be aware of seeds and the fact that special action is required during some tasks, if an affected node happens to be a seed. I believe that for operational simplicity, it would be better if seeds would only enter the consciousness (or tool) on initial bootstrap, where they are *fundamentally* required no matter what (for obvious reasons there must be *some* source, as you point out, pointing a node to the appropriate cluster).

As discussed, this *would* be a slight regression in terms of partitioning, in the sense that if a node goes down for a while, comes back up, and all nodes it knows about have either changed IP addresses or are down - then yes, you would introduce a partition. But look at it this way; in my opinion this can easily be considered operator error. If you point clients to a set of nodes in a cluster and make hugely significant topology changes while a node being used by clients is down, that's a mistake. It's worth noting however that it's only slightly easier to make that mistake than the potential for a mistake *already there right now* - in the exact same scenario, you are already in trouble if all of the nodes listed in your seeds list are among those either down or having changed IP address.
Now granted, if you change the IP address of a seed you will deploy that change; but what if the node that went down just booted up (never got deployed to)? You still have, in practice, the partitioning of the cluster. So in short, I believe that for practical use-cases, removing the significance of seeds in all but initial seeding has minimal negative consequences, while the positive consequences in terms of operational simplicity are very much significant. That said, if I am truly the only person who thinks this would be an important improvement, then we can at least make seeds dynamic and provide a simple out-of-the-box way of using that feature (property file based seeds probably). I'll submit the necessary patches in a separate ticket if so. If so, I will also try to make time to empirically test how the propagation time in the cluster is affected by cluster size (because of CASSANDRA-3830).
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206745#comment-13206745 ] Peter Schuller commented on CASSANDRA-3830: ---

I had written a response here, but I assume I must have failed to submit it and lost track of the browser tab or something.

What you describe is not the behavior of the Gossiper. It picks a random node to gossip to. Then, unless that node *happened* to also be a seed node, it picks a random *seed node* to gossip to *as well*. The "less than the number of seeds" case you're mentioning is presumably due to the comments in the code before the gossip to seed:

{code}
/* Gossip to a seed if we did not do so above, or we have seen less nodes
   than there are seeds. This prevents partitions where each group of nodes
   is only gossiping to a subset of the seeds.

   The most straightforward check would be to check that all the seeds have
   been verified either as live or unreachable. To avoid that computation
   each round, we reason that:

   either all the live nodes are seeds, in which case non-seeds that come
   online will introduce themselves to a member of the ring by definition,

   or there is at least one non-seed node in the list, in which case
   eventually someone will gossip to it, and then do a gossip to a random
   seed from the gossipedToSeed check.

   See CASSANDRA-150 for more exposition. */
if (!gossipedToSeed || liveEndpoints.size() < seeds.size())
    doGossipToSeed(prod);
{code}

If you look carefully though, you'll see that the number of live endpoints is *only* relevant in the sense that it forces *always* gossiping to a seed even if we already did. In almost all cases, we have more live endpoints than seeds, and we'll still gossip to seeds because of {{!gossipedToSeed}}.
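The round structure described above (gossip to a random live node, then also to a random seed unless the first pick already was one, or we know fewer live nodes than seeds) can be sketched standalone. The class and method names below are illustrative only, not the Gossiper's actual API:

```java
import java.util.*;

public class GossipRoundSketch {
    // Toy model of one gossip round as described in the comment: pick a random
    // live endpoint, then also gossip to a random seed unless the first target
    // happened to be a seed (or we have seen fewer live nodes than seeds).
    static List<String> targetsForRound(List<String> liveEndpoints,
                                        Set<String> seeds, Random rnd) {
        List<String> targets = new ArrayList<>();
        boolean gossipedToSeed = false;
        if (!liveEndpoints.isEmpty()) {
            String peer = liveEndpoints.get(rnd.nextInt(liveEndpoints.size()));
            targets.add(peer);
            gossipedToSeed = seeds.contains(peer);
        }
        if (!gossipedToSeed || liveEndpoints.size() < seeds.size()) {
            List<String> seedList = new ArrayList<>(seeds);
            targets.add(seedList.get(rnd.nextInt(seedList.size())));
        }
        return targets;
    }

    public static void main(String[] args) {
        Set<String> seeds = new HashSet<>(Arrays.asList("s1", "s2"));
        List<String> live = Arrays.asList("n1", "n2", "n3", "s1", "s2");
        // Note the consequence discussed above: every round includes at least
        // one seed, whatever the random pick was.
        for (int i = 0; i < 100; i++) {
            List<String> t = targetsForRound(live, seeds, new Random(i));
            if (t.stream().noneMatch(seeds::contains))
                throw new AssertionError("round missed the seeds");
        }
        System.out.println("every round gossiped to at least one seed");
    }
}
```

This makes the point in the comment concrete: the live-endpoints-vs-seeds comparison only matters when the cluster's known membership is smaller than the seed list; otherwise `!gossipedToSeed` alone guarantees a seed is contacted each round.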
gossip-to-seeds is not obviously independent of failure detection algorithm Key: CASSANDRA-3830 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830 Project: Cassandra Issue Type: Task Components: Core Reporter: Peter Schuller Priority: Minor The failure detector, ignoring all the theory, boils down to an extremely simple algorithm. The FD keeps track of a sliding window (of 1000 currently) intervals of heartbeat for a given host. Meaning, we have a track record of the last 1000 times we saw an updated heartbeat for a host. At any given moment, a host has a score which is simply the time since the last heartbeat, over the *mean* interval in the sliding window. For historical reasons a simple scaling factor is applied to this prior to checking the phi conviction threshold. (CASSANDRA-2597 has details, but thanks to Paul's work there it's now trivial to understand what it does based on gut feeling) So in effect, a host is considered down if we haven't heard from it in some time which is significantly longer than the average time we expect to hear from it. This seems reasonable, but it does assume that under normal conditions the average time between heartbeats does not change for reasons other than those that would be plausible reasons to think a node is unhealthy. This assumption *could* be violated by the gossip-to-seed feature. There is an argument to avoid gossip-to-seed for other reasons (see CASSANDRA-3829), but this is a concrete case in which gossip-to-seed could cause a negative side-effect of the general kind mentioned in CASSANDRA-3829 (see notes at the end about the case w/o seeds not being continuously tested). Normally, due to gossip to seed, everyone essentially sees the latest information within very few heartbeats (assuming only 2-3 seeds). But should all seeds be down, suddenly we flip a switch and start relying on generalized propagation in the gossip system, rather than the seed special case.
The potential problem I foresee here is that if the average propagation time suddenly spikes when all seeds become unavailable, it could cause bogus flapping of nodes into down state. In order to test this, I deployed a ~180 node cluster with a version that logs heartbeat information on each interpret(), similar to: INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.27778 It turns out that, at least at 180 nodes, with 4 seed nodes, whether or not seeds are running *does not* seem to matter significantly. In both cases, the mean interval is around 1500 milliseconds.
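The failure-detector reduction described above (score = time since last heartbeat over the mean interval in a 1000-entry sliding window) can be sketched as follows. `PhiSketch` is a toy model invented for illustration; the historical scaling factor mentioned in the description is deliberately omitted:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PhiSketch {
    // Toy model of the FD described above: keep the last WINDOW heartbeat
    // intervals for a host; score = (time since last heartbeat) / mean interval.
    static final int WINDOW = 1000;
    final Deque<Double> intervals = new ArrayDeque<>();
    double lastHeartbeatMillis = -1;

    void heartbeat(double nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            if (intervals.size() == WINDOW) intervals.removeFirst();
            intervals.addLast(nowMillis - lastHeartbeatMillis);
        }
        lastHeartbeatMillis = nowMillis;
    }

    double score(double nowMillis) {
        if (intervals.isEmpty()) return 0.0;
        double mean = intervals.stream()
                               .mapToDouble(Double::doubleValue)
                               .average().orElse(1.0);
        return (nowMillis - lastHeartbeatMillis) / mean;
    }

    public static void main(String[] args) {
        PhiSketch fd = new PhiSketch();
        // heartbeats arriving steadily every 1000 ms
        for (int t = 0; t <= 10_000; t += 1000) fd.heartbeat(t);
        // shortly after the last heartbeat, the score is low
        System.out.println(fd.score(10_500));   // prints 0.5
        // once heartbeats stop, the score grows linearly with the silence
        System.out.println(fd.score(20_000));   // prints 10.0
    }
}
```

This also shows why the concern in the ticket is plausible: the score is normalized by the *observed* mean interval, so anything (like losing the gossip-to-seed fast path) that inflates typical intervals changes what counts as suspicious silence.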
Git Push Summary
Updated Tags: refs/tags/cassandra-0.8.10 [created] c45a17cd0
Git Push Summary
Updated Tags: refs/tags/0.8.10-tentative [deleted] 038b8f212
svn commit: r1243474 - in /cassandra/site: publish/download/index.html src/settings.py
Author: slebresne Date: Mon Feb 13 10:38:58 2012 New Revision: 1243474 URL: http://svn.apache.org/viewvc?rev=1243474view=rev Log: Update website for 0.8.10 release Modified: cassandra/site/publish/download/index.html cassandra/site/src/settings.py Modified: cassandra/site/publish/download/index.html URL: http://svn.apache.org/viewvc/cassandra/site/publish/download/index.html?rev=1243474r1=1243473r2=1243474view=diff == --- cassandra/site/publish/download/index.html (original) +++ cassandra/site/publish/download/index.html Mon Feb 13 10:38:58 2012 @@ -103,16 +103,16 @@ p Previous stable branches of Cassandra continue to see periodic maintenance for some time after a new major release is made. The lastest release on the - 0.8 branch is 0.8.9 (released on - 2011-12-14). + 0.8 branch is 0.8.10 (released on + 2012-02-13). /p ul li -a class=filename href=http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.9/apache-cassandra-0.8.9-bin.tar.gz;apache-cassandra-0.8.9-bin.tar.gz/a -[a href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-bin.tar.gz.asc;PGP/a] -[a href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-bin.tar.gz.md5;MD5/a] -[a href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-bin.tar.gz.sha1;SHA1/a] +a class=filename href=http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.10/apache-cassandra-0.8.10-bin.tar.gz;apache-cassandra-0.8.10-bin.tar.gz/a +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-bin.tar.gz.asc;PGP/a] +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-bin.tar.gz.md5;MD5/a] +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-bin.tar.gz.sha1;SHA1/a] /li /ul @@ -157,10 +157,10 @@ /li li -a class=filename href=http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.9/apache-cassandra-0.8.9-src.tar.gz;apache-cassandra-0.8.9-src.tar.gz/a -[a 
href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-src.tar.gz.asc;PGP/a] -[a href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-src.tar.gz.md5;MD5/a] -[a href=http://www.apache.org/dist/cassandra/0.8.9/apache-cassandra-0.8.9-src.tar.gz.sha1;SHA1/a] +a class=filename href=http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.8.10/apache-cassandra-0.8.10-src.tar.gz;apache-cassandra-0.8.10-src.tar.gz/a +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-src.tar.gz.asc;PGP/a] +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-src.tar.gz.md5;MD5/a] +[a href=http://www.apache.org/dist/cassandra/0.8.10/apache-cassandra-0.8.10-src.tar.gz.sha1;SHA1/a] /li Modified: cassandra/site/src/settings.py URL: http://svn.apache.org/viewvc/cassandra/site/src/settings.py?rev=1243474r1=1243473r2=1243474view=diff == --- cassandra/site/src/settings.py (original) +++ cassandra/site/src/settings.py Mon Feb 13 10:38:58 2012 @@ -92,8 +92,8 @@ SITE_POST_PROCESSORS = { } class CassandraDef(object): -oldstable_version = '0.8.9' -oldstable_release_date = '2011-12-14' +oldstable_version = '0.8.10' +oldstable_release_date = '2012-02-13' oldstable_exists = True veryoldstable_version = '0.7.10' veryoldstable_release_date = '2011-10-31'
[jira] [Commented] (CASSANDRA-3872) Sub-columns removal is broken in 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206806#comment-13206806 ] Sylvain Lebresne commented on CASSANDRA-3872: -

I do not pretend this reduces the line count, but I do think that it makes it easier to not make subtle mistakes. Currently, there is a mismatch between how Column (the class) and the two IColumnContainer classes (CF and SC) handle getLocalDeletionTime() for the non-deleted case. The former uses MAX_VALUE, the latter uses MIN_VALUE. The lack of consistency alone is annoying, but as long as SC lives it is made much worse by the fact that SC is both an IColumn and an IColumnContainer.

The attached patch tries to make things more consistent. The localDeletionTime is here for the purpose of tombstone garbage collection, so it seems to me that it is cleaner to use it for that purpose and that purpose only. In other words, with this patch, {{(getLocalDeletionTime() < gcBefore)}} tells you without ambiguity if you're dealing with a gcable tombstone or not.

Now there is the fact that live but empty containers are not returned to the user. I believe that was one of the reasons for using MIN_VALUE for live containers. But imho this is a hack, and it's much clearer in removeDeleted to read:

{noformat}
if (cf.getColumnCount() == 0 && (!cf.isMarkedForDelete() || cf.getLocalDeletionTime() < gcBefore))
{noformat}

which directly translates into: if the cf is empty and it's either a gcable tombstone or a live cf, we can skip it, rather than having to check the code of ColumnFamily to understand why it does skip live empty CFs *and* having to remember, each time you use CF.localDeletionTime, that it may be MIN_VALUE for a non-deleted CF and assess whether that matters or not.
Sub-columns removal is broken in 1.1 Key: CASSANDRA-3872 URL: https://issues.apache.org/jira/browse/CASSANDRA-3872 Project: Cassandra Issue Type: Bug Affects Versions: 1.1.0 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 3872.patch CASSANDRA-3716 actually broke sub-columns deletion. The reason is that in QueryFilter.isRelevant, we've switched to checking getLocalDeletionTime() only (without looking at isMarkedForDelete). But for column containers (in this case SuperColumn), the default local deletion time when not deleted is Integer.MIN_VALUE. In other words, an SC with only non-gcable tombstones will be considered not relevant (while it should be). This is caught by two unit tests (RemoveSuperColumnTest and RemoveSubColumnTest) that are failing currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
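The convention argued for in the comment (a single localDeletionTime-vs-gcBefore comparison deciding gcable-ness, with MAX_VALUE marking a live container) can be illustrated in isolation. The class below is a toy invented for this purpose, not code from the patch:

```java
public class TombstoneGcSketch {
    // Toy illustration of the convention from the comment: a live container
    // carries Integer.MAX_VALUE as its local deletion time, so one comparison
    // against gcBefore answers "gcable tombstone?" with no separate
    // isMarkedForDelete check.
    static final int LIVE = Integer.MAX_VALUE;

    static boolean isGcableTombstone(int localDeletionTime, int gcBefore) {
        return localDeletionTime < gcBefore;
    }

    // Mirrors the removeDeleted condition quoted in the comment:
    // skip an empty CF if it is either live or a gcable tombstone.
    static boolean canSkip(int columnCount, boolean markedForDelete,
                           int localDeletionTime, int gcBefore) {
        return columnCount == 0
            && (!markedForDelete || localDeletionTime < gcBefore);
    }

    public static void main(String[] args) {
        int gcBefore = 1_000_000;
        // live container: never gcable, skipped only because it is empty
        if (isGcableTombstone(LIVE, gcBefore)) throw new AssertionError();
        if (!canSkip(0, false, LIVE, gcBefore)) throw new AssertionError();
        // recent tombstone: not yet gcable, must NOT be skipped
        if (canSkip(0, true, gcBefore + 10, gcBefore)) throw new AssertionError();
        // old tombstone: gcable, can be skipped
        if (!canSkip(0, true, gcBefore - 10, gcBefore)) throw new AssertionError();
        System.out.println("tombstone checks pass");
    }
}
```

It also shows the bug's shape: with MIN_VALUE as the "live" marker, the first check would wrongly classify a live container as a gcable tombstone.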
[jira] [Updated] (CASSANDRA-3555) Bootstrapping to handle more failure
[ https://issues.apache.org/jira/browse/CASSANDRA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Low updated CASSANDRA-3555: --- Attachment: 3555-bootstrap-with-down-node-test.txt 3555-bootstrap-with-down-node.txt Bootstrapping to handle more failure Key: CASSANDRA-3555 URL: https://issues.apache.org/jira/browse/CASSANDRA-3555 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.5 Reporter: Vijay Assignee: Vijay Fix For: 1.2 Attachments: 3555-bootstrap-with-down-node-test.txt, 3555-bootstrap-with-down-node.txt We might want to handle failures in bootstrapping: 1) When none of the seeds are available to communicate with, throw an exception. 2) When the node which it is bootstrapping from fails, try the next in the list (and if the list is exhausted, throw an exception). 3) Clean all the existing files in the data dir before starting, just in case we retry. 4) Currently when one node is down in the cluster the bootstrap will fail, because the bootstrapping node doesn't understand which one is actually down. Also print the ring (nodetool ring) in the logs so we can troubleshoot later if it fails. Currently if any one of the above happens, the node skips the bootstrap or hangs.
[jira] [Issue Comment Edited] (CASSANDRA-3555) Bootstrapping to handle more failure
[ https://issues.apache.org/jira/browse/CASSANDRA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206810#comment-13206810 ] Richard Low edited comment on CASSANDRA-3555 at 2/13/12 11:21 AM: -- 3555-bootstrap-with-down-node.txt contains a fix for 4, to stop a bootstrapping node choosing an unavailable node. 3555-bootstrap-with-down-node-test.txt is required to make BootStrapperTest pass with the patch. Patches are against trunk; the same fix also works on the 1.0 branch. was (Author: richardlow): A fix for 4, to stop a bootstrapping node choosing an unavailable node. Bootstrapping to handle more failure Key: CASSANDRA-3555 URL: https://issues.apache.org/jira/browse/CASSANDRA-3555 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.5 Reporter: Vijay Assignee: Vijay Fix For: 1.2 Attachments: 3555-bootstrap-with-down-node-test.txt, 3555-bootstrap-with-down-node.txt We might want to handle failures in bootstrapping: 1) When none of the seeds are available to communicate with, throw an exception. 2) When the node which it is bootstrapping from fails, try the next in the list (and if the list is exhausted, throw an exception). 3) Clean all the existing files in the data dir before starting, just in case we retry. 4) Currently when one node is down in the cluster the bootstrap will fail, because the bootstrapping node doesn't understand which one is actually down. Also print the ring (nodetool ring) in the logs so we can troubleshoot later if it fails. Currently if any one of the above happens, the node skips the bootstrap or hangs.
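The behavior the patch aims for in point 4 (don't choose an unavailable node to bootstrap from, and fail loudly once the candidate list is exhausted, per point 2) can be sketched as below. `chooseSource` is an illustrative helper invented here, not the patch's actual code:

```java
import java.util.*;

public class BootstrapSourceSketch {
    // Toy sketch of the selection logic described above: skip nodes the
    // failure detector reports as down, and throw if no live candidate
    // remains instead of hanging.
    static String chooseSource(List<String> candidates, Set<String> down) {
        for (String node : candidates)
            if (!down.contains(node))
                return node;
        throw new IllegalStateException("no live node to bootstrap from");
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Set<String> down = new HashSet<>(Collections.singleton("10.0.0.1"));
        System.out.println(chooseSource(candidates, down)); // prints 10.0.0.2
    }
}
```

The key design point is the final throw: exhausting the list is surfaced as an error the operator can see, rather than a silent skip or hang.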
git commit: Have secondary indexes inherit compression and compaction properties from parent CF
Updated Branches: refs/heads/cassandra-1.1 5f5e00bc9 - 6a6bf3cf1 Have secondary indexes inherit compression and compaction properties from parent CF patch by slebresne; reviewed by xedin for CASSANDRA-3877 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a6bf3cf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a6bf3cf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a6bf3cf Branch: refs/heads/cassandra-1.1 Commit: 6a6bf3cf1aac6099c38c50b8d9d46f4ddea5a323 Parents: 5f5e00b Author: Sylvain Lebresne sylv...@datastax.com Authored: Thu Feb 9 16:11:57 2012 +0100 Committer: Sylvain Lebresne sylv...@datastax.com Committed: Mon Feb 13 12:22:40 2012 +0100 -- CHANGES.txt|5 + .../org/apache/cassandra/config/CFMetaData.java| 15 --- .../cassandra/db/index/SecondaryIndexManager.java |6 ++ 3 files changed, 23 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 5d9eaf9..e115a2a 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -71,6 +71,11 @@ * CQL support for altering key_validation_class in ALTER TABLE (CASSANDRA-3781) * turn compression on by default (CASSANDRA-3871) * make hexToBytes refuse invalid input (CASSANDRA-2851) + * Make secondary indexes CF inherit compression and compaction from their + parent CF (CASSANDRA-3877) +Merged from 1.0: + * Only snapshot CF being compacted for snapshot_before_compaction + (CASSANDRA-3803) 1.0.8 http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/src/java/org/apache/cassandra/config/CFMetaData.java -- diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java b/src/java/org/apache/cassandra/config/CFMetaData.java index defa6cf..06b1adf 100644 --- a/src/java/org/apache/cassandra/config/CFMetaData.java +++ b/src/java/org/apache/cassandra/config/CFMetaData.java @@ -279,9 +279,18 @@ public final class CFMetaData 
.keyValidator(info.getValidator()) .readRepairChance(0.0) .dclocalReadRepairChance(0.0) - .gcGraceSeconds(parent.gcGraceSeconds) - .minCompactionThreshold(parent.minCompactionThreshold) - .maxCompactionThreshold(parent.maxCompactionThreshold); + .reloadSecondaryIndexMetadata(parent); +} + +public CFMetaData reloadSecondaryIndexMetadata(CFMetaData parent) +{ +gcGraceSeconds(parent.gcGraceSeconds); +minCompactionThreshold(parent.minCompactionThreshold); +maxCompactionThreshold(parent.maxCompactionThreshold); +compactionStrategyClass(parent.compactionStrategyClass); +compactionStrategyOptions(parent.compactionStrategyOptions); +compressionParameters(parent.compressionParameters);; +return this; } // Create a new CFMD by changing just the cfName http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java -- diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java index 3758e9b..aa16db2 100644 --- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java +++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java @@ -93,6 +93,12 @@ public class SecondaryIndexManager for (ColumnDefinition cdef : baseCfs.metadata.getColumn_metadata().values()) if (cdef.getIndexType() != null && !indexedColumnNames.contains(cdef.name)) addIndexedColumn(cdef); + +for (ColumnFamilyStore cfs : getIndexesBackedByCfs()) +{ +cfs.metadata.reloadSecondaryIndexMetadata(baseCfs.metadata); +cfs.reload(); +} }
[1/2] git commit: Merge branch 'cassandra-1.1' into trunk
Updated Branches: refs/heads/trunk 30ee8337e - ddc771dc5 Merge branch 'cassandra-1.1' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ddc771dc Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ddc771dc Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ddc771dc Branch: refs/heads/trunk Commit: ddc771dc5f1af98dde42d494f91cc398929c8515 Parents: 30ee833 6a6bf3c Author: Sylvain Lebresne sylv...@datastax.com Authored: Mon Feb 13 12:27:00 2012 +0100 Committer: Sylvain Lebresne sylv...@datastax.com Committed: Mon Feb 13 12:27:00 2012 +0100 -- CHANGES.txt|5 + .../org/apache/cassandra/config/CFMetaData.java| 15 --- .../cassandra/db/index/SecondaryIndexManager.java |6 ++ 3 files changed, 23 insertions(+), 3 deletions(-) --
[2/2] git commit: Have secondary indexes inherit compression and compaction properties from parent CF
Have secondary indexes inherit compression and compaction properties from parent CF patch by slebresne; reviewed by xedin for CASSANDRA-3877 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6a6bf3cf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6a6bf3cf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6a6bf3cf Branch: refs/heads/trunk Commit: 6a6bf3cf1aac6099c38c50b8d9d46f4ddea5a323 Parents: 5f5e00b Author: Sylvain Lebresne sylv...@datastax.com Authored: Thu Feb 9 16:11:57 2012 +0100 Committer: Sylvain Lebresne sylv...@datastax.com Committed: Mon Feb 13 12:22:40 2012 +0100 -- CHANGES.txt|5 + .../org/apache/cassandra/config/CFMetaData.java| 15 --- .../cassandra/db/index/SecondaryIndexManager.java |6 ++ 3 files changed, 23 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 5d9eaf9..e115a2a 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -71,6 +71,11 @@ * CQL support for altering key_validation_class in ALTER TABLE (CASSANDRA-3781) * turn compression on by default (CASSANDRA-3871) * make hexToBytes refuse invalid input (CASSANDRA-2851) + * Make secondary indexes CF inherit compression and compaction from their + parent CF (CASSANDRA-3877) +Merged from 1.0: + * Only snapshot CF being compacted for snapshot_before_compaction + (CASSANDRA-3803) 1.0.8 http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/src/java/org/apache/cassandra/config/CFMetaData.java -- diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java b/src/java/org/apache/cassandra/config/CFMetaData.java index defa6cf..06b1adf 100644 --- a/src/java/org/apache/cassandra/config/CFMetaData.java +++ b/src/java/org/apache/cassandra/config/CFMetaData.java @@ -279,9 +279,18 @@ public final class CFMetaData .keyValidator(info.getValidator()) .readRepairChance(0.0) .dclocalReadRepairChance(0.0) - 
.gcGraceSeconds(parent.gcGraceSeconds) - .minCompactionThreshold(parent.minCompactionThreshold) - .maxCompactionThreshold(parent.maxCompactionThreshold); + .reloadSecondaryIndexMetadata(parent); +} + +public CFMetaData reloadSecondaryIndexMetadata(CFMetaData parent) +{ +gcGraceSeconds(parent.gcGraceSeconds); +minCompactionThreshold(parent.minCompactionThreshold); +maxCompactionThreshold(parent.maxCompactionThreshold); +compactionStrategyClass(parent.compactionStrategyClass); +compactionStrategyOptions(parent.compactionStrategyOptions); +compressionParameters(parent.compressionParameters);; +return this; } // Create a new CFMD by changing just the cfName http://git-wip-us.apache.org/repos/asf/cassandra/blob/6a6bf3cf/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java -- diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java index 3758e9b..aa16db2 100644 --- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java +++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java @@ -93,6 +93,12 @@ public class SecondaryIndexManager for (ColumnDefinition cdef : baseCfs.metadata.getColumn_metadata().values()) if (cdef.getIndexType() != null && !indexedColumnNames.contains(cdef.name)) addIndexedColumn(cdef); + +for (ColumnFamilyStore cfs : getIndexesBackedByCfs()) +{ +cfs.metadata.reloadSecondaryIndexMetadata(baseCfs.metadata); +cfs.reload(); +} }
[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206830#comment-13206830 ] Dave Brosius commented on CASSANDRA-3772: - With 10,000 inserts I'm seeing the same ratios, which I'm having a hard time explaining, as again the hash function itself takes about the same time. Evaluate Murmur3-based partitioner -- Key: CASSANDRA-3772 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Fix For: 1.2 Attachments: try_murmur3.diff MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead.
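One quick way to see the "heavyweight hash" motivation is to time MD5 against any cheap non-cryptographic hash. Murmur3 is not in the JDK, so the sketch below substitutes FNV-1a purely as a stand-in; absolute numbers from such a naive micro-benchmark are unreliable (JIT warmup, etc.), and only the rough order-of-magnitude ratio is of interest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashCostSketch {
    // FNV-1a stand-in for a cheap non-cryptographic hash (NOT Murmur3; used
    // here only because the JDK ships no Murmur3 implementation).
    static long fnv1a(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) { h ^= (b & 0xff); h *= 0x100000001b3L; }
        return h;
    }

    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] key = "somerowkey-000001".getBytes(StandardCharsets.UTF_8);
        int n = 100_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++)
            md5.digest(key);                 // digest() resets the instance
        long md5Ns = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long acc = 0;                        // accumulate to keep the JIT honest
        for (int i = 0; i < n; i++)
            acc += fnv1a(key);
        long fnvNs = System.nanoTime() - t0;

        System.out.printf("md5: %d us, fnv1a: %d us (acc=%d)%n",
                          md5Ns / 1_000, fnvNs / 1_000, acc);
    }
}
```

If the end-to-end insert ratio stays flat while the raw hash cost differs this much, the bottleneck in the benchmark is presumably elsewhere, which is consistent with the puzzlement in the comment above.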
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206857#comment-13206857 ] Brandon Williams commented on CASSANDRA-3830: - bq. What you describe is not the behavior of the Gossiper. It picks a random node to gossip to. Then, unless the node happened to also be a seed node, it picks a random seed node to gossip to as well. Right. bq. The "less than the number of seeds" case you're mentioning What I meant to say is that this is the only special case for seeds; gossiping to at least one seed every round is the normal case, as you said. gossip-to-seeds is not obviously independent of failure detection algorithm Key: CASSANDRA-3830 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830 Project: Cassandra Issue Type: Task Components: Core Reporter: Peter Schuller Priority: Minor The failure detector, ignoring all the theory, boils down to an extremely simple algorithm. The FD keeps track of a sliding window (of 1000 currently) intervals of heartbeat for a given host. Meaning, we have a track record of the last 1000 times we saw an updated heartbeat for a host. At any given moment, a host has a score which is simply the time since the last heartbeat, over the *mean* interval in the sliding window. For historical reasons a simple scaling factor is applied to this prior to checking the phi conviction threshold. (CASSANDRA-2597 has details, but thanks to Paul's work there it's now trivial to understand what it does based on gut feeling) So in effect, a host is considered down if we haven't heard from it in some time which is significantly longer than the average time we expect to hear from it. This seems reasonable, but it does assume that under normal conditions the average time between heartbeats does not change for reasons other than those that would be plausible reasons to think a node is unhealthy. This assumption *could* be violated by the gossip-to-seed feature.
There is an argument to avoid gossip-to-seed for other reasons (see CASSANDRA-3829), but this is a concrete case in which gossip-to-seed could cause a negative side-effect of the general kind mentioned in CASSANDRA-3829 (see notes at the end about the case w/o seeds not being continuously tested). Normally, due to gossip to seed, everyone essentially sees the latest information within very few heartbeats (assuming only 2-3 seeds). But should all seeds be down, suddenly we flip a switch and start relying on generalized propagation in the gossip system, rather than the seed special case. The potential problem I foresee here is that if the average propagation time suddenly spikes when all seeds become unavailable, it could cause bogus flapping of nodes into down state. In order to test this, I deployed a ~180 node cluster with a version that logs heartbeat information on each interpret(), similar to: INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.27778 It turns out that, at least at 180 nodes, with 4 seed nodes, whether or not seeds are running *does not* seem to matter significantly. In both cases, the mean interval is around 1500 milliseconds. I don't feel I have a good grasp of whether this is incidental or guaranteed, and it would be good to at least empirically test propagation time w/o seeds at different cluster sizes; it's supposed to be un-affected by cluster size ({{RING_DELAY}} is static for this reason, is my understanding). Would be nice to see this be the case.
[jira] [Updated] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-3862: Attachment: 3862_v3.patch

RowCache misses Updates

Key: CASSANDRA-3862 URL: https://issues.apache.org/jira/browse/CASSANDRA-3862 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.7 Reporter: Daniel Doubleday Attachments: 3862-v2.patch, 3862.patch, 3862_v3.patch, include_memtables_in_rowcache_read.patch

While performing stress tests to find any race problems for CASSANDRA-2864, I guess I (re-)found one for the standard on-heap row cache. During my stress test I have lots of threads running, with some of them only reading and others writing and re-reading the value. This seems to happen:

- Reader tries to read row A for the first time, doing a getTopLevelColumns
- Row A, which is not in the cache yet, is updated by Writer. The row is not eagerly read during write (because we want fast writes), so the writer cannot perform a cache update
- Reader puts the row in the cache, which is now missing the update

I already asked this some time ago on the mailing list but unfortunately didn't dig further after I got no answer, since I assumed that I had just missed something. In a way I still do, but I haven't found any locking mechanism that makes sure this cannot happen. The problem can be reproduced with every run of my stress test. When I restart the server the expected column is there. It's just missing from the cache. To test, I have created a patch that merges memtables with the row cache. With the patch the problem is gone. I can also reproduce in 0.8. Haven't checked 1.1, but I haven't found any relevant change there either, so I assume the same applies there.
[jira] [Commented] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206870#comment-13206870 ] Sylvain Lebresne commented on CASSANDRA-3862:

Attaching v3. This mostly fixes a bug of the previous version where sentinels were not handled correctly in cacheRow(). I've also switched back to getRawCachedRow. I'm not fully sure what you proposed to split exactly, but v3 does split cacheRow() in the hope of increasing clarity.

RowCache misses Updates

Key: CASSANDRA-3862 URL: https://issues.apache.org/jira/browse/CASSANDRA-3862 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.7 Reporter: Daniel Doubleday Attachments: 3862-v2.patch, 3862.patch, 3862_v3.patch, include_memtables_in_rowcache_read.patch

While performing stress tests to find any race problems for CASSANDRA-2864, I guess I (re-)found one for the standard on-heap row cache. During my stress test I have lots of threads running, with some of them only reading and others writing and re-reading the value. This seems to happen:

- Reader tries to read row A for the first time, doing a getTopLevelColumns
- Row A, which is not in the cache yet, is updated by Writer. The row is not eagerly read during write (because we want fast writes), so the writer cannot perform a cache update
- Reader puts the row in the cache, which is now missing the update

I already asked this some time ago on the mailing list but unfortunately didn't dig further after I got no answer, since I assumed that I had just missed something. In a way I still do, but I haven't found any locking mechanism that makes sure this cannot happen. The problem can be reproduced with every run of my stress test. When I restart the server the expected column is there. It's just missing from the cache. To test, I have created a patch that merges memtables with the row cache. With the patch the problem is gone. I can also reproduce in 0.8. Haven't checked 1.1, but I haven't found any relevant change there either, so I assume the same applies there.
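For reference, the sentinel approach being iterated on in these patches can be sketched in miniature: the reader publishes a sentinel before its (slow) read, the writer invalidates whatever the cache holds, and the reader caches its result only if its own sentinel survived. Everything below is illustrative (a String-keyed map standing in for the row cache); the real cacheRow() logic differs:

```java
import java.util.concurrent.ConcurrentHashMap;

// Miniature sketch of a sentinel-guarded cache fill. Illustrative types and
// names only; not Cassandra's actual row cache code.
public class SentinelCacheSketch {
    static final Object SENTINEL = new Object();
    final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    // Reader, step 1: announce the read-miss before reading sstables/memtables.
    void beginRead(String key) {
        cache.putIfAbsent(key, SENTINEL);
    }

    // Writer: apply the update, then invalidate so a racing reader cannot
    // publish a stale row (this also removes any reader's sentinel).
    void onWrite(String key) {
        cache.remove(key);
    }

    // Reader, step 2: cache the row only if no write raced with the read.
    // replace() is atomic, so a stale row can never overwrite a newer state.
    // Returns true if the row was cached.
    boolean finishRead(String key, Object row) {
        return cache.replace(key, SENTINEL, row);
    }
}
```

The interleaving from the bug report maps onto this directly: Writer's onWrite() lands between beginRead() and finishRead(), the sentinel disappears, and the reader's stale row is served once but never cached.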
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206881#comment-13206881 ] Jeremy Hanna commented on CASSANDRA-3843:

We'll be upgrading to 1.0.8 as soon as we can, but this seems like a significant issue for anyone doing range scans - does it make sense to backport to 0.8.x?

Unnecessary ReadRepair request during RangeScan

Key: CASSANDRA-3843 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.0 Reporter: Philip Andronov Assignee: Jonathan Ellis Fix For: 1.0.8 Attachments: 3843-v2.txt, 3843.txt

During reading with Quorum level and replication factor greater than 2, Cassandra sends at least one ReadRepair, even if there is no need to do that. Given that read requests wait until the ReadRepair finishes, this slows down requests a lot, up to the Timeout :( It seems that the problem was introduced by CASSANDRA-2494; unfortunately I do not have enough knowledge of Cassandra internals to fix the problem without breaking the CASSANDRA-2494 functionality, so my report comes without a patch. Code explanations:

{code:title=RangeSliceResponseResolver.java|borderStyle=solid}
class RangeSliceResponseResolver
{
    // ...
    private class Reducer extends MergeIterator.Reducer<Pair<Row, InetAddress>, Row>
    {
        // ...
        protected Row getReduced()
        {
            ColumnFamily resolved = versions.size() > 1
                                  ? RowRepairResolver.resolveSuperset(versions)
                                  : versions.get(0);
            if (versions.size() < sources.size())
            {
                for (InetAddress source : sources)
                {
                    if (!versionSources.contains(source))
                    {
                        // [PA] Here we are adding a null ColumnFamily.
                        // Later it will be compared with the desired
                        // version and will give us a fake difference which
                        // forces Cassandra to send a ReadRepair to the given source
                        versions.add(null);
                        versionSources.add(source);
                    }
                }
            }
            // ...
            if (resolved != null)
                repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
            // ...
        }
    }
}
{code}

{code:title=RowRepairResolver.java|borderStyle=solid}
public class RowRepairResolver extends AbstractRowResolver
{
    // ...
    public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
    {
        List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
        for (int i = 0; i < versions.size(); i++)
        {
            // On some iteration we have to compare null and resolved, which are
            // obviously not equal, so it will fire a ReadRepair; however it is
            // not needed here
            ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
            if (diffCf == null)
                continue;
            // ...
{code}

Imagine the following situation:

NodeA has X.1 // row X with the version 1
NodeB has X.2
NodeC has X.? // Unknown version, but because the write was with Quorum it is 1 or 2

During the Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now have the following content:

NodeA has X.12
NodeB has X.12

which is correct; however, Cassandra will also fire a ReadRepair to NodeC. There is no need to do that: the next consistent read has a chance of being served by nodes {A, B} (no ReadRepair) or by the pair {?, C}, and in that case a ReadRepair will be fired and bring NodeC to a consistent state.

Right now we are reading from the Index a lot, and starting from some point in time we are getting TimeOutException because the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(
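The reporter's point amounts to a simple rule: a null placeholder means "version unknown", so it should not count as a difference when deciding who needs a repair. A toy sketch of that rule (hypothetical types and names for illustration; not the committed fix in the attached patches):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the idea in this report: a replica whose version is
// unknown (represented here by a null placeholder) should not automatically
// receive a read repair just because null != resolved.
public class DiffSketch {
    // Returns the indices of replicas that actually diverge from the resolved
    // value; null placeholders (unknown versions) are skipped entirely.
    static List<Integer> replicasNeedingRepair(List<String> versions, String resolved) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < versions.size(); i++) {
            String v = versions.get(i);
            if (v == null)
                continue;                  // unknown version: don't assume divergence
            if (!v.equals(resolved))
                out.add(i);                // known and different: repair this replica
        }
        return out;
    }
}
```

Under this rule, the {A, B} quorum read in the example repairs only the replica whose known version differs, and NodeC, whose version was never read, is left for a later read that actually observes it.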
[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206888#comment-13206888 ] Brandon Williams commented on CASSANDRA-3883:

My original description here is incorrect; I can't repro the 198 count (not sure what happened there), but now the wide row test counts 1033 'word1' items. As far as I can tell, WordCountSetup actually inserts a total of 2002 'word1' matches: one in each of text1 and text2, and a thousand in each of text3 and text4. I'm not sure what is causing the count discrepancy, but in any case 1033 is far above the batch size of 99, and the 4th word count test using a secondary index is counting 197 items, so I think something may be fundamentally wrong with word count. That said, I've been adding wide row support to pig and testing with that, and not being able to completely paginate wide rows is a definite problem.

CFIF WideRowIterator only returns batch size columns

Key: CASSANDRA-3883 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 1.1.0 Reporter: Brandon Williams Fix For: 1.1.0

Most evident with the word count, where there are 1250 'word1' items in two rows (1000 in one, 250 in another) and it counts 198 with the batch size set to 99.
[jira] [Updated] (CASSANDRA-3894) StorageService.getBootstrapToken() should check all endpoints/tokens for collisions
[ https://issues.apache.org/jira/browse/CASSANDRA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3894: Reviewer: thepaul

StorageService.getBootstrapToken() should check all endpoints/tokens for collisions

Key: CASSANDRA-3894 URL: https://issues.apache.org/jira/browse/CASSANDRA-3894 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor

It currently checks all endpoints that are either bootstrapping or part of the endpoint map. That covers leaving nodes, but doesn't cover moving nodes, in the sense that the token a node is moving to is not checked.
[jira] [Updated] (CASSANDRA-3900) StorageService.handleStateNormal() should not have to deal with removal of an endpoint from moving
[ https://issues.apache.org/jira/browse/CASSANDRA-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3900: Reviewer: thepaul

StorageService.handleStateNormal() should not have to deal with removal of an endpoint from moving

Key: CASSANDRA-3900 URL: https://issues.apache.org/jira/browse/CASSANDRA-3900 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor

Removing an endpoint from the moving endpoints should be internal to TokenMetadata and implied by declaring that an endpoint turned normal. Need to consider whether simply making this change is introducing a bug in some subtle edge case where {{handleStateNormal()}} otherwise ignores the request but we should still remove from moving.
[jira] [Commented] (CASSANDRA-3417) InvocationTargetException ConcurrentModificationException at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206935#comment-13206935 ] Jonathan Ellis commented on CASSANDRA-3417: --- I'm getting 3 of 5 hunks failing to apply to 1.0, did you switch branches? InvocationTargetException ConcurrentModificationException at startup Key: CASSANDRA-3417 URL: https://issues.apache.org/jira/browse/CASSANDRA-3417 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.0 Reporter: Joaquin Casares Assignee: Peter Schuller Priority: Minor Fix For: 1.0.8 Attachments: 3417-2.txt, 3417-3.txt, 3417.txt, CASSANDRA-3417-tokenmap-v2.txt, CASSANDRA-3417-tokenmap-v3.txt, CASSANDRA-3417-tokenmap.txt I was starting up the new DataStax AMI where the seed starts first and 34 nodes would latch on together. So far things have been working decently for launching, but right now I just got this during startup. {CODE} ubuntu@ip-10-40-190-143:~$ sudo cat /var/log/cassandra/output.log INFO 09:24:38,453 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_26 INFO 09:24:38,456 Heap size: 1936719872/1937768448 INFO 09:24:38,457 Classpath: 
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.3.jar:/usr/share/cassandra/apache-cassandra-1.0.0.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.0.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar INFO 09:24:39,891 JNA mlockall successful INFO 09:24:39,901 Loading settings from file:/etc/cassandra/cassandra.yaml INFO 09:24:40,057 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO 09:24:40,069 Global memtable threshold is enabled at 616MB INFO 09:24:40,159 EC2Snitch using region: us-east, zone: 1d. INFO 09:24:40,475 Creating new commitlog segment /raid0/cassandra/commitlog/CommitLog-1319793880475.log INFO 09:24:40,486 Couldn't detect any schema definitions in local storage. INFO 09:24:40,486 Found table data in data directories. Consider using the CLI to define your schema. 
INFO 09:24:40,497 No commitlog files found; skipping replay INFO 09:24:40,501 Cassandra version: 1.0.0 INFO 09:24:40,502 Thrift API version: 19.18.0 INFO 09:24:40,502 Loading persisted ring state INFO 09:24:40,506 Starting up server gossip INFO 09:24:40,529 Enqueuing flush of Memtable-LocationInfo@1388314661(190/237 serialized/live bytes, 4 ops) INFO 09:24:40,530 Writing Memtable-LocationInfo@1388314661(190/237 serialized/live bytes, 4 ops) INFO 09:24:40,600 Completed flushing /raid0/cassandra/data/system/LocationInfo-h-1-Data.db (298 bytes) INFO 09:24:40,613 Ec2Snitch adding ApplicationState ec2region=us-east ec2zone=1d INFO 09:24:40,621 Starting Messaging Service on /10.40.190.143:7000 INFO 09:24:40,628 Joining: waiting for ring and schema information INFO 09:24:43,389 InetAddress /10.194.29.156 is now dead. INFO 09:24:43,391 InetAddress /10.85.11.38 is now dead. INFO 09:24:43,392 InetAddress /10.34.42.28 is now dead. INFO 09:24:43,393 InetAddress /10.77.63.49 is now dead. INFO 09:24:43,394 InetAddress /10.194.22.191 is now dead. INFO 09:24:43,395 InetAddress /10.34.74.58 is now dead. INFO 09:24:43,395 Node /10.34.33.16 is now part of the cluster INFO 09:24:43,396 InetAddress /10.34.33.16 is now UP INFO 09:24:43,397 Enqueuing flush of Memtable-LocationInfo@1629818866(20/25 serialized/live bytes, 1 ops) INFO 09:24:43,398 Writing Memtable-LocationInfo@1629818866(20/25 serialized/live bytes, 1 ops) INFO 09:24:43,417 Completed flushing /raid0/cassandra/data/system/LocationInfo-h-2-Data.db (74 bytes) INFO
[jira] [Commented] (CASSANDRA-3829) make seeds *only* be seeds, not special in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206944#comment-13206944 ] Jonathan Ellis commented on CASSANDRA-3829:

bq. I propose that the behavior of (1) be removed

Okay, I'm with you so far. But as you note, this impacts the usability of single-node clusters, which is where virtually *everybody* starts. So, I'll need to see a solution that doesn't make life more confusing for that overwhelming majority. I get that you don't like the current tradeoffs, but I haven't seen a better proposal yet. (I'll go ahead and pre-emptively -1 special environment variables...)

bq. Fixing (2) so that the seed list is reloadable

I still haven't seen a case where this, or special-casing seeds to prevent gossip partitions, causes real problems. Whereas I was around when we added the gossip-partition-prevention code, so I *do* know the problems that it *prevents*.

make seeds *only* be seeds, not special in gossip

Key: CASSANDRA-3829 URL: https://issues.apache.org/jira/browse/CASSANDRA-3829 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor

First, a little bit of framing on how seeds work: The concept of seed hosts makes fundamental sense; you need to seed a new node with some information required in order to join a cluster. Seed hosts are the information Cassandra uses for this purpose. But seed hosts play a role even after the initial start-up of a new node in a ring. Specifically, seed hosts continue to be gossiped to separately by the Gossiper throughout the life of a node and the cluster. Generally, operators must be careful to ensure that all nodes in a cluster are appropriately configured to refer to an overlapping set of seed hosts. Strictly speaking this should not be necessary (see further down though), but it is the general recommendation.
An unfortunate side-effect of this is that whenever you are doing ring management, such as replacing nodes, removing nodes, etc., you have to keep in mind which nodes are seeds. For example, if you bring a new node into the cluster, doing everything right with token assignment and auto_bootstrap=true, but the node is listed as a seed in its own configuration, it will just enter the cluster without bootstrapping - causing inconsistent reads. This is dangerous. And worse - changing the notion of which nodes are seeds across a cluster requires a *rolling restart*.

It can be argued that it should actually be okay for nodes other than the one being fiddled with to incorrectly treat the fiddled-with node as a seed node, but this fact is highly opaque to most users who are not intimately familiar with Cassandra internals. This adds additional complexity to operations, as it introduces a reason why you cannot view the ring as completely homogeneous, despite the fundamental idea of Cassandra that all nodes should be equal.

Now, fast forward a bit to what we are doing over here to avoid this problem: We have a zookeeper-based system for keeping track of hosts in a cluster, which is used by our Cassandra client to discover nodes to talk to. This works well. In order to avoid the need to manually keep track of seeds, we wanted to make seeds automatically discoverable, to eliminate them as an operational concern. We have implemented a seed provider that does this for us, based on the data we keep in zookeeper. We could see essentially three ways of plugging this in:

* (1) We could simply rely on not needing overlapping seeds and grab whatever we have when a node starts.
* (2) We could do something like continually treating all other nodes as seeds by dynamically changing the seed list (involves some other changes, like having the Gossiper update its notion of seeds).
* (3) We could completely eliminate the use of seeds *except* for the very specific purpose of initial start-up of an unbootstrapped node, and keep using a static (for the duration of the node's uptime) seed list. (3) was attractive because it felt like this was the original intent of seeds; that they be used for *seeding*, and not be constantly required during cluster operation once nodes are already joined. Now before I make the suggestion, let me explain how we are currently (though not yet in production) handling seeds and start-up. First, we have the following relevant cases to consider during a normal start-up: * (a) we are starting up a cluster for the very first time * (b) we are starting up a new clean node in order to join it to a pre-existing cluster * (c) we are starting up a pre-existing already joined node in a pre-existing cluster First, we proceeded on the assumption that we wanted to remove the use of
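For illustration, a dynamically reloadable seed source in the spirit of option (2) might look like the sketch below. The FileSeedProvider name, the file format, and the method shape are all assumptions made for this example (Cassandra's actual pluggable seed provider hook may differ); the point is only that re-resolving the seed list on each use removes the rolling-restart requirement:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of a reloadable seed source: the seed list is re-read on every call,
// so operators (or an external system such as zookeeper writing this file)
// can change seeds without restarting the node. Illustrative only.
public class FileSeedProvider {
    private final Path seedFile;

    FileSeedProvider(Path seedFile) {
        this.seedFile = seedFile;
    }

    // One address per line; blank lines and '#' comments are ignored.
    List<InetAddress> getSeeds() {
        List<InetAddress> seeds = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(seedFile)) {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#"))
                    seeds.add(InetAddress.getByName(line));
            }
        } catch (IOException e) {
            // an unreadable file yields an empty seed list; callers must cope
        }
        return seeds;
    }
}
```

Because nothing is cached, editing the file takes effect on the next gossip round that consults the provider; the trade-off is a small amount of I/O per lookup, which a production version would presumably cache with a TTL.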
[3/3] git commit: clean up redundant state lookups patch by Dave Brosius; reviewed by jbellis for CASSANDRA-3891
clean up redundant state lookups patch by Dave Brosius; reviewed by jbellis for CASSANDRA-3891

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9a842c7b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9a842c7b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9a842c7b Branch: refs/heads/cassandra-1.1 Commit: 9a842c7b317e6f1e6e156ccb531e34bb769c979f Parents: 6a6bf3c Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 09:57:34 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 09:57:34 2012 -0600

 src/java/org/apache/cassandra/db/SystemTable.java |4 +-
 .../apache/cassandra/thrift/CassandraServer.java | 121 ---
 2 files changed, 69 insertions(+), 56 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a842c7b/src/java/org/apache/cassandra/db/SystemTable.java

diff --git a/src/java/org/apache/cassandra/db/SystemTable.java b/src/java/org/apache/cassandra/db/SystemTable.java
index b0b5c60..3ced0a2 100644
--- a/src/java/org/apache/cassandra/db/SystemTable.java
+++ b/src/java/org/apache/cassandra/db/SystemTable.java
@@ -277,12 +277,12 @@ public class SystemTable
         SortedSet<ByteBuffer> cols = new TreeSet<ByteBuffer>(BytesType.instance);
         cols.add(CLUSTERNAME);
         QueryFilter filter = QueryFilter.getNamesFilter(decorate(LOCATION_KEY), new QueryPath(STATUS_CF), cols);
-        ColumnFamily cf = table.getColumnFamilyStore(STATUS_CF).getColumnFamily(filter);
+        ColumnFamilyStore cfs = table.getColumnFamilyStore(STATUS_CF);
+        ColumnFamily cf = cfs.getColumnFamily(filter);
         if (cf == null)
         {
             // this is a brand new node
-            ColumnFamilyStore cfs = table.getColumnFamilyStore(STATUS_CF);
             if (!cfs.getSSTables().isEmpty())
                 throw new ConfigurationException("Found system table files, but they couldn't be loaded!");

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a842c7b/src/java/org/apache/cassandra/thrift/CassandraServer.java

diff --git a/src/java/org/apache/cassandra/thrift/CassandraServer.java b/src/java/org/apache/cassandra/thrift/CassandraServer.java
index f30a130..4e141b4 100644
--- a/src/java/org/apache/cassandra/thrift/CassandraServer.java
+++ b/src/java/org/apache/cassandra/thrift/CassandraServer.java
@@ -92,21 +92,16 @@ public class CassandraServer implements Cassandra.Iface
     public ClientState state()
     {
         SocketAddress remoteSocket = SocketSessionManagementService.remoteSocket.get();
-        ClientState retval = null;
-        if (null != remoteSocket)
-        {
-            retval = SocketSessionManagementService.instance.get(remoteSocket);
-            if (null == retval)
-            {
-                retval = new ClientState();
-                SocketSessionManagementService.instance.put(remoteSocket, retval);
-            }
-        }
-        else
+        if (remoteSocket == null)
+            return clientState.get();
+
+        ClientState cState = SocketSessionManagementService.instance.get(remoteSocket);
+        if (cState == null)
         {
-            retval = clientState.get();
+            cState = new ClientState();
+            SocketSessionManagementService.instance.put(remoteSocket, cState);
         }
-        return retval;
+        return cState;
     }

     protected Map<DecoratedKey, ColumnFamily> readColumnFamily(List<ReadCommand> commands, ConsistencyLevel consistency_level)
@@ -318,8 +313,9 @@ public class CassandraServer implements Cassandra.Iface
     {
         logger.debug("get_slice");
-        state().hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
-        return multigetSliceInternal(state().getKeyspace(), Collections.singletonList(key), column_parent, predicate, consistency_level).get(key);
+        ClientState cState = state();
+        cState.hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
+        return multigetSliceInternal(cState.getKeyspace(), Collections.singletonList(key), column_parent, predicate, consistency_level).get(key);
     }

     public Map<ByteBuffer, List<ColumnOrSuperColumn>> multiget_slice(List<ByteBuffer> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
@@ -327,8 +323,9 @@ public class CassandraServer implements Cassandra.Iface
     {
         logger.debug("multiget_slice");
-        state().hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
-        return multigetSliceInternal(state().getKeyspace(), keys, column_parent, predicate,
[2/3] git commit: clean up redundant state lookups patch by Dave Brosius; reviewed by jbellis for CASSANDRA-3891
clean up redundant state lookups patch by Dave Brosius; reviewed by jbellis for CASSANDRA-3891

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9a842c7b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9a842c7b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9a842c7b Branch: refs/heads/trunk Commit: 9a842c7b317e6f1e6e156ccb531e34bb769c979f Parents: 6a6bf3c Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 09:57:34 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 09:57:34 2012 -0600

 src/java/org/apache/cassandra/db/SystemTable.java |4 +-
 .../apache/cassandra/thrift/CassandraServer.java | 121 ---
 2 files changed, 69 insertions(+), 56 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a842c7b/src/java/org/apache/cassandra/db/SystemTable.java

diff --git a/src/java/org/apache/cassandra/db/SystemTable.java b/src/java/org/apache/cassandra/db/SystemTable.java
index b0b5c60..3ced0a2 100644
--- a/src/java/org/apache/cassandra/db/SystemTable.java
+++ b/src/java/org/apache/cassandra/db/SystemTable.java
@@ -277,12 +277,12 @@ public class SystemTable
         SortedSet<ByteBuffer> cols = new TreeSet<ByteBuffer>(BytesType.instance);
         cols.add(CLUSTERNAME);
         QueryFilter filter = QueryFilter.getNamesFilter(decorate(LOCATION_KEY), new QueryPath(STATUS_CF), cols);
-        ColumnFamily cf = table.getColumnFamilyStore(STATUS_CF).getColumnFamily(filter);
+        ColumnFamilyStore cfs = table.getColumnFamilyStore(STATUS_CF);
+        ColumnFamily cf = cfs.getColumnFamily(filter);
         if (cf == null)
         {
             // this is a brand new node
-            ColumnFamilyStore cfs = table.getColumnFamilyStore(STATUS_CF);
             if (!cfs.getSSTables().isEmpty())
                 throw new ConfigurationException("Found system table files, but they couldn't be loaded!");

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a842c7b/src/java/org/apache/cassandra/thrift/CassandraServer.java

diff --git a/src/java/org/apache/cassandra/thrift/CassandraServer.java b/src/java/org/apache/cassandra/thrift/CassandraServer.java
index f30a130..4e141b4 100644
--- a/src/java/org/apache/cassandra/thrift/CassandraServer.java
+++ b/src/java/org/apache/cassandra/thrift/CassandraServer.java
@@ -92,21 +92,16 @@ public class CassandraServer implements Cassandra.Iface
     public ClientState state()
     {
         SocketAddress remoteSocket = SocketSessionManagementService.remoteSocket.get();
-        ClientState retval = null;
-        if (null != remoteSocket)
-        {
-            retval = SocketSessionManagementService.instance.get(remoteSocket);
-            if (null == retval)
-            {
-                retval = new ClientState();
-                SocketSessionManagementService.instance.put(remoteSocket, retval);
-            }
-        }
-        else
+        if (remoteSocket == null)
+            return clientState.get();
+
+        ClientState cState = SocketSessionManagementService.instance.get(remoteSocket);
+        if (cState == null)
         {
-            retval = clientState.get();
+            cState = new ClientState();
+            SocketSessionManagementService.instance.put(remoteSocket, cState);
         }
-        return retval;
+        return cState;
     }

     protected Map<DecoratedKey, ColumnFamily> readColumnFamily(List<ReadCommand> commands, ConsistencyLevel consistency_level)
@@ -318,8 +313,9 @@ public class CassandraServer implements Cassandra.Iface
     {
         logger.debug("get_slice");
-        state().hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
-        return multigetSliceInternal(state().getKeyspace(), Collections.singletonList(key), column_parent, predicate, consistency_level).get(key);
+        ClientState cState = state();
+        cState.hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
+        return multigetSliceInternal(cState.getKeyspace(), Collections.singletonList(key), column_parent, predicate, consistency_level).get(key);
     }

     public Map<ByteBuffer, List<ColumnOrSuperColumn>> multiget_slice(List<ByteBuffer> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
@@ -327,8 +323,9 @@ public class CassandraServer implements Cassandra.Iface
     {
         logger.debug("multiget_slice");
-        state().hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
-        return multigetSliceInternal(state().getKeyspace(), keys, column_parent, predicate,
[1/3] git commit: Merge branch 'cassandra-1.1' into trunk
Updated Branches: refs/heads/cassandra-1.1 6a6bf3cf1 -> 9a842c7b3 refs/heads/trunk ddc771dc5 -> 232da8248

Merge branch 'cassandra-1.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/232da824 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/232da824 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/232da824 Branch: refs/heads/trunk Commit: 232da8248072991dc521d1fe579a55679ea6c735 Parents: ddc771d 9a842c7 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 09:57:53 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 09:57:53 2012 -0600

 src/java/org/apache/cassandra/db/SystemTable.java |4 +-
 .../apache/cassandra/thrift/CassandraServer.java | 121 ---
 2 files changed, 69 insertions(+), 56 deletions(-)
[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-3883: Attachment: 3883-v1.txt

v1 isn't perfect but it's a start; if the batch starts on a wide row, we reuse the token and iterate until we're done. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time.

CFIF WideRowIterator only returns batch size columns

Key: CASSANDRA-3883 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 1.1.0 Reporter: Brandon Williams Fix For: 1.1.0 Attachments: 3883-v1.txt

Most evident with the word count, where there are 1250 'word1' items in two rows (1000 in one, 250 in another) and it counts 198 with the batch size set to 99.
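The reuse-and-iterate idea above can be sketched generically: keep fetching batches starting at the last column seen, drop the duplicated first column on each resumed batch, and stop once a short batch comes back. BatchFetcher here is a stand-in for the slice RPC, and the whole class is an illustration of the pagination scheme, not the CFIF WideRowIterator code:

```java
import java.util.ArrayList;
import java.util.List;

// Generic wide-row pagination sketch. A batch fetch returns up to batchSize
// columns starting at (and including) the given column, so every resumed
// batch repeats the last column already seen. Assumes batchSize > 1.
public class WideRowPager {
    interface BatchFetcher {
        // Stand-in for the slice RPC: columns >= startColumn, at most batchSize.
        List<String> fetchBatch(String startColumn, int batchSize);
    }

    static List<String> readWholeRow(BatchFetcher fetcher, int batchSize) {
        List<String> all = new ArrayList<>();
        String start = "";                                  // empty = beginning of row
        while (true) {
            List<String> batch = fetcher.fetchBatch(start, batchSize);
            boolean last = batch.size() < batchSize;        // short batch: end of row
            if (!start.isEmpty() && !batch.isEmpty())
                batch = batch.subList(1, batch.size());     // drop the resumed column
            all.addAll(batch);
            if (last)
                return all;
            start = all.get(all.size() - 1);                // resume from last column seen
        }
    }
}
```

With 250 columns and a batch size of 99, this makes three fetches (99 + 98 + 53 new columns) and returns all 250, instead of stopping after the first batch.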
[jira] [Commented] (CASSANDRA-3859) Add Progress Reporting to Cassandra OutputFormats
[ https://issues.apache.org/jira/browse/CASSANDRA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206968#comment-13206968 ] Brandon Williams commented on CASSANDRA-3859: - Samarth, how is this patch working for you? Add Progress Reporting to Cassandra OutputFormats - Key: CASSANDRA-3859 URL: https://issues.apache.org/jira/browse/CASSANDRA-3859 Project: Cassandra Issue Type: Improvement Components: Hadoop, Tools Affects Versions: 1.1.0 Reporter: Samarth Gahire Assignee: Brandon Williams Priority: Minor Labels: bulkloader, hadoop, mapreduce, sstableloader Fix For: 1.1.0 Attachments: 0001-add-progress-reporting-to-BOF.txt, 0002-Add-progress-to-CFOF.txt Original Estimate: 48h Remaining Estimate: 48h When we are using the BulkOutputFormat to load data into Cassandra, we should report progress to the Hadoop job from within the sstable loader, because if streaming takes a long time for a particular task and no progress is reported, the job may kill the task with a timeout exception.
[jira] [Commented] (CASSANDRA-3867) Disablethrift and Enablethrift can leave behind zombie connections on THSHA server
[ https://issues.apache.org/jira/browse/CASSANDRA-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206971#comment-13206971 ] Brandon Williams commented on CASSANDRA-3867: - +1 Disablethrift and Enablethrift can leave behind zombie connections on THSHA server --- Key: CASSANDRA-3867 URL: https://issues.apache.org/jira/browse/CASSANDRA-3867 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.7 Reporter: Vijay Assignee: Vijay Fix For: 1.0.8 Attachments: 0001-CASSANDRA-3867.patch While doing nodetool disablethrift we disable the selector threads and close them, but the connections are still active. Enablethrift creates new selector threads because we create a new ThriftServer(), which turns the old connections into zombies. I think the right fix is to call server.interrupt() and then close the connections when they are done selecting.
[jira] [Issue Comment Edited] (CASSANDRA-3740) While using BulkOutputFormat unnecessarily look for the cassandra.yaml file.
[ https://issues.apache.org/jira/browse/CASSANDRA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206969#comment-13206969 ] Brandon Williams edited comment on CASSANDRA-3740 at 2/13/12 4:36 PM: -- Samarth/Erik, How does this patch look? was (Author: brandon.williams): Samarth/Eric, How does this patch look? While using BulkOutputFormat unnecessarily look for the cassandra.yaml file. -- Key: CASSANDRA-3740 URL: https://issues.apache.org/jira/browse/CASSANDRA-3740 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 1.1.0 Reporter: Samarth Gahire Assignee: Brandon Williams Labels: cassandra, hadoop, mapreduce Fix For: 1.1.0 Attachments: 0001-Make-DD-the-canonical-partitioner-source.txt, 0002-Prevent-loading-from-yaml.txt, 0003-use-output-partitioner.txt, 0004-update-BOF-for-new-dir-layout.txt, 0005-BWR-uses-any-if.txt I am trying to use BulkOutputFormat to stream data from the map phase of a Hadoop job. I have set the Cassandra-related configuration using ConfigHelper, and I have also looked into the Cassandra code; it seems Cassandra has taken care that it should not look for the cassandra.yaml file. But when I run the job I still get the following error: { 12/01/13 11:30:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 12/01/13 11:30:04 INFO input.FileInputFormat: Total input paths to process : 1 12/01/13 11:30:04 INFO mapred.JobClient: Running job: job_201201130910_0015 12/01/13 11:30:05 INFO mapred.JobClient: map 0% reduce 0% 12/01/13 11:30:23 INFO mapred.JobClient: Task Id : attempt_201201130910_0015_m_00_0, Status : FAILED java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) attempt_201201130910_0015_m_00_0: Cannot locate cassandra.yaml attempt_201201130910_0015_m_00_0: Fatal configuration error; unable to start server. } Also, let me know how I can make this cassandra.yaml file available to the Hadoop mapreduce job?
[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3772: -- Reviewer: yukim Component/s: Core Assignee: Dave Brosius Yuki, can you take a look? Evaluate Murmur3-based partitioner -- Key: CASSANDRA-3772 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Dave Brosius Fix For: 1.2 Attachments: try_murmur3.diff MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead.
[Cassandra Wiki] Update of HadoopSupport by jeremyhanna
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The HadoopSupport page has been changed by jeremyhanna: http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=46&rev2=47 Comment: removed redundant hadoop property.

    <value>20</value>
    </property>
    <property>
-   <name>mapred.max.tracker.failures</name>
-   <value>20</value>
-   </property>
-   <property>
    <name>mapred.map.max.attempts</name>
    <value>20</value>
    </property>
[jira] [Commented] (CASSANDRA-3740) While using BulkOutputFormat unnecessarily look for the cassandra.yaml file.
[ https://issues.apache.org/jira/browse/CASSANDRA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206986#comment-13206986 ] Samarth Gahire commented on CASSANDRA-3740: --- The first 4 patches are working fine. Erik can explain more about the patch related to CASSANDRA-3839.
[jira] [Commented] (CASSANDRA-3712) Can't cleanup after I moved a token.
[ https://issues.apache.org/jira/browse/CASSANDRA-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207021#comment-13207021 ] Yuki Morishita commented on CASSANDRA-3712: --- I ran my unit test repeatedly and see no errors. +1 Can't cleanup after I moved a token. Key: CASSANDRA-3712 URL: https://issues.apache.org/jira/browse/CASSANDRA-3712 Project: Cassandra Issue Type: Bug Components: Core Environment: java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) Ubuntu 10.04.2 LTS 64-Bit RAM: 2GB / 1GB free Data partition: 80% free on the most used server. Reporter: Herve Nicol Assignee: Yuki Morishita Fix For: 1.0.8 Attachments: 0001-Add-flush-and-cleanup-race-test.patch, 0002-Acquire-lock-when-updating-index.patch, 3712-v3.txt Before cleanup failed, I moved one node's token. My cluster had 10GB of data on 2 nodes. The data distribution was bad; the tokens were 165[...] and 155[...]. I moved 155 to 075[...], then adjusted to 076[...]. The moves were processed correctly, with no exceptions. But then, when I wanted to cleanup, it failed and keeps failing, on both nodes. Other maintenance procedures like repair, compact, or scrub work. All the data is in the URLs CF.
Example session log: nodetool cleanup fails: $ ./nodetool --host cnode1 cleanup Error occured during cleanup java.util.concurrent.ExecutionException: java.lang.AssertionError at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:203) at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:237) at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:958) at org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1604) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.AssertionError at org.apache.cassandra.db.Memtable.put(Memtable.java:136) at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:780) at org.apache.cassandra.db.index.keys.KeysIndex.deleteColumn(KeysIndex.java:82) at
[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-2975: - Attachment: 0001-CASSANDRA-2975.patch Attached is the refactor, which includes fixes per the suggestions. I added a factory to make adding newer hashes easier and left the legacy implementation alone, but it would be fairly trivial and cleaner if we want to refactor a little more. Let me know, thanks! Tests passed and the long test shows significant improvement. Thanks Brian! Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Lindauer Assignee: Vijay Priority: Trivial Labels: lhf Fix For: 1.2 Attachments: 0001-CASSANDRA-2975.patch, 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch, Murmur3Benchmark.java MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64.
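That last point — one 128-bit call replacing two 64-bit ones — works because a bloom filter can derive all k probe indexes from the two 64-bit halves of a single digest via double hashing (h1 + i*h2). A minimal sketch, using MD5 only as a stand-in 128-bit hash (not Murmur3, and not the actual BloomFilter code):

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Derive k bloom-filter bucket indexes from one 128-bit digest: split it
// into two 64-bit halves h1, h2 and combine them as h1 + i*h2 per probe.
// MD5 is a stand-in here; a real implementation would use Murmur3 for speed.
public class DoubleHashing {
    public static long[] indexes(byte[] key, int k, long numBuckets) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(key); // 16 bytes = 128 bits
        ByteBuffer buf = ByteBuffer.wrap(digest);
        long h1 = buf.getLong(), h2 = buf.getLong();
        long[] result = new long[k];
        for (int i = 0; i < k; i++)
            result[i] = Math.floorMod(h1 + i * h2, numBuckets); // keep index non-negative
        return result;
    }

    public static void main(String[] args) throws Exception {
        // 5 probe positions in a filter of ~1M buckets, from a single hash call.
        for (long b : indexes("row-key".getBytes("UTF-8"), 5, 1_000_003L))
            System.out.println(b);
    }
}
```

The hypothetical numBuckets and k values are for illustration; the point is that only one digest computation is needed per key regardless of k.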
[jira] [Updated] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3862: -- Attachment: 3862-cleanup.txt Attached cleanup patch that applies on top of v3. Most of the changes are adding docstrings/comments and cleaning up typos. A minor change to the code was to make cacheRow take just cfId and filter, removing the redundant filter.key as a parameter. I also renamed cacheRow to getThroughCache. Still not 100% happy with that, but my goal is to make the distinction from readAndCache more obvious. Finally, I've modified the logic in invalidateCachedRow according to the reasoning in this comment: {noformat} // This method is used to (1) drop obsolete entries from a copying cache after the row in question was updated // and to (2) make sure we're not wasting cache space on rows that don't exist anymore post-compaction. // Sentinels complicate this because it means we've caught a read thread in the process of loading // the cache, and we don't know (in case 2) if it will do so with rows from before the compaction or after, // so we need to loop until the load completes. {noformat} (I also negated the loop condition, which looked like an oversight.) RowCache misses Updates --- Key: CASSANDRA-3862 URL: https://issues.apache.org/jira/browse/CASSANDRA-3862 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.7 Reporter: Daniel Doubleday Attachments: 3862-cleanup.txt, 3862-v2.patch, 3862.patch, 3862_v3.patch, include_memtables_in_rowcache_read.patch While performing stress tests to find any race problems for CASSANDRA-2864, I guess I (re-)found one for the standard on-heap row cache. During my stress test I have lots of threads running, with some of them only reading and others writing and re-reading the value. This seems to happen: - Reader tries to read row A for the first time doing a getTopLevelColumns - Row A, which is not in the cache yet, is updated by Writer.
The row is not eagerly read during write (because we want fast writes), so the writer cannot perform a cache update - Reader puts the row in the cache, which is now missing the update I already asked about this some time ago on the mailing list but unfortunately didn't dig further after I got no answer, since I assumed that I had just missed something. In a way I still do, but I haven't found any locking mechanism that makes sure this cannot happen. The problem can be reproduced with every run of my stress test. When I restart the server the expected column is there. It's just missing from the cache. To test, I have created a patch that merges memtables with the row cache. With the patch the problem is gone. I can also reproduce in 0.8. Haven't checked 1.1, but I haven't found any relevant change there either, so I assume the same applies there.
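The interleaving Daniel describes can be reproduced deterministically with a toy read-through cache (plain maps standing in for the store and the row cache; none of this is the actual Cassandra code):

```java
import java.util.*;

// Toy reproduction of the reported race: a reader loads a row from the
// store, a writer then updates the store (the row is not cached yet, so
// the write path has nothing to invalidate), and finally the reader
// installs its now-stale copy in the cache.
public class RowCacheRace {
    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        store.put("A", "v1");

        // Reader: read row A from the store (cache miss, load begins).
        String readerCopy = store.get("A");

        // Writer: update the store; row A is not in the cache, so there
        // is nothing for the writer to update or invalidate.
        store.put("A", "v2");
        if (cache.containsKey("A")) cache.put("A", "v2");

        // Reader: finish the read-through by caching what it read earlier.
        cache.put("A", readerCopy);

        System.out.println(store.get("A")); // v2 -- store has the update
        System.out.println(cache.get("A")); // v1 -- cache is stale
    }
}
```

This is why merging memtable contents into the cached row on read (as in the attached patch) closes the window: the stale copy can no longer shadow a concurrent write.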
[jira] [Updated] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3862: -- Attachment: (was: 3862-cleanup.txt)
[jira] [Updated] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3862: -- Attachment: 3862-cleanup.txt
[jira] [Updated] (CASSANDRA-3862) RowCache misses Updates
[ https://issues.apache.org/jira/browse/CASSANDRA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3862: -- Reviewer: jbellis Affects Version/s: (was: 1.0.7) 0.6 Fix Version/s: 1.1.0 Assignee: Sylvain Lebresne
[jira] [Updated] (CASSANDRA-3417) InvocationTargetException ConcurrentModificationException at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller updated CASSANDRA-3417: -- Attachment: CASSANDRA-3417-tokenmap-1.0-v1.txt Attaching {{CASSANDRA-3417-tokenmap-1.0-v1.txt}}, which is for 1.0. Apologies for the confusion; I only ever triggered and tested this on 1.1/trunk, since that's what I was testing, despite this bug originally being filed against 1.0. I haven't done real testing with this patch on 1.0, and right now I can't easily use the cluster I was testing with to go back to 1.0 either. But the fix seems correct to me regardless of branch, given that the iteration is clearly over a map that is getting modified. The biggest risk is a typo or similar mistake, which is more easily spotted by review anyway. InvocationTargetException ConcurrentModificationException at startup Key: CASSANDRA-3417 URL: https://issues.apache.org/jira/browse/CASSANDRA-3417 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.0 Reporter: Joaquin Casares Assignee: Peter Schuller Priority: Minor Fix For: 1.0.8 Attachments: 3417-2.txt, 3417-3.txt, 3417.txt, CASSANDRA-3417-tokenmap-1.0-v1.txt, CASSANDRA-3417-tokenmap-v2.txt, CASSANDRA-3417-tokenmap-v3.txt, CASSANDRA-3417-tokenmap.txt I was starting up the new DataStax AMI where the seed starts first and 34 nodes would latch on together. So far things have been working decently for launching, but right now I just got this during startup.
{CODE} ubuntu@ip-10-40-190-143:~$ sudo cat /var/log/cassandra/output.log INFO 09:24:38,453 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_26 INFO 09:24:38,456 Heap size: 1936719872/1937768448 INFO 09:24:38,457 Classpath: /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.3.jar:/usr/share/cassandra/apache-cassandra-1.0.0.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.0.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar INFO 09:24:39,891 JNA mlockall successful INFO 09:24:39,901 Loading settings from file:/etc/cassandra/cassandra.yaml INFO 09:24:40,057 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO 09:24:40,069 Global memtable threshold is enabled at 616MB INFO 09:24:40,159 EC2Snitch using region: us-east, zone: 1d. 
INFO 09:24:40,475 Creating new commitlog segment /raid0/cassandra/commitlog/CommitLog-1319793880475.log INFO 09:24:40,486 Couldn't detect any schema definitions in local storage. INFO 09:24:40,486 Found table data in data directories. Consider using the CLI to define your schema. INFO 09:24:40,497 No commitlog files found; skipping replay INFO 09:24:40,501 Cassandra version: 1.0.0 INFO 09:24:40,502 Thrift API version: 19.18.0 INFO 09:24:40,502 Loading persisted ring state INFO 09:24:40,506 Starting up server gossip INFO 09:24:40,529 Enqueuing flush of Memtable-LocationInfo@1388314661(190/237 serialized/live bytes, 4 ops) INFO 09:24:40,530 Writing Memtable-LocationInfo@1388314661(190/237 serialized/live bytes, 4 ops) INFO 09:24:40,600 Completed flushing /raid0/cassandra/data/system/LocationInfo-h-1-Data.db (298 bytes) INFO 09:24:40,613 Ec2Snitch adding ApplicationState ec2region=us-east ec2zone=1d INFO 09:24:40,621 Starting Messaging Service on /10.40.190.143:7000 INFO 09:24:40,628 Joining: waiting for ring and schema information INFO 09:24:43,389 InetAddress /10.194.29.156 is now dead. INFO 09:24:43,391 InetAddress /10.85.11.38 is now dead. INFO 09:24:43,392 InetAddress /10.34.42.28 is now dead. INFO 09:24:43,393 InetAddress /10.77.63.49 is now dead.
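The class of bug Peter describes — iterating over a map that is concurrently modified — is easy to demonstrate, and the usual fix is to iterate over a snapshot. A toy illustration (not the actual token-map code; even single-threaded mutation during iteration trips the fail-fast check):

```java
import java.util.*;

// HashMap iterators are fail-fast: mutating the map mid-iteration throws
// ConcurrentModificationException, even from the same thread. Iterating a
// snapshot of the keys makes the traversal safe while the map is mutated.
public class CmeDemo {
    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("t1", "node1");
        tokens.put("t2", "node2");
        tokens.put("t3", "node3");

        boolean threw = false;
        try {
            for (String t : tokens.keySet())
                tokens.remove(t); // structural modification during iteration
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // Fix: walk a snapshot copy, mutate the real map freely.
        for (String t : new ArrayList<>(tokens.keySet()))
            tokens.remove(t);
        System.out.println(tokens.isEmpty()); // true
    }
}
```

In a multi-threaded setting (gossip updating the token map while startup code iterates it), the same exception surfaces nondeterministically, which is why a snapshot (or a concurrent map) is the robust fix.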
[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-2975: - Attachment: 0001-CASSANDRA-2975.patch updating the patch because old one missed the new files created. Upgrade MurmurHash to version 3 --- Key: CASSANDRA-2975 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Lindauer Assignee: Vijay Priority: Trivial Labels: lhf Fix For: 1.2 Attachments: 0001-CASSANDRA-2975.patch, 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch, Murmur3Benchmark.java MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here: http://code.google.com/p/smhasher/ The reference implementation is here: http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136r=136 I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation: https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93 Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
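The reason a single hash() call suffices with the 128-bit result is that the two 64-bit halves can seed every bloom-filter probe via double hashing. A minimal sketch of that derivation (the method name is mine, and h1/h2 are assumed to come from a MurmurHash3 x64_128 port; this is illustrative, not the attached patch):

```java
public class BloomIndexes {
    // Given the two 64-bit halves of one 128-bit hash, derive all k bucket
    // indexes via double hashing: index_i = (h1 + i * h2) mod m. With only a
    // 64-bit hash, computing independent indexes required a second hash call.
    static long[] indexes(long h1, long h2, int k, long m) {
        long[] idx = new long[k];
        for (int i = 0; i < k; i++) {
            // floorMod keeps the bucket non-negative even if h1 + i*h2 overflows
            idx[i] = Math.floorMod(h1 + i * h2, m);
        }
        return idx;
    }
}
```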
[jira] [Updated] (CASSANDRA-2975) Upgrade MurmurHash to version 3
[ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-2975: - Attachment: (was: 0001-CASSANDRA-2975.patch)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207046#comment-13207046 ] Jonathan Ellis commented on CASSANDRA-3843: --- It's a relatively small patch, but StorageProxy and its callbacks can be fragile... I almost didn't commit it to 1.0 either. Tell you what though, I'll post a backported patch here and if you want you can run with it. :) Unnecessary ReadRepair request during RangeScan Key: CASSANDRA-3843 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.0 Reporter: Philip Andronov Assignee: Jonathan Ellis Fix For: 1.0.8 Attachments: 3843-v2.txt, 3843.txt When reading at Quorum level with a replication factor greater than 2, Cassandra sends at least one ReadRepair even when none is needed. Since read requests wait until the ReadRepair finishes, this slows requests down a lot, up to the timeout :( The problem seems to have been introduced by CASSANDRA-2494; unfortunately I don't have enough knowledge of Cassandra internals to fix it without breaking the CASSANDRA-2494 functionality, so my report comes without a patch. Code explanation:
{code:title=RangeSliceResponseResolver.java|borderStyle=solid}
class RangeSliceResponseResolver
{
    // ...
    private class Reducer extends MergeIterator.Reducer<Pair<Row, InetAddress>, Row>
    {
        // ...
        protected Row getReduced()
        {
            ColumnFamily resolved = versions.size() > 1
                                  ? RowRepairResolver.resolveSuperset(versions)
                                  : versions.get(0);
            if (versions.size() < sources.size())
            {
                for (InetAddress source : sources)
                {
                    if (!versionSources.contains(source))
                    {
                        // [PA] Here we are adding a null ColumnFamily.
                        // Later it will be compared with the resolved
                        // version and give us a fake difference, which
                        // forces Cassandra to send a ReadRepair to that source
                        versions.add(null);
                        versionSources.add(source);
                    }
                }
            }
            // ...
            if (resolved != null)
                repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
            // ...
        }
    }
}
{code}
{code:title=RowRepairResolver.java|borderStyle=solid}
public class RowRepairResolver extends AbstractRowResolver
{
    // ...
    public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
    {
        List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
        for (int i = 0; i < versions.size(); i++)
        {
            // On some iteration we have to compare null and resolved, which are
            // obviously not equal, so a ReadRepair fires; however it is not needed here
            ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
            if (diffCf == null)
                continue;
            // ...
        }
    }
}
{code}
Imagine the following situation:
NodeA has X.1 // row X with version 1
NodeB has X.2
NodeC has X.? // unknown version, but because the write was at Quorum it is 1 or 2
During a Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now hold:
NodeA has X.12
NodeB has X.12
which is correct; however Cassandra will also fire a ReadRepair to NodeC. There is no need to do that: the next consistent read has a chance to be served by nodes {A, B} (no ReadRepair) or by the pair {?, C}, in which case a ReadRepair will be fired and will bring NodeC to a consistent state. Right now we are reading from the index a lot, and starting from some point in time we get TimeoutExceptions because the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(
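The avoidance Philip describes amounts to: when diffing replica versions against the resolved superset, a replica whose version is simply unknown (the null padding added in the Reducer) should not be treated as divergent. A hedged sketch of that filter, with plain strings standing in for ColumnFamily versions and a made-up method name; this is not the committed patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class RepairFilter {
    // Return the indexes of replicas whose *known* version differs from the
    // resolved superset. Replicas with a null (unknown) version are skipped
    // rather than diffed against resolved, so no spurious repair fires.
    static List<Integer> replicasNeedingRepair(List<String> versions, String resolved) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < versions.size(); i++) {
            String v = versions.get(i);
            if (v == null)
                continue; // unknown version: don't infer a difference
            if (!Objects.equals(v, resolved))
                out.add(i); // real mismatch: schedule a repair for this replica
        }
        return out;
    }
}
```

In the X.1 / X.2 / X.? example above, only the replicas holding X.1 and X.2 would be repaired to X.12; the replica with an unknown version is left for a later read that actually involves it.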
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207050#comment-13207050 ] Jonathan Ellis commented on CASSANDRA-3843: --- Looks to me like the 1.0 code changes from v2 apply cleanly to 0.8. (CHANGES diff does not apply but can be ignored.)
[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207055#comment-13207055 ] Jonathan Ellis commented on CASSANDRA-3883: --- bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time. If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * rows per split) doesn't seem like a big deal to me. CFIF WideRowIterator only returns batch size columns Key: CASSANDRA-3883 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 1.1.0 Reporter: Brandon Williams Fix For: 1.1.0 Attachments: 3883-v1.txt Most evident with the word count, where there are 1250 'word1' items in two rows (1000 in one, 250 in another) and it counts 198 with the batch size set to 99.
[jira] [Issue Comment Edited] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207055#comment-13207055 ] Jonathan Ellis edited comment on CASSANDRA-3883 at 2/13/12 6:41 PM: bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time. If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * pages per row in split) doesn't seem like a big deal to me. was (Author: jbellis): bq. Unfortunately if we don't start on one, I'm not sure if there's a way to detect that we're in a wide row without making an extra rpc against the last row seen every time. If we can easily address this w/ some extra logic in get_paged_slice then great, otherwise doing one extra rpc call out of (split size * rows per split) doesn't seem like a big deal to me.
[jira] [Updated] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3883: -- Reviewer: tjake Assignee: Brandon Williams
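The mechanics the iterator needs here are just resumable column paging: fetch fixed-size batches, restart each batch where the previous one ended, and stop on a short batch. A toy sketch over an in-memory "row" (the class name and integer columns are stand-ins; the real code pages via the Thrift get_paged_slice call mentioned above):

```java
import java.util.ArrayList;
import java.util.List;

public class WideRowPager {
    // Read every column of a wide row in batches of batchSize, resuming each
    // page where the previous one ended; a short (or empty) batch signals
    // that the row is exhausted.
    static List<Integer> readAll(List<Integer> row, int batchSize) {
        List<Integer> out = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = Math.min(start + batchSize, row.size());
            List<Integer> batch = row.subList(start, end);
            out.addAll(batch);
            if (batch.size() < batchSize)
                break; // fewer columns than requested: no more pages
            start = end;
        }
        return out;
    }
}
```

Stopping after the first batch per row, which is what the bug describes, would return only 99 of 1000 and 99 of 250 columns, matching the observed count of 198.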
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207060#comment-13207060 ] Peter Schuller commented on CASSANDRA-3830: --- {quote} What I meant to say is this is the only special-case for seeds; gossiping to at least one seed every round is the normal case, as you said. {quote} Ah. So what I mean by gossip being a special case is the fact that we have the gossip-to-seed logic at all. One of the core aspects of gossip is the propagation delay and whether, and to what extent, it is affected by things like cluster size. My concern is that production clusters that follow the recommendation w.r.t. seeds may be working well only because we are gossiping to seeds. It's trivial to see that if we have a bunch of N servers all gossiping to a small set of 2-4 servers, propagation delay is not going to be a major problem as long as at least one of those is up. Anyways, I'll try to get to graphing average propagation delay as a function of cluster size (along with p99s or something) and see if there seems to be a correlation or not. gossip-to-seeds is not obviously independent of failure detection algorithm Key: CASSANDRA-3830 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830 Project: Cassandra Issue Type: Task Components: Core Reporter: Peter Schuller Priority: Minor The failure detector, ignoring all the theory, boils down to an extremely simple algorithm. The FD keeps track of a sliding window (of 1000 currently) of heartbeat intervals for a given host. Meaning, we have a track record of the last 1000 times we saw an updated heartbeat for a host. At any given moment, a host has a score which is simply the time since the last heartbeat, over the *mean* interval in the sliding window. For historical reasons a simple scaling factor is applied to this prior to checking the phi conviction threshold. (CASSANDRA-2597 has details, but thanks to Paul's work there it's now trivial to understand what it does based on gut feeling.) So in effect, a host is considered down if we haven't heard from it in some time which is significantly longer than the average time we expect to hear from it. This seems reasonable, but it does assume that under normal conditions the average time between heartbeats does not change for reasons other than those that would be plausible reasons to think a node is unhealthy. This assumption *could* be violated by the gossip-to-seed feature. There is an argument to avoid gossip-to-seed for other reasons (see CASSANDRA-3829), but this is a concrete case in which gossip-to-seed could cause a negative side-effect of the general kind mentioned in CASSANDRA-3829 (see the notes at the end about the case w/o seeds not being continuously tested). Normally, due to gossip-to-seed, everyone essentially sees the latest information within very few heartbeats (assuming only 2-3 seeds). But should all seeds be down, we suddenly flip a switch and start relying on generalized propagation in the gossip system rather than the seed special case. The potential problem I foresee here is that if the average propagation time suddenly spikes when all seeds become unavailable, it could cause bogus flapping of nodes into the down state. In order to test this, I deployed a ~180 node cluster with a version that logs heartbeat information on each interpret(), similar to: INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.27778 It turns out that, at least at 180 nodes with 4 seed nodes, whether or not the seeds are running *does not* seem to matter significantly. In both cases the mean interval is around 1500 milliseconds. I don't feel I have a good grasp of whether this is incidental or guaranteed, and it would be good to at least empirically test propagation time w/o seeds at different cluster sizes; it's supposed to be unaffected by cluster size ({{RING_DELAY}} is static for this reason, is my understanding). It would be nice to see that confirmed.
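The sliding-window arithmetic the ticket describes is small enough to sketch. This is a deliberately simplified stand-in for the failure detector's per-host state, not the actual FailureDetector/ArrivalWindow code; the scaling factor of 1/ln 10 is an assumption, though it is consistent with the logged example ((7 / 1557.28) * 0.434 ≈ 0.00195):

```java
import java.util.ArrayDeque;

public class ArrivalWindow {
    static final int SAMPLE_SIZE = 1000; // sliding window of the last 1000 heartbeat intervals
    // Historical scaling factor applied before the phi conviction threshold check
    static final double PHI_FACTOR = 1.0 / Math.log(10.0);

    private final ArrayDeque<Double> intervals = new ArrayDeque<>();
    private double lastHeartbeat = Double.NaN;

    // Record a heartbeat: push the interval since the previous one,
    // evicting the oldest sample once the window is full.
    void heartbeat(double nowMillis) {
        if (!Double.isNaN(lastHeartbeat)) {
            if (intervals.size() == SAMPLE_SIZE)
                intervals.removeFirst();
            intervals.addLast(nowMillis - lastHeartbeat);
        }
        lastHeartbeat = nowMillis;
    }

    double meanInterval() {
        double sum = 0;
        for (double i : intervals)
            sum += i;
        return sum / intervals.size();
    }

    // phi = scaled (time since last heartbeat) / (mean observed interval);
    // the host is convicted as down when phi exceeds the configured threshold.
    double phi(double nowMillis) {
        return PHI_FACTOR * (nowMillis - lastHeartbeat) / meanInterval();
    }
}
```

This makes the concern above concrete: phi depends on the *mean* of recent intervals, so a sudden regime change in propagation delay (e.g. all seeds going away) shifts the score even when nothing is unhealthy.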
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207061#comment-13207061 ] Peter Schuller commented on CASSANDRA-3830: --- To clarify, the relation to the failure detector isn't the absolute propagation delay - I am concerned with a sudden *change* in propagation delay (either average or outliers).
[jira] [Commented] (CASSANDRA-3829) make seeds *only* be seeds, not special in gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207084#comment-13207084 ] Peter Schuller commented on CASSANDRA-3829: --- {quote} Okay, I'm with you so far. But as you note, this impacts the usability of single-node clusters, which is where virtually everybody starts. So, I'll need to see a solution that doesn't make life more confusing for that overwhelming majority. I get that you don't like the current tradeoffs but I haven't seen a better proposal yet. (I'll go ahead and pre-emptively -1 special environment variables...) {quote} I haven't been able to come up with a solution that avoids the initial setup requiring special actions. While I am personally fine with this (any software that *didn't* require it would make me wonder: what if this wasn't actually an initial setup?), I understand that 99% of users would probably not be fond of this behavior and it would just turn people off of Cassandra. So, what about an opt-in setting which explicitly says the inverse - this *is* a production cluster that is not being set up? The recommendation could be that everyone uses this setting after a cluster is in production, but things keep working if they don't (subject to the risks associated with re-bootstrapping someone on the seed list, a problem we already have). This could be either a {{cassandra.yaml}} option or, if that is deemed too visible/confusing, a not-so-prominently-documented environment variable. However, if a documented {{cassandra.yaml}} option in the default config is not acceptable, I'd still prefer a {{cassandra.yaml}} setting that isn't in the default configuration over an environment variable. (This is another case where it doesn't really matter *to me*. We can easily just patch in the env variable and run with it on our end; it's not like that patch will be a maintenance problem for us. I really just want to try to make this safer for all users.) {quote} I still haven't seen a case when this, or special-casing seeds to prevent gossip partitions, causes real problems. Whereas I was around when we added the gossip-partition-prevention code, so I do know the problems that prevents. {quote} Jumping into clusters/rolling restarts: I could give anecdotal stories about seeing people, multiple times, being unaware and/or confused about a node jumping into a cluster without bootstrapping and not realizing what's going on, or tell you that a long time ago, before I knew enough about gossip, I felt the pain of rolling restarts whenever maintenance was done on clusters. But in this case it seems better to just let it flow from actual facts, because it's not really that subjective. Consider the combination of: * Restarts are in fact required to change seeds. * A restart can easily be very slow due to index sampling (until the samples-on-disk patch is in), row cache pre-load, commit log replay (not if you drained properly, though), etc. * A restart can also be problematic if it e.g. causes page cache eviction and thus necessitates rate-limiting rolling restarts. * Completing a rolling restart in a safe manner is prevented by pre-existing nodes being down in the cluster (e.g., RF=3 QUORUM, one node already down - can't restart neighbors). * In addition, all forms of restart carry some risk, even if we were to only consider the additional windows of potential double failures. Having to do a full rolling restart on a production cluster, particularly if the cluster has a lot of data (meaning slower restarts, more sensitivity to page caches, etc), is a *huge* operation to do just because you needed to e.g. replace a broken disk and re-bootstrap a node that just happened to be a seed. And clearly, the probability that *some* other node in a large cluster is currently down for whatever reason is non-trivial, and that would make it impossible to complete a rolling restart. Of course one might again argue that there is no real need to be that strict about maintaining the seed list, but again, the circumstances under which this is safe are very opaque to people not intimately familiar with the code - and not being strict about it rather takes away the protection against partitions it was supposed to give you in the first place. So, while I realize changing the role of seeds is more controversial, I have a hard time understanding how it could not be obviously better to allow seeds to be reloadable. Pushing a .yaml configuration file vs. a *complete rolling restart of the entire cluster* - that's a huge difference in impact, effort and risk for most production clusters.
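Mechanically, a reloadable seed list is simple: re-read the seed source on each query instead of caching it at startup. A minimal sketch, assuming a one-host-per-line seeds file; the class name and file format are mine, and a real version would sit behind Cassandra's pluggable seed-provider mechanism rather than stand alone:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Seed source that re-reads its backing file on every call, so operators
// can change the seed list by pushing a file instead of rolling-restarting
// the whole cluster.
class FileSeedSource {
    private final Path seedFile;

    FileSeedSource(Path seedFile) {
        this.seedFile = seedFile;
    }

    List<String> getSeeds() throws IOException {
        List<String> seeds = new ArrayList<>();
        for (String line : Files.readAllLines(seedFile)) {
            String host = line.trim();
            if (!host.isEmpty())
                seeds.add(host); // skip blank lines
        }
        return seeds;
    }
}
```

Re-reading a small file once per gossip round is cheap, and edits to the file take effect on the next call with no process restart.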
[jira] [Created] (CASSANDRA-3903) Intermittent unexpected errors: possibly race condition around CQL parser?
Intermittent unexpected errors: possibly race condition around CQL parser? -- Key: CASSANDRA-3903 URL: https://issues.apache.org/jira/browse/CASSANDRA-3903 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.0 Environment: Mac OS X 10.7 with Sun/Oracle Java 1.6.0_29 Debian GNU/Linux 6.0.3 (squeeze) with Sun/Oracle Java 1.6.0_26 several recent commits on cassandra-1.1 branch. at least: 0183dc0b36e684082832de43a21b3dc0a9716d48, 3eefbac133c838db46faa6a91ba1f114192557ae, 9a842c7b317e6f1e6e156ccb531e34bb769c979f Running cassandra under ccm with one node Reporter: paul cannon When running multiple simultaneous instances of the test_cql.py piece of the python-cql test suite, I can reliably reproduce intermittent and unpredictable errors in the tests. The failures often occur at the point of keyspace creation during test setup, with a CQL statement of the form: {code} CREATE KEYSPACE 'asnvzpot' WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1 {code} An InvalidRequestException is returned to the cql driver, which re-raises it as a cql.ProgrammingError. The message: {code} ProgrammingError: Bad Request: line 2:24 no viable alternative at input 'asnvzpot' {code} In a few cases, Cassandra threw an ArrayIndexOutOfBoundsException and this traceback, closing the thrift connection: {code} ERROR [Thrift:244] 2012-02-10 15:51:46,815 CustomTThreadPoolServer.java (line 205) Error occurred during processing of message. 
java.lang.ArrayIndexOutOfBoundsException: 7
	at org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:1520)
	at org.apache.cassandra.thrift.ThriftValidation.validateCfDef(ThriftValidation.java:634)
	at org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:744)
	at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:898)
	at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1245)
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3458)
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3446)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
{code}
Sometimes I see an ArrayIndexOutOfBoundsException with no traceback:
{code}
ERROR [Thrift:858] 2012-02-13 12:04:01,537 CustomTThreadPoolServer.java (line 205) Error occurred during processing of message.
java.lang.ArrayIndexOutOfBoundsException
{code}
Sometimes I get this:
{code}
ERROR [MigrationStage:1] 2012-02-13 12:04:46,077 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[MigrationStage:1,5,main]
java.lang.IllegalArgumentException: value already present: 1558
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
	at com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111)
	at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96)
	at com.google.common.collect.HashBiMap.put(HashBiMap.java:84)
	at org.apache.cassandra.config.Schema.load(Schema.java:392)
	at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:284)
	at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:209)
	at org.apache.cassandra.db.migration.AddColumnFamily.applyImpl(AddColumnFamily.java:49)
	at org.apache.cassandra.db.migration.Migration.apply(Migration.java:66)
	at org.apache.cassandra.cql.QueryProcessor$1.call(QueryProcessor.java:334)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
Again, around 99% of the instances of this {{CREATE KEYSPACE}} statement work fine, so it's a little hard to git bisect out, but I guess I'll see what I can do.
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207110#comment-13207110 ] Brandon Williams commented on CASSANDRA-3830: --- CASSANDRA-617 may be of interest then (though this is when gossip was old and busted; udp and whatnot) bq. It's trivial to see that if we have a bunch of N servers all gossiping to a small set of 2-4 servers, propagation delay is not going to be a major problem as long as at least one of those is up Right, gossiping to a seed every round actually becomes a bit of an optimization in this regard, but isn't strictly necessary. gossip-to-seeds is not obviously independent of failure detection algorithm Key: CASSANDRA-3830 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830 Project: Cassandra Issue Type: Task Components: Core Reporter: Peter Schuller Priority: Minor The failure detector, ignoring all the theory, boils down to an extremely simple algorithm. The FD keeps track of a sliding window (currently 1000) of heartbeat intervals for a given host. Meaning, we have a track record of the last 1000 times we saw an updated heartbeat for a host. At any given moment, a host has a score which is simply the time since the last heartbeat, over the *mean* interval in the sliding window. For historical reasons a simple scaling factor is applied to this prior to checking the phi conviction threshold. (CASSANDRA-2597 has details, but thanks to Paul's work there it's now trivial to understand what it does based on gut feeling.) So in effect, a host is considered down if we haven't heard from it in some time which is significantly longer than the average time we expect to hear from it. This seems reasonable, but it does assume that under normal conditions the average time between heartbeats does not change for reasons other than those that would be plausible reasons to think a node is unhealthy. 
This assumption *could* be violated by the gossip-to-seed feature. There is an argument to avoid gossip-to-seed for other reasons (see CASSANDRA-3829), but this is a concrete case in which gossip-to-seed could cause a negative side-effect of the general kind mentioned in CASSANDRA-3829 (see notes at the end about the case w/o seeds not being continuously tested). Normally, due to gossip-to-seed, everyone essentially sees the latest information within very few heartbeats (assuming only 2-3 seeds). But should all seeds be down, suddenly we flip a switch and start relying on generalized propagation in the gossip system, rather than the seed special case. The potential problem I foresee here is that if the average propagation time suddenly spikes when all seeds become unavailable, it could cause bogus flapping of nodes into down state. In order to test this, I deployed a ~180 node cluster with a version that logs heartbeat information on each interpret(), similar to: INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.27778 It turns out that, at least at 180 nodes, with 4 seed nodes, whether or not seeds are running *does not* seem to matter significantly. In both cases, the mean interval is around 1500 milliseconds. I don't feel I have a good grasp of whether this is incidental or guaranteed, and it would be good to at least empirically test propagation time w/o seeds at different cluster sizes; it's supposed to be unaffected by cluster size ({{RING_DELAY}} is static for this reason, is my understanding). Would be nice to see this be the case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
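The failure-detector arithmetic described in CASSANDRA-3830 above (phi as time-since-last-heartbeat over the mean interval in a sliding window, times a scaling factor) can be sketched in a few lines of Python. The window size, threshold, and scaling constant here are illustrative stand-ins, not Cassandra's actual values:

```python
from collections import deque

class SimpleFailureDetector:
    """Toy phi-style detector: the score grows with time since the last
    heartbeat, normalized by the mean inter-arrival interval."""

    def __init__(self, window_size=1000, phi_threshold=8.0, scale=1.0):
        self.intervals = deque(maxlen=window_size)  # sliding window of gaps
        self.last_heartbeat = None
        self.phi_threshold = phi_threshold
        self.scale = scale  # stand-in for the historical scaling factor

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return self.scale * (now - self.last_heartbeat) / mean

    def is_down(self, now):
        return self.phi(now) > self.phi_threshold

# Heartbeats arriving every ~1000 ms, as in the logged example above:
fd = SimpleFailureDetector()
for t in range(0, 10000, 1000):
    fd.heartbeat(t)
assert not fd.is_down(10000)   # 1s of silence, in line with the mean: healthy
assert fd.is_down(20000)       # ~11s of silence >> mean interval: convicted
```

This also makes the ticket's concern concrete: if the mean interval in the window were to shrink (e.g. because gossip-to-seed made updates arrive faster than generalized propagation would), the same absolute silence would yield a higher phi.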
[jira] [Commented] (CASSANDRA-3903) Intermittent unexpected errors: possibly race condition around CQL parser?
[ https://issues.apache.org/jira/browse/CASSANDRA-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207115#comment-13207115 ] paul cannon commented on CASSANDRA-3903: I should mention that I adjusted the python-cql tests to be able to run cleanly in parallel, in the parallel-tests branch. Intermittent unexpected errors: possibly race condition around CQL parser? -- Key: CASSANDRA-3903 URL: https://issues.apache.org/jira/browse/CASSANDRA-3903 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.0 Environment: Mac OS X 10.7 with Sun/Oracle Java 1.6.0_29 Debian GNU/Linux 6.0.3 (squeeze) with Sun/Oracle Java 1.6.0_26 several recent commits on cassandra-1.1 branch. at least: 0183dc0b36e684082832de43a21b3dc0a9716d48, 3eefbac133c838db46faa6a91ba1f114192557ae, 9a842c7b317e6f1e6e156ccb531e34bb769c979f Running cassandra under ccm with one node Reporter: paul cannon When running multiple simultaneous instances of the test_cql.py piece of the python-cql test suite, I can reliably reproduce intermittent and unpredictable errors in the tests. The failures often occur at the point of keyspace creation during test setup, with a CQL statement of the form: {code} CREATE KEYSPACE 'asnvzpot' WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1 {code} An InvalidRequestException is returned to the cql driver, which re-raises it as a cql.ProgrammingError. The message: {code} ProgrammingError: Bad Request: line 2:24 no viable alternative at input 'asnvzpot' {code} In a few cases, Cassandra threw an ArrayIndexOutOfBoundsException and this traceback, closing the thrift connection: {code} ERROR [Thrift:244] 2012-02-10 15:51:46,815 CustomTThreadPoolServer.java (line 205) Error occurred during processing of message. 
java.lang.ArrayIndexOutOfBoundsException: 7 at org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:1520) at org.apache.cassandra.thrift.ThriftValidation.validateCfDef(ThriftValidation.java:634) at org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:744) at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:898) at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1245) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3458) at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3446) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) {code} Sometimes I see an ArrayOutOfBoundsError with no traceback: {code} ERROR [Thrift:858] 2012-02-13 12:04:01,537 CustomTThreadPoolServer.java (line 205) Error occurred during processing of message. 
java.lang.ArrayIndexOutOfBoundsException {code} Sometimes I get this: {code} ERROR [MigrationStage:1] 2012-02-13 12:04:46,077 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[MigrationStage:1,5,main] java.lang.IllegalArgumentException: value already present: 1558 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115) at com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111) at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96) at com.google.common.collect.HashBiMap.put(HashBiMap.java:84) at org.apache.cassandra.config.Schema.load(Schema.java:392) at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:284) at org.apache.cassandra.db.migration.MigrationHelper.addColumnFamily(MigrationHelper.java:209) at org.apache.cassandra.db.migration.AddColumnFamily.applyImpl(AddColumnFamily.java:49) at org.apache.cassandra.db.migration.Migration.apply(Migration.java:66) at org.apache.cassandra.cql.QueryProcessor$1.call(QueryProcessor.java:334) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207132#comment-13207132 ] Yuki Morishita commented on CASSANDRA-3772: --- Dave, Patch needs rebase, but looking at the patch, I noticed the following: {code} private static byte[] hashMurmur3(ByteBuffer... data) { HashFunction hashFunction = murmur3HF.get(); Hasher hasher = hashFunction.newHasher(); // snip } {code} Isn't that slow if you instantiate every time? I looked up guava source code but I saw no way to reset, so I guess the above is the only thing you could do... I also note that CASSANDRA-2975 will implement MurmurHash3, so I think it is better not to introduce external library. What do you think? Evaluate Murmur3-based partitioner -- Key: CASSANDRA-3772 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Dave Brosius Fix For: 1.2 Attachments: try_murmur3.diff MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
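As background for the overhead question in CASSANDRA-3772: Murmur3's core is only multiplies, rotates, and XORs, which is why it is so much cheaper than MD5. A pure-Python sketch of the 32-bit x86 variant follows; this is an illustration of the algorithm family, not Cassandra's implementation (the partitioner work concerns the 128-bit x64 variant):

```python
def murmur3_32(data, seed=0):
    """MurmurHash3 x86 32-bit over a bytes object. All arithmetic is
    masked to 32 bits; c1/c2 and the finalizer constants are the
    published Murmur3 constants."""
    c1, c2 = 0xcc9e2d51, 0x1b873593
    length = len(data)
    h = seed
    rounded_end = length & 0xfffffffc  # process the body 4 bytes at a time
    for i in range(0, rounded_end, 4):
        k = (data[i] | (data[i + 1] << 8) |
             (data[i + 2] << 16) | (data[i + 3] << 24))
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff  # rotl32(k, 15)
        k = (k * c2) & 0xffffffff
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xffffffff  # rotl32(h, 13)
        h = (h * 5 + 0xe6546b64) & 0xffffffff
    # tail: remaining 1-3 bytes
    k = 0
    tail = length & 3
    if tail == 3:
        k ^= data[rounded_end + 2] << 16
    if tail >= 2:
        k ^= data[rounded_end + 1] << 8
    if tail >= 1:
        k ^= data[rounded_end]
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
    # finalization mix: force avalanche of the last bits
    h ^= length
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h
```

Note the whole hash is stateless, which is the point of Yuki's comment: a Guava Hasher object, by contrast, accumulates state and so cannot be reset and reused across calls.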
git commit: Fix misplaced 'new' keyword
Updated Branches: refs/heads/cassandra-1.0 cb0efd09c - 651ca528d Fix misplaced 'new' keyword Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/651ca528 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/651ca528 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/651ca528 Branch: refs/heads/cassandra-1.0 Commit: 651ca528d24f088581055cfbd4c70115e04899ea Parents: cb0efd0 Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 13 13:41:03 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 13 13:41:03 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/651ca528/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index 63758ab..b9977a5 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -491,7 +491,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo if (o == null) return (ByteBuffer)o; if (o instanceof java.lang.String) -return new ByteBuffer.wrap(DataByteArray((String)o).get()); +return ByteBuffer.wrap(new DataByteArray((String)o).get()); if (o instanceof Integer) return IntegerType.instance.decompose((BigInteger)o); if (o instanceof Long)
[jira] [Updated] (CASSANDRA-3412) make nodetool ring ownership smarter
[ https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3412: -- Assignee: Vijay (was: paul cannon) Vijay, do you have time to take a stab at this? make nodetool ring ownership smarter Key: CASSANDRA-3412 URL: https://issues.apache.org/jira/browse/CASSANDRA-3412 Project: Cassandra Issue Type: Improvement Reporter: Jackson Chung Assignee: Vijay Priority: Minor just a thought.. the ownership info currently just looks at the tokens and calculates the % between nodes. It would be nice if it could do more, such as discriminate nodes of each DC, replica set, etc. ticket is open for suggestion... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
git commit: Integer corresponds to Int32Type
Updated Branches: refs/heads/cassandra-1.0 651ca528d - 4bd3f8d86 Integer corresponds to Int32Type Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4bd3f8d8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4bd3f8d8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4bd3f8d8 Branch: refs/heads/cassandra-1.0 Commit: 4bd3f8d86fcc29259dd0d508873125f88ce588e4 Parents: 651ca52 Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 13 13:48:20 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 13 13:48:20 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4bd3f8d8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index b9977a5..76a291a 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -493,7 +493,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo if (o instanceof java.lang.String) return ByteBuffer.wrap(new DataByteArray((String)o).get()); if (o instanceof Integer) -return IntegerType.instance.decompose((BigInteger)o); +return Int32Type.instance.decompose((Integer)o); if (o instanceof Long) return LongType.instance.decompose((Long)o); if (o instanceof Float)
[jira] [Commented] (CASSANDRA-3412) make nodetool ring ownership smarter
[ https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207153#comment-13207153 ] Vijay commented on CASSANDRA-3412: -- Will do! Jackson, it is fairly trivial to change the ownership calculation. The problem with this, the last time I looked at it, was that if we show % in the ring and also % per DC, it will add up to more than 100% and hence cause some confusion for starters (I wish it was color coded or something like that)... What do you think? Would it make sense to rename OWNS to OWNS-PER-DC, or a better name, and do the above? make nodetool ring ownership smarter Key: CASSANDRA-3412 URL: https://issues.apache.org/jira/browse/CASSANDRA-3412 Project: Cassandra Issue Type: Improvement Reporter: Jackson Chung Assignee: Vijay Priority: Minor just a thought.. the ownership info currently just looks at the tokens and calculates the % between nodes. It would be nice if it could do more, such as discriminate nodes of each DC, replica set, etc. ticket is open for suggestion... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3901) write endpoints are not treated correctly, breaking consistency guarantees
[ https://issues.apache.org/jira/browse/CASSANDRA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207156#comment-13207156 ] paul cannon commented on CASSANDRA-3901: I don't believe that the proposed fix for CASSANDRA-2434 covers these concerns at all. I guess you could say that the *scope* of 2434 covers this, but I think it's separate enough to deserve its own ticket, as you've done. write endpoints are not treated correctly, breaking consistency guarantees -- Key: CASSANDRA-3901 URL: https://issues.apache.org/jira/browse/CASSANDRA-3901 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Critical I had a nagging feeling this was the case ever since I started wanting CASSANDRA-3833 and thinking about how to handle the association between nodes in the read set and nodes in the write set. I may be wrong (please point me in the right direction if so), but I see no code anywhere that tries to (1) apply consistency level to currently normal endpoints only, and (2) connect a given read endpoint with a future write endpoint such that they are tied together for consistency purposes (parts of these concerns are probably covered by CASSANDRA-2434 but that ticket is more general). To be more clear about the problem: Suppose we have a ring of nodes, with a single node bootstrapping. Now, for a given row key suppose reads are served by A, B and C while writes are to go to A, B, C and D. In other words, D is the node bootstrapping. Suppose RF is 3 and A,B,C,D is ring order. There are a few things required for correct behavior: * Writes acked by D must never be treated as sufficient to satisfy consistency level since until it is part of the read set it does not count towards CL on reads. 
* Writes acked by B must *not* be treated as sufficient to satisfy consistency level *unless* the same write is *also* acked by D, because once D enters the ring, B will no longer be counting towards CL on reads. The only alternative is to make the read succeed and disallow D from entering the ring. We don't seem to be handling this at all (and it becomes more complicated with arbitrary transitions). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
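One conservative way to encode the invariants above: require CL acks from the current read set *and* an ack from every pending (bootstrapping) endpoint, so that neither D alone nor B-without-D can satisfy the consistency level. A hedged sketch, where the names and the exact policy are illustrative rather than Cassandra's API:

```python
def write_satisfies_cl(acked, natural, pending, cl):
    """Return True if a write's acks are sufficient under consistency
    level `cl` without leaning on pending endpoints.

    acked:   set of endpoints that acknowledged the write
    natural: current read-set endpoints (e.g. {A, B, C})
    pending: endpoints joining the read set (e.g. {D}, bootstrapping)
    """
    # Acks from pending endpoints never count toward CL by themselves,
    # since they are not yet part of the read set...
    natural_acks = len(acked & natural)
    # ...so we require CL acks from natural endpoints, plus an ack from
    # every pending endpoint, because a pending node may replace a
    # natural one in the read set the moment it finishes bootstrapping.
    return natural_acks >= cl and pending <= acked

# D bootstrapping into an RF=3 replica set {A, B, C}, CL = QUORUM (2):
natural, pending = {"A", "B", "C"}, {"D"}
assert write_satisfies_cl({"A", "B", "D"}, natural, pending, cl=2)
# D alone cannot stand in for a natural ack:
assert not write_satisfies_cl({"A", "D"}, natural, pending, cl=2)
# Missing D: B's ack may stop counting once D enters the ring:
assert not write_satisfies_cl({"A", "B"}, natural, pending, cl=2)
```

The third case is exactly the B-without-D hazard described in the ticket: the write looked like a quorum at write time, but a later read quorum of {C, D} could miss it.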
[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207159#comment-13207159 ] Jonathan Ellis commented on CASSANDRA-3772: --- bq. I looked up guava source code but I saw no way to reset, so I guess the above is the only thing you could do It looks like you're right: http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/hash/MessageDigestHashFunction.java So using the standalone MH3 library is probably the way to go. Evaluate Murmur3-based partitioner -- Key: CASSANDRA-3772 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Dave Brosius Fix For: 1.2 Attachments: try_murmur3.diff MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3412) make nodetool ring ownership smarter
[ https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207169#comment-13207169 ] Peter Schuller commented on CASSANDRA-3412: --- Our internal tool (external, in Python, based on describe_ring) simply uses {{describe_ring}}, looks at each range and its responsible nodes, and just adds them all up. The ownership we report for a node is the total amount of ringspace (regardless of primary/secondary/DC/etc concerns) that the node has, compared to the overall total. It ends up giving you the real number while completely blackboxing why we got there, whether it be due to rack awareness (CASSANDRA-3810) or DCs. FWIW, here is the code for that. It's not self-contained and won't run, but it's an FYI. The topology_xref is just post-processing the describe_ring results to yield the map of range -> nodes_responsible. {code}
def cmd_effective_ownership(opts, args):
    """Print effective ownership of nodes in a cluster.

    Effective ownership means the actual amount of the ring for which it
    has data, whether or not it is because it is the primary or secondary
    (etc) owner of the ring segment. This is essentially the ownership
    you would want nodetool ring to print, but it doesn't.
    """
    if not args and not opts.all:
        return
    node_ranges, range_nodes = topology_xref(describe_ring(*((opts,) + split_hostport(seed(opts, 'localhost') if opts.all else args[0]))))
    if opts.all:
        args = node_ranges.keys()
    # acrobatics to handle wrap-around
    max_token = 0
    min_token = 2**127
    for r in range_nodes.keys():
        if r[0] < min_token:
            min_token = r[0]
        if r[1] > max_token:
            max_token = r[1]
    def ownership(start_token, end_token):
        start_token, end_token = int(start_token), int(end_token)
        if end_token < start_token:  # wrap-around
            return end_token + (2**127 - start_token)
        else:
            return end_token - start_token
    toprint = []  # list of (owned, ranges), later to be sorted
    for node in (hostnames.normalize_hostname(arg) for arg in args):
        if not node in node_ranges:
            raise cmdline.UserError('node %s not in ring' % (node,))
        ranges = node_ranges[node]
        owned = reduce(lambda a, b: a + b, [ownership(r[0], r[1]) for r in ranges], 0)
        toprint.append((owned, node, ranges))
    toprint = sorted(toprint, reverse=True)
    for owned, node, ranges in toprint:
        print '%s %f%%' % (node, float(owned) / 2**127 * 100.0)
        if opts.print_ranges:
            for r in
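Since the snippet above is truncated and depends on internal helpers, here is a runnable toy version of the same idea, assuming a 2**127 token space and a range -> replicas map of the shape describe_ring yields (names here are illustrative):

```python
RING = 2**127  # token space of RandomPartitioner

def range_size(start, end, ring=RING):
    """Size of the (start, end] token range, handling ranges that wrap
    around token 0."""
    if end < start:  # wrap-around
        return end + (ring - start)
    return end - start

def effective_ownership(range_nodes, ring=RING):
    """range_nodes: {(start_token, end_token): [replica, ...]} ->
    {node: fraction of the ring the node holds data for}.

    Every replica of a range is credited with its full size, which is
    what makes this 'effective' ownership rather than primary-only."""
    owned = {}
    for (start, end), replicas in range_nodes.items():
        for node in replicas:
            owned[node] = owned.get(node, 0) + range_size(start, end, ring)
    return {node: total / ring for node, total in owned.items()}

# Two equal halves of the ring, RF=2 over three nodes:
half = RING // 2
range_nodes = {
    (0, half): ["n1", "n2"],
    (half, 0): ["n2", "n3"],   # this range wraps around token 0
}
own = effective_ownership(range_nodes)
assert abs(own["n2"] - 1.0) < 1e-9   # replica for the whole ring
assert abs(own["n1"] - 0.5) < 1e-9
assert abs(own["n3"] - 0.5) < 1e-9
```

Note the fractions sum to RF, not to 1 — the same more-than-100% effect Vijay mentions in the comment above.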
[jira] [Commented] (CASSANDRA-3901) write endpoints are not treated correctly, breaking consistency guarantees
[ https://issues.apache.org/jira/browse/CASSANDRA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207175#comment-13207175 ] Peter Schuller commented on CASSANDRA-3901: --- You're probably right. I didn't re-read 2434 again (it's long and it takes careful reading to follow the discussion), and mostly wanted to give a nod towards it in case it did. write endpoints are not treated correctly, breaking consistency guarantees -- Key: CASSANDRA-3901 URL: https://issues.apache.org/jira/browse/CASSANDRA-3901 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Critical I had a nagging feeling this was the case ever since I started wanting CASSANDRA-3833 and thinking about how to handle the association between nodes in the read set and nodes in the write set. I may be wrong (please point me in the right direction if so), but I see no code anywhere that tries to (1) apply consistency level to currently normal endpoints only, and (2) connect a given read endpoint with a future write endpoint such that they are tied together for consistency purposes (parts of these concerns are probably covered by CASSANDRA-2434 but that ticket is more general). To be more clear about the problem: Suppose we have a ring of nodes, with a single node bootstrapping. Now, for a given row key suppose reads are served by A, B and C while writes are to go to A, B, C and D. In other words, D is the node bootstrapping. Suppose RF is 3 and A,B,C,D is ring order. There are a few things required for correct behavior: * Writes acked by D must never be treated as sufficient to satisfy consistency level since until it is part of the read set it does not count towards CL on reads. * Writes acked by B must *not* be treated as sufficient to satisfy consistency level *unless* the same write is *also* acked by D, because once D enters the ring, B will no longer be counting towards CL on reads. 
The only alternative is to make the read succeed and disallow D from entering the ring. We don't seem to be handling this at all (and it becomes more complicated with arbitrary transitions). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3830) gossip-to-seeds is not obviously independent of failure detection algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207176#comment-13207176 ] Peter Schuller commented on CASSANDRA-3830: --- Correct, and the concern is that when the optimization is removed (e.g., by seeds being down), that might affect the failure detector if the average heartbeat interval ends up being affected. gossip-to-seeds is not obviously independent of failure detection algorithm Key: CASSANDRA-3830 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830 Project: Cassandra Issue Type: Task Components: Core Reporter: Peter Schuller Priority: Minor The failure detector, ignoring all the theory, boils down to an extremely simple algorithm. The FD keeps track of a sliding window (currently 1000) of heartbeat intervals for a given host. Meaning, we have a track record of the last 1000 times we saw an updated heartbeat for a host. At any given moment, a host has a score which is simply the time since the last heartbeat, over the *mean* interval in the sliding window. For historical reasons a simple scaling factor is applied to this prior to checking the phi conviction threshold. (CASSANDRA-2597 has details, but thanks to Paul's work there it's now trivial to understand what it does based on gut feeling.) So in effect, a host is considered down if we haven't heard from it in some time which is significantly longer than the average time we expect to hear from it. This seems reasonable, but it does assume that under normal conditions the average time between heartbeats does not change for reasons other than those that would be plausible reasons to think a node is unhealthy. This assumption *could* be violated by the gossip-to-seed feature. 
There is an argument to avoid gossip-to-seed for other reasons (see CASSANDRA-3829), but this is a concrete case in which gossip-to-seed could cause a negative side-effect of the general kind mentioned in CASSANDRA-3829 (see notes at the end about the case w/o seeds not being continuously tested). Normally, due to gossip-to-seed, everyone essentially sees the latest information within very few heartbeats (assuming only 2-3 seeds). But should all seeds be down, suddenly we flip a switch and start relying on generalized propagation in the gossip system, rather than the seed special case. The potential problem I foresee here is that if the average propagation time suddenly spikes when all seeds become unavailable, it could cause bogus flapping of nodes into down state. In order to test this, I deployed a ~180 node cluster with a version that logs heartbeat information on each interpret(), similar to: INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean is 1557.27778 It turns out that, at least at 180 nodes, with 4 seed nodes, whether or not seeds are running *does not* seem to matter significantly. In both cases, the mean interval is around 1500 milliseconds. I don't feel I have a good grasp of whether this is incidental or guaranteed, and it would be good to at least empirically test propagation time w/o seeds at different cluster sizes; it's supposed to be unaffected by cluster size ({{RING_DELAY}} is static for this reason, is my understanding). Would be nice to see this be the case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
git commit: CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867
Updated Branches: refs/heads/trunk 232da8248 - c49a1497e CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c49a1497 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c49a1497 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c49a1497 Branch: refs/heads/trunk Commit: c49a1497eafc5ab5c16b03b3f97842c5ab1e64c8 Parents: 232da82 Author: Vijay Parthasarathy vijay2...@gmail.com Authored: Mon Feb 13 12:37:22 2012 -0800 Committer: Vijay Parthasarathy vijay2...@gmail.com Committed: Mon Feb 13 12:37:22 2012 -0800 -- .../apache/cassandra/thrift/CustomTHsHaServer.java |8 1 files changed, 8 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c49a1497/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java -- diff --git a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java index 4921678..9bfb4f7 100644 --- a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java +++ b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java @@ -177,6 +177,14 @@ public class CustomTHsHaServer extends TNonblockingServer { select(); } +try +{ +selector.close(); // CASSANDRA-3867 +} +catch (IOException e) +{ +// ignore this exception. +} } catch (Throwable t) {
git commit: CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867
Updated Branches: refs/heads/cassandra-1.0 4bd3f8d86 - 2a5547981 CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2a554798 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2a554798 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2a554798 Branch: refs/heads/cassandra-1.0 Commit: 2a5547981dad7e59be2c26aeb52f5d49d2195b9c Parents: 4bd3f8d Author: Vijay Parthasarathy vijay2...@gmail.com Authored: Mon Feb 13 12:42:29 2012 -0800 Committer: Vijay Parthasarathy vijay2...@gmail.com Committed: Mon Feb 13 12:42:29 2012 -0800 -- .../apache/cassandra/thrift/CustomTHsHaServer.java |8 1 files changed, 8 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/2a554798/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java -- diff --git a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java index 4921678..9bfb4f7 100644 --- a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java +++ b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java @@ -177,6 +177,14 @@ public class CustomTHsHaServer extends TNonblockingServer { select(); } +try +{ +selector.close(); // CASSANDRA-3867 +} +catch (IOException e) +{ +// ignore this exception. +} } catch (Throwable t) {
[jira] [Reopened] (CASSANDRA-3886) Pig can't store some types after loading them
[ https://issues.apache.org/jira/browse/CASSANDRA-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams reopened CASSANDRA-3886: - We actually do need the catch-all: {noformat} return ByteBuffer.wrap(((DataByteArray) o).get()); {noformat} To cast all the pig-native types like CharArray, but these are all guaranteed to be castable to DataByteArray. Pig can't store some types after loading them - Key: CASSANDRA-3886 URL: https://issues.apache.org/jira/browse/CASSANDRA-3886 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.7 Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 1.0.8 Attachments: 3886.txt In CASSANDRA-2810, we removed the decompose methods in putNext instead relying on objToBB, however it cannot sufficiently handle all types. For instance, if longs are loaded and then an attempt to store them is made, this causes a cast exception: java.io.IOException: java.io.IOException: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.DataByteArray Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3569) Failure detector downs should not break streams
[ https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207189#comment-13207189 ] Peter Schuller commented on CASSANDRA-3569: --- For the record, while CASSANDRA-2433 did make the changes originally claimed in my initial post here, it's CASSANDRA-3216 which is causing non-AES streams to get killed as well, but only on the sender side (if the receiver goes down according to the sender). It also generates an NPE: {code} java.lang.NullPointerException at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:97) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) {code} It should be harmless, but not very pretty. Failure detector downs should not break streams --- Key: CASSANDRA-3569 URL: https://issues.apache.org/jira/browse/CASSANDRA-3569 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller CASSANDRA-2433 introduced this behavior just to keep repairs from sitting there waiting forever. In my opinion the correct fix to that problem is to use TCP keep alive. Unfortunately the TCP keep alive period is insanely high by default on a modern Linux, so just doing that is not entirely good either. But using the failure detector seems nonsensical to me. We have a communication method, the TCP transport, that we know is used for long-running processes that you don't want incorrectly killed for no good reason, and we are using a failure detector tuned to deciding when not to send real-time-sensitive requests to nodes in order to actively kill a working connection.
So, rather than add complexity with protocol-based ping/pongs and such, I propose that we simply use TCP keep alive for streaming connections and instruct operators of production clusters to tweak net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever the equivalent is on their OS). I can submit the patch. Awaiting opinions.
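For the per-connection half of this proposal, enabling keep-alive is a single socket option; the probe timing itself comes from the OS tunables mentioned above. A hedged sketch (not the actual Cassandra streaming code; the method name is invented for illustration):

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class KeepAliveDemo {
    // Open a channel for a long-running transfer with TCP keep-alive on, so a
    // dead peer is eventually detected by the kernel's keep-alive probes
    // rather than by an application-level failure detector. How aggressively
    // the probes fire is governed OS-side, e.g. on Linux by
    // net.ipv4.tcp_keepalive_time / tcp_keepalive_intvl / tcp_keepalive_probes.
    static SocketChannel openStreamingChannel() throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        return ch; // a real caller would connect() and stream files here
    }

    public static void main(String[] args) throws IOException {
        SocketChannel ch = openStreamingChannel();
        System.out.println("keep-alive: " + ch.getOption(StandardSocketOptions.SO_KEEPALIVE));
        ch.close();
    }
}
```

The design point being argued is that liveness of an open TCP connection is the transport's job, so the tuning knob belongs at the OS level rather than in gossip.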
[jira] [Commented] (CASSANDRA-3886) Pig can't store some types after loading them
[ https://issues.apache.org/jira/browse/CASSANDRA-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207192#comment-13207192 ] Pavel Yaskevich commented on CASSANDRA-3886: +1 Pig can't store some types after loading them - Key: CASSANDRA-3886 URL: https://issues.apache.org/jira/browse/CASSANDRA-3886 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.7 Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 1.0.8 Attachments: 3886.txt In CASSANDRA-2810, we removed the decompose methods in putNext, instead relying on objToBB; however, it cannot sufficiently handle all types. For instance, if longs are loaded and then an attempt to store them is made, this causes a cast exception: java.io.IOException: java.io.IOException: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.DataByteArray Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily
[jira] [Resolved] (CASSANDRA-3886) Pig can't store some types after loading them
[ https://issues.apache.org/jira/browse/CASSANDRA-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams resolved CASSANDRA-3886. - Resolution: Fixed Committed. Pig can't store some types after loading them - Key: CASSANDRA-3886 URL: https://issues.apache.org/jira/browse/CASSANDRA-3886 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.7 Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 1.0.8 Attachments: 3886.txt In CASSANDRA-2810, we removed the decompose methods in putNext, instead relying on objToBB; however, it cannot sufficiently handle all types. For instance, if longs are loaded and then an attempt to store them is made, this causes a cast exception: java.io.IOException: java.io.IOException: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.pig.data.DataByteArray Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily
[jira] [Created] (CASSANDRA-3904) do not generate NPE on aborted stream-out sessions
do not generate NPE on aborted stream-out sessions -- Key: CASSANDRA-3904 URL: https://issues.apache.org/jira/browse/CASSANDRA-3904 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor Fix For: 1.1.0 https://issues.apache.org/jira/browse/CASSANDRA-3569?focusedCommentId=13207189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13207189 Attaching patch to make this a friendlier log entry.
git commit: Add catch-all cast back to CassandraStorage. Patch by brandonwilliams reviewed by xedin for CASSANDRA-3886
Updated Branches: refs/heads/cassandra-1.0 2a5547981 -> 104791412 Add catch-all cast back to CassandraStorage. Patch by brandonwilliams reviewed by xedin for CASSANDRA-3886 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/10479141 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/10479141 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/10479141 Branch: refs/heads/cassandra-1.0 Commit: 10479141285c885fcd77571a9b2397d684ecf826 Parents: 2a55479 Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 13 14:45:48 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 13 14:50:52 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/10479141/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index 76a291a..975d5ba 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -502,7 +502,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo return DoubleType.instance.decompose((Double)o); if (o instanceof UUID) return ByteBuffer.wrap(UUIDGen.decompose((UUID) o)); -return null; +return ByteBuffer.wrap(((DataByteArray) o).get()); } public void putNext(Tuple t) throws ExecException, IOException
[jira] [Updated] (CASSANDRA-3904) do not generate NPE on aborted stream-out sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller updated CASSANDRA-3904: -- Attachment: CASSANDRA-3904-1.1.txt Attaching patch against 1.1. It replaces the NPE with a friendlier message, and also augments the original stream-out session message to clarify that streams may still be going in the background. do not generate NPE on aborted stream-out sessions -- Key: CASSANDRA-3904 URL: https://issues.apache.org/jira/browse/CASSANDRA-3904 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor Fix For: 1.1.0 Attachments: CASSANDRA-3904-1.1.txt https://issues.apache.org/jira/browse/CASSANDRA-3569?focusedCommentId=13207189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13207189 Attaching patch to make this a friendlier log entry.
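The digest doesn't include the patch body, but the shape of the fix it describes is a simple guard: detect the already-closed session before dereferencing its state and emit a readable log line instead of letting the NPE surface. A hypothetical sketch, with names invented for illustration (the real code lives in FileStreamTask / stream session handling):

```java
public class StreamLogSketch {
    // If the stream-out session was already closed (e.g. the failure detector
    // convicted the peer), produce a friendly message instead of letting a
    // null session state bubble up as a NullPointerException from the
    // streaming thread.
    static String describeStreamTask(Object sessionState, String peer) {
        if (sessionState == null) {
            return "Aborting stream-out to " + peer + ": session was closed before the transfer started";
        }
        return "Streaming to " + peer;
    }

    public static void main(String[] args) {
        System.out.println(describeStreamTask(null, "10.0.0.1"));
        System.out.println(describeStreamTask(new Object(), "10.0.0.1"));
    }
}
```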
[jira] [Commented] (CASSANDRA-3569) Failure detector downs should not break streams
[ https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207224#comment-13207224 ] Peter Schuller commented on CASSANDRA-3569: --- NPE followed up in CASSANDRA-3904. Failure detector downs should not break streams --- Key: CASSANDRA-3569 URL: https://issues.apache.org/jira/browse/CASSANDRA-3569 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller CASSANDRA-2433 introduced this behavior just to keep repairs from sitting there waiting forever. In my opinion the correct fix to that problem is to use TCP keep alive. Unfortunately the TCP keep alive period is insanely high by default on a modern Linux, so just doing that is not entirely good either. But using the failure detector seems nonsensical to me. We have a communication method, the TCP transport, that we know is used for long-running processes that you don't want incorrectly killed for no good reason, and we are using a failure detector tuned to deciding when not to send real-time-sensitive requests to nodes in order to actively kill a working connection. So, rather than add complexity with protocol-based ping/pongs and such, I propose that we simply use TCP keep alive for streaming connections and instruct operators of production clusters to tweak net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever the equivalent is on their OS). I can submit the patch. Awaiting opinions.
[jira] [Updated] (CASSANDRA-3904) do not generate NPE on aborted stream-out sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3904: -- Reviewer: yukim do not generate NPE on aborted stream-out sessions -- Key: CASSANDRA-3904 URL: https://issues.apache.org/jira/browse/CASSANDRA-3904 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor Fix For: 1.1.0 Attachments: CASSANDRA-3904-1.1.txt https://issues.apache.org/jira/browse/CASSANDRA-3569?focusedCommentId=13207189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13207189 Attaching patch to make this a friendlier log entry.
[jira] [Updated] (CASSANDRA-3371) Cassandra inferred schema and actual data don't match
[ https://issues.apache.org/jira/browse/CASSANDRA-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-3371: Attachment: smoke_test.txt 3371-v6.txt v6 is rebased and contains minor cleanups, smoke_test contains a file to be replayed by the cli and a pig script to exercise loading/storing every cassandra type. Cassandra inferred schema and actual data don't match - Key: CASSANDRA-3371 URL: https://issues.apache.org/jira/browse/CASSANDRA-3371 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.7 Reporter: Pete Warden Assignee: Brandon Williams Attachments: 0001-Rework-pig-schema.txt, 0002-Output-support-to-match-input.txt, 3371-v2.txt, 3371-v3.txt, 3371-v4.txt, 3371-v5-rebased.txt, 3371-v5.txt, 3371-v6.txt, pig.diff, smoke_test.txt It's looking like there may be a mismatch between the schema that's being reported by the latest CassandraStorage.java, and the data that's actually returned. Here's an example: rows = LOAD 'cassandra://Frap/PhotoVotes' USING CassandraStorage(); DESCRIBE rows; rows: {key: chararray,columns: {(name: chararray,value: bytearray,photo_owner: chararray,value_photo_owner: bytearray,pid: chararray,value_pid: bytearray,matched_string: chararray,value_matched_string: bytearray,src_big: chararray,value_src_big: bytearray,time: chararray,value_time: bytearray,vote_type: chararray,value_vote_type: bytearray,voter: chararray,value_voter: bytearray)}} DUMP rows; (691831038_1317937188.48955,{(photo_owner,1596090180),(pid,6855155124568798560),(matched_string,),(src_big,),(time,Thu Oct 06 14:39:48 -0700 2011),(vote_type,album_dislike),(voter,691831038)}) getSchema() is reporting the columns as an inner bag of tuples, each of which contains 16 values. In fact, getNext() seems to return an inner bag containing 7 tuples, each of which contains two values. 
It appears that things got out of sync with this change: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?r1=1177083&r2=1177082&pathrev=1177083 See more discussion at: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/pig-cassandra-problem-quot-Incompatible-field-schema-quot-error-tc6882703.html
[3/5] git commit: Merge branch '3886' into cassandra-1.0
Merge branch '3886' into cassandra-1.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/cb0efd09 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/cb0efd09 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/cb0efd09 Branch: refs/heads/cassandra-1.1 Commit: cb0efd09cf077799f4934d900089f87b4db06d9e Parents: c3dc789 742648c Author: Brandon Williams brandonwilli...@apache.org Authored: Fri Feb 10 12:01:18 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Fri Feb 10 12:01:18 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) --
[4/5] git commit: Pig's objToBB should handle all types. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-3886
Pig's objToBB should handle all types. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-3886 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/742648c8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/742648c8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/742648c8 Branch: refs/heads/cassandra-1.1 Commit: 742648c821bb5922018423ff5f360233017a08ba Parents: 22b8a97 Author: Brandon Williams brandonwilli...@apache.org Authored: Fri Feb 10 10:07:53 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Fri Feb 10 12:00:07 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/742648c8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index b1af1b5..63758ab 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -491,8 +491,18 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo if (o == null) return (ByteBuffer)o; if (o instanceof java.lang.String) -o = new DataByteArray((String)o); -return ByteBuffer.wrap(((DataByteArray) o).get()); +return ByteBuffer.wrap(new DataByteArray((String)o).get()); +if (o instanceof Integer) +return IntegerType.instance.decompose((BigInteger)o); +if (o instanceof Long) +return LongType.instance.decompose((Long)o); +if (o instanceof Float) +return FloatType.instance.decompose((Float)o); +if (o instanceof Double) +return DoubleType.instance.decompose((Double)o); +if (o instanceof UUID) +return ByteBuffer.wrap(UUIDGen.decompose((UUID) o)); +return null; } public
void putNext(Tuple t) throws ExecException, IOException
[1/5] git commit: merge from 1.0
Updated Branches: refs/heads/cassandra-1.1 9a842c7b3 -> c5986871c merge from 1.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c5986871 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c5986871 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c5986871 Branch: refs/heads/cassandra-1.1 Commit: c5986871c007f8c552ff624d1fcf064ce6a45c92 Parents: 9a842c7 b55ab4f Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 15:41:30 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:41:30 2012 -0600 -- CHANGES.txt|3 -- .../cassandra/hadoop/pig/CassandraStorage.java | 14 ++- .../cassandra/locator/NetworkTopologyStrategy.java |2 +- .../apache/cassandra/locator/TokenMetadata.java| 28 +++ .../apache/cassandra/service/StorageService.java |6 ++-- 5 files changed, 37 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c5986871/CHANGES.txt -- diff --cc CHANGES.txt index e115a2a,0875da5..359e699 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,83 -1,3 +1,80 @@@ +1.1-dev + * add nodetool rebuild_index (CASSANDRA-3583) + * add nodetool rangekeysample (CASSANDRA-2917) + * Fix streaming too much data during move operations (CASSANDRA-3639) + * Nodetool and CLI connect to localhost by default (CASSANDRA-3568) + * Reduce memory used by primary index sample (CASSANDRA-3743) + * (Hadoop) separate input/output configurations (CASSANDRA-3197, 3765) + * avoid returning internal Cassandra classes over JMX (CASSANDRA-2805) + * add row-level isolation via SnapTree (CASSANDRA-2893) + * Optimize key count estimation when opening sstable on startup + (CASSANDRA-2988) + * multi-dc replication optimization supporting CL ONE (CASSANDRA-3577) + * add command to stop compactions (CASSANDRA-1740, 3566, 3582) + * multithreaded streaming (CASSANDRA-3494) + * removed in-tree redhat spec (CASSANDRA-3567) + * defragment rows for name-based queries
under STCS, again (CASSANDRA-2503) + * Recycle commitlog segments for improved performance + (CASSANDRA-3411, 3543, 3557, 3615) + * update size-tiered compaction to prioritize small tiers (CASSANDRA-2407) + * add message expiration logic to OutboundTcpConnection (CASSANDRA-3005) + * off-heap cache to use sun.misc.Unsafe instead of JNA (CASSANDRA-3271) + * EACH_QUORUM is only supported for writes (CASSANDRA-3272) + * replace compactionlock use in schema migration by checking CFS.isValid + (CASSANDRA-3116) + * recognize that SELECT first ... * isn't really SELECT * (CASSANDRA-3445) + * Use faster bytes comparison (CASSANDRA-3434) + * Bulk loader is no longer a fat client, (HADOOP) bulk load output format + (CASSANDRA-3045) + * (Hadoop) add support for KeyRange.filter + * remove assumption that keys and token are in bijection + (CASSANDRA-1034, 3574, 3604) + * always remove endpoints from delevery queue in HH (CASSANDRA-3546) + * fix race between cf flush and its 2ndary indexes flush (CASSANDRA-3547) + * fix potential race in AES when a repair fails (CASSANDRA-3548) + * Remove columns shadowed by a deleted container even when we cannot purge + (CASSANDRA-3538) + * Improve memtable slice iteration performance (CASSANDRA-3545) + * more efficient allocation of small bloom filters (CASSANDRA-3618) + * Use separate writer thread in SSTableSimpleUnsortedWriter (CASSANDRA-3619) + * fsync the directory after new sstable or commitlog segment are created (CASSANDRA-3250) + * fix minor issues reported by FindBugs (CASSANDRA-3658) + * global key/row caches (CASSANDRA-3143, 3849) + * optimize memtable iteration during range scan (CASSANDRA-3638) + * introduce 'crc_check_chance' in CompressionParameters to support + a checksum percentage checking chance similarly to read-repair (CASSANDRA-3611) + * a way to deactivate global key/row cache on per-CF basis (CASSANDRA-3667) + * fix LeveledCompactionStrategy broken because of generation pre-allocation + in LeveledManifest 
(CASSANDRA-3691) + * finer-grained control over data directories (CASSANDRA-2749) + * Fix ClassCastException during hinted handoff (CASSANDRA-3694) + * Upgrade Thrift to 0.7 (CASSANDRA-3213) + * Make stress.java insert operation to use microseconds (CASSANDRA-3725) + * Allows (internally) doing a range query with a limit of columns instead of + rows (CASSANDRA-3742) + * Allow rangeSlice queries to be start/end inclusive/exclusive (CASSANDRA-3749) + * Fix BulkLoader to support new SSTable layout and add stream + throttling to prevent an NPE when there is no yaml config (CASSANDRA-3752) + * Allow concurrent schema
[5/5] git commit: avoid including non-queried nodes in rangeslice read repair patch by jbellis; reviewed by Vijay for CASSANDRA-3843
avoid including non-queried nodes in rangeslice read repair patch by jbellis; reviewed by Vijay for CASSANDRA-3843 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3dc7894 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3dc7894 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3dc7894 Branch: refs/heads/cassandra-1.1 Commit: c3dc7894159ad413f9c8fa0cc0024c6ed0984831 Parents: 22b8a97 Author: Jonathan Ellis jbel...@apache.org Authored: Wed Feb 8 22:28:47 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Thu Feb 9 15:33:31 2012 -0600 -- CHANGES.txt|7 +++ .../service/RangeSliceResponseResolver.java| 10 +++--- .../org/apache/cassandra/service/StorageProxy.java |6 -- 3 files changed, 14 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3dc7894/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index cca24a9..0875da5 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,9 +1,8 @@ -1.0.9 +1.0.8 + * avoid including non-queried nodes in rangeslice read repair + (CASSANDRA-3843) * Only snapshot CF being compacted for snapshot_before_compaction (CASSANDRA-3803) - - -1.0.8 * Log active compactions in StatusLogger (CASSANDRA-3703) * Compute more accurate compaction score per level (CASSANDRA-3790) * Return InvalidRequest when using a keyspace that doesn't exist http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3dc7894/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java -- diff --git a/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java b/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java index 3be61d1..a870d5c 100644 --- a/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java +++ b/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java @@ -56,16 +56,20 @@ public class RangeSliceResponseResolver implements IResponseResolver<Iterable<Row>> };
private final String table; -private final List<InetAddress> sources; +private List<InetAddress> sources; protected final Collection<Message> responses = new LinkedBlockingQueue<Message>(); public final List<IAsyncResult> repairResults = new ArrayList<IAsyncResult>(); -public RangeSliceResponseResolver(String table, List<InetAddress> sources) +public RangeSliceResponseResolver(String table) { -this.sources = sources; this.table = table; } +public void setSources(List<InetAddress> endpoints) +{ +this.sources = endpoints; +} + public List<Row> getData() throws IOException { Message response = responses.iterator().next(); http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3dc7894/src/java/org/apache/cassandra/service/StorageProxy.java -- diff --git a/src/java/org/apache/cassandra/service/StorageProxy.java b/src/java/org/apache/cassandra/service/StorageProxy.java index 0672b3f..27db551 100644 --- a/src/java/org/apache/cassandra/service/StorageProxy.java +++ b/src/java/org/apache/cassandra/service/StorageProxy.java @@ -814,9 +814,10 @@ public class StorageProxy implements StorageProxyMBean RangeSliceCommand c2 = new RangeSliceCommand(command.keyspace, command.column_family, command.super_column, command.predicate, range, command.max_keys); // collect replies and resolve according to consistency level -RangeSliceResponseResolver resolver = new RangeSliceResponseResolver(command.keyspace, liveEndpoints); +RangeSliceResponseResolver resolver = new RangeSliceResponseResolver(command.keyspace); ReadCallback<Iterable<Row>> handler = getReadCallback(resolver, command, consistency_level, liveEndpoints); handler.assureSufficientLiveNodes(); +resolver.setSources(handler.endpoints); for (InetAddress endpoint : handler.endpoints) { MessagingService.instance().sendRR(c2, endpoint, handler); @@ -1071,7 +1072,7 @@ public class StorageProxy implements StorageProxyMBean DatabaseDescriptor.getEndpointSnitch().sortByProximity(FBUtilities.getBroadcastAddress(), liveEndpoints); // collect replies and
resolve according to consistency level -RangeSliceResponseResolver resolver = new RangeSliceResponseResolver(keyspace, liveEndpoints); +RangeSliceResponseResolver resolver = new RangeSliceResponseResolver(keyspace);
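The StorageProxy change above is a reordering of initialization: the resolver no longer receives liveEndpoints up front, but gets its sources only after the read callback has narrowed them to the endpoints actually queried, so read repair never targets a node that received no request. A simplified sketch of that wiring, with stand-in types rather than the real Cassandra classes:

```java
import java.util.Arrays;
import java.util.List;

public class RangeSliceWiring {
    static class Resolver {
        private List<String> sources; // the real code uses List<InetAddress>
        void setSources(List<String> endpoints) { this.sources = endpoints; }
        List<String> sources() { return sources; }
    }

    // Stand-in for ReadCallback trimming liveEndpoints down to the subset
    // required by the consistency level.
    static List<String> trimToQueried(List<String> liveEndpoints, int required) {
        return liveEndpoints.subList(0, Math.min(required, liveEndpoints.size()));
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Resolver resolver = new Resolver();            // constructed with no sources
        List<String> queried = trimToQueried(live, 2); // e.g. QUORUM with RF=3
        resolver.setSources(queried);                  // read repair sees only queried nodes
        System.out.println(resolver.sources());
    }
}
```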
[2/5] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b55ab4f3 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b55ab4f3 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b55ab4f3 Branch: refs/heads/cassandra-1.1 Commit: b55ab4f3b23b9f3f056ffcc526d2b06989e024fb Parents: cb0efd0 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 15:31:43 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:31:43 2012 -0600 -- .../cassandra/locator/NetworkTopologyStrategy.java |2 +- .../apache/cassandra/locator/TokenMetadata.java| 28 +++ .../apache/cassandra/service/StorageService.java |4 +- 3 files changed, 24 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java -- diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java index 2ae0a98..b6a99b2 100644 --- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java +++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java @@ -88,7 +88,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy // collect endpoints in this DC TokenMetadata dcTokens = new TokenMetadata(); -for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet()) +for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet()) { if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName)) dcTokens.updateNormalToken(tokenEntry.getKey(), tokenEntry.getValue()); http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/locator/TokenMetadata.java -- diff --git
a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java index ebb094b..0942a5d 100644 --- a/src/java/org/apache/cassandra/locator/TokenMetadata.java +++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java @@ -408,11 +408,6 @@ public class TokenMetadata } } -public Set<Map.Entry<Token,InetAddress>> entrySet() -{ -return tokenToEndpointMap.entrySet(); -} - public InetAddress getEndpoint(Token token) { lock.readLock().lock(); @@ -713,9 +708,28 @@ public class TokenMetadata } /** - * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones. + * @return a token to endpoint map to consider for read operations on the cluster. +...*/ wait */ +public Map<Token, InetAddress> getTokenToEndpointMapForReading() +{ +lock.readLock().lock(); +try +{ +Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size()); +map.putAll(tokenToEndpointMap); +return map; +} +finally +{ +lock.readLock().unlock(); +} +} + +/** + * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes + * in the cluster.
*/ -public Map<Token, InetAddress> getTokenToEndpointMap() +public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap() { lock.readLock().lock(); try http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/service/StorageService.java -- diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index 1f7a18d..f82fe32 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -854,7 +854,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe public Map<Token, String> getTokenToEndpointMap() { -Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap(); +Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap(); Map<Token, String> mapString = new HashMap<Token, String>(mapInetAddress.size()); for (Map.Entry<Token, InetAddress> entry : mapInetAddress.entrySet()) { @@ -2074,7 +2074,7 @@ public
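Both methods added by this patch follow one pattern: take the read lock, copy the live map, return the copy. Callers then iterate a stable snapshot while writers keep mutating the original under the write lock. A generic sketch of the pattern, with simplified types rather than the actual TokenMetadata class:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative copy-under-read-lock pattern: readers never see the live map,
// so iteration cannot race with concurrent updates (the bug in the old
// entrySet() accessor, which handed out a live view).
class GuardedMap<K, V> {
    private final Map<K, V> liveMap = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void put(K k, V v) {
        lock.writeLock().lock();
        try { liveMap.put(k, v); }
        finally { lock.writeLock().unlock(); }
    }

    // Equivalent of getTokenToEndpointMapForReading(): a stable copy.
    Map<K, V> snapshotForReading() {
        lock.readLock().lock();
        try { return new HashMap<>(liveMap); }
        finally { lock.readLock().unlock(); }
    }
}
```

The cost is one map copy per read, which the patch accepts in exchange for removing the data race.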
[3/3] git commit: Pig's objToBB should handle all types. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-3886
Pig's objToBB should handle all types. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-3886 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bcad0688 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bcad0688 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bcad0688 Branch: refs/heads/trunk Commit: bcad06883dc599c77393bc4eb2807be9da3d294a Parents: c49a149 Author: Brandon Williams brandonwilli...@apache.org Authored: Fri Feb 10 10:07:53 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:43:03 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bcad0688/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index 9c6dd30..ebd118c 100644 --- a/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -561,8 +561,18 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo if (o == null) return (ByteBuffer)o; if (o instanceof java.lang.String) -o = new DataByteArray((String)o); -return ByteBuffer.wrap(((DataByteArray) o).get()); +return ByteBuffer.wrap(new DataByteArray((String)o).get()); +if (o instanceof Integer) +return IntegerType.instance.decompose((BigInteger)o); +if (o instanceof Long) +return LongType.instance.decompose((Long)o); +if (o instanceof Float) +return FloatType.instance.decompose((Float)o); +if (o instanceof Double) +return DoubleType.instance.decompose((Double)o); +if (o instanceof UUID) +return ByteBuffer.wrap(UUIDGen.decompose((UUID) o)); +return null; } public void putNext(Tuple t) throws ExecException, IOException
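The real objToBB delegates to Cassandra's AbstractType.decompose() implementations, but the dispatch idea is independent of them. A simplified, self-contained sketch of the same per-type serialization (the encodings here are stand-ins, not Cassandra's actual wire format; the final branch mirrors the catch-all that CASSANDRA-3886's follow-up restores instead of returning null):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ObjToBB {
    // Dispatch each Java type to its own binary encoding instead of assuming
    // everything is already a DataByteArray (the bug the patch fixes).
    static ByteBuffer objToBB(Object o) {
        if (o == null)
            return null;
        if (o instanceof String)
            return ByteBuffer.wrap(((String) o).getBytes(StandardCharsets.UTF_8));
        if (o instanceof Long) {
            ByteBuffer bb = ByteBuffer.allocate(8).putLong((Long) o);
            bb.flip(); // rewind so the caller reads from position 0
            return bb;
        }
        if (o instanceof Double) {
            ByteBuffer bb = ByteBuffer.allocate(8).putDouble((Double) o);
            bb.flip();
            return bb;
        }
        // Catch-all: fall back to a raw-bytes representation rather than
        // returning null and breaking store for unlisted types.
        return ByteBuffer.wrap(o.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```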
[jira] [Created] (CASSANDRA-3905) fix typo in nodetool help for repair
fix typo in nodetool help for repair Key: CASSANDRA-3905 URL: https://issues.apache.org/jira/browse/CASSANDRA-3905 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller Priority: Trivial It says to use {{-rp}} instead of {{-pr}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[2/3] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/79050449 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/79050449 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/79050449 Branch: refs/heads/trunk Commit: 79050449e7e953a301e275a755a2b5f3a5b0d06a Parents: bcad068 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 15:31:43 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:44:29 2012 -0600 -- .../cassandra/locator/NetworkTopologyStrategy.java |2 +- .../apache/cassandra/locator/TokenMetadata.java| 28 +++ .../apache/cassandra/service/StorageService.java |6 ++-- 3 files changed, 25 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/79050449/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java -- diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java index ffbabd6..382e224 100644 --- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java +++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java @@ -90,7 +90,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy // collect endpoints in this DC; add in bulk to token meta data for computational complexity // reasons (CASSANDRA-3831). 
Set<Pair<Token, InetAddress>> dcTokensToUpdate = new HashSet<Pair<Token, InetAddress>>(); -for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet()) +for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet()) { if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName)) dcTokensToUpdate.add(Pair.create(tokenEntry.getKey(), tokenEntry.getValue())); http://git-wip-us.apache.org/repos/asf/cassandra/blob/79050449/src/java/org/apache/cassandra/locator/TokenMetadata.java -- diff --git a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java index b02daae..4d89f92 100644 --- a/src/java/org/apache/cassandra/locator/TokenMetadata.java +++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java @@ -436,11 +436,6 @@ public class TokenMetadata } } -public Set<Map.Entry<Token,InetAddress>> entrySet() -{ -return tokenToEndpointMap.entrySet(); -} - public InetAddress getEndpoint(Token token) { lock.readLock().lock(); @@ -741,9 +736,28 @@ public class TokenMetadata } /** - * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones. + * @return a token to endpoint map to consider for read operations on the cluster. + */ +public Map<Token, InetAddress> getTokenToEndpointMapForReading() +{ +lock.readLock().lock(); +try +{ +Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size()); +map.putAll(tokenToEndpointMap); +return map; +} +finally +{ +lock.readLock().unlock(); +} +} + +/** + * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes + * in the cluster. 
*/ -public Map<Token, InetAddress> getTokenToEndpointMap() +public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap() { lock.readLock().lock(); try http://git-wip-us.apache.org/repos/asf/cassandra/blob/79050449/src/java/org/apache/cassandra/service/StorageService.java -- diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index c1681b9..9bcd54d 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -908,7 +908,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe public Map<String, String> getTokenToEndpointMap() { -Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap(); +Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap(); // in order to preserve tokens in ascending order, we
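The getTokenToEndpointMapForReading() method introduced above replaces handing out the backing map's live entrySet() with a defensive copy taken under the read lock, so callers iterate a stable snapshot instead of racing with writers. A minimal standalone sketch of that pattern using plain JDK types (SnapshotMap and its method names are illustrative, not Cassandra's API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the copy-under-read-lock pattern used by TokenMetadata:
// readers get a stable snapshot rather than a live view of the map,
// so iteration cannot race with writers mutating the backing map.
public class SnapshotMap {
    private final Map<String, String> backing = new HashMap<String, String>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void put(String key, String value) {
        lock.writeLock().lock();
        try {
            backing.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogous to getTokenToEndpointMapForReading(): copy under the read lock.
    public Map<String, String> snapshotForReading() {
        lock.readLock().lock();
        try {
            return new HashMap<String, String>(backing);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotMap m = new SnapshotMap();
        m.put("token1", "10.0.0.1");
        Map<String, String> snap = m.snapshotForReading();
        m.put("token2", "10.0.0.2"); // does not affect the existing snapshot
        System.out.println(snap.size()); // prints 1
    }
}
```

The cost is an O(n) copy per read, which the patch accepts in exchange for never exposing the synchronized map outside the lock.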
[1/3] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
Updated Branches: refs/heads/cassandra-1.0 104791412 -> 4ab6fad94 refs/heads/trunk c49a1497e -> 79050449e fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4ab6fad9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4ab6fad9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4ab6fad9 Branch: refs/heads/cassandra-1.0 Commit: 4ab6fad945cada90497a8cf523a4c868932834c2 Parents: 1047914 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 15:31:43 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:44:50 2012 -0600 -- .../cassandra/locator/NetworkTopologyStrategy.java |2 +- .../apache/cassandra/locator/TokenMetadata.java| 28 +++ .../apache/cassandra/service/StorageService.java |4 +- 3 files changed, 24 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java -- diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java index 2ae0a98..b6a99b2 100644 --- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java +++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java @@ -88,7 +88,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy // collect endpoints in this DC TokenMetadata dcTokens = new TokenMetadata(); -for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet()) +for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet()) { if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName)) dcTokens.updateNormalToken(tokenEntry.getKey(), tokenEntry.getValue()); 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/TokenMetadata.java -- diff --git a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java index ebb094b..0942a5d 100644 --- a/src/java/org/apache/cassandra/locator/TokenMetadata.java +++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java @@ -408,11 +408,6 @@ public class TokenMetadata } } -public Set<Map.Entry<Token,InetAddress>> entrySet() -{ -return tokenToEndpointMap.entrySet(); -} - public InetAddress getEndpoint(Token token) { lock.readLock().lock(); @@ -713,9 +708,28 @@ public class TokenMetadata } /** - * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones. + * @return a token to endpoint map to consider for read operations on the cluster. + */ +public Map<Token, InetAddress> getTokenToEndpointMapForReading() +{ +lock.readLock().lock(); +try +{ +Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size()); +map.putAll(tokenToEndpointMap); +return map; +} +finally +{ +lock.readLock().unlock(); +} +} + +/** + * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes + * in the cluster. 
*/ -public Map<Token, InetAddress> getTokenToEndpointMap() +public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap() { lock.readLock().lock(); try http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/service/StorageService.java -- diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index 1f7a18d..f82fe32 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -854,7 +854,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe public Map<Token, String> getTokenToEndpointMap() { -Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap(); +Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap(); Map<Token, String> mapString = new HashMap<Token, String>(mapInetAddress.size());
[jira] [Updated] (CASSANDRA-3905) fix typo in nodetool help for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller updated CASSANDRA-3905: -- Attachment: CASSANDRA-3905.txt fix typo in nodetool help for repair Key: CASSANDRA-3905 URL: https://issues.apache.org/jira/browse/CASSANDRA-3905 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller Priority: Trivial Fix For: 1.1.0 Attachments: CASSANDRA-3905.txt It says to use {{-rp}} instead of {{-pr}}.
[jira] [Updated] (CASSANDRA-3905) fix typo in nodetool help for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller updated CASSANDRA-3905: -- Fix Version/s: 1.1.0 fix typo in nodetool help for repair Key: CASSANDRA-3905 URL: https://issues.apache.org/jira/browse/CASSANDRA-3905 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller Priority: Trivial Fix For: 1.1.0 Attachments: CASSANDRA-3905.txt It says to use {{-rp}} instead of {{-pr}}.
[jira] [Commented] (CASSANDRA-3905) fix typo in nodetool help for repair
[ https://issues.apache.org/jira/browse/CASSANDRA-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207261#comment-13207261 ] Jonathan Ellis commented on CASSANDRA-3905: --- +1 fix typo in nodetool help for repair Key: CASSANDRA-3905 URL: https://issues.apache.org/jira/browse/CASSANDRA-3905 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller Priority: Trivial Fix For: 1.1.0 Attachments: CASSANDRA-3905.txt It says to use {{-rp}} instead of {{-pr}}.
[3/8] git commit: fix race between cleanup and flush on secondary index CFSes patch by yukim and jbellis for CASSANDRA-3712
fix race between cleanup and flush on secondary index CFSes patch by yukim and jbellis for CASSANDRA-3712 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9ca84786 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9ca84786 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9ca84786 Branch: refs/heads/cassandra-1.1 Commit: 9ca84786b5be14b0a881268e3649b697f7f893b9 Parents: 4ab6fad Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 16:30:34 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 16:30:34 2012 -0600 -- CHANGES.txt|1 + src/java/org/apache/cassandra/db/Table.java|2 +- .../cassandra/db/compaction/CompactionManager.java | 24 ++- 3 files changed, 18 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 0875da5..500b9fb 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 1.0.8 + * fix race between cleanup and flush on secondary index CFSes (CASSANDRA-3712) * avoid including non-queried nodes in rangeslice read repair (CASSANDRA-3843) * Only snapshot CF being compacted for snapshot_before_compaction http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/Table.java -- diff --git a/src/java/org/apache/cassandra/db/Table.java b/src/java/org/apache/cassandra/db/Table.java index 0168f0c..f954fbc 100644 --- a/src/java/org/apache/cassandra/db/Table.java +++ b/src/java/org/apache/cassandra/db/Table.java @@ -71,7 +71,7 @@ public class Table * * (Enabling fairness in the RRWL is observed to decrease throughput, so we leave it off.) 
*/ -static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock(); +public static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock(); // It is possible to call Table.open without a running daemon, so it makes sense to ensure // proper directories here as well as in CassandraDaemon. http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index caaf6d2..97e5067 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -729,14 +729,13 @@ public class CompactionManager implements CompactionManagerMBean } else { - cfs.invalidateCachedRow(row.getKey()); - + if (!indexedColumns.isEmpty() || isCommutative) { if (indexedColumnsInRow != null) indexedColumnsInRow.clear(); - + while (row.hasNext()) { IColumn column = row.next(); @@ -746,13 +745,24 @@ public class CompactionManager implements CompactionManagerMBean { if (indexedColumnsInRow == null) indexedColumnsInRow = new ArrayList<IColumn>(); - + indexedColumnsInRow.add(column); } } - + if (indexedColumnsInRow != null && !indexedColumnsInRow.isEmpty()) - cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow); +{ +// acquire memtable lock here because secondary index deletion may cause a race. See CASSANDRA-3712 +Table.switchLock.readLock().lock(); +try +{ + cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow); +} +finally +{ +
[2/8] git commit: fix race between cleanup and flush on secondary index CFSes patch by yukim and jbellis for CASSANDRA-3712
fix race between cleanup and flush on secondary index CFSes patch by yukim and jbellis for CASSANDRA-3712 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9ca84786 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9ca84786 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9ca84786 Branch: refs/heads/cassandra-1.0 Commit: 9ca84786b5be14b0a881268e3649b697f7f893b9 Parents: 4ab6fad Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 16:30:34 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 16:30:34 2012 -0600 -- CHANGES.txt|1 + src/java/org/apache/cassandra/db/Table.java|2 +- .../cassandra/db/compaction/CompactionManager.java | 24 ++- 3 files changed, 18 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 0875da5..500b9fb 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 1.0.8 + * fix race between cleanup and flush on secondary index CFSes (CASSANDRA-3712) * avoid including non-queried nodes in rangeslice read repair (CASSANDRA-3843) * Only snapshot CF being compacted for snapshot_before_compaction http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/Table.java -- diff --git a/src/java/org/apache/cassandra/db/Table.java b/src/java/org/apache/cassandra/db/Table.java index 0168f0c..f954fbc 100644 --- a/src/java/org/apache/cassandra/db/Table.java +++ b/src/java/org/apache/cassandra/db/Table.java @@ -71,7 +71,7 @@ public class Table * * (Enabling fairness in the RRWL is observed to decrease throughput, so we leave it off.) 
*/ -static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock(); +public static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock(); // It is possible to call Table.open without a running daemon, so it makes sense to ensure // proper directories here as well as in CassandraDaemon. http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index caaf6d2..97e5067 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -729,14 +729,13 @@ public class CompactionManager implements CompactionManagerMBean } else { - cfs.invalidateCachedRow(row.getKey()); - + if (!indexedColumns.isEmpty() || isCommutative) { if (indexedColumnsInRow != null) indexedColumnsInRow.clear(); - + while (row.hasNext()) { IColumn column = row.next(); @@ -746,13 +745,24 @@ public class CompactionManager implements CompactionManagerMBean { if (indexedColumnsInRow == null) indexedColumnsInRow = new ArrayList<IColumn>(); - + indexedColumnsInRow.add(column); } } - + if (indexedColumnsInRow != null && !indexedColumnsInRow.isEmpty()) - cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow); +{ +// acquire memtable lock here because secondary index deletion may cause a race. See CASSANDRA-3712 +Table.switchLock.readLock().lock(); +try +{ + cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow); +} +finally +{ +
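The shape of the CASSANDRA-3712 fix above: flush holds the write side of Table.switchLock, and cleanup now takes the read side around deleteFromIndexes(), so index deletion can no longer interleave with a concurrent flush. A toy standalone sketch of that locking discipline (class and method names here are illustrative, not the project's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the cleanup/flush race fix: cleanup takes the switch lock's
// read side around index deletion, flush takes the write side, so the two
// operations are mutually excluded and cannot interleave mid-operation.
public class CleanupFlushSketch {
    static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
    static final List<String> index = new ArrayList<String>();

    // Stands in for flushing the secondary-index memtables.
    static void flush() {
        switchLock.writeLock().lock();
        try {
            index.clear();
        } finally {
            switchLock.writeLock().unlock();
        }
    }

    // Stands in for cleanup's deleteFromIndexes(), as in the patch.
    static void deleteFromIndexes(String key) {
        switchLock.readLock().lock();
        try {
            index.remove(key);
        } finally {
            switchLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        index.add("row1");
        deleteFromIndexes("row1");
        flush();
        System.out.println(index.isEmpty()); // prints true
    }
}
```

Note that multiple cleanup threads can still proceed concurrently (read locks are shared); only cleanup-versus-flush is serialized.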
[5/8] git commit: Add catch-all cast back to CassandraStorage. Patch by brandonwilliams reviewed by xedin for CASSANDRA-3886
Add catch-all cast back to CassandraStorage. Patch by brandonwilliams reviewed by xedin for CASSANDRA-3886 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/10479141 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/10479141 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/10479141 Branch: refs/heads/cassandra-1.1 Commit: 10479141285c885fcd77571a9b2397d684ecf826 Parents: 2a55479 Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 13 14:45:48 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 13 14:50:52 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/10479141/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index 76a291a..975d5ba 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -502,7 +502,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo return DoubleType.instance.decompose((Double)o); if (o instanceof UUID) return ByteBuffer.wrap(UUIDGen.decompose((UUID) o)); -return null; +return ByteBuffer.wrap(((DataByteArray) o).get()); } public void putNext(Tuple t) throws ExecException, IOException
[4/8] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4ab6fad9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4ab6fad9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4ab6fad9 Branch: refs/heads/cassandra-1.1 Commit: 4ab6fad945cada90497a8cf523a4c868932834c2 Parents: 1047914 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 15:31:43 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 15:44:50 2012 -0600 -- .../cassandra/locator/NetworkTopologyStrategy.java |2 +- .../apache/cassandra/locator/TokenMetadata.java| 28 +++ .../apache/cassandra/service/StorageService.java |4 +- 3 files changed, 24 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java -- diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java index 2ae0a98..b6a99b2 100644 --- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java +++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java @@ -88,7 +88,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy // collect endpoints in this DC TokenMetadata dcTokens = new TokenMetadata(); -for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet()) +for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet()) { if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName)) dcTokens.updateNormalToken(tokenEntry.getKey(), tokenEntry.getValue()); http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/TokenMetadata.java -- diff --git 
a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java index ebb094b..0942a5d 100644 --- a/src/java/org/apache/cassandra/locator/TokenMetadata.java +++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java @@ -408,11 +408,6 @@ public class TokenMetadata } } -public Set<Map.Entry<Token,InetAddress>> entrySet() -{ -return tokenToEndpointMap.entrySet(); -} - public InetAddress getEndpoint(Token token) { lock.readLock().lock(); @@ -713,9 +708,28 @@ public class TokenMetadata } /** - * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones. + * @return a token to endpoint map to consider for read operations on the cluster. + */ +public Map<Token, InetAddress> getTokenToEndpointMapForReading() +{ +lock.readLock().lock(); +try +{ +Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size()); +map.putAll(tokenToEndpointMap); +return map; +} +finally +{ +lock.readLock().unlock(); +} +} + +/** + * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes + * in the cluster. 
*/ -public Map<Token, InetAddress> getTokenToEndpointMap() +public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap() { lock.readLock().lock(); try http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/service/StorageService.java -- diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java index 1f7a18d..f82fe32 100644 --- a/src/java/org/apache/cassandra/service/StorageService.java +++ b/src/java/org/apache/cassandra/service/StorageService.java @@ -854,7 +854,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe public Map<Token, String> getTokenToEndpointMap() { -Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap(); +Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap(); Map<Token, String> mapString = new HashMap<Token, String>(mapInetAddress.size()); for (Map.Entry<Token, InetAddress> entry : mapInetAddress.entrySet()) { @@ -2074,7 +2074,7 @@ public
[1/8] git commit: merge from 1.0
Updated Branches: refs/heads/cassandra-1.0 4ab6fad94 -> 9ca84786b refs/heads/cassandra-1.1 c5986871c -> c98edc3e8 merge from 1.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c98edc3e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c98edc3e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c98edc3e Branch: refs/heads/cassandra-1.1 Commit: c98edc3e81c8c1e19370802ab6c82a7e5ff00f42 Parents: c598687 9ca8478 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 13 16:31:41 2012 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 13 16:31:41 2012 -0600 -- CHANGES.txt|1 + src/java/org/apache/cassandra/db/Table.java|2 +- .../cassandra/db/compaction/CompactionManager.java | 26 +- .../cassandra/hadoop/pig/CassandraStorage.java |6 ++-- .../apache/cassandra/thrift/CustomTHsHaServer.java |8 5 files changed, 30 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c98edc3e/CHANGES.txt -- diff --cc CHANGES.txt index 359e699,500b9fb..d39c9dd --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,81 -1,5 +1,82 @@@ +1.1-dev + * add nodetool rebuild_index (CASSANDRA-3583) + * add nodetool rangekeysample (CASSANDRA-2917) + * Fix streaming too much data during move operations (CASSANDRA-3639) + * Nodetool and CLI connect to localhost by default (CASSANDRA-3568) + * Reduce memory used by primary index sample (CASSANDRA-3743) + * (Hadoop) separate input/output configurations (CASSANDRA-3197, 3765) + * avoid returning internal Cassandra classes over JMX (CASSANDRA-2805) + * add row-level isolation via SnapTree (CASSANDRA-2893) + * Optimize key count estimation when opening sstable on startup + (CASSANDRA-2988) + * multi-dc replication optimization supporting CL ONE (CASSANDRA-3577) + * add command to stop compactions (CASSANDRA-1740, 3566, 3582) + * multithreaded streaming (CASSANDRA-3494) + * removed in-tree redhat spec (CASSANDRA-3567) + * 
defragment rows for name-based queries under STCS, again (CASSANDRA-2503) + * Recycle commitlog segments for improved performance + (CASSANDRA-3411, 3543, 3557, 3615) + * update size-tiered compaction to prioritize small tiers (CASSANDRA-2407) + * add message expiration logic to OutboundTcpConnection (CASSANDRA-3005) + * off-heap cache to use sun.misc.Unsafe instead of JNA (CASSANDRA-3271) + * EACH_QUORUM is only supported for writes (CASSANDRA-3272) + * replace compactionlock use in schema migration by checking CFS.isValid + (CASSANDRA-3116) + * recognize that SELECT first ... * isn't really SELECT * (CASSANDRA-3445) + * Use faster bytes comparison (CASSANDRA-3434) + * Bulk loader is no longer a fat client, (HADOOP) bulk load output format + (CASSANDRA-3045) + * (Hadoop) add support for KeyRange.filter + * remove assumption that keys and token are in bijection + (CASSANDRA-1034, 3574, 3604) + * always remove endpoints from delevery queue in HH (CASSANDRA-3546) + * fix race between cf flush and its 2ndary indexes flush (CASSANDRA-3547) + * fix potential race in AES when a repair fails (CASSANDRA-3548) + * Remove columns shadowed by a deleted container even when we cannot purge + (CASSANDRA-3538) + * Improve memtable slice iteration performance (CASSANDRA-3545) + * more efficient allocation of small bloom filters (CASSANDRA-3618) + * Use separate writer thread in SSTableSimpleUnsortedWriter (CASSANDRA-3619) + * fsync the directory after new sstable or commitlog segment are created (CASSANDRA-3250) + * fix minor issues reported by FindBugs (CASSANDRA-3658) + * global key/row caches (CASSANDRA-3143, 3849) + * optimize memtable iteration during range scan (CASSANDRA-3638) + * introduce 'crc_check_chance' in CompressionParameters to support + a checksum percentage checking chance similarly to read-repair (CASSANDRA-3611) + * a way to deactivate global key/row cache on per-CF basis (CASSANDRA-3667) + * fix LeveledCompactionStrategy broken because of generation 
pre-allocation + in LeveledManifest (CASSANDRA-3691) + * finer-grained control over data directories (CASSANDRA-2749) + * Fix ClassCastException during hinted handoff (CASSANDRA-3694) + * Upgrade Thrift to 0.7 (CASSANDRA-3213) + * Make stress.java insert operation to use microseconds (CASSANDRA-3725) + * Allows (internally) doing a range query with a limit of columns instead of + rows (CASSANDRA-3742) + * Allow rangeSlice queries to be start/end inclusive/exclusive (CASSANDRA-3749) + * Fix BulkLoader to support new SSTable layout and add stream + throttling to prevent an NPE when there is no yaml config
[8/8] git commit: Fix misplaced 'new' keyword
Fix misplaced 'new' keyword Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/651ca528 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/651ca528 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/651ca528 Branch: refs/heads/cassandra-1.1 Commit: 651ca528d24f088581055cfbd4c70115e04899ea Parents: cb0efd0 Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 13 13:41:03 2012 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 13 13:41:03 2012 -0600 -- .../cassandra/hadoop/pig/CassandraStorage.java |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/651ca528/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java -- diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java index 63758ab..b9977a5 100644 --- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java +++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java @@ -491,7 +491,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo if (o == null) return (ByteBuffer)o; if (o instanceof java.lang.String) -return new ByteBuffer.wrap(DataByteArray((String)o).get()); +return ByteBuffer.wrap(new DataByteArray((String)o).get()); if (o instanceof Integer) return IntegerType.instance.decompose((BigInteger)o); if (o instanceof Long)
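For reference, the compile error being fixed in this commit: ByteBuffer.wrap is a static factory, so the `new` keyword must apply to the constructor of the argument (Pig's DataByteArray), not to the wrap call itself. A minimal JDK-only illustration, with plain String bytes standing in for a DataByteArray:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// The one-keyword fix: `new ByteBuffer.wrap(...)` does not compile because
// wrap is a static method, not an inner-class constructor; `new` belongs
// on the object being constructed inside the call.
public class WrapExample {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // wrong (rejected by javac): ByteBuffer buf = new ByteBuffer.wrap(bytes);
        ByteBuffer buf = ByteBuffer.wrap(bytes); // correct
        System.out.println(buf.remaining()); // prints 5
    }
}
```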
[6/8] git commit: CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867
CASSANDRA-3867 patch by Vijay; reviewed by Brandon Williams for CASSANDRA-3867 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2a554798 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2a554798 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2a554798 Branch: refs/heads/cassandra-1.1 Commit: 2a5547981dad7e59be2c26aeb52f5d49d2195b9c Parents: 4bd3f8d Author: Vijay Parthasarathy vijay2...@gmail.com Authored: Mon Feb 13 12:42:29 2012 -0800 Committer: Vijay Parthasarathy vijay2...@gmail.com Committed: Mon Feb 13 12:42:29 2012 -0800 -- .../apache/cassandra/thrift/CustomTHsHaServer.java |8 1 files changed, 8 insertions(+), 0 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/2a554798/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java -- diff --git a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java index 4921678..9bfb4f7 100644 --- a/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java +++ b/src/java/org/apache/cassandra/thrift/CustomTHsHaServer.java @@ -177,6 +177,14 @@ public class CustomTHsHaServer extends TNonblockingServer { select(); } +try +{ +selector.close(); // CASSANDRA-3867 +} +catch (IOException e) +{ +// ignore this exception. +} } catch (Throwable t) {
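The change above closes the Selector once the select loop exits; without it, the selector's native resources (file descriptors) leak each time the server loop terminates, and a close failure during shutdown is deliberately swallowed. A standalone sketch of that shutdown shape (the class and method names are illustrative, not the project's):

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Sketch of the CASSANDRA-3867 change: when the select loop exits, the
// Selector itself must be closed or its file descriptors leak; a failure
// to close during shutdown is ignored because there is nothing useful
// left to do with it.
public class SelectLoopSketch {
    // Runs a trivial "select loop", then closes the selector as the patch
    // does; returns whether the selector is still open afterwards.
    static boolean runLoopAndClose() throws IOException {
        Selector selector = Selector.open();
        boolean stopped = false;
        while (!stopped) {
            // selector.select(timeout) would go here in a real server loop
            stopped = true;
        }
        try {
            selector.close(); // release the selector's native resources
        } catch (IOException e) {
            // ignore: shutdown is already in progress
        }
        return selector.isOpen();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runLoopAndClose()); // prints false
    }
}
```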
[7/8] git commit: Integer corresponds to Int32Type
Integer corresponds to Int32Type

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4bd3f8d8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4bd3f8d8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4bd3f8d8

Branch: refs/heads/cassandra-1.1
Commit: 4bd3f8d86fcc29259dd0d508873125f88ce588e4
Parents: 651ca52
Author: Brandon Williams <brandonwilli...@apache.org>
Authored: Mon Feb 13 13:48:20 2012 -0600
Committer: Brandon Williams <brandonwilli...@apache.org>
Committed: Mon Feb 13 13:48:20 2012 -0600

--
 .../cassandra/hadoop/pig/CassandraStorage.java |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4bd3f8d8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
--
diff --git a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
index b9977a5..76a291a 100644
--- a/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
+++ b/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
@@ -493,7 +493,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo
         if (o instanceof java.lang.String)
             return ByteBuffer.wrap(new DataByteArray((String)o).get());
         if (o instanceof Integer)
-            return IntegerType.instance.decompose((BigInteger)o);
+            return Int32Type.instance.decompose((Integer)o);
         if (o instanceof Long)
             return LongType.instance.decompose((Long)o);
         if (o instanceof Float)
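The fix matters for two reasons: a Java `Integer` is not a `BigInteger`, so the old cast threw `ClassCastException` at runtime, and Cassandra's `IntegerType` is the variable-length varint type while a Java `Integer` maps to the fixed 4-byte `Int32Type`. A JDK-only sketch of the distinction (Cassandra's `Int32Type`/`IntegerType` classes are assumed, not imported; the two `decompose*` helpers are illustrative approximations of their encodings):

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class IntEncodings {
    // Int32Type-style encoding: always exactly 4 big-endian bytes.
    static ByteBuffer decomposeInt32(int v) {
        return (ByteBuffer) ByteBuffer.allocate(4).putInt(v).flip();
    }

    // IntegerType-style encoding: minimal two's-complement length,
    // as produced by BigInteger.toByteArray().
    static ByteBuffer decomposeVarint(BigInteger v) {
        return ByteBuffer.wrap(v.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println(decomposeInt32(7).remaining());                      // 4
        System.out.println(decomposeVarint(BigInteger.valueOf(7)).remaining()); // 1
        // The original bug: an autoboxed Integer cannot be cast to BigInteger.
        Object o = Integer.valueOf(7);
        try {
            BigInteger b = (BigInteger) o;
        } catch (ClassCastException expected) {
            System.out.println("ClassCastException");
        }
    }
}
```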
[2/12] git commit: fix race between cleanup and flush on secondary index CFSes patch by yukim and jbellis for CASSANDRA-3712
fix race between cleanup and flush on secondary index CFSes
patch by yukim and jbellis for CASSANDRA-3712

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9ca84786
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9ca84786
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9ca84786

Branch: refs/heads/trunk
Commit: 9ca84786b5be14b0a881268e3649b697f7f893b9
Parents: 4ab6fad
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Mon Feb 13 16:30:34 2012 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Mon Feb 13 16:30:34 2012 -0600

--
 CHANGES.txt                                        |    1 +
 src/java/org/apache/cassandra/db/Table.java        |    2 +-
 .../cassandra/db/compaction/CompactionManager.java |   24 ++-
 3 files changed, 18 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 0875da5..500b9fb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 1.0.8
+ * fix race between cleanup and flush on secondary index CFSes (CASSANDRA-3712)
  * avoid including non-queried nodes in rangeslice read repair (CASSANDRA-3843)
  * Only snapshot CF being compacted for snapshot_before_compaction

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/Table.java
--
diff --git a/src/java/org/apache/cassandra/db/Table.java b/src/java/org/apache/cassandra/db/Table.java
index 0168f0c..f954fbc 100644
--- a/src/java/org/apache/cassandra/db/Table.java
+++ b/src/java/org/apache/cassandra/db/Table.java
@@ -71,7 +71,7 @@ public class Table
      *
      * (Enabling fairness in the RRWL is observed to decrease throughput, so we leave it off.)
      */
-    static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
+    public static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
 
     // It is possible to call Table.open without a running daemon, so it makes sense to ensure
     // proper directories here as well as in CassandraDaemon.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ca84786/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
--
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
index caaf6d2..97e5067 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
@@ -729,14 +729,13 @@ public class CompactionManager implements CompactionManagerMBean
             }
             else
             {
-                cfs.invalidateCachedRow(row.getKey());
-
+
                 if (!indexedColumns.isEmpty() || isCommutative)
                 {
                     if (indexedColumnsInRow != null)
                         indexedColumnsInRow.clear();
-
+
                     while (row.hasNext())
                     {
                         IColumn column = row.next();
@@ -746,13 +745,24 @@ public class CompactionManager implements CompactionManagerMBean
                         {
                             if (indexedColumnsInRow == null)
                                 indexedColumnsInRow = new ArrayList<IColumn>();
-
+
                             indexedColumnsInRow.add(column);
                         }
                     }
-
+
                     if (indexedColumnsInRow != null && !indexedColumnsInRow.isEmpty())
-                        cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow);
+                    {
+                        // acquire memtable lock here because secondary index deletion may cause a race. See CASSANDRA-3712
+                        Table.switchLock.readLock().lock();
+                        try
+                        {
+                            cfs.indexManager.deleteFromIndexes(row.getKey(), indexedColumnsInRow);
+                        }
+                        finally
+                        {
+                            Table.switchLock.readLock().unlock();
+
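The race fix works by taking the read side of the memtable switch lock around secondary-index deletion: many cleanup operations may hold the read lock concurrently, but none can overlap a flush, which takes the write lock while switching memtables. A minimal JDK sketch of that lock discipline (the method names mirror the patch but the bodies are illustrative, not Cassandra's):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SwitchLockSketch {
    // Non-fair, like Table.switchLock (the comment in Table.java notes that
    // enabling fairness was observed to decrease throughput).
    static final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();

    // Cleanup path: delete a row's index entries under the read lock,
    // so a concurrent flush cannot interleave.
    static boolean deleteFromIndexes() {
        switchLock.readLock().lock();
        try {
            // ... remove index entries for the row here ...
            return switchLock.getReadHoldCount() == 1; // lock really held
        } finally {
            switchLock.readLock().unlock();
        }
    }

    // Flush path: the write lock excludes all cleanup work while
    // memtables are being switched.
    static boolean switchMemtable() {
        switchLock.writeLock().lock();
        try {
            // ... swap the active memtable here ...
            return switchLock.isWriteLockedByCurrentThread();
        } finally {
            switchLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(deleteFromIndexes() && switchMemtable()); // true
    }
}
```

This is also why the patch widens `switchLock` from package-private to `public`: `CompactionManager` lives in a different package and must reach the same lock.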
[3/12] git commit: Merge from 1.1
Merge from 1.1

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4d55a36a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4d55a36a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4d55a36a

Branch: refs/heads/trunk
Commit: 4d55a36aa2b94329507f931a3dffbc4c3547bdf0
Parents: 7905044 c98edc3
Author: Brandon Williams <brandonwilli...@apache.org>
Authored: Mon Feb 13 16:29:36 2012 -0600
Committer: Brandon Williams <brandonwilli...@apache.org>
Committed: Mon Feb 13 16:29:36 2012 -0600

--
 CHANGES.txt                                        |    4 +--
 src/java/org/apache/cassandra/db/Table.java        |    2 +-
 .../cassandra/db/compaction/CompactionManager.java |   26 +-
 .../cassandra/hadoop/pig/CassandraStorage.java     |    6 ++--
 4 files changed, 22 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d55a36a/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
--
[5/12] git commit: merge from 1.0
merge from 1.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c5986871
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c5986871
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c5986871

Branch: refs/heads/trunk
Commit: c5986871c007f8c552ff624d1fcf064ce6a45c92
Parents: 9a842c7 b55ab4f
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Mon Feb 13 15:41:30 2012 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Mon Feb 13 15:41:30 2012 -0600

--
 CHANGES.txt                                        |    3 --
 .../cassandra/hadoop/pig/CassandraStorage.java     |   14 ++-
 .../cassandra/locator/NetworkTopologyStrategy.java |    2 +-
 .../apache/cassandra/locator/TokenMetadata.java    |   28 +++-
 .../apache/cassandra/service/StorageService.java   |    6 ++--
 5 files changed, 37 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/c5986871/CHANGES.txt
--
diff --cc CHANGES.txt
index e115a2a,0875da5..359e699
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,83 -1,3 +1,80 @@@
+1.1-dev
+ * add nodetool rebuild_index (CASSANDRA-3583)
+ * add nodetool rangekeysample (CASSANDRA-2917)
+ * Fix streaming too much data during move operations (CASSANDRA-3639)
+ * Nodetool and CLI connect to localhost by default (CASSANDRA-3568)
+ * Reduce memory used by primary index sample (CASSANDRA-3743)
+ * (Hadoop) separate input/output configurations (CASSANDRA-3197, 3765)
+ * avoid returning internal Cassandra classes over JMX (CASSANDRA-2805)
+ * add row-level isolation via SnapTree (CASSANDRA-2893)
+ * Optimize key count estimation when opening sstable on startup
+   (CASSANDRA-2988)
+ * multi-dc replication optimization supporting CL ONE (CASSANDRA-3577)
+ * add command to stop compactions (CASSANDRA-1740, 3566, 3582)
+ * multithreaded streaming (CASSANDRA-3494)
+ * removed in-tree redhat spec (CASSANDRA-3567)
+ * defragment rows for name-based queries under STCS, again (CASSANDRA-2503)
+ * Recycle commitlog segments for improved performance
+   (CASSANDRA-3411, 3543, 3557, 3615)
+ * update size-tiered compaction to prioritize small tiers (CASSANDRA-2407)
+ * add message expiration logic to OutboundTcpConnection (CASSANDRA-3005)
+ * off-heap cache to use sun.misc.Unsafe instead of JNA (CASSANDRA-3271)
+ * EACH_QUORUM is only supported for writes (CASSANDRA-3272)
+ * replace compactionlock use in schema migration by checking CFS.isValid
+   (CASSANDRA-3116)
+ * recognize that SELECT first ... * isn't really SELECT * (CASSANDRA-3445)
+ * Use faster bytes comparison (CASSANDRA-3434)
+ * Bulk loader is no longer a fat client, (HADOOP) bulk load output format
+   (CASSANDRA-3045)
+ * (Hadoop) add support for KeyRange.filter
+ * remove assumption that keys and token are in bijection
+   (CASSANDRA-1034, 3574, 3604)
+ * always remove endpoints from delevery queue in HH (CASSANDRA-3546)
+ * fix race between cf flush and its 2ndary indexes flush (CASSANDRA-3547)
+ * fix potential race in AES when a repair fails (CASSANDRA-3548)
+ * Remove columns shadowed by a deleted container even when we cannot purge
+   (CASSANDRA-3538)
+ * Improve memtable slice iteration performance (CASSANDRA-3545)
+ * more efficient allocation of small bloom filters (CASSANDRA-3618)
+ * Use separate writer thread in SSTableSimpleUnsortedWriter (CASSANDRA-3619)
+ * fsync the directory after new sstable or commitlog segment are created (CASSANDRA-3250)
+ * fix minor issues reported by FindBugs (CASSANDRA-3658)
+ * global key/row caches (CASSANDRA-3143, 3849)
+ * optimize memtable iteration during range scan (CASSANDRA-3638)
+ * introduce 'crc_check_chance' in CompressionParameters to support
+   a checksum percentage checking chance similarly to read-repair (CASSANDRA-3611)
+ * a way to deactivate global key/row cache on per-CF basis (CASSANDRA-3667)
+ * fix LeveledCompactionStrategy broken because of generation pre-allocation
+   in LeveledManifest (CASSANDRA-3691)
+ * finer-grained control over data directories (CASSANDRA-2749)
+ * Fix ClassCastException during hinted handoff (CASSANDRA-3694)
+ * Upgrade Thrift to 0.7 (CASSANDRA-3213)
+ * Make stress.java insert operation to use microseconds (CASSANDRA-3725)
+ * Allows (internally) doing a range query with a limit of columns instead of
+   rows (CASSANDRA-3742)
+ * Allow rangeSlice queries to be start/end inclusive/exclusive (CASSANDRA-3749)
+ * Fix BulkLoader to support new SSTable layout and add stream
+   throttling to prevent an NPE when there is no yaml config (CASSANDRA-3752)
+ * Allow concurrent schema migrations (CASSANDRA-1391, 3832)
+ * Add SnapshotCommand to trigger snapshot on
[4/12] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
fix unsynchronized use of TokenMetadata.entrySet
patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4ab6fad9
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4ab6fad9
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4ab6fad9

Branch: refs/heads/trunk
Commit: 4ab6fad945cada90497a8cf523a4c868932834c2
Parents: 1047914
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Mon Feb 13 15:31:43 2012 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Mon Feb 13 15:44:50 2012 -0600

--
 .../cassandra/locator/NetworkTopologyStrategy.java |    2 +-
 .../apache/cassandra/locator/TokenMetadata.java    |   28 +++-
 .../apache/cassandra/service/StorageService.java   |    4 +-
 3 files changed, 24 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
--
diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
index 2ae0a98..b6a99b2 100644
--- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
+++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
@@ -88,7 +88,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy
             // collect endpoints in this DC
             TokenMetadata dcTokens = new TokenMetadata();
-            for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet())
+            for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet())
             {
                 if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName))
                     dcTokens.updateNormalToken(tokenEntry.getKey(), tokenEntry.getValue());

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/locator/TokenMetadata.java
--
diff --git a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java
index ebb094b..0942a5d 100644
--- a/src/java/org/apache/cassandra/locator/TokenMetadata.java
+++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java
@@ -408,11 +408,6 @@ public class TokenMetadata
         }
     }
 
-    public Set<Map.Entry<Token,InetAddress>> entrySet()
-    {
-        return tokenToEndpointMap.entrySet();
-    }
-
     public InetAddress getEndpoint(Token token)
     {
         lock.readLock().lock();
@@ -713,9 +708,28 @@ public class TokenMetadata
     }
 
     /**
-     * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones.
+     * @return a token to endpoint map to consider for read operations on the cluster.
+     */
+    public Map<Token, InetAddress> getTokenToEndpointMapForReading()
+    {
+        lock.readLock().lock();
+        try
+        {
+            Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size());
+            map.putAll(tokenToEndpointMap);
+            return map;
+        }
+        finally
+        {
+            lock.readLock().unlock();
+        }
+    }
+
+    /**
+     * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes
+     * in the cluster.
      */
-    public Map<Token, InetAddress> getTokenToEndpointMap()
+    public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap()
     {
         lock.readLock().lock();
         try

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4ab6fad9/src/java/org/apache/cassandra/service/StorageService.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 1f7a18d..f82fe32 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -854,7 +854,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe
     public Map<Token, String> getTokenToEndpointMap()
     {
-        Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap();
+        Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap();
         Map<Token, String> mapString = new HashMap<Token, String>(mapInetAddress.size());
         for (Map.Entry<Token, InetAddress> entry : mapInetAddress.entrySet())
         {
@@ -2074,7 +2074,7 @@ public class
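The unsynchronized-use fix replaces the raw `entrySet()` accessor, which handed callers a live view of `tokenToEndpointMap`, with a method that copies the map while holding the read lock, so iteration happens over a stable snapshot that concurrent token updates cannot corrupt. A JDK-only sketch of that defensive-copy-under-read-lock pattern (`String` keys and values stand in for Cassandra's `Token` and `InetAddress`):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SnapshotMap {
    private final Map<String, String> tokenToEndpoint = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer path, analogous to updateNormalToken().
    public void update(String token, String endpoint) {
        lock.writeLock().lock();
        try {
            tokenToEndpoint.put(token, endpoint);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader path: copy under the read lock so the caller iterates a
    // snapshot, never the live map (the point of CASSANDRA-3417).
    public Map<String, String> snapshotForReading() {
        lock.readLock().lock();
        try {
            return new HashMap<>(tokenToEndpoint);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotMap m = new SnapshotMap();
        m.update("t1", "10.0.0.1");
        Map<String, String> snap = m.snapshotForReading();
        m.update("t2", "10.0.0.2");           // later write...
        System.out.println(snap.size());      // 1: ...does not touch the snapshot
    }
}
```

Returning a copy costs an allocation per call, but it removes the possibility of a `ConcurrentModificationException` (or a torn read) in callers such as `NetworkTopologyStrategy` that iterate while another thread updates tokens.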
[1/12] git commit: merge from 1.0
Updated Branches:
  refs/heads/trunk 79050449e -> 4d55a36aa

merge from 1.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c98edc3e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c98edc3e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c98edc3e

Branch: refs/heads/trunk
Commit: c98edc3e81c8c1e19370802ab6c82a7e5ff00f42
Parents: c598687 9ca8478
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Mon Feb 13 16:31:41 2012 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Mon Feb 13 16:31:41 2012 -0600

--
 CHANGES.txt                                        |    1 +
 src/java/org/apache/cassandra/db/Table.java        |    2 +-
 .../cassandra/db/compaction/CompactionManager.java |   26 +-
 .../cassandra/hadoop/pig/CassandraStorage.java     |    6 ++--
 .../apache/cassandra/thrift/CustomTHsHaServer.java |    8
 5 files changed, 30 insertions(+), 13 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/c98edc3e/CHANGES.txt
--
diff --cc CHANGES.txt
index 359e699,500b9fb..d39c9dd
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,81 -1,5 +1,82 @@@
+1.1-dev
+ * add nodetool rebuild_index (CASSANDRA-3583)
+ * add nodetool rangekeysample (CASSANDRA-2917)
+ * Fix streaming too much data during move operations (CASSANDRA-3639)
+ * Nodetool and CLI connect to localhost by default (CASSANDRA-3568)
+ * Reduce memory used by primary index sample (CASSANDRA-3743)
+ * (Hadoop) separate input/output configurations (CASSANDRA-3197, 3765)
+ * avoid returning internal Cassandra classes over JMX (CASSANDRA-2805)
+ * add row-level isolation via SnapTree (CASSANDRA-2893)
+ * Optimize key count estimation when opening sstable on startup
+   (CASSANDRA-2988)
+ * multi-dc replication optimization supporting CL ONE (CASSANDRA-3577)
+ * add command to stop compactions (CASSANDRA-1740, 3566, 3582)
+ * multithreaded streaming (CASSANDRA-3494)
+ * removed in-tree redhat spec (CASSANDRA-3567)
+ * defragment rows for name-based queries under STCS, again (CASSANDRA-2503)
+ * Recycle commitlog segments for improved performance
+   (CASSANDRA-3411, 3543, 3557, 3615)
+ * update size-tiered compaction to prioritize small tiers (CASSANDRA-2407)
+ * add message expiration logic to OutboundTcpConnection (CASSANDRA-3005)
+ * off-heap cache to use sun.misc.Unsafe instead of JNA (CASSANDRA-3271)
+ * EACH_QUORUM is only supported for writes (CASSANDRA-3272)
+ * replace compactionlock use in schema migration by checking CFS.isValid
+   (CASSANDRA-3116)
+ * recognize that SELECT first ... * isn't really SELECT * (CASSANDRA-3445)
+ * Use faster bytes comparison (CASSANDRA-3434)
+ * Bulk loader is no longer a fat client, (HADOOP) bulk load output format
+   (CASSANDRA-3045)
+ * (Hadoop) add support for KeyRange.filter
+ * remove assumption that keys and token are in bijection
+   (CASSANDRA-1034, 3574, 3604)
+ * always remove endpoints from delevery queue in HH (CASSANDRA-3546)
+ * fix race between cf flush and its 2ndary indexes flush (CASSANDRA-3547)
+ * fix potential race in AES when a repair fails (CASSANDRA-3548)
+ * Remove columns shadowed by a deleted container even when we cannot purge
+   (CASSANDRA-3538)
+ * Improve memtable slice iteration performance (CASSANDRA-3545)
+ * more efficient allocation of small bloom filters (CASSANDRA-3618)
+ * Use separate writer thread in SSTableSimpleUnsortedWriter (CASSANDRA-3619)
+ * fsync the directory after new sstable or commitlog segment are created (CASSANDRA-3250)
+ * fix minor issues reported by FindBugs (CASSANDRA-3658)
+ * global key/row caches (CASSANDRA-3143, 3849)
+ * optimize memtable iteration during range scan (CASSANDRA-3638)
+ * introduce 'crc_check_chance' in CompressionParameters to support
+   a checksum percentage checking chance similarly to read-repair (CASSANDRA-3611)
+ * a way to deactivate global key/row cache on per-CF basis (CASSANDRA-3667)
+ * fix LeveledCompactionStrategy broken because of generation pre-allocation
+   in LeveledManifest (CASSANDRA-3691)
+ * finer-grained control over data directories (CASSANDRA-2749)
+ * Fix ClassCastException during hinted handoff (CASSANDRA-3694)
+ * Upgrade Thrift to 0.7 (CASSANDRA-3213)
+ * Make stress.java insert operation to use microseconds (CASSANDRA-3725)
+ * Allows (internally) doing a range query with a limit of columns instead of
+   rows (CASSANDRA-3742)
+ * Allow rangeSlice queries to be start/end inclusive/exclusive (CASSANDRA-3749)
+ * Fix BulkLoader to support new SSTable layout and add stream
+   throttling to prevent an NPE when there is no yaml config (CASSANDRA-3752)
+ * Allow concurrent schema migrations (CASSANDRA-1391,
[6/12] git commit: fix unsynchronized use of TokenMetadata.entrySet patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417
fix unsynchronized use of TokenMetadata.entrySet
patch by Peter Schuller; reviewed by jbellis for CASSANDRA-3417

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b55ab4f3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b55ab4f3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b55ab4f3

Branch: refs/heads/trunk
Commit: b55ab4f3b23b9f3f056ffcc526d2b06989e024fb
Parents: cb0efd0
Author: Jonathan Ellis <jbel...@apache.org>
Authored: Mon Feb 13 15:31:43 2012 -0600
Committer: Jonathan Ellis <jbel...@apache.org>
Committed: Mon Feb 13 15:31:43 2012 -0600

--
 .../cassandra/locator/NetworkTopologyStrategy.java |    2 +-
 .../apache/cassandra/locator/TokenMetadata.java    |   28 +++-
 .../apache/cassandra/service/StorageService.java   |    4 +-
 3 files changed, 24 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
--
diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
index 2ae0a98..b6a99b2 100644
--- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
+++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
@@ -88,7 +88,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy
             // collect endpoints in this DC
             TokenMetadata dcTokens = new TokenMetadata();
-            for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.entrySet())
+            for (Entry<Token, InetAddress> tokenEntry : tokenMetadata.getTokenToEndpointMapForReading().entrySet())
             {
                 if (snitch.getDatacenter(tokenEntry.getValue()).equals(dcName))
                     dcTokens.updateNormalToken(tokenEntry.getKey(), tokenEntry.getValue());

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/locator/TokenMetadata.java
--
diff --git a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java
index ebb094b..0942a5d 100644
--- a/src/java/org/apache/cassandra/locator/TokenMetadata.java
+++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java
@@ -408,11 +408,6 @@ public class TokenMetadata
         }
     }
 
-    public Set<Map.Entry<Token,InetAddress>> entrySet()
-    {
-        return tokenToEndpointMap.entrySet();
-    }
-
     public InetAddress getEndpoint(Token token)
     {
         lock.readLock().lock();
@@ -713,9 +708,28 @@ public class TokenMetadata
     }
 
     /**
-     * Return the Token to Endpoint map for all the node in the cluster, including bootstrapping ones.
+     * @return a token to endpoint map to consider for read operations on the cluster.
+     */
+    public Map<Token, InetAddress> getTokenToEndpointMapForReading()
+    {
+        lock.readLock().lock();
+        try
+        {
+            Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size());
+            map.putAll(tokenToEndpointMap);
+            return map;
+        }
+        finally
+        {
+            lock.readLock().unlock();
+        }
+    }
+
+    /**
+     * @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes
+     * in the cluster.
      */
-    public Map<Token, InetAddress> getTokenToEndpointMap()
+    public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap()
    {
         lock.readLock().lock();
         try

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b55ab4f3/src/java/org/apache/cassandra/service/StorageService.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 1f7a18d..f82fe32 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -854,7 +854,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe
     public Map<Token, String> getTokenToEndpointMap()
     {
-        Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getTokenToEndpointMap();
+        Map<Token, InetAddress> mapInetAddress = tokenMetadata_.getNormalAndBootstrappingTokenToEndpointMap();
         Map<Token, String> mapString = new HashMap<Token, String>(mapInetAddress.size());
         for (Map.Entry<Token, InetAddress> entry : mapInetAddress.entrySet())
         {
@@ -2074,7 +2074,7 @@ public class