Removed node jumps back into the cluster
I've tested a scenario where I wanted to reuse a removed node in a new cluster with the same IP. Maybe not very common, but anyway, I found some strange behaviour in Gossiper. Here is what I think/see happening:

- Cassandra 1.1. Three node cluster: A, B and C.
- Shut down node C and removed the token for node C.
- Everything looks ok in the logs, reporting that node C is removed etc.
- Nodes A and B still send Gossip digests about the removed node, but I guess that's ok since they know about it (Gossiper.endpointStateMap).
- Node C has status "removed" when checking in the JMX console.
- Checked in LocationInfo that the ring only contains token/IP for nodes A and B.
- Removed system/data tables for C.
- Changed the seed on C to point to itself.
- Started up node C; node C only gossips to itself, and nodes A and B don't recognize that node C is running, which is correct.
- Restarted e.g. node A. Now node A loses all gossip information (Gossiper.endpointStateMap) about node C. Node A requests information from LocationInfo and asks node B about endpoint states. Node A receives information from node B about node C, which triggers Gossiper.handleMajorStateChange; node C is first marked as unreachable because it's in a dead state (removed), node A tries to gossip (unreachable endpoints) to node C, which replies that it's up, and node C becomes incorporated into the old cluster again.

Is this a bug, or is it a requirement that if you take a node out of the cluster you must change the IP of the removed node if you want to use it in another cluster? Please enlighten me.

Regards,
/Fredrik
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Which version of Cassandra has your data been created initially with?

A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2]. In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command, so it'll fix the out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.

> After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?

They can't be ignored, since pending compactions raise the upper bound on the number of disk seeks you need to make to read a row, and you don't get the nice guarantees of leveled compaction.

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-4411
[2] https://issues.apache.org/jira/browse/CASSANDRA-4321

On Mon, Sep 10, 2012 at 6:37 PM, Rudolf van der Leeden <rudolf.vanderlee...@scoreloop.com> wrote:
> Hi, I'm getting 5 identical assertions while running 'nodetool cleanup' on a Cassandra 1.1.4 node with Load=104G and 80m keys.
From system.log:

ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

After 3 hours the job is done and there are 11390 compaction tasks pending. My question: Can these assertions be ignored? Or do I need to worry about it?

Thanks for your help and best regards,
-Rudolf.
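For readers who want to script the offline scrub described above, here is a minimal sketch. The installation path, keyspace and column family names are placeholder examples, not values from this thread; the only grounded facts are that sstablescrub ships under bin/ in Cassandra 1.1.4+ and takes a keyspace and column family, and that the node must be down while it runs.

```python
# Sketch: build the offline-scrub invocation for Cassandra 1.1.4+.
# All paths and names below are hypothetical examples.
import subprocess

def build_scrub_command(cassandra_home, keyspace, column_family):
    """Return argv for: <cassandra_home>/bin/sstablescrub <keyspace> <cf>."""
    return ["%s/bin/sstablescrub" % cassandra_home, keyspace, column_family]

cmd = build_scrub_command("/opt/cassandra", "MyKeyspace", "MyCF")
print(" ".join(cmd))
# Run only while the Cassandra process on this node is stopped:
# subprocess.check_call(cmd)
```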
Re: [RELEASE] Apache Cassandra 1.1.5 released
I'm also having AssertionErrors.

ERROR [ReadStage:51687] 2012-09-10 14:33:54,211 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51687,5,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:64)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
    at org.apache.cassandra.db.Table.getRow(Table.java:378)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
    at
org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:54) ... 14 more ERROR [ReadStage:51801] 2012-09-10 14:44:38,852 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51801,5,main] java.lang.AssertionError: DecoratedKey(12064825934064381804725403203980154559, 0bc7e1c580001170726573656e746174696f6e5f707074200017696d6167652f782d706f727461626c652d7069786d617000420013000102487fff8001000469636f6e04c90f4f6527560007554e4b4e4f574e000a746578742f782d74657800420013000102487fff8001000469636f6e04c90f8a0b80ac0007554e4b4e4f574e00466170706c69636174696f6e2f766e642e6f70656e786d6c666f726d6174732d6f696365646f63756d656e742e70726573656e746174696f6e6d6c2e736c69646573686f77004c0013000102487fff8001000469636f6e04c90bc7e19a88001170726573656e746174696f6e5f7070732a746578742f782d632b2b00420013000102487fff8001000469636f6e04c90f4e902aaa0007554e4b4e4f574e000c696d6167652f782d78706d6900420013000102487fff8001000469636f6e04c90f4b8360f20007554e4b4e4f574e0013696d6167652f782d77696e646f77732d626d7000440013000102487fff8001000469636f6e04c90bc7de8969696d6167655f626d7000156170706c69636174696f6e2f782d646f736578656300490013000102487fff8001000469636f6e04c90bc7dd973e61706c69636174696f6e5f6578650009766964656f2f64766400430013000102487fff8001000469636f6e04c90bc7e07598746578745f766f620008746578742f63737300430013000102487fff8001000469636f6e04c90bc7e07d68746578745f637373001d6170706c69636174696f6e2f782d73686f636b776176652d666c61736800440013000102487fff8001000469636f6e04c90bc7deb079766964656f5f737766000a746578742f782d61776b00420013000102487fff8001000469636f6e04c9117d73ced50007554e4b4e4f574e00186170706c69636174696f6e2f766e642e6d732d657863656c00430013000102487fff8001000469636f6e04c90bc7df19e80008746578745f786c73000f766964656f2f717569636b74696d6500) != DecoratedKey(121031529647353036275964125031804748412, 6170706c69636174696f6e2f7a6970) in
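The keys in that AssertionError are hex-encoded row key bytes. A quick way to inspect them is to hex-decode the payload; for instance, the second DecoratedKey in the error above decodes to a readable mime-type string (the first, longer blob is composite binary data and won't decode cleanly as text):

```python
# Decode the hex payload of a DecoratedKey from the assertion message.
key_hex = "6170706c69636174696f6e2f7a6970"  # second key in the error above
raw = bytes.fromhex(key_hex)
print(raw.decode("utf-8"))  # → application/zip
```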
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> Which version of Cassandra has your data been created initially with? A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2]. In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.

The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT leveled compaction). After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems. Then we ran more tests and created more and very big keys with millions of columns. The assertion only shows up with one particular CF containing these big keys. So, from your explanation, I don't think an offline scrub will help.

Thanks,
-Rudolf.
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Could you, as Aaron suggested, open a ticket?

-- Omid

On Tue, Sep 11, 2012 at 2:35 PM, Rudolf van der Leeden <rudolf.vanderlee...@scoreloop.com> wrote:
>> Which version of Cassandra has your data been created initially with? A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2]. In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.
>
> The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT leveled compaction). After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems. Then we ran more tests and created more and very big keys with millions of columns. The assertion only shows up with one particular CF containing these big keys. So, from your explanation, I don't think an offline scrub will help.
> Thanks, -Rudolf.
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> Could you, as Aaron suggested, open a ticket?

Done: https://issues.apache.org/jira/browse/CASSANDRA-4644
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
Relatedly, I'd love to learn how to reliably reproduce full GC pauses on C* 1.1+.

On Mon, Sep 10, 2012 at 12:37 PM, Oleg Dulin <oleg.du...@gmail.com> wrote:
> I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7. It is my feeble attempt to reduce Full GC pauses. Has anyone had any experience with this? Anyone tried it?
>
> --
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
I was able to run IBM Java 7 with Cassandra (could not do it with 1.6 because of snappy). It has a new garbage collection policy (called "balanced") that is good for very large heap sizes (over 8 GB), documented here: http://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html. It looks promising for Cassandra. I have not tried it yet, but I'd like to see how it works in action.

Regards,
Shahryar

On Mon, Sep 10, 2012 at 1:37 PM, Oleg Dulin <oleg.du...@gmail.com> wrote:
> I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7. It is my feeble attempt to reduce Full GC pauses. Has anyone had any experience with this? Anyone tried it?
>
> --
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using it for semi-wide, frequently-updated CounterColumns and they're performing much better on LCS than on STCS.

> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.

The 1.1.5 README does not mention this. Should it?

/Janne
Compound Keys: Connecting the dots between CQL3 and Java APIs
Our data architects (ex-Oracle DBA types) are jumping on the CQL3 bandwagon and creating schemas for us. That triggered me to write a quick article mapping the CQL3 schemas to how they are accessed via Java APIs (for our dev team). I hope others find this useful as well:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html

-brian

--
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile: 215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
Re: replace_token code?
This looks correct…

INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850) Node /10.72.201.80 is now part of the cluster
INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816) InetAddress /10.72.201.80 is now UP

The .80 node joined the ring because it was in the stored ring state.

INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line 1126) Nodes /10.72.201.80 and /10.190.221.204 have the same token 166594924822352415786406422619018814804. Ignoring /10.72.201.80

The new node took ownership.

INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830) InetAddress /10.72.201.80 is now dead.
INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644) FatClient /10.72.201.80 has been silent for 3ms, removing from gossip

The old node is marked as dead and the process to remove it is started.

Has the .80 node reappeared in the logs? If it does, can you include the output from nodetool gossipinfo?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/09/2012, at 5:59 AM, Yang <tedd...@gmail.com> wrote:
> Thanks Jim, looks like I'll have to read into the code to understand what is happening under the hood.
> yang
>
> On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro <jcist...@netflix.com> wrote:
>> We have seen various issues from these replaced nodes hanging around. For clusters where a lot of nodes have been replaced, we see these replaced nodes having an impact on heap/GC and a lot of tcp timeouts/retransmits (because the old nodes no longer exist). As a result, we have begun cleaning these up using unsafeAssassinateEndpoint via jmx. We have only started using it recently; so far no bad side effects. This also helps because those replaced nodes can appear as unreachable nodes wrt schema and sometimes prevent things like CF truncation. Using unsafeAssassinateEndpoint will clean these from unreachable nodes and will mark them as LEFT in gossip info. There is a ttl for them in gossipinfo and they should go away after 3 days.
>> Once they are marked LEFT, you should stop seeing those up/same/dead messages.
>> unsafeAssassinateEndpoint is unsafe in that, if you specify the IP of a real node in the cluster, that node will be assassinated. Otherwise, if you specify nodes that have been replaced, it is supposed to work correctly.
>> Hope this helps, jc
>>
>> From: Yang <tedd...@gmail.com>
>> Reply-To: user@cassandra.apache.org
>> Date: Mon, 10 Sep 2012 01:10:56 -0700
>> To: user@cassandra.apache.org
>> Subject: replace_token code?
>>
>>> It looks like by specifying replace_token, the old owner is not removed from gossip (which I had thought it would do). Then it's understandable that the old owner would resurface later and we get some warning saying that the same token is owned by both.
>>> I ran an example with a 2-node cluster, with RF=2. Host 10.72.201.80 was running for a while and had some data, then I shut it down, and booted up 10.190.221.204 with replace_token of the old token owned by the previous host. The following log sequence shows that the new host does acquire the token, but it does not at the same time remove .80 forcefully from gossip. Instead, a few seconds later, it believed that .80 became live again. I don't have much understanding of the Gossip protocol, but roughly know that it's probability-based; looks like we need an assertive/NOW membership control message for replace_token.
>>> thanks yang
>>>
>>> WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204
>>> INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: Starting to bootstrap...
>>> INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java (line 109) Compacting [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-3-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-4-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-2-Data.db')]
>>> INFO [CompactionExecutor:2] 2012-09-10 08:00:21,979 CompactionTask.java (line 221) Compacted to [/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-5-Data.db,]. 499 to 394 (~78% of original) bytes for 3 keys at 0.003997MB/s. Time: 94ms.
>>> INFO [Thread-4] 2012-09-10 08:00:22,070 StreamInSession.java (line 214) Finished streaming session 1 from /10.72.102.61
>>> INFO [main] 2012-09-10 08:00:22,073 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
>>> INFO [FlushWriter:2] 2012-09-10 08:00:22,074 Memtable.java (line 266) Writing Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
Re: Cassandra 1.1.1 on Java 7
So, my experiment didn't quite work out. I was hoping to use the G1 collector to minimize pauses -- the pauses didn't really go away, but what's worse is that I think the memtable memory calculations are driven by CMS, so my memtables would fill up and cause Cassandra to run out of heap :(

On 2012-09-09 19:04:41 +0000, Jeremy Hanna said:
> Starting with 1.6.0_34, you'll need xss set to 180k. It's updated with the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12). https://issues.apache.org/jira/browse/CASSANDRA-4631
> See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference to what required a higher stack.
>
> On Sep 9, 2012, at 12:47 PM, Christopher Keller <cnkel...@gmail.com> wrote:
>> This is necessary under the later versions of 1.6 (v35) as well. Nodetool will show the cluster as being down even though individual nodes will be up.
>> --Chris
>>
>> On Sep 9, 2012, at 7:13 AM, dong.yajun <dongt...@gmail.com> wrote:
>>> Running for a while; you should set the -Xss to more than 160k when you are using jdk1.7.
>>>
>>> On Sun, Sep 9, 2012 at 3:39 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>>>> Has anyone tried running 1.1.1 on Java 7?
>>>> Have been running jdk 1.7 on several clusters on 1.1 for a while now.
>>>> -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>>
>>> -- Ric Dong, Newegg Ecommerce, MIS department
>
> -- The downside of being better than everyone else is that people tend to assume you're pretentious.

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
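For reference, both tweaks discussed in this thread (the larger thread size and trying G1 instead of the default CMS) live in conf/cassandra-env.sh. A minimal sketch of the relevant lines follows; the 180k stack size comes from CASSANDRA-4631 as quoted above, while the G1 flag placement is an illustrative assumption, not the stock file's exact layout:

```shell
# conf/cassandra-env.sh (sketch; surrounding lines vary by Cassandra version)

# Per-thread stack size: JDK 1.6.0_34+ and JDK 7 need at least 180k
# (see CASSANDRA-4631), otherwise nodetool may report the cluster down.
JVM_OPTS="$JVM_OPTS -Xss180k"

# To experiment with G1, comment out the default CMS flags
# (-XX:+UseConcMarkSweepGC and friends) and add:
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
```

As Oleg's result above suggests, swapping the collector alone may not help: parts of 1.1's memory accounting assume CMS behaviour, so treat this as an experiment rather than a recommended configuration.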
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote:
>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>
> Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.

That's true. Unsafe in the sense that your data might not be in the right shape with respect to the order of keys in sstables and LCS's properties, and you might need to offline-scrub when you upgrade to the latest 1.1.x.

>> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.
>
> The 1.1.5 README does not mention this. Should it?

The fix was released in 1.1.3 (the LCS fix) and 1.1.4 (offline scrub), and I agree it would be helpful to have it in NEWS.txt.

Cheers,
Omid
Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS
Based on the steps outlined here https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13453156&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13453156 it seems that LCS was not used until after 1.1.4, and they were able to do a full repair / cleanup / compact cycle on 1.1.4 before running into problems. I don't see any major bugfixes for LCS in 1.1.5 either, so this appears to be a legitimate bug if the timeline is correct.

On Tue, Sep 11, 2012 at 2:50 PM, Omid Aladini <omidalad...@gmail.com> wrote:
> On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote:
>>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables and inter-level overlaps in CFs with Leveled Compaction. Your sstables generated with 1.1.3 and later should not have this issue [1] [2].
>>
>> Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.
>
> That's true. Unsafe in the sense that your data might not be in the right shape with respect to the order of keys in sstables and LCS's properties, and you might need to offline-scrub when you upgrade to the latest 1.1.x.
>
>>> In case you have old Leveled-compacted sstables (generated with 1.1.2 or earlier, including 1.0.x) you need to run an offline scrub using Cassandra 1.1.4 or later via the bin/sstablescrub command so it'll fix out-of-order sstables and inter-level overlaps caused by previous versions of LCS. You need to take nodes down in order to run the offline scrub.
>>
>> The 1.1.5 README does not mention this. Should it?
>
> The fix was released in 1.1.3 (the LCS fix) and 1.1.4 (offline scrub), and I agree it would be helpful to have it in NEWS.txt.
> Cheers, Omid
Re: replace_token code?
Replied inline (was in blue). Thanks, Yang

I thought the very first log line already acquired ownership, instead of later in the sequence?

WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204

On Tue, Sep 11, 2012 at 1:55 PM, aaron morton <aa...@thelastpickle.com> wrote:

> This looks correct…
>
> INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850) Node /10.72.201.80 is now part of the cluster
> INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816) InetAddress /10.72.201.80 is now UP
>
> .80 joined the ring because it was in the stored ring state.

This is where I was having a doubt: instead of being allowed to come out of the stored ring state, .80 should be immediately purged from ring membership right after the first log line, which purports to have acquired ownership. It's true that token ownership and ring membership are orthogonal things, but here an explicit taking-over-token operation immediately implies that the old one must be dead and should be kicked out of the ring. Granted that the later detection of duplicate ownership will kick the old node out, I guess it leaves room for uncertainty before the duplication is detected.

> INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line 1126) Nodes /10.72.201.80 and /10.190.221.204 have the same token 166594924822352415786406422619018814804. Ignoring /10.72.201.80
>
> New node took ownership.
>
> INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830) InetAddress /10.72.201.80 is now dead.
> INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644) FatClient /10.72.201.80 has been silent for 3ms, removing from gossip
>
> Old node marked as dead and the process to remove is started.
>
> Has the .80 node reappeared in the logs?

No.

> If it does, can you include the output from nodetool gossipinfo?
> Cheers
> - Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/09/2012, at 5:59 AM, Yang <tedd...@gmail.com> wrote:
>> Thanks Jim, looks like I'll have to read into the code to understand what is happening under the hood.
>> yang
>>
>> On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro <jcist...@netflix.com> wrote:
>>> We have seen various issues from these replaced nodes hanging around. For clusters where a lot of nodes have been replaced, we see these replaced nodes having an impact on heap/GC and a lot of tcp timeouts/retransmits (because the old nodes no longer exist). As a result, we have begun cleaning these up using unsafeAssassinateEndpoint via jmx. We have only started using it recently; so far no bad side effects. This also helps because those replaced nodes can appear as unreachable nodes wrt schema and sometimes prevent things like CF truncation. Using unsafeAssassinateEndpoint will clean these from unreachable nodes and will mark them as LEFT in gossip info. There is a ttl for them in gossipinfo and they should go away after 3 days. Once they are marked LEFT, you should stop seeing those up/same/dead messages.
>>> unsafeAssassinateEndpoint is unsafe in that, if you specify the IP of a real node in the cluster, that node will be assassinated. Otherwise, if you specify nodes that have been replaced, it is supposed to work correctly.
>>> Hope this helps, jc
>>>
>>> From: Yang <tedd...@gmail.com>
>>> Reply-To: user@cassandra.apache.org
>>> Date: Mon, 10 Sep 2012 01:10:56 -0700
>>> To: user@cassandra.apache.org
>>> Subject: replace_token code?
>>>
>>>> it looks like by specifying replace_token, the old owner is not removed from gossip (which I had thought it would do). Then it's understandable that the old owner would resurface later and we get some warning saying that the same token is owned by both. I ran an example with a 2-node cluster, with RF=2.
>>>> Host 10.72.201.80 was running for a while and had some data, then I shut it down, and booted up 10.190.221.204 with replace_token of the old token owned by the previous host. The following log sequence shows that the new host does acquire the token, but it does not at the same time remove .80 forcefully from gossip. Instead, a few seconds later, it believed that .80 became live again. I don't have much understanding of the Gossip protocol, but roughly know that it's probability-based; looks like we need an assertive/NOW membership control message for replace_token.
>>>> thanks yang
>>>>
>>>> WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204
>>>> INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: Starting to bootstrap...
>>>> INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java (line 109) Compacting [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'),
How to replace a dead *seed* node while keeping quorum
Hi all,

We just ran into an interesting and unexpected situation with restarting a downed node. If the downed node is a seed node, then neither of the "replace a dead node" procedures works (-Dcassandra.replace_token and taking initial_token-1). The ring remains split. The host is listed as a seed in the config for the other members of the ring. If we rename the host then it will rejoin the ring. In other words, if the host name is on the seeds list, then it appears that the rest of the ring refuses to bootstrap it.

This leads to a problem: if the node needs to be taken out of the seeds list on every working node, then that requires a restart of each node - which means that, for short periods, the ring is missing 2 nodes and a quorum read or write (RF=3) will fail.

Are there any useful tricks for restarting the node with the same hostname, or are we expected to rename the node?

Cheers,
Edward

--
Edward Sargisson
Senior Java Developer
Global Relay
edward.sargis...@globalrelay.net
Re: Number of columns per row for Composite Primary Key CQL 3.0
Hi Aaron,

Thanks for the suggestion, as always. :) I'll read your slides soon. What does MM stand for? Million?

Thanks,
Charlie

On Mon, Sep 10, 2012 at 6:37 PM, aaron morton <aa...@thelastpickle.com> wrote:
> In general wider rows take a bit longer to read; however, different access patterns have different performance. I did some tests here http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance and http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
> I would suggest 1MM cols is fine; if you get to 10MM cols per row you have probably gone too far. Remember the byte size of the row is also important; larger rows churn memory more and take longer to compact / repair.
>
> Hope that helps.
> - Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/09/2012, at 11:05 AM, Data Craftsman 木匠 <database.crafts...@gmail.com> wrote:
>> Hello experts. Should I limit the number of rows per Composite Primary Key's leading column? I think it falls into the same wide-row good practice for number of columns per row as CQL 2.0, e.g. 10M or less. Any comments will be appreciated.
>> -- Thanks, Charlie (@mujiang) 木匠
>> Data Architect Developer
>> http://mujiang.blogspot.com
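Aaron's point that row byte size matters as much as column count can be roughed out with a quick back-of-envelope calculation. The sketch below uses an assumed, illustrative per-column overhead figure; the real on-disk overhead depends on the Cassandra version and serialization details, so treat the numbers as order-of-magnitude only:

```python
# Rough wide-row size estimate; overhead_per_col is an assumption,
# not an exact Cassandra constant.
def estimate_row_bytes(n_cols, name_len, value_len, overhead_per_col=23):
    """Approximate serialized row size for n_cols columns."""
    return n_cols * (name_len + value_len + overhead_per_col)

# 1MM (= 1 million) columns with 8-byte names and 8-byte values:
print(estimate_row_bytes(1_000_000, 8, 8))  # ~39 MB of row data
```

At 10MM columns the same row would be roughly 390 MB, which illustrates why such rows churn memory and slow compaction/repair.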
how to enter float value from cassandra-cli ?
Hi all,

I'm trying to manually add some double values into a column family. From the Hector client there's a DoubleSerializer, but it looks like the cli tool does not provide a way to enter floating point values. Here's the message I got:

[default@video] set cateogry['1']['sport'] = float('0.5');
Function 'float' not found. Available functions: bytes, integer, long, int, lexicaluuid, timeuuid, utf8, ascii, countercolumn.

Is there a way to insert a floating point value from the cli tool?

Thank you,
Yuhan
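One possible workaround (an assumption on my part, not something the cli's error message suggests) is to use the cli's bytes() function with the hex of the big-endian IEEE-754 encoding of the double, which is what a Java-side DoubleSerializer would write. The hex string can be produced like this:

```python
# Encode a double as the hex of its big-endian IEEE-754 representation,
# suitable for pasting into cassandra-cli's bytes() function.
import struct

def double_to_hex(value):
    """Big-endian 8-byte double, as a hex string."""
    return struct.pack(">d", value).hex()

print(double_to_hex(0.5))  # → 3fe0000000000000
```

Then, in cassandra-cli (hypothetical usage, assuming bytes() accepts a hex string):

  set category['1']['sport'] = bytes('3fe0000000000000');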
Re: [RELEASE] Apache Cassandra 1.1.5 released
Hi André, That looks like something I've run into as well on previous versions of Cassandra. Our workaround was to not drop a keyspace and then re-use it (which we were doing as part of a test suite). This is a related Stack Overflow post: http://stackoverflow.com/questions/11623356/cassandra-server-throws-java-lang-assertionerror-decoratedkey-decorated Jason

On Mon, Sep 10, 2012 at 11:29 PM, André Cruz andre.c...@co.sapo.pt wrote: I'm also having AssertionErrors.

ERROR [ReadStage:51687] 2012-09-10 14:33:54,211 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51687,5,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:64)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
    at org.apache.cassandra.db.Table.getRow(Table.java:378)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:54)
    ... 14 more

ERROR [ReadStage:51801] 2012-09-10 14:44:38,852 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51801,5,main]
java.lang.AssertionError: DecoratedKey(12064825934064381804725403203980154559,
Re: nodetool connection refused
Problem solved. I hadn't added the JMX host and port to the VM arguments in Eclipse. How come this is not covered in the wiki http://wiki.apache.org/cassandra/RunningCassandraInEclipse ? Or is it outdated?

On Mon, Sep 10, 2012 at 10:11 AM, Manu Zhang owenzhang1...@gmail.com wrote: It looks more like an Eclipse issue now, since I find a 0.0.0.0:7199 listener when executing bin/cassandra in a terminal, but none when running Cassandra in Eclipse.

On Sun, Sep 9, 2012 at 12:56 PM, Manu Zhang owenzhang1...@gmail.com wrote: No, I don't find a listener on port 7199. Where do I set it up? I've been experimenting on my laptop, so both of them are local.

On Sun, Sep 9, 2012 at 1:28 AM, Senthilvel Rangaswamy senthil...@gmail.com wrote: What is the address of the thrift listener? Did you put 0.0.0.0:7199?

On Fri, Sep 7, 2012 at 11:53 PM, Manu Zhang owenzhang1...@gmail.com wrote: When I run Cassandra-trunk in Eclipse, nodetool fails to connect with the following error: Failed to connect to '127.0.0.1:7199': Connection refused. But if I run it in a terminal, all is fine. -- ..Senthil If there's anything more important than my ego around, I want it caught and shot now. - Douglas Adams.
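For anyone hitting the same thing: the JMX listener on 7199 is normally set up by conf/cassandra-env.sh, which an Eclipse launch bypasses. Adding the standard JMX system properties to the run configuration's VM arguments restores it. This is a sketch with the usual default values, not the poster's exact arguments:

```
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
```

With these in place, nodetool can connect to 127.0.0.1:7199 as it does when Cassandra is started from bin/cassandra.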
Re: Astyanax InstantiationException when accessing ColumnList
Oops, forgot to mention the Cassandra version - 1.1.4 On Tue, Sep 11, 2012 at 5:54 AM, Ran User ranuse...@gmail.com wrote: Stuck for hours on this one, thanks in advance! - Scala 2.9.2 - Astyanax 1.0.6 (also tried 1.0.5) - Using CompositeRowKey, CompositeColumnName - No problem inserting into Cassandra - Can read a row, and ColumnList.size() returns the correct count, however any attempt to access the ColumnList (i.e. iterating, getColumnByIndex(), getColumnByName(), etc.) throws the following exception:

Exception: java.lang.RuntimeException: java.lang.InstantiationException

Relevant stack trace:

java.lang.RuntimeException: java.lang.InstantiationException: shops.integration.db.scalaquery.ReportingDao$MetricsLogFileCompositeColumn
    at com.netflix.astyanax.serializers.AnnotatedCompositeSerializer.fromByteBuffer(AnnotatedCompositeSerializer.java:136)
    at com.netflix.astyanax.serializers.AbstractSerializer.fromBytes(AbstractSerializer.java:40)
    at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.constructMap(ThriftColumnOrSuperColumnListImpl.java:201)
    at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumn(ThriftColumnOrSuperColumnListImpl.java:189)
    at com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumnByName(ThriftColumnOrSuperColumnListImpl.java:103)

Relevant sample code:

class TestCompositeColumn(@(Component @field) var logFileId: Long,
                          @(Component @field) var dt: String,
                          @(Component @field) var dk: String) extends Ordered[TestCompositeColumn] {
  def this() = this(0l, "", "")
  // equals, hashCode, compare all implemented
}

I've also tried this variation on the class:

class TestCompositeColumn(idIn: Long, key1In: String, key2In: String) extends Ordered[TestCompositeColumn] {
  @Component(ordinal = 0) var id: Long = idIn
  @Component(ordinal = 1) var key1: String = key1In
  @Component(ordinal = 2) var key2: String = key2In
  def this() = this(0, null, null)
  // equals, hashCode, compare all implemented
}
val TEST_COLUMN_FAMILY = new ColumnFamily[TestRowKey, TestCompositeColumn](
  "test_column_family",
  new AnnotatedCompositeSerializer[TestRowKey](classOf[TestRowKey]),
  new AnnotatedCompositeSerializer[TestCompositeColumn](classOf[TestCompositeColumn]),
  BytesArraySerializer.get())

var columnList = keyspace.prepareQuery(TEST_COLUMN_FAMILY)
  .getKey(TestRowKey(1l, 2012090100))
  .execute().getResult()

// OK - will return 6 for example, also verified via cassandra-cli
println(columnList.size())

// ERROR - will throw the exception above. Iterating, or any other type of access, also throws.
println(columnList.getColumnByIndex(0).getStringValue())

Thank you!!!
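One possible cause worth checking (a guess from the trace, not a confirmed diagnosis): AnnotatedCompositeSerializer.fromByteBuffer creates the column class reflectively, which requires a no-arg constructor callable via Class.newInstance(). The name ReportingDao$MetricsLogFileCompositeColumn in the trace suggests the Scala class is nested inside another class, and a non-static inner class has no true nullary constructor (its constructor implicitly takes the enclosing instance), so newInstance() fails with exactly this InstantiationException; moving the class to the top level usually fixes it. A small Java sketch of the reflection behavior:

```java
public class NewInstanceDemo {
    // Non-static inner class: its constructor implicitly needs an enclosing
    // instance, so there is no nullary constructor for reflection to call.
    class Inner { }

    // Static nested (or top-level) class with a public no-arg constructor.
    static class Nested {
        public Nested() { }
    }

    // Returns true if clazz can be created the way a reflective
    // deserializer typically creates column-name objects.
    static boolean canInstantiateReflectively(Class<?> clazz) {
        try {
            clazz.newInstance();
            return true;
        } catch (InstantiationException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiateReflectively(Nested.class)); // true
        System.out.println(canInstantiateReflectively(Inner.class));  // false
    }
}
```

If the composite column class is already top-level, this theory doesn't apply and the @Component annotations are the next thing to inspect.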