record not available on creating system when other system (node/seed) is shut down
Hi, I installed Cassandra 0.8.4 on two systems, configured as below:

System 1: IP 10.1.1.1, acting as seed - seeds: 10.1.1.1, listen_address: 10.1.1.1, rpc_address: 10.1.1.1
System 2: IP 10.1.1.2, acting as node - seeds: 10.1.1.1, listen_address: 10.1.1.2, rpc_address: 10.1.1.2

I followed these steps:
1. Started System 1 (seed) and created a keyspace called aspace.
2. Started System 2 (node) and used the keyspace aspace successfully.
3. On System 1 (seed), created a column family and inserted a record.
4. On System 2 (node), successfully read the record created on the seed.
5. On System 1 (seed), successfully read the record locally.
6. Shut down Cassandra on System 2 (node).
7. With System 2 (node) down, tried to read the record on System 1 (seed), where the record was created, but the read returned null.
8. Brought System 2 (node) back up, then read the record on System 1 (seed) again successfully.

My understanding of Cassandra is that, regardless of which system is up or down, the record should still be available on the other systems (node/seed). So when we stop the node, why are records created on the seed not available on the seed itself, and vice versa? Did I miss any configuration? Thanks in advance for your help.

Observation: the strange part is that the record is not visible on the system where it was created while the other system (node) is down, yet the same record is accessible when the creating system (seed) is down; I tested the reverse situation and it behaves the same way.

Regards, P. Rajashekar Reddy
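A frequent cause of exactly this symptom is a keyspace created with replication_factor=1 (the keyspace definition isn't shown above, so this is an assumption): each row then lives on exactly one of the two nodes, and a read for a row whose sole replica is on the downed node returns nothing, regardless of which node you connect to. A toy Python sketch of that replica-placement logic (illustrative only, not Cassandra code):

```python
# Toy model of Cassandra replica placement (illustration only, not Cassandra code).
# With replication_factor=1 each key has exactly one replica, so any key whose
# replica node is down cannot be read, even from the node you connect to.
from bisect import bisect_right

def replicas(key_token, ring_tokens, rf):
    """Return the rf nodes (by token) responsible for key_token on the ring."""
    ring = sorted(ring_tokens)
    start = bisect_right(ring, key_token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]

def read(key_token, ring_tokens, live_tokens, rf):
    """Simulate a CL.ONE read: it succeeds if any replica is live."""
    return any(node in live_tokens for node in replicas(key_token, ring_tokens, rf))

ring = [0, 2**126]      # two nodes: "seed" at token 0, "node" at token 2**126
key = 2**125            # a key that hashes into the second node's range

assert read(key, ring, live_tokens={0, 2**126}, rf=1)   # both nodes up: read OK
assert not read(key, ring, live_tokens={0}, rf=1)       # node down: read returns nothing
assert read(key, ring, live_tokens={0}, rf=2)           # with rf=2 the seed also holds it
```

With replication_factor=2 both nodes hold every row, and either node can serve all reads while the other is down.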
memtable flush thresholds
Hi, I've been watching memtable flushes (Cassandra 0.8.4) and it seems to me they happen sooner than the thresholds are reached. Here are the thresholds (the defaults, calculated for a heap size of -Xmx1980M):

ColumnFamily: idx_graphable (Super)
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  *Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)*
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true

In the logs it seems to me none of the thresholds is reached (the minutes threshold definitely is not):

9-08 20:12:30,136 MeteredFlusher.java (line 62) flushing high-traffic column family ColumnFamilyStore(table='graph', columnFamily='idx_graphable')
INFO [NonPeriodicTasks:1] 2011-09-08 20:12:30,144 ColumnFamilyStore.java (line 1036) Enqueuing flush of Memtable-idx_graphable@915643571*(4671498/96780112 serialized/live bytes, 59891 ops)*
INFO [FlushWriter:111] 2011-09-08 20:12:30,145 Memtable.java (line 237) Writing Memtable-idx_graphable@915643571(4671498/96780112 serialized/live bytes, 59891 ops)
INFO [FlushWriter:111] 2011-09-08 20:12:30,348 Memtable.java (line 254) Completed flushing [...]/cassandra/data/graph/idx_graphable-g-23-Data.db (4673905 bytes)

Could someone clarify this for me? Does "high-traffic column family" have a special meaning? Many thanks, Sorin
Index search in provided list of rows (list of rowKeys).
Hi, We have a search problem with Cassandra and we are using Sphinx for indexing. Because of Sphinx's architecture we can't use range queries over all the fields we need, so we have to run a Sphinx query first to get a list of rowKeys, then perform additional range filtering over column values. The first simple solution is to do it on the client side, but that increases network traffic and memory usage on the client. Now I'm wondering whether it is possible to perform such filtering on the Cassandra side. I'd like to use an IndexExpression for range filtering within a list of records (the list of rowKeys returned from the external search engine). Looking at get_indexed_slices, I found that there is no way to set a list of rowKeys in IndexClause (as there is for multiget_slice); only start_key. So, two questions: 1) Am I missing something, and is my idea possible via some other API? 2) If not, can I file a JIRA for this feature? Evgeny.
Re: memtable flush thresholds
see memtable_total_space_in_mb at http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ On Mon, Sep 12, 2011 at 6:55 AM, Sorin Julean sorin.jul...@gmail.com wrote: Hi, I've checked the memtable flush (cassandra 0.8.4) and it seems to me it happens sooner than the threshold is reached. [...] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
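For context, memtable_total_space_in_mb (new in 0.8) is a global cap in cassandra.yaml on memtable memory across all column families; when total usage approaches it, the MeteredFlusher flushes the largest ("high-traffic") memtables even though no per-CF threshold has been reached — which matches the log lines above. A sketch of the setting (the value shown is illustrative, not a recommendation):

```yaml
# cassandra.yaml (0.8+): global cap on memtable memory across all column families.
# If left blank/commented out, it defaults to one third of the heap.
memtable_total_space_in_mb: 660   # illustrative value for a ~2 GB heap
```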
Re: memtable flush thresholds
Thanks Jonathan! memtable_total_space_in_mb is indeed the threshold that is being reached. Kind regards, Sorin On Mon, Sep 12, 2011 at 3:16 PM, Jonathan Ellis jbel...@gmail.com wrote: see memtable_total_space_in_mb at http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ [...]
Replace Live Node
Version=0.7.8. I have a 3-node cluster with RF=3; how would I move data from a live node to a replacement node? I tried an autobootstrap + decommission, but I got this error on the live node: Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2) And I got this error on the new node: Bootstraping to existing token 113427455640312821154458202477256070484 is not allowed (decommission/removetoken the old node first). Do I really need to do the manual token-minus-1 selection for this? Thanks
Re: Not all data structures need timestamps (and don't require wasted memory).
On Sat, Sep 3, 2011 at 8:26 PM, Kevin Burton bur...@spinn3r.com wrote: The point is that replication in Cassandra only needs timestamps to handle out of order writes … for values that are idempotent, this isn't necessary. The order doesn't matter. I believe this is a misunderstanding of how idempotency applies to Cassandra replication. If there were no timestamps stored, how would read repair work? There would be two different values with no way to tell which was written second.
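To make the read-repair point concrete, here is a minimal sketch (illustrative Python, not Cassandra's actual code) of the last-write-wins reconciliation that timestamps enable:

```python
# Last-write-wins reconciliation, as read repair uses it (illustration only).
# Each replica returns a (value, timestamp) pair; the highest timestamp wins.

def reconcile(*versions):
    """Pick the winning (value, timestamp) pair among replica responses."""
    # Tie-break on value so the result is deterministic across replicas.
    return max(versions, key=lambda vt: (vt[1], vt[0]))

replica_a = ("old-address", 1000)
replica_b = ("new-address", 2000)   # written later
assert reconcile(replica_a, replica_b) == ("new-address", 2000)

# With no timestamps there is nothing to order by; only the arbitrary
# value tie-break remains, so "which was written second" is unanswerable.
assert reconcile(("x", 0), ("y", 0)) == ("y", 0)
```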
Re: Replace Live Node
Yeah - I would bootstrap at an initial_token of the current token minus 1. Then, once that has bootstrapped, decommission the old one. Avoid using removetoken on anything before 0.8.3; use decommission if you can when you're dealing with a live node. On Sep 12, 2011, at 10:42 AM, Kyle Gibson wrote: Version=0.7.8 I have a 3 node cluster with RF=3, how would I move data from a live node to a replacement node? [...]
Re: what's the difference between repair CF separately and repair the entire node?
I am using 0.7.4, so is it always okay to do routine repair on a per-column-family basis? thanks! It's okay but won't do what you want; due to a bug you'll see streaming of data for column families other than the one you're trying to repair. This will be fixed in 1.0. -- / Peter Schuller (@scode on twitter)
Re: Not all data structures need timestamps (and don't require wasted memory).
After writing my message, I recognized a scenario you might be referring to, Kevin. If I understand correctly, you're not referring to set membership in the general sense, where one could add and remove entries. General set membership, in the context of eventual consistency, requires timestamps: the timestamps distinguish between the two values, present and not-present (not-present being represented by timestamped tombstones in the case of deletion/removal). So I suppose you're referring to additive-only set membership, where there is no need to distinguish between two different states (such as present or not present in a set), because items can only be added, never changed or removed. If entries are not allowed to be deleted or modified, then Cassandra-style eventually consistent replication could work without any timestamp, because you're simply replicating the existence of keys to all replicas. To me this seems a particularly narrow use case. Any inadvertent write (even one from a bug or data corruption) would require very frustrating manual intervention to remove: you'd have to manually shut down all nodes, purge the bad values out of the dataset, then bring the nodes back online. I'm not a Cassandra developer, but this seems like a path which is very specialized and not very in line with Cassandra's design. You might have better luck with a distributed store that is not based on timestamped eventual consistency. I don't know if you can explicitly turn off timestamps in HBase, but AFAIK the client is allowed to supply them, so you could just supply zero and they should compress out quite well.
AntiEntropyService.getNeighbors pulls information from where?
This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is its neighbor ... and that someone moved away a long time ago:

/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.

This appears only in the logs for the one node that is generating the issue, 172.16.12.10. Where can I find out where AntiEntropyService.getNeighbors(tablename, range) pulls its information from?
On the two nodes that work: [default@system] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] [default@system] From the two nodes that don't work: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10 [default@unknown] Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com
Re: Replace Live Node
What could you do if the initial_token is 0? On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Yeah - I would bootstrap at initial_token of -1 the current one. Then once that has bootstrapped, then decommission the old one. [...]
Re: Replace Live Node
The ring wraps around, so the value before 0 is the max possible token. I believe that it is 2**127 - 1. ----- Original Message ----- From: Kyle Gibson kyle.gib...@frozenonline.com To: user@cassandra.apache.org Sent: Monday, September 12, 2011 3:30:20 PM Subject: Re: Replace Live Node What could you do if the initial_token is 0? [...]
Re: Replace Live Node
I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727 On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote: What could you do if the initial_token is 0? [...]
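The arithmetic is easy to check: RandomPartitioner tokens live in [0, 2^127), and since the ring wraps, the token immediately before 0 is the maximum one:

```python
# RandomPartitioner token space is 0 .. 2**127 - 1; the ring wraps around,
# so the maximum token is the one directly "before" token 0.
max_token = 2**127 - 1
assert max_token == 170141183460469231731687303715884105727
```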
cleanup / move
While it would certainly be preferable not to run a cleanup and a move at the same time on the same node, is there a technical problem with running a nodetool move on a node while a cleanup is running? Or is it possible to gracefully kill a cleanup, so that a move can be run and the cleanup re-run afterwards? We have a node that is almost full and we need to move it so that we can shift its load, but it already has a cleanup process running which, instead of reducing data usage as expected, is actually growing the amount of space taken at a pretty fast rate. -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 *A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
Re: Replace Live Node
So to move data from the node with token 0, the new node needs to have initial_token set to 170141183460469231731687303715884105727? Another idea: could I move the token to 1, and then use token 0 on the new node? On Mon, Sep 12, 2011 at 3:38 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727 [...]
Re: Replace Live Node
> So to move data from node with token 0, the new node needs to have initial token set to 170141183460469231731687303715884105727 ?

I would go this route.

> Another idea: could I move token to 1, and then use token 0 on the new node?

nodetool move prior to 0.8 is a very heavy operation.
balancing issue with Random partitioner
We are running the DataStax 0.8 rpm distro. We have a situation where we have 4 nodes and each owns 25% of the keys, but the last node in the ring does not seem to be getting much load at all. We are using the random partitioner and we have a total of about 20k keys that are sequential. Our nodetool ring output is currently:

Address         DC          Rack   Status  State   Load       Owns    Token
                                                                      127605887595351923798765477786913079296
10.181.138.167  datacenter1 rack1  Up      Normal  99.37 GB   25.00%  0
192.168.100.6   datacenter1 rack1  Up      Normal  106.25 GB  25.00%  42535295865117307932921825928971026432
10.181.137.37   datacenter1 rack1  Up      Normal  77.7 GB    25.00%  85070591730234615865843651857942052863
192.168.100.5   datacenter1 rack1  Up      Normal  494.67 KB  25.00%  127605887595351923798765477786913079296

Nothing is running in netstats on .37 or .5. I understand that the nature of the beast would cause the load to differ between nodes, but I wouldn't expect it to be so drastic. We had the token for .37 set to 85070591730234615865843651857942052864, and I decremented and moved it to try to kickstart some streaming, on the thought that something may have failed, but that didn't yield any appreciable results. Are we seeing completely abnormal behavior? Should I consider making the token for the fourth node considerably smaller? We calculated the nodes' tokens using the standard python script. -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 *A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
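For comparison, the "standard python script" mentioned above just spaces RandomPartitioner tokens evenly across the 2^127 token space; a minimal sketch that reproduces the tokens in the ring output (the third node's token having since been decremented by 1, per the message):

```python
# Evenly spaced RandomPartitioner tokens for an N-node cluster:
# token_i = i * 2**127 / N for i in 0..N-1.
def balanced_tokens(n):
    return [i * (2**127) // n for i in range(n)]

tokens = balanced_tokens(4)
assert tokens == [
    0,
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
]
```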
Re: Index search in provided list of rows (list of rowKeys).
Just checking, you want an API call like this? multiget_filtered_slice(keys, column_parent, predicate, filter_clause, consistency_level) where filter_clause is an IndexClause. It's a bit messy. Is there no way to express this as a single get_indexed_slices() call? With an == index expression to get the row keys and the other expressions to do the range filtering? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 1:55 AM, Evgeniy Ryabitskiy wrote: Hi, We have an issue to search over Cassandra and we are using Sphinx for indexing. [...]
Re: balancing issue with Random partitioner
Looks kind of like the 4th node was added to the cluster w/o bootstrapping. On Mon, Sep 12, 2011 at 3:59 PM, David McNelis dmcne...@agentisenergy.com wrote: We are running the datastax .8 rpm distro. We have a situation where we have 4 nodes and each owns 25% of the keys. [...] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: balancing issue with Random partitioner
Auto-bootstrapping is turned on and the node was started several hours ago. Since the node already shows up as part of the ring, I would imagine that nodetool join wouldn't do anything. Is there a command to jumpstart bootstrapping? On Mon, Sep 12, 2011 at 4:22 PM, Jonathan Ellis jbel...@gmail.com wrote: Looks kind of like the 4th node was added to the cluster w/o bootstrapping. [...] -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 *A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
Re: what's the difference between repair CF separately and repair the entire node?
On Mon, Sep 12, 2011 at 1:44 PM, Peter Schuller peter.schul...@infidyne.com wrote: I am using 0.7.4. so it is always okay to do the routine repair on Column Family basis? thanks! It's okay but won't do what you want; due to a bug you'll see streaming of data for other column families than the one you're trying to repair. This will be fixed in 1.0. I think we might be running into this. Is CASSANDRA-2280 the issue you're referring to? Jim
Re: Index search in provided list of rows (list of rowKeys).
Something like this. Actually I think it's better to extend the get_indexed_slices() API instead of creating a new Thrift method. I wish to have something like this:

//here we run a query against the external search engine
List<byte[]> keys = performSphinxQuery(someFullTextSearchQuery);
IndexClause indexClause = new IndexClause();
//required API to set list of keys
indexClause.setKeys(keys);
indexClause.setExpressions(someFilteringExpressions);
List finalResult = get_indexed_slices(colParent, indexClause, colPredicate, cLevel);

I can't solve my issue with a single get_indexed_slices() call. Here is the issue in more detail:
1) We have ~6 million records; in future there could be many more.
2) We have 10k different properties (stored as column values in Cassandra); in future there could be many more.
3) The properties are text descriptions, int/float values, and string values.
4) We need to implement search over all properties: full-text search for text descriptions, range search for int/float properties.
5) A search query could use any combination of property expressions, e.g. a full-text search over descriptions plus a range expression on an int/float field.
6) We have an external search engine (Sphinx) that indexes all string and text properties.
7) We still need to perform range search over int and float fields.

So I split my query expressions into 2 groups: 1) expressions that can be handled by the search engine, and 2) the others (additional filters). For example, I run the first query against Sphinx and get a list of rowKeys with a length of 100k (call it RESULT1). Now I need to filter it by the second group of expressions; say I have a simple expression like age > 25. If I ran get_indexed_slices() with this query alone, I could get half of my records in the result (call it RESULT2). Then I would need to compute the intersection between RESULT1 and RESULT2 on the client side, which could take a lot of time and memory. That is why I can't use a single get_indexed_slices() here. For me it is better to iterate over RESULT1 (100k records) on the client side, filter by age, and get 10-50k records as the final result. The disadvantage is that I have to fetch all 100k records. Evgeny.
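Until such an API exists, the client-side fallback described above can be sketched like this (illustrative Python; fetch_rows stands in for a batched multiget_slice-style call and is a hypothetical name). The point is that only the Sphinx-matched rows are fetched, then filtered locally:

```python
# Client-side fallback: fetch only the Sphinx-matched rows, filter locally.
# fetch_rows() stands in for a batched multiget_slice call (hypothetical name).

def filter_by_range(keys, fetch_rows, column, lo, hi, batch=1000):
    """Yield row keys whose `column` value falls in [lo, hi)."""
    for i in range(0, len(keys), batch):
        rows = fetch_rows(keys[i:i + batch])   # {row_key: {column: value}}
        for key, cols in rows.items():
            if lo <= cols.get(column, lo - 1) < hi:   # missing column: excluded
                yield key

# Toy stand-in for the Cassandra call:
data = {b"k1": {"age": 30}, b"k2": {"age": 20}, b"k3": {"age": 40}}
fetch = lambda ks: {k: data[k] for k in ks}

sphinx_keys = [b"k1", b"k2", b"k3"]   # RESULT1: keys from the full-text query
assert list(filter_by_range(sphinx_keys, fetch, "age", 25, 100)) == [b"k1", b"k3"]
```

Batching keeps memory bounded on the client, but every matched row is still transferred, which is exactly the network cost a server-side keys-plus-IndexClause API would avoid.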
Cassandra performance on a virtual network....
Hello everyone, I wanted to tell you about some performance benchmarking we have done with Cassandra running in EC2 on a virtual network. The purpose of the experiment was to see how running Cassandra on a virtual network could simplify operational complexity, and to determine the performance impact relative to native interfaces. The summary results for running a 4-node cluster (columns are value sizes in bytes):

Cassandra Performance on vCider Virtual Network

Replication Factor 1        32      64      128     192     256 byte cols.
v. Unencrypted:           -8.2%    0.8%   -2.3%   -2.3%   -6.7%
v. Encrypted:             63.8%   55.4%   60.0%   53.9%   61.7%
v. Node Only Encryption:  -0.7%   -5.0%    1.9%    5.4%    4.7%

Replication Factor 3        32      64      128     192     256 byte cols.
v. Unencrypted:           -4.5%   -4.7%   -5.8%   -4.5%   -1.5%
v. Encrypted:             31.5%   29.6%   31.4%   27.3%   29.9%
v. Node Only Encryption:   3.8%    3.9%    6.1%    8.3%    4.0%

There is tremendous EC2 performance variability, and our experiments tried to adjust for that by running 10 trials for each column size and averaging them. Averaged across all column widths, the performance was:

Replication Factor 1
v. Unencrypted: -3.7%
v. Encrypted: +59%
v. Node Only Encryption: +1.3%

Replication Factor 3
v. Unencrypted: -4.2%
v. Encrypted: +30%
v. Node Only Encryption: +5.2%

As you might expect, performance while running on the virtual network was slower than on the native interfaces. However, when you encrypt communications (both node and client), the virtual network was faster by nearly 60% (30% with RF=3). Since this measurement is primarily an indication of client encryption performance, we also measured the somewhat unrealistic configuration where only node communications were encrypted; here the virtual network performed better as well. The overall performance loss of -3.7% vs. -4.2% for unencrypted RF=1 vs. RF=3 is understandable, since RF=3 is more network-intensive than RF=1.
However, since the virtual network performs encryption in the kernel (which seems to be faster than what Cassandra can do natively), the performance gains with encryption turned on are greater with RF 3, since more data needs to be encrypted. We ran the tests using the Cassandra stress test tool across a range of column widths, replication strategies, and consistency levels (One, Quorum). We used OpenVPN for client encryption. The complete test results are attached. I'm going to write up a more complete analysis of these results, but wanted to share them with you to see if there was anything obvious that we overlooked. We are currently running experiments against clusters running in multiple EC2 regions. We expect similar performance characteristics across regions, but with the added benefit of not needing to fuss with the EC2 snitch. The virtual network lets you assign your own private IPs for all Cassandra interfaces, so the standard snitch can be used everywhere. If you're running Cassandra in EC2 (or any other public cloud) and want encrypted communications, running on a virtual network is a clear winner: not only is it 30-60% faster, but you don't have to bother with the point-to-point configuration of setting up a third-party encryption technique. Since those run in user space, it's not surprising that dramatic performance gains can be achieved with the kernel-based approach of the virtual network. When we're done we'll put everything in a public repo that includes all Puppet configuration modules as well as a collection of scripts that automate nearly all of the testing. I hope to have that in the next week or so, but wanted to get some of these single-region results out there in advance. If you are interested, you can learn more about the vCider virtual network at www.vcider.com. Let me know if you have any questions. CM vCider.Cassandra.benchmarks.pdf Description: Adobe PDF document
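As a sanity check, the summary averages quoted above follow directly from the per-column-width rows; a minimal sketch recomputing a few of them (numbers taken verbatim from the tables in the message):

```python
# Recompute the averaged performance deltas from the per-column-width
# rows above (column widths: 32/64/128/192/256 bytes).
rf1_unencrypted = [-8.2, 0.8, -2.3, -2.3, -6.7]
rf1_encrypted = [63.8, 55.4, 60.0, 53.9, 61.7]
rf3_encrypted = [31.5, 29.6, 31.4, 27.3, 29.9]

def avg(values):
    return sum(values) / len(values)

print(round(avg(rf1_unencrypted), 1))  # -3.7, matching the RF 1 summary
print(round(avg(rf1_encrypted)))       # 59, matching "+59%"
print(round(avg(rf3_encrypted)))       # 30, matching "+30%"
```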
Re: AntiEntropyService.getNeighbors pulls information from where?
I'm pretty sure I'm behind on how to deal with this problem. Best I know is to start the node with -Dcassandra.load_ring_state=false as a JVM option. But if the ghost IP address is in gossip it will not work, and it should be in gossip. Does the ghost IP show up in nodetool ring? Anyone know a way to remove a ghost IP from gossip that does not have a token associated with it? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 6:39 AM, Sasha Dolgy wrote: This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago.

/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.

This appears only in the logs for the one node that is generating the issue, 172.16.12.10. Where do I find where AntiEntropyService.getNeighbors(tablename, range) is pulling its information from?
On the two nodes that work:

[default@system] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system]

From the two nodes that don't work:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
        UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10
[default@unknown]

Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have, AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here, btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com
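The JVM option Aaron mentions can be set via the environment script; a minimal sketch, assuming a standard tarball layout (remove the line again after the node has restarted and rebuilt its view of the ring):

```shell
# conf/cassandra-env.sh: skip loading the locally saved ring state on
# the next startup, so the node relearns the ring from the seeds via
# gossip instead of from its stale on-disk copy.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```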
Re: cleanup / move
is there a technical problem with running a nodetool move on a node while a cleanup is running? Cleanup is removing data that the node is no longer responsible for, while move first removes *all* data from the node and then streams new data to it. I'd put that in the crossing-the-streams category (http://www.youtube.com/watch?v=jyaLZHiJJnE), i.e. best avoided. To kill the cleanup, kill the node. Operations such as these create new data and then delete old data; they do not mutate existing data. Cleanup will write new SSTables and then mark the old ones as compacted. When the old SSTables are marked as compacted you will see a zero-length .Compacted file. Cassandra will delete the compacted data files when it needs to. If you want the deletion to happen sooner rather than later, force a Java GC through JConsole. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 7:41 AM, David McNelis wrote: While it would certainly be preferable not to run a cleanup and a move at the same time on the same node, is there a technical problem with running a nodetool move on a node while a cleanup is running? Or is it possible to gracefully kill a cleanup, so that a move can be run and then cleanup run after? We have a node that is almost full and need to move it so that we can shift its load, but it already has a cleanup process running which, instead of causing less data usage as expected, is actually growing the amount of space taken at a pretty fast rate. -- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
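A sketch of the sequencing described above; the host and `<new_token>` are placeholders, and each command should be allowed to finish before the next is started:

```shell
# Never overlap cleanup and move on the same node: run them one at a
# time and wait for each to complete before starting the next.
nodetool -h localhost cleanup            # or kill the node to abort it
nodetool -h localhost move <new_token>   # relocate the node's token
nodetool -h localhost cleanup            # reclaim space afterwards
```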
Re: balancing issue with Random partitioner
Try a repair on 100.5; it will then request the data from the existing nodes. You will then need to run cleanup on the existing three nodes once the repair has completed. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 9:32 AM, David McNelis wrote: Auto-bootstrapping is turned on and the node had been started several hours ago. Since the node already shows up as part of the ring, I would imagine that nodetool join wouldn't do anything. Is there a command to jumpstart bootstrapping? On Mon, Sep 12, 2011 at 4:22 PM, Jonathan Ellis jbel...@gmail.com wrote: Looks kind of like the 4th node was added to the cluster w/o bootstrapping. On Mon, Sep 12, 2011 at 3:59 PM, David McNelis dmcne...@agentisenergy.com wrote: We are running the DataStax .8 rpm distro. We have a situation where we have 4 nodes and each owns 25% of the keys. However, the last node in the ring does not seem to be getting much of a load at all. We are using the random partitioner, and we have a total of about 20k keys that are sequential... Our nodetool ring output is currently:

Address          DC           Rack   Status  State   Load       Owns     Token
                                                                         127605887595351923798765477786913079296
10.181.138.167   datacenter1  rack1  Up      Normal  99.37 GB   25.00%   0
192.168.100.6    datacenter1  rack1  Up      Normal  106.25 GB  25.00%   42535295865117307932921825928971026432
10.181.137.37    datacenter1  rack1  Up      Normal  77.7 GB    25.00%   85070591730234615865843651857942052863
192.168.100.5    datacenter1  rack1  Up      Normal  494.67 KB  25.00%   127605887595351923798765477786913079296

Nothing is running in netstats on .37 or .5. I understand that the nature of the beast would cause the load to differ between the nodes... but I wouldn't expect it to be so drastic. We had the token for .37 set to 85070591730234615865843651857942052864, and I decremented and moved it to try to kickstart some streaming, on the thought that something may have failed, but that didn't yield any appreciable results.
Are we seeing completely abnormal behavior? Should I consider making the token for the fourth node considerably smaller? We calculated the nodes' tokens using the standard Python script. -- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
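For reference, the evenly spaced RandomPartitioner tokens in the ring output above can be recomputed; a minimal sketch of the usual calculation (the RandomPartitioner token space is [0, 2**127)):

```python
# Evenly spaced initial tokens for RandomPartitioner:
# node i of N gets i * 2**127 // N.
def initial_tokens(node_count):
    return [i * 2**127 // node_count for i in range(node_count)]

for token in initial_tokens(4):
    print(token)
```

Note that the third computed token ends in ...864, while the ring output above shows ...863 for 10.181.137.37; that matches David's comment that he decremented and moved that token by one.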
Re: AntiEntropyService.getNeighbors pulls information from where?
use system; del LocationInfo[52696e67]; I ran this on the nodes that had the problems, stopped and started the nodes, and it re-did its job. Job done. All fixed, with a new bug! https://issues.apache.org/jira/browse/CASSANDRA-3186 On Tue, Sep 13, 2011 at 2:09 AM, aaron morton aa...@thelastpickle.com wrote: I'm pretty sure I'm behind on how to deal with this problem. Best I know is to start the node with -Dcassandra.load_ring_state=false as a JVM option. But if the ghost IP address is in gossip it will not work, and it should be in gossip. [...] -- Sasha Dolgy sasha.do...@gmail.com
Re: what's the difference between repair CF separately and repair the entire node?
I think it is a serious problem, since I cannot repair. I am using Cassandra on production servers. Is there some way to fix it without upgrading? I have heard that 0.8.x is still not quite ready for a production environment. Thanks! On Tue, Sep 13, 2011 at 1:44 AM, Peter Schuller peter.schul...@infidyne.com wrote: I am using 0.7.4. So is it always okay to do routine repair on a column-family basis? Thanks! It's okay, but it won't do what you want; due to a bug you'll see streaming of data for column families other than the one you're trying to repair. This will be fixed in 1.0. -- / Peter Schuller (@scode on twitter)
Re: Cassandra -f problem
Hi, Roshan. This is great support, amazing support; I'm not used to it :) Thanks for the reply. Well, I think Java is installed correctly; I mean, the java -version command works in a terminal, so the PATH env variable is correctly set, right? I downloaded JDK 7, put it in /opt/java/, and then set the path. But the Eclipse icon says it can't find any JRE or JDK, which is weird because of what I said above... but... but... what else could it be? Thanks! On Sun, Sep 11, 2011 at 10:05 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi, Cassandra starts the JVM as $JAVA -ea -cp $CLASSPATH. It looks like $JAVA is coming up empty in your case, hence the error exec -ea not found. Do you not have Java installed? Please install it, set JAVA_HOME appropriately, and retry. Cheers. On Mon, Sep 12, 2011 at 8:23 AM, Hernán Quevedo alexandros.c@gmail.com wrote: Hi, all. I'm new at this and haven't been able to install Cassandra on Debian 6. After uncompressing the tar and creating the var/log and var/lib directories, the command bin/cassandra -f results in the message exec: 357 -ea not found, preventing Cassandra from running the process the README file says it is supposed to start. Any help would be very appreciated. Thanks! -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani -- Είναι η θέληση των Θεών.
Re: Cassandra -f problem
Hi, Do you have JAVA_HOME exported? If not, can you export it and retry? Cheers. On Tue, Sep 13, 2011 at 8:59 AM, Hernán Quevedo alexandros.c@gmail.com wrote: Hi, Roshan. This is great support, amazing support; I'm not used to it :) Thanks for the reply. Well, I think Java is installed correctly; I mean, the java -version command works in a terminal, so the PATH env variable is correctly set, right? I downloaded JDK 7, put it in /opt/java/, and then set the path. But the Eclipse icon says it can't find any JRE or JDK, which is weird because of what I said above... but... but... what else could it be? Thanks! [...] -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani
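A sketch of the fix Roshan suggests, assuming the JDK really does live under /opt/java as described earlier in the thread (the exact directory name is a guess and must be adjusted to the actual install):

```shell
# Point JAVA_HOME at the JDK install and put its bin/ first on PATH,
# then retry starting Cassandra in the foreground.
export JAVA_HOME=/opt/java/jdk1.7.0   # hypothetical path; adjust to yours
export PATH="$JAVA_HOME/bin:$PATH"
bin/cassandra -f
```

With JAVA_HOME set, the startup script can resolve $JAVA instead of expanding it to nothing, which is what produced the "exec: 357 -ea not found" error.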