CQL and counters
I'm looking for some guidance on how to model stat tracking over time, bucketed to some type of interval (15 min, hour, etc.). As an example, let's say I would like to track network traffic throughput, bucketed into 15 minute intervals. In our old model, using Thrift, I would create a column family set to counter type, and use timestamp ticks as the column name for a Total and a Count column. As data was sampled, we would increment Count by one, and increment Total by the sampled value for that time bucket. The column name gave us the datetime for the values, and also provided a convenient row slice query to get a date range for any given statistic.

Key        | 1215 | 1230 | 1245
NIC1:Total | 100  | 56   | 872
NIC1:Count | 15   | 15   | 15

Then, given the total/count, I can show an average over time. In CQL it seems like I can't create new counter columns at runtime unless they are defined in the schema first or I run an ALTER statement, which may not be the correct way to go. So is there a better way to model this type of data in the new CQL world? I also don't know how to query that type of data, similar to the row slice by column name. Thanks, Bryce
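For what it's worth, the total/count bucketing arithmetic described above can be sketched in plain Python (an illustration of the scheme, not Cassandra code; all names are made up):

```python
from collections import defaultdict
from datetime import datetime

def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

# Per-bucket [Total, Count], mimicking the Total/Count counter columns above.
stats = defaultdict(lambda: [0, 0])

def record_sample(ts: datetime, value: int) -> None:
    bucket = bucket_start(ts)
    stats[bucket][0] += value  # increment Total by the sampled value
    stats[bucket][1] += 1      # increment Count by one

record_sample(datetime(2012, 10, 25, 12, 17), 40)
record_sample(datetime(2012, 10, 25, 12, 29), 60)

total, count = stats[bucket_start(datetime(2012, 10, 25, 12, 20))]
print(total / count)  # average throughput for the 1215 bucket: 50.0
```

The same idea maps onto a table keyed by (statistic, bucket timestamp), with the bucket timestamp as the range you slice on.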
RE: High bandwidth usage between datacenters for cluster
Network topology with the topology file filled out is already the configuration we are using.

From: sankalp kohli [mailto:kohlisank...@gmail.com]
Sent: Thursday, October 25, 2012 11:55 AM
To: user@cassandra.apache.org
Subject: Re: High bandwidth usage between datacenters for cluster

Use placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and also fill in the topology.properties file. This will tell Cassandra that you have two DCs. You can verify that by looking at the output of the ring command. If your DCs are set up properly, only one request will go over the WAN, though the responses from all nodes in the other DC will go over the WAN.
High bandwidth usage between datacenters for cluster
We have a 5 node cluster, with a matching 5 nodes for DR in another data center. With a replication factor of 3, does the node I send a write to attempt to send it to the 3 servers in the DR as well? Or does it send it to 1 and let it replicate locally in the DR environment to save bandwidth across the WAN? Normally this isn't an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to the Cassandra DR servers. If my assumptions are right, is this configurable somehow, so that we write to one node and let it do local replication? We are on 1.1.5. Thanks
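The numbers reported above are consistent with each remote replica receiving its own copy of every write over the WAN. A back-of-the-envelope sketch of the two behaviours being asked about (illustrative arithmetic only, not a statement of what any particular Cassandra version does):

```python
def wan_write_traffic(write_rate_mb_s: float, remote_rf: int, forward_to_one: bool) -> float:
    """Approximate WAN bandwidth consumed by writes replicated to a remote DC.

    forward_to_one=True models sending a single copy to one remote node,
    which then replicates locally inside the remote data center.
    """
    copies_over_wan = 1 if forward_to_one else remote_rf
    return write_rate_mb_s * copies_over_wan

# 1 MB/s of writes, replication factor 3 in the DR data center:
print(wan_write_traffic(1.0, 3, forward_to_one=False))  # 3.0 MB/s, as observed
print(wan_write_traffic(1.0, 3, forward_to_one=True))   # 1.0 MB/s
```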
Prevent queries from OOM nodes
Is there anything I can do on the configuration side to prevent nodes from going OOM due to queries that read large amounts of data and exceed the available heap? For the past few days we had some nodes consistently freezing/crashing with OOM. We got a heap dump into MAT and figured out the nodes were dying due to queries for a few extremely large data sets. Tracked it back to an app that just didn't prevent users from issuing these large queries, but it seems like Cassandra could be smart enough to guard against this type of thing? Basically some kind of setting like: if the data needed to satisfy the query exceeds the available heap, then throw an error to the caller and abort the query. I would much rather return errors to clients than crash a node, as the error is easier to track down and resolve that way. Thanks.
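No server-side heap guard comes up in this thread, so one mitigation is client-side: page large reads in bounded chunks so no single query has to materialize an entire huge result set. A rough sketch against a made-up `fetch_slice(key, start, limit)` call (the client interface here is hypothetical, not a real driver API):

```python
def page_row(fetch_slice, key, page_size=1000):
    """Yield all columns of a wide row in bounded pages.

    fetch_slice(key, start, limit) is a stand-in for a real client call that
    returns up to `limit` (name, value) pairs with name >= start, in order.
    """
    start = ""
    while True:
        page = fetch_slice(key, start, page_size + 1)  # +1 to detect more pages
        for name, value in page[:page_size]:
            yield name, value
        if len(page) <= page_size:
            return
        start = page[page_size][0]  # resume from the first unseen column

# Toy in-memory "row" to exercise the pager.
row = [(f"col{i:04d}", i) for i in range(2500)]

def fake_fetch(key, start, limit):
    return [c for c in row if c[0] >= start][:limit]

names = [n for n, _ in page_row(fake_fetch, "NIC1", page_size=1000)]
print(len(names))  # 2500, fetched in pages of at most 1001 columns
```

This bounds what the server has to hold per request, at the cost of more round trips.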
RE: Expanding cluster to include a new DR datacenter
Well, I tried to drop the keyspace, but it's still there. No errors in the logs, and cassandra-cli showed schema agreement after the command. I took a snapshot of the system keyspace first. Nothing is crashing in the clients yet either; they are still able to read/write to that keyspace.

[default@EBonding] drop keyspace EBonding;
2eb11095-b8a8-31cd-80c3-c748d32a4208
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use EBonding;
Authenticated to keyspace: EBonding
[default@EBonding] describe;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:2]

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, August 29, 2012 2:36 AM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

It would be handy to work out what the corruption is. Could you snapshot the system keyspace and store it somewhere, just in case we can look at it later?

> Is there a way I can confirm this
Errors in the client and/or the server log is the traditional way.

> go about cleaning up/restoring the proper schema?
If you need to get it back, and can handle the down time, the simple thing is to drop the KS and re-create it. Remember to take a snapshot first. Drop keyspace takes one, but it's the sort of thing I would do myself. Or you can _try_ nodetool resetlocalschema. Without knowing what the error is, it's hard to say if it would work.

Cheers
- Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com
RE: Expanding cluster to include a new DR datacenter
So in an interesting turn of events, this works on my other 4 keyspaces, but just not this 'EBonding' one, which will not recognize the changes. I can probably get around this by dropping and re-creating this keyspace, since its uptime is not too important for us.

[default@AlertStats] describe AlertStats;
Keyspace: AlertStats:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
  Options: [Fisher:3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 3:50 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

Can you describe your schema again with TierPoint in it?
RE: Expanding cluster to include a new DR datacenter
I believe what may really be going on is that my schema is in a bad or corrupt state. I also have one keyspace from which I just cannot drop an existing column family, even though it shows no errors. So right now I was able to get 4 of my 6 keyspaces over to NetworkTopology strategy. I think I got into this bad state after pointing OpsCenter at this cluster for the first time, as it started throwing errors after that and crashed a couple of my nodes until I stopped it and its agents. Is there a way I can confirm this, or go about cleaning up/restoring the proper schema?
RE: Expanding cluster to include a new DR datacenter
Show schema output shows the simple strategy still:

[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) Completed flushing /opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db (241 bytes) for commitlog p$

Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked. Can you check the logs for errors? Can you see your schema change in there? Also, what is the output from show schema; in the cli?

Cheers
- Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com
RE: Expanding cluster to include a new DR datacenter
Same results. I restarted the node also, to see if it just wasn't picking up the changes, and it still shows Simple. When I specify the DC for strategy_options I should be using the DC name from the property file snitch, right? Ours is Fisher and TierPoint, so that's what I used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify the RF for both DCs? You could just do DC1:2, DC2:0.
RE: Expanding cluster to include a new DR datacenter
Yes [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.PropertyFileSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 10.20.8.2, 10.20.8.3] From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Friday, August 24, 2012 1:55 PM To: user@cassandra.apache.org Subject: Re: Expanding cluster to include a new DR datacenter That's interesting can you do describe cluster? On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey bryce.godf...@azaleos.commailto:bryce.godf...@azaleos.com wrote: So I'm at the point of updating the keyspaces from Simple to NetworkTopology and I'm not sure if the changes are being accepted using Cassandra-cli. I issue the change: [default@EBonding] update keyspace EBonding ... with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' ... and strategy_options={Fisher:2}; 9511e292-f1b6-3f78-b781-4c90aeb6b0f6 Waiting for schema agreement... ... schemas agree across the cluster Then I do a describe and it still shows the old strategy. Is there something else that I need to do? I've exited and restarted Cassandra-cli and it still shows the SimpleStrategy for that keyspace. Other nodes show the same information. [default@EBonding] describe EBonding; Keyspace: EBonding: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:2] From: Bryce Godfrey [mailto:bryce.godf...@azaleos.commailto:bryce.godf...@azaleos.com] Sent: Thursday, August 23, 2012 11:06 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: RE: Expanding cluster to include a new DR datacenter Thanks for the information! Answers my questions. 
From: Tyler Hobbs [mailto:ty...@datastax.com] Sent: Wednesday, August 22, 2012 7:10 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Expanding cluster to include a new DR datacenter If you didn't see this particular section, you may find it useful: http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster Some comments inline: On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey bryce.godf...@azaleos.commailto:bryce.godf...@azaleos.com wrote: We are in the process of building out a new DR system in another Data Center, and we want to mirror our Cassandra environment to that DR. I have a couple questions on the best way to do this after reading the documentation on the Datastax website. We didn't initially plan for this to be a DR setup when first deployed a while ago due to budgeting, but now we need to. So I'm just trying to nail down the order of doing this as well as any potential issues. For the nodes, we don't plan on querying the servers in this DR until we fail over to this data center. We are going to have 5 similar nodes in the DR, should I join them into the ring at token+1? Join them at token+10 just to leave a little space. Make sure you're using LOCAL_QUORUM for your queries instead of regular QUORUM. All keyspaces are set to the replication strategy of SimpleStrategy. Can I change the replication strategy after joining the new nodes in the DR to NetworkTopologyStategy with the updated replication factor for each dr? Switch your keyspaces over to NetworkTopologyStrategy before adding the new nodes. For the strategy options, just list the first dc until the second is up (e.g. {main_dc: 3}). Lastly, is changing snitch from default of SimpleSnitch to RackInferringSnitch going to cause any issues? Since its in the Cassandra.yaml file I assume a rolling restart to pick up the value would be ok? This is the first thing you'll want to do. 
Unless your node IPs would naturally put all nodes in a DC in the same rack, I recommend using PropertyFileSnitch, explicitly using the same rack. (I tend to prefer PFSnitch regardless; it's harder to accidentally mess up.) A rolling restart is required to pick up the change. Make sure to fill out cassandra-topology.properties first if using PFSnitch.

> This is all on Cassandra 1.1.4. Thanks for any help!

--
Tyler Hobbs
DataStax http://datastax.com/
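For anyone following along, a minimal cassandra-topology.properties along the lines Tyler suggests might look like the sketch below; the DC names, rack name, and IP addresses are hypothetical, not taken from this thread:

```properties
# Sketch of a two-DC layout where every node in a DC is placed in the
# same rack, as recommended above. Names and IPs are illustrative only.
192.168.1.1=DC1:RAC1
192.168.1.2=DC1:RAC1
192.168.2.1=DC2:RAC1
192.168.2.2=DC2:RAC1

# Fallback for any node not listed explicitly
default=DC1:RAC1
```

Every node needs the same copy of this file, and the DC names defined here are the ones the keyspace's strategy_options must reference.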
RE: Expanding cluster to include a new DR datacenter
So I'm at the point of updating the keyspaces from Simple to NetworkTopology, and I'm not sure if the changes are being accepted using cassandra-cli. I issue the change:

[default@EBonding] update keyspace EBonding
...    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
...    and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy. Is there something else that I need to do? I've exited and restarted cassandra-cli and it still shows SimpleStrategy for that keyspace. Other nodes show the same information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]
RE: Expanding cluster to include a new DR datacenter
Thanks for the information! Answers my questions.
Expanding cluster to include a new DR datacenter
We are in the process of building out a new DR system in another data center, and we want to mirror our Cassandra environment to that DR. I have a couple of questions on the best way to do this after reading the documentation on the DataStax website. We didn't initially plan for this to be a DR setup when first deployed a while ago due to budgeting, but now we need to. So I'm just trying to nail down the order of doing this, as well as any potential issues.

For the nodes, we don't plan on querying the servers in this DR until we fail over to this data center. We are going to have 5 similar nodes in the DR; should I join them into the ring at token+1?

All keyspaces are set to the replication strategy of SimpleStrategy. Can I change the replication strategy to NetworkTopologyStrategy, with the updated replication factor for each DC, after joining the new nodes in the DR?

Lastly, is changing the snitch from the default of SimpleSnitch to RackInferringSnitch going to cause any issues? Since it's in cassandra.yaml, I assume a rolling restart to pick up the value would be OK?

This is all on Cassandra 1.1.4. Thanks for any help!
Joining DR nodes in new data center
What is the process for joining a new data center to an existing cluster as DR? We have a 5 node cluster in our primary DC, and want to bring up 5 more in our 2nd data center purely for DR. How should these new nodes be joined to the cluster and be seen as the 2nd data center? Do the new nodes mirror the configuration of the existing nodes but with some setting to indicate they are in another DC? Our existing cluster is using the defaults mostly of network placement strategy and simple snitch. Thanks.
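Sketching what the replies elsewhere in this archive suggest: the second data center is defined by a topology-aware snitch, and is then referenced in each keyspace's replication settings. A cassandra-cli sketch, where the keyspace and DC names are hypothetical:

```
update keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = {DC1:3, DC2:3};
```

The DC names must match whatever the snitch assigns to each node; with the default SimpleSnitch there is no DC distinction at all, so switching to something like PropertyFileSnitch is a prerequisite.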
2 nodes throwing exceptions trying to compact after upgrade to 1.1.2 from 1.1.0
This may not be directly related to the upgrade to 1.1.2, but I was running on 1.1.0 for a while with no issues, and I did the upgrade to 1.1.2 a few days ago. 2 of my nodes started throwing lots of promote exceptions, and then a lot of the beforeAppend exceptions from then on every few minutes. This is on the high-update CF that's using leveled compaction and compression. The other 3 nodes are not experiencing this. I can send entire log files if desired. These 2 nodes now have much higher load numbers than the other 3, and I'm assuming that's because they are failing with the compaction errors?

INFO [CompactionExecutor:1783] 2012-07-13 07:35:23,268 CompactionTask.java (line 109) Compacting [SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-392322-Data$
ERROR [CompactionExecutor:1783] 2012-07-13 07:35:29,696 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:1783,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

INFO [CompactionExecutor:3310] 2012-07-16 11:14:02,481 CompactionTask.java (line 109) Compacting [SSTableReader(path='/opt/cassandra/data/MonitoringData/Properties/MonitoringData-Properties-hd-369173-Data$
ERROR [CompactionExecutor:3310] 2012-07-16 11:14:04,031 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:3310,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(150919285004100953907590722809541628889, 5b30363334353237652d383966382d653031312d623131632d3030313535643031373530325d5b436f6d70757465725b4d5350422d$
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
RE: 2 nodes throwing exceptions trying to compact after upgrade to 1.1.2 from 1.1.0
Thanks, is there a way around this for now, or should I fall back to 1.1.0?

From: Rudolf van der Leeden [mailto:rudolf.vanderlee...@scoreloop.com]
Sent: Monday, July 16, 2012 12:55 PM
To: user@cassandra.apache.org
Cc: Rudolf van der Leeden
Subject: Re: 2 nodes throwing exceptions trying to compact after upgrade to 1.1.2 from 1.1.0

See https://issues.apache.org/jira/browse/CASSANDRA-4411

The bug is related to LCS (leveled compaction) and has been fixed.

On 16.07.2012, at 20:32, Bryce Godfrey wrote:

> This may not be directly related to the upgrade to 1.1.2, but I was running on 1.1.0 for a while with no issues, and I did the upgrade to 1.1.2 a few days ago. 2 of my nodes started throwing lots of promote exceptions, and then a lot of the beforeAppend exceptions from then on every few minutes. This is on the high-update CF that's using leveled compaction and compression. The other 3 nodes are not experiencing this. I can send entire log files if desired. These 2 nodes now have much higher load numbers than the other 3, and I'm assuming that's because they are failing with the compaction errors?
RE: Problem joining new node to cluster in 1.1.1
https://issues.apache.org/jira/browse/CASSANDRA-4323

Not sure if it's a dupe of what Brandon sent (4251), so created the bug anyway.

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Friday, June 08, 2012 9:08 AM
To: user@cassandra.apache.org
Subject: Re: Problem joining new node to cluster in 1.1.1

That very much looks like a bug. Would you mind opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA with those stack traces, and maybe a little bit more precision on what you were doing when that happened?

--
Sylvain

On Fri, Jun 8, 2012 at 12:28 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

> As the new node starts up I get this error before bootstrap starts: [...] Then it starts spewing these errors nonstop until I kill it.
Problem joining new node to cluster in 1.1.1
As the new node starts up I get this error before bootstrap starts:

INFO 08:20:51,584 Enqueuing flush of Memtable-schema_columns@1493418651(0/0 serialized/live bytes, 1 ops)
INFO 08:20:51,584 Writing Memtable-schema_columns@1493418651(0/0 serialized/live bytes, 1 ops)
INFO 08:20:51,589 Completed flushing /opt/cassandra/data/system/schema_columns/system-schema_columns-hc-1-Data.db (61 bytes)
ERROR 08:20:51,889 Exception in thread Thread[MigrationStage:1,5,main]
java.lang.IllegalArgumentException: value already present: 1015
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
    at com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111)
    at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96)
    at com.google.common.collect.HashBiMap.put(HashBiMap.java:84)
    at org.apache.cassandra.config.Schema.load(Schema.java:385)
    at org.apache.cassandra.db.DefsTable.addColumnFamily(DefsTable.java:426)
    at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:361)
    at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:270)
    at org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:248)
    at org.apache.cassandra.service.MigrationManager$MigrationTask.runMayThrow(MigrationManager.java:416)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
INFO 08:20:51,931 Enqueuing flush of Memtable-schema_keyspaces@833041663(943/1178 serialized/live bytes, 20 ops)
INFO 08:20:51,932 Writing Memtable-schema_keyspaces@833041663(943/1178 serialized/live bytes, 20 ops)

Then it starts spewing these errors nonstop until I kill it:

ERROR 08:21:45,959 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1019
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
    at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,814 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1019
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
    at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,813 Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
    at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR 08:21:45,813 Error in row mutation

I'm guessing the first error caused some column families to not be created?
RE: 1.1 not removing commit log files?
I'll try to get some log files for this with DEBUG enabled. Tough on production though.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 11:15 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

Applying the local hint mutation follows the same code path as regular mutations. When the commit log is being truncated you should see flush activity, logged from the ColumnFamilyStore with "Enqueuing flush of" messages.

If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore, it will log if it thinks the CF is clean and no flush takes place. If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog, we will see if the commit log file could not be deleted because a dirty CF was not flushed.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 4:43 AM, Rob Coli wrote:

On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote:

> But that talks about segments not being cleared at startup. Does not explain why they were allowed to get past the limit in the first place.

Perhaps the commit log size tracking for this limit does not, for some reason, track hints? This seems like the obvious answer given the state which appears to trigger it. This doesn't explain why the files aren't getting deleted after the hints are delivered, of course...

=Rob

--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
RE: 1.1 not removing commit log files?
So this happened to me again, but only when the cluster had a node down for a while. The commit logs started piling up past the limit I set in the config file and filled the drive. After the node recovered and hints had replayed, the space was never reclaimed. A flush or drain did not reclaim the space either or delete any log files.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation http://www.azaleos.com/

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Tuesday, May 22, 2012 1:10 PM
To: user@cassandra.apache.org
Subject: RE: 1.1 not removing commit log files?

The nodes appear to be holding steady at the 8G that I set it to in the config file now. I'll keep an eye on them.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, May 22, 2012 4:08 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

4096 is also the internal hard-coded default for commitlog_total_space_in_mb. If you are seeing more than 4GB of commit log files, let us know.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/05/2012, at 6:35 AM, Bryce Godfrey wrote:

Thanks, I'll give it a try.

-----Original Message-----
From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Monday, May 21, 2012 2:12 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

> commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x, if I remember well. I guess it is the same in 1.1. You really should uncomment it, or your commit logs will entirely fill up your disk, as happened to me a while ago.

Alain

2012/5/21 Pieter Callewaert pieter.callewa...@be-mobile.be:

Hi,

In 1.1 the commitlog files are pre-allocated as files of 128MB (https://issues.apache.org/jira/browse/CASSANDRA-3411). This should, however, not exceed your commitlog size in cassandra.yaml:

commitlog_total_space_in_mb: 4096

Kind regards,
Pieter Callewaert

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: maandag 21 mei 2012 9:52
To: user@cassandra.apache.org
Subject: 1.1 not removing commit log files?

The commit log drives on my nodes keep slowly filling up. I don't see any errors in my logs indicating any issues that I can map to this. Is this how 1.1 is supposed to work now? Previous versions seemed to keep this drive at a minimum as it flushed.

/dev/mapper/mpathf  25G  21G  4.2G  83% /opt/cassandra/commitlog
RE: 1.1 not removing commit log files?
The nodes appear to be holding steady at the 8G that I set it to in the config file now. I'll keep an eye on them.
1.1 not removing commit log files?
The commit log drives on my nodes keep slowly filling up. I don't see any errors in my logs indicating any issues that I can map to this. Is this how 1.1 is supposed to work now? Previous versions seemed to keep this drive at a minimum as it flushed.

/dev/mapper/mpathf  25G  21G  4.2G  83% /opt/cassandra/commitlog
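As the replies earlier in this archive note, the relevant cap lives in cassandra.yaml; a sketch of the setting (the value shown is the internal default, and the shipped file may have the line commented out):

```yaml
# cassandra.yaml excerpt: cap on total commit log space per node.
# 4096 MB is the internal hard-coded default; set it explicitly
# (uncommented) so commit logs cannot grow to fill the drive.
commitlog_total_space_in_mb: 4096
```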
Node join streaming stuck at 100%
This is the second node I've joined to my cluster in the last few days, and so far both have become stuck at 100% on a large file according to netstats. This is on 1.0.9; is there anything I can do to make it move on besides restarting Cassandra? I don't see any errors or warnings in the logs for either server, and there is plenty of disk space. On the sender side I see this: Streaming to: /10.20.1.152 /opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db sections=1 progress=82393861085/82393861085 - 100% On the node joining I don't see this file in netstats, and all pending streams are sitting at 0%.
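As a diagnostic aid (not from the original thread): the netstats progress line quoted above can be parsed to flag streams that claim 100% but never close. The progress format is copied from the message; the regex and function names are assumptions.

```python
import re

# Matches "progress=<done>/<total> - <pct>%" as printed by nodetool netstats.
STREAM = re.compile(r"progress=(\d+)/(\d+)\s+-\s+(\d+)%")

def stream_status(line):
    """Parse a netstats-style progress line. A file reported at 100% on the
    sender while absent from the receiver's netstats suggests a hung
    stream session rather than one that is still transferring."""
    m = STREAM.search(line)
    if not m:
        return None
    done, total, pct = (int(g) for g in m.groups())
    return {"done": done, "total": total,
            "complete": done == total and pct == 100}

line = ("/opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db "
        "sections=1 progress=82393861085/82393861085 - 100%")
status = stream_status(line)
```

Watching whether a "complete" entry ever disappears from netstats distinguishes a slow stream from a stuck one, which matches the symptom described above.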
RE: size tiered compaction - improvement
Per-CF or per-row TTL would be very useful for me also, with our time-series data. -Original Message- From: Igor [mailto:i...@4friends.od.ua] Sent: Wednesday, April 18, 2012 6:06 AM To: user@cassandra.apache.org Subject: Re: size tiered compaction - improvement For my use case it would be nice to have a per-CF TTL (to protect myself from application bugs and from storage leaks due to a missed TTL), but it seems you can't avoid tombstones even in this case, and if you change the CF TTL during runtime. On 04/18/2012 03:06 PM, Viktor Jevdokimov wrote: Our use case requires column TTL, not CF TTL, because it is variable, not constant. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania -Original Message- From: Radim Kolar [mailto:h...@filez.com] Sent: Wednesday, April 18, 2012 12:57 To: user@cassandra.apache.org Subject: Re: size tiered compaction - improvement Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones. That's why I proposed to attach TTL to the entire CF. Tombstones would not be needed.
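The thread argues that a per-CF TTL could skip tombstones entirely, while per-column TTLs cannot. A toy model of the per-column case, illustrating the two-pass behavior Radim describes (an expired column becomes a tombstone on the first compaction pass; only a later pass over all sstables for the row can drop it). Class and field names here are invented for illustration, not Cassandra internals.

```python
import time

class Column:
    """Toy column with an optional per-column TTL, in seconds."""
    def __init__(self, name, value, ttl=None, written_at=None):
        self.name = name
        self.value = value
        self.ttl = ttl
        self.written_at = written_at if written_at is not None else time.time()
        self.tombstone = False

    def is_expired(self, now):
        return self.ttl is not None and now >= self.written_at + self.ttl

def compact_pass(columns, now):
    """First compaction pass: convert expired columns into tombstones.
    The tombstones themselves can only be dropped by a later pass that
    sees every sstable containing the same row key."""
    for c in columns:
        if c.is_expired(now):
            c.tombstone = True
            c.value = None
    return columns

cols = [Column("a", 1, ttl=10, written_at=0), Column("b", 2, written_at=0)]
compact_pass(cols, now=100)
# "a" is now a tombstone; "b" (no TTL) is untouched
```

With a single CF-wide TTL, expiry could instead be decided per sstable at read/compaction time, which is why the proposal claims tombstones would not be needed.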
RE: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released
Sorry, I found the issue. The server I was using had 32-bit Java installed. -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Monday, April 16, 2012 11:39 PM To: user@cassandra.apache.org Subject: Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released On Mon, Apr 16, 2012 at 10:45 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: I keep running into this with my testing (on a Windows box). Is this just an OOM for RAM? How much RAM do you have? Do you use completely standard settings? Do you also OOM if you try the same test with Cassandra 1.0.9? -- Sylvain ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main] java.io.IOError: java.io.IOException: Map failed at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127) at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80) at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244) at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49) at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(Unknown Source) at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:119) ... 6 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) ... 
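The resolution (32-bit Java) fits the stack trace: "OutOfMemoryError: Map failed" from FileChannel.map typically means the process ran out of virtual address space for mmap, not heap. A rough arithmetic sketch of why 1.1's pre-allocated segments hurt on 32-bit: the ~2 GB usable address space figure and the example heap size below are assumptions, while the 128 MB segment size comes from the commit log thread above.

```python
# Assumed, illustrative numbers: a 32-bit Windows JVM has on the order of
# 2 GB of usable virtual address space, shared by the heap, thread stacks,
# code, and all mmapped files. Neither value is a documented JVM constant.
ADDRESS_SPACE_MB = 2048   # assumed 32-bit process address space
HEAP_MB = 1024            # example -Xmx setting
SEGMENT_MB = 128          # 1.1 pre-allocates 128 MB commit log segments

def max_mappable_segments(address_space_mb, heap_mb, segment_mb):
    """How many 128 MB commit log segments fit in what is left of the
    address space after the heap, ignoring stacks and other mappings."""
    return max(0, (address_space_mb - heap_mb) // segment_mb)

# Far below the 4096 MB commitlog_total_space_in_mb default, so mapping
# fails long before the configured cap is reached on a 32-bit JVM.
remaining = max_mappable_segments(ADDRESS_SPACE_MB, HEAP_MB, SEGMENT_MB)
```

A 64-bit JVM removes the constraint entirely, which is consistent with the fix reported here.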
8 more INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 CassandraDaemon.java (line 218) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 MessagingService.java (line 539) Waiting for messaging service to quiesce INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java (line 695) MessagingService shutting down server thread. -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Friday, April 13, 2012 9:41 AM To: user@cassandra.apache.org Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released The Cassandra team is pleased to announce the release of the first release candidate for the future Apache Cassandra 1.1. Please first note that this is a release candidate, *not* the final release yet. All help in testing this release candidate will be greatly appreciated. Please report any problem you may encounter[3,4] and have a look at the change log[1] and the release notes[2] to see where Cassandra 1.1 differs from the previous series. Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra website (http://cassandra.apache.org/download/) and a debian package is available using the 11x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing and have fun with it. [1]: http://goo.gl/XwH7J (CHANGES.txt) [2]: http://goo.gl/JocLX (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-rc1
RE: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released
I keep running into this with my testing (on a Windows box). Is this just an OOM for RAM? ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main] java.io.IOError: java.io.IOException: Map failed at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127) at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80) at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244) at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49) at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(Unknown Source) at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:119) ... 6 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) ... 8 more INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 CassandraDaemon.java (line 218) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 MessagingService.java (line 539) Waiting for messaging service to quiesce INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java (line 695) MessagingService shutting down server thread. -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Friday, April 13, 2012 9:41 AM To: user@cassandra.apache.org Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released The Cassandra team is pleased to announce the release of the first release candidate for the future Apache Cassandra 1.1. Please first note that this is a release candidate, *not* the final release yet. 
All help in testing this release candidate will be greatly appreciated. Please report any problem you may encounter[3,4] and have a look at the change log[1] and the release notes[2] to see where Cassandra 1.1 differs from the previous series. Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra website (http://cassandra.apache.org/download/) and a debian package is available using the 11x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing and have fun with it. [1]: http://goo.gl/XwH7J (CHANGES.txt) [2]: http://goo.gl/JocLX (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-rc1
RE: Large hints column family
I took the reset-the-world approach; things are much better now and the hints table is staying empty. It's a bit disconcerting that it could get so large and not be able to recover itself, but at least there was a solution. Thanks From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Thursday, March 15, 2012 7:24 PM To: user@cassandra.apache.org Subject: Re: Large hints column family These messages make it look like the node is having trouble delivering hints. INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries Take another look at the logs on this machine and on 20.4 and 20.3. I would be looking into why so many hints are being stored. GC? Are there also logs about dropped messages? If you want to reset the world, make sure the nodes have all run repair and then drop the hints, either via JMX or by stopping the node and deleting the files on disk. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/03/2012, at 12:58 PM, Bryce Godfrey wrote: We were having some occasional memory pressure issues, but we just added some more RAM to the nodes a few days ago and things are running more smoothly now; in general the nodes have not been going up and down. I tried to do a list HintsColumnFamily from cassandra-cli and it locks my Cassandra node and never returns, forcing me to kill the Cassandra process and restart it to get the node back. 
Here are my settings, which I believe are the defaults since I don't remember changing them: hinted_handoff_enabled: true max_hint_window_in_ms: 360 # one hour hinted_handoff_throttle_delay_in_ms: 50 Grepping for Hinted in the system log I get these: INFO [HintedHandoff:1] 2012-03-13 16:13:22,215 HintedHandOffManager.java (line 373) Finished hinted handoff of 852703 rows to endpoint /192.168.20.3 INFO [HintedHandoff:1] 2012-03-13 16:13:34,188 HintedHandOffManager.java (line 284) Endpoint /192.168.20.4 died before hint delivery, aborting INFO [ScheduledTasks:1] 2012-03-13 16:15:32,569 StatusLogger.java (line 65) HintedHandoff 1 1 0 INFO [HintedHandoff:1] 2012-03-13 16:15:44,362 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3 INFO [HintedHandoff:1] 2012-03-13 16:21:37,266 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3 INFO [ScheduledTasks:1] 2012-03-13 16:23:07,662 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-13 16:25:49,330 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-13 16:30:52,503 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-13 16:42:22,202 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 HintedHandOffManager.java (line 354) Timed out replaying hints to /192.168.20.3; aborting further deliveries INFO [HintedHandoff:1] 2012-03-13 17:03:50,986 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-HintsColumnFamily@661547256(34298224/74465815 serialized/live bytes, 78808 ops) INFO [HintedHandoff:1] 2012-03-13 17:11:00,098 HintedHandOffManager.java (line 373) Finished hinted handoff of 44160 rows to endpoint /192.168.20.3 INFO [HintedHandoff:1] 2012-03-13 17:11:36,596 HintedHandOffManager.java (line 296) Started hinted handoff for token: 
56713727820156407428984779325531226112 with IP: /192.168.20.4 INFO [ScheduledTasks:1] 2012-03-13 17:12:25,248 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [HintedHandoff:1] 2012-03-13 18:47:56,151 HintedHandOffManager.java (line 296) Started hinted handoff for token: 113427455640312814857969558651062452224 with IP: /192.168.20.3 INFO [ScheduledTasks:1] 2012-03-13 18:50:24,326 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-14 12:12:48,177 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-14 12:13:57,685 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-14 12:14:57,258 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-14 12:14:58,260 StatusLogger.java (line 65) HintedHandoff 1 2 0 INFO [ScheduledTasks:1] 2012-03-14 12:15:59,093 StatusLogger.java (line 65) HintedHandoff
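A small parser (not from the thread) for the HintedHandOffManager lines quoted above, to tally how many hint rows actually get delivered per endpoint versus how often delivery times out; the regexes are assumptions based on the quoted log format. Hints draining slower than they accumulate is consistent with the large HintsColumnFamily in this thread.

```python
import re

# Patterns modeled on the log lines quoted in this thread.
FINISHED = re.compile(r"Finished hinted handoff of (\d+) rows to endpoint (/[\d.]+)")
TIMED_OUT = re.compile(r"Timed out replaying hints to (/[\d.]+)")

def summarize_hints(lines):
    """Tally delivered hint rows and delivery timeouts per endpoint."""
    delivered, timeouts = {}, {}
    for line in lines:
        m = FINISHED.search(line)
        if m:
            rows, ep = int(m.group(1)), m.group(2)
            delivered[ep] = delivered.get(ep, 0) + rows
        m = TIMED_OUT.search(line)
        if m:
            ep = m.group(1)
            timeouts[ep] = timeouts.get(ep, 0) + 1
    return delivered, timeouts

log = [
    "INFO ... Finished hinted handoff of 852703 rows to endpoint /192.168.20.3",
    "INFO ... Timed out replaying hints to /192.168.20.3; aborting further deliveries",
    "INFO ... Finished hinted handoff of 44160 rows to endpoint /192.168.20.3",
]
delivered, timeouts = summarize_hints(log)
```

Feeding the real system.log through this would show whether 20.3 and 20.4 are falling behind, which is where Aaron's advice above points.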
Large hints column family
The system HintsColumnFamily seems large in my cluster, and I want to track down why that is. I try invoking listEndpointsPendingHints() for o.a.c.db.HintedHandoffManager and it never returns, and it also freezes the node that it's invoked against. It's a 3-node cluster, and all nodes have been up and running without issue for a while. Any help on where to start with this? Column Family: HintsColumnFamily SSTable count: 11 Space used (live): 11271669539 Space used (total): 11271669539 Number of Keys (estimate): 1408 Memtable Columns Count: 338 Memtable Data Size: 0 Memtable Switch Count: 1 Read Count: 3 Read Latency: 4354.669 ms. Write Count: 848 Write Latency: 0.029 ms. Pending Tasks: 0 Bloom Filter False Postives: 0 Bloom Filter False Ratio: 0.0 Bloom Filter Space Used: 12656 Key cache capacity: 14 Key cache size: 11 Key cache hit rate: 0. Row cache: disabled Compacted row minimum size: 105779 Compacted row maximum size: 7152383774 Compacted row mean size: 590818614 Thanks, Bryce
RE: Large hints column family
Forgot to mention that this is on 1.0.8. From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] Sent: Wednesday, March 14, 2012 12:34 PM To: user@cassandra.apache.org Subject: Large hints column family The system HintsColumnFamily seems large in my cluster, and I want to track down why that is. I try invoking listEndpointsPendingHints() for o.a.c.db.HintedHandoffManager and it never returns, and it also freezes the node that it's invoked against. It's a 3-node cluster, and all nodes have been up and running without issue for a while. Any help on where to start with this? Column Family: HintsColumnFamily SSTable count: 11 Space used (live): 11271669539 Space used (total): 11271669539 Number of Keys (estimate): 1408 Memtable Columns Count: 338 Memtable Data Size: 0 Memtable Switch Count: 1 Read Count: 3 Read Latency: 4354.669 ms. Write Count: 848 Write Latency: 0.029 ms. Pending Tasks: 0 Bloom Filter False Postives: 0 Bloom Filter False Ratio: 0.0 Bloom Filter Space Used: 12656 Key cache capacity: 14 Key cache size: 11 Key cache hit rate: 0. Row cache: disabled Compacted row minimum size: 105779 Compacted row maximum size: 7152383774 Compacted row mean size: 590818614 Thanks, Bryce
RE: tmp files in /var/lib/cassandra/data
I'm seeing this also, and my nodes have started crashing with too many open file errors. Running lsof I see lots of these open tmp files. java 8185 root 911u REG 8,32 38 129108266 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db java 8185 root 912u REG 8,32 0 155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db java 8185 root 913u REG 8,32 0 155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db java 8185 root 914u REG 8,32 0 155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db java 8185 root 916u REG 8,32 0 155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db java 8185 root 918u REG 8,32 0 155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db java 8185 root 919u REG 8,32 0 155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db java 8185 root 920u REG 8,32 0 155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db java 8185 root 921u REG 8,32 0 129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db java 8185 root 922u REG 8,32 38 129108273 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db java 8185 root 923u REG 8,32 0 155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db java 8185 root 929u REG 8,32 38 129108262 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db java 8185 root 947u REG 8,32 0 129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db java 8185 root 948u REG 8,32 38 129108285 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db java 8185 root 954u REG 8,32 0 155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db java 8185 root 955u REG 8,32 0 155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db Going to try rolling back to 1.0.5 for the time being even though I was hoping to use one of 
the fixes in 1.0.6 -Original Message- From: Ramesh Natarajan [mailto:rames...@gmail.com] Sent: Wednesday, December 14, 2011 6:03 PM To: user@cassandra.apache.org Subject: tmp files in /var/lib/cassandra/data We are using leveled compaction running cassandra 1.0.6. I checked the data directory (/var/lib/cassandra/data) and i see these 0 bytes tmp files. What are these files? thanks Ramesh -rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Data.db -rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Index.db -rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Data.db -rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Index.db -rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Data.db -rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Index.db -rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Data.db -rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Index.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Data.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Index.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Data.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Index.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Data.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Index.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Data.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Index.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Data.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Index.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Data.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Index.db -rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Data.db -rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 
uid-tmp-hc-200-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-200-Index.db
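To quantify the symptom in this thread (leftover zero-byte "-tmp-" sstable files piling up and eventually exhausting file descriptors), a small sketch that counts such files under a data directory. The "-tmp-" naming pattern is taken from the listings above; the function itself is an assumption, not a Cassandra tool.

```python
import os

def count_tmp_sstables(data_dir):
    """Count leftover '-tmp-' sstable files under a Cassandra data directory,
    e.g. 'uid-tmp-hc-106-Data.db' or 'HintsColumnFamily-tmp-hc-1092-Data.db'."""
    count = 0
    for _root, _dirs, files in os.walk(data_dir):
        for name in files:
            if "-tmp-" in name and name.endswith(".db"):
                count += 1
    return count
```

Sampling this over time (alongside lsof output) shows whether the tmp files are transient compaction artifacts or genuinely accumulating, as both posters report.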
RE: node stuck leaving on 1.0.5
So I got past the leaving problem once I found the removetoken force command. Now I'm trying to move tokens and that will never complete either, but as I was watching netstats for streaming to the moving node I noticed it seemed to stop all of a sudden and list no more pending streams. At the same time on the moving node this is in the system log: ERROR [Thread-455] 2011-12-13 16:15:51,939 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-455,5,main] java.lang.AssertionError at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141) at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481) at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275) at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237) at org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242) at org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920) at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81) Now I see no streams going on between any nodes, and the node is still listed as moving when viewing the ring. From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] Sent: Sunday, December 11, 2011 11:02 PM To: user@cassandra.apache.org Subject: node stuck leaving on 1.0.5 I have a dead node I need to remove from the cluster so that I can rebalance among the existing servers (can't replace it for a while). I used nodetool removetoken and it's been stuck in the leaving state for over a day now. 
I've tried a rolling restart, which kicks off some streaming for a while under netstats, but now even that lists nothing going on. I'm stuck on what to do next to get this node to finally leave so I can move the tokens around. The only error I see in the system log: ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-209,5,main] java.lang.AssertionError at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141) at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481) at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275) at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237) at org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242) at org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920) at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
node stuck leaving on 1.0.5
I have a dead node I need to remove from the cluster so that I can rebalance among the existing servers (I can't replace it for a while). I used nodetool removetoken and it's been stuck in the leaving state for over a day now. I've tried a rolling restart, which kicks off some streaming for a while under netstats, but now even that lists nothing going on. I'm stuck on what to do next to get this node to finally leave so I can move the tokens around. The only error I see in the system log: ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-209,5,main] java.lang.AssertionError at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141) at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481) at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275) at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237) at org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242) at org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920) at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
RE: Client Timeouts on incrementing counters
I'm seeing this same problem after an upgrade to 1.0.3 from 0.8. Nothing changed with the column family storing the counters, but now it just constantly times out trying to increment them. No errors in the event logs or any other issues with my cluster. Did you find a resolution? From: Carlos Rolo [mailto:c.r...@ocom.com] Sent: Monday, November 14, 2011 12:34 AM To: user@cassandra.apache.org Subject: RE: Client Timeouts on incrementing counters I have dug a bit more to try to find the root cause of the error, and I have some more information. It seems that it all started after I upgraded Cassandra from 0.8.x to 1.0.0. When I do an incr on the CLI I also get a timeout. row_cache_save_period_in_seconds is set to 60 sec. Could it be a problem from the upgrade? I just did a rolling restart of all nodes one by one. From: Tyler Hobbs [mailto:ty...@datastax.com] Sent: vrijdag 11 november 2011 20:18 To: user@cassandra.apache.org Subject: Re: Client Timeouts on incrementing counters On Fri, Nov 11, 2011 at 7:17 AM, Carlos Rolo c.r...@ocom.com wrote: Also Cassandra logs have lots (as in, several times per second) of this message now: INFO 14:15:25,740 Saved ClusterCassandra-CounterFamily-RowCache (52 items) in 1 ms What does the CLI say the row_cache_save_period_in_seconds for this CF is? -- Tyler Hobbs DataStax http://datastax.com/
RE: Problem after upgrade to 1.0.1
I have no errors in my system.log, just these types of warnings occasionally: WARN [pool-1-thread-1] 2011-11-08 00:03:44,726 Memtable.java (line 167) setting live ratio to minimum of 1.0 instead of 0.9511448007676252 I did find the problem with my data drive consumption being so large, as I did not know that running scrub after the upgrade would take a snapshot of the data. Once I removed all the snapshots, the data drive is back down to where I expect it to be, although the Load numbers reported by ring are much larger than what is in the data drive. I've also upgraded to 1.0.2 and re-run scrub, and now I can run cfstats again, so thanks for that. Although I'm still confused about why the hints CF has become so large on a few of the nodes: Column Family: HintsColumnFamily SSTable count: 11 Space used (live): 127490858389 Space used (total): 72123363085 Number of Keys (estimate): 1408 Memtable Columns Count: 43174 Memtable Data Size: 44376138 Memtable Switch Count: 103 Read Count: 494 Read Latency: NaN ms. Write Count: 30970531 Write Latency: NaN ms. Pending Tasks: 0 Key cache capacity: 14 Key cache size: 10 Key cache hit rate: NaN Row cache: disabled Compacted row minimum size: 88149 Compacted row maximum size: 53142810146 Compacted row mean size: 6065512727 -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, November 04, 2011 9:29 AM To: user@cassandra.apache.org Subject: Re: Problem after upgrade to 1.0.1 One possibility: if you're overloading the cluster, replicas will drop updates to avoid OOMing. (This is logged at WARN level.) Before 1.x Cassandra would just let that slide, but with 1.0 it started recording hints for those. On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: Thanks for the help so far. Is there any way to find out why my HintsColumnFamily is so large now, since it wasn't this way before the upgrade and it seems to keep climbing? 
I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints() thinking I have a bunch of stale hints from upgrade issues, but it just eventually times out. Plus the node it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, November 03, 2011 5:06 PM To: user@cassandra.apache.org Subject: Re: Problem after upgrade to 1.0.1 I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that patch and rerun scrub the exception should go away. On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: A restart fixed the load numbers; they are back to where I expect them to be now, but disk utilization is double the load #. I'm also still getting the cfstats exception from any node. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, November 03, 2011 11:52 AM To: user@cassandra.apache.org Subject: Re: Problem after upgrade to 1.0.1 Does restarting the node fix this? On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: Disk utilization is actually about 80% higher than what is reported by nodetool ring across all my nodes on the data drive Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | M: 206.849.2477 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com] Sent: Thursday, November 03, 2011 11:47 AM To: user@cassandra.apache.org Subject: RE: Problem after upgrade to 1.0.1 Regarding load growth, presumably you are referring to the load as reported by JMX/nodetool. Have you actually looked at the disk utilization on the nodes themselves? 
Potential issue I have seen: http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html Dan From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] Sent: November-03-11 14:40 To: user@cassandra.apache.org Subject: Problem after upgrade to 1.0.1 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. But now I'm having extreme load growth on one of my nodes (and others are growing faster than usual also). I attempted to run cfstats against the extremely large node that was seeing 2x the load of others and I get the error below. I also went into the o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see if it was growing out of control for some reason, but that just times out eventually for any node. I'm not sure what to do next with this issue.
Problem after upgrade to 1.0.1
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. But now I'm having extreme load growth on one of my nodes (and others are growing faster than usual also). I attempted to run cfstats against the extremely large node that was seeing 2x the load of others and I get the error below. I also went into the o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see if it was growing out of control for some reason, but that just times out eventually for any node. I'm not sure what to do next with this issue. Column Family: HintsColumnFamily SSTable count: 3 Space used (live): 12681676437 Space used (total): 10233130272 Number of Keys (estimate): 384 Memtable Columns Count: 117704 Memtable Data Size: 115107307 Memtable Switch Count: 66 Read Count: 0 Read Latency: NaN ms. Write Count: 21203290 Write Latency: 0.014 ms. Pending Tasks: 0 Key cache capacity: 3 Key cache size: 0 Key cache hit rate: NaN Row cache: disabled Compacted row minimum size: 30130993 Compacted row maximum size: 9223372036854775807 Exception in thread main java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170) at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395) at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65) at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666) at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Bryce Godfrey | Sr. Software Engineer | Azaleos Corporationhttp://www.azaleos.com/ | T: 206.926.1978 | M: 206.849.2477
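For context on that exception: Cassandra tracks row sizes in a bucketed histogram, and once a sample lands beyond the largest bucket bound (the Long.MAX_VALUE "Compacted row maximum size" above is the tell), a mean can no longer be computed. A simplified sketch of the idea, not Cassandra's actual EstimatedHistogram code:

```python
# Simplified model of a bucketed size histogram, illustrating why a mean
# cannot be computed once a sample falls past the largest bucket bound.
# This mirrors the idea behind the error above, not Cassandra's code.

class BucketedHistogram:
    def __init__(self, offsets):
        self.offsets = list(offsets)                  # upper bounds, ascending
        self.counts = [0] * (len(self.offsets) + 1)   # +1 overflow bucket

    def add(self, value):
        for i, bound in enumerate(self.offsets):
            if value <= bound:
                self.counts[i] += 1
                return
        self.counts[-1] += 1                          # too large: overflow bucket

    def mean(self):
        if self.counts[-1] > 0:
            # No upper bound is known for overflowed samples, so any mean
            # would be a guess -- refuse, like the cfstats error does.
            raise ValueError("histogram overflowed; mean is undefined")
        total = sum(self.counts[:-1])
        weighted = sum(c * b for c, b in zip(self.counts[:-1], self.offsets))
        return weighted / total if total else 0.0

h = BucketedHistogram([10, 100, 1000])
h.add(5)
h.add(50)
print(h.mean())   # -> 55.0, fine while everything fits in a bucket
h.add(10_000)     # larger than the largest bound: from now on mean() raises
```

The patch referenced later in this thread addresses the resulting exception; the sketch only shows why an overflowed histogram cannot report a mean.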
RE: Problem after upgrade to 1.0.1
Nope. The only change I have made since the upgrade is altering two of my own column families to use Leveled compaction and then running scrub on each node.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | M: 206.849.2477

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 11:44 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Just to rule it out: you didn't do anything tricky like update HintsColumnFamily to use compression?

On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. But now I'm seeing extreme load growth on one of my nodes (and the others are growing faster than usual also). [rest of quoted message, cfstats output, and stack trace snipped]
RE: Problem after upgrade to 1.0.1
Disk utilization is actually about 80% higher than what nodetool ring reports, across all my nodes, on the data drive.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | http://www.azaleos.com/ | T: 206.926.1978 | M: 206.849.2477

From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
Sent: Thursday, November 03, 2011 11:47 AM
To: user@cassandra.apache.org
Subject: RE: Problem after upgrade to 1.0.1

Regarding load growth, presumably you are referring to the load as reported by JMX/nodetool. Have you actually looked at the disk utilization on the nodes themselves?

Potential issue I have seen: http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html

Dan

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: November-03-11 14:40
To: user@cassandra.apache.org
Subject: Problem after upgrade to 1.0.1

I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. [rest of quoted message, cfstats output, and stack trace snipped]
RE: Problem after upgrade to 1.0.1
A restart fixed the load numbers; they are back to where I expect them to be now, but disk utilization is double the load number. I'm also still getting the cfstats exception from every node.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Does restarting the node fix this?

On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
Disk utilization is actually about 80% higher than what nodetool ring reports, across all my nodes, on the data drive. [rest of quoted thread history, cfstats output, and stack trace snipped]
RE: Problem after upgrade to 1.0.1
Thanks for the help so far. Is there any way to find out why my HintsColumnFamily is so large now, since it wasn't this way before the upgrade and it seems to just keep climbing? I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have a bunch of stale hints from upgrade issues, but it just eventually times out. Plus the node it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that patch and rerun scrub, the exception should go away.

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
A restart fixed the load numbers; they are back to where I expect them to be now, but disk utilization is double the load number. [rest of quoted thread history, cfstats output, and stack trace snipped]
Running on Windows
I'm wondering what the consensus is on running a Cassandra cluster on top of Windows boxes? We are currently running a small 5 node cluster on top of CentOS without problems, so I have no desire to move. But we are a Windows shop, and I have an IT department that is scared of Linux since they don't know how to manage it. My primary arguments against moving were community support (I haven't seen or heard of anybody else running it on Windows), performance, and stability. The last two are mostly guesses on my part, but my thought was that Java on Windows just does not perform as well. We have a very high write load, and are adding about 5 GB a day of data with a 3 month retention. I really don't want to move a stable system onto an unknown just because my IT department fears the unknown, so I'm looking for some ammo. Thanks :) ~Bryce
RE: Completely removing a node from the cluster
Taking the cluster down completely did remove the phantom node. The HintsColumnFamily is causing a lot of commit logs to back up, though, threatening to run the commit log drive out of space. A manual flush of that column family always clears out the files.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Tuesday, August 23, 2011 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote:
I'm running low on ideas for this one. Anyone else?

If the phantom node is not listed in the ring, other nodes should not be storing hints for it. You can see what nodes they are storing hints for via JConsole. I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071

--Brandon
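On the commit log backlog: a commit log segment can only be reclaimed once every column family with unflushed writes in that segment has been flushed, so a hints memtable that never flushes pins every segment it ever touched, and a manual flush releases them all at once. A toy model of that bookkeeping (illustrative only, not Cassandra's implementation):

```python
# Toy model of why one unflushed column family can pin commit log
# segments: a segment is reclaimable only when every CF that wrote into
# it has been flushed. Names and segment sizing are illustrative.

class CommitLog:
    def __init__(self):
        self.segments = []            # each segment: set of dirty CF names

    def write(self, cf):
        # Rotate to a new (tiny, for the example) segment when full.
        if not self.segments or len(self.segments[-1]) >= 2:
            self.segments.append(set())
        self.segments[-1].add(cf)

    def flush(self, cf):
        for seg in self.segments:
            seg.discard(cf)           # cf is no longer dirty anywhere

    def reclaim(self):
        kept = [s for s in self.segments if s]   # keep segments with dirty CFs
        freed = len(self.segments) - len(kept)
        self.segments = kept
        return freed

log = CommitLog()
for _ in range(3):
    log.write("Events")
    log.write("HintsColumnFamily")    # hints land in every segment

log.flush("Events")
print(log.reclaim())                  # -> 0: hints still dirty, all pinned
log.flush("HintsColumnFamily")
print(log.reclaim())                  # -> 3: every segment freed at once
```

This matches the symptom described: commit logs accumulate until the hints column family is flushed, then clear out in one go.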
RE: Completely removing a node from the cluster
Could this ghost node be causing my hints column family to grow to this size? I also crash after about 24 hours due to commit log growth taking up all the drive space. A manual nodetool flush keeps it under control though.

Column Family: HintsColumnFamily
SSTable count: 6
Space used (live): 666480352
Space used (total): 666480352
Number of Keys (estimate): 768
Memtable Columns Count: 1043
Memtable Data Size: 461773
Memtable Switch Count: 3
Read Count: 38
Read Latency: 131.289 ms.
Write Count: 582108
Write Latency: 0.019 ms.
Pending Tasks: 0
Key cache capacity: 7
Key cache size: 6
Key cache hit rate: 0.8334
Row cache: disabled
Compacted row minimum size: 2816160
Compacted row maximum size: 386857368
Compacted row mean size: 120432714

Is there a way for me to manually remove this dead node?

-----Original Message-----
From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Sunday, August 21, 2011 9:09 PM
To: user@cassandra.apache.org
Subject: RE: Completely removing a node from the cluster

It's been at least 4 days now. [rest of quoted thread history snipped]
RE: Completely removing a node from the cluster
Both .2 and .3 report the same thing from the MBean: UnreachableNodes is an empty collection, and LiveNodes still lists all 3 nodes:
192.168.20.2
192.168.20.3
192.168.20.1

The removetoken was done a few days ago, and I believe the remove was done from .2. Here is what the ring output looks like; not sure why I get that token on the empty first line either:

Address       DC           Rack   Status  State   Load      Owns     Token
                                                                     85070591730234615865843651857942052864
192.168.20.2  datacenter1  rack1  Up      Normal  79.53 GB  50.00%   0
192.168.20.3  datacenter1  rack1  Up      Normal  42.63 GB  50.00%   85070591730234615865843651857942052864

Yes, both nodes show the same thing when doing a describe cluster: that .1 is unreachable.

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 4:23 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

Unreachable nodes either did not respond to the message or were known to be down and were not sent a message. The way the node lists are obtained for the ring command and describe cluster is the same, so it's a bit odd. Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean? What do the LiveNodes and UnreachableNodes attributes say? Also, how long ago did you remove the token, and on which machine? Do both 20.2 and 20.3 think 20.1 is still around?

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
I'm on 0.8.4. I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. [rest of quoted message snipped]
RE: Completely removing a node from the cluster
It's been at least 4 days now.

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place, but uses the tokens to drive the whole process. I'm guessing here, don't have time to check all the code. But there is a 3 day timeout in the gossip system. Not sure if it applies in this case. Anyone know?

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:
Both .2 and .3 report the same thing from the MBean: UnreachableNodes is an empty collection, and LiveNodes still lists all 3 nodes. [rest of quoted thread history snipped]
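To make the distinction in that reply concrete: if the ring command walks the token map while the schema-agreement check walks the full gossip endpoint list, a node purged from the token map but still lingering in gossip appears only in the latter, exactly like the ghost node here. A toy model of the two views (hypothetical data structures, not Cassandra's actual code):

```python
# Toy model of a ghost node: `ring` is driven by the token map, while
# the schema/describe-cluster view is driven by the gossip endpoint
# list. A node removed from one but not the other only shows up in one
# view. Purely illustrative; not Cassandra's implementation.

token_map = {
    0: "192.168.20.2",
    85070591730234615865843651857942052864: "192.168.20.3",
}
gossip_endpoints = {"192.168.20.1", "192.168.20.2", "192.168.20.3"}  # stale .1

def ring_view():
    # Token-driven: the removed node has no token, so it never appears.
    return [token_map[t] for t in sorted(token_map)]

def schema_view(responding):
    # Endpoint-driven: anything still in gossip but not responding
    # gets reported as UNREACHABLE.
    unreachable = gossip_endpoints - responding
    return {"agreed": sorted(responding), "UNREACHABLE": sorted(unreachable)}

print(ring_view())                                    # ghost absent here
print(schema_view({"192.168.20.2", "192.168.20.3"}))  # ghost reappears here
```

This is consistent with the symptoms in the thread: nodetool ring shows only two nodes, while describe cluster keeps listing 192.168.20.1 as UNREACHABLE until the stale gossip state is cleared.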
Completely removing a node from the cluster
I'm on 0.8.4. I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, each owning 50% of the tokens. However, I can still see it being considered part of the cluster from the cassandra-cli (192.168.20.1 being the removed node), and I'm worried that the cluster is still queuing up hints for the node, or that it may cause other issues:

Cluster Information:
  Snitch: org.apache.cassandra.locator.SimpleSnitch
  Partitioner: org.apache.cassandra.dht.RandomPartitioner
  Schema versions:
    dcc8f680-caa4-11e0--553d4dced3ff: [192.168.20.2, 192.168.20.3]
    UNREACHABLE: [192.168.20.1]

Do I need to do something else to completely remove this node?

Thanks,
Bryce
RE: No space left on device problem when starting Cassandra
That did it. Once I moved the logs over to a folder on the /dev drive and deleted the old logs directory, it started up. Thanks!

-----Original Message-----
From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
Sent: Tuesday, May 31, 2011 6:40 PM
To: user@cassandra.apache.org
Subject: Re: No space left on device problem when starting Cassandra

at org.apache.log4j.Category.info(Category.java:666)

It seems that your Cassandra can't write its log because the device is full. Check where your Cassandra log is written to. The log file path is configured in the log4j.appender.R.File property in conf/log4j-server.properties.

maki

2011/6/1 Bryce Godfrey bryce.godf...@azaleos.com:
Hi there, I'm a bit new to Linux and Cassandra, so I'm hoping someone can help me with this. I've been evaluating Cassandra for the last few days and I'm now having a problem starting up the service. I receive the error below, and I'm unsure where I'm out of space and how to free up more.

azadmin@cassandra-01: $ sudo /usr/tmp/apache-cassandra-0.7.6-2/bin/cassandra -f
INFO 18:21:46,830 Logging initialized
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:290)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:294)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.info(Category.java:666)
    at org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:79)
INFO 18:21:46,841 Heap size: 16818110464/16819159040
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x7f35b493f571, pid=1234, tid=139869156091648
#
# JRE version: 6.0_22-b22
# Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.10.1
# Distribution: Ubuntu Natty (development branch), package 6b22-1.10.1-0ubuntu1
# Problematic frame:
# C  [libffi.so.5+0x2571]  ffi_prep_java_raw_closure+0x541
#
# An error report file with more information is saved as:
# /media/commitlogs/hs_err_pid1234.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
#
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I seem to have enough space, except in /dev/mapper/cassandra--01-root, and I'm unsure about that anyway:

azadmin@cassandra-01:/$ df -h
Filesystem                      Size  Used  Avail  Use%  Mounted on
/dev/mapper/cassandra--01-root  1.2G  1.2G     0  100%  /
none                             16G  236K   16G    1%  /dev
none                             16G     0   16G    0%  /dev/shm
none                             16G   36K   16G    1%  /var/run
none                             16G     0   16G    0%  /var/lock
/dev/sdb1                        33G  176M   33G    1%  /media/commitlogs
/dev/sdc1                        66G  180M   66G    1%  /media/data
/dev/sda1                       228M   23M  193M   11%  /boot

Thanks,
~Bryce
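Since the root cause here was the log directory sitting on a full root partition while the data drives had plenty of room, a quick sanity check is to ask the OS how much space is free on the filesystem holding each path of interest. A small sketch using Python's standard library (the mount points are examples):

```python
# Report free space on the filesystem that holds a given path -- useful
# for spotting a log directory that landed on a nearly-full partition
# while the data mounts still have room. Mount points are examples.

import shutil

def free_space(path):
    """Return (total_bytes, free_bytes) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.free

for mount in ["/", "/tmp"]:
    total, free = free_space(mount)
    print(f"{mount}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```

This is just the programmatic equivalent of the `df -h` output above; pointing it at the directory named by log4j.appender.R.File would have flagged the full root filesystem immediately.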