Read-repair working, repair not working?
Hi,

I have a 20 node cluster running v1.0.7 split between 5 data centres, each with an RF of 2, containing a ~1TB unique dataset / ~10TB of total data.

I’ve had some intermittent issues with data consistency/availability in a new data centre (3 nodes, RF=2) I brought online late last year: I’d request data and nothing would be returned; I would then re-request the data and it would correctly be returned, i.e. read-repair appeared to be occurring. However, running repairs on the nodes didn’t resolve this (I tried general ‘repair’ commands as well as targeted keyspace commands) – this didn’t alter the behaviour.

After a lot of fruitless investigation, I decided to wipe, re-install and re-populate the nodes. The re-install and repair operations are now complete: I see the expected amount of data on the nodes, however I am still seeing the same behaviour, i.e. I only get data after one failed attempt.

When I run repair commands, I don’t see any errors in the logs. I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during repair sessions. I see a number of dropped ‘MUTATION’ operations: just under 5% of the total ‘MutationStage’ count.

Questions:
- Could anybody suggest anything specific to look at to see why the repair operations aren’t having the desired effect?
- Would increasing the logging level to ‘DEBUG’ show read-repair activity (to confirm that this is happening, and for what proportion of total requests)?
- Is there something obvious that I could be missing here?

Many thanks,
Brian
High CPU usage during repair
Hi!

I run repair weekly, using a scheduled cron job. During repair I see high CPU consumption, and messages in the log file:

  INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264

From time to time, there are also messages of the form:

  INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 607) 1 READ messages dropped in last 5000ms

Using OpsCenter, JMX and nodetool compactionstats I can see that during the time the CPU consumption is high, there are compactions waiting.

I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances. I have the default settings:

  compaction_throughput_mb_per_sec: 16
  in_memory_compaction_limit_in_mb: 64
  multithreaded_compaction: false
  compaction_preheat_key_cache: true

I am thinking of the following solution, and wanted to ask if I am on the right track. I thought of adding a call to my repair script, before repair starts, to do:

  nodetool setcompactionthroughput 0

and then when repair finishes call:

  nodetool setcompactionthroughput 16

Is this the right solution?

Thanks,
Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
Re: Netflix/Astyanax Client for Cassandra
Sorry to hijack this email thread, but what are the use cases/benefits of using the new binary protocol? And why doesn't Cassandra offer a driver as part of the project itself?

Renato M.

2013/2/8 aaron morton aa...@thelastpickle.com:

I'm going to guess Netflix are running Astyanax in production with Cassandra 1.1.

cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 8/02/2013, at 6:50 AM, Cassa L lcas...@gmail.com wrote:

Thank you all for the responses to this thread. I am planning to use Cassandra 1.1.9 with Astyanax. Does anyone have Cassandra 1.x running in production with Astyanax? Did you come across any show-stopper issues?

Thanks,
LCassa

On Thu, Feb 7, 2013 at 8:50 AM, Bartłomiej Romański b...@sentia.pl wrote:

Hi,

Does anyone know about virtual nodes support in Astyanax? Are they handled correctly, especially with ConnectionPoolType.TOKEN_AWARE?

Thanks,
BR
Re: High CPU usage during repair
> During repair I see high CPU consumption,

Repair reads the data and computes a hash; this is a CPU intensive operation. Is the CPU overloaded, or just under load?

> I run Cassandra version 1.0.11, on 3 node setup on EC2 instances.

What machine size?

> there are compactions waiting.

That's normally ok. How many are waiting?

> I thought of adding a call to my repair script, before repair starts, to do:
> nodetool setcompactionthroughput 0
> and then when repair finishes call:
> nodetool setcompactionthroughput 16

That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this.

Try reducing the compaction throughput to, say, 12 normally and see the effect.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
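For reference, the checks Aaron mentions can be run along these lines; a minimal sketch, assuming nodetool is on the PATH and can reach the node locally (host/port flags omitted):

  # How many compactions are waiting? See the "pending tasks" line:
  nodetool compactionstats

  # Dropped messages are also summarised at the bottom of tpstats:
  nodetool tpstats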
Re: Read-repair working, repair not working?
> I’d request data, nothing would be returned, I would then re-request the data and it would correctly be returned:

What CL are you using for reads and writes?

> I see a number of dropped ‘MUTATION’ operations: just under 5% of the total ‘MutationStage’ count.

Dropped mutations in a multi-DC setup may be a sign of network congestion or overloaded nodes.

> - Could anybody suggest anything specific to look at to see why the repair operations aren’t having the desired effect?

I would first build a test case to ensure correct operation when using strong consistency, i.e. QUORUM write and read. Because you are using RF 2 per DC, I assume you are not using LOCAL_QUORUM, because that is 2 and you would not have any redundancy in the DC.

> - Would increasing the logging level to ‘DEBUG’ show read-repair activity (to confirm that this is happening, and for what proportion of total requests)?

It would, but the INFO logging for the AES is pretty good. I would hold off for now.

> - Is there something obvious that I could be missing here?

When a new AES session starts it logs this:

  logger.info(String.format("[repair #%s] new session: will sync %s on range %s for %s.%s", getName(), repairedNodes(), range, tablename, Arrays.toString(cfnames)));

When it completes it logs this:

  logger.info(String.format("[repair #%s] session completed successfully", getName()));

Or this on failure:

  logger.error(String.format("[repair #%s] session completed with the following error", getName()), exception);

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
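Given the log formats above, one quick way to confirm that repair sessions are actually starting and finishing is to grep the system log on each node; the log path below is the packaged-install default and may differ on your install:

  # Sessions started vs. completed:
  grep -c 'new session: will sync' /var/log/cassandra/system.log
  grep -c 'session completed successfully' /var/log/cassandra/system.log

  # Any sessions that failed:
  grep 'session completed with the following error' /var/log/cassandra/system.log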
Re: High CPU usage during repair
Hi!
Thanks for the response. See my answers and questions below.
Thanks!
Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

On Sun, Feb 10, 2013 at 10:04 PM, aaron morton aa...@thelastpickle.com wrote:

> Repair reads the data and computes a hash; this is a CPU intensive operation. Is the CPU overloaded, or just under load?

Usually just under load, but in the past two weeks I have seen CPU of over 90%!

> What machine size?

m1.large

> That's normally ok. How many are waiting?

I have seen 4 this morning.

> That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this. Try reducing the compaction throughput to, say, 12 normally and see the effect.

Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether repair is ongoing or not? I will do it using nodetool, and change the yaml file in case a restart occurs in the future?
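Concretely, the two changes being discussed would look something like this; the value 12 is simply the figure suggested above:

  # Takes effect immediately on the running node, no restart needed:
  nodetool setcompactionthroughput 12

and in cassandra.yaml, so the setting survives a restart:

  compaction_throughput_mb_per_sec: 12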
Re: Issues with writing data to Cassandra column family using a Hive script
Don't use the variable-length Cassandra integer; use Int32Type. It also sounds like you want to use DoubleType rather than FloatType.

http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 4:15 PM, Dinusha Dilrukshi sdddilruk...@gmail.com wrote:

Hi All,

Data was originally stored in a column family called test_cf. The definition of the column family is as follows:

  CREATE COLUMN FAMILY test_cf
    WITH COMPARATOR = 'IntegerType'
    AND key_validation_class = UTF8Type
    AND default_validation_class = FloatType;

And following is the sample data set contained in test_cf:

  cqlsh:temp_ks> select * from test_cf;
   key            | column1    | value
  ----------------+------------+-------
   localhost:8282 | 1350468600 |    76
   localhost:8282 | 1350468601 |    76

The Hive script (shown at the end of this mail) is used to take the data from the above column family test_cf and insert it into a new column family called cpu_avg_5min_new7. The column family definition of cpu_avg_5min_new7 is the same as test_cf.

The issue is, the data written into the cpu_avg_5min_new7 column family after executing the Hive script is as follows. It's not in the format of the data present in the original column family test_cf. Any explanations would be highly appreciated.

  cqlsh:temp_ks> select * from cpu_avg_5min_new7;
   key            | column1                  | value
  ----------------+--------------------------+----------
   localhost:8282 | 232340574229062170849328 | 1.09e-05
   localhost:8282 | 232340574229062170849329 | 1.09e-05

Hive script:

  drop table cpu_avg_5min_new7_hive;
  CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive
    (src_id STRING, start_time INT, cpu_avg FLOAT)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES (
    "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "temp_ks",
    "cassandra.ks.username" = "xxx",
    "cassandra.ks.password" = "xxx",
    "cassandra.columns.mapping" = ":key,:column,:value",
    "cassandra.cf.name" = "cpu_avg_5min_new7"
  );

  drop table xxx;
  CREATE EXTERNAL TABLE IF NOT EXISTS xxx
    (src_id STRING, start_time INT, cpu_avg FLOAT)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES (
    "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "temp_ks",
    "cassandra.ks.username" = "xxx",
    "cassandra.ks.password" = "xxx",
    "cassandra.columns.mapping" = ":key,:column,:value",
    "cassandra.cf.name" = "test_cf"
  );

  insert overwrite table cpu_avg_5min_new7_hive
    select src_id, start_time, cpu_avg from xxx;

Regards,
Dinusha.
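Applied to the quoted definition, the corrected column family might look like this; a sketch that simply swaps in the types suggested above, assuming they are available in your Cassandra version (the matching Hive column would then be declared DOUBLE rather than FLOAT):

  CREATE COLUMN FAMILY test_cf
    WITH COMPARATOR = 'Int32Type'
    AND key_validation_class = UTF8Type
    AND default_validation_class = DoubleType;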
Re: Cassandra 1.1.2 -> 1.1.8 upgrade
I would do #1.

You can play with nodetool setcompactionthroughput to speed things up, but beware: nothing comes for free.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:40 AM, Mike mthero...@yahoo.com wrote:

Thank you,

Another question on this topic. Upgrading from 1.1.2 to 1.1.9 requires running upgradesstables, which will take many hours on our dataset (about 12). For this upgrade, is it recommended that I:

1) Upgrade all the DB nodes to 1.1.9 first, then go around the ring and run a staggered upgrade of the sstables over a number of days.
2) Upgrade one node at a time, running the cluster in a mixed 1.1.2/1.1.9 configuration for a number of days.

I would prefer #1, as with #2, streaming will not work until all the nodes are upgraded.

I appreciate your thoughts,
-Mike

On 1/16/2013 11:08 AM, Jason Wee wrote:

Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run nodetool upgradesstables if your CF has counters.

On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote:

Hello,

We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or possibly 1.1.9 depending on timing). It is my understanding that rolling upgrades of Cassandra are supported, so as we upgrade our cluster, we can do so one node at a time without experiencing downtime.

Has anyone hit any gotchas recently that I should be aware of before performing this upgrade? In order to upgrade, are the JAR files the only thing that needs to change? Can everything remain as-is?

Thanks,
-Mike
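For option #1, the staggered pass could be as simple as the following, run against one node at a time once every node is on 1.1.9; the hostnames are placeholders:

  nodetool -h db-1 upgradesstables
  # wait for it to complete and for compaction load to settle, then:
  nodetool -h db-2 upgradesstables
  # ...and so on around the ring over a number of days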
Re: Cassandra flush spin?
Sounds like flushing due to memory consumption. The flush log messages include the number of ops, so you can see if this node was processing more mutations than the others. Try to see if there was more (serialised) data being written or more operations being processed.

Also, just for fun, check the JVM and yaml settings are as expected.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:29 AM, Mike mthero...@yahoo.com wrote:

Hello,

We just hit a very odd issue in our Cassandra cluster. We are running Cassandra 1.1.2 in a 6 node cluster. We use a replication factor of 3, and all operations utilize LOCAL_QUORUM consistency.

We noticed a large performance hit in our application's maintenance activities and I've been investigating. I discovered a node in the cluster that was flushing a memtable like crazy. It was flushing every 2-3 minutes, and has apparently been doing this for days. Typically, during this time of day, a flush would happen every 30 minutes or so.

  alldb.sh "cat /var/log/cassandra/system.log | grep \"flushing high-traffic column family CFS(Keyspace='open', ColumnFamily='msgs')\" | grep 02-08 | wc -l"
  [1] 18:41:04 [SUCCESS] db-1c-1
  59
  [2] 18:41:05 [SUCCESS] db-1c-2
  48
  [3] 18:41:05 [SUCCESS] db-1a-1
  1206
  [4] 18:41:05 [SUCCESS] db-1d-2
  54
  [5] 18:41:05 [SUCCESS] db-1a-2
  56
  [6] 18:41:05 [SUCCESS] db-1d-1
  52

I restarted the database node and, at least for now, the problem appears to have stopped.

There are a number of things that don't make sense here. We use a replication factor of 3, so if this was being caused by our application, I would have expected 3 nodes in the cluster to have issues. Also, I would have expected the issue to continue once the node restarted.

Another point of interest, and I'm wondering if it has exposed a bug: this node was recently converted to use ephemeral storage on EC2, and was restored from a snapshot. After the restore, a nodetool repair was run. However, the repair was going to run into some heavy activity for our application, and we cancelled that validation compaction (2 of the 3 anti-entropy sessions had completed). The spin appears to have started at the start of the second session.

Any hints?
-Mike
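To compare the ops counts Aaron mentions, a grep along these lines could be run on each node; the exact wording of the flush lines varies between Cassandra versions, so treat the pattern as an assumption to check against your own system.log:

  # The flush-enqueue lines include serialized/live bytes and an ops count:
  grep 'Enqueuing flush of Memtable-msgs' /var/log/cassandra/system.log | tail -20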
Re: persisted ring state
> Is that the right way to do it?

No. If you want to change the token for a node, use nodetool move. Changing it like this will not make the node change its token, because after startup the token is stored in the system keyspace's LocationInfo CF.

> or is -Dcassandra.load_ring_state=false|true only limited to changes to seed/listen_address?

It's used when a node somehow has a bad view of the ring and you want it to forget things.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 3:35 AM, S C as...@outlook.com wrote:

In one of the scenarios that I encountered, I needed to change the token on a node. I added the new token and started the node with -Dcassandra.load_ring_state=false, in anticipation that the node would not pick up the locally persisted data. Is that the right way to do it? Or is -Dcassandra.load_ring_state=false|true only limited to changes to seed/listen_address?

Thanks,
SC
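For example, moving a node to a new token with nodetool might look like this; the token value is a placeholder:

  nodetool move 85070591730234615865843651857942052864

  # afterwards, remove data the node no longer owns:
  nodetool cleanup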
Re: Issues with writing data to Cassandra column family using a Hive script
Hi Aaron,

Thanks for the reply. I'll try out your suggestion.

Regards,
Dinusha.
Querying composite keys
Hello

I have the key and columns defined in the following fashion:

  Row key:  Key1:TimeStamp:VersionNum
  Columns:  HotelName1:RoomNum1 | HotelName2:RoomNum2 | HotelName3:RoomNum3

Is there a way that I can query this schema by only 'key' or 'HotelName', i.e. querying using a part of the composite key and not the full key?

Thanks and Regards
Rishabh Agrawal
Re: Querying composite keys
You can query over composite columns by:

1) The partition key
2) The first part of the clustered key (using EQ operators)

Secondary indexes over composite columns are not possible.

-Vivek
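In CQL3 terms (Cassandra 1.2+) the layout and the allowed queries might be sketched as below; the table and column names are invented to mirror the composites in the question:

  -- Hypothetical modelling of Key1:TimeStamp:VersionNum rows
  -- holding HotelName:RoomNum columns:
  CREATE TABLE hotel_rooms (
    key        text,
    ts         timestamp,
    version    int,
    hotel_name text,
    room_num   int,
    value      text,
    PRIMARY KEY ((key, ts, version), hotel_name, room_num)
  );

  -- Allowed: the full partition key, optionally narrowed by an
  -- equality on the first clustering column:
  SELECT * FROM hotel_rooms
  WHERE key = 'Key1' AND ts = '2013-02-11 00:00:00' AND version = 1;

  SELECT * FROM hotel_rooms
  WHERE key = 'Key1' AND ts = '2013-02-11 00:00:00' AND version = 1
    AND hotel_name = 'HotelName1';

  -- Not allowed: querying by 'key' alone (a partial partition key),
  -- or filtering on room_num without fixing hotel_name first.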
Re: Cassandra 1.1.2 -> 1.1.8 upgrade
> 2) Upgrade one node at a time, running the cluster in a mixed 1.1.2/1.1.9 configuration for a number of days.

I'm about to upgrade my 1.1.0 cluster, and http://www.datastax.com/docs/1.1/install/upgrading#info says:

"If you are upgrading to Cassandra 1.1.9 from a version earlier than 1.1.7, all nodes must be upgraded before any streaming can take place. Until you upgrade all nodes, you cannot add version 1.1.7 nodes or later to a 1.1.7 or earlier cluster."

Which one is correct then? Can I run a mixed 1.1.2 (in my case 1.1.0) / 1.1.9 cluster or not?

M.