Mutation dropped and Read-Repair performance issue
Hi all,

We are seeing failures in the Read-Repair stage with DigestMismatch errors, at a rate of 300+ per day per node. At the same time, nodes become overloaded for a couple of seconds due to long GC pauses (around 7-8 seconds). We are not running repair regularly as a maintenance activity, because a node goes down whenever we run repair on the tables; after running repair, the node goes down due to long GC pauses again. For all tables except one, we can run repair with the --in-local-dc option.

Cluster configuration:
1. 15 node cluster.
2. RF=3.
3. Xmx and Xms of 31GB.
4. G1GC algorithm in use.
5. Version 3.11.2.
6. Load of roughly 500GB on each node.
7. One table carries the bulk of the load compared to the other tables.

Please suggest any configuration-level changes we can make to avoid the above problems. Also, is a high rate of digest mismatch messages a sign that a node is doing more read and write work than it otherwise would, and could that be the cause of the node getting overloaded at that particular moment?

--
Thanks,
S.R.
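As an aside on the last question: the quorum arithmetic behind digest mismatches can be sketched in a toy model. This is illustrative Python only, not Cassandra's implementation; the resolution rule here (largest value wins) stands in for Cassandra's timestamp-based reconciliation.

```python
# Toy model of a QUORUM read with digest checking (illustrative only).

def quorum(rf: int) -> int:
    """Quorum size for a replication factor: floor(RF/2) + 1."""
    return rf // 2 + 1

def quorum_read(replica_values):
    """Contact quorum(RF) replicas; a digest mismatch means at least
    one contacted replica holds stale data and must be repaired."""
    rf = len(replica_values)
    contacted = replica_values[:quorum(rf)]
    mismatch = len(set(contacted)) > 1
    # A mismatch costs extra work: a full data read plus repair writes
    # back to the stale replicas -- which is why a high mismatch rate
    # adds read *and* write load to the node.
    resolved = max(contacted)  # stand-in for "newest timestamp wins"
    return resolved, mismatch

print(quorum(3))               # 2 replicas must answer when RF=3
print(quorum_read([7, 7, 7]))  # consistent replicas: (7, False)
print(quorum_read([7, 3, 7]))  # one stale replica:   (7, True)
```

So every mismatch turns a cheap digest comparison into a full data read plus repair writes, which is consistent with the overload correlation asked about above.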
Re: Mutation dropped
Aaron, what did you mean by "RF 3 and CL QUORUM is a more real world scenario"? If there are only 2 nodes, where will the third replica be placed? And won't increasing the CL decrease read/write performance, and therefore increase the TimeoutExceptions discussed in this thread?

On Fri, Feb 22, 2013 at 1:59 PM, aaron morton aa...@thelastpickle.com wrote: [...]
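On the "only 2 nodes" question above: a toy placement model shows the essential point, namely that with fewer nodes than RF only that many distinct replicas can exist, while the consistency-level arithmetic is still computed from RF. Illustrative Python, not Cassandra's token-ring placement logic.

```python
# Toy replica placement (illustrative; real Cassandra walks the token
# ring via the configured replication strategy).

def place_replicas(nodes, rf):
    """A replica set never contains the same node twice, so with fewer
    nodes than RF the 'missing' replicas simply cannot be placed."""
    return nodes[:rf]

def quorum(rf):
    return rf // 2 + 1

nodes = ["node1", "node2"]
replicas = place_replicas(nodes, rf=3)  # only 2 replicas can exist
needed = quorum(3)                      # but QUORUM still needs 2 acks
# So with RF=3 on a 2-node cluster, every QUORUM request needs *both*
# nodes to respond -- there is no spare replica to absorb a slow node.
```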
Re: Mutation dropped
If you are running repair, using QUORUM, and there are no dropped writes, you should not be getting DigestMismatch during reads.

If everything else looks good, but the request latency is higher than the CF latency, I would check that client load is evenly distributed. Then start looking to see whether the request throughput is at its maximum for the cluster.

Cheers
- Aaron Morton
Freelance Cassandra Developer, New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 8:15 PM, Wei Zhu wz1...@yahoo.com wrote: [...]
Re: Mutation dropped
"What does rpc_timeout control? Only the reads/writes?"
Yes.

"... like data stream ..."
streaming_socket_timeout_in_ms in the yaml.

"... merkle tree request?"
Either no time out or a number of days, cannot remember which right now.

"What is the side effect if it's set to a really small number, say 20ms?"
You will probably get a lot more requests that fail with a TimedOutException. rpc_timeout needs to be longer than the time it takes a node to process the message, plus the time it takes the coordinator to do its thing. You can look at cfhistograms and proxyhistograms to get a better idea of how long a request takes in your system.

Cheers
- Aaron Morton
Freelance Cassandra Developer, New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu wz1...@yahoo.com wrote: [...]
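The rpc_timeout semantics described in this thread can be condensed into a toy coordinator: success as soon as CL replicas respond, TimedOutException only if fewer than CL responses arrive before the deadline. Illustrative Python with made-up timings, not Cassandra's code.

```python
# Toy coordinator: a request succeeds as soon as CL replicas ack; it
# fails with a timeout only if fewer than CL acks arrive in time.

class TimedOutException(Exception):
    pass

def coordinate(ack_times_ms, cl, rpc_timeout_ms):
    # Only acks that arrive before rpc_timeout count toward CL.
    in_time = sorted(t for t in ack_times_ms if t <= rpc_timeout_ms)
    if len(in_time) < cl:
        raise TimedOutException(f"only {len(in_time)}/{cl} acks in time")
    # The CL-th fastest ack sets the request latency; slower replicas
    # may still time out without failing the request.
    return in_time[cl - 1]

# Three replicas, one very slow; CL=2 still succeeds at the 2nd ack:
latency = coordinate([12, 30, 9000], cl=2, rpc_timeout_ms=10000)  # 30
```

This also shows why a very small rpc_timeout (say 20ms) produces many more TimedOutExceptions: fewer acks make it back before the deadline.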
Re: Mutation dropped
Thanks Aaron for the great information as always. I just checked cfhistograms and only a handful of read latencies are bigger than 100ms, but for proxyhistograms there are 10 times as many greater than 100ms. We are using QUORUM for reading with RF=3, and I understand the coordinator needs to get the digest from the other nodes, read-repair on a mismatch, etc. But is it normal to see the latency from proxyhistograms go beyond 100ms? Is there any way to improve that?

We are tracking the metrics from the client side and we see the 95th percentile response time averaging at 40ms, which is a bit high. Our 50th percentile was great, under 3ms. Any suggestion is very much appreciated.

Thanks.
-Wei

- Original Message -
From: aaron morton aa...@thelastpickle.com
To: Cassandra User user@cassandra.apache.org
Sent: Thursday, February 21, 2013 9:20:49 AM
Subject: Re: Mutation dropped
[...]
Re: Mutation dropped
What does rpc_timeout control? Only the reads/writes? How about other inter-node communication, like data streams and merkle tree requests? What is a reasonable value for rpc_timeout? The default value of 10 seconds seems way too long. What is the side effect if it's set to a really small number, say 20ms?

Thanks.
-Wei

From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org
Sent: Tuesday, February 19, 2013 7:32 PM
Subject: Re: Mutation dropped
[...]
Re: Mutation dropped
"Does the rpc_timeout not control the client timeout?"
No, it is how long a node will wait for a response from other nodes before raising a TimedOutException if fewer than CL nodes have responded. Set the client-side socket timeout using your preferred client.

"Is there any param which is configurable to control the replication timeout between nodes?"
There is no such thing. rpc_timeout is roughly like that, but it's not right to think about it that way. I.e. if a message to a replica times out and CL nodes have already responded, then we are happy to call the request complete.

Cheers
- Aaron Morton
Freelance Cassandra Developer, New Zealand
@aaronmorton
http://www.thelastpickle.com

On 19/02/2013, at 1:48 AM, Kanwar Sangha kan...@mavenir.com wrote: [...]
RE: Mutation dropped
Thanks Aaron. Does the rpc_timeout not control the client timeout? Is there any param which is configurable to control the replication timeout between nodes? Or is the same param used to control that, since the other node is also like a client?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 17 February 2013 11:26
To: user@cassandra.apache.org
Subject: Re: Mutation dropped
[...]
Re: Mutation dropped
You are hitting the maximum throughput on the cluster. The messages are dropped because the node fails to start processing them before rpc_timeout. However, the request is still a success because the client-requested CL was achieved.

Testing with RF 2 and CL 1 really just tests the disks on one local machine. Both nodes replicate each row, and writes are sent to each replica, so the only thing the client is waiting on is the local node writing to its commit log. Testing with (and running in prod) RF 3 and CL QUORUM is a more real world scenario.

Cheers
- Aaron Morton
Freelance Cassandra Developer, New Zealand
@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote: [...]
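The RF=2 / CL=1 behaviour described above can be reduced to a minimal sketch (illustrative Python, not the real write path): the client sees success on the first ack, so a dropped mutation on the second replica never surfaces as a client error; it only shows up later as an inconsistency to be repaired.

```python
# Toy write path for RF=2, CL=1 (illustrative): the client is
# acknowledged once any one replica persists the mutation, even if
# another replica drops it under load.

def write(replica_ok, cl):
    acks = sum(replica_ok)            # replicas that applied the mutation
    dropped = len(replica_ok) - acks  # replicas that dropped it
    success = acks >= cl              # client sees success at CL acks
    needs_repair = success and dropped > 0
    return success, needs_repair

# One replica overloaded and dropping mutations; client still succeeds:
result = write([True, False], cl=1)  # (True, True)
```

That (True, True) pair is exactly the situation in this thread: no client-side failures, yet "mutation dropped" in the logs and inconsistencies left for read repair or anti-entropy repair to fix.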
Mutation dropped
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a lot of "mutation dropped" messages. I understand that this is due to the replica not being written to the other node? RF=2, CL=1.

From the wiki: "For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair."

Thanks,
Kanwar
RE: Mutation dropped
Hi - Is there a parameter which can be tuned to prevent the mutations from being dropped? Is this logic correct? Node A and B with RF=2, CL=1, load balanced between the two:

--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1

Once we hit a very high TPS (around 50k/sec of inserts), the nodes start falling behind and we see the mutation dropped messages, but there are no failures on the client. Does that mean the other node is not able to persist the replicated data? Is there some timeout associated with replicated data persistence?

Thanks,
Kanwar

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09:08
To: user@cassandra.apache.org
Subject: Mutation dropped
[...]
RE: Mutation Dropped Messages
1. One node is running at 8G, the rest on 10G - same config.
2. Nodetool:

Status  State   Load       Owns     Token
                                    162563731948587347959549934419333022646
Up      Normal  107.79 MB  25.00%   34957844353235424160784456632419943350
Up      Normal  116.44 MB  25.00%   77493140218352732093706282561390969782
Up      Normal  27.01 MB   12.68%   99065646426277998282363457251162269147
Up      Normal  35.9 MB    12.32%   120028436083470040026628108490361996214
Up      Normal  512.55 KB  25.00%   162563731948587347959549934419333022646

RF:2 and CL:QUORUM - writes at a rate of 1750 rows/s - every row has 5 cols, and 2 of them are indexed.

Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 11:07 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

I increased the size of the cluster also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages.
Ensure all the nodes have the same spec, and the nodes have the same config. In a virtual environment consider moving the node.

Is this due to some improper load balancing?
What does nodetool ring say, and what sort of queries (and RF and CL) are you sending?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote:

Hey Aaron,

I increased the size of the cluster and also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages. The other nodes are not dropping mutation messages. I am using the Hector API and have done nothing for load balancing so far - just provided the host:port of the nodes in the Cassandra host config. Is this due to some improper load balancing? Also the physical host where the node is hosted is relatively more heavily loaded than the other nodes' hosts. What can I do to improve? PS: The node is the seed of the cluster.
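The Owns column in that ring output follows directly from the gaps between consecutive tokens. A sketch that reproduces the percentages above, assuming the RandomPartitioner's 0..2^127 token space:

```python
# Sketch: recompute the "Owns" column of the ring output from the tokens,
# assuming RandomPartitioner (token space 0 .. 2**127 - 1). Each node owns
# the range from its predecessor's token (exclusive) to its own (inclusive);
# the first node's range wraps around from the last token.

RING = 2 ** 127
TOKENS = [
    34957844353235424160784456632419943350,
    77493140218352732093706282561390969782,
    99065646426277998282363457251162269147,
    120028436083470040026628108490361996214,
    162563731948587347959549934419333022646,
]

def ownership(tokens):
    """Percentage of the ring each node owns, in token order."""
    pcts = []
    for i, t in enumerate(tokens):
        prev = tokens[i - 1]  # i == 0 wraps to the last token
        pcts.append((t - prev) % RING / RING * 100)
    return pcts

pcts = [round(p, 2) for p in ownership(TOKENS)]
assert pcts == [25.0, 25.0, 12.68, 12.32, 25.0]  # matches nodetool's Owns column
```

The two ~12% nodes confirm Aaron's observation below: the token ranges themselves are unbalanced, quite apart from the odd Load figures.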
Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 4:15 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

1. Which parameters to tune in the config files? - Especially looking for heavy writes
The node is overloaded. It may be because there are not enough nodes, or the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.

2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?
TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has occurred. It is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, HH (in 1.x), Read Repair or Anti Entropy.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:

Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions:
1. Which parameters to tune in the config files? - Especially looking for heavy writes
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?
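The knobs Aaron mentions live in cassandra.yaml. An illustrative fragment — the names and defaults shown are for the 1.x era discussed in this thread and are assumptions to verify against the yaml shipped with your version:

```yaml
# Illustrative cassandra.yaml fragment (1.x-era option names; verify
# against the file bundled with your Cassandra version).

# Write-stage threads. Raise only if there is spare CPU/IO headroom,
# and make sure the commit log volume can keep up with the extra writes.
concurrent_writes: 32

# How long a coordinator waits for replica responses, and how long a
# queued message may sit in a thread pool before being dropped.
rpc_timeout_in_ms: 10000
```

Raising concurrent_writes without IO headroom just moves the backlog from the queue to the disks, so check iostat before and after.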
Regards, Dushyant

NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing.
Re: Mutation Dropped Messages
1. One node is running at 8G, rest on 10G - same config
Make them all the same.

2. Nodetool -
Even though the token ranges are not balanced, the load looks a little odd. Have you moved tokens? Did you do a cleanup?

You'll need to look at the node that is dropping messages (not sure which that is). What is happening in the log? Is it having GC problems? What is happening with the IO and CPU load on the machine?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:57 PM, Tiwari, Dushyant wrote:

1. One node is running at 8G, the rest on 10G - same config.
2. Nodetool:

Status  State   Load       Owns     Token
                                    162563731948587347959549934419333022646
Up      Normal  107.79 MB  25.00%   34957844353235424160784456632419943350
Up      Normal  116.44 MB  25.00%   77493140218352732093706282561390969782
Up      Normal  27.01 MB   12.68%   99065646426277998282363457251162269147
Up      Normal  35.9 MB    12.32%   120028436083470040026628108490361996214
Up      Normal  512.55 KB  25.00%   162563731948587347959549934419333022646

RF:2 and CL:QUORUM - writes at a rate of 1750 rows/s - every row has 5 cols, and 2 of them are indexed.

Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 11:07 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

I increased the size of the cluster also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages.
Ensure all the nodes have the same spec, and the nodes have the same config. In a virtual environment consider moving the node.

Is this due to some improper load balancing?
What does nodetool ring say, and what sort of queries (and RF and CL) are you sending?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote:

Hey Aaron,

I increased the size of the cluster and also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages. The other nodes are not dropping mutation messages. I am using the Hector API and have done nothing for load balancing so far - just provided the host:port of the nodes in the Cassandra host config. Is this due to some improper load balancing? Also the physical host where the node is hosted is relatively more heavily loaded than the other nodes' hosts. What can I do to improve? PS: The node is the seed of the cluster.

Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 4:15 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

1. Which parameters to tune in the config files? - Especially looking for heavy writes
The node is overloaded. It may be because there are not enough nodes, or the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.

2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?
TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has occurred. It is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, HH (in 1.x), Read Repair or Anti Entropy.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:

Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions:
1. Which parameters to tune in the config files? - Especially looking for heavy writes
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?

Regards, Dushyant
Mutation Dropped Messages
Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions:
1. Which parameters to tune in the config files? - Especially looking for heavy writes
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?

Regards, Dushyant
Re: Mutation Dropped Messages
1. Which parameters to tune in the config files? - Especially looking for heavy writes
The node is overloaded. It may be because there are not enough nodes, or the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.

2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?
TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has occurred. It is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, HH (in 1.x), Read Repair or Anti Entropy.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:

Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions:
1. Which parameters to tune in the config files? - Especially looking for heavy writes
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?

Regards, Dushyant
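The distinction drawn in this thread — a client-visible TimedOutException at the coordinator versus a mutation silently dropped from a replica's queue — can be sketched as a small timeline model. This is illustrative pseudocode-in-Python, not Cassandra internals; the 10s timeout is an assumed 1.x-era default:

```python
# Sketch (not Cassandra code): the two ways rpc_timeout manifests.
# - Coordinator side: fewer than CL replicas ack in time -> TimedOutException.
# - Replica side: a MUTATION sits in the write-stage queue past rpc_timeout
#   and is dropped with no client-visible error (if CL was still met).

RPC_TIMEOUT_MS = 10_000  # assumed 1.x-era default; check your yaml

def replica_handles(enqueued_at_ms, dequeued_at_ms):
    """Return 'applied' or 'dropped' for one mutation on one replica."""
    if dequeued_at_ms - enqueued_at_ms > RPC_TIMEOUT_MS:
        return "dropped"   # logged as "mutation dropped"; repaired later
    return "applied"

def coordinator_result(acks, cl_required):
    """Return what the client sees, given replica acks received in time."""
    return "success" if acks >= cl_required else "TimedOutException"

# An overloaded replica dequeues the message too late and drops it...
assert replica_handles(0, 12_000) == "dropped"
# ...but with RF=2, CL=ONE the other replica's ack already satisfied CL,
# so the client still sees success -- the silent case in this thread.
assert coordinator_result(acks=1, cl_required=1) == "success"
```

The dropped copy is later reconciled by hinted handoff, read repair, or anti-entropy repair, which is why drops are "correct behaviour under stress" rather than data loss.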
Re: Mutation Dropped Messages
I increased the size of the cluster also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages.
Ensure all the nodes have the same spec, and the nodes have the same config. In a virtual environment consider moving the node.

Is this due to some improper load balancing?
What does nodetool ring say, and what sort of queries (and RF and CL) are you sending?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote:

Hey Aaron,

I increased the size of the cluster and also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages. The other nodes are not dropping mutation messages. I am using the Hector API and have done nothing for load balancing so far - just provided the host:port of the nodes in the Cassandra host config. Is this due to some improper load balancing? Also the physical host where the node is hosted is relatively more heavily loaded than the other nodes' hosts. What can I do to improve? PS: The node is the seed of the cluster.

Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 4:15 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

1. Which parameters to tune in the config files? - Especially looking for heavy writes
The node is overloaded. It may be because there are not enough nodes, or the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.

2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?
TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has occurred. It is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, HH (in 1.x), Read Repair or Anti Entropy.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:

Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions:
1. Which parameters to tune in the config files? - Especially looking for heavy writes
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM?

Regards, Dushyant