Re: how to solve one node is in heavy load in unbalanced cluster
move first removes the node from the cluster, then adds it back:
http://wiki.apache.org/cassandra/Operations#Moving_nodes

If you have 3 nodes and RF 3, removing the node will result in the error you are seeing: there are not enough nodes left in the cluster to satisfy the replication factor. You can drop the RF down to 2 temporarily and then put it back to 3 later, see:
http://wiki.apache.org/cassandra/Operations#Replication

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Aug 2011, at 03:39, Yan Chunlu wrote:

hi, any help? thanks!

On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu springri...@gmail.com wrote:

forgot to mention I am using cassandra 0.7.4

On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu springri...@gmail.com wrote:

also nothing happens about the streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
Nothing streaming from /10.28.53.11
Pool Name     Active   Pending   Completed
Commands      n/a      0         165086750
Responses     n/a      0         99372520

On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu springri...@gmail.com wrote:

sorry the ring info should be this:

nodetool -h node3 ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node1    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu springri...@gmail.com wrote:

I have tried the nodetool move but get the following error:

node3:~# nodetool -h node3 move 0
Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
        ...
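Aaron's workaround (temporarily dropping RF from 3 to 2 before the move, then restoring it) would look roughly like the following in cassandra-cli on 0.7. This is a sketch, not a verified session: the keyspace name MyKeyspace is a placeholder, and the exact cli syntax should be checked against the Replication section of the wiki page linked above.

```text
# hypothetical cassandra-cli session; substitute your own keyspace name
update keyspace MyKeyspace with replication_factor = 2;
# ... run the nodetool move operations, then restore:
update keyspace MyKeyspace with replication_factor = 3;
# after raising RF back to 3, run nodetool repair so the third replica is populated
```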
Re: how to solve one node is in heavy load in unbalanced cluster
thanks for the confirmation aaron!

On Sun, Aug 7, 2011 at 4:01 PM, aaron morton aa...@thelastpickle.com wrote:

move first removes the node from the cluster, then adds it back:
http://wiki.apache.org/cassandra/Operations#Moving_nodes

If you have 3 nodes and RF 3, removing the node will result in the error you are seeing: there are not enough nodes left in the cluster to satisfy the replication factor. You can drop the RF down to 2 temporarily and then put it back to 3 later, see:
http://wiki.apache.org/cassandra/Operations#Replication

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Aug 2011, at 03:39, Yan Chunlu wrote:

hi, any help? thanks!

...
Re: how to solve one node is in heavy load in unbalanced cluster
I have tried the nodetool move but get the following error:

node3:~# nodetool -h node3 move 0
Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
        at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
        at org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
        at org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
        at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        at sun.rmi.transport.Transport$1.run(Transport.java:159)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

then nodetool shows the node is leaving:

nodetool -h reagon ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node3    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node3    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

the log didn't show any error message nor anything abnormal. is there something wrong? I used to have RF=2, and changed it to RF=3 using cassandra-cli.

On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu springri...@gmail.com wrote:

thanks a lot! I will try the move.

On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote:

springrider wrote: is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

Yes, use that logic to get the tokens. I think it's safe to run move first and repair later. You are moving some nodes' data as is, so it's no worse than what you have right now.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: how to solve one node is in heavy load in unbalanced cluster
also nothing happens about the streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
Nothing streaming from /10.28.53.11
Pool Name     Active   Pending   Completed
Commands      n/a      0         165086750
Responses     n/a      0         99372520

On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu springri...@gmail.com wrote:

sorry the ring info should be this:

nodetool -h node3 ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node1    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu springri...@gmail.com wrote:

I have tried the nodetool move but get the following error:

node3:~# nodetool -h node3 move 0
Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
        ...

then nodetool shows the node is leaving:

nodetool -h node3 ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node1    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

the log didn't show any error message nor anything abnormal. is there something wrong? I used to have RF=2, and changed it to RF=3 using cassandra-cli.

On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu springri...@gmail.com wrote:

thanks a lot! I will try the move.

On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote:

springrider wrote: is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x
Re: how to solve one node is in heavy load in unbalanced cluster
hi, any help? thanks!

On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu springri...@gmail.com wrote:

forgot to mention I am using cassandra 0.7.4

On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu springri...@gmail.com wrote:

also nothing happens about the streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
Nothing streaming from /10.28.53.11
Pool Name     Active   Pending   Completed
Commands      n/a      0         165086750
Responses     n/a      0         99372520

On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu springri...@gmail.com wrote:

sorry the ring info should be this:

nodetool -h node3 ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node1    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu springri...@gmail.com wrote:

I have tried the nodetool move but get the following error:

node3:~# nodetool -h node3 move 0
Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
        ...

then nodetool shows the node is leaving:

nodetool -h node3 ring
Address  Status  State    Load      Owns    Token
                                            84944475733633104818662955375549269696
node1    Up      Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Leaving  25.44 GB   8.43%  84944475733633104818662955375549269696

the log didn't show any error message nor anything abnormal. is there something wrong? I used to have RF=2, and changed it to RF=3 using cassandra-cli.

On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu springri...@gmail.com wrote:

thanks a lot! I will try the move.

...
Re: how to solve one node is in heavy load in unbalanced cluster
any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu springri...@gmail.com wrote:

and by the way, my RF=3 and the other two nodes have much more capacity, why do they always route the request to node3? could I do a rebalance now, before node repair?

On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu springri...@gmail.com wrote:

adding new nodes seems to have added more pressure to the cluster? how about your data size?

On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan fr...@aimatch.com wrote:

Dropped read messages might be an indicator of a capacity issue. We experienced a similar issue with 0.7.6.

We ended up adding two extra nodes and physically rebooting the offending node(s). The entire cluster then calmed down.

On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu springri...@gmail.com wrote:

I have three nodes and RF=3. here is the current ring:

Address  Status  State   Load      Owns    Token
                                           84944475733633104818662955375549269696
node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Normal  56.1 GB    8.43%  84944475733633104818662955375549269696

...

--
Frank Duan
aiMatch
fr...@aimatch.com
c: 703.869.9951
www.aiMatch.com
Re: how to solve one node is in heavy load in unbalanced cluster
First run nodetool move, and then you can run nodetool repair. Before you run nodetool move you will need to determine the tokens that each node will be responsible for, then use those tokens to perform the move.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
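Concretely, with 3 nodes on the RandomPartitioner the evenly spaced tokens are 0, 2**127/3 and 2*(2**127/3), and the moves would be issued one node at a time, something like the sketch below. The hostnames are placeholders, and, as shown later in this thread, on 0.7 with RF equal to the number of nodes the move can fail unless RF is temporarily lowered first.

```text
# hypothetical sequence; run one move at a time and let it finish
nodetool -h node1 move 0
nodetool -h node2 move 56713727820156410577229101238628035242
nodetool -h node3 move 113427455640312821154458202477256070484
```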
Re: how to solve one node is in heavy load in unbalanced cluster
is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

On Mon, Aug 1, 2011 at 1:17 AM, mcasandra mohitanch...@gmail.com wrote:

First run nodetool move and then you can run nodetool repair. Before you run nodetool move you will need to determine the tokens that each node will be responsible for, then use those tokens to perform the move.

...
Re: how to solve one node is in heavy load in unbalanced cluster
> aaron suggested it's better to run node repair on every node then re-balance it.

That's me being cautious with other people's data. It looks like node 3 is overwhelmed. Try getting the move sorted.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Aug 2011, at 05:48, Yan Chunlu wrote:

is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

On Mon, Aug 1, 2011 at 1:17 AM, mcasandra mohitanch...@gmail.com wrote:

First run nodetool move and then you can run nodetool repair. Before you run nodetool move you will need to determine the tokens that each node will be responsible for, then use those tokens to perform the move.

...
Re: how to solve one node is in heavy load in unbalanced cluster
springrider wrote: is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

Yes, use that logic to get the tokens. I think it's safe to run move first and repair later. You are moving some nodes' data as is, so it's no worse than what you have right now.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
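The token equation quoted above can be written out as a small runnable script. This is a Python 3 rendering of the thread's Python 2 snippet (xrange and the print statement replaced), using integer division so the tokens stay exact:

```python
# Evenly spaced initial tokens for a RandomPartitioner ring
# (token space 0 .. 2**127). Python 3 version of the snippet
# quoted in this thread.
def tokens(nodes):
    # integer division keeps the 128-bit tokens exact
    return [2 ** 127 // nodes * x for x in range(nodes)]

if __name__ == "__main__":
    # for the 3-node cluster discussed here, each node would own ~33.3%
    for i, t in enumerate(tokens(3), start=1):
        print("node%d: %d" % (i, t))
```

Each node is then moved to its computed token with nodetool move, as described earlier in the thread.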
Re: how to solve one node is in heavy load in unbalanced cluster
okay, thanks Aaron!

On Mon, Aug 1, 2011 at 5:43 AM, aaron morton aa...@thelastpickle.com wrote:

> aaron suggested it's better to run node repair on every node then re-balance it.

That's me being cautious with other people's data. It looks like node 3 is overwhelmed. Try getting the move sorted.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Aug 2011, at 05:48, Yan Chunlu wrote:

is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

...
Re: how to solve one node is in heavy load in unbalanced cluster
thanks a lot! I will try the move.

On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote:

springrider wrote: is that okay to do nodetool move before a complete repair? using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

Yes, use that logic to get the tokens. I think it's safe to run move first and repair later. You are moving some nodes' data as is, so it's no worse than what you have right now.

...
how to solve one node is in heavy load in unbalanced cluster
I have three nodes and RF=3. here is the current ring:

Address  Status  State   Load      Owns    Token
                                           84944475733633104818662955375549269696
node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Normal  56.1 GB    8.43%  84944475733633104818662955375549269696

it is very unbalanced and I would like to re-balance it using nodetool move asap. unfortunately I haven't run node repair for a long time. aaron suggested it's better to run node repair on every node, then re-balance it.

the problem is that node3 is under heavy load currently, and the entire cluster slows down if I start doing node repair. I had to disablegossip and disablethrift to stop the repair.

only cassandra is running on that server and I have no idea what it was doing. the cpu load is about 20+ currently. compactionstats and netstats show it is not doing anything. I have changed the client not to connect to node3, but it still seems to be under heavy load and io utils is 100%.

the log seems normal (although I am not sure about the Dropped read message thing):

 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
 INFO 13:21:38,560 Pool Name               Active   Pending
 INFO 13:21:38,560 ReadStage                    8      7555
 INFO 13:21:38,561 RequestResponseStage         0         0
 INFO 13:21:38,561 ReadRepairStage              0         0

is there any way to tell what node3 is doing? or at least, is there any way to make it not slow down the whole cluster?
Re: how to solve one node is in heavy load in unbalanced cluster
Dropped read messages might be an indicator of a capacity issue. We experienced a similar issue with 0.7.6.

We ended up adding two extra nodes and physically rebooting the offending node(s). The entire cluster then calmed down.

On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu springri...@gmail.com wrote:

I have three nodes and RF=3. here is the current ring:

Address  Status  State   Load      Owns    Token
                                           84944475733633104818662955375549269696
node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Normal  56.1 GB    8.43%  84944475733633104818662955375549269696

it is very unbalanced and I would like to re-balance it using nodetool move asap.

...

--
Frank Duan
aiMatch
fr...@aimatch.com
c: 703.869.9951
www.aiMatch.com
Re: how to solve one node is in heavy load in unbalanced cluster
Doesn't adding new nodes put even more pressure on the cluster? How large is your data set?

On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan fr...@aimatch.com wrote: [...]
Re: how to solve one node is in heavy load in unbalanced cluster
And by the way, my RF is 3 and the other two nodes have much more spare capacity, so why are requests always routed to node3? Could I do a rebalance now, before running node repair?

On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu springri...@gmail.com wrote: [...]
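On the routing question: with SimpleStrategy, the replicas for a key are the first RF distinct nodes found walking the ring clockwise from the key's token, so with RF=3 on a 3-node cluster every node is a replica for every key. A minimal sketch of that placement rule (the tokens and node names below are illustrative, not the real ring):

```python
import bisect

# Sketch of SimpleStrategy replica placement: walk the ring clockwise
# from the key's token and take the first `rf` nodes.
def natural_endpoints(key_token, ring, rf):
    """ring: list of (token, node) pairs sorted by token."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

ring = [(100, "node1"), (200, "node2"), (300, "node3")]
print(natural_endpoints(150, ring, rf=3))  # all three nodes hold this key
```

Because every key lives on all three nodes, reads at QUORUM must touch two of them, so a slow node3 can drag down the whole cluster no matter how the client connects.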