Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-07 Thread aaron morton
nodetool move first removes the node from the cluster, then adds it back 
at the new token:
http://wiki.apache.org/cassandra/Operations#Moving_nodes

If you have 3 nodes and RF 3, removing the node results in the error you 
are seeing: only 2 endpoints are left, so there are not enough nodes in the 
cluster to satisfy the replication factor. 

You can drop the RF down to 2 temporarily and then put it back to 3 later, see 
http://wiki.apache.org/cassandra/Operations#Replication
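
As a rough illustration of why the move fails, here is a minimal Python sketch of the kind of check that produces this error. It is not Cassandra's actual code (per the stack trace, the real logic lives in SimpleStrategy.calculateNaturalEndpoints); natural_endpoints and the ring dictionary are hypothetical names, and one token per node is assumed.

```python
# Hypothetical sketch, not Cassandra's implementation: pick replicas by
# walking the ring clockwise from the first token >= the key's token.
def natural_endpoints(ring, key_token, rf):
    """ring maps token -> node name; returns the rf nodes holding key_token."""
    if rf > len(ring):
        raise ValueError(
            "replication factor (%d) exceeds number of endpoints (%d)"
            % (rf, len(ring)))
    tokens = sorted(ring)
    # Index of the first token at or after the key, wrapping around to 0.
    start = next((i for i, t in enumerate(tokens) if t >= key_token), 0)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


ring = {
    52773518586096316348543097376923124102: "node1",
    70597222385644499881390884416714081360: "node2",
    84944475733633104818662955375549269696: "node3",
}
print(natural_endpoints(ring, 0, 3))       # with RF=3 every node is a replica

# While node3 is leaving, only two endpoints remain to cover the ranges.
leaving = {t: n for t, n in ring.items() if n != "node3"}
try:
    natural_endpoints(leaving, 0, 3)
except ValueError as err:
    print(err)
print(natural_endpoints(leaving, 0, 2))    # RF=2 fits the two remaining nodes
```

With RF lowered to 2, the same two-node ring passes the check, which is the idea behind the temporary RF change suggested above.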

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Aug 2011, at 03:39, Yan Chunlu wrote:

 hi, any  help? thanks!
 
 On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu springri...@gmail.com wrote:
 forgot to mention I am using cassandra 0.7.4
 
 
 On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu springri...@gmail.com wrote:
 also nothing happens about the streaming:
 
 nodetool -h node3 netstats
 Mode: Normal
 Not sending any streams.
  Nothing streaming from /10.28.53.11
 Pool Name                    Active   Pending      Completed
 Commands                        n/a         0      165086750
 Responses                       n/a         0       99372520
 
 
 
 On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu springri...@gmail.com wrote:
 sorry the ring info should be this:
 
 nodetool -h node3 ring
 Address         Status State   Load            Owns    Token
                                                        84944475733633104818662955375549269696
 node1           Up     Normal  13.18 GB        81.09%  52773518586096316348543097376923124102
 node2           Up     Normal  22.85 GB        10.48%  70597222385644499881390884416714081360
 node3           Up     Leaving 25.44 GB        8.43%   84944475733633104818662955375549269696
 
 
 

Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-07 Thread Yan Chunlu
thanks for the confirmation aaron!


Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
I tried nodetool move but got the following error:

node3:~# nodetool -h node3 move 0
Exception in thread "main" java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
    at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
    at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
    at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
    at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
    at org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
    at org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
    at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)




then nodetool shows the node is leaving


nodetool -h reagon ring
Address         Status State   Load            Owns    Token
                                                       84944475733633104818662955375549269696
node3           Up     Normal  13.18 GB        81.09%  52773518586096316348543097376923124102
node3           Up     Normal  22.85 GB        10.48%  70597222385644499881390884416714081360
node3           Up     Leaving 25.44 GB        8.43%   84944475733633104818662955375549269696

The log didn't show any error messages or anything abnormal. Is
something wrong?


I used to have RF=2, and changed it to RF=3 using cassandra-cli.







Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
also nothing happens about the streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
 Nothing streaming from /10.28.53.11
Pool Name                    Active   Pending      Completed
Commands                        n/a         0      165086750
Responses                       n/a         0       99372520





Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
hi, any  help? thanks!


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu springri...@gmail.com wrote:

 By the way, my RF=3 and the other two nodes have much more capacity,
 so why are requests always routed to node3?

 Could I rebalance now, before running node repair?


 On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu springri...@gmail.comwrote:

 Wouldn't adding new nodes put more pressure on the cluster? How large is
 your data?








Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread mcasandra
First run nodetool move and then you can run nodetool repair. Before you run
nodetool move you will need to determine tokens that each node will be
responsible for. Then use that token to perform move.
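
The two steps above (pick a token per node, then move) can be sketched in Python roughly as follows. This is only an illustration under assumptions: one token per node, a token space of 2**127 (RandomPartitioner), and new tokens handed out in current ring order. plan_moves is a hypothetical helper; the nodetool -h <host> move <token> form is the one used elsewhere in this thread.

```python
# Sketch under assumptions: one token per node, a token space of 2**127,
# and new tokens assigned to nodes in their current ring order.
def plan_moves(current):
    """current maps node name -> current token; returns nodetool commands."""
    nodes = sorted(current, key=current.get)      # nodes in ring order
    step = 2 ** 127 // len(nodes)                 # exact integer spacing
    return ["nodetool -h %s move %d" % (node, step * i)
            for i, node in enumerate(nodes)]


current = {
    "node1": 52773518586096316348543097376923124102,
    "node2": 70597222385644499881390884416714081360,
    "node3": 84944475733633104818662955375549269696,
}
for command in plan_moves(current):
    print(command)
```

Which node takes which token is a judgment call; the sketch only prints the commands and leaves running them to the operator.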

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Is it okay to run nodetool move before a complete repair?

Using this equation?

def tokens(nodes):
    for x in range(nodes):
        print(2 ** 127 // nodes * x)
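
For reference, here is a small sketch of the same equation as a function that returns the tokens instead of printing them. balanced_tokens is a hypothetical name, and the 2**127 ring size is taken from the equation above. Floor division is used so the result stays an exact integer; on Python 3 a plain "/" would yield a float.

```python
def balanced_tokens(nodes):
    """Return evenly spaced tokens for a ring of size 2**127."""
    # Floor division keeps the arithmetic in exact integers; in Python 3
    # a plain "/" would produce a float and lose the low-order digits.
    return [2 ** 127 // nodes * x for x in range(nodes)]


for token in balanced_tokens(3):
    print(token)
```

For a three-node cluster this gives 0, one third of the ring, and two thirds of the ring (rounded down).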





Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread aaron morton
 aaron suggested it's better to run node repair on every node then re-balance 
 it.

That's me being cautious with other people's data.

It looks like node 3 is overwhelmed. Try getting the move sorted. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

 



Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread mcasandra

springrider wrote:
 
 is that okay to do nodetool move before a completely repair?
 
 using this equation?
 def tokens(nodes):
     for x in range(nodes):
         print(2 ** 127 // nodes * x)
 

Yes, use that logic to get the tokens. I think it's safe to run move first
and repair later. You are moving each node's data as is, so it's no worse than
what you have right now.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
okay, thanks Aaron!







Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
thanks a lot! I will try the move.




how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
I have three nodes and RF=3. Here is the current ring:


Address  Status  State   Load      Owns    Token
                                           84944475733633104818662955375549269696
node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Normal  56.1 GB   8.43%   84944475733633104818662955375549269696


it is very unbalanced and I would like to rebalance it using
nodetool move asap. unfortunately I haven't run nodetool repair for
a long time.
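
For reference, the Owns column in the ring output above can be reproduced
from the tokens alone. This is only a sketch: it assumes RandomPartitioner
(token space of 2 ** 127) and that each node owns the range from the
previous node's token up to its own, wrapping around the ring. The names
RING_SIZE, TOKENS and ownership are made up for the example.

```python
# Token space size (assumption: RandomPartitioner).
RING_SIZE = 2 ** 127

# Tokens copied from the ring output above.
TOKENS = {
    "node1": 52773518586096316348543097376923124102,
    "node2": 70597222385644499881390884416714081360,
    "node3": 84944475733633104818662955375549269696,
}


def ownership(tokens):
    """Map each node to the fraction of the ring that ends at its token."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for index, (node, token) in enumerate(ordered):
        # Index -1 wraps around to the node with the highest token.
        previous = ordered[index - 1][1]
        # A single-node ring yields width 0 here, so treat that as the full ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[node] = width / RING_SIZE
    return shares


if __name__ == "__main__":
    for node, share in ownership(TOKENS).items():
        print(f"{node}: {share:.2%}")
```

node1 gets the large wrap-around range from node3's token back to its own,
which is where the 81.09% comes from.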

aaron suggested it's better to run nodetool repair on every node and then rebalance.


the problem is that node3 is under heavy load currently, and the entire
cluster slows down if I start a repair. I have to run
disablegossip and disablethrift to stop the repair.

only cassandra is running on that server and I have no idea what it is
doing. the cpu load is about 20+ currently. compactionstats and
netstats show it is not doing anything.

I have changed the client to not connect to node3, but it still seems
to be under heavy load and io util is 100%.


the log seems normal (although I'm not sure about the Dropped READ
messages):

 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
 INFO 13:21:38,560 Pool Name                    Active   Pending
 INFO 13:21:38,560 ReadStage                         8      7555
 INFO 13:21:38,561 RequestResponseStage              0         0
 INFO 13:21:38,561 ReadRepairStage                   0         0
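
To put numbers on the log lines above, here is a small sketch that pulls
out the dropped-message rate and the heap usage after GC. The regular
expressions are written against the two lines shown here only and may not
match other Cassandra versions. The helper names dropped_per_second and
heap_fraction are made up for the example.

```python
import re

# Log lines copied from the message above (the wrapped GC line is joined).
LOG = """\
 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving 2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
"""

DROPPED = re.compile(r"Dropped (\d+) (\w+) messages in the last (\d+)ms")
HEAP = re.compile(r"leaving (\d+) used; max is (\d+)")


def dropped_per_second(log):
    """Return {message type: dropped messages per second} for each Dropped line."""
    return {
        kind: int(count) * 1000 / int(window_ms)
        for count, kind, window_ms in DROPPED.findall(log)
    }


def heap_fraction(log):
    """Return used heap divided by max heap from the first GC line, or None."""
    match = HEAP.search(log)
    if match is None:
        return None
    return int(match.group(1)) / int(match.group(2))


if __name__ == "__main__":
    print(dropped_per_second(LOG))
    print(f"{heap_fraction(LOG):.1%}")
```

For the lines above this works out to roughly 165 dropped READ messages
per second and about 60% of the heap in use after the collection. Together
with the 7555 pending reads in ReadStage, that points at the read path
rather than at GC, though that reading is an interpretation and not
something the log states outright.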



is there any way to tell what node3 is doing? or at least is there any
way to keep it from slowing down the whole cluster?


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Frank Duan
Dropped READ messages might be an indicator of a capacity issue. We
experienced a similar issue with 0.7.6.

We ended up adding two extra nodes and physically rebooting the offending
node(s).

The entire cluster then calmed down.



-- 
Frank Duan
aiMatch
fr...@aimatch.com
c: 703.869.9951
www.aiMatch.com


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
wouldn't adding new nodes put more pressure on the cluster? how large is
your data?



Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
and by the way, my RF=3 and the other two nodes have much more capacity, so
why are requests always routed to node3?

could I do a rebalance now, before the repair?
