What version are you on? The error stack is from nodetool talking to the server. Check the logs on node 3 in DC2 for errors; it sounds like the repair may have failed or did not complete.
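A quick way to do both checks from the shell; the log path and message patterns below are assumptions for a typical package install, so adjust them for your setup:

```shell
# Sketch: checking a node for a failed or stuck repair.
# /var/log/cassandra/system.log is the usual package location (an assumption).
LOG=/var/log/cassandra/system.log

# Pull repair-related errors and exceptions out of the system log.
repair_errors() {
    grep -iE 'repair|AntiEntropy' "$1" | grep -iE 'ERROR|Exception'
}

if [ -f "$LOG" ]; then
    repair_errors "$LOG"
fi

# While a repair runs, validation compactions and streaming show up here:
if command -v nodetool >/dev/null 2>&1; then
    nodetool compactionstats   # look for "Validation" compactions
    nodetool netstats          # look for active streams to/from other nodes
fi
```

If you do decide to raise the heap, MAX_HEAP_SIZE in conf/cassandra-env.sh is the usual knob.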
You can monitor a repair by looking at:

- nodetool compactionstats for a validation compaction
- nodetool netstats for data transfers

I would restart node 3 in DC2, as it may now have 2 repairs running. Then start the repair again and monitor it using the tools above.

I'm not sure how many CFs you have, but 2GB is not a lot of memory for the heap; you may want to increase it. Also, by default the key cache is enabled and set to 200k entries.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 2/10/2011, at 6:24 AM, Raj N wrote:

> I had 3 nodes with strategy_options (DC1=3) in 1 DC. I added 1 more DC and 3
> more nodes. I didn't set the initial token, but I ran nodetool move on the new
> nodes (adding 1 to the tokens of the nodes in DC1). I updated the keyspace to
> strategy_options (DC1=3, DC2=3). Then I started running nodetool repair on
> each of the nodes. Before I started repair each node had around 5 GB of data.
> I started on the new nodes. 2 of the nodes completed the repair in 2 hours
> each. During the repair I saw the data grow to almost 25 GB, but
> eventually when the repair was done the data settled at around 9 GB. Is this
> normal? The 3rd node has been running repair for a long time.
> It eventually stopped, throwing an exception -
>
> Exception in thread "main" java.rmi.UnmarshalException: Error unmarshaling
> return header; nested exception is:
>     java.io.EOFException
>     at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:209)
>     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
>     at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
>     at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
>     at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
>     at $Proxy0.forceTableRepair(Unknown Source)
>     at org.apache.cassandra.tools.NodeProbe.forceTableRepair(NodeProbe.java:192)
>     at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:773)
>     at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:669)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>     at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:195)
>
> I started repair again since it's safe to do so. Now the GCInspector complains
> of not enough heap -
>
> WARN [ScheduledTasks:1] 2011-10-01 13:08:16,227 GCInspector.java (line 149)
> Heap is 0.7598414264960864 full. You may need to reduce memtable and/or
> cache sizes. Cassandra will now flush up to the two largest memtables to
> free up memory. Adjust flush_largest_memtables_at threshold in
> cassandra.yaml if you don't want Cassandra to do this automatically
> INFO [ScheduledTasks:1] 2011-10-01 13:08:16,227 StorageService.java (line
> 2398) Unable to reduce heap usage since there are no dirty column families
>
> nodetool ring shows 48GB of data on the node.
>
> My Xmx is 2G. I rely on OS caching more than row caching or key caching.
> Hence the column families are created with default settings.
>
> Any help would be appreciated.
>
> Thanks
> -Raj
