RE: Problem after upgrade to 1.0.1
I have no errors in my system.log, just these types of warnings occasionally:

WARN [pool-1-thread-1] 2011-11-08 00:03:44,726 Memtable.java (line 167) setting live ratio to minimum of 1.0 instead of 0.9511448007676252

I did find the cause of my data drive consumption being so large: I did not know that running scrub after the upgrade would take a snapshot of the data. Once I removed all the snapshots, the data drive is back down to where I expect it to be, although the Load numbers reported by ring are much larger than what is on the data drive.

I've also upgraded to 1.0.2 and re-run scrub, and now I can run cfstats again, so thanks for that. I'm still confused about why the hints CF has become so large on a few of the nodes, though:

Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 127490858389
Space used (total): 72123363085
Number of Keys (estimate): 1408
Memtable Columns Count: 43174
Memtable Data Size: 44376138
Memtable Switch Count: 103
Read Count: 494
Read Latency: NaN ms.
Write Count: 30970531
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 14
Key cache size: 10
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 88149
Compacted row maximum size: 53142810146
Compacted row mean size: 6065512727

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Friday, November 04, 2011 9:29 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

> One possibility: If you're overloading the cluster, replicas will drop
> updates to avoid OOMing. (This is logged at WARN level.) Before 1.x
> Cassandra would just let that slide, but with 1.0 it started recording
> hints for those.
> [snip]
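A note for anyone else hitting the same disk-space surprise: scrub snapshots each column family before rewriting it, and those snapshots sit in the data directory until cleared (nodetool -h <host> clearsnapshot removes them). The live-vs-total gap in the cfstats output above can also be read per column family over JMX without running full cfstats. A minimal sketch, assuming Cassandra's default JMX port 7199 and the 1.0-era MBean naming scheme for column family stores; verify the exact ObjectName and attribute names in jconsole first:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Read live vs. total on-disk size for HintsColumnFamily over JMX.
// Assumptions: default JMX port 7199, and the 1.0-era MBean name
// "org.apache.cassandra.db:type=ColumnFamilies,keyspace=system,columnfamily=HintsColumnFamily"
// with LiveDiskSpaceUsed/TotalDiskSpaceUsed attributes; check both in jconsole.
public class HintsDiskSpace {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi"));
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cfs = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,"
                    + "keyspace=system,columnfamily=HintsColumnFamily");
            long live = (Long) mbs.getAttribute(cfs, "LiveDiskSpaceUsed");
            long total = (Long) mbs.getAttribute(cfs, "TotalDiskSpaceUsed");
            // "total" also counts sstables that have been compacted away
            // but not yet deleted from disk; "live" counts only current ones.
            System.out.println("live=" + live + " total=" + total);
        } finally {
            jmxc.close();
        }
    }
}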
Re: Problem after upgrade to 1.0.1
One possibility: If you're overloading the cluster, replicas will drop updates to avoid OOMing. (This is logged at WARN level.) Before 1.x Cassandra would just let that slide, but with 1.0 it started recording hints for those.

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> Thanks for the help so far. Is there any way to find out why my
> HintsColumnFamily is so large now, since it wasn't this way before the
> upgrade and it seems to just keep climbing? I've tried invoking
> o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have a
> bunch of stale hints from upgrade issues, but it just eventually times
> out. Plus the node it gets invoked against gets thrashed and stops
> responding, forcing me to restart Cassandra.
> [snip]
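If overload-driven hinting is the explanation, the replicas' dropped-message counters should corroborate it. Those counters are exposed over JMX; the sketch below assumes the 1.0-era MessagingService MBean and its DroppedMessages attribute (a map of message verb to count), both worth confirming in jconsole:

import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Print per-verb dropped message counts for one node. A growing MUTATION
// count on replicas is consistent with the overload -> dropped update ->
// hint scenario described above. Assumes default JMX port 7199 and the
// MBean name "org.apache.cassandra.net:type=MessagingService".
public class DroppedMessages {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi"));
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ms = new ObjectName("org.apache.cassandra.net:type=MessagingService");
            @SuppressWarnings("unchecked")
            Map<String, Integer> dropped =
                    (Map<String, Integer>) mbs.getAttribute(ms, "DroppedMessages");
            for (Map.Entry<String, Integer> e : dropped.entrySet())
                System.out.println(e.getKey() + " dropped: " + e.getValue());
        } finally {
            jmxc.close();
        }
    }
}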
Problem after upgrade to 1.0.1
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. But now I'm seeing extreme load growth on one of my nodes (and others are growing faster than usual also). I attempted to run cfstats against the extremely large node that was seeing 2x the load of the others, and I get the error below. I also went into the o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints, to see if it was growing out of control for some reason, but that eventually times out for any node. I'm not sure what to do next with this issue.

Column Family: HintsColumnFamily
SSTable count: 3
Space used (live): 12681676437
Space used (total): 10233130272
Number of Keys (estimate): 384
Memtable Columns Count: 117704
Memtable Data Size: 115107307
Memtable Switch Count: 66
Read Count: 0
Read Latency: NaN ms.
Write Count: 21203290
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 3
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 30130993
Compacted row maximum size: 9223372036854775807

Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
        at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
        at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
        at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        at sun.rmi.transport.Transport$1.run(Transport.java:159)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477
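The exception itself is mechanical rather than mysterious: cfstats asks for the mean compacted row size, which Cassandra estimates from a histogram with fixed bucket boundaries. A row larger than the last boundary lands in an overflow bucket, and once that bucket is non-empty there is no finite ceiling to average against; hence the "Compacted row maximum size" of 9223372036854775807 (Long.MAX_VALUE) and the IllegalStateException. A toy illustration of the failure mode (not Cassandra's actual implementation):

// Toy bucketed histogram showing why mean() must fail after overflow:
// a value beyond the largest bucket ceiling only increments an overflow
// count, so its real magnitude is lost and no finite mean can be computed.
public class ToyEstimatedHistogram {
    private final long[] ceilings = {10, 100, 1000, 10000}; // bucket upper bounds
    private final long[] counts = new long[ceilings.length];
    private long overflow; // values larger than ceilings[ceilings.length - 1]

    void add(long value) {
        for (int i = 0; i < ceilings.length; i++) {
            if (value <= ceilings[i]) { counts[i]++; return; }
        }
        overflow++;
    }

    long mean() {
        if (overflow > 0)
            throw new IllegalStateException(
                    "Unable to compute ceiling for max when histogram overflowed");
        long sum = 0, n = 0;
        for (int i = 0; i < counts.length; i++) {
            sum += counts[i] * ceilings[i]; // approximate each value by its bucket ceiling
            n += counts[i];
        }
        return n == 0 ? 0 : sum / n;
    }

    public static void main(String[] args) {
        ToyEstimatedHistogram h = new ToyEstimatedHistogram();
        h.add(50);
        h.add(500);
        System.out.println("mean ~ " + h.mean()); // fine: prints ~550
        h.add(1000000);                           // overflows the histogram
        System.out.println(h.mean());             // throws, like cfstats did
    }
}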
Re: Problem after upgrade to 1.0.1
Just to rule it out: you didn't do anything tricky like update HintsColumnFamily to use compression?

On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just
> fine with the rolling upgrade. But now I'm seeing extreme load growth on
> one of my nodes (and others are growing faster than usual also).
> [snip: cfstats output and stack trace as in the original message above]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
RE: Problem after upgrade to 1.0.1
Regarding load growth, presumably you are referring to the load as reported by JMX/nodetool. Have you actually looked at the disk utilization on the nodes themselves? A potential issue I have seen: http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html

Dan

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: November-03-11 14:40
To: user@cassandra.apache.org
Subject: Problem after upgrade to 1.0.1

> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just
> fine with the rolling upgrade. But now I'm seeing extreme load growth on
> one of my nodes (and others are growing faster than usual also).
> [snip]
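One way to act on Dan's suggestion is to put the JMX-reported load next to a recursive walk of the data directory on the same node. A sketch, assuming the default data directory location and that the 1.0-era StorageService MBean exposes a LoadString attribute (check both against cassandra.yaml and jconsole):

import java.io.File;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Compare the load Cassandra reports over JMX with what is actually on disk.
// Load counts only live sstables, so snapshots and obsolete sstables that
// have not yet been deleted show up in the on-disk number but not in load.
public class LoadVsDisk {
    // Recursive directory size, like `du -sb`.
    static long du(File f) {
        if (f.isFile()) return f.length();
        long sum = 0;
        File[] children = f.listFiles();
        if (children != null)
            for (File c : children) sum += du(c);
        return sum;
    }

    public static void main(String[] args) throws Exception {
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi"));
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            System.out.println("JMX load:  " + mbs.getAttribute(ss, "LoadString"));
            System.out.println("disk used: " + du(dataDir) + " bytes");
        } finally {
            jmxc.close();
        }
    }
}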
RE: Problem after upgrade to 1.0.1
Nope. The only change I've made since the upgrade is that I altered two of my own column families to use leveled compaction and then ran scrub on each node.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | M: 206.849.2477

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 11:44 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

> Just to rule it out: you didn't do anything tricky like update
> HintsColumnFamily to use compression?
> [snip]
RE: Problem after upgrade to 1.0.1
Disk utilization is actually about 80% higher than what is reported by nodetool ring, across all my nodes on the data drive.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477

From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
Sent: Thursday, November 03, 2011 11:47 AM
To: user@cassandra.apache.org
Subject: RE: Problem after upgrade to 1.0.1

> Regarding load growth, presumably you are referring to the load as
> reported by JMX/nodetool. Have you actually looked at the disk
> utilization on the nodes themselves? A potential issue I have seen:
> http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html
> [snip]
Re: Problem after upgrade to 1.0.1
Does restarting the node fix this?

On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> Disk utilization is actually about 80% higher than what is reported by
> nodetool ring, across all my nodes on the data drive.
> [snip]
RE: Problem after upgrade to 1.0.1
A restart fixed the load numbers, they are back to where I expect them to be now, but disk utilization is double the load number. I'm also still getting the cfstats exception on every node.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

> Does restarting the node fix this?
> [snip]
Re: Problem after upgrade to 1.0.1
I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that patch and rerun scrub, the exception should go away.

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> A restart fixed the load numbers, they are back to where I expect them to
> be now, but disk utilization is double the load number. I'm also still
> getting the cfstats exception on every node.
> [snip]
RE: Problem after upgrade to 1.0.1
Thanks for the help so far. Is there any way to find out why my HintsColumnFamily is so large now, since it wasn't this way before the upgrade and it seems to just keep climbing? I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have a bunch of stale hints from upgrade issues, but it just eventually times out. Plus the node it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

> I found the problem and posted a patch on
> https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with
> that patch and rerun scrub, the exception should go away.
> [snip]
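For reference, invoking countPendingHints() outside of jconsole looks like the sketch below. The caveat reported above applies: the call walks the hints column family, so against multi-gigabyte hint rows like the ones in the earlier cfstats output it can run long enough to time out and load the node heavily. The MBean name is an assumption to verify in jconsole:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Invoke HintedHandoffManager.countPendingHints() over JMX, the same
// operation driven from jconsole in this thread. Assumes default JMX port
// 7199 and the MBean name "org.apache.cassandra.db:type=HintedHandoffManager".
public class CountPendingHints {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi"));
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName hhm = new ObjectName(
                    "org.apache.cassandra.db:type=HintedHandoffManager");
            Object pending = mbs.invoke(hhm, "countPendingHints",
                                        new Object[0], new String[0]);
            System.out.println("pending hints: " + pending);
        } finally {
            jmxc.close();
        }
    }
}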