One possibility: if you're overloading the cluster, replicas will drop updates to avoid OOMing. (This is logged at WARN level.) Before 1.0, Cassandra would just let that slide, but as of 1.0 it records hints for those dropped updates.
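Since countPendingHints() times out, one cheap way to keep an eye on this is to read the numbers cfstats already prints for HintsColumnFamily. Below is a minimal sketch in Python, not anything that ships with Cassandra: the function names and the returned keys are made up for illustration, and it assumes the plain "Name: value" layout of the cfstats output quoted further down. It also flags a "Compacted row maximum size" equal to Java's Long.MAX_VALUE, which is the value reported in this thread alongside the "histogram overflowed" exception.

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, as seen in the quoted cfstats output


def parse_cfstats_block(text):
    """Collect 'Name: value' pairs from one column family block of cfstats text."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Hypothetical summary; the key names here are illustrative only."""
    live_bytes = int(stats["Space used (live)"])
    reads = int(stats["Read Count"])
    writes = int(stats["Write Count"])
    max_row = int(stats["Compacted row maximum size"])
    return {
        "live_gib": live_bytes / 2**30,
        # Writes with no reads at all; how to interpret that is left to the operator.
        "writes_without_reads": writes > 0 and reads == 0,
        # Row-size maximum pinned at Long.MAX_VALUE, as in the thread's output.
        "histogram_overflowed": max_row == LONG_MAX,
    }
```

Feed it the HintsColumnFamily section of `nodetool cfstats` output, saved to a file or captured from a subprocess, and compare `live_gib` between runs to see whether the hints are still growing.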

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> Thanks for the help so far.
>
> Is there any way to find out why my HintsColumnFamily is so large now, since
> it wasn't this way before the upgrade and it seems to just be climbing?
>
> I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints()
> thinking I have a bunch of stale hints from upgrade issues, but it just
> eventually times out. Plus the node it gets invoked against gets thrashed
> and stops responding, forcing me to restart cassandra.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Thursday, November 03, 2011 5:06 PM
> To: user@cassandra.apache.org
> Subject: Re: Problem after upgrade to 1.0.1
>
> I found the problem and posted a patch on
> https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that
> patch and rerun scrub the exception should go away.
>
> On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey <bryce.godf...@azaleos.com>
> wrote:
>> A restart fixed the load numbers, they are back to where I expect them to be
>> now, but disk utilization is double the load #. I'm also still getting the
>> cfstats exception from any node.
>>
>> -----Original Message-----
>> From: Jonathan Ellis [mailto:jbel...@gmail.com]
>> Sent: Thursday, November 03, 2011 11:52 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Problem after upgrade to 1.0.1
>>
>> Does restarting the node fix this?
>>
>> On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey <bryce.godf...@azaleos.com>
>> wrote:
>>> Disk utilization is actually about 80% higher than what is reported
>>> for nodetool ring across all my nodes on the data drive
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>>
>>> From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
>>> Sent: Thursday, November 03, 2011 11:47 AM
>>> To: user@cassandra.apache.org
>>> Subject: RE: Problem after upgrade to 1.0.1
>>>
>>> Regarding load growth, presumably you are referring to the load as
>>> reported by JMX/nodetool. Have you actually looked at the disk
>>> utilization on the nodes themselves? Potential issue I have seen:
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html
>>>
>>> Dan
>>>
>>> From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
>>> Sent: November-03-11 14:40
>>> To: user@cassandra.apache.org
>>> Subject: Problem after upgrade to 1.0.1
>>>
>>> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
>>> just fine with the rolling upgrade. But now I'm having extreme load
>>> growth on one of my nodes (and others are growing faster than usual
>>> also). I attempted to run a cfstats against the extremely large node
>>> that was seeing 2x the load of others and I get this error below.
>>> I also went into the o.a.c.db.HintedHandoffManager mbean and
>>> attempted to list pending hints to see if it was growing out of
>>> control for some reason, but that just times out eventually for any node.
>>> I'm not sure what to do next with this issue.
>>>
>>> Column Family: HintsColumnFamily
>>> SSTable count: 3
>>> Space used (live): 12681676437
>>> Space used (total): 10233130272
>>> Number of Keys (estimate): 384
>>> Memtable Columns Count: 117704
>>> Memtable Data Size: 115107307
>>> Memtable Switch Count: 66
>>> Read Count: 0
>>> Read Latency: NaN ms.
>>> Write Count: 21203290
>>> Write Latency: 0.014 ms.
>>> Pending Tasks: 0
>>> Key cache capacity: 3
>>> Key cache size: 0
>>> Key cache hit rate: NaN
>>> Row cache: disabled
>>> Compacted row minimum size: 30130993
>>> Compacted row maximum size: 9223372036854775807
>>>
>>> Exception in thread "main" java.lang.IllegalStateException: Unable to
>>> compute ceiling for max when histogram overflowed
>>>     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>>>     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>>>     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
>>>     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>>>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>>>     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>>>     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
>>>     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
>>>     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>     at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com