RE: Problem after upgrade to 1.0.1

2011-11-08 Thread Bryce Godfrey
I have no errors in my system.log, just these types of warnings occasionally:
WARN [pool-1-thread-1] 2011-11-08 00:03:44,726 Memtable.java (line 167) setting 
live ratio to minimum of 1.0 instead of 0.9511448007676252

I did find the problem with my data drive consumption being so large, as I did 
not know that running scrub after the upgrade would take a snapshot of the 
data.  Once I removed all the snapshots, the data drive is back down to where 
I expect it to be, although the Load numbers reported by ring are much larger 
than what is in the data drive.
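
For anyone tidying up the same thing: scrub snapshots each column family before rewriting its sstables, so the pre-scrub files stay on disk under the keyspaces' snapshots directories until they are cleared (nodetool clearsnapshot does this).  A minimal Java/JMX sketch of clearing them programmatically, assuming the StorageService MBean exposes clearSnapshot the way nodetool does; the host name, JMX port and the empty-tag-means-all-snapshots convention are assumptions, not taken from this thread:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClearScrubSnapshots
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder host; 7199 is Cassandra's default JMX port in this era.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://node1.example.com:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // clearSnapshot(tag, keyspaces...): assumption here is that an empty tag and
            // an empty keyspace list clear every snapshot on the node.
            mbs.invoke(ss,
                       "clearSnapshot",
                       new Object[] { "", new String[0] },
                       new String[] { String.class.getName(), String[].class.getName() });
        }
        finally
        {
            connector.close();
        }
    }
}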

I've also upgraded to 1.0.2 and re-ran scrub, and now I can run cfstats again, 
so thanks for that.  Although I'm still confused about why the hints CF has become 
so large on a few of the nodes:

Column Family: HintsColumnFamily
SSTable count: 11
Space used (live): 127490858389
Space used (total): 72123363085
Number of Keys (estimate): 1408
Memtable Columns Count: 43174
Memtable Data Size: 44376138
Memtable Switch Count: 103
Read Count: 494
Read Latency: NaN ms.
Write Count: 30970531
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 14
Key cache size: 10
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 88149
Compacted row maximum size: 53142810146
Compacted row mean size: 6065512727



-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, November 04, 2011 9:29 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

One possibility: If you're overloading the cluster, replicas will drop updates 
to avoid OOMing.  (This is logged at WARN level.)  Before 1.x Cassandra would 
just let that slide, but with 1.0 it started recording hints for those.

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Thanks for the help so far.

 Is there any way to find out why my HintsColumnFamily is so large now, since 
 it wasn't this way before the upgrade and it seems to just keep climbing?

 I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints() 
 thinking I have a bunch of stale hints from upgrade issues, but it just 
 eventually times out.  Plus the node it gets invoked against gets thrashed 
 and stops responding, forcing me to restart cassandra.
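
For reference, a minimal sketch of driving that MBean from code rather than from jconsole.  countPendingHints() is the operation named above; listEndpointsPendingHints() and deleteHintsForEndpoint() are assumptions about the same HintedHandoffManager MBean in this era (listing the hint row keys should be far cheaper than counting every hint column), and the host, port and endpoint address are placeholders:

import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class InspectHints
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder node name; 7199 is Cassandra's default JMX port in this era.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://node1.example.com:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName hhm = new ObjectName("org.apache.cassandra.db:type=HintedHandoffManager");

            // Listing the endpoints that still have hint rows avoids walking every hint
            // column the way countPendingHints() does, so it is less likely to time out.
            @SuppressWarnings("unchecked")
            List<String> endpoints = (List<String>) mbs.invoke(
                    hhm, "listEndpointsPendingHints", new Object[0], new String[0]);
            System.out.println("Endpoints with pending hints: " + endpoints);

            // If the hints for a given endpoint are known to be stale, they can be dropped
            // instead of replayed.  "10.1.2.3" is a placeholder address.
            mbs.invoke(hhm,
                       "deleteHintsForEndpoint",
                       new Object[] { "10.1.2.3" },
                       new String[] { String.class.getName() });
        }
        finally
        {
            connector.close();
        }
    }
}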

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 5:06 PM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 I found the problem and posted a patch on 
 https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that 
 patch and rerun scrub the exception should go away.

 On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large 
 node that was seeing 2x the load of others and I get this error below.
 I also went into the o.a.c.db.HintedHandoffManager mbean and 
 attempted to list pending hints to see if it was growing out of 
 control for some reason, but that just times out eventually for any node.  
 I'm not sure what to do next with this issue.

Re: Problem after upgrade to 1.0.1

2011-11-04 Thread Jonathan Ellis
One possibility: If you're overloading the cluster, replicas will drop
updates to avoid OOMing.  (This is logged at WARN level.)  Before 1.x
Cassandra would just let that slide, but with 1.0 it started
recording hints for those.

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Thanks for the help so far.

 Is there any way to find out why my HintsColumnFamily is so large now, since 
 it wasn't this way before the upgrade and it seems to just keep climbing?

 I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints() 
 thinking I have a bunch of stale hints from upgrade issues, but it just 
 eventually times out.  Plus the node it gets invoked against gets thrashed 
 and stops responding, forcing me to restart cassandra.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 5:06 PM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 I found the problem and posted a patch on 
 https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that 
 patch and rerun scrub the exception should go away.

 On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as
 reported by JMX/nodetool. Have you actually looked at the disk
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
 just fine with the rolling upgrade.  But now I'm having extreme load
 growth on one of my nodes (and others are growing faster than usual
 also).  I attempted to run a cfstats against the extremely large node
 that was seeing 2x the load of others and I get this error below.
 I also went into the o.a.c.db.HintedHandoffManager mbean and
 attempted to list pending hints to see if it was growing out of
 control for some reason, but that just times out eventually for any node.  
 I'm not sure what to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)

Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine 
with the rolling upgrade.  But now I'm having extreme load growth on one of my 
nodes (and others are growing faster than usual also).  I attempted to run a 
cfstats against the extremely large node that was seeing 2x the load of others 
and I get this error below.  I also went into the 
o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see 
if it was growing out of control for some reason, but that just times out 
eventually for any node.  I'm not sure what to do next with this issue.

   Column Family: HintsColumnFamily
SSTable count: 3
Space used (live): 12681676437
Space used (total): 10233130272
Number of Keys (estimate): 384
Memtable Columns Count: 117704
Memtable Data Size: 115107307
Memtable Switch Count: 66
Read Count: 0
Read Latency: NaN ms.
Write Count: 21203290
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 3
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 30130993
Compacted row maximum size: 9223372036854775807
Exception in thread main java.lang.IllegalStateException: Unable to compute 
ceiling for max when histogram overflowed
at 
org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at 
org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at 
org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Bryce Godfrey | Sr. Software Engineer | Azaleos 
Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477
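
On the cfstats failure itself: the "histogram overflowed" IllegalStateException comes from the per-CF row-size histogram.  It has a fixed set of bucket boundaries, and once a row (here a hints row tens of GB large, which is presumably why "Compacted row maximum size" shows 9223372036854775807, i.e. Long.MAX_VALUE) is bigger than the largest boundary, its count lands in an overflow bucket and the mean/max can no longer be computed, so getMeanRowSize() throws and cfstats aborts.  The patch on CASSANDRA-3451 mentioned later in the thread, plus a re-scrub, regenerates sane statistics.  A toy sketch of the idea, not Cassandra's actual EstimatedHistogram:

// A simplified illustration of the overflow behaviour behind the cfstats error:
// fixed bucket boundaries plus one overflow bucket whose contents make mean/max meaningless.
public class ToyHistogram
{
    private final long[] bucketOffsets;   // upper bounds of the regular buckets
    private final long[] buckets;         // buckets.length == bucketOffsets.length + 1

    public ToyHistogram(long[] bucketOffsets)
    {
        this.bucketOffsets = bucketOffsets;
        this.buckets = new long[bucketOffsets.length + 1];
    }

    public void add(long value)
    {
        for (int i = 0; i < bucketOffsets.length; i++)
        {
            if (value <= bucketOffsets[i])
            {
                buckets[i]++;
                return;
            }
        }
        buckets[buckets.length - 1]++;     // larger than every boundary: overflow bucket
    }

    public long mean()
    {
        // Once anything has landed in the overflow bucket, no meaningful mean exists.
        if (buckets[buckets.length - 1] > 0)
            throw new IllegalStateException("Unable to compute ceiling for max when histogram overflowed");
        long count = 0, sum = 0;
        for (int i = 0; i < bucketOffsets.length; i++)
        {
            count += buckets[i];
            sum += buckets[i] * bucketOffsets[i];
        }
        return count == 0 ? 0 : sum / count;
    }
}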



Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
Just to rule it out: you didn't do anything tricky like update
HintsColumnFamily to use compression?

On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just
 fine with the rolling upgrade.  But now I’m having extreme load growth on
 one of my nodes (and others are growing faster than usual also).  I
 attempted to run a cfstats against the extremely large node that was seeing
 2x the load of others and I get this error below.  I also went into the
 o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to
 see if it was growing out of control for some reason, but that just times
 out eventually for any node.  I’m not sure what to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to
 compute ceiling for max when histogram overflowed

     at
 org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)

     at
 org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)

     at
 org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)

     at
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)

     at
 com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)

     at
 com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)

     at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)

     at
 com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)

     at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)

     at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)

     at
 javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)

     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)

     at sun.rmi.transport.Transport$1.run(Transport.java:159)

     at java.security.AccessController.doPrivileged(Native Method)

     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)

     at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)

     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)

     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)

     at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

     at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

     at java.lang.Thread.run(Thread.java:662)



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Dan Hendry
Regarding load growth, presumably you are referring to the load as reported
by JMX/nodetool. Have you actually looked at the disk utilization on the
nodes themselves? Potential issue I have seen:
http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html

 

Dan
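
A minimal sketch of making that comparison from code: read the per-node load map that nodetool ring renders over JMX and set it against the bytes actually sitting under the data directory.  The host, JMX port and data path are placeholders, and the "LoadMap" attribute name is an assumption about the StorageService MBean in this era; the local directory walk only means anything when run on the node itself.

import java.io.File;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LoadVersusDisk
{
    // Recursively sum the bytes under a directory, roughly what du would report.
    static long diskBytes(File f)
    {
        if (f.isFile())
            return f.length();
        long total = 0;
        File[] children = f.listFiles();
        if (children != null)
            for (File child : children)
                total += diskBytes(child);
        return total;
    }

    public static void main(String[] args) throws Exception
    {
        // Placeholder host; 7199 is Cassandra's default JMX port in this era.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://node1.example.com:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");

            // The same per-node load map that nodetool ring prints (values are display strings).
            @SuppressWarnings("unchecked")
            Map<String, String> loadMap = (Map<String, String>) mbs.getAttribute(ss, "LoadMap");
            System.out.println("Load reported over JMX: " + loadMap);

            // What is actually on the data drive (placeholder path, run on the node itself).
            System.out.println("Bytes under data dir:   " + diskBytes(new File("/var/lib/cassandra/data")));
        }
        finally
        {
            connector.close();
        }
    }
}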

 

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] 
Sent: November-03-11 14:40
To: user@cassandra.apache.org
Subject: Problem after upgrade to 1.0.1

 

I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just
fine with the rolling upgrade.  But now I'm having extreme load growth on
one of my nodes (and others are growing faster than usual also).  I
attempted to run a cfstats against the extremely large node that was seeing
2x the load of others and I get this error below.  I also went into the
o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to
see if it was growing out of control for some reason, but that just times
out eventually for any node.  I'm not sure what to do next with this issue.

 

   Column Family: HintsColumnFamily

SSTable count: 3

Space used (live): 12681676437

Space used (total): 10233130272

Number of Keys (estimate): 384

Memtable Columns Count: 117704

Memtable Data Size: 115107307

Memtable Switch Count: 66

Read Count: 0

Read Latency: NaN ms.

Write Count: 21203290

Write Latency: 0.014 ms.

Pending Tasks: 0

Key cache capacity: 3

Key cache size: 0

Key cache hit rate: NaN

Row cache: disabled

Compacted row minimum size: 30130993

Compacted row maximum size: 9223372036854775807

Exception in thread main java.lang.IllegalStateException: Unable to
compute ceiling for max when histogram overflowed

at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

 

Bryce Godfrey | Sr. Software Engineer |  http://www.azaleos.com/ Azaleos
Corporation | T: 206.926.1978 | M: 206.849.2477

 


RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Nope.  I did alter two of my own column families to use Leveled compaction and 
then ran scrub on each node; that is the only change I have made since the upgrade.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | 
M: 206.849.2477

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 11:44 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Just to rule it out: you didn't do anything tricky like update 
HintsColumnFamily to use compression?

On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large node 
 that was seeing 2x the load of others and I get this error below.  I 
 also went into the o.a.c.db.HintedHandoffManager mbean and attempted 
 to list pending hints to see if it was growing out of control for some 
 reason, but that just times out eventually for any node.  I'm not sure what 
 to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
     at sun.rmi.transport.Transport$1.run(Transport.java:159)
     at java.security.AccessController.doPrivileged(Native Method)
     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Disk utilization is actually about 80% higher than what is reported for 
nodetool ring across all my nodes on the data drive

Bryce Godfrey | Sr. Software Engineer | Azaleos 
Corporation <http://www.azaleos.com/> | T: 206.926.1978 | M: 206.849.2477

From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
Sent: Thursday, November 03, 2011 11:47 AM
To: user@cassandra.apache.org
Subject: RE: Problem after upgrade to 1.0.1

Regarding load growth, presumably you are referring to the load as reported by 
JMX/nodetool. Have you actually looked at the disk utilization on the nodes 
themselves? Potential issue I have seen: 
http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html

Dan

From: Bryce Godfrey 
[mailto:bryce.godf...@azaleos.com]
Sent: November-03-11 14:40
To: user@cassandra.apache.org
Subject: Problem after upgrade to 1.0.1

I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine 
with the rolling upgrade.  But now I'm having extreme load growth on one of my 
nodes (and others are growing faster than usual also).  I attempted to run a 
cfstats against the extremely large node that was seeing 2x the load of others 
and I get this error below.  I also went into the 
o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to see 
if it was growing out of control for some reason, but that just times out 
eventually for any node.  I'm not sure what to do next with this issue.

   Column Family: HintsColumnFamily
SSTable count: 3
Space used (live): 12681676437
Space used (total): 10233130272
Number of Keys (estimate): 384
Memtable Columns Count: 117704
Memtable Data Size: 115107307
Memtable Switch Count: 66
Read Count: 0
Read Latency: NaN ms.
Write Count: 21203290
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 3
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 30130993
Compacted row maximum size: 9223372036854775807
Exception in thread main java.lang.IllegalStateException: Unable to compute 
ceiling for max when histogram overflowed
at 
org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at 
org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at 
org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run

Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
Does restarting the node fix this?

On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Disk utilization is actually about 80% higher than what is reported for
 nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as reported
 by JMX/nodetool. Have you actually looked at the disk utilization on the
 nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just
 fine with the rolling upgrade.  But now I’m having extreme load growth on
 one of my nodes (and others are growing faster than usual also).  I
 attempted to run a cfstats against the extremely large node that was seeing
 2x the load of others and I get this error below.  I also went into the
 o.a.c.db.HintedHandoffManager mbean and attempted to list pending hints to
 see if it was growing out of control for some reason, but that just times
 out eventually for any node.  I’m not sure what to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to
 compute ceiling for max when histogram overflowed

     at
 org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)

     at
 org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)

     at
 org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)

     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)

     at
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)

     at
 com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)

     at
 com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)

     at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)

     at
 com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)

     at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)

     at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)

     at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)

     at
 javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)

     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)

     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

     at java.lang.reflect.Method.invoke(Method.java:597)

     at
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)

     at sun.rmi.transport.Transport$1.run(Transport.java:159)

     at java.security.AccessController.doPrivileged(Native Method)

     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)

     at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
A restart fixed the load numbers; they are back to where I expect them to be 
now, but disk utilization is double the load #.  I also still get the cfstats 
exception from any node.

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

Does restarting the node fix this?

On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large node 
 that was seeing 2x the load of others and I get this error below.  I 
 also went into the o.a.c.db.HintedHandoffManager mbean and attempted 
 to list pending hints to see if it was growing out of control for some 
 reason, but that just times out eventually for any node.  I'm not sure what 
 to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)

Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
I found the problem and posted a patch on
https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build
with that patch and rerun scrub the exception should go away.

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as
 reported by JMX/nodetool. Have you actually looked at the disk
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
 just fine with the rolling upgrade.  But now I'm having extreme load
 growth on one of my nodes (and others are growing faster than usual
 also).  I attempted to run a cfstats against the extremely large node
 that was seeing 2x the load of others and I get this error below.  I
 also went into the o.a.c.db.HintedHandoffManager mbean and attempted
 to list pending hints to see if it was growing out of control for some
 reason, but that just times out eventually for any node.  I'm not sure what 
 to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Thanks for the help so far.  

Is there any way to find out why my HintsColumnFamily is so large now, since it 
wasn't this way before the upgrade and it seems to just keep climbing?  

I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints() thinking 
I have a bunch of stale hints from upgrade issues, but it just eventually times 
out.  Plus the node it gets invoked against gets thrashed and stops responding, 
forcing me to restart cassandra.

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, November 03, 2011 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

I found the problem and posted a patch on 
https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that 
patch and rerun scrub the exception should go away.

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:
 A restart fixed the load numbers, they are back to where I expect them to be 
 now, but disk utilization is double the load #.  I also still get the 
 cfstats exception from any node.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Thursday, November 03, 2011 11:52 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem after upgrade to 1.0.1

 Does restarting the node fix this?

 On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 Disk utilization is actually about 80% higher than what is reported 
 for nodetool ring across all my nodes on the data drive



 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
 206.926.1978 | M: 206.849.2477



 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
 Sent: Thursday, November 03, 2011 11:47 AM
 To: user@cassandra.apache.org
 Subject: RE: Problem after upgrade to 1.0.1



 Regarding load growth, presumably you are referring to the load as 
 reported by JMX/nodetool. Have you actually looked at the disk 
 utilization on the nodes themselves? Potential issue I have seen:
 http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html



 Dan



 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
 Sent: November-03-11 14:40
 To: user@cassandra.apache.org
 Subject: Problem after upgrade to 1.0.1



 I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go 
 just fine with the rolling upgrade.  But now I'm having extreme load 
 growth on one of my nodes (and others are growing faster than usual 
 also).  I attempted to run a cfstats against the extremely large node 
 that was seeing 2x the load of others and I get this error below.  
 I also went into the o.a.c.db.HintedHandoffManager mbean and 
 attempted to list pending hints to see if it was growing out of 
 control for some reason, but that just times out eventually for any node.  
 I'm not sure what to do next with this issue.



    Column Family: HintsColumnFamily

     SSTable count: 3

     Space used (live): 12681676437

     Space used (total): 10233130272

     Number of Keys (estimate): 384

     Memtable Columns Count: 117704

     Memtable Data Size: 115107307

     Memtable Switch Count: 66

     Read Count: 0

     Read Latency: NaN ms.

     Write Count: 21203290

     Write Latency: 0.014 ms.

     Pending Tasks: 0

     Key cache capacity: 3

     Key cache size: 0

     Key cache hit rate: NaN

     Row cache: disabled

     Compacted row minimum size: 30130993

     Compacted row maximum size: 9223372036854775807

 Exception in thread main java.lang.IllegalStateException: Unable to 
 compute ceiling for max when histogram overflowed

     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)