[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113715#comment-15113715 ]

Navjyot Nishant commented on CASSANDRA-8028:
--------------------------------------------

Hi All,

We are seeing a similar issue while autocompaction is running on a few of our nodes. The following error is being logged; can someone please suggest what is causing this and how to resolve it? We use Cassandra 2.1.9. Please let me know if further information is required.

Error:

ERROR [CompactionExecutor:3] 2016-01-23 11:54:50,198 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:3,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:203) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1987) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:370) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:96) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:179) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>                 Key: CASSANDRA-8028
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8028
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Linux
>            Reporter: Gianluca Borello
>            Assignee: Carl Yeksigian
>             Fix For: 2.1.3
>
>         Attachments: 8028-2.1-clean.txt, 8028-2.1-v2.txt, 8028-2.1.txt, 8028-trunk.txt, sstable-histogrambuster.tar.bz2
>
>
> It seems like with 2.1.0 histograms can't be computed most of the time:
> $ nodetool cfhistograms draios top_files_by_agent1
> nodetool: Unable to compute when histogram overflowed
> See 'nodetool help' or 'nodetool help <command>'.
> I can probably find a way to attach a .cql script to reproduce it, but I
> suspect it must be obvious to replicate it as it happens on more than 50% of
> my column families.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
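The stack trace above shows the mean of a histogram being requested during compaction candidate selection and failing because the histogram has overflowed. As a rough illustration of that failure mode, here is a minimal sketch in Java. OverflowingHistogram and meanOr are hypothetical names invented for this example; this is not Cassandra's EstimatedHistogram, and the caller-side guard is an assumption rather than the fix that was committed.

```java
import java.util.Arrays;

/** Hypothetical, simplified bucketed histogram; not Cassandra's EstimatedHistogram. */
final class OverflowingHistogram {
    private final long[] offsets;   // ascending upper bound of each regular bucket
    private final long[] buckets;   // counts; the extra last slot collects overflow

    OverflowingHistogram(long... offsets) {
        this.offsets = offsets.clone();
        this.buckets = new long[offsets.length + 1];
    }

    void add(long value) {
        int i = Arrays.binarySearch(offsets, value);
        if (i < 0) i = -i - 1;      // first offset >= value, or offsets.length if none
        buckets[i]++;
    }

    boolean isOverflowed() {
        return buckets[buckets.length - 1] > 0;
    }

    /** Mean over bucket upper bounds, rounded up; refuses to guess once overflowed. */
    long mean() {
        if (isOverflowed())
            throw new IllegalStateException("Unable to compute ceiling for max when histogram overflowed");
        long count = 0, sum = 0;
        for (int i = 0; i < offsets.length; i++) {
            count += buckets[i];
            sum += buckets[i] * offsets[i];
        }
        return count == 0 ? 0 : (long) Math.ceil((double) sum / count);
    }

    /** Caller-side guard (an assumption for illustration): fall back instead of throwing. */
    long meanOr(long fallback) {
        return isOverflowed() ? fallback : mean();
    }
}
```

A caller that only needs an estimate, such as a tombstone-ratio heuristic, could use something like meanOr so that a single oversized value does not abort the whole background task. Whether that trade-off is acceptable depends on how the estimate is used.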
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113748#comment-15113748 ]

Navjyot Nishant commented on CASSANDRA-8028:
--------------------------------------------

I have created https://issues.apache.org/jira/browse/CASSANDRA-11063 to track this issue.
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199482#comment-14199482 ]

Joshua McKenzie commented on CASSANDRA-8028:
--------------------------------------------

It looks like if either rowSizeHist or columnCountHist overflows, this skips populating the percentiles of both, regardless of whether the other was in bounds. Is there a reason we clobber the other even if it didn't overflow?

Other than that, LGTM.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.2
>         Attachments: 8028-2.1-clean.txt, 8028-2.1.txt, 8028-trunk.txt, sstable-histogrambuster.tar.bz2
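The review question above is whether an overflow in one histogram should also suppress the percentiles of the other. One way to picture the alternative is the sketch below, which evaluates each histogram on its own and marks only the overflowed one as unavailable. PercentileReportSketch and its methods are hypothetical names for illustration; the attached patches may structure this differently.

```java
import java.util.Arrays;

/** Hypothetical helper; computes percentiles per histogram, independently of any other. */
final class PercentileReportSketch {
    /** counts has one more slot than offsets; a non-zero last slot means overflow. */
    static boolean overflowed(long[] counts) {
        return counts[counts.length - 1] > 0;
    }

    /** Bucket upper bound at percentile p (0..1); assumes the histogram did not overflow. */
    static long percentile(long[] offsets, long[] counts, double p) {
        long total = Arrays.stream(counts).sum();
        if (total == 0) return 0;
        long threshold = (long) Math.ceil(total * p);
        long seen = 0;
        for (int i = 0; i < offsets.length; i++) {
            seen += counts[i];
            if (seen >= threshold) return offsets[i];
        }
        return offsets[offsets.length - 1];
    }

    /** NaN marks an unavailable column without touching any other histogram's values. */
    static double[] percentilesOrNaN(long[] offsets, long[] counts, double[] ps) {
        double[] out = new double[ps.length];
        for (int i = 0; i < ps.length; i++)
            out[i] = overflowed(counts) ? Double.NaN : percentile(offsets, counts, ps[i]);
        return out;
    }
}
```

Calling percentilesOrNaN once for the row-size counts and once for the cell-count counts keeps the two columns independent: an overflowed row-size histogram yields NaN entries while the cell-count percentiles are still reported.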
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185674#comment-14185674 ]

Joshua McKenzie commented on CASSANDRA-8028:
--------------------------------------------

[~carlyeks]: do you have a stress syntax handy that'll reproduce the problem?

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.2
>         Attachments: 8028-2.1.txt, 8028-trunk.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185685#comment-14185685 ]

Joshua McKenzie commented on CASSANDRA-8028:
--------------------------------------------

Also - 8028-trunk isn't applying cleanly to 2.1 or trunk for me nor to trunk on the commit just prior to its creation.

bq. error: patch failed: src/java/org/apache/cassandra/tools/NodeTool.java:908

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.2
>         Attachments: 8028-2.1.txt, 8028-trunk.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181904#comment-14181904 ]

Brandon Williams commented on CASSANDRA-8028:
---------------------------------------------

bq. This will still be a breaking change which should wait until 3.0. nodetool currently expects the size of the buckets to always be 90, and will fail an assertion if we make changes on the server to send the full histogram to the client.

But we don't need to allow mixing an older nodetool with a newer server or vice versa, so can't we change them both in 2.1?

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.2
>         Attachments: 8028-2.1.txt, 8028-trunk.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181950#comment-14181950 ]

Carl Yeksigian commented on CASSANDRA-8028:
-------------------------------------------

That works for me then. The patch also applies to 2.1 cleanly.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.2
>         Attachments: 8028-2.1.txt, 8028-trunk.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174385#comment-14174385 ]

Cameron Hatfield commented on CASSANDRA-8028:
---------------------------------------------

Looks like this doesn't fully resolve the issue. According to running sstablemetadata on a 2.1.0 sstable file, as well as MetadataCollector.java:
https://github.com/apache/cassandra/blob/8d8fed52242c34b477d0384ba1d1ce3978efbbe8/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java#L59
the sstable metadata persisted for these histograms is actually stored with more than 90 buckets.

The issue seems to be that both nodetool,
https://github.com/apache/cassandra/blob/810c2d5fe64333c0bcfe0b2ed3ea2c8f6aaf89b7/src/java/org/apache/cassandra/tools/NodeTool.java#L892,
and ColumnFamilyMetrics,
https://github.com/apache/cassandra/blob/ed1f39480606c95ff6595aad0aad9c1af7460f74/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java#L220,
have a hardcoded value of 90. If that were raised, we would be able to display the non-overflowed histograms stored in the metadata.

Example output from sstablemetadata (notice that number of rows is 115 and 150, not 90 and 90):

[cameron@cass-db01 removed]$ sstablemetadata removed-removed-ka-33-Data.db
SSTable: ./removed-removed-ka-33
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1413408134518716
Maximum timestamp: 1413410874004562
SSTable max local deletion time: 2147483647
Compression ratio: 0.2157516194949938
Estimated droppable tombstones: 0.026257982293749805
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1413409259260, position=15051162)
Estimated tombstone drop times:%n
1413408139: 1647
1413408151: 2451
1413408165: 3151
1413408180: 3400
1413408199: 3027
1413408214: 2769
1413408228: 2064
1413408244: 1779
1413408261: 3817
1413408280: 7265
1413408302: 1911
1413408319: 1512
1413408337: 1582
1413408354: 1712
1413408375: 1577
1413408393: 2507
1413408411: 1410
1413408431: 761
1413408447: 507
1413408466: 2593
1413408483: 3840
1413408503: 1557
1413408523: 819
1413409632: 742
1413409646: 641
1413409662: 473
1413409684: 704
1413409700: 762
1413409716: 601
1413409728: 125
1413409744: 1190
1413409763: 1181
1413409783: 1768
1413409800: 1730
1413409820: 1326
1413409837: 1273
1413409856: 1299
1413409871: 2663
1413409887: 2197
1413409901: 1776
1413409917: 871
1413409934: 1449
1413409952: 1700
1413409969: 1301
1413409984: 2100
1413410002: 2103
1413410021: 1208
1413410039: 923
1413410052: 1425
1413410068: 1796
1413410081: 2263
1413410095: 2664
1413410110: 3019
1413410128: 2823
1413410146: 3801
1413410160: 3864
1413410175: 3252
1413410188: 8337
1413410204: 9375
1413410219: 6125
1413410235: 7954
1413410254: 11019
1413410271: 12703
1413410287: 12274
1413410303: 12199
1413410317: 10751
1413410330: 11369
1413410343: 10552
1413410355: 8157
1413410369: 8776
1413410384: 7504
1413410400: 7312
1413410418: 7472
1413410434: 7032
1413410448: 6338
1413410465: 5335
1413410484: 6427
1413410504: 7897
1413410523: 8515
1413410539: 4886
1413410557: 4847
1413410576: 4987
1413410591: 7630
1413410611: 8553
1413410628: 12157
1413410645: 12740
1413410663: 13756
1413410679: 19249
1413410695: 19374
1413410713: 15390
1413410732: 13493
1413410746: 13793
1413410760: 16937
1413410775: 19841
1413410791: 16595
1413410808: 19050
1413410823: 18450
1413410840: 22497
1413410861: 34027
1413410872: 16

Count     Row Size  Cell Count
1                0           0
2                0           0
3                0           0
4                0          16
5                0           0
6                0           0
7                0           0
8                0          35
10               0           0
12               0          17
14               0           0
17               0          29
20               0          14
24               0          21
29               0          12
35               0          13
42               0          40
50               0          19
60               0          30
72               0
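To make the effect of the bucket count concrete, here is a minimal Java sketch of how such bucket offsets could be generated. It assumes offsets start at 1 and grow by a factor of about 1.2 with rounding; that rule reproduces the Count column in the sstablemetadata output above (1, 2, 3, ..., 60, 72), but it is written from that observation and not copied from Cassandra's source. BucketOffsetsSketch and newOffsets are names chosen for this example.

```java
/** Illustrative offset generator; the growth rule is an assumption based on the output above. */
final class BucketOffsetsSketch {
    /** Offsets start at 1 and grow by about 1.2x, bumping by one when rounding would repeat a value. */
    static long[] newOffsets(int size) {
        long[] result = new long[size];
        long last = 1;
        result[0] = last;
        for (int i = 1; i < size; i++) {
            long next = Math.round(last * 1.2);
            if (next == last) next++;   // keep the series strictly increasing
            result[i] = next;
            last = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Compare the range covered by the bucket counts mentioned in the thread.
        for (int size : new int[] {90, 115, 150}) {
            long[] offsets = newOffsets(size);
            System.out.println(size + " offsets -> largest representable value " + offsets[size - 1]);
        }
    }
}
```

Because the series grows geometrically, each additional bucket extends the representable range by roughly 20%. A reader that assumes 90 buckets will therefore report an overflow for values that a histogram stored with more buckets can still represent.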
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158454#comment-14158454 ]

Carl Yeksigian commented on CASSANDRA-8028:
-------------------------------------------

These metrics are captured per-sstable, so it is possible that the change in behaviour is down to a compaction. The max partition size is already right at the boundary of not being representable by the histograms, so any difference in the sstables could be enough to change whether the histogram is overflowed or not.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158457#comment-14158457 ]

Gianluca Borello commented on CASSANDRA-8028:
---------------------------------------------

I know it's absolutely OT and I can perhaps post this to the mailing list instead, but I really have to ask: are we doing something wrong then? Should we make partitions much smaller? I've read from various different sources that having rows up to a few megabytes is totally acceptable, so that has become our rule of thumb when designing sharding keys for the partitions.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158502#comment-14158502 ]

Carl Yeksigian commented on CASSANDRA-8028:
-------------------------------------------

As long as it isn't a single cell, this is fine. The issue you're running into is just that we haven't updated the histogram code to handle those bigger partitions yet.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156648#comment-14156648 ]

Jonathan Ellis commented on CASSANDRA-8028:
-------------------------------------------

Does nodetool hardcode the EstimatedHistogram size it expects?

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156649#comment-14156649 ]

Carl Yeksigian commented on CASSANDRA-8028:
-------------------------------------------

Yes, it uses the default of 90, so we can't just up the size and have nodetool understand it.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156737#comment-14156737 ]

Gianluca Borello commented on CASSANDRA-8028:
---------------------------------------------

How large are the partitions we are talking about here? The worrisome thing is that the command fails on a column family and then, a few seconds later, works on that same column family.

First attempt:

$ nodetool cfhistograms draios protobuf1
nodetool: Unable to compute when histogram overflowed
See 'nodetool help' or 'nodetool help <command>'.

Second attempt (after about 30 seconds):

$ nodetool cfhistograms draios protobuf1
draios/protobuf1 histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             18.60            159.55           1955666              1597
75%             1.00             21.77            364.55           4055269              3973
95%             1.00             33.11          10789.18           7007506             61214
98%             3.00             53.04          56822.90           8409007             61214
99%             4.00            155.01          77205.61           8409007             61214
Min             0.00              7.11             58.23            105779                87
Max             5.00          85449.58         189451.45          17436917             61214

There were no deletions in between, and the partitions don't seem that big to me; we try to always keep them under a few MBs.

> Unable to compute when histogram overflowed
> -------------------------------------------
>
>             Fix For: 2.1.1
>         Attachments: 8028-2.1.txt