[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039297#comment-17039297 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Updated the default and the commit message. [branch|https://github.com/jrwest/cassandra/commits/jwest/15213]

> DecayingEstimatedHistogramReservoir Inefficiencies
> --------------------------------------------------
>
>                 Key: CASSANDRA-15213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Benedict Elliott Smith
>            Assignee: Jordan West
>           Priority: Normal
>             Fix For: 4.0-beta
>
>        Attachments: 15213-perf-branch, 15213-perf-trunk
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user schemas, and this will grow significantly under contention and user schemas with many tables. This is because {{LongAdder}} is a very heavy class designed for single contended values.
> ** This can likely be improved significantly, without significant loss of performance in the contended case, by simply increasing the size of our primitive backing array and providing multiple buckets, with each thread picking a bucket to increment, or simply multiple backing arrays. Probably a better way still to do this would be to introduce some competition detection to the update, much like {{LongAdder}} utilises, that increases the number of backing arrays under competition.
> ** To save memory this approach could partition the space into chunks that are likely to be updated together, so that we do not need to duplicate the entire array under competition.
> * Similarly, binary search is costly and a measurable cost as a share of the new networking work (without filtering it was > 10% of the CPU used overall). We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, to save the random memory access costs.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
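The final bullet's floor(log2 n / log2 1.2) approximation can be computed from the position of the highest set bit. The following is a hypothetical sketch (names assumed, not the committed patch) of how such an index calculation might look:

```java
// Hypothetical sketch (names assumed, not the committed patch): approximate
// floor(log2(n) / log2(1.2)) from the bit position of the highest set bit,
// avoiding the random memory accesses of a binary search over bucketOffsets.
public class ApproxIndex
{
    // reciprocal of log2(1.2), ~3.8018: converts log2(n) into a bucket index
    static final double INV_LOG2_1_2 = Math.log(2) / Math.log(1.2);

    static int approxIndex(long n)
    {
        if (n <= 1)
            return 0;
        int log2 = 63 - Long.numberOfLeadingZeros(n);   // floor(log2 n), a single instruction on x86
        double frac = (double) n / (1L << log2);        // remainder in [1, 2)
        double log2n = log2 + Math.log(frac) / Math.log(2.0);
        return (int) (log2n * INV_LOG2_1_2);            // truncation == floor, since log2n >= 0
    }
}
```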
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039092#comment-17039092 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

[~jrwest] we discussed (somewhere) maybe reducing the default value to 2, at least initially? I don't mind using 4, but I think doubling the memory we use here is probably a good first stab at balancing the improvement for larger users (who are ordinarily better able to reconfigure) against the cost for users with smaller nodes.

Also, a note on our patch message formatting norms for sign-off: {{patch by A, B; reviewed by C, D for CASSANDRA-X}}. This is particularly helpful if using awk for analysis, since this is the only standard we've ever followed almost consistently, so it's helpful to retain it.

Otherwise LGTM.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031972#comment-17031972 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Was asked to run tlp-stress against this branch. After several runs, I think at the cluster level it's pretty much a wash between this branch and trunk (except for the memory savings of this branch, of course). Added the output from one of the runs if folks want to look closer. Worth noting: this was pretty small / limited hardware.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026770#comment-17026770 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Incorporated your change, rebased/squashed, and pushed. Thanks [~benedict].
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026615#comment-17026615 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

Awesome. I think we're ready. I've pushed one last set of minor suggestions that _should_ permit C2 to produce branchless code for the calculation, at the expense only of values 0 through 8 being slower to compute (since these are likely extremely uncommon, this is probably preferable). I think it would still be possible to reduce the total work by a few instructions with some time to think, but it's probably not worth it.

Entirely up to you if you prefer to use this suggestion, as it's not particularly important, and it's very hard to objectively determine the effect (since it will depend on branch predictor pollution). Once you've decided, I'll get this committed.
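The branch-versus-branchless trade-off described above can be illustrated generically (this is an illustration only, not the suggestion pushed to the branch): folding the small-value special case into the arithmetic removes the branch, at the cost of slightly more work for tiny inputs.

```java
// Generic illustration (not the actual suggestion on the branch): folding a
// small-value special case into arithmetic so C2 can compile the common path
// without a branch, at the cost of a little extra work for tiny inputs.
public class BranchlessLog
{
    static int floorLog2Branchy(long n)
    {
        if (n < 2)                  // special-case branch the predictor must track
            return 0;
        return 63 - Long.numberOfLeadingZeros(n);
    }

    static int floorLog2Branchless(long n)
    {
        // n | 1 leaves every n >= 2 unchanged (same highest bit) and maps 0 and 1
        // to 1, so the special case costs one OR instead of a predicted branch
        return 63 - Long.numberOfLeadingZeros(n | 1);
    }
}
```

Both versions compute the same result; which wins in practice depends on branch predictor behaviour, which is exactly why the effect is hard to measure objectively.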
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026466#comment-17026466 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Pushed an update to the branch that includes prime distribution.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016077#comment-17016077 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

Except if we use the cleaner {{stripedIndex}} calculation, we might need to go up to e.g. 8x stripes, in which case we'd need to throw {{23}} into the mix. That seems to take us up to 16x, with {{29}} taking us all the way to 64, which is well beyond any stripe count we'd provision.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016076#comment-17016076 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Sounds good. Thanks for the test / proof.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016070#comment-17016070 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

Also fwiw, it looks like the primes 17 and 19 are sufficient, so we can literally just try either of those. Proof:

{code}
int[] primes = new int[] { 17, 19 };
BitSet sizeWithoutConflict = new BitSet();
for (int prime : primes)
{
    for (int size = 1; size < 238; ++size)
    {
        BitSet conflict = new BitSet();
        boolean hasConflict = false;
        for (int i = 0; i < size; ++i)
        {
            if (conflict.get((i * prime) % size))
                hasConflict = true;
            conflict.set((i * prime) % size);
        }
        if (!hasConflict)
            sizeWithoutConflict.set(size);
    }
}
for (int size = 1; size < 238; ++size)
{
    if (!sizeWithoutConflict.get(size))
        System.out.println(size);
}
{code}
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve extra reads (I was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings.
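For reference, the {{Integer.reverse}} idea mentioned above might look something like the following hedged sketch (assumed shape, not the implementation that was tried): bit-reversal scatters adjacent indices far apart, though unlike a coprime multiply it is not guaranteed conflict-free for arbitrary bucket counts once folded back with a modulo.

```java
// Hedged sketch of an Integer.reverse-based striping (assumed shape, not the
// tried implementation): reversing the bits scatters sequential indices
// across the table, but the final modulo can introduce collisions for
// bucket counts that are not powers of two.
public class ReversedIndex
{
    static int reversedIndex(int index, int nBuckets)
    {
        // >>> 1 discards the sign bit produced by the reversal, keeping the result non-negative
        return (Integer.reverse(index) >>> 1) % nBuckets;
    }
}
```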
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015512#comment-17015512 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

fwiw, wrt bucket distribution, we should be able to do something as simple as multiplying by the smallest prime larger than two cache lines (when multiplied by 8) that doesn't also divide the number of buckets, i.e. 17 (with verification that it doesn't divide a custom number of buckets, choosing another prime if it does). i.e.

{code}
stripedIndex(index, stripe) = ((index * prime) % (bucketOffsets.length + 1)) + ((bucketOffsets.length + 1) * stripe)

or

stripedIndex(index, stripe) = ((index * nStripes + stripe) * prime) % buckets.length()
{code}
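The first {{stripedIndex}} variant above can be sketched as runnable code (165 is an illustrative bucket count, not necessarily the default): because the prime is coprime with the bucket count, the multiply-and-mod is a permutation within each stripe's region, pushing logically adjacent buckets onto different cache lines.

```java
// Runnable sketch of the first stripedIndex variant (illustrative bucket
// count): multiplying by a prime coprime with nBuckets permutes the indices
// within each stripe's region, so neighbouring logical buckets land on
// different cache lines.
public class StripedIndex
{
    static final int PRIME = 17; // smallest prime spanning more than two cache lines of longs

    static int stripedIndex(int index, int stripe, int nBuckets)
    {
        return ((index * PRIME) % nBuckets) + (nBuckets * stripe);
    }
}
```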
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014718#comment-17014718 ]

Jordan West commented on CASSANDRA-15213:
-----------------------------------------

(y) I'll add the configurable stripe count and look more into bucket distribution, as well as better benchmarking.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014715#comment-17014715 ]

Benedict Elliott Smith commented on CASSANDRA-15213:
----------------------------------------------------

> but its encouraging we are seeing improved performance and memory usage with the test we have and this branch

(y)
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014713#comment-17014713 ] Jordan West commented on CASSANDRA-15213: - PEBKAC. I was rushing to run the bm between meetings and used the wrong branch. I am able to reproduce your numbers: {code} [java] Benchmark Mode Cnt Score Error Units [java] LatencyTrackingBench.benchInsertToDEHR thrpt5 39263498.481 ± 1809260.330 ops/s [java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt5 10473724.356 ± 137670.100 ops/s {code} > DecayingEstimatedHistogramReservoir Inefficiencies > -- > > Key: CASSANDRA-15213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15213 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user > schemas, and this will grow significantly under contention and user schemas > with many tables. This is because {{LongAdder}} is a very heavy class > designed for single contended values. > ** This can likely be improved significantly, without significant loss of > performance in the contended case, by simply increasing the size of our > primitive backing array and providing multiple buckets, with each thread > picking a bucket to increment, or simply multiple backing arrays. Probably a > better way still to do this would be to introduce some competition detection > to the update, much like {{LongAdder}} utilises, that increases the number of > backing arrays under competition. > ** To save memory this approach could partition the space into chunks that > are likely to be updated together, so that we do not need to duplicate the > entire array under competition. > * Similarly, binary search is costly and a measurable cost as a share of the > new networking work (without filtering it was > 10% of the CPU used overall). 
> We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, to save the random memory access costs.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
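The floor(log2 n / log2 1.2) trick in the description above can be sketched as below. This is an illustrative reconstruction, not the code on the linked branch: the class name `FastLog12`, the method name, and the linear fractional-part correction are all assumptions.

```java
public class FastLog12
{
    // 1 / log2(1.2), precomputed so the hot path is a single multiply
    private static final double INV_LOG2_1_2 = 1.0 / (Math.log(1.2) / Math.log(2.0));

    // Cheap approximation of log base 1.2 of n, for n >= 1: the integer part
    // of log2(n) comes from the position of the highest set bit (a single
    // lzcnt instruction via numberOfLeadingZeros), and the fractional part is
    // approximated linearly from the remaining bits, so no memory is touched.
    static double fastLog12(long n)
    {
        int hiBit = 63 - Long.numberOfLeadingZeros(n);
        double frac = (double) (n - (1L << hiBit)) / (1L << hiBit);
        return (hiBit + frac) * INV_LOG2_1_2; // linear log2 approx is within ~0.09
    }

    public static void main(String[] args)
    {
        for (long n : new long[] { 1, 20, 1331, 88148 })
            System.out.printf("n=%d approx=%.1f exact=%.1f%n",
                              n, fastLog12(n), Math.log(n) / Math.log(1.2));
    }
}
```

The linear correction keeps the worst-case error under roughly 0.33 in log-1.2 units, small enough that a bucket index derived from it is only ever off by a small constant, as the later comments in this thread measure.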
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014704#comment-17014704 ] Benedict Elliott Smith commented on CASSANDRA-15213: Just to reiterate, though, it's still not a great test, and it might be worthwhile improving it first. Ideally we would have tighter control on the amount of competition we're measuring, since that's what we're interested in. We probably also do not want to visit the same values in the same order for every execution, as this can lead to increased synchrony in execution lowering overall throughput artificially. We also probably _do_ want to use the same random sequence on each run, to improve reproducibility.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014686#comment-17014686 ] Benedict Elliott Smith commented on CASSANDRA-15213: Interesting. I get:
{code}
addAndGet
# Run progress: 0.00% complete, ETA 00:00:13
# Fork: 1 of 1
# Warmup Iteration 1: 39047588.027 ops/s
# Warmup Iteration 2: 42432555.911 ops/s
# Warmup Iteration 3: 42290231.202 ops/s
Iteration 1: 42252484.876 ops/s
Iteration 2: 41715818.895 ops/s
Iteration 3: 40307394.422 ops/s
Iteration 4: 40383945.073 ops/s
Iteration 5: 38825162.703 ops/s

Result "org.apache.cassandra.test.microbench.LatencyTrackingBench.benchInsertToDEHR":
  40696961.194 ±(99.9%) 5170160.207 ops/s [Average]
{code}
{code}
compareAndSet
# Run progress: 0.00% complete, ETA 00:00:13
# Fork: 1 of 1
# Warmup Iteration 1: 18926088.938 ops/s
# Warmup Iteration 2: 19504293.182 ops/s
# Warmup Iteration 3: 19261074.598 ops/s
Iteration 1: 19399456.839 ops/s
Iteration 2: 18979470.788 ops/s
Iteration 3: 18720138.076 ops/s
Iteration 4: 18458839.475 ops/s
Iteration 5: 18776539.883 ops/s

Result "org.apache.cassandra.test.microbench.LatencyTrackingBench.benchInsertToDEHR":
  18866889.012 ±(99.9%) 1351167.820 ops/s [Average]
{code}
i.e. twice the throughput with {{addAndGet}}. There is no material difference to my version:
{code}
public void updateBucket(AtomicLongArray buckets, int index, long value)
{
    index = stripedIndex(index, (int) Thread.currentThread().getId() & (nStripes - 1));
    buckets.addAndGet(index, value);
}
{code}
Could it be a JDK difference? Perhaps yours is not replacing {{addAndGet}} with the relevant {{lock xadd}} assembly? That seems pretty unlikely, since I believe the JIT has done this for a long time, but you could try looking at the assembly that's produced.
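The two update strategies being benchmarked against each other above can be sketched side by side. This is an illustrative reduction, not the benchmark harness itself (`UpdateStrategies` is an invented name):

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class UpdateStrategies
{
    // CAS loop: even when uncontended it costs roughly as much as one failed
    // compareAndSet, and under contention it may spin and retry
    static void updateWithCas(AtomicLongArray buckets, int index, long value)
    {
        long cur;
        do { cur = buckets.get(index); }
        while (!buckets.compareAndSet(index, cur, cur + value));
    }

    // addAndGet: guaranteed to succeed while the cache line is held in an
    // exclusive state; on x86 HotSpot compiles this to a single lock xadd
    static void updateWithAdd(AtomicLongArray buckets, int index, long value)
    {
        buckets.addAndGet(index, value);
    }
}
```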
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014661#comment-17014661 ] Jordan West commented on CASSANDRA-15213: - Using {{addAndGet}} I see performance comparable to 3.0 and prior to CASSANDRA-14281.
{code}
15213 (4 core, 4 stripes, striping & linear search, addAndGet):
[java] Benchmark                                      Mode Cnt       Score       Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 5742550.865 ± 256043.651 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5 1979885.731 ± 117381.276 ops/s
{code}
Here is how I implemented {{updateBucket}}, in case I missed something:
{code:java}
public void updateBucket(AtomicLongArray buckets, int index, long value)
{
    int stripe = (int) (Thread.currentThread().getId() & (nStripes - 1));
    buckets.addAndGet(stripedIndex(index, stripe), value);
}
{code}
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013139#comment-17013139 ] Benedict Elliott Smith commented on CASSANDRA-15213: I'm inclined to agree that a simple approach is best, and simply offering a tunable parameter to modify the number of stripes on startup might be sufficient. Here's some further minor feedback/suggestions:
# It's faster to use {{addAndGet}} than {{compareAndSet}} because it's guaranteed to succeed - the increment occurs whilst holding the cache line in an exclusive state, so it will always execute as quickly as the first failed {{compareAndSet}}
# Combined with this, it probably makes most sense to pick a "random" bucket using e.g. {{Thread.currentThread().getId() & (stripes-1)}}, to always increment
# Finally, it might make sense to "shuffle" the ordinary buckets as well, so that logically adjacent buckets are not spatially adjacent; since a histogram is _likely_ to receive updates mostly oscillating around a given median point, this means there won't be any false-sharing competition between two updates to two logically adjacent buckets

I think at this point it then makes most sense to improve the benchmarks, in one case to account for memory latency of lookup, and in another to produce more realistic distributions of values.
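The three suggestions above can be combined into a small sketch. This is illustrative only: `StripedBuckets` is an invented name, and the interleaved layout is one plausible reading of the "shuffle" idea, not the committed patch (`stripedIndex` and `nStripes` follow the snippets elsewhere in this thread).

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class StripedBuckets
{
    private final int nStripes; // must be a power of two
    private final int nBuckets;
    private final AtomicLongArray buckets;

    StripedBuckets(int nBuckets, int nStripes)
    {
        this.nBuckets = nBuckets;
        this.nStripes = nStripes;
        this.buckets = new AtomicLongArray(nBuckets * nStripes);
    }

    // Interleave stripes so logically adjacent buckets sit nStripes longs
    // apart, reducing false sharing when updates cluster around a median
    int stripedIndex(int index, int stripe)
    {
        return index * nStripes + stripe;
    }

    void updateBucket(int index, long value)
    {
        // pick a "random" stripe from the thread id; addAndGet always
        // succeeds while holding the cache line exclusively, unlike a CAS loop
        int stripe = (int) (Thread.currentThread().getId() & (nStripes - 1));
        buckets.addAndGet(stripedIndex(index, stripe), value);
    }

    // readers sum across stripes to recover the logical bucket count
    long readBucket(int index)
    {
        long sum = 0;
        for (int s = 0; s < nStripes; s++)
            sum += buckets.get(stripedIndex(index, s));
        return sum;
    }
}
```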
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012511#comment-17012511 ] Jordan West commented on CASSANDRA-15213: - I've updated [https://github.com/jrwest/cassandra/tree/jwest/15213] with the changes you suggested. Regarding the extra bucket, I realized that even more simply we could return {{bucketOffsets.length}} if the estimate is greater than or equal to it, saving us an access in this case altogether. Also added a very simple striping approach. The performance is about the same (the runs are within each other's margin of error) and still significantly better than before CASSANDRA-14281. From the memory consumption perspective, however, 4 stripes ends up allocating more than 12MiB after startup. This isn't surprising: with a single stripe (or pre CASSANDRA-14281) the footprint after startup was a little more than 3MiB, and we're quadrupling the size of each array. However, these won't grow under contention like {{LongAdder[]}} would: memory consumption is on the order of the number of histograms created, not the number created plus contention, as before. I've found comparable performance using 2 stripes on my 4 core machine. I have an 8 core available to me I can test with tomorrow. If the memory consumption is still a concern I can investigate on-demand striping, but I prefer the simpler approach and think the trade-offs of 2-4 stripes are reasonable.
{code:java}
3.0 (4 core):
[java] Benchmark                                      Mode Cnt       Score        Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5  5848504.963 ±  147881.511 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  1946154.537 ±  623185.290 ops/s

Trunk (4 core):
[java] Benchmark                                      Mode Cnt        Score        Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 21887453.678 ± 2108481.285 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  8908817.316 ±  115453.642 ops/s

15213 (4 core, linear search changes only):
[java] Benchmark                                      Mode Cnt        Score        Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 24646022.304 ± 1052105.818 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  9175928.594 ±  269984.204 ops/s

15213 (4 core, 4 stripes, striping and linear search):
[java] Benchmark                                      Mode Cnt        Score       Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 18818181.576 ± 506997.366 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  8895569.814 ± 154219.113 ops/s
{code}
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012390#comment-17012390 ] Benedict Elliott Smith commented on CASSANDRA-15213: Looking good. A couple of minor suggestions:
# The {{value <= 0}} check can be moved inside of the {{value <= 8}} check, since both will rarely be performed
# Can {{(int) value - (bucketOffsets[0] == 0 ? 0 : 1)}} be simplified to {{(int) value - bucketOffsets[0]}}?
# {{value > bucketOffsets[bucketOffsets.length - 1]}} could be replaced by (e.g.) {{firstCandidate = min(firstCandidate, bucketOffsets.length - 2)}}, which only requires touching the already-used header portion of the array, avoiding one potential cache-miss (though not one measured on the existing benchmarks I expect).

As it stands, I think in the normal course of execution it's already reasonable to expect no mispredicted branches in the implementation you've put up, and these tweaks should improve the situation a little. I think it's fairly likely the final pair of return statements will be implemented as a {{cmov}}, but it would be interesting to take a look at the final assembly. When we have a final product maybe we can take a look.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012330#comment-17012330 ] Jordan West commented on CASSANDRA-15213: - My working branch is here: [https://github.com/jrwest/cassandra/tree/jwest/15213] I've implemented the binary search replacement. Didn't make an attempt at going branchless, but even as implemented it's faster than the existing implementation. Working on the striping / replacement of {{LongAdder[]}} and will share when ready.
{code:java}
Trunk:
[java] Benchmark                                      Mode Cnt        Score        Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 21887453.678 ± 2108481.285 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  8908817.316 ±  115453.642 ops/s

15213 (linear search changes only):
[java] Benchmark                                      Mode Cnt        Score        Error Units
[java] LatencyTrackingBench.benchInsertToDEHR        thrpt   5 24646022.304 ± 1052105.818 ops/s
[java] LatencyTrackingBench.benchLatencyMetricsWrite thrpt   5  9175928.594 ±  269984.204 ops/s
{code}
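The binary-search replacement discussed above can be sketched as follows. This is a hedged reconstruction, not the branch's code: `BucketLookup` and its helpers are invented names, `Math.log` stands in for the cheap bit-twiddling approximation, and the walk is written to be correct from any starting estimate.

```java
public class BucketLookup
{
    private static final double LOG_1_2 = Math.log(1.2);

    // stand-in estimate of log base 1.2 of value; the real code would use a
    // cheap numberOfLeadingZeros-based approximation rather than Math.log
    static int estimateIndex(long value)
    {
        return (int) (Math.log(value) / LOG_1_2);
    }

    // return the smallest index whose offset is >= value, starting from the
    // estimate and taking a short linear walk instead of a binary search
    static int findIndex(long[] bucketOffsets, long value)
    {
        if (value <= bucketOffsets[0])
            return 0;
        int i = Math.min(Math.max(estimateIndex(value), 0), bucketOffsets.length - 1);
        while (i > 0 && bucketOffsets[i - 1] >= value)
            --i; // estimate overshot: walk backwards
        while (i < bucketOffsets.length - 1 && bucketOffsets[i] < value)
            ++i; // estimate undershot: walk forwards
        return i;
    }
}
```

Because the estimate lands within a small constant of the true index, each lookup touches only one or two adjacent array slots, which is the single-memory-stall behaviour the later comments aim for.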
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012063#comment-17012063 ] Benedict Elliott Smith commented on CASSANDRA-15213: It might be that the difference is whether or not we're including a zero bucket? Because the fixed offset we use probably needs to be incremented by one in this case, since there's no logarithm of zero.
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011952#comment-17011952 ] Jordan West commented on CASSANDRA-15213: - Ok, I figured out what was going on. My linked test was generating only large values, and the approximation of the last bucket is very inaccurate as the value grows, but that is easy to account for. Fixing this, the test confirms your findings (sometimes I see a distance of 4 indexes, but still a huge improvement). Sample from a 10k test run:
{code:java}
model = 80, estimate = 83
model = 83, estimate = 86
model = 81, estimate = 84
model = 82, estimate = 85
model = 69, estimate = 72
model = 79, estimate = 82
model = 81, estimate = 84
model = 79, estimate = 82
model = 75, estimate = 79
model = 83, estimate = 86
model = 77, estimate = 81
model = 72, estimate = 75
model = 84, estimate = 88
model = 61, estimate = 64
{code}
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011774#comment-17011774 ] Benedict Elliott Smith commented on CASSANDRA-15213:
bq. It looks to me like the approximation is always within precisely 2 or 3 of the offset
FWIW, this appears to hold just fine up to {{long}} overflow
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011744#comment-17011744 ] Benedict Elliott Smith commented on CASSANDRA-15213:
{quote}Fwiw, the proposed fast log code does improve performance of my local modifications when benchmarked but basically brings it back inline w/ the existing implementation (caveat: benchmarking issues already mentioned).
{quote}
Achieving equivalent performance in this synthetic benchmark is pretty much all we need, I think. This should predict a strict performance win once memory latency is taken into account, as our largest such array is ~1.2KiB, with 150 elements, leading to a predicted 7 memory stalls for binary search, although execution will proceed speculatively and may mask one or so of those stalls. If the approximation can achieve an optimal 1 memory stall in the worst case, with similar performance when the entire structure is in cache, it’s a win in my book.
{quote}would result in several cases where a linear search would lead to over 10-50 accesses
{quote}
Interesting, I’m not sure what’s happening in that case. For comparison, I ran the code below
{code:java}
long[] offsets = newOffsets(160, true);
for (int i = 0 ; i < offsets.length ; ++i)
{
    long start = i == 0 ? 0 : offsets[i - 1] + 1;
    long end = offsets[i];
    System.out.printf("%d %d %d %d %.1f %.1f %.1f %.1f\n", i, end,
                      (int) fastLog12(start) - i, (int) fastLog12(end) - i,
                      fastLog12(start), fastLog12(end), slowLog12(start), slowLog12(end));
}
{code}
Which yields the below results, with columns:
# The index in the offsets array
# The value associated with the bucket
# The integer difference between fastLog(lowest) and this offset, where lowest is the smallest value we want to bucket at this offset
# The integer difference between fastLog(highest)
# fastLog(lowest)
# fastLog(highest)
# slowLog(lowest)
# slowLog(highest)

It looks to me like the approximation is always within precisely 2 or 3 of the offset, but _ahead_ by 2 or 3, because the array doesn’t follow a strict 1.2 logarithm given the first few buckets must increment by whole integers. So for values > 2, we can subtract 2 from the result and walk backwards. With this approach we’d only ever have to consult between 1 and 2 buckets, both adjacent, incurring only a single memory stall. If we wanted to be really clever this could be implemented branchlessly, so that this memory latency wouldn’t pause further execution, or invalidate the pipeline, only consume some reorder buffer slots and register file entries.
{code:java}
0 0 0 0 0.0 0.0 -Infinity -Infinity
1 1 -1 -1 0.0 0.0 0.0 0.0
2 2 1 1 3.8 3.8 3.8 3.8
3 3 3 3 6.0 6.0 6.0 6.0
4 4 3 3 7.6 7.6 7.6 7.6
5 5 3 3 8.8 8.8 8.8 8.8
6 6 3 3 9.8 9.8 9.8 9.8
7 7 3 3 10.7 10.7 10.7 10.7
8 8 3 3 11.4 11.4 11.4 11.4
9 10 3 3 12.1 12.6 12.1 12.6
10 12 3 3 13.2 13.6 13.2 13.6
11 14 3 3 14.1 14.5 14.1 14.5
12 17 2 3 14.9 15.5 14.9 15.5
13 20 2 3 15.9 16.4 15.9 16.4
14 24 2 3 16.7 17.4 16.7 17.4
15 29 2 3 17.7 18.5 17.7 18.5
16 35 2 3 18.7 19.3 18.7 19.5
17 42 2 3 19.7 20.5 19.7 20.5
18 50 2 3 20.5 21.5 20.6 21.5
19 60 2 3 21.5 22.5 21.6 22.5
20 72 2 3 22.5 23.5 22.5 23.5
21 86 2 3 23.5 24.3 23.5 24.4
22 103 2 3 24.3 25.3 24.5 25.4
23 124 2 3 25.5 26.4 25.5 26.4
24 149 2 3 26.4 27.3 26.5 27.4
25 179 2 3 27.3 28.4 27.5 28.5
26 215 2 3 28.4 29.3 28.5 29.5
27 258 2 3 29.5 30.4 29.5 30.5
28 310 2 3 30.4 31.4 30.5 31.5
29 372 2 3 31.4 32.4 31.5 32.5
30 446 2 3 32.4 33.3 32.5 33.5
31 535 2 3 33.3 34.2 33.5 34.5
32 642 2 3 34.2 35.4 34.5 35.5
33 770 2 3 35.4 36.4 35.5 36.5
34 924 2 3 36.4 37.3 36.5 37.5
35 1109 2 3 37.3 38.4 37.5 38.5
36 1331 2 3 38.4 39.2 38.5 39.5
37 1597 2 3 39.2 40.2 39.5 40.5
38 1916 2 3 40.2 41.3 40.5 41.5
39 2299 2 3 41.3 42.2 41.5 42.5
40 2759 2 3 42.2 43.3 42.5 43.5
41 3311 2 3 43.3 44.3 43.5 44.5
42 3973 2 3 44.3 45.4 44.5 45.5
43 4768 2 3 45.4 46.3 45.5 46.5
44 5722 2 3 46.3 47.4 46.5 47.5
45 6866 2 3 47.4 48.3 47.5 48.5
46 8239 2 3 48.3 49.4 48.5 49.5
47 9887 2 3 49.4 50.4 49.5 50.5
48 11864 2 3 50.4 51.4 50.5 51.5
49 14237 2 3 51.4 52.3 51.5 52.5
50 17084 2 3 52.3 53.2 52.5 53.5
51 20501 2 3 53.2 54.4 53.5 54.5
52 24601 2 3 54.4 55.4 54.5 55.5
53 29521 2 3 55.4 56.3 55.5 56.5
54 35425 2 3 56.3 57.4 56.5 57.5
55 42510 2 3 57.4 58.3 57.5 58.5
56 51012 2 3 58.3 59.3 58.5 59.5
57 61214 2 3 59.3 60.3 59.5 60.5
58 73457 2 3 60.3 61.2 60.5 61.5
59 88148 2 3 61.2 62.3 61.5 62.5
60 105778 2 3 62.3 63.3 62.5 63.5
61 126934 2 3 63.3 64.3 63.5 64.5
62 152321 2 3 64.3 65.3 64.5 65.5
63 182785 2 3 65.3 66.4 65.5 66.5
64 219342 2 3 66.4 67.3 66.5 67.5
65 263210 2 3 67.3 68.4 67.5 68.5
66 315852 2 3 68.4 69.4 68.5 69.5
67 379022 2 3 69.4 70.4 69.5 70.5
68 454826 2 3 70.4 71.3 70.5 71.5
69 545791 2 3 71.3 72.2 71.5 72.5
70 654949 2 3 72.2 73.2 72.5 73.5
71 785939 2 3 73.2 74.2 73.5 74.5
72 943127 2 3 74.2 75.3 74.5 75.5
73 1131752 2 3
{code}
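The subtract-2-and-walk-backwards lookup described in the comment above can be sketched as follows. This is a hypothetical illustration, not code from the Cassandra tree: {{newOffsets}} and {{fastLog12}} here are simplified stand-ins (whole-integer buckets followed by a ~1.2x factor, and a plain {{Math.log}}-based log base 1.2), and a forward-walk guard is included because the stand-in approximation is not guaranteed to run ahead by exactly 2-3 the way the real one does.

```java
import java.util.Arrays;

// Sketch: use a cheap log_1.2 approximation minus 2 as the starting point,
// then walk to the correct adjacent bucket instead of binary searching.
class WalkFromApproximation
{
    // Stand-in for the histogram offsets: consecutive integers at first,
    // then growing by a factor of ~1.2.
    static long[] newOffsets(int size)
    {
        long[] offsets = new long[size];
        long last = 0;
        for (int i = 0; i < size; i++)
        {
            last = Math.max(last + 1, (long) Math.ceil(last * 1.2));
            offsets[i] = last;
        }
        return offsets;
    }

    // Stand-in for the table-based approximation: log_1.2(v) via Math.log.
    static float fastLog12(long v)
    {
        return v <= 1 ? 0f : (float) (Math.log(v) / Math.log(1.2));
    }

    // Start from (approximation - 2) and walk; ordinarily this consults only
    // 1-2 adjacent buckets, versus ~7 probes for a binary search over 160.
    static int index(long[] offsets, long value)
    {
        int i = Math.min(offsets.length - 1, Math.max(0, (int) fastLog12(value) - 2));
        while (i > 0 && offsets[i - 1] >= value)
            --i;                                    // walk backwards
        while (offsets[i] < value && i < offsets.length - 1)
            ++i;                                    // guard: walk forwards if the guess was low
        return i;
    }

    // Reference result: the bucket index a binary search would choose
    // (smallest i with offsets[i] >= value, clamped to the last bucket).
    static int indexBySearch(long[] offsets, long value)
    {
        int i = Arrays.binarySearch(offsets, value);
        return i < 0 ? Math.min(offsets.length - 1, -i - 1) : i;
    }
}
```

The walk always terminates at the same bucket the binary search would pick; the approximation only determines how many adjacent probes it takes to get there.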
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011493#comment-17011493 ] Jordan West commented on CASSANDRA-15213: - Thanks for the clarification. I will take a stab at something to replace the usage of {{LongAdder[]}} along the lines of what you suggested and share it when ready. I agree re: benchmarking and will try to come up with a more complete set -- clearly we used the existing benchmarks before without finding this issue. Regarding the binary search, I am still a bit confused. If you run https://github.com/jrwest/cassandra/blob/jwest/15213/test/unit/org/apache/cassandra/metrics/DecayingEstimatedHistogramReservoirTest.java#L58 you will see that the proposed estimation (using the code provided) would result in several cases where a linear search would lead to 10-50 accesses (as the value grows, the approximation becomes more inaccurate). Fwiw, the proposed fast log code does improve performance of my local modifications when benchmarked, but basically brings it back in line with the existing implementation (caveat: benchmarking issues already mentioned).

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Metrics
> Reporter: Benedict Elliott Smith
> Assignee: Jordan West
> Priority: Normal
> Fix For: 4.0-beta
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user
> schemas, and this will grow significantly under contention and user schemas
> with many tables. This is because {{LongAdder}} is a very heavy class
> designed for single contended values.
> ** This can likely be improved significantly, without significant loss of
> performance in the contended case, by simply increasing the size of our
> primitive backing array and providing multiple buckets, with each thread
> picking a bucket to increment, or simply multiple backing arrays. Probably a
> better way still to do this would be to introduce some competition detection
> to the update, much like {{LongAdder}} utilises, that increases the number of
> backing arrays under competition.
> ** To save memory this approach could partition the space into chunks that
> are likely to be updated together, so that we do not need to duplicate the
> entire array under competition.
> * Similarly, binary search is costly and a measurable cost as a share of the
> new networking work (without filtering it was > 10% of the CPU used overall).
> We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply,
> to save the random memory access costs.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011029#comment-17011029 ] Benedict Elliott Smith commented on CASSANDRA-15213: FWIW, I think when benchmarking something like this you need to create a few hundred MiB worth of backing arrays and cycle through them for each test. There are other ways to achieve it, but you ideally want tests that include memory latency, and this is a simple mechanism for doing so.
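The methodology suggested above can be sketched as a plain-Java harness (a hypothetical illustration, not Cassandra's benchmark code; the array length, sizes, and class name are all illustrative): allocate far more reservoir-sized backing arrays than fit in any cache level and cycle through them, so each update pays a realistic memory-latency cost rather than hitting a hot cached array.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a memory-latency-aware microbenchmark: cycle updates across
// `mib` MiB of long[] arrays so successive accesses tend to miss cache.
class MemoryLatencyHarness
{
    // Returns approximate nanoseconds per update over `ops` updates.
    static long run(int mib, int ops)
    {
        int arrayLen = 165;                              // roughly one reservoir's bucket count
        int count = Math.max(1, (mib << 20) / (arrayLen * 8));
        long[][] arrays = new long[count][arrayLen];

        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++)
        {
            long[] a = arrays[i % count];                // cycle through all arrays
            a[rnd.nextInt(arrayLen)]++;                  // the update being measured
            sink += a[0];
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE)
            System.out.println();                        // keep `sink` observable so the loop isn't elided
        return elapsed / Math.max(1, ops);
    }

    public static void main(String[] args)
    {
        // A few hundred MiB as suggested in the comment; shrink for small heaps.
        System.out.println(run(256, 10_000_000) + " ns/op");
    }
}
```

Comparing the ns/op reported here for the cached case (small `mib`) versus the cycling case (large `mib`) exposes exactly the memory-stall cost the comment is about.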
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010990#comment-17010990 ] Benedict Elliott Smith commented on CASSANDRA-15213:

bq. Or are you suggesting increasing the bucket resolution
bq. Do you mean having multiple buckets per offset?

Right. I meant one option was to essentially inline the striping by introducing, e.g., 4x as many buckets as requested, with each logical bucket represented by the sum of that many physical buckets. The buckets would be offset from each other by at least two cache lines, and all buckets would ideally be distributed more uniformly in memory (by e.g. {{Integer.reverse(idx)}}), so that adjacent buckets that are most likely to be used together don't also compete for cache locks.

bq. Additionally, re: your LongAdder reference, are you proposing creating new buckets for the same offset under contention (like LongAdder does w/ Cell)?

I meant an alternative approach might be to batch a range of buckets together - say 16 or more, to amortise the object overhead - and to perform {{LongAdder}}'s inflation of the number of buckets on detection of competition, so that we can gradually increase the number of cells as needed. This might have the added benefit of not wasting space striping buckets that are rarely used, but at the cost of greater complexity, object overheads and indirection.

bq. Regarding the binary search suggestion, are you suggesting using the approximation to replace the maximum index

I meant to use it as the floor for a linear search for the correct position, so that it would ordinarily be 1 or 2 comparisons, but typically only one cache miss. Specifically, I meant to use an integer approximation along the lines of...

{code}
private static final int TABLE_BITS = 4;
private static final int TABLE_MASK = -1 >>> (32 - TABLE_BITS);
private static final float[] LOG2_TABLE = computeTable(TABLE_BITS);
private static final float log2_12 = (float) slowLog2(1.2d);
private static final float log2_12_recp = (float) (1d / slowLog2(1.2d));

private static float[] computeTable(int bits)
{
    float[] table = new float[1 << bits];
    for (int i = 1 ; i < 1 << bits ; ++i)
        // ... (the remainder of the snippet was truncated in the archived message)
}
{code}
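The snippet above is cut off in the archive. A self-contained reconstruction of the same table-based idea might look like the following; this is a sketch of the technique as I read it, not necessarily the code as originally posted: take floor(log2 v) from the highest set bit, refine it with a small lookup table over the next TABLE_BITS mantissa bits, then rescale to base 1.2.

```java
// Sketch: table-based log2 approximation, rescaled to log base 1.2.
// With TABLE_BITS = 4 the worst-case error is log2(17/16) ~ 0.087.
class FastLog
{
    private static final int TABLE_BITS = 4;
    private static final int TABLE_MASK = -1 >>> (32 - TABLE_BITS);
    private static final float[] LOG2_TABLE = computeTable(TABLE_BITS);
    private static final float log2_12_recp = (float) (1d / slowLog2(1.2d));

    private static float[] computeTable(int bits)
    {
        // table[i] = log2 of the mantissa value 1 + i/2^bits
        float[] table = new float[1 << bits];
        for (int i = 1 ; i < 1 << bits ; ++i)
            table[i] = (float) slowLog2(1d + i / (double) (1 << bits));
        return table;
    }

    static double slowLog2(double v)
    {
        return Math.log(v) / Math.log(2);
    }

    static float fastLog2(long v)
    {
        int highestBit = 63 - Long.numberOfLeadingZeros(v);   // floor(log2 v)
        // next TABLE_BITS bits below the leading bit select the table entry
        int index = highestBit >= TABLE_BITS
                  ? (int) ((v >>> (highestBit - TABLE_BITS)) & TABLE_MASK)
                  : (int) ((v  <<  (TABLE_BITS - highestBit)) & TABLE_MASK);
        return highestBit + LOG2_TABLE[index];
    }

    // Cheap log_1.2(v): a handful of ALU ops and one tiny, hot table read,
    // versus the random memory accesses of a binary search.
    static float fastLog12(long v)
    {
        return v <= 1 ? 0f : fastLog2(v) * log2_12_recp;
    }
}
```

Because the table refines only the top four mantissa bits, values whose mantissa fits in four bits (e.g. 3, 12, 100) come out exact, and everything else is within ~0.09 of the true log2, which is comfortably inside the integer slack the bucket walk tolerates.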
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010956#comment-17010956 ] Jordan West commented on CASSANDRA-15213: - I've verified the reported {{LongAdder}} memory consumption. I'm seeing 13.5MiB right after startup. A few questions about your proposed solutions:

* By the backing array do you mean the original AtomicLongArray? I see two issues with increasing the number of buckets: the maximum we can create is 237 (beyond that the offset overflows), and increasing the number of buckets doesn't necessarily reduce contention, because the resolution of the buckets doesn't change except for the last bucket (which for microseconds is over 200 days and for nanoseconds is over 5 hours). Or are you suggesting increasing the bucket resolution by decreasing the factor in {{newOffsets}} when adding more buckets?
* I'm not sure what you mean by “providing multiple buckets”, but I suspect it might be what I am missing to better understand the proposal. Do you mean having multiple buckets per offset? Additionally, re: your {{LongAdder}} reference, are you proposing creating new buckets for the same offset under contention (like {{LongAdder}} does w/ {{Cell}})?
* Regarding the binary search suggestion, are you suggesting using the approximation to replace the maximum index (e.g. {{binarySearch(offsets, 0, Math.floor(…) + 1, value)}})? Or to outright use that approximation? I wasn't sure of your exact proposal, but it does seem like using the equation you gave as an upper bound estimate and {{Math.floor(Math.log(Math.floor(n / 1.2)) / Math.log(1.2))}} as a lower bound estimate can reduce the space of the array we have to search by 25-60% (I saw up to a 75% reduction in one case with DEFAULT_BUCKET_COUNT but haven't reproduced that high a reduction since). I've added a QuickTheories test to show this here: [https://github.com/jrwest/cassandra/commit/a4795bc651cd0e0bd582a8df2414e312782b5020].
* We could also circumvent the search entirely for the lowest buckets by checking {{if value < bucketCount && bucketOffsets[value - 1] == value}}, although given our use case I'm not sure how often we will see sub-165ns operations.
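The low-bucket shortcut in the last bullet could look like the following hypothetical sketch (the `offsets` array and method names are illustrative, not Cassandra's): while the bucket offsets are still incrementing by whole integers, `offsets[value - 1] == value` holds, so the bucket index can be read off directly with no search at all.

```java
import java.util.Arrays;

// Sketch of the small-value fast path: direct index for values that fall in
// the consecutive-integer prefix of the offsets, binary search otherwise.
class SmallValueFastPath
{
    static int index(long[] offsets, long value)
    {
        // fast path: the first buckets are 1, 2, 3, ..., so small values map
        // straight to index value - 1
        if (value > 0 && value <= offsets.length && offsets[(int) value - 1] == value)
            return (int) value - 1;

        // otherwise fall back to the usual search; a negative result encodes
        // the insertion point, i.e. the first bucket with offset >= value
        int i = Arrays.binarySearch(offsets, value);
        return i < 0 ? Math.min(offsets.length - 1, -i - 1) : i;
    }
}
```

The guard is a plain comparison plus one array read, so when the fast path misses it costs almost nothing on top of the fallback search.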