Re: [PR] Redesign progress mechanisms to be memory efficient, use fewer messag… [cassandra-accord]

via GitHub Thu, 05 Sep 2024 11:31:40 -0700


belliottsmith commented on code in PR #113:
URL: https://github.com/apache/cassandra-accord/pull/113#discussion_r1745987410



##########
accord-core/src/main/java/accord/utils/LogGroupTimers.java:
##########
@@ -0,0 +1,557 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package accord.utils;
+
+import java.util.Comparator;
+import java.util.concurrent.TimeUnit;
+import java.util.function.BiConsumer;
+import java.util.function.Function;
+
+import accord.api.Scheduler;
+
+/**
+ * Basic idea is we collect timers in buckets that are 
logarithmically/exponentially spaced,
+ * with buckets nearer in time closer together (down to some minimum spacing).
+ *
+ * These buckets are disjoint, and are split on insert when they both exceed a 
certain size and are eligible
+ * to cover a smaller span due to the passing of time.
+ *
+ * A bucket becomes the current epoch once "now" truncated to the 
minBucketSpan is equal to the bucket's epoch.
+ * At this point, the bucket is heapified so that the entries may be visited 
in order. Prior to this point,
+ * insertions and deletions within a bucket are constant time.
+ *
+ * This design expects to have a maximum of log2(maxDelay)-K buckets, so 
bucket lookups are log(log(maxDelay)).
+ *
+ * This design permits log(log(maxDelay)) time insertion and removal for all 
items not in the nearest bucket, and log(K)
+ * for the nearest bucket, where K is the size of the current epoch's bucket.
+ *
+ * Given that we may split buckets logarithmically many times, amortised 
insertion time is logarithmic for entries
+ * that survive multiple bucket splits. However, due to the nature of these 
timer events (essentially timeouts), and
+ * that further out timers are grouped in exponentially larger buckets, we 
expect most entries to be inserted and deleted
+ * in constant time.
+ *
+ * TODO (desired): consider the case of repeatedly splitting the nearest 
bucket, as can maybe lead to complexity between
+ *  n.lg(n) and n^2. In the worst case every item is in the nearest bucket 
that has lg(D) buckets that are split lg(D)
+ *  times and either
+ *  (1) all stay in the same bucket. This yields lg(D).n.lg(n) complexity, but 
we could perhaps avoid this with some summary
+ *      data about where we could split a bucket, or by shrinking the bucket 
to smaller than its ideal span on split when
+ *      we detect it.
+ *  (2) splits half into the next bucket each time. So each lg(D) round incurs 
(n/D^2).lg(n/D^2) costs.
+ *  However, in both cases, if D is small we probably don't care - and if it 
is large then this will happen over a very long
+ *  period of time and so we still probably don't care.
+ * @param <T>
+ */
+@SuppressWarnings({ "rawtypes", "unchecked" })
+public class LogGroupTimers<T extends LogGroupTimers.Timer>

Review Comment:
   Most of the code that is not covered (`Scheduling`) was moved there from the 
`DefaultProgressLog` to make sharing it easier, after the testing for 
`LogGroupTimers` was written. There look to be only three sections of the core 
code not covered by testing besides that, one of which is unused (peek) and 
could be removed. The other two are well covered by the burn test, though I 
agree it would be good to cover them in this test. I would prefer to revisit 
the overall coverage of the project en masse at a later state though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Redesign progress mechanisms to be memory efficient, use fewer messag… [cassandra-accord]

Reply via email to