#18910: distributing descriptors accross CollecTor instances
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_review
Priority: High | Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
Comment (by iwakeh):
I think below I reply to all of [comment:25] except the paths structures,
which will have to wait for the commit of the tests.
Changes can be found in the new commit of the
-first-sync-2&id=aafedaa74e8451303e8cdaa882fd2821927d29ab second branch].
> ... - All classes in the new `persist` package contains unused imports
or unused attributes. Can you remove those in a fixup commit?
Removed in second branch.
> - Can we somehow use `printf` patterns for putting together paths to
make that bunch of `DescriptorPersistence` classes easier to read? It
should be possible to express all paths using placeholders for
dates/times, strings, and chars. And if we care about path separators
being platform independent, we could split the resulting path at `/`
characters and put it back together using `Paths.get()`. Untested example
for a bridge extra-info descriptor:
What makes this more readable to me, is the code formatting, i.e. one line
per path element. Whereas, I don't read string format patterns that
often, which makes it less readable to me. (String.format is also locale
dependent and would have to have an additional argument. )
Added the one line per path element to the new branch as well as some
comments for the path elements. Hope this works for you too?
> - Building upon the previous idea, maybe we can avoid having
`DescriptorPersistence` classes at all if all we need is a type annotation
and two `printf` patterns. We could define an order of `printf` arguments
for all patterns, like: published, received, source, digest, first char of
digest, second char of digest. That would really save a few lines of
code, wouldn't it? Maybe something for later?
The path deriving code surely can be done nicer. The main goal for the
current code was to extract the somewhat hidden (in huge methods/classes)
path structures and have them in a defined location, which opens the door
for future simplifications.
Having persistence classes might be useful for other types of persistence
or the statistics (cf. #20335, #20345).
> - In `ConsensusPersistence`, the check for `null ==
desc.getConsensusFlavor()` seems too fragile with respect to flavors we'll
add in the future. We wouldn't notice except for microdesc consensuses
being overwritten. We should check the actual consensus flavor if it's
If flavor is something other than "microdesc", the consensus type will be
> - In `DescriptorPersistence`, I wonder if we should not accept
`StandardOpenOption` or even `StandardOpenOption ...` to distinguish
replace vs. append but rather just a boolean. It seems like an
implementation level detail we're handing out to the caller. Are we sure
we're implementing all options and in particular combination of options
correctly? Those checks where we're only looking at how many such option
values we're being given look really fragile. If we only want to support
replace vs. append, we should only provide those two options in the
> - Same class, method `storeAll()` should be more explicit in the method
name or at least documentation that `recent` will only be written if
writing `archive` is successful.
The StandardOpenOptions are defined in the java api, so there is no
implementation to be done. There are more than two options of interest
(afaict): create_new, append, and truncate_existing, i.e. do not do
anything in case the file exists, append, and replace.
I added more javadoc and overloaded the methods, so we have methods w/o
options, and other methods with exactly one option per file.
> - Same class, methods `storeRecent()` and `storeOut()` might have wrong
argument orders in their `Paths.get()` calls, but I'm not sure.
I renamed the parameters to `recentRoot` and `outRoot` to better express
> - In `PersistenceUtils`, method `storeToFileSystem()`, that check for
`append.length == 0` could be changed to `append.length <= i` to make the
method a tiny bit more robust against wrong usage.
I thought the same, so the method wasn't in use and is now removed.
> - Same class, are the `dateDependentPath*()` methods used anywhere? If
not and if there are no plans to use them, can you remove them?
Those are gone in the second branch.
> - The following didn't become entirely clear from reading the code, so
let me ask: are files written to temporary files like `.somefile.tmp` and
later renamed to `somefile`? We should make sure we're doing that for
files in `recent/` that we append descriptors, because otherwise those
half-written files will be indexed and made available, which we should
avoid. Basically, if a file may change after being written, let's write
it to a temp file and only rename it to its final file name when we're
sure it won't change anymore.
Files for 'recent' are first written to a temporary file and then moved to
the final location.
> - Similar to the previous comments, I found a few formattings that
should have been caught by checkstyle.
There were no complaints, checked again for the changes for the commit on
the second branch.
Regarding @type annotations: I introduced an `Annotation` enum, which
contains all @type annotations. All occurances of @type strings are
replaced. So, now these annotations are in one place. But, don't they
actually belong to metrics-lib? There the parser could check for unknown
annotation versions or types.
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:45>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
tor-bugs mailing list