#18910: distributing descriptors accross CollecTor instances
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_review
Priority: High | Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
Comment (by karsten):
Alright, here's a review of f4026fc:
- All classes in the new `persist` package contains unused imports or
unused attributes. Can you remove those in a fixup commit?
- Can we somehow use `printf` patterns for putting together paths to make
that bunch of `DescriptorPersistence` classes easier to read? It should
be possible to express all paths using placeholders for dates/times,
strings, and chars. And if we care about path separators being platform
independent, we could split the resulting path at `/` characters and put
it back together using `Paths.get()`. Untested example for a bridge
- Building upon the previous idea, maybe we can avoid having
`DescriptorPersistence` classes at all if all we need is a type annotation
and two `printf` patterns. We could define an order of `printf` arguments
for all patterns, like: published, received, source, digest, first char of
digest, second char of digest. That would really save a few lines of
code, wouldn't it? Maybe something for later?
- Half of the paths generated by `DescriptorPersistence` subclasses do
not match paths as they are currently implemented. I only read the code
and did not run it, but I believe the following actual paths would not
have been generated by these classes:
- For the sake of completeness, I believe that the following paths would
have been generated by these classes:
- In `ConsensusPersistence`, the check for `null ==
desc.getConsensusFlavor()` seems too fragile with respect to flavors we'll
add in the future. We wouldn't notice except for microdesc consensuses
being overwritten. We should check the actual consensus flavor if it's
- In `DescriptorPersistence`, I wonder if we should not accept
`StandardOpenOption` or even `StandardOpenOption ...` to distinguish
replace vs. append but rather just a boolean. It seems like an
implementation level detail we're handing out to the caller. Are we sure
we're implementing all options and in particular combination of options
correctly? Those checks where we're only looking at how many such option
values we're being given look really fragile. If we only want to support
replace vs. append, we should only provide those two options in the
- Same class, method `storeAll()` should be more explicit in the method
name or at least documentation that `recent` will only be written if
writing `archive` is successful.
- Same class, methods `storeRecent()` and `storeOut()` might have wrong
argument orders in their `Paths.get()` calls, but I'm not sure.
- `ExtraInfoPersistence` does not distinguish enough between relay and
bridge extra-info descriptors. The latter use a different `@type`
annotation which needs to be reflected in the constructor. It might be
easier to use two distinct `DescriptorPersistence` classes for
`RelayExtraInfoDescriptor` and `BridgeExtraInfoDescriptor`.
- Same class, `@type bridge-extra-info` is not `1.0` but `1.3` right now.
And we'll almost certainly forget to update the version number there in
case we move to `1.4`. Can we avoid having to remember that? This
probably applies to all `@type` annotations, though the bridge descriptors
numbers are most likely to change over time.
- In `PersistenceUtils`, method `storeToFileSystem()`, that check for
`append.length == 0` could be changed to `append.length <= i` to make the
method a tiny bit more robust against wrong usage.
- Same class, are the `dateDependentPath*()` methods used anywhere? If
not and if there are no plans to use them, can you remove them?
- Same class, `dateDependentPathYm()` (assuming that it'll be used) does
not indicate that it also includes the day in the path, and it appends the
`append` part without dash, unlike `dateDependentPathYmd()`.
- `ServerDescriptorPersistence`: see comments for `ExtraInfoPersistence`.
- `StatusPersistence`: why not call it `BridgeNetworkStatusPersistence`?
But regardless of that, the version must be `1.1` and adapted whenever it
updates. See above.
- The following didn't become entirely clear from reading the code, so
let me ask: are files written to temporary files like `.somefile.tmp` and
later renamed to `somefile`? We should make sure we're doing that for
files in `recent/` that we append descriptors, because otherwise those
half-written files will be indexed and made available, which we should
avoid. Basically, if a file may change after being written, let's write
it to a temp file and only rename it to its final file name when we're
sure it won't change anymore.
- Similar to the previous comments, I found a few formattings that should
have been caught by checkstyle.
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
tor-bugs mailing list