#18910: distributing descriptors across CollecTor instances
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_review
Priority: High | Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
Comment (by iwakeh):
Replying to [comment:20 karsten]:
> - It seems we're deprecating a few config options by adding a comment
to the default `collector.properties` and not paying attention to those
options in the code anymore. Shouldn't we instead remove those config
options entirely, so that operators notice for sure that they need to
change these config options? We could mention this in the change log as
This is only a warning that these will be deprecated. They're still used,
but already addressed in #20162. But they should be removed here. That
wouldn't be a small commit, though, and I'd rather have it separate. I'll
add a new branch just up to that commit later today.
> - Speaking of, can you include a change log entry for this commit?
> - I wonder if we could simplify the configuration by avoiding that tri-
state SyncType option and turning it into a boolean. Consider
`ImportCachedRelayDescriptors`, `ImportDirectoryArchives`, and
`DownloadRelayDescriptors` in the relaydescs module. We could just add a
fourth option `SyncRelayDescriptors` that can be `true` or `false`.
Basically, syncing descriptors from other CollecTor instances would be the
fourth source for collecting relay descriptors. This shouldn't change
much in the code you wrote, but it might make things a bit simpler for
future operators. For bridge descriptors and exit lists there would have
to be two new options to activate the current sources, that is, sanitize
bridge descriptors found in a local directory or download exit lists from
the exit list server.
Sync-Options (this description should be added to package-info, when we
There are currently three modules that can be synced: relaydescs,
bridgedescs, and exitlists.
A module can be turned on or off via its `*Activated` option, which is
only configurable at start-up and determines if anything is run from this
The runtime configurable tri-state `Sync*` option values are
* `NoSync`: simply runs the module without fetching additional data.
Useful when only the data directly accessible from the Tor network is of
interest, or when the instance has its own access to bridgedescs etc.
* `Sync` performs the module run and then fetches data from other
* `SyncOnly` just fetches data from other instances. Useful for mirrors
that don't have access to bridgedescs, or that fetch from a main instance
they trust and don't have net access otherwise, etc.
`Sync*` options can be changed at runtime, so it is possible to switch
syncing on and later turn it off again while the instance keeps running.
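
A minimal sketch of how the three values could map to what a module run
does. The class name `SyncSketch`, the enum name `SyncMode`, and the step
strings are assumptions made for illustration; only the value names
`NoSync`, `Sync`, and `SyncOnly` come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the tri-state Sync* option; not the actual CollecTor code. */
public class SyncSketch {

  /** The three option values, each with the two things it switches on or off. */
  enum SyncMode {
    NoSync(true, false),    // run the module, fetch nothing from other instances
    Sync(true, true),       // run the module, then fetch from other instances
    SyncOnly(false, true);  // only fetch from other instances

    final boolean runsModule;
    final boolean fetchesFromInstances;

    SyncMode(boolean runsModule, boolean fetchesFromInstances) {
      this.runsModule = runsModule;
      this.fetchesFromInstances = fetchesFromInstances;
    }
  }

  /** Returns the steps a single run would take for the given option value. */
  static List<String> steps(String optionValue) {
    // valueOf throws IllegalArgumentException for an unknown value.
    SyncMode mode = SyncMode.valueOf(optionValue.trim());
    List<String> steps = new ArrayList<>();
    if (mode.runsModule) {
      steps.add("run module");
    }
    if (mode.fetchesFromInstances) {
      steps.add("fetch from other instances");
    }
    return steps;
  }

  public static void main(String[] args) {
    for (SyncMode mode : SyncMode.values()) {
      System.out.println(mode + " -> " + steps(mode.name()));
    }
  }
}
```

Because the value is parsed on each call, a changed option value would take
effect at the next run in this sketch.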
So, reducing the tri-state to a boolean would complicate the combination
of 'activated' and sync settings. Activation is kept separate from sync,
i.e. `Scheduler` doesn't and shouldn't know about syncing.
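
To illustrate that separation, here is a rough sketch under stated
assumptions: the class `ActivationSketch`, both method names, the property
key patterns (`<Module>Activated`, `Sync<Module>`), and the `NoSync`
default are all hypothetical and not taken from the actual code. The point
is only that the scheduling decision reads the `*Activated` flag, while
the sync value is looked at by the module on each run.

```java
import java.util.Properties;

/** Hypothetical sketch: activation and sync are decided in different places. */
public class ActivationSketch {

  /** Scheduler side: consults only the start-up *Activated flag. */
  static boolean shouldSchedule(Properties startupConfig, String module) {
    // Assumed key pattern, e.g. "RelaydescsActivated".
    return Boolean.parseBoolean(
        startupConfig.getProperty(module + "Activated", "false"));
  }

  /** Module side: reads the Sync* value from the current config on every run. */
  static String planRun(Properties currentConfig, String module) {
    // Assumed key pattern, e.g. "SyncRelaydescs"; the default is an assumption.
    String sync = currentConfig.getProperty("Sync" + module, "NoSync");
    switch (sync) {
      case "NoSync":
        return "run";
      case "Sync":
        return "run, then fetch";
      case "SyncOnly":
        return "fetch";
      default:
        throw new IllegalArgumentException("Unknown sync value: " + sync);
    }
  }

  public static void main(String[] args) {
    Properties config = new Properties();
    config.setProperty("RelaydescsActivated", "true");
    config.setProperty("SyncRelaydescs", "SyncOnly");
    if (shouldSchedule(config, "Relaydescs")) {
      System.out.println(planRun(config, "Relaydescs"));
    }
  }
}
```

With a boolean sync option, the `SyncOnly` case would have to be expressed
through the activation flags instead, which is the coupling I'd like to
avoid.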
> - I found at least one long line that checkstyle should complain about,
though I didn't run it myself.
Oh, I usually run this at the last commit. Will check again.
> In case you want to start working on any of these comments, can you
please write `--fixup` commits that resolve those issues in this
particular commit? In any case, please don't modify commits 2 to 7 in
that branch at this point, because I already started reviewing those.
As said above, I'll add another branch with commits we agree on and in a
way that the commits make sense in the final master version.
> More tomorrow. Thanks!
Thanks for reading your way through all that uncommented code!
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:21>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
tor-bugs mailing list