#18910: distributing descriptors across CollecTor instances
 Reporter:  iwakeh             |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_review
 Priority:  High               |      Milestone:  CollecTor 1.1.0
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  ctip               |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:

Comment (by iwakeh):

 Replying to [comment:20 karsten]:
 > ...
 >  - It seems we're deprecating a few config options by adding a comment
 to the default `collector.properties` and not paying attention to those
 options in the code anymore.  Shouldn't we instead remove those config
 options entirely, so that operators notice for sure that they need to
 change these config options?  We could mention this in the change log as
 medium change.

 This is only a warning that these options will be deprecated. They're
 still used, and this is already addressed in #20162, but they should be
 removed here. That wouldn't be a small commit, though, and I'd rather keep
 it separate. I'll add a new branch up to that commit later today.

 >  - Speaking of, can you include a change log entry for this commit?


 >  - I wonder if we could simplify the configuration by avoiding that tri-
 state SyncType option and turning it into a boolean.  Consider
 `ImportCachedRelayDescriptors`, `ImportDirectoryArchives`, and
 `DownloadRelayDescriptors` in the relaydescs module.  We could just add a
 fourth option `SyncRelayDescriptors` that can be `true` or `false`.
 Basically, syncing descriptors from other CollecTor instances would be the
 fourth source for collecting relay descriptors.  This shouldn't change
 much in the code you wrote, but it might make things a bit simpler for
 future operators.  For bridge descriptors and exit lists there would have
 to be two new options to activate the current sources, that is, sanitize
 bridge descriptors found in a local directory or download exit lists from
 the exit list server.

 Sync options (this description should be added to package-info, when we …)

 There are currently three modules that can be synced: relaydescs,
 bridgedescs, and exitlists.

 A module can be turned on or off via its `*Activated` option, which is
 only configurable at start-up and determines whether anything from this
 module is run at all.

 The runtime-configurable tri-state `Sync*` option takes one of these
 values:

 * `NoSync`: simply runs the module without fetching additional data.
 Useful when only the data directly accessible from the Tor network is of
 interest, or when the instance has access to bridgedescs, etc.
 * `Sync`: performs the module run and then fetches data from other
 instances.
 * `SyncOnly`: only fetches data from other instances. Useful for mirrors
 that don't have access to bridgedescs, or that fetch from a main instance
 they trust and otherwise have no network access, etc.
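 The three values above amount to a simple dispatch. A minimal sketch in
 Python, assuming hypothetical `collect` and `sync` callables that stand in
 for the module's own work and for fetching from other instances;
 `SyncType` and `run_module` are illustrative names, not taken from the
 CollecTor code base:

```python
from enum import Enum
from typing import Callable, List


class SyncType(Enum):
    """Hypothetical model of the tri-state Sync* option."""
    NO_SYNC = "NoSync"      # run the module only
    SYNC = "Sync"           # run the module, then fetch from other instances
    SYNC_ONLY = "SyncOnly"  # only fetch from other instances


def run_module(sync_type: SyncType,
               collect: Callable[[], None],
               sync: Callable[[], None]) -> None:
    """Run one module iteration according to its (hypothetical) Sync* value."""
    if sync_type is not SyncType.SYNC_ONLY:
        collect()  # the module's own work, e.g. downloading descriptors
    if sync_type is not SyncType.NO_SYNC:
        sync()     # fetch descriptors from other CollecTor instances


if __name__ == "__main__":
    for value in ("NoSync", "Sync", "SyncOnly"):
        steps: List[str] = []
        run_module(SyncType(value),
                   collect=lambda: steps.append("collect"),
                   sync=lambda: steps.append("sync"))
        print(value, steps)
```

 Parsing the option string with `SyncType(value)` also rejects any value
 other than the three listed above with a `ValueError`.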

 `Sync*` options can be changed at runtime. For example, it is possible to
 switch syncing on and later turn it off again while CollecTor is running.

 So, reducing the tri-state to a boolean would complicate combining the
 'activated' and sync settings. Activation is kept separate from syncing,
 i.e., `Scheduler` doesn't and shouldn't know about syncing.
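 The split between `*Activated` (read once at start-up, used by the
 scheduler) and `Sync*` (re-read by the module on every run) can be
 sketched as follows. This is a minimal Python illustration under stated
 assumptions: `Module`, `Scheduler`, the dict-based config, and the key
 names `RelaydescsActivated`/`SyncRelaydescs` are hypothetical and not the
 actual CollecTor classes or property names.

```python
from typing import Dict, List


class Module:
    """Hypothetical module; it decides about syncing on its own."""

    def __init__(self, name: str, config: Dict[str, str]) -> None:
        self.name = name
        self.config = config  # shared dict, may change at runtime
        self.runs: List[str] = []

    def run(self) -> None:
        # Sync* is re-read on every run, so runtime changes take effect.
        sync_type = self.config.get("Sync" + self.name, "NoSync")
        self.runs.append(sync_type)


class Scheduler:
    """Hypothetical scheduler; it knows nothing about syncing."""

    def __init__(self, modules: List[Module], config: Dict[str, str]) -> None:
        # *Activated is evaluated once, at start-up.
        self.active = [m for m in modules
                       if config.get(m.name + "Activated") == "true"]

    def tick(self) -> None:
        for module in self.active:
            module.run()


if __name__ == "__main__":
    config = {"RelaydescsActivated": "true", "ExitlistsActivated": "false",
              "SyncRelaydescs": "NoSync"}
    relaydescs = Module("Relaydescs", config)
    exitlists = Module("Exitlists", config)
    scheduler = Scheduler([relaydescs, exitlists], config)
    scheduler.tick()
    config["SyncRelaydescs"] = "SyncOnly"  # picked up by the next run
    config["ExitlistsActivated"] = "true"  # ignored until restart
    scheduler.tick()
    print(relaydescs.runs, exitlists.runs)
```

 Because the scheduler only looks at `*Activated`, the sync setting can
 change between runs without the scheduler being involved at all.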

 >  - I found at least one long line that checkstyle should complain about,
 though I didn't run it myself.

 Oh, I usually run it on the last commit. Will check again.

 > In case you want to start working on any of these comments, can you
 please write `--fixup` commits that resolve those issues in this
 particular commit?  In any case, please don't modify commits 2 to 7 in
 that branch at this point, because I already started reviewing those.

 As said above, I'll add another branch with commits we agree on and in a
 way that the commits make sense in the final master version.

 > More tomorrow.  Thanks!

 Thanks for reading your way through all that uncommented code!

