#18910: distributing descriptors across CollecTor instances
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_information
Priority: High | Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
Comment (by iwakeh):
Thanks for the remarks and suggestions!
I'm replying inline below and have also added a wiki page
[wiki:doc/CollecTor/DescriptorDistribution CollecTor Sync] that contains
the current status of the discussion. Please take a look there to see
the entire picture.
Replying to [comment:14 karsten]:
> Hmm, the suggested config options would imply that there's only one new
> sync manager module that syncs all descriptors from the various sources
> and that runs, say, once per hour? I wonder how to schedule that in a way
> that it does not interfere with the other modules. So far, modules were
> pretty much independent, but this new module would create a dependency
You're right, they should stay independent. I intended that, too, but I
had a different (more complicated) architecture in mind.
> Alternative suggestion: we add four (sets of) configurations, one for
> each module, that internally re-use the same code for syncing descriptors
> and for importing them. For example, `SyncRelayDescriptors`,
> `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.
Good idea! So we run the sync function after, or instead of, the module
run (see wiki page for more).
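
A rough sketch of what "after or instead of the module run" could mean per module. The option names are the ones suggested in the quote; the `SyncMode` values, `ModuleConfig`, and `planned_steps` are hypothetical names made up for illustration and are not CollecTor code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SyncMode(Enum):
    """Hypothetical per-module sync setting (names are assumptions)."""
    OFF = "off"              # collect from the original sources only
    SYNC = "sync"            # run the module, then sync from other instances
    SYNC_ONLY = "sync-only"  # syncing replaces the module run


@dataclass
class ModuleConfig:
    name: str        # e.g. "SyncRelayDescriptors", as suggested in the ticket
    activated: bool  # whether the module is scheduled at all
    mode: SyncMode


def planned_steps(cfg: ModuleConfig) -> List[str]:
    """Return the ordered steps one scheduled run of this module performs."""
    if not cfg.activated:
        return []
    if cfg.mode is SyncMode.SYNC_ONLY:
        return ["sync"]
    if cfg.mode is SyncMode.SYNC:
        return ["collect", "sync"]
    return ["collect"]
```

Each module decides on its own steps here, so no module has to wait for a central sync manager.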
> We could then provide a remote path where to find descriptor files (like
> `/recent/relay-descriptors/`) and could implicitly only consider descriptor
> types that the respective module understands (like
> `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
Actually, the directory structure of a CollecTor instance's 'recent' folder
is fixed, i.e., the mirrors won't, or at least shouldn't, use a directory
structure different from the main instance's. So it suffices to activate
the module and set the sync or sync-only option. The path structure for the
actual download is thereby determined. The straightforward paths for torperf and
exitlists and the more complex structure for bridge- and relay-
> Here's a potential policy we could apply to decide whether to keep a
> local or remote descriptor: while syncing, if we find out that a remotely
> obtained descriptor would be stored under a file name that already exists
> locally, we always discard that;...
So, //while syncing// means while retrieving descriptors from a different
instance and writing them to the local `SyncFolder` structure. And
during this process, descriptors already available in the sync folder are
> ... and while processing descriptors locally, if we find that we already
> have a file locally with different content, which we likely received while
> syncing, we always overwrite that. This means that we're only adding data
> but never replacing data.
Does this refer to the process of comparing the descriptors fetched from
remote instances with descriptors already in the 'recent' folder of the
syncing instance? Such local descriptors could have been obtained by
direct download or a different syncing operation. Did I miss something
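
To make the two halves of the quoted policy concrete, here is a minimal sketch. Which directory "locally" refers to is the open question above, so it is just a parameter here; `store_synced` and `store_local` are hypothetical helper names, not existing CollecTor functions.

```python
from pathlib import Path


def store_synced(local_dir: Path, name: str, content: bytes) -> bool:
    """While syncing: keep a remote descriptor only if the name is new."""
    target = local_dir / name
    if target.exists():
        return False  # a file with this name exists, discard the remote copy
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True


def store_local(local_dir: Path, name: str, content: bytes) -> bool:
    """While processing locally: overwrite a file whose content differs."""
    target = local_dir / name
    if target.exists() and target.read_bytes() == content:
        return False  # identical file already present, nothing to write
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)  # replaces a differing, likely synced, copy
    return True
```

Both functions return whether a file was written, which makes the "discard" and "overwrite" branches easy to log or count.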
> Regarding deleting synced descriptors, we should never do that, but we
> should rather let `DescriptorCollector` clean up the local directory when
> it finds that a local file does not exist anymore remotely.
True, if this refers to descriptors in the SyncFolder.
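
The clean-up behaviour described in the quote, sketched under the assumption that the remote listing is available as a set of relative file names. This is not the `DescriptorCollector` API; `remove_stale` is a hypothetical stand-in that only illustrates the rule "delete locally what no longer exists remotely".

```python
from pathlib import Path
from typing import Iterable, List


def remove_stale(local_dir: Path, remote_names: Iterable[str]) -> List[str]:
    """Delete local files that are missing from the remote listing.

    Returns the relative names of the files that were removed.
    """
    keep = set(remote_names)
    removed = []
    # sorted() materializes the listing before any file is deleted
    for path in sorted(local_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(local_dir).as_posix()
        if rel not in keep:
            path.unlink()
            removed.append(rel)
    return removed
```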
> Here's something else to watch out for while writing this code: whenever
> we learn descriptors from syncing, we'll have to include them in our
> `/recent/` directory, too. This wasn't entirely clear to me from the
> description above, so if this was already the plan, never mind.
That was intended, but should be clearly stated; will be added to the wiki
I hope I'm not making things too complicated.
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:15>