> Here, I think it is realistic to try and use and import all the fields 
> available from metrics-db-*.
> My PoC is overly simplistic in this regard: only relay descriptors, and only 
> a limited subset of data fields is used in the schema, for the import.

I'm not entirely sure what fields that would include. Two options come
to mind...

* Include just the fields that we need. This would require us to
update the schema and perform another backfill whenever we need
something new. I don't consider this 'frequent backfill' requirement
to be a bad thing though - this would force us to make it extremely
easy to spin up a new instance which is a very nice attribute to have.

* Make the backend a more-or-less complete data store of descriptor
data. This would mean schema updates whenever there's a dir-spec
addition [1]. An advantage of this is that the ORM could provide us
with stem Descriptor instances [2]. For high traffic applications
though we'd probably still want to query the backend directly since we
usually won't care about most descriptor attributes.
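
To illustrate the 'query the backend directly' case, here is a minimal sketch using sqlite3 from the standard library. The table name, column names, and the bandwidth_by_nickname() helper are all made up for the example, not a proposed schema. The point is only that a high traffic caller can select the couple of columns it cares about rather than building full Descriptor instances.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE relay_descriptor (
        fingerprint TEXT PRIMARY KEY,
        nickname TEXT,
        published TEXT,
        observed_bandwidth INTEGER,
        platform TEXT
    )
    """
)
conn.execute(
    "INSERT INTO relay_descriptor VALUES (?, ?, ?, ?, ?)",
    ("A" * 40, "exampleRelay", "2013-05-01 12:00:00", 102400, "Tor on Linux"),
)


def bandwidth_by_nickname(conn, nickname):
    """Fetch only the columns the caller needs, bypassing any ORM layer."""
    cursor = conn.execute(
        "SELECT fingerprint, observed_bandwidth "
        "FROM relay_descriptor WHERE nickname = ?",
        (nickname,),
    )
    return cursor.fetchall()


rows = bandwidth_by_nickname(conn, "exampleRelay")
```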

> The idea would be import all data as DB fields (so, indexable), but it makes 
> sense to also import raw text lines to be able to e.g. supply the frontend 
> application with raw data if needed, as the current tools do. But I think 
> this could be made to be a separate table, with descriptor id as primary key, 
> which means this can be done later on if need be, would not cause a problem. 
> I guess there's no need to do this right now.

I like this idea. A couple of advantages that this could provide us are...

* The importer can provide warnings when our present schema is out of
sync with stem's Descriptor attributes (i.e. there has been a new
dir-spec addition).

* After making the schema update the importer could then run over this
raw data table, constructing Descriptor instances from it and
performing updates for any missing attributes.

Cheers! -Damian

[1] https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
[2] This might be a no-go. Stem Descriptor instances are constructed
from the raw descriptor content, and need it for str(), get_bytes(),
and signature validation. If we don't care about those we can subclass
Descriptor and override those methods.