> Here, I think it is realistic to try and use and import all the fields
> available from metrics-db-*.
> My PoC is overly simplistic in this regard: only relay descriptors, and only
> a limited subset of data fields is used in the schema, for the import.
I'm not entirely sure what fields that would include. Two options come to mind...

* Include just the fields that we need. This would require us to update the schema and perform another backfill whenever we need something new. I don't consider this 'frequent backfill' requirement to be a bad thing, though: it would force us to make it extremely easy to spin up a new instance, which is a very nice attribute to have.

* Make the backend a more-or-less complete data store of descriptor data. This would mean schema updates whenever there's a dir-spec addition [1]. An advantage of this is that the ORM could provide us with stem Descriptor instances [2]. For high-traffic applications, though, we'd probably still want to query the backend directly, since we usually won't care about most descriptor attributes.

> The idea would be import all data as DB fields (so, indexable), but it makes
> sense to also import raw text lines to be able to e.g. supply the frontend
> application with raw data if needed, as the current tools do. But I think
> this could be made to be a separate table, with descriptor id as primary key,
> which means this can be done later on if need be, would not cause a problem.
> I guess there's no need to this right now.

I like this idea. A couple of advantages that this could provide us are...

* The importer can warn us when our present schema is out of sync with stem's Descriptor attributes (i.e. there has been a new dir-spec addition).

* After making the schema update, the importer could then run over this raw data table, constructing Descriptor instances from it and filling in any missing attributes.

Cheers!
-Damian

[1] https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
[2] This might be a no-go. Stem Descriptor instances are constructed from the raw descriptor content, and need it for str(), get_bytes(), and signature validation. If we don't care about those, we can subclass Descriptor and override those methods.
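To make the separate raw-text table idea discussed above a bit more concrete, here is a minimal sketch of the "warn on schema drift, then backfill from raw text" workflow. It assumes SQLite as the backend; the table names (`relay_descriptor`, `raw_descriptor`), the helpers (`check_schema_drift`, `backfill_column`, `first_keyword_value`), and the sample descriptor text are all hypothetical. The attribute list stands in for whatever an importer might derive from stem's descriptor classes, and `first_keyword_value` is a toy parser that assumes each attribute maps to a dir-spec keyword of the same name. A real importer would rebuild stem Descriptor instances from the raw text instead.

```python
import sqlite3
import warnings

# Hypothetical schema: parsed fields live in relay_descriptor, while the raw
# descriptor text lives in a separate table keyed by the same descriptor id.
SCHEMA = """
CREATE TABLE relay_descriptor (
    descriptor_id TEXT PRIMARY KEY,
    nickname      TEXT,
    address       TEXT
);
CREATE TABLE raw_descriptor (
    descriptor_id TEXT PRIMARY KEY,
    raw_text      TEXT NOT NULL
);
"""


def schema_columns(conn, table):
    """Return the column names of `table`, minus the primary key column."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows} - {"descriptor_id"}


def check_schema_drift(conn, table, descriptor_attributes):
    """Warn about, and return, descriptor attributes that have no column."""
    missing = sorted(set(descriptor_attributes) - schema_columns(conn, table))
    if missing:
        warnings.warn(f"{table} has no column for: {', '.join(missing)}")
    return missing


def first_keyword_value(raw_text, keyword):
    """Toy parser: value of the first line starting with `keyword`, else None."""
    for line in raw_text.splitlines():
        if line.startswith(keyword + " "):
            return line[len(keyword) + 1:]
    return None


def backfill_column(conn, table, column, keyword):
    """Add `column` to `table`, then fill it by re-parsing the raw text table.

    `table` and `column` are interpolated into SQL, so they must come from a
    trusted source (here, our own attribute list), never from user input.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} TEXT")
    rows = conn.execute("SELECT descriptor_id, raw_text FROM raw_descriptor").fetchall()
    for descriptor_id, raw_text in rows:
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE descriptor_id = ?",
            (first_keyword_value(raw_text, keyword), descriptor_id),
        )
    conn.commit()


# Usage with made-up sample data (192.0.2.0/24 is a documentation range).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
raw = "router examplerelay 192.0.2.1 9001 0 0\nplatform Tor 0.2.5.10 on Linux\n"
conn.execute("INSERT INTO relay_descriptor VALUES (?, ?, ?)",
             ("abc123", "examplerelay", "192.0.2.1"))
conn.execute("INSERT INTO raw_descriptor VALUES (?, ?)", ("abc123", raw))

# Pretend the descriptor class gained a 'platform' attribute.
missing = check_schema_drift(conn, "relay_descriptor", ["nickname", "address", "platform"])
for attr in missing:
    backfill_column(conn, "relay_descriptor", attr, attr)

print(conn.execute("SELECT platform FROM relay_descriptor").fetchone()[0])
```

Because the raw text is kept in its own table, adding a column and re-deriving its values needs no fresh import from metrics-db-*; the drift check then comes back empty on the next run.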
_______________________________________________
tor-dev mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
