> Keep in mind that metrics tarballs can be huge. stem's tests probably
> shouldn't download one or more of these tarballs in an automatic integ
> test run.

Oops yup. Should have mentioned that. We're just picking out a
descriptor that seems to exercise most of the parsing. This is just for
a sanity check that 'we can still parse something found in the wild'.

Megan, Erik: the layout should be pretty obvious when you take a peek in
test/integ/descriptor/data/*.

> The Java metrics-lib doesn't
> understand microdescriptor consensuses, because they don't contain
> anything new for statistical analysis, but I think stem will want to
> parse them.

Definitely. Microdescriptors are available via the control protocol so
we need to be able to parse them.

> It probably makes sense to have an abstract
> NetworkStatusEntry class that does most of the parsing work but that can
> be specialized in its subclasses. Picking names like ConsensusEntry if
> the consensus class is called Consensus makes sense.

Perfect, thanks. Megan, Erik: if I was in your shoes the first thing
that I'd do to approach this is propose the following on this list...

- an object hierarchy (we already have a bit of one, ex.
  ServerDescriptor vs RelayDescriptor/BridgeDescriptor)

- a description for each of the classes, preferably something meaty that
  we can use for the pydocs of each class with the :var: entries

- your thoughts on which parsing logic should go where (look at the
  previous descriptor classes for a pattern that you might want to
  follow)

> If there's a
> similar concept to Java's inner classes in Python, maybe using something
> like Consensus.Entry might be a good choice, too, because this class
> will only be used as part of a Consensus.

Yup, there is.

>>> class Foo:
...   class Bar:
...     def __init__(self):
...       self.my_value = 5
...   def __init__(self):
...     self.my_bar = Foo.Bar()
...
>>> f = Foo()
>>> f.my_bar.my_value
5

> A related question: can you give us a couple of use-cases for the export
> functionality? E.g., is filtering (we only want fields X, Y, and Z when Q =
> ...) likely to be of use?
> Anything beyond just a straight dump of
> descriptor/network status/etc entries?

I'll mostly leave this question for Fabio since the csv dumping
functionality was his idea, though my thoughts on some use cases are...

- user writes a script that has stem parse the descriptors, filter the
  results (say, down to Syrian exit relays), then dump to a csv so they
  can make pretty graphs or do other analysis of the data

- user has a python script that hourly parses their cached descriptors
  to get any new exits that only allow plaintext traffic, then dumps
  just the fingerprint and ip to a csv so they can later be scanned for
  malicious activity

> Please use the built-in function vars() instead of __dict__ to retrieve
> instance attributes.

Ah ha, thanks.
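
To make the first use case a bit more concrete, here is a minimal sketch of 'filter, then dump selected fields to a csv' that uses vars() as suggested. It assumes Python 3. FakeDescriptor, export_csv, the keep argument, and the attribute names are all hypothetical placeholders for illustration, not anything stem provides.

```python
import csv
import io


class FakeDescriptor:
    """Hypothetical stand-in for a parsed descriptor (not a stem class)."""

    def __init__(self, fingerprint, address, country, is_exit):
        self.fingerprint = fingerprint
        self.address = address
        self.country = country
        self.is_exit = is_exit


def export_csv(descriptors, fields, output, keep=lambda desc: True):
    """Write the chosen fields of each matching descriptor to output as csv."""
    # extrasaction="ignore" drops attributes that aren't in 'fields'
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()

    for desc in descriptors:
        if keep(desc):
            # vars() gives the instance attributes as a dict
            writer.writerow(vars(desc))


# Made-up sample data, purely for the example
descriptors = [
    FakeDescriptor("AAAA", "10.0.0.1", "sy", True),
    FakeDescriptor("BBBB", "10.0.0.2", "de", True),
    FakeDescriptor("CCCC", "10.0.0.3", "sy", False),
]

buffer = io.StringIO()
export_csv(
    descriptors,
    ["fingerprint", "address"],
    buffer,
    keep=lambda desc: desc.country == "sy" and desc.is_exit,
)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing the filter in as a callable keeps the export function itself a straight dump, so whether filtering belongs in the export code or in the caller's script stays an open design question.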
