On 1/22/14 4:32 AM, Damian Johnson wrote:
>> Damian, can you try to parse these descriptors using stem, to see if the
>> descriptor annotations are correct and if stem can parse them without
>> issues?
>
> Hi Karsten, sorry about the delay! Yup, stem parses them just fine
> (though processing compressed tarballs still takes an unpleasantly
> long time)...
>
>
> % du -h microdescs-2014-01.tar.bz2
> 1.8M microdescs-2014-01.tar.bz2
>
>
> % cat parse.py
> from stem.descriptor.reader import DescriptorReader
>
> counter = 0
>
> with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
>   for desc in reader:
>     counter += 1
>
> print "Found %i microdescriptors" % counter
>
>
> % time python parse.py
> Found 14999 microdescriptors
>
> real 67m15.022s
> user 65m50.259s
> sys 1m13.717s
Wow, that's indeed time-consuming. Inflating the tarball before feeding it into stem probably solves this problem. (That's what I usually do with metrics-lib, too.)

Thanks for testing this! Will deploy the metrics-db changes on yatei.

All the best,
Karsten
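The workaround suggested above, inflating the tarball once before handing it to stem, can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the metrics-lib or stem workflow: the function name `extract_tarball` and its `dest_dir` parameter are hypothetical, and it assumes the decompressed archive fits on local disk. The speedup itself was not measured in this thread.

```python
import os
import tarfile
import tempfile


def extract_tarball(tarball_path, dest_dir=None):
    """Decompress a (possibly bz2-compressed) tarball once.

    Hypothetical helper for illustration. Returns (dest_dir, paths), where
    paths lists the regular files that were extracted. If dest_dir is None,
    a fresh temporary directory is created and left for the caller to
    clean up.
    """
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp(prefix="descriptors-")

    # Newer Pythons support extraction filters; "data" rejects unsafe
    # members. Older versions simply extract without a filter.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    extracted = []
    # "r:*" lets tarfile detect bz2, gzip, xz or no compression on its own.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories are created implicitly below
            # Skip absolute paths and ".." components so an archive cannot
            # write outside dest_dir.
            if os.path.isabs(member.name) or ".." in member.name.split("/"):
                continue
            tar.extract(member, dest_dir, **extract_kwargs)
            extracted.append(os.path.join(dest_dir, member.name))
    return dest_dir, extracted
```

The returned directory could then replace the tarball path in parse.py, e.g. `DescriptorReader([dest_dir])`, since stem's reader also accepts directories; this relies on each extracted file keeping its `@type` annotation so stem can tell it is a microdescriptor.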
