Oh, and another quick thought: you once mentioned that a descriptor search service would make ExoneraTor obsolete, and having looked it over I agree. The search functionality ExoneraTor provides is trivial. The only reason it requires such a huge database is that it stores a copy of every descriptor ever made.
I suspect the actual right solution isn't to rewrite ExoneraTor at all, but rather to develop a new service that can be queried for this descriptor data. That would make for a *much* more worthwhile project. ExoneraTor? Nice to have. Descriptor archive service? Damn useful. :)

On Sun, Jun 8, 2014 at 3:03 PM, Damian Johnson <[email protected]> wrote:
> Hi Karsten. This is diving into enough detail that we might as well move this
> over to tor-dev@. For the list's benefit, Karsten and I are discussing a
> Python rewrite of ExoneraTor...
>
> https://exonerator.torproject.org/
> https://gitweb.torproject.org/exonerator.git
>
>
> First I think I need to take a step back to figure out exactly what we're
> after. From a quick peek at ExoneraTor it looks like it behaves as follows...
>
> a. User enters an address (IPv4 or IPv6) and a date (either for a day or an
>    hour).
>
> b. ExoneraTor lists router status entries for all relays that match the
>    criteria. These entries link to the consensus they came from and server
>    descriptors they reference.
>
> c. The user can then enter a destination address and port to search exit
>    policies in TorDNSEL entries.
>
> Steps 'a' and 'b' make sense to me. Step 'c' however I'm having a little
> difficulty grokking. Ignoring TorDNSEL entries for a moment, we already have
> all the ingredients to provide the user with three fields to start with...
>
> * Source Address (required)
> * Timestamp (required)
> * Destination Address and/or Port (optional)
>
> The source address and timestamp come from the consensus, and an optional
> 'can it exit to destination X' consults the server descriptor's exit policy.
>
> So what is TorDNSEL providing us and why is it a separate search on the page?
> As I understand it the value of TorDNSEL is that we can't trust the address in
> the router status entries. If that's the case then our present search fields
> don't make sense to me...
>
> * Our initial search consults consensus information for the address and
>   timestamp but not the exit policy. This is weird both because the address
>   it has is faulty, and because we have the exit policy so we could trivially
>   include that in our search criteria.
>
> * Our second search gives the impression that we're using the earlier
>   consensus results to query exit criteria from TorDNSEL. As I understand
>   it though that's not what it's doing. TorDNSEL is completely independent
>   from the consensus information.
>
> I could understand a search that just consults consensus information (ignoring
> address accuracy, it has everything we need). I could also understand a search
> that just consults TorDNSEL information (ignoring its inconsistent poll rate,
> it has everything we need).
>
> However, this hybrid approach and how it's presented really confuses me.
> Unless I'm mistaken with something above what I'd expect from ExoneraTor is...
>
> * The three search fields mentioned above.
>
> * It shows results based on the consensus information like we presently do.
>
> * If we have TorDNSEL entries that either indicate that a relay we're
>   presenting had a different external address or another relay had the
>   address we're searching for then note that.
>
> That is to say, the base search is based on consensus information (using
> server descriptor exit policies if we want to filter by that), and the
> TorDNSEL results are just appended notes since we can't rely on its poll rate.
>
> Thoughts?
>
> Cheers! -Damian
>
> PS. Congratulations on getting me invested. I just spent the last three hours
> in front of a whiteboard trying to puzzle out why ExoneraTor works the way it
> presently does. ;)
>
> PPS. Stem's ExitPolicy class has a can_exit_to() method that would be really
> handy for this...
>
> https://stem.torproject.org/api/exit_policy.html#stem.exit_policy.ExitPolicy.can_exit_to
>
> PPPS. I'm still hesitant about actually tackling this project.
> Arm is midway through being rewritten, and considering its sudden uptick in
> usage probably the most important project on my plate right now.
>
> That said, I'm happy to discuss this. Even if we don't implement it right now
> this thread will be useful so we know where we're going with ticket #8260.
>
> Concerning the earlier discussion of 'work with Karsten on a Python project'
> I have a personal bias toward collaborating when the project has few unknowns
> for me, but working alone when *I'm* learning something. That is to say, I'd
> love to work with you on a straightforward Stem project and I'd also like
> to discuss ExoneraTor's design. But when it comes to coding, this has enough
> unknowns that if I take it on I'd prefer to experiment alone for a while - at
> least until I know enough about the APIs involved that I can avoid
> embarrassing myself. :)
>
>
> On Sun, Jun 8, 2014 at 2:56 AM, Karsten Loesing <[email protected]>
> wrote:
>> On 08/06/14 06:27, Damian Johnson wrote:
>>>>> Here's a quick overview of the codebase to facilitate reading through it:
>>>>
>>>> Ahhh, very useful - thanks.
>>>
>>> Hmmm. Just took a quick peek at the ExoneraTor codebase and, unless
>>> I'm mistaken, it doesn't actually use metrics-lib, does it?
>>
>> You're right, looks like it doesn't.
>>
>>> Honestly
>>> looking over the code is making me a little hesitant to take this on
>>> after all. I was anticipating a small, quick project of DocTor's scope
>>> but I've never touched SQLAlchemy or Postgres before.
>>
>> I don't think we'll even have to touch Postgres for moving from Java
>> to Python. The Python code would simply do SQL calls via its SQL
>> library just like Java does.
>>
>> I just copied all SQL statements that the Python part would have to
>> prepare and execute:
>>
>> CALL insert_descriptor(?, ?);
>> CALL insert_statusentry(?, ?, ?, ?, ?, ?, ?);
>> CALL insert_consensus(?, ?);
>> CALL insert_exitlistentry(?, ?, ?, ?, ?);
>> SELECT MIN(validafter) AS first, MAX(validafter) AS last FROM consensus;
>> SELECT validafter FROM consensus WHERE validafter >= ? AND validafter <= ?;
>> CALL search_statusentries_by_address_date(?, ?);
>> CALL search_addresses_in_same_24 (?, ?);
>> CALL search_addresses_in_same_48 (?, ?);
>> SELECT rawdescriptor FROM descriptor WHERE descriptor = ?;
>> SELECT descriptor, rawdescriptor FROM descriptor WHERE descriptor LIKE ?;
>> SELECT rawconsensus FROM consensus WHERE validafter = ?;
>>
>> That's it. No further knowledge about Postgres required.
>>
>>> Once I wrote this I realized I'm being a damn hypocrite. Here I was
>>> saying "Karsten, learn Python so we can leverage each other's
>>> codebases!" but then I hightail it once the project delves into areas
>>> new to me. New arm users are showing up almost daily on irc and I'm
>>> anxious to give them a new release... but then this is exactly the
>>> issue, isn't it? Deliverables you'd like to focus on crowding out time
>>> to learn new things.
>>>
>>> So TL;DR I'm gonna eat my own words and suggest we focus on our
>>> separate domains for now. I really would like to work on some small
>>> metrics projects with you. Each month I eyeball your status reports
>>> asking myself "Is there anything here I can work with Karsten on to
>>> draw our spaces closer together?" so please let me know if you run
>>> across anything in Metrics we can collaborate on.
>>
>> (Replying below, first replying to the DynamoDB part.)
>>
>>> Your hypocritical friend, ~Damian
>>>
>>> PS. When we next meet I'd like to discuss ExoneraTor's design a bit.
>>> First thought I had when looking at the code was 'huh... I wonder if
>>> this would be a good use case for DynamoDB'.
>>
>> I'm wary about moving to another database, especially NoSQL ones and/or
>> cloud-based ones. They don't magically make things faster, and Postgres
>> is something I understand quite well by now. And again, I think that we
>> can keep the Postgres part entirely unchanged when moving to Python. Not
>> saying that DynamoDB can't be the better choice, but switching the
>> database is not a priority for me.
>>
>>
>> So, regarding the rewrite: rather than canceling the project before it
>> starts, how about we find a role for you that you're more comfortable with?
>>
>> For example, I'd want to try rewriting it step by step based on your
>> suggestion of frameworks/libraries and with some code review of yours.
>>
>> If you're interested, which framework would I use for the new Python
>> ExoneraTor? It's supposed to do the following tasks:
>>
>> - Provide a simple web site with a web form, backed by the PostgreSQL
>>   database.
>> - Maybe offer a simple RESTful API for lookups that the web form could
>>   use to compose responses, but that could also be used by other
>>   applications directly.
>> - Return documents from the database by identifier, that is, without
>>   providing any search functionality.
>> - Run a scheduled task once per hour that fetches data from CollecTor
>>   and puts it in a database.
>>
>> Bonus points if the result is as easy to deploy on Debian Wheezy as
>> possible. Like, install these few Debian packages, run the setup
>> script, done.
>>
>> Of course, if you'd prefer to focus on other things and not discuss
>> ExoneraTor stuff, that's perfectly fine, too. :)
>>
>> All the best,
>> Karsten
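
For reference, here is a rough sketch of the search proposed in the quoted thread: three fields (source address, date, optional destination port), a base search over consensus data, and TorDNSEL results attached only as notes. It is a sketch under several assumptions, not ExoneraTor's actual code. The sqlite3 module stands in for Postgres, the table and column names are hypothetical, and the comma-separated `exit_ports` column is a deliberately crude substitute for a real exit policy check (which Stem's ExitPolicy.can_exit_to() would handle).

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema. The real ExoneraTor database is Postgres with stored
# procedures; this in-memory layout only illustrates the shape of the search.
SCHEMA = """
CREATE TABLE statusentry (
    validafter  TEXT NOT NULL,  -- consensus valid-after, 'YYYY-MM-DD HH:MM:SS'
    fingerprint TEXT NOT NULL,
    address     TEXT NOT NULL,  -- address as listed in the consensus
    exit_ports  TEXT NOT NULL   -- simplified stand-in for the exit policy
);
CREATE TABLE exitlistentry (
    fingerprint TEXT NOT NULL,
    scanned     TEXT NOT NULL,  -- when TorDNSEL made the observation
    exitaddress TEXT NOT NULL   -- address TorDNSEL saw the relay exit from
);
"""


def search(conn, address, day, port=None):
    """Search consensus entries for `address` on `day`, optionally by port.

    Returns (matches, extra). `matches` is a list of
    (validafter, fingerprint, notes) tuples from the consensus, with TorDNSEL
    observations as notes. `extra` lists fingerprints that TorDNSEL saw using
    the address but that the consensus search did not find.
    """
    start = day.isoformat() + " 00:00:00"
    end = (day + timedelta(days=1)).isoformat() + " 00:00:00"
    window = (start, end)

    # Base search: consensus information only.
    rows = conn.execute(
        "SELECT validafter, fingerprint, exit_ports FROM statusentry"
        " WHERE address = ? AND validafter >= ? AND validafter < ?"
        " ORDER BY validafter",
        (address,) + window,
    ).fetchall()

    matches = []
    for validafter, fingerprint, exit_ports in rows:
        # Optional third field: filter by destination port.
        if port is not None and str(port) not in exit_ports.split(","):
            continue
        # Note any different external address TorDNSEL saw for this relay.
        observed = conn.execute(
            "SELECT DISTINCT exitaddress FROM exitlistentry"
            " WHERE fingerprint = ? AND exitaddress != ?"
            " AND scanned >= ? AND scanned < ? ORDER BY exitaddress",
            (fingerprint, address) + window,
        ).fetchall()
        notes = ["TorDNSEL saw this relay exit from %s" % a for (a,) in observed]
        matches.append((validafter, fingerprint, notes))

    # Note relays that TorDNSEL saw using the address but the consensus missed.
    known = {fingerprint for _, fingerprint, _ in rows}
    others = conn.execute(
        "SELECT DISTINCT fingerprint FROM exitlistentry"
        " WHERE exitaddress = ? AND scanned >= ? AND scanned < ?"
        " ORDER BY fingerprint",
        (address,) + window,
    ).fetchall()
    extra = [fp for (fp,) in others if fp not in known]
    return matches, extra


def demo_db():
    """Build a tiny in-memory database with made-up entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO statusentry VALUES (?, ?, ?, ?)", [
        ("2014-06-08 12:00:00", "AAAA", "198.51.100.7", "80,443"),
        ("2014-06-08 13:00:00", "AAAA", "198.51.100.7", "80,443"),
        ("2014-06-09 12:00:00", "BBBB", "198.51.100.7", "22"),
    ])
    conn.executemany("INSERT INTO exitlistentry VALUES (?, ?, ?)", [
        ("AAAA", "2014-06-08 12:30:00", "198.51.100.99"),
        ("CCCC", "2014-06-08 14:00:00", "198.51.100.7"),
    ])
    return conn
```

Calling `search(demo_db(), "198.51.100.7", date(2014, 6, 8))` runs the base search for that day; passing `port=443` adds the optional exit filter. The point of the split is that TorDNSEL data never decides whether a relay is listed. It only annotates the consensus-based results, since its poll rate can't be relied on.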
