#26227: Review existing stem.client code ---------------------------+------------------------------ Reporter: dmr | Owner: dmr Type: task | Status: needs_review Priority: Medium | Milestone: Component: Core Tor/Stem | Version: Severity: Normal | Resolution: Keywords: client | Actual Points: Parent ID: | Points: Reviewer: atagar | Sponsor: ---------------------------+------------------------------
Comment (by dmr): ==== HTTP, potential bugs **Summary**: HTTP parsing has a lot of edge cases **Suggestion**: use a library for this (see architecture ticket, #26227) There are a number of potential bugs with the HTTP handling. HTTP is a nuanced spec. It's very possible that these bugs will never be exercised due to tor's implementation, but it's also hard to know how tor will evolve in the future. Here's the potential bugs I saw: * HTTP defines headers as case-insensitive, but [[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n860|the parsing code creates a dictionary]] that is case-sensitive. This is in contrast to `_download_from_dirport()` which has tests to assure the `reply_headers` attribute functions in a case-insensitive way. See [[https://github.com/dmr-x/stem/commits/26227-client-code-review--HTTP- headers- case|branch]]/[[https://github.com/dmr-x/stem/commit/cba6a8ba508108e8dcbf6f7a0b2e3978524eb2aa|commit]] with a demo test that shows this. * HTTP allows headers to be [[https://tools.ietf.org/html/rfc1945#page-21|split among continuation lines]], as well as for multi-attribute headers to be specified in a CSV or as `key: single_value` with different values on multiple lines. The code currently assumes that [[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n867|each header is completely specified on a single line]] and that [[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n863|there are no continuation lines]]. * Both of the above may mean that the content encoding may not be correctly passed to `_decompress()`. * We're using HTTP/1.0. I'm not super-familiar with the specifics of HTTP/1.0, but there may be some restrictions that we run into, as far as e.g. [[https://tools.ietf.org/html/rfc1945#section-3.5|content encodings allowed]]. Might want to be using HTTP/1.1 to be "safe". (I haven't looked at the dir spec or tor implementation to know if there's any functional impact.) * fixed in [[https://github.com/dmr-x/stem/commit/ba275219bd50dfc408befca81ef4d3116008dbcc|commit `ba275219bd50dfc408befca81ef4d3116008dbcc`]]: the Reason-Phrase of the HTTP response is only recommended to be `OK` for `200`, but that is not required. The parsing code did require it to be `OK`. So there are ways to fix all these cases, but my preference would be to not reinvent the wheel here. This is discussed more in the architecture ticket (#26227). == Phew That's a lot. Sorry - still working on my verbosity! Hopefully it's at least well organized! -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26227#comment:5> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online
_______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs