#26227: Review existing stem.client code
---------------------------+------------------------------
 Reporter:  dmr            |          Owner:  dmr
     Type:  task           |         Status:  needs_review
 Priority:  Medium         |      Milestone:
Component:  Core Tor/Stem  |        Version:
 Severity:  Normal         |     Resolution:
 Keywords:  client         |  Actual Points:
Parent ID:                 |         Points:
 Reviewer:  atagar         |        Sponsor:
---------------------------+------------------------------

Comment (by dmr):

 ==== HTTP, potential bugs
 **Summary**: HTTP parsing has a lot of edge cases
 **Suggestion**: use a library for this (see architecture ticket, #26227)

 There are a number of potential bugs with the HTTP handling.
 HTTP is a nuanced spec. It's very possible that these bugs will never be
 exercised due to tor's implementation, but it's also hard to know how tor
 will evolve in the future.

 Here's the potential bugs I saw:
 * HTTP defines headers as case-insensitive, but
 
[[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n860|the
 parsing code creates a dictionary]] that is case-sensitive. This is in
 contrast to `_download_from_dirport()` which has tests to assure the
 `reply_headers` attribute functions in a case-insensitive way. See
 [[https://github.com/dmr-x/stem/commits/26227-client-code-review--HTTP-
 headers-
 
case|branch]]/[[https://github.com/dmr-x/stem/commit/cba6a8ba508108e8dcbf6f7a0b2e3978524eb2aa|commit]]
 with a demo test that shows this.
 * HTTP allows headers to be
 [[https://tools.ietf.org/html/rfc1945#page-21|split among continuation
 lines]], as well as for multi-attribute headers to be specified in a CSV
 or as `key: single_value` with different values on multiple lines. The
 code currently assumes that
 
[[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n867|each
 header is completely specified on a single line]] and that
 
[[https://gitweb.torproject.org/stem.git/tree/stem/descriptor/remote.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n863|there
 are no continuation lines]].
 * Both of the above may mean that the content encoding may not be
 correctly passed to `_decompress()`.
 * We're using HTTP/1.0. I'm not super-familiar with the specifics of
 HTTP/1.0, but there may be some restrictions that we run into, as far as
 e.g. [[https://tools.ietf.org/html/rfc1945#section-3.5|content encodings
 allowed]]. Might want to be using HTTP/1.1 to be "safe". (I haven't looked
 at the dir spec or tor implementation to know if there's any functional
 impact.)
 * fixed in
 
[[https://github.com/dmr-x/stem/commit/ba275219bd50dfc408befca81ef4d3116008dbcc|commit
 `ba275219bd50dfc408befca81ef4d3116008dbcc`]]: the Reason-Phrase of the
 HTTP response is only recommended to be `OK` for `200`, but that is not
 required. The parsing code did require it to be `OK`.

 So there are ways to fix all these cases, but my preference would be to
 not reinvent the wheel here. This is discussed more in the architecture
 ticket (#26227).

 == Phew
 That's a lot. Sorry - still working on my verbosity! Hopefully it's at
 least well organized!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26227#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Reply via email to