#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-----------------------------+-----------------------------------
 Reporter:  iwakeh           |          Owner:  metrics-team
     Type:  enhancement      |         Status:  needs_revision
 Priority:  Medium           |      Milestone:  metrics-lib 2.2.0
Component:  Metrics/Library  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  metrics-2017     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+-----------------------------------

Comment (by karsten):

 Agreed with all points above except one:

 When ''parsing'' sanitized log lines, metrics-lib should not reject log
 lines that it would discard when ''sanitizing'' original log lines.

 It's not the job of the ''parser'' to ensure that its input is properly
 sanitized or to do some sort of post-sanitizing. Of course it needs to
 perform some basic format verification to do its job. But dropping lines
 because the sanitizer would drop them seems out of place.

 Imagine a hypothetical situation where we decide at some point that HEAD
 requests are too sensitive and we take them out in the parser. However,
 previously sanitized logs would still contain them, including archives
 that people keep locally and that we can't update. If somebody then takes
 a recent metrics-lib version to parse their data, they'd suddenly no
 longer get the HEAD lines. That would be rather confusing.

 I think sanitizing and parsing should be separate things. In this case,
 discarding lines because of certain field contents should be left to the
 sanitizer.

 Does that mean we should provide a general-purpose log parser? Probably
 not. In the parser we don't have to provide getters for fields that we
 don't care about, like the user-agent string. But we should be prepared to
 find request methods GET, HEAD, POST, or really anything else in the log
 lines we're given.

 Does that make sense, or am I overlooking something?

 (By the way, it's a good thing that we're keeping the spec unchanged with
 regard to IP addresses not starting with `0.0.0.`. I think it would have
 been pretty bad to just rewrite the first three octets to `0.0.0` and keep
 the fourth unchanged. Not very privacy-preserving.)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:50>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
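
To illustrate the separation argued for in the comment, here is a minimal sketch of a parser that only checks the basic line format and accepts whatever request method it finds. It is not metrics-lib code: the class name `SanitizedLogLineSketch`, its fields, and the assumed line layout (host, two ignored fields, bracketed date, quoted request, status, size) are all hypothetical and chosen only for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not part of metrics-lib. */
class SanitizedLogLineSketch {

  // Assumed layout: host ident user [date] "METHOD resource protocol" status size
  // The method group is deliberately \S+ so that GET, HEAD, POST, or anything
  // else passes; filtering by method is left to the sanitizer.
  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)$");

  final String method;
  final String resource;
  final int status;

  private SanitizedLogLineSketch(String method, String resource, int status) {
    this.method = method;
    this.resource = resource;
    this.status = status;
  }

  /**
   * Returns a parsed line, or an empty Optional if the line does not match
   * the basic format. No line is dropped because of its field contents.
   */
  static Optional<SanitizedLogLineSketch> parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(new SanitizedLogLineSketch(
        m.group(3), m.group(4), Integer.parseInt(m.group(6))));
  }
}
```

Under these assumptions, a previously sanitized archive that still contains HEAD lines would parse the same way with any later version of the parser, even if the sanitizer's rules about which lines to keep change in the meantime.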