On 04/25/2017 02:54 PM, Amos Jeffries wrote: > The class HttpHeader currently provides a mixed bunch of methods to > retrieve headers as a blob of ASCII in a String. This is causing a lot > of code almost-duplication, String copying, validation difficulties, > etc.
Agreed. > Our Parser-NG project seeks to reduce this and make it a lot more > efficient by scoping the parse action inside Parser that can be audited > for accuracy isolated from the complications of header semantic handling. FYI: The "scoping the parse action inside Parser" approach is too vague for me to support or critique. I assume that agreeing on Parser-NG project goals is not really necessary for this specific email thread, but please correct me if I am wrong. > https://tools.ietf.org/html/draft-ietf-httpbis-header-structure > The implementation of this would be refactoring HttpHeader so most > headers can be parsed and syntax validated in one place (inside > HttpHeader) and callers receive a structured tree conforming as much as > possible to the common header structure instead of a String blob. The > callers then (at worst) only need to know what key or parameter it is > looking for and how to scan the tree for that instead of parsing the > whole String. I assume the above plan means adding some kind of a Tree class and that a typical field getter would then look like this (in pseudo code): /// \returns a generic Common Header Structure tree Tree fieldFoo() const { if (const auto field = findEntry(HdrType::FOO)) return FieldParser(field->syntax).parse(field->value); return Tree(); } ... const auto foo = header->fieldFoo(); if (foo.element[3].asAscii().contains("bar"))... If my interpretation of the above plan is anywhere close to what you have in mind, then please do not implement that plan! A tree-focused generic field parser approach is likely to make things worse for Squid. Instead of using tree objects and generic parsers, we should use field-specific [Common Header Structure] syntax directly in _code_: /// \returns a field-specific type/object, like int or HttpHdrCc Foo fieldFoo() const { if (const auto field = findEntry(HdrType::FOO)) { FieldParser parser(field->value); // parse Foo field according to its Common Syntax foo.stamp = parser.timestamp(); // h1-timestamp foo.bar = parser.ascii().contains("bar"); // h1-ascii-string ... return foo; } return Foo(); } ... const auto foo = header->fieldFoo(); if (foo.bar)... Why should we reject "parsing in one place" and "callers that receive trees"? Consider the combination of the following facts: 1. Header field users do not care about the field syntax and do not want to scan generic parser-generated trees. They need access to field-specific values expressed as integers, timestamps, enums, and similar "typed" values, not generic trees. Giving an integer-expecting caller a String is clearly wrong, but giving it a "tree" could be worse! 2. Configuring a generic parser to use [field-]specific syntax is difficult, especially when real-world fields may require adjusting that syntax depending on the being-parsed values. AFIK, all existing solutions to these problems are ugly, inflexible, complex, and/or slow. 3. Squid should not police traffic by default. Many compliant fields do not conform to the Common Header Structure format. Many real fields are not fully compliant, even when they should be. In most cases, Squid should still forward fields as is, even if we cannot parse them. This means that you cannot store most header values using [just] a tree. 4. Each field that Squid manipulates has a fairly simple syntax. A single field type does not normally contain a wide variety of valid inputs (unlike, say, a C++ file that may have numerous valid constructs for a C++ parser to deal with, requiring building a general "tree"). Overall, we most likely need many simple parsers instead of one generic but configurable Common Header Structure parser. Each Foo field getter is essentially a parser. All those simple field parsers will share lots of code, of course (e.g., they will call the same set of FieldParser methods). My sketch above illustrates this approach. HTH, Alex. P.S. I applaud the ongoing Common Header Structure efforts. In Squid context, that work may help define parsing vocabulary in the short term and reduce the number of odd headers we deal with in the long run. _______________________________________________ squid-dev mailing list [email protected] http://lists.squid-cache.org/listinfo/squid-dev
