On 01/03/2014 05:28 AM, Amos Jeffries wrote: >>> 1) avoid the current design of 2-pass parsing: a) generic-parse just to >>> determine it is a request-line, then b) re-parse with HttpRequest to get >>> the fields already found by HttpParser.
>> I do not think we need two passes. The caller knows what it is parsing >> and will use the right parser. That is, the caller will do either >> >> Http1::RequestParser parser(...) >> if (parser.parse(...)) >> >> or >> >> Http1::ResponseParser parser(...) >> if (parser.parse(...)) >> >> The above sketch omits many details, of course; for example, the >> incremental parsers will be created once per message rather than once >> per parse call. However, there is just one pass in the above sketch. > "how" is the problem. The first pass I'm talking of is the discovery of > that information used to run the pass these parsers will do. It may or > may not be a complete pass or just several bytes preview-style into the > data. You may recall we had to add a separate layer of memcmp() to > distinguish between HTTP/1 and HTTP/2 alone. Since there is not much value in making two passes over the same data, we will eliminate them. Let's not assume that they have to be there and designing to accommodate that wrong assumption! Your initial patch may not eliminate the two passes, but your design does not have to suffer from them from the beginning. The initial patch you posted already creates the parser once, when needed. That is the right way to do it. Eventually, you may use approach #1 (below) and give that parser a request message, OR you can use approach #2 (below) and make that parser create the request message. AFAICT, you do not have to make that decision now because you are not modifying/removing the "second pass" code yet. >>> 3) avoid having to allocate a HttpRequest/HttpResponse object just to >>> test parsing for invalid request/responses. >> >> I think there are two valid approaches here: >> >> 1) Always create the right empty HttpR* message object. Always create >> the right parser. Give the message object to the parser so that the >> parser can use it for storage during parsing. >> >> 2) Always create the right parser. Allow the parser to create the right >> empty HttpR* message object if/when needed, so that the parser can use >> the message object for storage during parsing. >> >> Approach (1) is probably simpler: No logic to decide when the message >> object should be created. No code to check whether the message object >> has been created already. It can be modified so that the parser creates >> the right message object unconditionally, to simplify the caller's code. >> >> Approach (2) is more efficient if and only if there are a lot of cases >> where the parser determines that no message object is needed to handle a >> parse() call. Those are the cases where message object creation will be >> delayed or even avoided. >> >> I suspect that there are not enough such cases to warrant the complexity >> of (2), but you probably know this area better than I do. > We have enough abort cases (Request parser generates Response object) to > make me pick (2) ealier as the end goal. The generated response objects are irrelevant IMO. Parsers in both approaches can generate them, whether you supply the request message to the request parser or let the parser create that request message as needed. > Avoids replicating validation > outside the parser which has already been done within and has less info > outside. I do not understand what duplicated validation (of the request being parsed?) you are talking about, but I do not foresee any duplication in either design. Perhaps you are talking about duplicated parsing/validation in the current code, but I do not think it is relevant to this discussion because it will be gone by the time we are done. Overall, given vanishing correlation between our emails, I suspect we are simply talking past each other: discussing different matters while thinking that we are discussing the same thing. To make progress, I suggest the following specific changes as the next step: * Switch to the new namespace scheme. If I were you, I would not bother with adding or renaming any files at this point unless absolutely necessary. That should be done as the last step, to minimize intermediate patch sizes and rererenaming overheads. For the same reason, I would avoid updating test cases at least until the major design issues are settled. * Rename your Http::Http1Parser class to Http::One::Parser class. * Add empty Http::One::RequestParser class, a child of Http::One::Parser. This class will be referred by callers as Http1::RequestParser. Again, to minimize overheads, ease review, and focus on what it important, this class should be initially located in the same header and source files as the Http::Http1Parser (we will split sources later). * Move code (including methods and data members) specific to request parsing from Http::One::Parser to Http::One::RequestParser. Finalize moved methods/data descriptions. Any generic code, such as code isolating the first HTTP message line (if any), can stay in Http::One::Parser, but please do not make any existing code generic even if it is possible to do so (that should be done later if needed). * Please do not add any code (including methods and data members) specific to response parsing at this time. * Address other, non-controversial review issues as usual. * Re-post the patch. I think the above plan will help us focus on what is critical now and minimize iterations. > /* NP: don't be tempted to move this down or remove again. > * It's the only DDoS protection old-String has against long URL */ > if ( hp->bufsiz <= 0) { > debugs(33, 5, "Incomplete request, waiting for end of request line"); > return NULL; > } else if ( (size_t)hp->bufsiz >= Config.maxRequestHeaderSize && > headersEnd(hp->buf, Config.maxRequestHeaderSize) == 0) { > debugs(33, 5, "parseHttpRequest: Too large request"); > hp->request_parse_status = Http::scHeaderTooLarge; > return parseHttpRequestAbort(csd, "error:request-too-large"); > } BTW, the above should be moved into the RequestParser code, with an adjusted comment, and headersEnd() call moved/merged into the overall parsing logic. HTH, Alex.