Hello Pieter,

Thank you very much for your extensive and certainly valid feedback.
I agree the approach up until now has been relatively "ad-hoc", as it has grown organically over time for QUIC and HTTP/3. Especially as we're aiming to (properly) start generalizing qlog to other protocols and use cases, we need to have discussions about serialization formats and the datatype language used in the drafts.

I have several presentations on qlog planned for this IETF week (in dispatch, maprg, tsvwg, iccrg, saag and opsawg), with the explicit goal of getting some outside feedback/interest. I specifically mention the format/definition language there as one of the main challenges. I'm well aware that some of this has already been solved for other use cases; I/we just didn't/don't have enough experience with all options to make these decisions before. Your feedback was exactly the type of thing we were hoping for by soliciting outside viewpoints.

Some concrete personal thoughts on your points:

1) My main personal goal for the datatype definition language in the documents would be to allow automatically generating schema definitions for multiple serialization formats / programming languages. I assume CDDL and/or YANG allow that, so they seem like good candidates. Existing mappings to e.g., JSON/CBOR would definitely also come in handy there. I wonder if there are also (non-standard) mappings to e.g., protocol buffers/flatbuffers for YANG?

2) I sort of disagree with not needing a serialization format indicator. In our tooling, we currently support 4 completely different file types that (can) all use the .json extension (and the same MIME types) by default. I agree that conceptually it's not needed, but practically it's very useful to have. Of course, that might become moot if JSON isn't the main format going forward.

3) I can appreciate that NDJSON is not an IETF RFC, but I'm also not yet sure we want to move to e.g., CBOR as the default format, as it removes easy human readability.
The main thing that has become clear over the past months is that streaming should be the main use case (instead of full-file storage/transfer, which we assumed previously), so I would of course also prefer having a standardized format for that.

4) I agree the concrete API endpoints / environment variables might have to be split out of the main document (if we keep them at all). I do note that having a default environment variable name (QLOGDIR) has been useful, as most implementations support this, which is handy for newer users. The well-known URL, however, has so far not been used by any deployment afaik. In general I think there's value in having some recommendations for this, but I agree those might not belong in the main spec.

Thank you again for your extensive feedback. I hope you will be part of the continued discussions on this in the future.

With best regards,
Robin

On Fri, 5 Mar 2021 at 17:07, Pieter Lexis <[email protected]> wrote:

> Hello Quic-WG, Robin,
>
> Someone pointed me to draft-marx-qlog-main-schema-02 because "You showed
> interest in structured logging". I've had a quick read and have some
> initial thoughts.
>
> The first thought was "yes, there is need for a specified schema for
> logs, that can be serialized to a variety of formats". However, the
> draft is a bit hand-wavy about the schema and instantiated format.
>
> For starters, section 1.1 notes the use of a datatype language "inspired
> loosely by the "TypeScript" language". This language is not an IETF
> standard. The IETF has standardized at least 3 data definition languages:
>
> 1. ABNF as RFC 5234 [1]
> 2. Concise Data Definition Language (CDDL) as RFC 8610 [2]
> 3. YANG as RFC 7950 [3]
>
> Apart from ABNF, both CDDL and YANG have specified how to convert the
> instantiated data to JSON (RFC 7951[4] for YANG, CDDL in its own RFC). I
> would highly recommend the author to choose either YANG or CDDL to
> define all qlog structures.
>
> Skipping over the schema definition, section 4 deals with the
> serialization of qlog.
>
> The schema has a field that contains the serialization format. But this
> serialization is actually metadata. It is up to the parties exchanging
> the serialized data to agree on the format (possibly using
> Accept/Content-Type headers when using HTTP to transfer and a
> file-extension when stored on disk).
>
> Section 4.1 should be superfluous if the author uses either CDDL or YANG
> as a modeling language, as those have defined how to serialize data.
>
> Section 4.2 then uses a non-IETF serialization format (NDJSON) to
> accomplish the streaming property of qlog. In the DNS world, the C-DNS
> (RFC 8618[5]) logging format is specified using CDDL, uses CBOR (RFC
> 8949[6]) as its primary 'storage' mechanism, using tables inside blocks
> to 'compress' repeated data. It implements streaming on a specific level
> of the schema. Using such an approach in qlog would mitigate the need of
> the "optimization" section (4.3). It is up to the tooling to translate
> from CBOR to JSON or any other format the user or tools can read.
>
> Section 5 then goes into how tools should behave, down to the use of
> certain environment variables. This is needlessly restrictive and
> stifles any attempt to differentiate between the multitude of tools that
> could be developed.
>
> I hope the WG and author consider these reservations on the draft
> seriously.
>
> Best regards,
>
> Pieter Lexis
>
> 1 - https://tools.ietf.org/html/rfc5234
> 2 - https://tools.ietf.org/html/rfc8610
> 3 - https://tools.ietf.org/html/rfc7950
> 4 - https://tools.ietf.org/html/rfc7951
> 5 - https://tools.ietf.org/html/rfc8618
> 6 - https://tools.ietf.org/html/rfc8949
> --
> Pieter Lexis
> PowerDNS.COM BV -- https://www.powerdns.com
>

--
dr. Robin Marx
Postdoc researcher - Web protocols
Expertise centre for Digital Media

T +32(0)11 26 84 79 - GSM +32(0)497 72 86 94

www.uhasselt.be
Universiteit Hasselt - Campus Diepenbeek
Agoralaan Gebouw D - B-3590 Diepenbeek
Kantoor EDM-2.05
