On 05-Oct-16 08:56, Richard Levitte via RT wrote:
> To be noted, there's more in section 2:
>
> Most extant parsers ignore blanks at the ends of lines; blanks at the
> beginnings of lines or in the middle of the base64-encoded data are
> far less compatible. These observations are codified in Figure 1.
> The most lax parser implementations are not line-oriented at all and
> will accept any mixture of whitespace outside of the encapsulation
> boundaries (see Figure 2). Such lax parsing may run the risk of
> accepting text that was not intended to be accepted in the first
> place (e.g., because the text was a snippet or sample).
>
> I haven't looked enough in our code recently to remember if we're doing
> "standard" (figure 1) or "strict" (figure 3) parsing... what I hear is a
> request for us to move to "lax" (figure 2) parsing.

Yes. Actually, the text is even more lax than the BNF; it says in paragraph 1 that
   parsers SHOULD ignore whitespace and other non-base64 characters

That is, anything but A-Za-z0-9+/ (and = at the end, as pad) should be ignored between the header and the footer. Many decoders do that silently; some warn if the junk isn't whitespace.

Let's step back a bit from the letter of the RFCs and consider what brought this up.

The real-world issues that drive this are cases like cut-and-paste of a CSR, certificate, or key from a webpage, terminal window, or e-mail. All of these may re-wrap the text so that whitespace is introduced or lost. Further, especially with long keys, the text may not all be visible at once, so one ends up scrolling and/or copying and pasting in sections, again introducing or losing whitespace. And exactly how textboxes on web pages represent end-of-line and interact with copy/paste varies. Lost newlines can produce long lines, and many base64 encoders (e.g. Perl's MIME::Base64) produce lines longer than PEM's 64 characters (e.g. the 76 characters permitted for MIME).

CSRs, certificates, and keys appear on webpages generated by embedded devices (think NAS boxes and routers), as well as by CAs and in terminal windows. So while one would like to think that they're never touched by human hands, the reality is that they are.

I'm not as concerned about "accepting text that was not intended to be accepted in the first place", because validation of the data will occur. CSRs and certificates are signed, and will fail validation if corrupt. Keys won't work if corrupt. All have to pass ASN.1 parsing, which will also catch many forms of corruption.

OpenSSL should accept the CSR that I posted as a test case.

Whether to also ignore non-base64 characters is debatable. I vote for warning (e.g. a distinct SUCCESS code that the caller can elect to report or ignore).

What's fixed is that there must be a "-----BEGIN" line, and there's little excuse for not having a "-----END" line, though the newline after the "-----END" may be optional. Embedded whitespace must be ignored - which implies that line length is unrestricted. This is something that both humans with a mouse and software can comprehend...

The approach I use is to discard all whitespace, check for only base64 characters plus optional pad, and ensure that the length, including the 0-2 pad characters (=) at the end, is a multiple of 4. Otherwise (non-base64 characters or an insane length) I warn but still process the input. (A Perl implementation is in the OpenXPKI issue that I cited; a rough C sketch follows at the end of this message.)

Naturally, I am NOT arguing that PEM can be produced in lax form; this is only about making the input parsing compatible with (RFC-compliant) cases common in the real world.

I hope this provides context for your decisions...

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my
employer's views, if any, on the matters discussed.

--
Ticket here: http://rt.openssl.org/Ticket/Display.html?id=4698
Please log in as guest with password guest if prompted

--
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
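[Editor's sketch] For illustration, here is a minimal C sketch of the lax body check described in the message above: strip all whitespace, keep only base64 characters and trailing '=' pad, verify that the kept length is a multiple of 4, and flag anything else as suspect so the caller can warn but still decode. It is not OpenSSL's PEM code and not the Perl implementation from the OpenXPKI issue; the function and status names are invented for this example.

/*
 * Minimal sketch of the lax PEM body check described above.  This is NOT
 * OpenSSL's PEM code; the function and status names are invented for
 * illustration only.
 */
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum pem_body_status {
    PEM_BODY_OK,        /* clean base64, sane length               */
    PEM_BODY_SUSPECT    /* junk seen or odd length: warn, but use  */
};

/*
 * Scrub the text between the "-----BEGIN" and "-----END" lines: drop all
 * whitespace, keep only base64 characters and trailing '=' pad, and report
 * whether anything looked off.  'out' must hold at least strlen(in) + 1
 * bytes; *outlen receives the number of characters kept.
 */
static enum pem_body_status
pem_body_scrub(const char *in, char *out, size_t *outlen)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    size_t n = 0, pad = 0;
    int junk = 0;
    const char *p;

    for (p = in; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;                   /* whitespace is always ignored   */
        if (*p == '=' || strchr(b64, *p) != NULL)
            out[n++] = *p;              /* keep base64 and pad characters */
        else
            junk = 1;                   /* non-base64 junk: note and drop */
    }
    out[n] = '\0';
    *outlen = n;

    /* 0-2 '=' pad characters are only legal at the very end. */
    while (pad < n && out[n - 1 - pad] == '=')
        pad++;
    if (pad > 2 || (n > pad && memchr(out, '=', n - pad) != NULL))
        junk = 1;

    /* Including the pad, a sane body is a multiple of 4 characters. */
    if (n % 4 != 0)
        junk = 1;

    return junk ? PEM_BODY_SUSPECT : PEM_BODY_OK;
}

A caller would base64-decode 'out' regardless of the returned status and, when PEM_BODY_SUSPECT comes back, emit a warning (or the distinct success code suggested in the message) while letting signature/ASN.1 validation catch real corruption.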