Re: (Not-so) hypothetical question: What to do about NULs?
Ken Hornstein wrote in
 <20230219012125.2e48b1d7...@pb-smtp21.pobox.com>:
 |>Seems to me this is classification of attachment data, which will end up
 |>as octet-stream in that case.
 |
 |It's ... a little confusing!
 |
 |>For S-nail we more or less do what Heirloom mailx has done.
 |
 |Well, it seems that in the message lexer if you encounter a NUL you
 |just stop, from a_msg_scan():
 |
 |      cp = mslp->msl_cap->ca_arg.ca_str.s;
 |      if((c = *cp++) != '\0')
 |         break;

That seems to come from a command argument parser, not mail content.
Ah no, no no, wrong code :)
I can assure you that the email

  From reproducible_build Wed Oct  2 01:50:07 1996
  Date: Wed, 02 Oct 1996 01:50:07 +
  From: e...@am.ple
  Subject: s3
  MIME-Version: 1.0
  Content-Type: text/plain; charset=utf-8
  Content-Transfer-Encoding: quoted-printable
  Status: O

  Alo=00ha
  Boom.

is decoded (of course) and displayed with the NUL converted to the
Unicode graphical symbol for NUL.  The same if i make it "binary" and
put a real NUL in place of the =00.

 |It does look to me like for IMAP and POP a NUL character is handled
 |properly.  But that doesn't answer the question, what do you THINK should

Uh i really would have to look and try out whether binary data on the
input side of IMAP or POP3 properly handles embedded NULs.
I would assume yes.  (More or less.)

 |happen?  Should NULs be passed through?  You basically can't use C strings
 |anywhere if you want to handle embedded NULs.

That is true.

 |>The implementation is total crap.  (longjmp codebase, data leaks,
 |>blocking I/O, all that (it was).)  All of these (mailbox read,
 |>content-transfer decoding, character set conversion, .. display
 |>preparation) should be "filters" with input and output plugged together,
 |>with internal buffers as necessary.  That is the v15 MIME and I/O layer
 |>rewrite that is not happening for nine years.
 |
 |Sigh, I know the feeling :-/

A nice Sunday is also not a bad thing.
Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
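For readers following along, here is a minimal sketch of the behaviour described in this message: a quoted-printable decoder that tracks an explicit length, so `=00` survives as a real NUL byte, plus a display step that maps control bytes to the Unicode control pictures. This is not S-nail code; `hex_val`, `qp_decode` and `display_codepoint` are hypothetical names, and the display step assumes single-byte input for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int hex_val(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Decode quoted-printable from in[0..inlen) into out and return the number
 * of bytes written.  out needs room for inlen bytes (decoding never grows).
 * Soft line breaks are dropped; malformed escapes are copied literally.
 * The result is NOT NUL-terminated: the returned length is authoritative.
 */
size_t qp_decode(const char *in, size_t inlen, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = (unsigned char)in[i];

        if (c == '=') {
            if (i + 2 < inlen) {
                int hi = hex_val((unsigned char)in[i + 1]);
                int lo = hex_val((unsigned char)in[i + 2]);

                if (hi >= 0 && lo >= 0) {       /* "=XY" escape */
                    out[o++] = (unsigned char)(hi * 16 + lo);
                    i += 3;
                    continue;
                }
                if (in[i + 1] == '\r' && in[i + 2] == '\n') {
                    i += 3;                     /* soft break "=\r\n" */
                    continue;
                }
            }
            if (i + 1 < inlen && in[i + 1] == '\n') {
                i += 2;                         /* soft break "=\n" */
                continue;
            }
        }
        out[o++] = c;
        i++;
    }
    return o;
}

/*
 * Code point to show for one decoded byte on a Unicode terminal: C0
 * controls other than newline and tab become U+2400 + byte, DEL becomes
 * U+2421, everything else is passed through unchanged.
 */
unsigned int display_codepoint(unsigned char b)
{
    if (b == '\n' || b == '\t') return b;
    if (b < 0x20) return 0x2400u | b;
    if (b == 0x7F) return 0x2421u;
    return b;
}
```

Decoding the body line `Alo=00ha` this way gives six bytes with a NUL in the middle, and that NUL is displayed as U+2400.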
Re: (Not-so) hypothetical question: What to do about NULs?
>Seems to me this is classification of attachment data, which will end up
>as octet-stream in that case.

It's ... a little confusing!

>For S-nail we more or less do what Heirloom mailx has done.

Well, it seems that in the message lexer if you encounter a NUL you
just stop, from a_msg_scan():

      cp = mslp->msl_cap->ca_arg.ca_str.s;
      if((c = *cp++) != '\0')
         break;

It does look to me like for IMAP and POP a NUL character is handled
properly.  But that doesn't answer the question, what do you THINK should
happen?  Should NULs be passed through?  You basically can't use C strings
anywhere if you want to handle embedded NULs.

>The implementation is total crap.  (longjmp codebase, data leaks,
>blocking I/O, all that (it was).)  All of these (mailbox read,
>content-transfer decoding, character set conversion, .. display
>preparation) should be "filters" with input and output plugged together,
>with internal buffers as necessary.  That is the v15 MIME and I/O layer
>rewrite that is not happening for nine years.

Sigh, I know the feeling :-/

--Ken
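To make the point about C strings concrete, here is a small sketch of the usual alternative: carry an explicit length next to the pointer and only use the length-taking functions such as memchr(). The `struct bytes` type and both function names are hypothetical and not taken from nmh, S-nail or mutt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-counted byte span; the data may contain NULs. */
struct bytes {
    const unsigned char *data;
    size_t len;
};

/* Count the NUL bytes in a span using memchr(), which takes a length. */
size_t bytes_count_nul(struct bytes b)
{
    size_t n = 0;
    const unsigned char *p = b.data;
    const unsigned char *end;

    assert(b.data != NULL);
    end = b.data + b.len;
    while (p < end && (p = memchr(p, 0, (size_t)(end - p))) != NULL) {
        ++n;
        ++p;    /* continue the search after the NUL just found */
    }
    return n;
}

/*
 * How many bytes a C-string API such as strlen() would see: everything
 * up to the first NUL, or the whole span if there is none.
 */
size_t bytes_cstring_prefix(struct bytes b)
{
    const unsigned char *nul;

    assert(b.data != NULL);
    nul = memchr(b.data, 0, b.len);
    return nul != NULL ? (size_t)(nul - b.data) : b.len;
}
```

For the six bytes `A l o NUL h a`, a C-string routine sees only the first three, while the span still knows about all six and reports one NUL.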
Re: (Not-so) hypothetical question: What to do about NULs?
P.S.: Congratulations on your new release btw.

I have written an OAuth helper in Python3 that supports OAuth for
GMail, Microsoft, Yandex:

  curl -u moon:mars --basic -O https://git.sdaoden.eu/browse/s-toolbox.git/plain/oauth-helper.py

It has a "manual" mode where it documents for GMail

  -- How to create a Google registration --

  Go to console.developers.google.com, and create a new project.
  The name doesn't matter and could be "mutt registration project".

   - Go to Library, choose Gmail API, and enable it
   - Hit left arrow icon to get back to console.developers.google.com
   - Choose OAuth Consent Screen
     - Choose Internal for an organizational G Suite
     - Choose External if that's your only choice
     - For Application Name, put for example "Mutt"
     - Under scopes, choose Add scope, scroll all the way down,
       enable the "https://mail.google.com/" scope
       [Note this only allows "internal" users; you get the same mail
       usage scope by selecting those gmail scopes without any lock
       symbol!  Like this application verification is not needed, and
       "External" can be chosen.]
     - Fill out additional fields (application logo, etc) if you feel
       like it (will make the consent screen look nicer)

Maybe this helps!

--steffen
Re: (Not-so) hypothetical question: What to do about NULs?
Ken Hornstein wrote in
 <20230219001921.597ad1e0...@pb-smtp20.pobox.com>:
 ...
 |- mutt
 ...
 |[.]Internally mutt does
 |have an idea if the content contains a NUL (the CONTENT structure contains
 |a 'nulbin' member which contains the number of NUL bytes), but it's not
 |clear to me what happens when a NUL is encountered.

Seems to me this is classification of attachment data, which will end up
as octet-stream in that case.

For S-nail we more or less do what Heirloom mailx has done.
For classification purposes we switch to octet-stream.
For display purposes we happily display it after passing it
through some kind of makeprint.

      isuni = ((n_psonce & n_PSO_UNICODE) != 0);
      ...
         if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
               wc != '\t'){
            if ((wc & ~S(wchar_t,037)) == 0)
               wc = isuni ? 0x2400 | wc : '?';
            else if(wc == 0177)
               wc = isuni ? 0x2421 : '?';
            else
               wc = isuni ? 0x2426 : '?';
         }else if(isuni){ /* TODO ctext */
            /* Need to filter out L-TO-R and R-TO-R marks TODO ctext */
            if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
               continue;
            /* And some zero-width messes */
            if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
               continue;
            /* Oh about the ISO C wide character interfaces, baby! */
            if(wc == 0xFEFF)
               continue;
         }

Or, without mb* and wc* sausage,

   {
      int c;
      while(inp < maxp){
         c = *inp++ & 0377;
         if(!su_cs_is_print(c) && c != '\n' && c != '\r' && c != '\b' &&
               c != '\t')
            c = '?';
         *outp++ = c;
      }
      out->l = in->l;
   }

This is even a regression against Heirloom mailx that Jörg Schilling
was very dissatisfied about, as the above only handles ASCII
printable regardless of the locale.  (My plan was to write a CText
library for Unicode handling, and it was quite progressed with only
about two months until decomposition and normalization were
implemented (Christmas 2014), when something very bad happened.
Maybe i will do it someday.  Or simply do what OpenBSD does and use
perl's fantastic Unicode support to generate some tables.)

The implementation is total crap.
(longjmp codebase, data leaks, blocking I/O, all that (it was).)
All of these (mailbox read, content-transfer decoding, character set
conversion, .. display preparation) should be "filters" with input
and output plugged together, with internal buffers as necessary.
That is the v15 MIME and I/O layer rewrite that is not happening for
nine years.

--steffen
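The "filters plugged together" idea can be sketched roughly as below. This only illustrates the shape and is not the planned v15 design: the `filter_fn` type, the two example stages and `run_filters` are all hypothetical, and the stages are stateless and never grow their input, which a real character set conversion stage could not promise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * A stage reads inlen bytes from in and writes at most inlen bytes to out,
 * returning the number of bytes written.  Lengths are explicit throughout,
 * so embedded NULs pass between stages like any other byte.
 */
typedef size_t (*filter_fn)(const unsigned char *in, size_t inlen,
                            unsigned char *out);

/* Example stage: turn CRLF line ends into LF. */
size_t filter_crlf_to_lf(const unsigned char *in, size_t inlen,
                         unsigned char *out)
{
    size_t i, o = 0;

    for (i = 0; i < inlen; ++i) {
        if (in[i] == '\r' && i + 1 < inlen && in[i + 1] == '\n')
            continue;       /* drop the CR, the LF is copied next round */
        out[o++] = in[i];
    }
    return o;
}

/* Example stage: replace all but printable ASCII, \n and \t with '?'. */
size_t filter_makeprint(const unsigned char *in, size_t inlen,
                        unsigned char *out)
{
    size_t i;

    for (i = 0; i < inlen; ++i) {
        unsigned char c = in[i];

        out[i] = ((c >= 0x20 && c < 0x7F) || c == '\n' || c == '\t')
            ? c : '?';
    }
    return inlen;
}

/*
 * Run buf[0..len) through the stages in order.  scratch must be at least
 * len bytes.  The result ends up in buf; its length is returned.
 */
size_t run_filters(const filter_fn *stages, size_t nstages,
                   unsigned char *buf, size_t len, unsigned char *scratch)
{
    size_t s;

    for (s = 0; s < nstages; ++s) {
        size_t n = stages[s](buf, len, scratch);

        assert(n <= len);   /* stages in this sketch never grow the data */
        memcpy(buf, scratch, n);
        len = n;
    }
    return len;
}
```

With the stage list `{ filter_crlf_to_lf, filter_makeprint }`, the bytes `A l o NUL h a CR LF` come out as `Alo?ha` followed by a newline.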
(Not-so) hypothetical question: What to do about NULs?
I've been idly thinking about this for a while, and while the question
might be simple I think it gets at some larger meta-issues that we have
never really agreed on how to resolve properly.

My question is, simply: What should happen when nmh encounters a NUL
character (U+) in email?

The rules
---------

In theory, a NUL is never permitted in an email message.  RFC 5322 (the
latest incarnation of RFC 822) says in §4:

    Finally, certain characters that were formerly allowed in messages
    appear in this section.  The NUL character (ASCII value 0) was once
    allowed, but is no longer for compatibility reasons.

However, in §4.1 a NUL character is added to the BNF for obs-utext and
obs-body, so in THEORY you are supposed to handle that if you handle
obsolete messages.  §4 also says:

    Note: This section identifies syntactic forms that any
    implementation MUST reasonably interpret.  However, there are
    certainly Internet messages that do not conform to even the
    additional syntax given in this section.  The fact that a
    particular form does not appear in any section of this document is
    not justification for computer programs to crash or for malformed
    data to be irretrievably lost by any implementation.  It is up to
    the implementation to deal with messages robustly.

RFC 5322 punts some of the message syntax back to the MIME RFCs.  The
"binary" content transfer encoding does allow any octet including NUL
characters.  But RFC 2045 says in §6.2:

    Mail transport for unencoded 8bit data is defined in RFC 1652.  As
    of the initial publication of this document, there are no
    standardized Internet mail transports for which it is legitimate to
    include unencoded binary data in mail bodies.  Thus there are no
    circumstances in which the "binary" Content-Transfer-Encoding is
    actually valid in Internet mail.
    However, in the event that binary mail transport becomes a reality
    in Internet mail, or when MIME is used in conjunction with any
    other binary-capable mail transport mechanism, binary bodies must
    be labelled as such using this mechanism.

RFC 9051 (IMAP4rev2) says in §4.3.1:

    IMAP4rev2 is compatible with [I18N-HDRS].  As a result, the
    identified charset for header-field values with 8-bit content is
    UTF-8 [UTF-8].  IMAP4rev2 implementations MUST accept and MAY
    transmit [UTF-8] text in quoted-strings as long as the string does
    not contain NUL, CR, or LF.  This differs from IMAP4rev1
    implementations.

    Although a BINARY content transfer encoding is defined, unencoded
    binary strings are not permitted, unless returned in a <literal8>
    in response to a BINARY.PEEK[<section-binary>]<<partial>> or
    BINARY[<section-binary>]<<partial>> FETCH data item.  A "binary
    string" is any string with NUL characters.  A string with an
    excessive amount of CTL characters MAY also be considered to be
    binary.  Unless returned in response to
    BINARY.PEEK[...]/BINARY[...] FETCH, client and server
    implementations MUST encode binary data into a textual form, such
    as base64, before transmitting the data.

So it's ... a bit wishy-washy, but I think the case for NUL not being
valid is mostly okay.  IMAP, at least, says you can't send a NUL unless
you are getting a BINARY response with the special literal8 response
format (and BINARY is not defined in RFC 3501).

Messages in the real world
--------------------------

While other rules seem to be violated with impunity (see: 16MB single
lines) I am not aware of bare NULs commonly being sent in email messages
today.  Also, I am not aware of "binary" being used as a C-T-E at all.
Now, I could be COMPLETELY wrong about this!  It would be interesting to
hear about use of the binary CTE or other occurrences of NUL characters
in the wild.  My impression is that if you are getting binary data, it
is universally encoded with base64; that is something everyone seems to
be doing.
And a NUL character doesn't seem to be valid in non-ASCII character
sets as anything other than a NUL.

How other mail programs deal with NULs
--------------------------------------

I was curious, so I took a look.  I tried to look at "modern" mail
programs, and by that I mean, "Seems to be kept up to date".  Which
sadly excludes Heirloom mailx as it seems to have had its last release
in 2005.  I am open to hearing about what other mail programs do.

- fetchmail

  Fetchmail unceremoniously just smashes any NUL characters it sees, so
  if you are retrieving messages using fetchmail you never see any NUL
  characters.  From transact.c:

      /*
       * Smash out any NULs, they could wreak havoc later on.
       * Some network stacks seem to generate these at random,
       * especially (according to reports) at the beginning of the
       * first read.  NULs are illegal in RFC822 format.
       */

  You might get a special header warning you that a message had an
  embedded NUL, though.

- alpine

  Internally alpine (which uses a lot of
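The fetchmail approach quoted above amounts to an in-place compaction that also remembers how many NULs were dropped, so that a warning header could be added afterwards. A rough sketch follows; `smash_nuls` is a hypothetical name and this is not fetchmail's actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remove every NUL byte from buf[0..len) in place and return the new
 * length.  If removed is non-NULL, the number of NULs dropped is stored
 * there so the caller can decide whether to flag the message.
 */
size_t smash_nuls(unsigned char *buf, size_t len, size_t *removed)
{
    size_t r, w = 0;

    assert(buf != NULL);
    for (r = 0; r < len; ++r)
        if (buf[r] != 0)
            buf[w++] = buf[r];  /* keep everything that is not a NUL */
    if (removed != NULL)
        *removed = len - w;
    return w;
}
```

The cost of this design is that the original bytes are irretrievably changed, which is exactly the trade-off the RFC 5322 note about malformed data warns about.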
nmh 1.8 is now available!
Greetings all,

I am pleased to announce that after nearly five years we are finally
releasing nmh 1.8.  The source code release is now available and can
be downloaded from:

    http://download-mirror.savannah.gnu.org/releases/nmh/nmh-1.8.tar.gz

There is an accompanying .sig file for GPG verification.  MIME
external-body pointers to the above files are included in this message.

This release includes a large number of enhancements and bug fixes.
The NEWS file included in the distribution contains greater detail,
but the highlights are:

- Support for Content-MD5 header fields, MIME content cache
  functionality, and the message/partial MIME type have been removed.
- Gmail OAuth2/XOAUTH support for desktop applications has been
  effectively dropped, so nmh no longer supports it.  nmh support for
  Gmail API access is experimental, please post to
  nmh-workers@nongnu.org if you'd like to help with test and
  development.
- repl(1) -convertargs now allows editing of the composition draft
  between translation and any encoding of text content.  Because
  encoding can wrap long lines, the use of a paragraph formatter has
  been removed from mhn.defaults.

This release is dedicated to Norman Z. Shapiro, co-designer of the MH
Message Handling System.  MH is the predecessor of nmh.  Norm was an
active supporter of nmh development until he passed away in October of
2021.  We are most grateful to Norm for his stewardship of MH and nmh.

    https://en.wikipedia.org/wiki/Norman_Shapiro

Thanks to all of the contributors for their hard work and to everyone
who tried out the release candidates and gave feedback; it is very much
appreciated.

As always, please report feedback to nmh-workers@nongnu.org

David Levine
on behalf of the nmh development team

Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers