Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
Ken Hornstein wrote in
 <20230219012125.2e48b1d7...@pb-smtp21.pobox.com>:
 |>Seems to me this is classification of attachment data, which will end up
 |>as octet-stream in that case.
 |
 |It's ... a little confusing!
 |
 |>For S-nail we more or less do what Heirloom mailx has done.
 |
 |Well, it seems that in the message lexer if you encounter a NUL you
 |just stop, from a_msg_scan():
 |
 |  cp = mslp->msl_cap->ca_arg.ca_str.s;
 |  if((c = *cp++) != '\0')
 |     break;

That seems to come from a command argument parser, not mail
content.  Ah no, no no, wrong code :)
I can assure you that the email

  From reproducible_build Wed Oct  2 01:50:07 1996
  Date: Wed, 02 Oct 1996 01:50:07 +
  From: e...@am.ple
  Subject: s3
  MIME-Version: 1.0
  Content-Type: text/plain; charset=utf-8
  Content-Transfer-Encoding: quoted-printable
  Status: O

  Alo=00ha
  Boom.

is decoded (of course) and displayed with the NUL converted to the
Unicode graphical symbol for NUL (U+2400).
The same if i make it "binary" and put a real NUL in place of the
=00.
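
To illustrate the decoding step: a minimal sketch of a quoted-printable
decoder that tracks an explicit length, so that a decoded =00 survives
as a real NUL byte.  This is not S-nail's decoder; the names hex_val
and qp_decode are made up for this example, and malformed escapes are
simply copied through, which is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_val(int c){
   if(c >= '0' && c <= '9') return c - '0';
   if(c >= 'A' && c <= 'F') return c - 'A' + 10;
   if(c >= 'a' && c <= 'f') return c - 'a' + 10;
   return -1;
}

/* Decode quoted-printable from in[0..inlen) into out, which must hold at
 * least inlen bytes.  Returns the number of bytes written.  The result is
 * NOT NUL-terminated and may itself contain NUL bytes. */
size_t qp_decode(const char *in, size_t inlen, unsigned char *out){
   size_t i = 0, o = 0;

   while(i < inlen){
      int hi, lo;

      if(in[i] != '='){
         out[o++] = (unsigned char)in[i++];
      }else if(i + 2 < inlen && (hi = hex_val(in[i + 1])) >= 0 &&
            (lo = hex_val(in[i + 2])) >= 0){
         out[o++] = (unsigned char)((hi << 4) | lo); /* =XX escape */
         i += 3;
      }else if(i + 1 < inlen && in[i + 1] == '\n'){
         i += 2; /* soft line break, LF */
      }else if(i + 2 < inlen && in[i + 1] == '\r' && in[i + 2] == '\n'){
         i += 3; /* soft line break, CRLF */
      }else
         out[o++] = (unsigned char)in[i++]; /* malformed: keep literally */
   }
   return o;
}
```

Decoding the eight bytes of "Alo=00ha" this way gives six bytes with
a NUL at index 3; anything that later treats the buffer as a C string
only sees "Alo".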

 |It does look like to me that for IMAP and POP a NUL character is handled
 |properly.  But that doesn't answer the question, what do you THINK should

Uh, i would really have to look and try out whether binary data on
the input side of IMAP or POP3 properly handles embedded NULs.
I would assume yes.  (More or less.)

 |happen?  Should NULs be passed through?  You basically can't use C strings
 |anywhere if you want to handle embedded NULs.

That is true.
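
A small sketch of why: the str* functions stop at the first NUL, so
two buffers that differ only after an embedded NUL compare equal.
Carrying the length alongside the pointer avoids that.  The struct
cstr and cstr_eq names are hypothetical and only serve this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: pointer plus explicit length. */
struct cstr {
   const char *s;
   size_t l;
};

/* Compare two counted strings byte for byte, embedded NULs included. */
int cstr_eq(struct cstr a, struct cstr b){
   return a.l == b.l && memcmp(a.s, b.s, a.l) == 0;
}
```

With a = {"Alo\0ha", 6} and b = {"Alo\0xx", 6}, strcmp(a.s, b.s)
reports equality, whereas cstr_eq(a, b) does not.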

 |>The implementation is total crap. (longjmp codebase, data leaks,
 |>blocking I/O, all that (it was).)  All of these (mailbox read,
 |>content-transfer decoding, character set conversion, .. display
 |>preparation) should be "filters" with input and output plugged together,
 |>with internal buffers as necessary.  That is the v15 MIME and I/O layer
 |>rewrite that has not happened for nine years.
 |
 |Sigh, I know the feeling :-/

A nice Sunday is also not a bad thing.
Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Ken Hornstein
>Seems to me this is classification of attachment data, which will end up
>as octet-stream in that case.

It's ... a little confusing!

>For S-nail we more or less do what Heirloom mailx has done.

Well, it seems that in the message lexer if you encounter a NUL you
just stop, from a_msg_scan():

  cp = mslp->msl_cap->ca_arg.ca_str.s;
  if((c = *cp++) != '\0')
     break;

It does look like to me that for IMAP and POP a NUL character is handled
properly.  But that doesn't answer the question, what do you THINK should
happen?  Should NULs be passed through?  You basically can't use C strings
anywhere if you want to handle embedded NULs.

>The implementation is total crap. (longjmp codebase, data leaks,
>blocking I/O, all that (it was).)  All of these (mailbox read,
>content-transfer decoding, character set conversion, .. display
>preparation) should be "filters" with input and output plugged together,
>with internal buffers as necessary.  That is the v15 MIME and I/O layer
>rewrite that has not happened for nine years.

Sigh, I know the feeling :-/

--Ken



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
P.S.:

Congratulations on your new release, btw.

I have written an OAuth helper in Python3 that supports OAuth for
GMail, Microsoft, Yandex:

  curl -u moon:mars --basic -O https://git.sdaoden.eu/browse/s-toolbox.git/plain/oauth-helper.py

It has a "manual" mode where it documents for GMail

  -- How to create a Google registration --

  Go to console.developers.google.com, and create a new project. The name
  doesn't matter and could be "mutt registration project".

   - Go to Library, choose Gmail API, and enable it
   - Hit left arrow icon to get back to console.developers.google.com
   - Choose OAuth Consent Screen
     - Choose Internal for an organizational G Suite
     - Choose External if that's your only choice
     - For Application Name, put for example "Mutt"
     - Under scopes, choose Add scope, scroll all the way down, enable the
       "https://mail.google.com/" scope
       [Note this only allows "internal" users; you get the same mail usage
       scope by selecting those gmail scopes without any lock symbol!
       Like this application verification is not needed, and "External" can
       be chosen.]
     - Fill out additional fields (application logo, etc) if you feel like it
       (will make the consent screen look nicer)

Maybe this helps!

--steffen



Re: (Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Steffen Nurpmeso
Ken Hornstein wrote in
 <20230219001921.597ad1e0...@pb-smtp20.pobox.com>:
 ...
 |- mutt
 ...
 |[.]Internally mutt does
 |have an idea if the content contains a NUL (the CONTENT structure contains
 |a 'nulbin' member which contains the number of NUL bytes), but it's not
 |clear to me what happens when a NUL is encountered.

Seems to me this is classification of attachment data, which will
end up as octet-stream in that case.

For S-nail we more or less do what Heirloom mailx has done.
For classification purposes we switch to octet-stream.
For display purposes we happily display it after passing it
through some kind of makeprint.

  isuni = ((n_psonce & n_PSO_UNICODE) != 0);
  ...
     if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
           wc != '\t'){
        if ((wc & ~S(wchar_t,037)) == 0)
           wc = isuni ? 0x2400 | wc : '?';
        else if(wc == 0177)
           wc = isuni ? 0x2421 : '?';
        else
           wc = isuni ? 0x2426 : '?';
     }else if(isuni){ /* TODO ctext */
        /* Need to filter out L-TO-R and R-TO-R marks TODO ctext */
        if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
           continue;
        /* And some zero-width messes */
        if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
           continue;
        /* Oh about the ISO C wide character interfaces, baby! */
        if(wc == 0xFEFF)
           continue;
     }

Or, without mb* and wc* sausage,

   {
      int c;
      while(inp < maxp){
         c = *inp++ & 0377;
         if(!su_cs_is_print(c) &&
               c != '\n' && c != '\r' && c != '\b' && c != '\t')
            c = '?';
         *outp++ = c;
      }
      out->l = in->l;
   }

This is even a regression against Heirloom mailx that Jörg
Schilling was very dissatisfied about, as the above only handles
ASCII printable regardless of the locale.  (My plan was to write
a CText library for Unicode handling, and it was quite progressed
with only about two months until decomposition and normalization
were implemented (Christmas 2014), when something very bad
happened.  Maybe i will do it someday.  Or simply do what OpenBSD
does and use perl's fantastic Unicode support to generate some
tables.)
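
For reference, the control-character part of the first excerpt can be
exercised in isolation.  The following is only a sketch: the function
name ctl_to_picture is made up, it covers the C0 controls and DEL
only, and it passes everything else through unchanged instead of
consulting iswprint(), so it is not a replacement for the real code.

```c
/* Map one code point to something displayable, following the branch logic
 * of the excerpt for control characters only.  Everything else is passed
 * through unchanged, which is a simplification of this sketch. */
unsigned long ctl_to_picture(unsigned long cp, int isuni){
   if(cp == '\n' || cp == '\t')
      return cp;
   if(cp < 040) /* C0 control: the U+2400 block mirrors ASCII 0..31 */
      return isuni ? (0x2400UL | cp) : '?';
   if(cp == 0177) /* DEL has its own picture at U+2421 */
      return isuni ? 0x2421UL : '?';
   return cp;
}
```

So a NUL becomes U+2400 in a Unicode locale and '?' otherwise.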

The implementation is total crap.  (longjmp codebase, data leaks,
blocking I/O, all that (it was).)  All of these (mailbox read,
content-transfer decoding, character set conversion, .. display
preparation) should be "filters" with input and output plugged
together, with internal buffers as necessary.  That is the v15
MIME and I/O layer rewrite that has not happened for nine years.
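
A rough sketch of what such plugged-together filters could look like,
purely as an illustration and not as the planned v15 design.  All
names (filter_fn, filt_strip_cr, filt_makeprint, filt_run, FILT_BUF)
are invented here, the stages work on whole buffers instead of
streaming, and each stage is assumed to never grow its input.

```c
#include <stddef.h>
#include <string.h>

#define FILT_BUF 256 /* arbitrary buffer size for this sketch */

/* A filter consumes inlen bytes and writes at most inlen bytes to out,
 * returning the number written.  Lengths are explicit throughout, so
 * embedded NULs pass from stage to stage. */
typedef size_t (*filter_fn)(const unsigned char *in, size_t inlen,
      unsigned char *out);

/* Example stage: drop CR so that CRLF becomes LF. */
size_t filt_strip_cr(const unsigned char *in, size_t inlen,
      unsigned char *out){
   size_t i, o = 0;

   for(i = 0; i < inlen; ++i)
      if(in[i] != '\r')
         out[o++] = in[i];
   return o;
}

/* Example stage: replace NUL and other non-printable ASCII with '?'. */
size_t filt_makeprint(const unsigned char *in, size_t inlen,
      unsigned char *out){
   size_t i;

   for(i = 0; i < inlen; ++i){
      unsigned char c = in[i];

      out[i] = (c == '\n' || c == '\t' || (c >= 040 && c < 0177)) ? c : '?';
   }
   return inlen;
}

/* Run data through the stages in order, alternating between two internal
 * buffers.  Returns the final length, or (size_t)-1 if the input does not
 * fit into the internal buffers. */
size_t filt_run(const filter_fn *chain, size_t nfilt,
      const unsigned char *in, size_t inlen, unsigned char *out){
   unsigned char a[FILT_BUF], b[FILT_BUF], *src = a, *dst = b, *tmp;
   size_t i, len = inlen;

   if(inlen > FILT_BUF)
      return (size_t)-1;
   memcpy(src, in, inlen);
   for(i = 0; i < nfilt; ++i){
      len = chain[i](src, len, dst);
      tmp = src; src = dst; dst = tmp; /* output becomes next input */
   }
   memcpy(out, src, len);
   return len;
}
```

Chaining filt_strip_cr and filt_makeprint over the five bytes
"A\0b\r\n" yields the four bytes "A?b\n".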

--steffen



(Not-so) hypothetical question: What to do about NULs?

2023-02-18 Thread Ken Hornstein
I've been idly thinking about this for a while, and while the question
might be simple I think it gets at some larger meta-issues that we have
never really agreed on how to resolve properly.

My question is, simply: What should happen when nmh encounters a NUL
character (U+0000) in email?

The rules
---------

In theory, a NUL is never permitted in an email message.  RFC 5322 (the
latest incarnation of RFC 822) says in §4:

   Finally, certain characters that were formerly allowed in messages
   appear in this section.  The NUL character (ASCII value 0) was once
   allowed, but is no longer for compatibility reasons.

However, in §4.1 a NUL character is added to the BNF for obs-utext and
obs-body, so in THEORY you are supposed to handle that if you handle
obsolete messages.  §4 also says:

  Note: This section identifies syntactic forms that any
  implementation MUST reasonably interpret.  However, there are
  certainly Internet messages that do not conform to even the
  additional syntax given in this section.  The fact that a
  particular form does not appear in any section of this document is
  not justification for computer programs to crash or for malformed
  data to be irretrievably lost by any implementation.  It is up to
  the implementation to deal with messages robustly.

RFC 5322 punts some of the message syntax back to the MIME RFCs.
The "binary" content transfer encoding does allow any octet including
NUL characters.  But RFC 2045 says in §6.2:

   Mail transport for unencoded 8bit data is defined in RFC 1652.  As of
   the initial publication of this document, there are no standardized
   Internet mail transports for which it is legitimate to include
   unencoded binary data in mail bodies.  Thus there are no
   circumstances in which the "binary" Content-Transfer-Encoding is
   actually valid in Internet mail.  However, in the event that binary
   mail transport becomes a reality in Internet mail, or when MIME is
   used in conjunction with any other binary-capable mail transport
   mechanism, binary bodies must be labelled as such using this
   mechanism.

RFC 9051 (IMAP4rev2) says in §4.3.1:

   IMAP4rev2 is compatible with [I18N-HDRS]. As a result, the identified
   charset for header-field values with 8-bit content is UTF-8
   [UTF-8]. IMAP4rev2 implementations MUST accept and MAY transmit
   [UTF-8] text in quoted-strings as long as the string does not contain
   NUL, CR, or LF. This differs from IMAP4rev1 implementations.

   Although a BINARY content transfer encoding is defined, unencoded
   binary strings are not permitted, unless returned in a <literal8>
   in response to a BINARY.PEEK[<section-binary>]<<partial>> or
   BINARY[<section-binary>]<<partial>> FETCH data item. A "binary string"
   is any string with NUL characters. A string with an excessive amount
   of CTL characters MAY also be considered to be binary. Unless returned
   in response to BINARY.PEEK[...]/BINARY[...] FETCH, client and server
   implementations MUST encode binary data into a textual form, such as
   base64, before transmitting the data.

So it's ... a bit wishy-washy, but I think the case for NUL not being
valid is mostly okay.  IMAP, at least, says you can't send a NUL unless
you are getting a BINARY response with the special literal8 response
format (and BINARY is not defined in RFC 3501).
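
The IMAP definition of a "binary string" quoted above is at least
easy to test for.  A minimal sketch, with is_binary_string being a
made-up name and the optional "excessive amount of CTL characters"
heuristic deliberately left out:

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if buf[0..len) contains a NUL byte, i.e. if it is a
 * "binary string" in the sense of the RFC text quoted above. */
int is_binary_string(const void *buf, size_t len){
   return len > 0 && memchr(buf, '\0', len) != NULL;
}
```

Note that the length has to come from somewhere other than strlen()
for this to make any sense.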

Messages in the real world
--------------------------

While other rules seem to be violated with impunity (see: 16MB single
lines) I am not aware of bare NULs commonly being sent in email messages
today.  Also, I am not aware of "binary" being used as a C-T-E at all.
Now, I could be COMPLETELY wrong about this!  It would be interesting to
hear about use of the binary CTE or other occurrences of NUL characters
in the wild.

My impression is that if you are getting binary data, it is universally
encoded with base64; that is something everyone seems to be doing.  And
a NUL character doesn't seem to be valid in non-ASCII character sets
as anything other than a NUL.

How other mail programs deal with NULs
--------------------------------------

I was curious, so I took a look.  I tried to look at "modern" mail programs,
and by that I mean, "Seems to be kept up to date".  Which sadly excludes
Heirloom mailx as it seems to have had its last release in 2005.  I am open
to hearing about what other mail programs do.

- fetchmail

Fetchmail unceremoniously just smashes any NUL characters it sees, so if
you are retrieving messages using fetchmail you never see any NUL
characters.  From transact.c:

/*
 * Smash out any NULs, they could wreak havoc later on.
 * Some network stacks seem to generate these at random,
 * especially (according to reports) at the beginning of the
 * first read.  NULs are illegal in RFC822 format.
 */

You might get a special header warning you that a message had an
embedded NUL, though.
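
The general idea of "smashing" NULs can be sketched as follows.  This
is not fetchmail's code; smash_nuls is a hypothetical function that
removes NUL bytes in place and reports how many it dropped, which is
the kind of count a warning header could be based on.

```c
#include <stddef.h>

/* Remove all NUL bytes from buf[0..len) in place.  Returns the new length
 * and stores the number of removed bytes in *removed if it is non-NULL. */
size_t smash_nuls(char *buf, size_t len, size_t *removed){
   size_t i, o = 0;

   for(i = 0; i < len; ++i)
      if(buf[i] != '\0')
         buf[o++] = buf[i];
   if(removed != NULL)
      *removed = len - o;
   return o;
}
```

The obvious downside is that the original data is irretrievably
changed, which is exactly the policy question at hand.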

- alpine

Internally alpine (which uses a lot of 

nmh 1.8 is now available!

2023-02-18 Thread David Levine
Greetings all,

I am pleased to announce that after nearly five years we are
finally releasing nmh 1.8.  The source code release is now
available and can be downloaded from:

http://download-mirror.savannah.gnu.org/releases/nmh/nmh-1.8.tar.gz

There is an accompanying .sig file for GPG verification.  MIME
external-body pointers to the above files are included in this message.

This release includes a large number of enhancements and bug fixes.
The NEWS file included in the distribution contains greater details,
but the highlights are:

- Support for Content-MD5 header fields, MIME content cache functionality,
  and the message/partial MIME type have been removed.
- Gmail OAuth2/XOAUTH support for desktop applications has been effectively
  dropped, so nmh no longer supports it.  nmh support for Gmail API access
  is experimental; please post to nmh-workers@nongnu.org if you'd like to
  help with test and development.
- repl(1) -convertargs now allows editing of the composition draft between
  translation and any encoding of text content.  Because encoding can wrap
  long lines, the use of a paragraph formatter has been removed from
  mhn.defaults.

This release is dedicated to Norman Z. Shapiro, co-designer of the MH
Message Handling System.  MH is the predecessor of nmh.  Norm was an
active supporter of nmh development until he passed away in October of
2021.  We are most grateful to Norm for his stewardship of MH and nmh.
https://en.wikipedia.org/wiki/Norman_Shapiro

Thanks to all of the contributors for their hard work and to everyone
who tried out the release candidates and gave feedback; it is very much
appreciated.

As always, please report feedback to nmh-workers@nongnu.org

David Levine
on behalf of the nmh development team

Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers