Re: Odd character, was Re: buster, ekiga.

2019-07-25 Thread David Wright
On Tue 23 Jul 2019 at 15:31:12 (-0400), Michael Stone wrote:
> On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> > I don't see any NUL characters, but x80 as shown below. I'm reading
> > the cached message that mutt downloaded from an IMAP server. Is that
> > different from you?
> 
> I see it as x80 in mutt and x00 in the raw file on the imap server. I
> assume mutt is trying to defang the nul, similar to java's conversion
> to 0xc0 0x80, but I haven't actually looked through the code to
> confirm.

I don't think mutt is doing that. I downloaded a message directly from
my hosting service's IMAP server¹ and that shows <80>, not <00>,
just as mutt does. My experience with mutt is that if a NUL is sent in
a "legitimate"² manner within an email, it causes truncation. I don't
know whether mutt does it or the pager, but as I said elsewhere
it doesn't make me happy.

I'm not sure whether I can get any "closer" to my IMAP server than
that, in order to find whether there's a NUL there; perhaps by
logging in using my credentials? That would require some research
as I don't normally access the service in that way.

One thing we don't know is whether the routes being used by the MTAs
to communicate with each other are 8-bit transparent or not. As
pointed out by tomás, <80> and <00> only differ in the top bit.

> > So it would appear the OP has pasted the Unicode "RIGHT-POINTING
> > MAGNIFYING GLASS" character into their postings, which seems somewhat
> > reasonable as it's used on the Debian web pages to mark all the
> > Message-IDs and references thereto.
> > 
> > Where that gets mangled along the way, I can't guess, but it would seem
> > that 0x80 is a reasonable choice as that's a Latin-1 Control Character
> > with the meaning PAD.
> > https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
> 
> I'm not entirely surprised that an MUA that is unaware of the changes
> to internet mail that have happened since the early 80s (codified back
> in 2001) is also unaware of unicode.

My last paragraph wasn't necessarily limited to the behaviour of the
OP's MUA. It's likely the MTAs are more up-to-date than what is
alleged to be a very old MUA.

¹ $ curl --url 'imaps://my-hosting-service:993/INBOX;UID=1234' --user 'my-username:my-password' -o Documents/raw-message

²eg as =00 in a quoted-printable encoded message.

Cheers,
David.



Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread Michael Stone

On Tue, Jul 23, 2019 at 09:35:59PM +0200, to...@tuxteam.de wrote:
> On Tue, Jul 23, 2019 at 03:31:12PM -0400, Michael Stone wrote:
> > On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> > >I don't see any NUL characters, but x80 as shown below. I'm reading
> > >the cached message that mutt downloaded from an IMAP server. Is that
> > >different from you?
> >
> > I see it as x80 in mutt and x00 in the raw file on the imap server.
> > I assume mutt is trying to defang the nul, similar to java's
> > conversion to 0xc0 0x80, but I haven't actually looked through the
> > code to confirm.
>
> Heh. that is strange: with mutt ("edit raw") I do see an x00 (shown
> by vim as ^@).
>
> My message doesn't go through an IMAP server, fwiw. Dunno what exim
> does to it, though :-)

I guess it's the imapd defanging then, before it gets to mutt.



Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread tomas
On Tue, Jul 23, 2019 at 03:38:33PM -0400, Greg Wooledge wrote:
> On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> > On Tue 23 Jul 2019 at 11:07:37 (-0400), Greg Wooledge wrote:
> > > Yup.  Two NUL bytes in the body of the message.  How completely bizarre.
> > > 
> > > Apparently what mutt does is truncate that *line* at the first NUL
> > > byte, but then show all the other lines after that just fine.
> 
> > I don't see any NUL characters, but x80 as shown below. I'm reading
> > the cached message that mutt downloaded from an IMAP server. Is that
> > different from you?
> 
> In my case, the email is sent first to a Debian 9 system running qmail +
> magic-smtpd [...]


> I'll try to remember to keep a copy of the next one for hex-dumping.

Looking forward :-)

> Meanwhile, as a test, I ran the following from my home system outside
> the workplace firewall:

[...]

Interesting. Note that both your tests have a Content-Transfer-Encoding
(first: 8bit, second: quoted-printable; the first one gets a slight
indigestion, the second not).

The original messages have no Content-Type, much less Content-Transfer-Encoding
(so x00 as well as x80 should be both a no-no anyway).
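
For reference, a minimal Python sketch of that point, assuming the
standard library's email package and a made-up sample message (the
headers and body bytes below are illustrative, not the OP's actual
mail). With no Content-Type header the parser falls back to text/plain,
and the "7bit" rule in RFC 2045 excludes both NUL and octets above 127:

```python
import email

def is_7bit_clean(body: bytes) -> bool:
    """True if body has no NUL and no octet above 127 (RFC 2045 '7bit').
    Line-length limits are deliberately not checked in this sketch."""
    return all(0 < b < 0x80 for b in body)

# Hypothetical message with neither Content-Type nor Content-Transfer-Encoding.
raw = b"From: someone@example.invalid\nSubject: test\n\nA Name \x80user\x00\n"
msg = email.message_from_bytes(raw)

has_content_type = msg["Content-Type"] is not None
default_type = msg.get_content_type()   # parser default when the header is absent
body = raw.split(b"\n\n", 1)[1]         # everything after the header block

print(has_content_type, default_type, is_7bit_clean(body))
```

The helper name is_7bit_clean is an invention for this sketch; it says
nothing about what any particular MTA or MUA actually enforces.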

Cheers
-- t 




Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread Greg Wooledge
On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> On Tue 23 Jul 2019 at 11:07:37 (-0400), Greg Wooledge wrote:
> > Yup.  Two NUL bytes in the body of the message.  How completely bizarre.
> > 
> > Apparently what mutt does is truncate that *line* at the first NUL
> > byte, but then show all the other lines after that just fine.

> I don't see any NUL characters, but x80 as shown below. I'm reading
> the cached message that mutt downloaded from an IMAP server. Is that
> different from you?

In my case, the email is sent first to a Debian 9 system running qmail +
magic-smtpd (with possible interference from corporate firewall products
over which I have no control), and from there to my Debian 10 desktop
system, also running qmail, with qmail-smtpd as the receiver.  Mail is
delivered locally on the Debian 10 system to a Maildir in my home
directory, and mutt reads it directly from there.  No IMAP or POP3 for me.

I'll try to remember to keep a copy of the next one for hex-dumping.
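
As an alternative to eyeballing a hex dump, a small Python sketch along
these lines could report where the suspect octets sit in a saved copy.
The function name and the sample bytes are assumptions made up for
illustration; a real message would be read from its file in binary mode.

```python
def find_suspect_bytes(data: bytes, suspects=(0x00, 0x80)):
    """Return (offset, byte value) pairs for every suspect octet in data."""
    return [(i, b) for i, b in enumerate(data) if b in suspects]

# Illustrative sample only. For a saved message, something like
#   data = open(path_to_raw_message, "rb").read()
# would be used instead (path_to_raw_message is hypothetical).
sample = b"*\tFrom: A Name \x80user@example.invalid\x80\n"

for offset, value in find_suspect_bytes(sample):
    print(f"offset {offset:#06x}: byte {value:#04x}")
```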

Meanwhile, as a test, I ran the following from my home system outside
the workplace firewall:

printf 'Testing \0nul\0\nDid it work?\n' | mailx -s test wool...@eeg.ccf.org

Here's what hd shows (last few lines only):

0370  38 22 0a 43 6f 6e 74 65  6e 74 2d 54 72 61 6e 73  |8".Content-Trans|
0380  66 65 72 2d 45 6e 63 6f  64 69 6e 67 3a 20 38 62  |fer-Encoding: 8b|
0390  69 74 0a 0a 54 65 73 74  69 6e 67 20 0a 44 69 64  |it..Testing .Did|
03a0  20 69 74 20 77 6f 72 6b  3f 0a                    | it work?.|
03aa

Which probably means mailx on my sender truncates the line with the
raw NUL bytes, and the test is inconclusive.

So, next test:

printf 'Test two \0nul\0\nDid it work?\n' | mutt -s test wool...@eeg.ccf.org

Here's what I got:

03b0  73 66 65 72 2d 45 6e 63  6f 64 69 6e 67 3a 20 71  |sfer-Encoding: q|
03c0  75 6f 74 65 64 2d 70 72  69 6e 74 61 62 6c 65 0a  |uoted-printable.|
03d0  58 2d 4f 70 65 72 61 74  69 6e 67 2d 53 79 73 74  |X-Operating-Syst|
03e0  65 6d 3a 20 4c 69 6e 75  78 20 34 2e 31 39 2e 30  |em: Linux 4.19.0|
03f0  2d 35 2d 61 6d 64 36 34  0a 55 73 65 72 2d 41 67  |-5-amd64.User-Ag|
0400  65 6e 74 3a 20 4d 75 74  74 2f 31 2e 31 30 2e 31  |ent: Mutt/1.10.1|
0410  20 28 32 30 31 38 2d 30  37 2d 31 33 29 0a 0a 54  | (2018-07-13)..T|
0420  65 73 74 20 74 77 6f 20  3d 30 30 6e 75 6c 3d 30  |est two =00nul=0|
0430  30 0a 44 69 64 20 69 74  20 77 6f 72 6b 3f 0a     |0.Did it work?.|
043f

... well, that's self-explanatory, isn't it.  I don't feel like writing
a script to send raw NUL bytes through /usr/sbin/sendmail or through
netcat mxhost 25 at this time, so I'll just leave it at that.
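
The =00 seen in the second dump is ordinary quoted-printable encoding.
A minimal sketch with Python's standard quopri module shows the same
transformation on the test string; it only illustrates the encoding
itself and does not claim to reproduce what mutt does internally.

```python
import quopri

raw = b"Test two \x00nul\x00\nDid it work?\n"

encoded = quopri.encodestring(raw)       # NUL octets become "=00"
decoded = quopri.decodestring(encoded)   # and round-trip back to raw NULs

print(encoded)
print(decoded == raw)
```

So a NUL sent this way crosses a 7-bit path intact, and only turns back
into a raw x00 when the receiving side decodes the body.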



Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread tomas
On Tue, Jul 23, 2019 at 03:31:12PM -0400, Michael Stone wrote:
> On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> >I don't see any NUL characters, but x80 as shown below. I'm reading
> >the cached message that mutt downloaded from an IMAP server. Is that
> >different from you?
> 
> I see it as x80 in mutt and x00 in the raw file on the imap server.
> I assume mutt is trying to defang the nul, similar to java's
> conversion to 0xc0 0x80, but I haven't actually looked through the
> code to confirm.

Heh. that is strange: with mutt ("edit raw") I do see an x00 (shown
by vim as ^@).

My message doesn't go through an IMAP server, fwiw. Dunno what exim
does to it, though :-)

Cheers
-- t




Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread Michael Stone

On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:
> I don't see any NUL characters, but x80 as shown below. I'm reading
> the cached message that mutt downloaded from an IMAP server. Is that
> different from you?

I see it as x80 in mutt and x00 in the raw file on the imap server. I
assume mutt is trying to defang the nul, similar to java's conversion to
0xc0 0x80, but I haven't actually looked through the code to confirm.
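
For anyone unfamiliar with the Java convention referred to here: its
"modified UTF-8" writes U+0000 as the two bytes 0xc0 0x80, an overlong
form that standard UTF-8 forbids. The Python sketch below only
illustrates that byte arithmetic; the helper is made up for this
example, and whether mutt or any imapd does something comparable is not
established in this thread.

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8-style sequence (110xxxxx 10xxxxxx) without
    rejecting overlong forms, and return the resulting code point."""
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a two-byte sequence")
    return ((lead & 0x1F) << 6) | (cont & 0x3F)

code_point = decode_two_byte_utf8(0xC0, 0x80)   # Java's stand-in for NUL

# A strict UTF-8 decoder refuses the same two bytes.
try:
    b"\xc0\x80".decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(code_point, strict_ok)
```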



> So it would appear the OP has pasted the Unicode "RIGHT-POINTING
> MAGNIFYING GLASS" character into their postings, which seems somewhat
> reasonable as it's used on the Debian web pages to mark all the
> Message-IDs and references thereto.
>
> Where that gets mangled along the way, I can't guess, but it would seem
> that 0x80 is a reasonable choice as that's a Latin-1 Control Character
> with the meaning PAD.
> https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)

I'm not entirely surprised that an MUA that is unaware of the changes to
internet mail that have happened since the early 80s (codified back in
2001) is also unaware of unicode.



Re: Odd character, was Re: buster, ekiga.

2019-07-23 Thread tomas
On Tue, Jul 23, 2019 at 02:19:27PM -0500, David Wright wrote:

[...]

> I don't see any NUL characters, but x80 as shown below [...]

Oh, that's cute :-)

If I followed along correctly, the questionable mails have
neither Content-Type nor Content-Transfer-Encoding. So the
content type defaults to text/plain; charset=us-ascii, right?

If you kill the high bit in x80 you're left with x00.
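
Spelled out as a minimal Python sketch (the function name is made up,
and the idea that some hop on the path strips bit 7 is only a
hypothesis here, not something anyone in the thread has confirmed):

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every octet, as a 7-bit-only hop might do."""
    return bytes(b & 0x7F for b in data)

print(f"{0x80:08b} -> {0x80 & 0x7F:08b}")
print(strip_high_bit(b"Rogers \x80user"))
```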

Hmmm...

Cheers
-- tomás




Odd character, was Re: buster, ekiga.

2019-07-23 Thread David Wright
On Tue 23 Jul 2019 at 11:07:37 (-0400), Greg Wooledge wrote:
> On Tue, Jul 23, 2019 at 07:41:20AM -0700, pe...@easthope.ca wrote:
> > *   From: Brad Rogers 
> 
> Oh, it's this guy again.
> 
> /me looks at the raw mail message with less(1)
> 
> *   From: Brad Rogers ^@b...@fineby.me.uk^@
> 
> Yup.  Two NUL bytes in the body of the message.  How completely bizarre.
> 
> Apparently what mutt does is truncate that *line* at the first NUL
> byte, but then show all the other lines after that just fine.
> 
> Other people are seeing the entire message truncated at that point, not
> just one line truncated.
> 
> Peter, whatever you're doing with your outgoing mail is really strange,
> and if possible, you should try to stop it.  Embedding raw NUL characters
> in the body of an email is a problem.

I don't see any NUL characters, but x80 as shown below. I'm reading
the cached message that mutt downloaded from an IMAP server. Is that
different from you?

17C0 64 2D 73 65 │ 61 72 63 68 │ 2F 45 31 68 │ 70 76 79 69 │ 2D 30 30 30  d-search/E1hpvyi-000
17D4 31 6E 78 2D │ 4B 6C 40 64 │ 61 6C 74 6F │ 6E 2E 69 6E │ 76 61 6C 69  1nx-Kl@dalton.invali
17E8 64 0A 52 65 │ 73 65 6E 74 │ 2D 44 61 74 │ 65 3A 20 54 │ 75 65 2C 20  d.Resent-Date: Tue, 
17FC 32 33 20 4A │ 75 6C 20 32 │ 30 31 39 20 │ 31 34 3A 35 │ 37 3A 32 30  23 Jul 2019 14:57:20
1810 20 2B 30 30 │ 30 30 20 28 │ 55 54 43 29 │ 0A 0A 2A 09 │ 46 72 6F 6D   +0000 (UTC)..*.From
1824 3A 20 42 72 │ 61 64 20 52 │ 6F 67 65 72 │ 73 20 80 62 │ 72 61 64 40  : Brad Rogers .brad@
1838 66 69 6E 65 │ 62 79 2E 6D │ 65 2E 75 6B │ 80 0A 2A 09 │ 44 61 74 65  fineby.me.uk..*.Date
184C 3A 20 46 72 │ 69 2C 20 31 │ 39 20 4A 75 │ 6C 20 32 30 │ 31 39 20 31  : Fri, 19 Jul 2019 1
1860 39 3A 33 32 │ 3A 34 36 20 │ 2B 30 31 30 │ 30 0A 3E 20 │ 49 74 20 77  9:32:46 +0100.> It w
1874 61 73 20 72 │ 65 70 6C 61 │ 63 65 64 20 │ 62 79 20 45 │ 6D 70 61 74  as replaced by Empat
1888 68 79 2E 0A │ 0A 54 68 61 │ 6E 6B 73 2E │ 20 20 45 6D │ 70 61 74 68  hy...Thanks.  Empath

Well, here's what I think is going on. The OP wrote "The links are
from the debian mailing list software.  128270(9) =
1F50E(E) or 128270(decimal) = 1F50E(hexadecimal).  U+1F50E is beyond
the list in …"

So it would appear the OP has pasted the Unicode "RIGHT-POINTING
MAGNIFYING GLASS" character into their postings, which seems somewhat
reasonable as it's used on the Debian web pages to mark all the
Message-IDs and references thereto.
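
The numbers the OP quoted can be checked with a few lines of Python,
using only the standard unicodedata module. This is just a sanity check
of the code point, its name and its UTF-8 form; it says nothing about
where the mangling happens.

```python
import unicodedata

code_point = 128270
ch = chr(code_point)
utf8 = ch.encode("utf-8")

print(hex(code_point))                     # decimal 128270 in hexadecimal
print(unicodedata.name(ch))                # official Unicode character name
print(" ".join(f"{b:02x}" for b in utf8))  # the octets a UTF-8 mail would carry
```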

Where that gets mangled along the way, I can't guess, but it would seem
that 0x80 is a reasonable choice as that's a Latin-1 Control Character
with the meaning PAD.
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
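
A minimal Python sketch of how that single octet is treated under three
common charsets; the decodes() helper is made up for this illustration.
Under Latin-1 it maps to the control character U+0080, while ASCII and
UTF-8 both reject a lone 0x80.

```python
import unicodedata

byte = b"\x80"
as_latin1 = byte.decode("latin-1")
category = unicodedata.category(as_latin1)   # "Cc" means control character

def decodes(data: bytes, codec: str) -> bool:
    """True if data is valid in the given codec, False otherwise."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

print(f"U+{ord(as_latin1):04X}", category)
print(decodes(byte, "ascii"), decodes(byte, "utf-8"), decodes(byte, "latin-1"))
```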

Converting it to NUL seems hazardous to me, almost asking for trouble.

Cheers,
David.