Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-20 Thread David Levine
Lyndon wrote:

> I'll add it if/when I need it.

How about I change the LC_ALL in the setlocale() to
LC_CTYPE?  Then you can override with $LC_ALL.

David
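
A minimal sketch of why LC_ALL would still override after that change, assuming the usual POSIX lookup order for a single category (LC_ALL first, then the category's own variable, then LANG). The helper name effective_category is made up for illustration; it only mimics the lookup and never calls setlocale().

```python
def effective_category(env, category="LC_CTYPE"):
    """Return the locale name the POSIX lookup would pick for one category.

    Hypothetical helper: it mimics the environment lookup order
    (LC_ALL, then the category variable, then LANG) without
    touching the process locale.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # unset and empty are treated the same
            return value
    # Nothing set: the default is implementation-defined; "C" is assumed here.
    return "C"


session = {"LANG": "C", "LC_CTYPE": "en_US.UTF-8"}
print(effective_category(session))                     # LC_CTYPE beats LANG
print(effective_category({**session, "LC_ALL": "C"}))  # LC_ALL beats both
```

With setlocale(LC_CTYPE, "") only this one category is taken from the environment, so collation and the rest stay in the program's default C locale.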

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Lyndon Nerenberg

> On Oct 19, 2016, at 5:51 PM, David Levine  wrote:
> 
> It's easy to add later if we need,
> hard to take away.

Carry on. I'll add it if/when I need it.



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread David Levine
Lyndon wrote:

> The overhead of $NMH_LANG isn't even measurable, and the code path is trivial.

Yes.  But, I just don't want to burden the maintainers, and add
roughage to the documentation and test suite, for a feature that
I don't think will be used.  It's easy to add later if we need,
hard to take away.

David



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Lyndon Nerenberg
>> 1) command line (-locale)
>> 2) mh-command-specific profile entry (.mh-profile)
>> 3) environment ($NMH_LANG)
>> 4) profile default override (.mh-profile)
>> 5) OS environment default (locale())
>> 
>> It's arguable about the ordering of (2) and (3).  But if I really needed 
>> this level of control in real life, I can't see how I would have both in 
>> play.
> 
> OK.  I'm not going to add a -locale switch, that seems to me to move this
> too close to nmh.  I like the locale profile component.  Let me ask those
> who might use it:  do you also want the NMH_LANG environment variable?  I'd
> rather not add a feature that won't be used.

If we drop (1), then (3) is the only alternative, so I would say drop (1), and 
reverse (2) and (3).  

The overhead of $NMH_LANG isn't even measurable, and the code path is trivial.
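
A rough sketch of the ordering proposed here, with (1) dropped and (2) and (3) reversed. Everything in it is hypothetical: NMH_LANG and the profile entries were only proposals at this point in the thread, and the function and parameter names are invented for illustration.

```python
def resolve_nmh_locale(env, command_profile, default_profile, os_default):
    """Pick a locale name using the order proposed in the thread.

    Hypothetical: neither $NMH_LANG nor a locale profile entry is
    confirmed to exist; this only illustrates the precedence.
    """
    candidates = (
        env.get("NMH_LANG"),  # environment override, checked first
        command_profile,      # command-specific profile entry
        default_profile,      # profile-wide default
        os_default,           # whatever the OS locale settings say
    )
    for value in candidates:
        if value:
            return value
    return "C"


# A per-invocation override beats the profile default.
print(resolve_nmh_locale({"NMH_LANG": "en_US.UTF-8"}, None, "C", "C"))
# Without the override, the profile default applies.
print(resolve_nmh_locale({}, None, "sv_SE.UTF-8", "C"))
```

The loop is the whole cost of the lookup, which is the point being made about overhead.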


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Laura Creighton
In a message of Wed, 19 Oct 2016 11:38:45 -0400, David Levine writes:
>I'll do 1), but the other steps would be up to you.

Thank you very much.

Laura



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Laura Creighton
In a message of Wed, 19 Oct 2016 11:46:31 -0400, David Levine writes:
>OK.  I'm not going to add a -locale switch, that seems to me to move this
>too close to nmh.  I like the locale profile component.  Let me ask those
>who might use it:  do you also want the NMH_LANG environment variable?  I'd
>rather not add a feature that won't be used.
>
>David

I don't think I would be using it.

Laura




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Tom Lane
David Levine  writes:
> OK.  I'm not going to add a -locale switch, that seems to me to move this
> too close to nmh.  I like the locale profile component.  Let me ask those
> who might use it:  do you also want the NMH_LANG environment variable?  I'd
> rather not add a feature that won't be used.

For my purposes, a locale profile component would be sufficient.

regards, tom lane



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread David Levine
Lyndon wrote:

> I.e., the standard (MH-modified) UNIX model is:
>
> 1) command line (-locale)
> 2) mh-command-specific profile entry (.mh-profile)
> 3) environment ($NMH_LANG)
> 4) profile default override (.mh-profile)
> 5) OS environment default (locale())
>
> It's arguable about the ordering of (2) and (3).  But if I really needed this 
> level of control in real life, I can't see how I would have both in play.

OK.  I'm not going to add a -locale switch, that seems to me to move this
too close to nmh.  I like the locale profile component.  Let me ask those
who might use it:  do you also want the NMH_LANG environment variable?  I'd
rather not add a feature that won't be used.

David



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread David Levine
Laura wrote:

> And, of course, I do use attach at the What Now? prompt.

So, another approach to get what you want (8-bit message even if
it contains no 8-bit characters) would be to do all this:

1) Add support for locale profile component to nmh.
2) Use latest nmh (HEAD of master) or upcoming nmh 1.7.
3) Add "locale: " to profile.
4a) Add "mhbuild -headerencoding utf-8" to profile.  This
requires that your SMTP server support SMTPUTF8.  And
it would have the side effect of enabling EAI (RFC 6531),
thereby permitting 8-bit characters in addresses.
or
4b) Put an 8-bit character in the body of every message.  You
could do this in components files.

I'll do 1), but the other steps would be up to you.

David



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-19 Thread Laura Creighton
And, of course, I do use attach at the What Now? prompt.

Laura




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread David Levine
Laura wrote:

> I don't know about Tom, but my problem isn't that I want to send valid
> US-ASCII emails, but that I _never_ want to send valid us-ascii emails.

> And the last thing I want to happen is for nmh to not be able to
> tell what I want, stick in us-ascii (because the thing needs a
> content header, and that's the default),

Just to note that without the Content-Type and MIME-Version
headers, the message is assumed to be us-ascii.

> and then give me grief because the mail I assembled with some script somewhere
> contained lots of invalid us-ascii.

mhbuild should do the right thing if you run it after putting
everything into your draft.  But I understand that some, including
me, add to the draft after running mhbuild (mime at the What Now?
prompt).

> I thought that mh_profile would be a good place to send a love letter to
> nmh.  "Dear nmh. I see that you cleverly concluded that I wanted
> us-ascii.  Alas, you were wrong.  Just be a good chap and give me utf-8.
> Thank you."

I don't think the locale profile entry (and $NMH_LANG) will do
what you want.  Perhaps what you want is to do what Ralph does,
and put something similar to this in your components files:

MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

But only if you don't use attach at the What Now? prompt.  If you do, I
think that we want to figure out how to force mhbuild to always go 8bit,
with an appropriate charset.  There should be an easy way, such as a
signature block with at least one 8-bit character in it.  (Or with current
HEAD on master and upcoming 1.7, a message header with such.)

David
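
A small sketch of the point about missing headers, using Python's standard email package rather than anything from nmh: a message with no Content-Type charset parameter is read as us-ascii, while one carrying the three headers shown above is read as utf-8. The function name and sample addresses are made up for the example.

```python
from email import message_from_string


def declared_charset(raw_message):
    """Return the charset a reader would assume for a message."""
    msg = message_from_string(raw_message)
    # With no charset parameter present, fall back to the MIME default.
    return msg.get_content_charset(failobj="us-ascii")


plain = "To: someone@example.com\nSubject: hi\n\nhello\n"
labelled = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "hello\n"
)

print(declared_charset(plain))
print(declared_charset(labelled))
```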



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Laura Creighton
In a message of Tue, 18 Oct 2016 15:13:49 +0100, Ralph Corderoy writes:
>There's a mechanism for telling a hierarchy of programs their locale;
>environment variables.  You're using it, but you're telling some of them
>a different locale to what you really want them to use.

People do this a lot around here.  Due to the extremely poor placement
of the '{' and '}' keys, right-alt-shifted-7 and right-alt-shifted-0
(yuck!) people who need to type braces at speed pop in and out of
us-ascii all the time.  I suspect these days there are less invasive ways
to say 'change my keyboard layout' but this is what people have become
used to.  

Laura




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ralph Corderoy
Hi Tom,

> All of these solutions presuppose that this is my problem and not the
> software's.  I respectfully disagree.

Me too.  :-)

There's a mechanism for telling a hierarchy of programs their locale;
environment variables.  You're using it, but you're telling some of them
a different locale to what you really want them to use.  That's not the
software's fault.  If I use mail(1) here in the C locale to send an
email and give it non-ASCII characters then they are ignored, don't make
it into the email at all, and that email isn't MIME, thus it's US-ASCII
and valid at that because non-ASCII has been (silently) stripped.  It
offers only environment variables to alter its locale.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ken Hornstein
>For the moment, I've worked around the problem by launching exmh (and
>nothing else) in en_US.utf8 locale, so that the nmh calls all inherit
>that.  But I regard that as a hack not a fix.  It affects directory
>listings done by exmh, e.g. in save-to-file dialogs, and there may be
>other side-effects as well; I haven't been using this workaround for
>long enough to know.

If things like the ls sorting order are your concern, as others have
pointed out you could simply just use LC_CTYPE; that shouldn't affect
any collation ordering.

>All of these solutions presuppose that this is my problem and not the
>software's.  I respectfully disagree.  I would like it to "just work"
>whether or not there are stray UTF8 characters.

I do not know how we could make it "just work".  The OLD solution was to
send out incorrectly-formatted messages; that's simply not a reasonable
option anymore.  It wasn't really a reasonable option for 20 years,
honestly, but we're kind of slow with regards to that.  You don't want to
tell nmh what the character set is (the locale is the EASIEST option,
but there are others - all of them involve you changing something).

--Ken



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Laura Creighton
In a message of Tue, 18 Oct 2016 14:17:24 +0100, Ralph Corderoy writes:
>Hi Tom,
>
>If you just have one long-running Emacs then can't that be in the UTF-8
>locale?  Or is your C-needing ls(1) run from inside that?
>
>Have Emacs highlight non-ASCII characters in that mode wherever they
>come from, e.g. paste from web browser?  Have a function that maps the
>common ones to ASCII, perhaps using recode(1)?  Filter the buffer when
>writing the file, erroring if it can't be written?  Then you can send
>valid US-ASCII emails.
>
>-- 
>Cheers, Ralph.
>https://plus.google.com/+RalphCorderoy

I don't know about Tom, but my problem isn't that I want to send valid
US-ASCII emails, but that I _never_ want to send valid us-ascii emails.  Even
when replying to mail that is encoded us-ascii, or when sitting at
a workstation that isn't mine, which has that as its locale, or, has
no locale set and is defaulting to C or however else you can get nmh
to conclude you want us-ascii ...  And the last thing I want to
happen is for nmh to not be able to tell what I want, stick in us-ascii
(because the thing needs a content header, and that's the default), and
then give me grief because the mail I assembled with some script somewhere
contained lots of invalid us-ascii.  This, I thought, was what was happening
to Tom, or whoever it was who had the bad emacs-nmh experience.

I thought that mh_profile would be a good place to send a love letter to
nmh.  "Dear nmh. I see that you cleverly concluded that I wanted
us-ascii.  Alas, you were wrong.  Just be a good chap and give me utf-8.
Thank you."

Laura



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Tom Lane
Ralph Corderoy  writes:
> If you just have one long-running Emacs then can't that be in the UTF-8
> locale?  Or is your C-needing ls(1) run from inside that?

I'd rather not run it in non-C locale (for one thing, as you say, shells
run inside it would tend to inherit that locale).  And I don't really
see what that would change anyway.  The nmh calls are made from exmh,
which is a sibling not a child of the emacs process.

For the moment, I've worked around the problem by launching exmh (and
nothing else) in en_US.utf8 locale, so that the nmh calls all inherit
that.  But I regard that as a hack not a fix.  It affects directory
listings done by exmh, e.g. in save-to-file dialogs, and there may be
other side-effects as well; I haven't been using this workaround for
long enough to know.  If it's decided that there will be no solution
provided at the nmh level, I'll probably look into injecting extra
code to set the locale envvars in exmh's nmh calls.

> Have Emacs highlight non-ASCII characters in that mode wherever they
> come from, e.g. paste from web browser?  Have a function that maps the
> common ones to ASCII, perhaps using recode(1)?  Filter the buffer when
> writing the file, erroring if it can't be written?  Then you can send
> valid US-ASCII emails.

All of these solutions presuppose that this is my problem and not the
software's.  I respectfully disagree.  I would like it to "just work"
whether or not there are stray UTF8 characters.

regards, tom lane



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ralph Corderoy
Hi Tom,

If you just have one long-running Emacs then can't that be in the UTF-8
locale?  Or is your C-needing ls(1) run from inside that?

Have Emacs highlight non-ASCII characters in that mode wherever they
come from, e.g. paste from web browser?  Have a function that maps the
common ones to ASCII, perhaps using recode(1)?  Filter the buffer when
writing the file, erroring if it can't be written?  Then you can send
valid US-ASCII emails.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Tom Lane
Ken Hornstein  writes:
> As I understand it, Tom said his problem was when he forwarded some
> email to someone else and it contained 8-bit characters.  I suspect this
> was done with "forw" (or the Forward button in exmh).

Just for the record, I didn't say that; I rarely use "forw".  The more
common scenario for me is that I'm replying to someone and quoting bits
of their message in-line (as I'm doing here), and the most common specific
gotcha is that somebody's using fancy quotes rather than plain ASCII ones
in the quoted text.

Most of the text-munging involved in that doesn't use nmh at all AFAIK ---
it's all in Emacs MH-Letter mode macros.  And the macro for yanking a
message into the buffer and prefixing "> " to each line doesn't pay any
attention to the headers of said message, so it's not going to absorb any
character set attributions from it.  Fortunately for me, there's little
enough non-UTF8 stuff in my mail traffic that I can afford to ignore the
possibility that what I'm quoting isn't UTF8.

I could probably teach the Emacs code to insert a Content-type header
just before sending, if the buffer contains any non-ASCII characters.
But I don't really see why I should have to, when nmh already contains
exactly the logic I need, it's just not packaged in a
conveniently-controllable way.

regards, tom lane



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Paul Fox
ken wrote:
 > >> (And it occurs to me that even setting the locale properly probably
 > >> will not fix your specific problem, as you have described it;
 > >> forwarding messages using MIME will).
 > >
 > >I don't think this got addressed.  If a nmh-specific locale doesn't fix
 > >the problem then let's not add it.
 > 
 > As I understand it, Tom said his problem was when he forwarded some
 > email to someone else and it contained 8-bit characters.  I suspect this
 > was done with "forw" (or the Forward button in exmh).
 > 
 > Locale settings aside, there's no way for the editor to know whether an arbitrary
 > character from another message is UTF-8, ISO-8859-1, or anything else.
 > That information _IS_ in the forwarded message, but with plain old forw
 > it's lost.  If you use forw -mime, then it all works; the downside there is
 > you need to know to run mhbuild on that message.

and both of those things (making "forw -mime" the default, and running
mhbuild) are on the list for 1.7, correct?  so does that mean that this
problem is/was already on the path to being fixed?

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 52.0 degrees)




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ken Hornstein
>But we aren't. I am saying UTF-8 is the native internal character
>set. What happens at the boundaries becomes everyone else's problem.
>And after all the grief in this discussion over the last five+ years,
>don't you think it should be someone else's problem?

Yeah, this is the part I don't understand.  Let's say UTF-8 becomes the
native internal character set.  What is the gain?  I'm perfectly willing
to say I am missing something here.

AFAIK, it doesn't help _input_; we still have to convert upon reading a
message (even if we store messages in UTF-8, we still have to convert
the first time we get them).  It doesn't help _output_; we still need to
convert to the user's native character set.

While I understand why programs like editors and terminals have a native internal
character set, we don't really need to process individual characters
like they do.  We mostly treat the string data as opaque blobs except
for a few circumstances, and those are relatively straightforward to
handle.  So, I'm really trying hard to see the gain here.  Where we tend
to run into problems is the boundary between nmh and the user; I don't
see how using UTF-8 internally fixes it.

--Ken



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ken Hornstein
>> (And it occurs to me that even setting the locale properly probably
>> will not fix your specific problem, as you have described it;
>> forwarding messages using MIME will).
>
>I don't think this got addressed.  If a nmh-specific locale doesn't fix
>the problem then let's not add it.

As I understand it, Tom said his problem was when he forwarded some
email to someone else and it contained 8-bit characters.  I suspect this
was done with "forw" (or the Forward button in exmh).

Locale settings aside, there's no way for the editor to know whether an arbitrary
character from another message is UTF-8, ISO-8859-1, or anything else.
That information _IS_ in the forwarded message, but with plain old forw
it's lost.  If you use forw -mime, then it all works; the downside there is
you need to know to run mhbuild on that message.

--Ken
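
To illustrate why the charset label matters once text is pasted without its headers, here is a minimal sketch. The sample text is invented; the same bytes decode without error under both charsets, and only one reading is the intended one.

```python
def interpretations(data):
    """Decode the same bytes under two plausible charsets."""
    return {
        "utf-8": data.decode("utf-8"),
        # ISO-8859-1 maps every byte to a character, so this never fails;
        # it just produces the wrong text when the guess is wrong.
        "iso-8859-1": data.decode("iso-8859-1"),
    }


# Fancy quotes as they would sit, byte for byte, in a UTF-8 message body.
quoted = "\u201cfancy quotes\u201d".encode("utf-8")
readings = interpretations(quoted)

print(len(readings["utf-8"]), len(readings["iso-8859-1"]))
print(readings["utf-8"] == readings["iso-8859-1"])
```

Nothing in the bytes says which reading is right. That information lives in the original message's Content-Type header, which plain inclusion discards.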



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-18 Thread Ralph Corderoy
Hi,

Ken wrote:
> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).

I'd rather have zero.  :-)  Anything above that surely warrants an
nmhlocale(1).

> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it;
> forwarding messages using MIME will).

I don't think this got addressed.  If a nmh-specific locale doesn't fix
the problem then let's not add it.  And if it's only ls's C-locale
collating order that's wanted, then why not Paul's solution of
nearly-all UTF-8, or a ~/bin/ls?  As a user of non-UTF-8 locales for a
long time after the world moved on, it really wasn't that bad switching
to it.  Paul's done the same.  It can be done with exceptions for what
still needs C, not the other way around.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 8:28 PM, Tom Lane  wrote:
> 
> The current state of affairs is that nmh unconditionally assumes that
> non-ASCII input is in the character set specified by the LC_CTYPE
> environment variable (modulo the various ways that that can be specified).
> What I'm suggesting would allow the environment to be overridden by an
> mh_profile entry.  There is zero difference from an epistemologic
> standpoint: either way you're trusting the user to know what her data is.

And what I am arguing is that this override might often be on a per-message 
basis, thus the $NMH_LANG escape for the programs calling the underlying nmh 
commands.

NMH_LANG might be a horribly inappropriate name, and well met.  Figure out the 
colour of the bikeshed, but at least build the damn thing.




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Ken Hornstein  writes:
> I ... do not think this would solve this particular problem.  The issue
> here seems to be a) nmh programs were given 8 bit characters, and b)
> the locale was set to US-ASCII.  If you are going to assume that all
> INPUT is unconditionally UTF-8, then yes, that would solve this problem.

Umm ... I think you are attacking a straw man.

The current state of affairs is that nmh unconditionally assumes that
non-ASCII input is in the character set specified by the LC_CTYPE
environment variable (modulo the various ways that that can be specified).
What I'm suggesting would allow the environment to be overridden by an
mh_profile entry.  There is zero difference from an epistemologic
standpoint: either way you're trusting the user to know what her data is.

regards, tom lane
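
A simplified model of the behaviour under discussion, not nmh's actual code: 7-bit text is labelled us-ascii, 8-bit text is labelled with the locale's charset, and 8-bit text in an ASCII-only locale is refused. The function name and the list of ASCII aliases are assumptions made for the sketch.

```python
ASCII_NAMES = {"us-ascii", "ascii", "ansi_x3.4-1968"}


def choose_charset(body, locale_charset):
    """Pick the charset label for an outgoing text body (bytes).

    Simplified illustration of the behaviour described in the
    thread; the real logic may differ in detail.
    """
    if all(byte < 0x80 for byte in body):
        return "us-ascii"  # pure 7-bit text needs no other label
    if locale_charset.lower() in ASCII_NAMES:
        # 8-bit data, but the locale only promises ASCII: refuse
        # rather than send a mislabelled message.
        raise ValueError("8-bit text but the locale charset is US-ASCII")
    return locale_charset


print(choose_charset(b"plain text", "UTF-8"))
print(choose_charset("\u201cquoted\u201d".encode("utf-8"), "UTF-8"))
```

A profile entry would change only where locale_charset comes from; the trust placed in the user's claim about the data stays the same.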



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 8:12 PM, Lyndon Nerenberg  wrote:
> 
>> If you are going to assume that all
>> INPUT is unconditionally UTF-8,

I don't.  Sorry, I missed that on my original rant.  LC_CTYPE is what we use 
inbound to convert un-labelled characters to UTF-8.  We still use UTF-8 
everywhere, internally.


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg
> If you are going to assume that all
> INPUT is unconditionally UTF-8, then yes, that would solve this problem.
> But you say above you want to use LANG/LC_CTYPE to convert to UTF-8 on
> input; that would have failed given the problem as stated.

Three problems:

1) original input from the nmh user (composition). What I described works.

2) deciphering any external unlabeled content. this cannot be done reliably. as 
others have said, punt.

3) output well formed content. if we have utf8 internally, we can *always* do 
that (according to the locale()).

> And like I've said before: I think this effort would a) require a new
> library dependency (for UTF-8 processing, since we couldn't use the
> locale functions anymore)

I can (have already) import that from plan9port.

> and b) result in no gain in functionality.

> And last time we discussed this, people screamed at
> the thought of assuming UTF-8 for input; I interpreted that suggestion
> as a non-starter.

But we aren't. I am saying UTF-8 is the native internal character set. What 
happens at the boundaries becomes everyone else's problem.  And after all the 
grief in this discussion over the last five+ years, don't you think it should 
be someone else's problem?





Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>i had the same issue, and decided that for me, it was the only thing
>in the way of changing fully to a UTF-8 locale.  so i do this:
>
>$ locale
>LANG=en_US.UTF-8
>LANGUAGE=en_US:en
>LC_CTYPE=en_US.utf8

Turning this around ... all we really care about is LC_CTYPE.  Everything
else could be C.

--Ken
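
One way a front end could act on this is to hand its nmh child processes an environment where only LC_CTYPE differs from the session. A minimal sketch follows; the helper name and the en_US.UTF-8 default are assumptions, and that locale has to be installed on the system for the setting to take effect.

```python
def nmh_child_env(base_env, ctype="en_US.UTF-8"):
    """Return a copy of base_env with only LC_CTYPE switched.

    Hypothetical helper for a wrapper or front end. LC_ALL is
    removed because it would override LC_CTYPE; as a side effect the
    other categories fall back to their own variables or LANG.
    """
    env = dict(base_env)
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = ctype
    return env


session = {"LANG": "C", "LC_ALL": "C", "HOME": "/home/user"}
child = nmh_child_env(session)
print(child)

# A caller would then pass it along, for example:
#   subprocess.run(["send", draft_path], env=nmh_child_env(os.environ))
```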



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8
>in the same manner, and process (and store!) everything internally
>as UTF-8, all of this nonsense would go away.  Similarly, we could
>convert from UTF-8 -> $LANG/$LC_CTYPE on the way out.  And we could ship
>everything off-site with one of only two character sets: ascii, or utf8.

I ... do not think this would solve this particular problem.  The issue
here seems to be a) nmh programs were given 8 bit characters, and b)
the locale was set to US-ASCII.  If you are going to assume that all
INPUT is unconditionally UTF-8, then yes, that would solve this problem.
But you say above you want to use LANG/LC_CTYPE to convert to UTF-8 on
input; that would have failed given the problem as stated.

And like I've said before: I think this effort would a) require a new
library dependency (for UTF-8 processing, since we couldn't use the
locale functions anymore) and b) result in no gain in functionality.
Like, I'm squinting really hard here, and I can't see how it would have
changed anything.  And last time we discussed this, people screamed at
the thought of assuming UTF-8 for input; I interpreted that suggestion
as a non-starter.

--Ken



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 6:39 PM, Ken Hornstein  wrote:
> 
> What it refuses to do now is create improperly-formatted email messages
> when it cannot identify the character set.  Before it would happily
> send these messages out; THAT has been broken for twenty years and was
> only recently fixed.
> 
> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).
> 
> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it; forwarding
> messages using MIME will).

The underlying problem is that locales were built before anyone really 
understood the problem.  For one, they assume symmetry on input and output; 
there is no LC_CTYPE_INPUT and LC_CTYPE_OUTPUT.

This is why Plan9 punted on the entire issue and said UTF-8 everywhere.  Do 
what you want outside, but it's your job to convert to UTF-8 before you talk to 
or from the tools.  And they provided a command line tool to do just that.  If 
you look at the Plan9 mail system, it's all UTF-8 internally.  When mail comes 
in over the wire, the appropriate MIME charset= parameters are used to convert 
content to UTF-8 for display (upas/fs takes care of this).  By definition, all 
input is UTF-8.

If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8 in the same 
manner, and process (and store!) everything internally as UTF-8, all of this 
nonsense would go away.  Similarly, we could convert from UTF-8 -> 
$LANG/$LC_CTYPE on the way out.  And we could ship everything off-site with one 
of only two character sets: ascii, or utf8.

Good grief, even Microsoft has figured this out :-P  Yes, someone has to write 
the code.  Let's ship 1.7 (if Ralph ever stops committing!), then do 1.8 (the 
SSL/TLS stuff).  And then let's branch for 2.0 and go for a top-to-bottom UTF-8 
runtime.  I've been pharting around with this for a couple of years now in my 
own private branch.  It's not trivial, but it's doable.  And maybe *mh should 
lead the way again, for the first time in a few decades.

--lyndon
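
A minimal sketch of the convert-at-the-boundaries idea described above, assuming the charset names come from the locale or a MIME charset= parameter. The function names are invented, and replacing unrepresentable characters with "?" on output is just one possible policy.

```python
def to_internal(data, source_charset):
    """Decode bytes arriving in source_charset and re-encode as UTF-8."""
    return data.decode(source_charset).encode("utf-8")


def from_internal(data, target_charset):
    """Convert internal UTF-8 bytes to the user's charset for output.

    Characters the target charset cannot represent become '?'.
    """
    return data.decode("utf-8").encode(target_charset, errors="replace")


incoming = b"caf\xe9"                        # "cafe" with e-acute, ISO-8859-1
stored = to_internal(incoming, "iso-8859-1")
print(stored)
print(from_internal(stored, "iso-8859-1"))   # round-trips to the input bytes
print(from_internal(stored, "ascii"))        # lossy in an ASCII-only locale
```

The conversion on input still depends on knowing the source charset, which is the objection raised elsewhere in the thread.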




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>Well, my problem is that I want the prevailing session locale to be C,
>primarily because I'm used to seeing output from e.g. "ls" in ASCII
>ordering.  But I'm finding that nmh, or at least send, is effectively
>broken in that locale --- unwillingness to cope with non-ASCII data
>at all counts as "broken" for me.

You're missing the point; nmh can handle non-ASCII data perfectly fine.
I use it that way every day, and so do plenty of others.

What it refuses to do now is create improperly-formatted email messages
when it cannot identify the character set.  Before it would happily
send these messages out; THAT has been broken for twenty years and was
only recently fixed.

And if we're voting ... I would rather have only one additional way to
specify a nmh-specific locale (well, I'd rather have ZERO additional
ways, but I think more than one way is overkill).

(And it occurs to me that even setting the locale properly probably
will not fix your specific problem, as you have described it; forwarding
messages using MIME will).

--Ken



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Paul Fox
tom wrote:
 > Lyndon Nerenberg  writes:
 > > At this point, I think the fact that nobody seems to be able to give a 
 > > simple, clear, and coherent description of the problem suggests that 
 > > nobody really knows what the actual problem is, yet.
 > 
 > Well, my problem is that I want the prevailing session locale to be C,
 > primarily because I'm used to seeing output from e.g. "ls" in ASCII
 > ordering.  But I'm finding that nmh, or at least send, is effectively

i had the same issue, and decided that for me, it was the only thing
in the way of changing fully to a UTF-8 locale.  so i do this:

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.utf8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C  <
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

you've likely considered that and dismissed it, but i mention it just in
case.  i've observed no unwanted side-effects.
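
For reference, a minimal sketch of how a session like the one above could
be set up, assuming a POSIX shell and a startup file such as ~/.profile
(the file name and the choice of en_US.UTF-8 are assumptions for
illustration, not details taken from the message):

```shell
# Assumed to live in a shell startup file such as ~/.profile.
# LC_ALL, when set, overrides LANG and every LC_* category, so clear it.
unset LC_ALL
# UTF-8 for character handling, messages and everything else...
LANG=en_US.UTF-8
# ...but keep plain byte-order collation for ls, sort and friends.
LC_COLLATE=C
export LANG LC_COLLATE
```

With these settings, programs that consult LC_CTYPE see a UTF-8 locale,
while sort(1) and ls(1) still order uppercase before lowercase.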

 > Obviously you could define this as being ls' problem not nmh's problem,
 > but I respectfully disagree.  It was fine for the last twenty years or
 > thereabouts, and nmh changes are what made it not fine.

but i also can't really argue with your logic there.

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma)



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread David Levine
Lyndon wrote:

> > On Oct 17, 2016, at 6:13 PM, David Levine  wrote:
> > 
> > I'm not a fan of environment variables when there's an alternative.
> > The profile seems like a good home.
>
> $NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy
> that covers pretty much any scenario.

$NMH_LANG just seems like overkill to me.  And having all these ways to
set locale invites (even more) confusion.

David


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 6:20 PM, Lyndon Nerenberg  wrote:
> 
> $NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy that 
> covers pretty much any scenario.

I.e., the standard (MH-modified) UNIX model is:

1) command line (-locale)
2) mh-command-specific profile entry (.mh-profile)
3) environment ($NMH_LANG)
4) profile default override (.mh-profile)
5) OS environment default (locale())

It's arguable about the ordering of (2) and (3).  But if I really needed this 
level of control in real life, I can't see how I would have both in play.
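
A rough sketch of that lookup order in POSIX shell, purely illustrative:
$NMH_LANG, the "locale" and "<command>-locale" profile components, and
both function names are hypothetical and do not describe existing nmh
features.

```shell
# Print the first value of profile component $1 found in file $2, if any.
# (Hypothetical helper; real profile parsing in nmh is more involved.)
profile_value() {
    [ -r "$2" ] || return 0
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1
}

# resolve_locale COMMAND PROFILE [CLI_VALUE]
# Walk the five levels in order and print the first non-empty setting.
resolve_locale() {
    cmd=$1 profile=$2 cli=${3:-}
    for candidate in \
        "$cli" \
        "$(profile_value "$cmd-locale" "$profile")" \
        "${NMH_LANG:-}" \
        "$(profile_value locale "$profile")" \
        "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
    do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing set anywhere: the POSIX default locale.
    printf 'C\n'
}
```

For example, `resolve_locale send "$HOME/.mh_profile"` would print the
locale that send would end up with under this scheme.  The last level
follows the usual POSIX precedence of LC_ALL, then LC_CTYPE, then LANG.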

--lyndon




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Lyndon Nerenberg  writes:
> At this point, I think the fact that nobody seems to be able to give a 
> simple, clear, and coherent description of the problem suggests that nobody 
> really knows what the actual problem is, yet.

Well, my problem is that I want the prevailing session locale to be C,
primarily because I'm used to seeing output from e.g. "ls" in ASCII
ordering.  But I'm finding that nmh, or at least send, is effectively
broken in that locale --- unwillingness to cope with non-ASCII data at
all counts as "broken" for me.  So I want a way of controlling the locale
used by nmh without side-effects on other programs.

Obviously you could define this as being ls' problem not nmh's problem,
but I respectfully disagree.  It was fine for the last twenty years or
thereabouts, and nmh changes are what made it not fine.

regards, tom lane


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 6:13 PM, David Levine  wrote:
> 
> I'm not a fan of environment variables when there's an alternative.
> The profile seems like a good home.

$NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy that 
covers pretty much any scenario.

--lyndon



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 6:08 PM, Ken Hornstein  wrote:
> 
>> What about an $NMH_LANG environment variable that the runtime could
>> look for *very* early on, and use to seed $LANG.  Seems like minimal
>> impact to the rest of the code that's locale aware.  Assuming that is
>> all consolidated in the library routines.  Which I *think* we are ...
> 
> Uhhh ... see my previous email on this subject.  Personally, not
> interested.

At this point, I think the fact that nobody seems to be able to give a simple, 
clear, and coherent description of the problem suggests that nobody really 
knows what the actual problem is, yet.

I think we are conflating two or three different issues, solely because their 
side effects look the same.

--lyndon



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread David Levine
Lyndon wrote:

> What about an $NMH_LANG environment variable that the runtime
> could look for *very* early on, and use to seed $LANG.

I'm not a fan of environment variables when there's an alternative.
The profile seems like a good home.

> Seems like minimal impact to the rest of the code that's locale
> aware.  Assuming that is all consolidated in the library
> routines.  Which I *think* we are ...

Yes, set_locale() is called from one place.

David



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread David Levine
Ken wrote:

> >[Tom wrote:]
> >I thought Laura's suggestion was to be able to put something like
> >
> >locale: en_GB.utf8
> >
> >into ~/.mh_profile, which seems eminently sensible to me.
>
> Sigh.  I won't object if someone does this work ... but it's not something
> I want to tackle, for the aforementioned reasons.  Note that it's not
> completely straightforward, since a few programs (like post(8)) don't
> read the profile.

I'll do it.  post doesn't call iconv, so I'll assume it doesn't need it.
mh-install(1) and slocal(1) don't either, all other MH/nmh programs do.

David


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>What about an $NMH_LANG environment variable that the runtime could
>look for *very* early on, and use to seed $LANG.  Seems like minimal
>impact to the rest of the code that's locale aware.  Assuming that is
>all consolidated in the library routines.  Which I *think* we are ...

Uhhh ... see my previous email on this subject.  Personally, not
interested.

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Lyndon Nerenberg

> On Oct 17, 2016, at 5:54 PM, Ken Hornstein  wrote:
> 
>> I thought Laura's suggestion was to be able to put something like
>> 
>> locale: en_GB.utf8
>> 
>> into ~/.mh_profile, which seems eminently sensible to me.
> 
> Sigh.  I won't object if someone does this work ... but it's not something
> I want to tackle, for the aforementioned reasons.  Note that it's not
> completely straightforward, since a few programs (like post(8)) don't
> read the profile.

What about an $NMH_LANG environment variable that the runtime could look for 
*very* early on, and use to seed $LANG.  Seems like minimal impact to the rest 
of the code that's locale aware.  Assuming that is all consolidated in the 
library routines.  Which I *think* we are ...

--lyndon



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>I thought Laura's suggestion was to be able to put something like
>
>locale: en_GB.utf8
>
>into ~/.mh_profile, which seems eminently sensible to me.

Sigh.  I won't object if someone does this work ... but it's not something
I want to tackle, for the aforementioned reasons.  Note that it's not
completely straightforward, since a few programs (like post(8)) don't
read the profile.

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Ken Hornstein  writes:
>> I believe that nmh users are sophisticated enough that they actually
>> know what a Locale is, and an encoding, and if they google for more
>> information they will understand what they read.  I think that they
>> are going to want a way to specify what they want in their mh_profile
>> though.

> Well ... my reluctance there is I don't want to duplicate Unix
> functionality without a very good reason.  And Unix already has umpteen
> ways to do this; you could wrap all of the nmh commands with aliases
> or shell wrappers, for one.  It just feels unnecessary.

I thought Laura's suggestion was to be able to put something like

locale: en_GB.utf8

into ~/.mh_profile, which seems eminently sensible to me.  Yes, there
are other ways to get the same result, but they're hacks.  Wrapping
every MH command with a shell wrapper in order to force its locale
is surely a hack.  The alternative of a session-wide LC/LANG setting
may have side-effects that the user doesn't want, so I'd rate that
as a hack too.

regards, tom lane


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>I believe that nmh users are sophisticated enough that they actually
>know what a Locale is, and an encoding, and if they google for more
>information they will understand what they read.  I think that they
>are going to want a way to specify what they want in their mh_profile
>though.

Well ... my reluctance there is I don't want to duplicate Unix
functionality without a very good reason.  And Unix already has umpteen
ways to do this; you could wrap all of the nmh commands with aliases
or shell wrappers, for one.  It just feels unnecessary.

>We have a terminal room with shared workstations that all have a
>very restricted number of Mathematica licenses.  I haven't been there
>for a few years, but it used to be the case that Mathematica wanted
>LC_ALL=C which had the result that people couldn't send mail from
>those terminals using some of their favourite mailers, and moreover
>had absolutely no clue as to what was wrong.

Really?  Way to blow it, Wolfram!

>One thing I do not
>know is how common it is, these days, for people to share computers.
>
>A long time ago it was common.  Then it became uncommon, as everybody
>used their own personal laptop.  These days, at least around here, there
>is a sizable segment of the population which doesn't own anything
>larger than a cell phone or a tablet, so the demand is on, again
>for shared spaces.

We have a very different environment, which is not surprising.  For
stuff like email, everyone has their own workstation.  For higher-power
computing, there are a few centralized systems; email works on those
systems, but it's generally not used.

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Laura Creighton
In a message of Mon, 17 Oct 2016 17:21:18 -0400, Ken Hornstein writes:
>I agree that we can't reasonably know what the character set is supposed
>to be in that case.  But I would say that given the choice between
>sending 'something wrong' and 'erroring out', 'erroring out' is the more
>correct option.  But I would be interested in hearing what other people
>think.

I believe that nmh users are sophisticated enough that they actually
know what a Locale is, and an encoding, and if they google for more
information they will understand what they read.  I think that they
are going to want a way to specify what they want in their mh_profile
though.

We have a terminal room with shared workstations that all have a
very restricted number of Mathematica licenses.  I haven't been there
for a few years, but it used to be the case that Mathematica wanted
LC_ALL=C which had the result that people couldn't send mail from
those terminals using some of their favourite mailers, and moreover
had absolutely no clue as to what was wrong.  One thing I do not
know is how common it is, these days, for people to share computers.

A long time ago it was common.  Then it became uncommon, as everybody
used their own personal laptop.  These days, at least around here, there
is a sizable segment of the population which doesn't own anything
larger than a cell phone or a tablet, so the demand is on, again
for shared spaces.

True where you are?

Laura


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ralph Corderoy
Hi Ken,

> > Valid UTF-8 and valid GB2312 can share the same sequences,
> > especially if it's just the odd `£' or `拢` in ASCII text.
>
> It was just a suggestion, not one I was particularly crazy about ...
> but not all arbitrary 8-bit sequences are valid UTF-8.

Oh, agreed.

> And it looks like for GB2312 (using the EUC-CN encoding, right?) it
> would be harder, but there are certainly invalid sequences for GB2312.

Yep.  But there's a lot of valid sequences for both that look like each
other.  UTF-8 for U+00a3, that `£', is U+62e2, `拢', if the UTF-8 0xc2
0xa3 is treated as (EUC-CN) GB2312.

$ printf '\x00\xa3' |
> iconv -f ucs-2be -t utf-8 |
> iconv -f gb2312 -t ucs-2be |
> hd
00000000  62 e2                                             |b.|
00000002
$

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Laura Creighton
In a message of Mon, 17 Oct 2016 21:41:08 +0100, Ralph Corderoy writes:
>Hi Laura,
>
>> Just giving them utf-8 even though that wasn't what they asked for has
>> fixed a huge number of headaches when running mailing lists around
>> here.
>
>Does that mailing-list software check that what they're sending out is
>valid for the encoding they claim, UTF-8?  Or when replying to an ISO
>8859-13 do they send invalid UTF-8 back?
>
>-- 
>Cheers, Ralph.
>https://plus.google.com/+RalphCorderoy

It does some checking.  I'm pretty sure that I could construct some cases
that would break it, but there was no great demand for that.  While, on
the other hand, mail that claims to be US-ASCII but is really UTF-8
happens all the time.  The weekly report from the Python bug tracker,
for instance, insists that its mail is US-ASCII no matter how many bug
reports that I send about the fact that people are signing their bug
reports with their non-ASCII names.

Things may be different where you are, but around here, when there is a
difference of opinion between what the encoding says, and what the
content has inside it, and the encoding is US-ASCII, the encoding is
always wrong.  "You got an 8-bit char in your mail by mistake" is not
a common problem here, and, when it does occur, it's not a problem that
people care about, or rather they care about it just as much as they do
about any other typo in their mail -- if they didn't run the thing
through a spell checker, then it wasn't one of those important pieces
of mail where perfection in content matters, but more like this one,
where if I make a typo, I will trust that you will suffer through
it with no long term ill effects.

Laura



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a
>goer.  Valid UTF-8 and valid GB2312 can share the same sequences,
>especially if it's just the odd `£' or `拢` in ASCII text.

It was just a suggestion, not one I was particularly crazy about ... but
not all arbitrary 8-bit sequences are valid UTF-8.  And it looks like
for GB2312 (using the EUC-CN encoding, right?) it would be harder, but
there are certainly invalid sequences for GB2312.  Although I do not
think this is a business we should be in; pick your locale properly
or explicitly specify a character set in the draft.

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>Well, whether you intentionally type any and whether some happen to creep
>into your email are two different things.  As an example: I am suspicious
>now that my problem really stemmed from exmh choosing to use both -push
>and -forward; the latter is documented as "If -forward is given, then a
>copy of the draft will be attached to this failure notice."  So I am
>thinking that it stuck the UTF8-containing text onto the failure notice,
>and then that send attempt failed for exactly the same reason, ie it was
>rejected by the character set strictness check.  Even if you're right that
>there was no send attempt at all, I'm expecting that once it's there
>it will fail like this :-(

Well, I looked.  It's important to understand the workflow here, and how
things have evolved over time.

First, you have to have special handling to handle -push in send; I didn't
implement that for mhbuild.  So yeah, mhbuild failing and not sending a
notification email is a bug.  At least if that had worked, you would have
gotten something in dead.letter if it couldn't send it.

So, current workflow.  A user creates a "draft file" by whatever means.
Then it gets passed to send(1).  send's job is to turn the draft file
into an RFC 5322-compliant message and then send it to post(8).  That is
done by calling mhbuild(1) on the draft.  This used to be optional; the
result was that nmh users could very easily be sending out messages that
weren't MIME compliant (and that happened a lot).

>So basically the problem here is one of robustness.  Yeah, it would be
>nice to be sure that what you are sending is 100% valid.  But I don't
>really agree with the tradeoff that's been made of failing when you
>can't be sure of that.  Especially since, if you think you know what
>non-ASCII encoding a bit of text is in, you're just fooling yourself
>anyway.  It's impossible to distinguish the ISO 8859 variants from
>each other, and at best heuristic to tell whether text is in UTF-8
>or an ISO 8859 variant.

I agree that we can't reasonably know what the character set is supposed
to be in that case.  But I would say that given the choice between
sending 'something wrong' and 'erroring out', 'erroring out' is the more
correct option.  But I would be interested in hearing what other people
think.

>Maybe we could just leave off the character set spec if it turns out to
>be definitely wrong?

As Ralph pointed out, that means the same as us-ascii ... and we know
that's wrong.  Before, it looks like we would generate a character set
of x-unknown; I'm not in love with that either.  Really, it seems like
this exposes something wrong that the user should correct.  Also, if
you're forwarding messages with 8-bit characters, you should really be
using forw -mime.

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>The problem is that most of the time people who have the locale US-ASCII
>set do so when what they want is 'English, US or Brit doesn't matter,
>but keep all the extra characters that other people are using when,
>for instance writing their names'.  They don't want their mail to fall over
>because they are replying to Åsa Krigström in Nyköping.

But it seems like en_US.UTF-8 (or en_GB.UTF-8) is what they really want,
then.  I just don't feel comfortable about assuming a character set.  Also,
at least with nmh if they just have LANG=C, they're not going to be able
to SEE any 8-bit characters.

>Just giving them utf-8 even though that wasn't what they asked for
>has fixed a huge number of headaches when running mailing lists
>around here.

That's not exactly the same problem, is it?  Also, how did that work?
Did the mailing lists unilaterally just convert everything to utf-8?

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ralph Corderoy
Hi Laura,

> Just giving them utf-8 even though that wasn't what they asked for has
> fixed a huge number of headaches when running mailing lists around
> here.

Does that mailing-list software check that what they're sending out is
valid for the encoding they claim, UTF-8?  Or when replying to an ISO
8859-13 do they send invalid UTF-8 back?

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Laura Creighton
In a message of Mon, 17 Oct 2016 13:42:37 -0400, Ken Hornstein writes:
>Also, it kind of strikes me as the wrong solution, and not just because
>of the additional complexity.  The locale setting is supposed to
>indicate to utilities which character set you're using.  So we (rather
>reasonably, I would argue) use that in nmh to determine the character
>set for input and display.  If you're putting an 8-bit character into
>a message when you've told us that you are always going to be sending
>US-ASCII ... well, what are we supposed to do?  That seems like an error
>condition to me.  You can explicitly override the character set in your
>draft for a single message (see mhbuild(1)) if you want to do something
>different for individual messages, but absent that I think going with the
>locale character set is the only solution.
>
>--Ken

The problem is that most of the time people who have the locale US-ASCII
set do so when what they want is 'English, US or Brit doesn't matter,
but keep all the extra characters that other people are using when,
for instance writing their names'.  They don't want their mail to fall over
because they are replying to Åsa Krigström in Nyköping.

Just giving them utf-8 even though that wasn't what they asked for
has fixed a huge number of headaches when running mailing lists
around here.

Laura


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ralph Corderoy
Hi Tom,

> But I don't really agree with the tradeoff that's been made of failing
> when you can't be sure of that.

I don't think nmh should be complicit in you putting an RFC-invalid
email onto the wire;  there's enough of them in the world already.  :-)

> Especially since, if you think you know what non-ASCII encoding a bit
> of text is in, you're just fooling yourself anyway.

Perhaps I'm misremembering, but nmh is trusting you to say what the
encoding is, through your locale, and checking the content against that.
If you're lying and get caught out, nmh is reasonable to get shirty?

> Maybe we could just leave off the character set spec if it turns out
> to be definitely wrong?

IIRC the RFCs say an email on the wire like that is US-ASCII.  :-)

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Valdis . Kletnieks
On Mon, 17 Oct 2016 14:35:57 -0400, Tom Lane said:

> Maybe we could just leave off the character set spec if it turns out to
> be definitely wrong?

Non-starter.  What position does that leave the recipient's MUA in?



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ralph Corderoy
Hi Ken,

> I just have a hard time adding a switch to send (or really, any nmh
> utility) when there's already an OS-supported mechanism for overriding
> the locale for individual commands by changing the environment
> variable.

I'm surprised whatnow(1) hasn't grown the ability to prefix the command
with environment-variable assignments;  `LANG=en_GB.utf8 send'.  Oh, and
a `shell' would be nice.  Perhaps a '!' prefix for one-off commands?

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ralph Corderoy
Hi Ken,

> > (3) assume charset=utf-8 (maybe allow this to be overridden in
> > profile)
>
> We already do (1) and (2).  (3) is the problem.  Other people who have
> thoughts on this topic are free to weigh in.  Personally, I believe
> that if you're doing LANG=C, you shouldn't be dealing with any 8-bit
> characters at all.  Isn't that what that means?

Agreed.  I eventually moved from LC_ALL=C to LANG=en_GB.utf8 and it
isn't too painful these days.  GNU grep and others have worked on the
performance hit they had initially and for those times when I do want,
e.g. sort(1), to be in the C locale I use

$ cat ~/bin/C
#! /bin/sh

# LC_ALL has precedence over LANG according to POSIX[1], but we may as
# well stamp out any traces by setting LANG too.
# 1.  The Open Group Base Specifications, Ch. 8 Environment Variables.

LC_ALL=C LANG=C exec -- "$@"
$

BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a
goer.  Valid UTF-8 and valid GB2312 can share the same sequences,
especially if it's just the odd `£' or `拢` in ASCII text.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>The problem I've had with it in the past is that in a situation where *no*
>mail can be sent, you don't get a notification back.  Not much surprise
>there, and I've found that the error message does get left behind in
>a file in the drafts folder.  But this mhbuild failure neither sends
>warning mail nor leaves any file that I can find.

Yeah, that was an oversight on my part; I'll fix that.

>I generally run with LANG=C, which I suppose would have that effect.
>I could probably arrange to override that environment setting while
>calling "send", but it'd be easier if send had a command line switch
>for it.

I just have a hard time adding a switch to send (or really, any nmh utility)
when there's already an OS-supported mechanism for overriding the locale
for individual commands by changing the environment variable.
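
As an illustration of that mechanism, a small sketch of a per-command
override; the helper name with_locale is made up for this example, and
env(1) is used so the assignment never leaks into the calling shell:

```shell
# with_locale LOCALE COMMAND [ARGS...]
# Run COMMAND with LC_ALL set to LOCALE for that one invocation only.
with_locale() {
    loc=$1
    shift
    env LC_ALL="$loc" "$@"
}
```

Something like `with_locale en_US.UTF-8 send` would then run send in a
UTF-8 locale while the rest of the session stays in LANG=C.  Whether the
named locale is actually installed is up to the system.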

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Ken Hornstein  writes:
>> Personally I'd love it if send did something like:
>> (1) if text is entirely 7-bit: specify charset=us-ascii
>> (2) if environment specifies a non-ascii character set, use that
>> (3) assume charset=utf-8 (maybe allow this to be overridden in profile)

> We already do (1) and (2).

OK.

> (3) is the problem.  Other people who have
> thoughts on this topic are free to weigh in.  Personally, I believe that
> if you're doing LANG=C, you shouldn't be dealing with any 8-bit characters
> at all.  Isn't that what that means?

Well, whether you intentionally type any and whether some happen to creep
into your email are two different things.  As an example: I am suspicious
now that my problem really stemmed from exmh choosing to use both -push
and -forward; the latter is documented as "If -forward is given, then a
copy of the draft will be attached to this failure notice."  So I am
thinking that it stuck the UTF8-containing text onto the failure notice,
and then that send attempt failed for exactly the same reason, ie it was
rejected by the character set strictness check.  Even if you're right that
there was no send attempt at all, I'm expecting that once it's there
it will fail like this :-(

So basically the problem here is one of robustness.  Yeah, it would be
nice to be sure that what you are sending is 100% valid.  But I don't
really agree with the tradeoff that's been made of failing when you
can't be sure of that.  Especially since, if you think you know what
non-ASCII encoding a bit of text is in, you're just fooling yourself
anyway.  It's impossible to distinguish the ISO 8859 variants from
each other, and at best heuristic to tell whether text is in UTF-8
or an ISO 8859 variant.

Maybe we could just leave off the character set spec if it turns out to
be definitely wrong?

regards, tom lane


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>Personally I'd love it if send did something like:
>
>(1) if text is entirely 7-bit: specify charset=us-ascii
>
>(2) if environment specifies a non-ascii character set, use that
>
>(3) assume charset=utf-8 (maybe allow this to be overridden in profile)

We already do (1) and (2).  (3) is the problem.  Other people who have
thoughts on this topic are free to weigh in.  Personally, I believe that
if you're doing LANG=C, you shouldn't be dealing with any 8-bit characters
at all.  Isn't that what that means?

--Ken


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Laura Creighton  writes:
> Since us-ascii is a perfect subset of utf-8, is there any reason that nmh
> couldn't take a look at the locale, and if it is us-ascii just use utf-8?

All modern character sets are supersets of us-ascii, so that argument
doesn't really get us far :-(.

Personally I'd love it if send did something like:

(1) if text is entirely 7-bit: specify charset=us-ascii

(2) if environment specifies a non-ascii character set, use that

(3) assume charset=utf-8 (maybe allow this to be overridden in profile)

but I'm not sure anyone else cares enough about it.
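
To make the three steps concrete, a sketch in POSIX shell.  This only
illustrates the proposal above and is not how send or mhbuild behaves;
the function name, its arguments, and the list of charmap names treated
as "ASCII only" are assumptions.

```shell
# choose_charset FILE [FALLBACK]
# Print the charset label the three-step proposal would pick for FILE.
choose_charset() {
    # (1) No bytes outside 7-bit ASCII: label the text us-ascii.
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]; then
        echo us-ascii
        return 0
    fi
    # (2) The environment names a non-ASCII character set: use it.
    cs=$(locale charmap 2>/dev/null)
    case $cs in
        ''|ANSI_X3.4-1968|US-ASCII|ASCII|646)
            ;;   # ASCII-only or unknown locale, fall through to (3)
        *)
            printf '%s\n' "$cs" | tr '[:upper:]' '[:lower:]'
            return 0
            ;;
    esac
    # (3) Otherwise assume utf-8, unless the caller supplies an override.
    printf '%s\n' "${2:-utf-8}"
}
```

Step (3) is a guess by construction: nothing here verifies that the
8-bit bytes really are UTF-8.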

regards, tom lane


Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>Since us-ascii is a perfect subset of utf-8, is there any reason that nmh
>couldn't take a look at the locale, and if it is us-ascii just use utf-8?

Well ... us-ascii is ALSO a perfect subset of iso-8859-1.  Or a whole
lot of character sets, actually.  Some people would argue those are more
correct :-/

I realize we could check to see if a character is a valid utf-8 multibyte
sequence and that's got a very high probability of always being right.  But
what if it isn't; what should we do then?
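
For reference, a sketch of what such a validity check could look like.
It follows the well-formedness rules of RFC 3629 (no overlong forms, no
surrogates, nothing above U+10FFFF).  The function name is_valid_utf8()
is hypothetical and this is not code from nmh; it only answers "is this
valid UTF-8?", not the question of what to do when it is not.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, else 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long min;  /* smallest code point legal at this length */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; min = 0x80;    cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; min = 0x800;   cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; min = 0x10000; cp = c & 0x07;
        } else {
            return 0;       /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;       /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate */

        i += need + 1;
    }
    return 1;
}
```

Typical ISO-8859-1 text fails this check quickly (a lone 0xE9 is a
lead byte with no continuation bytes after it), which is why the
guess is right so often, though not always.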

Also, it kind of strikes me as the wrong solution, and not just because
of the additional complexity.  The locale setting is supposed to
indicate to utilities which character set you're using.  So we (rather
reasonably, I would argue) use that in nmh to determine the character
set for input and display.  If you're putting an 8-bit character into
a message when you've told us that you are always going to be sending
US-ASCII ... well, what are we supposed to do?  That seems like an error
condition to me.  You can explicitly override the character set in your
draft for a single message (see mhbuild(1)) if you want to do something
different for individual messages, but absent that I think going with the
locale character set is the only solution.

--Ken



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Tom Lane
Ken Hornstein writes:
>> Apparently, it's also trying to enforce that by rejecting any
>> non-plain-ASCII content.  This is a real pain, mainly because whatever
>> it's doing isn't playing well with exmh: the post simply silently doesn't
>> happen.  That's several notches below the already pretty awful handling
>> of post errors that I was used to.

> AFAIK, when send doesn't happen you should always get an error, and an
> exit with a non-zero error code.  Certainly when a send fails for me
> with exmh I always know about it.  This is assuming you don't use -push.
> So if this is failing, then that's a bug.  If you're using -push ... well,
> then what is happening is exactly what is supposed to be happening :-/

Yeah, I've been using exmh's "async" mode, which is documented as doing
the send in background and returning errors via email.  I see that this
appears to boil down to adding "-push -forward" to the arguments to send.
If I switch exmh to the "wait" mode and try a failing case, I get a popup
window with

/usr/bin/mhbuild: exit 1
mhbuild: Text content contains 8 bit characters, but character set is US-ASCII

so I guess I'll be changing over to that.

> Hm, in theory I see that you're supposed to get email back when push
> fails.  I'm not sure that's been tested in like forever.  I'm not actually
> sure what is supposed to do that.  Ah, alright ... I see there's an alert()
> function in uip/sendsbr.c.  I suspect we're not calling that if mhbuild
> fails.

The problem I've had with it in the past is that in a situation where *no*
mail can be sent, you don't get a notification back.  Not much surprise
there, and I've found that the error message does get left behind in
a file in the drafts folder.  But this mhbuild failure neither sends
warning mail nor leaves any file that I can find.

> Which would happen if (a) you put an 8-bit character in your draft, and
> (b) your locale is set to US-ASCII.  Nmh takes the character set to use
> out of the user's locale.

I generally run with LANG=C, which I suppose would have that effect.
I could probably arrange to override that environment setting while
calling "send", but it'd be easier if send had a command line switch
for it.
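
A sketch of the environment-override approach as a small C helper: it
runs a command with LC_ALL set in the child only, so the parent keeps
LANG=C.  The name run_with_locale() is hypothetical, and any locale
name passed to it (en_US.UTF-8, say) is an assumption about what is
installed on the system.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with LC_ALL set to `locale` in the child process only.
 * Returns the child's exit status, or -1 if it could not be run or
 * did not exit normally.
 */
int run_with_locale(const char *locale, char *const argv[])
{
    pid_t pid;
    int status;

    assert(locale != NULL && argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: LC_ALL overrides LANG and the other LC_* variables. */
        if (setenv("LC_ALL", locale, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);         /* only reached if exec failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the usual send command line as argv; the parent's
own environment is left untouched.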

Thanks for responding!

regards, tom lane



Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Laura Creighton
In a message of Mon, 17 Oct 2016 13:09:35 -0400, Ken Hornstein writes:
>>So I updated to the new RHEL6 package of nmh 1.6 (had been on 1.5).
>>I've found that it now wants to mime-ify outgoing mail and among
>>other things attaches
>>  Content-type: text/plain; charset="us-ascii"
>>Apparently, it's also trying to enforce that by rejecting any
>>non-plain-ASCII content.  This is a real pain, mainly because whatever
>>it's doing isn't playing well with exmh: the post simply silently doesn't
>>happen.  That's several notches below the already pretty awful handling
>>of post errors that I was used to.
>
>AFAIK, when send doesn't happen you should always get an error, and an
>exit with a non-zero error code.  Certainly when a send fails for me
>with exmh I always know about it.  This is assuming you don't use -push.
>So if this is failing, then that's a bug.  If you're using -push ... well,
>then what is happening is exactly what is supposed to be happening :-/
>
>Hm, in theory I see that you're supposed to get email back when push
>fails.  I'm not sure that's been tested in like forever.  I'm not actually
>sure what is supposed to do that.  Ah, alright ... I see there's an alert()
>function in uip/sendsbr.c.  I suspect we're not calling that if mhbuild
>fails.
>
>>I don't usually compose mail that isn't straight ASCII, but I've already
>>been burnt twice this morning by trying to forward text that included
>>a stray UTF8 character or two.
>>
>>Any suggestions on how to improve this?  Ideally I'd like it to pass
>>through what it's told to, perhaps changing the charset marking to
>>utf8 when necessary.
>
>Well, that's what's supposed to happen, and that's what happens for me.
>
>I have a strong suspicion that if you were to get the error back (e.g.,
>not use -push if you are), it might show something like this:
>
>Text content contains 8 bit characters, but character set is US-ASCII
>
>Which would happen if (a) you put an 8-bit character in your draft, and
>(b) your locale is set to US-ASCII.  Nmh takes the character set to use
>out of the user's locale.  If you're forwarding an email without using
>MIME forwarding, then nmh doesn't have any idea what the character set
>should be; that might be a problem because it could guess wrong.
>
>Solutions include:
>
>- Using MIME forwarding (forw -mime)
>- Setting an 8-bit locale, but you might get the character set wrong there.
>
>If things are really crapping out with no error and you're not using -push,
>clearly that's a bug we need to fix.  Also, I guess we should probably send
>an error email if -push is being used and mhbuild fails.
>
>--Ken

Since us-ascii is a perfect subset of utf-8, is there any reason that nmh
couldn't take a look at the locale, and if it is us-ascii just use utf-8?

Laura




Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility

2016-10-17 Thread Ken Hornstein
>So I updated to the new RHEL6 package of nmh 1.6 (had been on 1.5).
>I've found that it now wants to mime-ify outgoing mail and among
>other things attaches
>   Content-type: text/plain; charset="us-ascii"
>Apparently, it's also trying to enforce that by rejecting any
>non-plain-ASCII content.  This is a real pain, mainly because whatever
>it's doing isn't playing well with exmh: the post simply silently doesn't
>happen.  That's several notches below the already pretty awful handling
>of post errors that I was used to.

AFAIK, when send doesn't happen you should always get an error, and an
exit with a non-zero error code.  Certainly when a send fails for me
with exmh I always know about it.  This is assuming you don't use -push.
So if this is failing, then that's a bug.  If you're using -push ... well,
then what is happening is exactly what is supposed to be happening :-/

Hm, in theory I see that you're supposed to get email back when push
fails.  I'm not sure that's been tested in like forever.  I'm not actually
sure what is supposed to do that.  Ah, alright ... I see there's an alert()
function in uip/sendsbr.c.  I suspect we're not calling that if mhbuild
fails.

>I don't usually compose mail that isn't straight ASCII, but I've already
>been burnt twice this morning by trying to forward text that included
>a stray UTF8 character or two.
>
>Any suggestions on how to improve this?  Ideally I'd like it to pass
>through what it's told to, perhaps changing the charset marking to
>utf8 when necessary.

Well, that's what's supposed to happen, and that's what happens for me.

I have a strong suspicion that if you were to get the error back (e.g.,
not use -push if you are), it might show something like this:

Text content contains 8 bit characters, but character set is US-ASCII

Which would happen if (a) you put an 8-bit character in your draft, and
(b) your locale is set to US-ASCII.  Nmh takes the character set to use
out of the user's locale.  If you're forwarding an email without using
MIME forwarding, then nmh doesn't have any idea what the character set
should be; that might be a problem because it could guess wrong.
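
To make the two conditions concrete, here is a rough sketch using the
standard setlocale() and nl_langinfo(CODESET) calls.  It is not the
actual mhbuild code: environment_charset(), names_ascii() and
check_8bit() are hypothetical names, and the set of names treated as
ASCII is an assumption (the C locale's codeset is reported as
ANSI_X3.4-1968 on glibc and under other names elsewhere).

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Charset implied by the environment (LC_ALL, LC_CTYPE, LANG). */
const char *environment_charset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Return 1 if cs is one of the names assumed here to mean US-ASCII. */
static int names_ascii(const char *cs)
{
    return strcasecmp(cs, "US-ASCII") == 0
        || strcasecmp(cs, "ASCII") == 0
        || strcasecmp(cs, "ANSI_X3.4-1968") == 0;
}

/*
 * Return -1 if text contains a byte with the high bit set while the
 * charset is US-ASCII (conditions (a) and (b) together); otherwise 0.
 */
int check_8bit(const unsigned char *text, size_t len, const char *charset)
{
    assert(charset != NULL);
    assert(text != NULL || len == 0);

    if (!names_ascii(charset))
        return 0;           /* 8-bit content is allowed in this charset */

    for (size_t i = 0; i < len; i++)
        if (text[i] & 0x80)
            return -1;      /* 8-bit byte, but charset is US-ASCII */
    return 0;
}
```

Calling check_8bit(draft, len, environment_charset()) under LANG=C
would return -1 for a draft containing a stray UTF-8 character, and 0
for the same draft under a UTF-8 locale.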

Solutions include:

- Using MIME forwarding (forw -mime)
- Setting an 8-bit locale, but you might get the character set wrong there.

If things are really crapping out with no error and you're not using -push,
clearly that's a bug we need to fix.  Also, I guess we should probably send
an error email if -push is being used and mhbuild fails.

--Ken
