Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon wrote:

> I'll add it if/when I need it.

How about I change the LC_ALL in the setlocale() call to LC_CTYPE? Then you can override with $LC_ALL.

David
___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 19, 2016, at 5:51 PM, David Levine wrote:
>
> It's easy to add later if we need,
> hard to take away.

Carry on. I'll add it if/when I need it.
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon wrote:

> The overhead of $NMH_LANG isn't even measurable, and the code path is trivial.

Yes. But, I just don't want to burden the maintainers, and add roughage to the documentation and test suite, for a feature that I don't think will be used. It's easy to add later if we need, hard to take away.

David
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>> 1) command line (-locale)
>> 2) mh-command-specific profile entry (.mh-profile)
>> 3) environment ($NMH_LANG)
>> 4) profile default override (.mh-profile)
>> 5) OS environment default (locale())
>>
>> It's arguable about the ordering of (2) and (3). But if I really needed
>> this level of control in real life, I can't see how I would have both in
>> play.
>
> OK. I'm not going to add a -locale switch, that seems to me to move this
> too close to nmh. I like the locale profile component. Let me ask those
> who might use it: do you also want the NMH_LANG environment variable? I'd
> rather not add a feature that won't be used.

If we drop (1), then (3) is the only alternative, so I would say drop (1), and reverse (2) and (3). The overhead of $NMH_LANG isn't even measurable, and the code path is trivial.
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Wed, 19 Oct 2016 11:38:45 -0400, David Levine writes:

>I'll do 1), but the other steps would be up to you.

Thank you very much.

Laura
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Wed, 19 Oct 2016 11:46:31 -0400, David Levine writes:

>OK. I'm not going to add a -locale switch, that seems to me to move this
>too close to nmh. I like the locale profile component. Let me ask those
>who might use it: do you also want the NMH_LANG environment variable? I'd
>rather not add a feature that won't be used.
>
>David

I don't think I would be using it.

Laura
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
David Levine writes:

> OK. I'm not going to add a -locale switch, that seems to me to move this
> too close to nmh. I like the locale profile component. Let me ask those
> who might use it: do you also want the NMH_LANG environment variable? I'd
> rather not add a feature that won't be used.

For my purposes, a locale profile component would be sufficient.

regards, tom lane
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon wrote:

> I.e., the standard (MH-modified) UNIX model is:
>
> 1) command line (-locale)
> 2) mh-command-specific profile entry (.mh-profile)
> 3) environment ($NMH_LANG)
> 4) profile default override (.mh-profile)
> 5) OS environment default (locale())
>
> It's arguable about the ordering of (2) and (3). But if I really needed this
> level of control in real life, I can't see how I would have both in play.

OK. I'm not going to add a -locale switch, that seems to me to move this too close to nmh. I like the locale profile component. Let me ask those who might use it: do you also want the NMH_LANG environment variable? I'd rather not add a feature that won't be used.

David
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Laura wrote:

> And, of course, I do use attach at the What Now? prompt.

So, another approach to get what you want (8-bit message even if it contains no 8-bit characters) would be to do all this:

1) Add support for locale profile component to nmh.
2) Use latest nmh (HEAD of master) or upcoming nmh 1.7.
3) Add "locale: " to profile.
4a) Add "mhbuild -headerencoding utf-8" to profile. This requires that your SMTP server support SMTPUTF8. And it would have the side effect of enabling EAI (RFC 6531), thereby permitting 8-bit characters in addresses.
or
4b) Put an 8-bit character in the body of every message. You could do this in components files.

I'll do 1), but the other steps would be up to you.

David
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
And, of course, I do use attach at the What Now? prompt.

Laura
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Laura wrote:

> I don't know about Tom, but my problem isn't that I want to send valid
> US-ASCII emails, but that I _never_ want to send valid us-ascii emails.
> And the last thing I want to happen is for nmh to not be able to
> tell what I want, stick in us-ascii (because the thing needs a
> content header, and that's the default),

Just to note that without the Content-Type and MIME-Version headers, the message is assumed to be us-ascii.

> and then give me grief because the mail I assembled with some script somewhere
> contained lots of invalid us-ascii.

mhbuild should do the right thing if you run it after putting everything into your draft. But I understand that some, including me, add to the draft after running mhbuild (mime at the What Now? prompt).

> I thought that mh_profile would be a good place to send a love letter to
> nmh. "Dear nmh. I see that you cleverly concluded that I wanted
> us-ascii. Alas, you were wrong. Just be a good chap and give me utf-8.
> Thank you."

I don't think the locale profile entry (and $NMH_LANG) will do what you want. Perhaps what you want is to do what Ralph does, and put something similar to this in your components files:

    MIME-Version: 1.0
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit

But only if you don't use attach at the What Now? prompt. If you do, I think that we want to figure out how to force mhbuild to always go 8bit, with an appropriate charset. There should be an easy way, such as a signature block with at least one 8-bit character in it. (Or with current HEAD on master and upcoming 1.7, a message header with such.)

David
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Tue, 18 Oct 2016 15:13:49 +0100, Ralph Corderoy writes:

>There's a mechanism for telling a hierarchy of programs their locale;
>environment variables. You're using it, but you're telling some of them
>a different locale to what you really want them to use.

People do this a lot around here. Due to the extremely poor placement of the '{' and '}' keys, right-alt-shifted-7 and right-alt-shifted-0 (yuck!), people who need to type braces at speed pop in and out of us-ascii all the time. I suspect these days there are less invasive ways to say 'change my keyboard layout' but this is what people have become used to.

Laura
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Tom,

> All of these solutions presuppose that this is my problem and not the
> software's. I respectfully disagree.

Me too. :-) There's a mechanism for telling a hierarchy of programs their locale; environment variables. You're using it, but you're telling some of them a different locale to what you really want them to use. That's not the software's fault.

If I use mail(1) here in the C locale to send an email and give it non-ASCII characters then they are ignored, don't make it into the email at all, and that email isn't MIME, thus it's US-ASCII and valid at that because non-ASCII has been (silently) stripped. It offers only environment variables to alter its locale.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>For the moment, I've worked around the problem by launching exmh (and
>nothing else) in en_US.utf8 locale, so that the nmh calls all inherit
>that. But I regard that as a hack not a fix. It affects directory
>listings done by exmh, e.g. in save-to-file dialogs, and there may be
>other side-effects as well; I haven't been using this workaround for
>long enough to know.

If things like the ls sorting order are your concern, as others have pointed out you could simply just use LC_CTYPE; that shouldn't affect any collation ordering.

>All of these solutions presuppose that this is my problem and not the
>software's. I respectfully disagree. I would like it to "just work"
>whether or not there are stray UTF8 characters.

I do not know how we could make it "just work". The OLD solution was to send out incorrectly-formatted messages; that's simply not a reasonable option anymore. It wasn't really a reasonable option for 20 years, honestly, but we're kind of slow with regards to that. You don't want to tell nmh what the character set is (the locale is the EASIEST option, but there are others - all of them involve you changing something).

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Tue, 18 Oct 2016 14:17:24 +0100, Ralph Corderoy writes:

>Hi Tom,
>
>If you just have one long-running Emacs then can't that be in the UTF-8
>locale? Or is your C-needing ls(1) run from inside that?
>
>Have Emacs highlight non-ASCII characters in that mode wherever they
>come from, e.g. paste from web browser? Have a function that maps the
>common ones to ASCII, perhaps using recode(1)? Filter the buffer when
>writing the file, erroring if it can't be written? Then you can send
>valid US-ASCII emails.
>
>--
>Cheers, Ralph.
>https://plus.google.com/+RalphCorderoy

I don't know about Tom, but my problem isn't that I want to send valid US-ASCII emails, but that I _never_ want to send valid us-ascii emails. Even when replying to mail that is encoded us-ascii, or when sitting at a workstation that isn't mine, which has that as its locale, or, has no locale set and is defaulting to C or however else you can get nmh to conclude you want us-ascii ...

And the last thing I want to happen is for nmh to not be able to tell what I want, stick in us-ascii (because the thing needs a content header, and that's the default), and then give me grief because the mail I assembled with some script somewhere contained lots of invalid us-ascii. This, I thought, was what was happening to Tom, or whoever it was who had the bad emacs-nmh experience.

I thought that mh_profile would be a good place to send a love letter to nmh. "Dear nmh. I see that you cleverly concluded that I wanted us-ascii. Alas, you were wrong. Just be a good chap and give me utf-8. Thank you."

Laura
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ralph Corderoy writes:

> If you just have one long-running Emacs then can't that be in the UTF-8
> locale? Or is your C-needing ls(1) run from inside that?

I'd rather not run it in non-C locale (for one thing, as you say, shells run inside it would tend to inherit that locale). And I don't really see what that would change anyway. The nmh calls are made from exmh, which is a sibling not a child of the emacs process.

For the moment, I've worked around the problem by launching exmh (and nothing else) in en_US.utf8 locale, so that the nmh calls all inherit that. But I regard that as a hack not a fix. It affects directory listings done by exmh, e.g. in save-to-file dialogs, and there may be other side-effects as well; I haven't been using this workaround for long enough to know. If it's decided that there will be no solution provided at the nmh level, I'll probably look into injecting extra code to set the locale envvars in exmh's nmh calls.

> Have Emacs highlight non-ASCII characters in that mode wherever they
> come from, e.g. paste from web browser? Have a function that maps the
> common ones to ASCII, perhaps using recode(1)? Filter the buffer when
> writing the file, erroring if it can't be written? Then you can send
> valid US-ASCII emails.

All of these solutions presuppose that this is my problem and not the software's. I respectfully disagree. I would like it to "just work" whether or not there are stray UTF8 characters.

regards, tom lane
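Tom's fallback idea, setting the locale variables only for the nmh child processes, might look roughly like the sketch below. It is a Python illustration rather than exmh's actual Tcl code; `nmh_child_env` is a hypothetical helper, and the default of `en_US.UTF-8` is an assumption.

```python
def nmh_child_env(base_env, ctype="en_US.UTF-8"):
    """Return a copy of base_env in which only the character type differs.

    Hypothetical helper: the caller's own environment is not modified, so
    everything else in the session keeps its existing locale settings.
    """
    env = dict(base_env)
    # LC_ALL takes precedence over LC_CTYPE, so it must not reach the child.
    # Dropping it means the other categories fall back to LC_* or LANG.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = ctype
    return env
```

The returned mapping could then be handed to `subprocess.run(..., env=...)` when a wrapper invokes an nmh command, leaving the parent process in the C locale.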
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Tom,

If you just have one long-running Emacs then can't that be in the UTF-8 locale? Or is your C-needing ls(1) run from inside that?

Have Emacs highlight non-ASCII characters in that mode wherever they come from, e.g. paste from web browser? Have a function that maps the common ones to ASCII, perhaps using recode(1)? Filter the buffer when writing the file, erroring if it can't be written? Then you can send valid US-ASCII emails.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken Hornstein writes:

> As I understand it, Tom said his problem was when he forwarded some
> email to someone else and it contained 8-bit characters. I suspect this
> was done with "forw" (or the Forward button in exmh).

Just for the record, I didn't say that; I rarely use "forw". The more common scenario for me is that I'm replying to someone and quoting bits of their message in-line (as I'm doing here), and the most common specific gotcha is that somebody's using fancy quotes rather than plain ASCII ones in the quoted text.

Most of the text-munging involved in that doesn't use nmh at all AFAIK --- it's all in Emacs MH-Letter mode macros. And the macro for yanking a message into the buffer and prefixing "> " to each line doesn't pay any attention to the headers of said message, so it's not going to absorb any character set attributions from it. Fortunately for me, there's little enough non-UTF8 stuff in my mail traffic that I can afford to ignore the possibility that what I'm quoting isn't UTF8.

I could probably teach the Emacs code to insert a Content-type header just before sending, if the buffer contains any non-ASCII characters. But I don't really see why I should have to, when nmh already contains exactly the logic I need, it's just not packaged in a conveniently-controllable way.

regards, tom lane
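The editor-side workaround Tom describes (insert a Content-type header just before sending if the buffer holds non-ASCII characters) could be sketched as follows. This is a Python illustration, not Emacs Lisp and not nmh's own logic; `label_draft` is a hypothetical name, and the sketch assumes the draft has no MIME headers yet and that any non-ASCII bytes are meant to be UTF-8.

```python
# Headers taken from the components-file suggestion made elsewhere in the thread.
MIME_HEADERS = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
)

def label_draft(draft: bytes) -> bytes:
    """Prepend MIME headers to a draft only when it contains non-ASCII bytes."""
    if draft.isascii():
        # Pure ASCII needs no label: an unlabelled message is read as us-ascii.
        return draft
    # Assumption: non-ASCII content is UTF-8. This raises UnicodeDecodeError
    # if the bytes are not valid UTF-8, rather than mislabelling them.
    draft.decode("utf-8")
    # The draft starts with its header block, so prepending keeps the new
    # lines inside the headers.
    return MIME_HEADERS + draft
```

A draft quoting someone's curly quotes would gain the three headers, while a plain ASCII draft would pass through unchanged.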
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
ken wrote:

> >> (And it occurs to me that even setting the locale properly probably
> >> will not fix your specific problem, as you have described it;
> >> forwarding messages using MIME will).
> >
> >I don't think this got addressed. If a nmh-specific locale doesn't fix
> >the problem then let's not add it.
>
> As I understand it, Tom said his problem was when he forwarded some
> email to someone else and it contained 8-bit characters. I suspect this
> was done with "forw" (or the Forward button in exmh).
>
> Locale settings aside, there's no way for the editor to know that arbitrary
> character from another message is UTF-8, ISO-8859-1, or anything else.
> That information _IS_ in the forwarded message, but with plain old forw
> it's lost. If you use forw -mime, then it all works; the downside there is
> you need to know to run mhbuild on that message.

and both of those things (making "forw -mime" the default, and running mhbuild) are on the list for 1.7, correct? so does that mean that this problem is/was already on the path to being fixed?

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 52.0 degrees)
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>But we aren't. I am saying UTF-8 is the native internal character
>set. What happens at the boundaries becomes everyone else's problem.
>And after all the grief in this discussion over the last five+ years,
>don't you think it should be someone else's problem?

Yeah, this is the part I don't understand. Let's say UTF-8 becomes the native internal character set. What is the gain? I'm perfectly willing to say I am missing something here.

AFAIK, it doesn't help _input_; we still have to convert upon reading a message (even if we store messages in UTF-8, we still have to convert the first time we get them). It doesn't help _output_; we still need to convert to the user's native character set.

While I understand why programs like editors and terminals have a native internal character set, we don't really need to process individual characters like they do. We mostly treat the string data as opaque blobs except for a few circumstances, and those are relatively straightforward to handle.

So, I'm really trying hard to see the gain here. Where we tend to run into problems is the boundary between nmh and the user; I don't see how using UTF-8 internally fixes it.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>> (And it occurs to me that even setting the locale properly probably
>> will not fix your specific problem, as you have described it;
>> forwarding messages using MIME will).
>
>I don't think this got addressed. If a nmh-specific locale doesn't fix
>the problem then let's not add it.

As I understand it, Tom said his problem was when he forwarded some email to someone else and it contained 8-bit characters. I suspect this was done with "forw" (or the Forward button in exmh).

Locale settings aside, there's no way for the editor to know that arbitrary character from another message is UTF-8, ISO-8859-1, or anything else. That information _IS_ in the forwarded message, but with plain old forw it's lost. If you use forw -mime, then it all works; the downside there is you need to know to run mhbuild on that message.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi,

Ken wrote:

> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).

I'd rather have zero. :-) Anything above that surely warrants an nmhlocale(1).

> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it;
> forwarding messages using MIME will).

I don't think this got addressed. If a nmh-specific locale doesn't fix the problem then let's not add it. And if it's only ls's C-locale collating order that's wanted, then why not Paul's solution of nearly-all UTF-8, or a ~/bin/ls?

As a user of non-UTF-8 locales for a long time after the world moved on, it really wasn't that bad switching to it. Paul's done the same. It can be done with exceptions for what still needs C, not the other way around.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 8:28 PM, Tom Lane wrote:
>
> The current state of affairs is that nmh unconditionally assumes that
> non-ASCII input is in the character set specified by the LC_CTYPE
> environment variable (modulo the various ways that that can be specified).
> What I'm suggesting would allow the environment to be overridden by an
> mh_profile entry. There is zero difference from an epistemologic
> standpoint: either way you're trusting the user to know what her data is.

And what I am arguing is that this override might often be on a per-message basis, thus the $NMH_LANG escape for the programs calling the underlying nmh commands.

NMH_LANG might be a horribly inappropriate name, and well met. Figure out the colour of the bikeshed, but at least build the damn thing.
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken Hornstein writes:

> I ... do not think this would solve this particular problem. The issue
> here seems to be a) nmh programs were given 8 bit characters, and b)
> the locale was set to US-ASCII. If you are going to assume that all
> INPUT is unconditionally UTF-8, then yes, that would solve this problem.

Umm ... I think you are attacking a straw man.

The current state of affairs is that nmh unconditionally assumes that non-ASCII input is in the character set specified by the LC_CTYPE environment variable (modulo the various ways that that can be specified). What I'm suggesting would allow the environment to be overridden by an mh_profile entry. There is zero difference from an epistemologic standpoint: either way you're trusting the user to know what her data is.

regards, tom lane
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 8:12 PM, Lyndon Nerenberg wrote:
>
>> If you are going to assume that all
>> INPUT is unconditionally UTF-8,

I don't. Sorry, I missed that on my original rant. LC_CTYPE is what we use inbound to convert un-labelled characters to UTF-8. We still use UTF-8 everywhere, internally.
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> If you are going to assume that all
> INPUT is unconditionally UTF-8, then yes, that would solve this problem.
> But you say above you want to use LANG/LC_CTYPE to convert to UTF-8 on
> input; that would have failed given the problem as stated.

Three points:

1) original input from the nmh user (composition). What I described works.
2) deciphering any external unlabeled content. this cannot be done reliably. as others have said, punt.
3) output well formed content. if we have utf8 internally, we can *always* do that (according to the locale()).

> And like I've said before: I think this effort would a) require a new
> library dependency (for UTF-8 processing, since we couldn't use the
> locale functions anymore)

I can (have already) import that from plan9port.

> and b) result in no gain in functionality.
> And last time we discussed this, people screamed at
> the thought of assuming UTF-8 for input; I interpreted that suggestion
> as a non-starter.

But we aren't. I am saying UTF-8 is the native internal character set. What happens at the boundaries becomes everyone else's problem. And after all the grief in this discussion over the last five+ years, don't you think it should be someone else's problem?
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>i had the same issue, and decided that for me, it was the only thing
>in the way of changing fully to a UTF-8 locale. so i do this:
>
>$ locale
>LANG=en_US.UTF-8
>LANGUAGE=en_US:en
>LC_CTYPE=en_US.utf8

Turning this around ... all we really care about is LC_CTYPE. Everything else could be C.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8
>in the same manner, and process (and store!) everything internally
>as UTF-8, all of this nonsense would go away. Similarly, we could
>convert from UTF-8 -> $LANG/$LC_CTYPE on the way out. And we could ship
>everything off-site with one of only two character sets: ascii, or utf8.

I ... do not think this would solve this particular problem. The issue here seems to be a) nmh programs were given 8 bit characters, and b) the locale was set to US-ASCII. If you are going to assume that all INPUT is unconditionally UTF-8, then yes, that would solve this problem. But you say above you want to use LANG/LC_CTYPE to convert to UTF-8 on input; that would have failed given the problem as stated.

And like I've said before: I think this effort would a) require a new library dependency (for UTF-8 processing, since we couldn't use the locale functions anymore) and b) result in no gain in functionality. Like, I'm squinting really hard here, and I can't see how it would have changed anything. And last time we discussed this, people screamed at the thought of assuming UTF-8 for input; I interpreted that suggestion as a non-starter.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 6:39 PM, Ken Hornstein wrote:
>
> What it refuses to do now is create improperly-formatted email messages
> when it cannot identify the character set. Before it would happily
> send these messages out; THAT has been broken for twenty years and was
> only recently fixed.
>
> And if we're voting ... I would rather have only one additional way to
> specify a nmh-specific locale (well, I'd rather have ZERO additional
> ways, but I think more than one way is overkill).
>
> (And it occurs to me that even setting the locale properly probably
> will not fix your specific problem, as you have described it; forwarding
> messages using MIME will).

The underlying problem is that locales were built before anyone really understood the problem. For one, they assume symmetry on input and output; there is no LC_CTYPE_INPUT and LC_CTYPE_OUTPUT.

This is why Plan9 punted on the entire issue and said UTF-8 everywhere. Do what you want outside, but it's your job to convert to UTF-8 before you talk to or from the tools. And they provided a command line tool to do just that.

If you look at the Plan9 mail system, it's all UTF-8 internally. When mail comes in over the wire, the appropriate MIME charset= parameters are used to convert content to UTF-8 for display (upas/fs takes care of this). By definition, all input is UTF-8.

If we were to use $LANG/$LC_CTYPE to convert incoming data to UTF-8 in the same manner, and process (and store!) everything internally as UTF-8, all of this nonsense would go away. Similarly, we could convert from UTF-8 -> $LANG/$LC_CTYPE on the way out. And we could ship everything off-site with one of only two character sets: ascii, or utf8.

Good grief, even Microsoft has figured this out :-P

Yes, someone has to write the code. Let's ship 1.7 (if Ralph ever stops committing!), then do 1.8 (the SSL/TLS stuff). And then let's branch for 2.0 and go for a top-to-bottom UTF-8 runtime.

I've been pharting around with this for a couple of years now in my own private branch. It's not trivial, but it's doable. And maybe *mh should lead the way again, for the first time in a few decades.

--lyndon
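The "one internal character set, convert at the boundaries" model Lyndon describes can be sketched in a few lines. This is an illustration only, not code from nmh or Plan 9: the function names are hypothetical, and Python's `str` stands in for the internal UTF-8 representation.

```python
def decode_inbound(raw: bytes, charset: str) -> str:
    """Inbound boundary: convert labelled content to the internal form.

    charset would come from the MIME charset= parameter, or from the
    locale for the user's own unlabelled input.
    """
    return raw.decode(charset)

def wire_charset(text: str) -> str:
    """Outbound mail: only two labels are ever needed."""
    return "us-ascii" if text.isascii() else "utf-8"

def encode_for_display(text: str, locale_charset: str) -> bytes:
    """Outbound to the terminal: convert to the user's locale charset.

    Characters the locale cannot represent are replaced with '?',
    which is one possible policy among several.
    """
    return text.encode(locale_charset, errors="replace")
```

Under this model, a Latin-1 message body is decoded once on arrival, handled as text throughout, and only re-encoded when shown to the user or sent off-site.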
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>Well, my problem is that I want the prevailing session locale to be C,
>primarily because I'm used to seeing output from e.g. "ls" in ASCII
>ordering. But I'm finding that nmh, or at least send, is effectively
>broken in that locale --- unwillingness to cope with non-ASCII data
>at all counts as "broken" for me.

You're missing the point; nmh can handle non-ASCII data perfectly fine. I use it that way every day, and so do plenty of others.

What it refuses to do now is create improperly-formatted email messages when it cannot identify the character set. Before it would happily send these messages out; THAT has been broken for twenty years and was only recently fixed.

And if we're voting ... I would rather have only one additional way to specify a nmh-specific locale (well, I'd rather have ZERO additional ways, but I think more than one way is overkill).

(And it occurs to me that even setting the locale properly probably will not fix your specific problem, as you have described it; forwarding messages using MIME will).

--Ken
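The behaviour Ken describes (refuse to build a message when the character set cannot be identified) can be illustrated with a small check. This is a sketch of the idea, not nmh's implementation; `charset_for_body` is a hypothetical name, and `locale_charset` is assumed to be whatever charset the active LC_CTYPE implies.

```python
def charset_for_body(body: bytes, locale_charset: str) -> str:
    """Return the charset label for body, or raise if none can be justified."""
    if body.isascii():
        return "us-ascii"
    try:
        # 8-bit data is only acceptable if it is valid in the locale's charset.
        body.decode(locale_charset)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"body has 8-bit data that is not valid {locale_charset}; "
            "refusing to build the message"
        ) from exc
    return locale_charset
```

In the C locale the implied charset is ASCII, so a reply that quotes curly quotes fails this check, while the same draft passes under a UTF-8 LC_CTYPE. A single-byte charset such as ISO-8859-1 accepts any byte sequence, so a check like this cannot catch mislabelled data there.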
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
tom wrote:

> Lyndon Nerenberg writes:
> > At this point, I think the fact that nobody seems to be able to give a
> > simple, clear, and coherent description of the problem suggests that
> > nobody really knows what the actual problem is, yet.
>
> Well, my problem is that I want the prevailing session locale to be C,
> primarily because I'm used to seeing output from e.g. "ls" in ASCII
> ordering. But I'm finding that nmh, or at least send, is effectively

i had the same issue, and decided that for me, it was the only thing in the way of changing fully to a UTF-8 locale. so i do this:

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en
    LC_CTYPE=en_US.utf8
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE=C <
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

you've likely considered that and dismissed it, but i mention it just in case. i've observed no unwanted side-effects.

> Obviously you could define this as being ls' problem not nmh's problem,
> but I respectfully disagree. It was fine for the last twenty years or
> thereabouts, and nmh changes are what made it not fine.

but i also can't really argue with your logic there.

paul
=--
paul fox, p...@foxharp.boston.ma.us (arlington, ma)
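Paul's setup works because each locale category is resolved separately: LC_ALL wins if set, then the category's own variable, then LANG. The sketch below models that POSIX precedence; `effective_locale` is a hypothetical helper, and the final fallback to "C" is an assumption about the implementation default.

```python
def effective_locale(env, category):
    """Resolve one locale category the way POSIX orders the variables."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty are treated the same
            return value
    return "C"  # assumed implementation default

# The explicitly set variables from the listing above.
paul = {
    "LANG": "en_US.UTF-8",
    "LC_CTYPE": "en_US.utf8",
    "LC_COLLATE": "C",
    "LC_ALL": "",
}
print(effective_locale(paul, "LC_COLLATE"))
print(effective_locale(paul, "LC_CTYPE"))
```

With these settings ls(1) sorts in C order while the character type stays UTF-8. The quoted values in the `locale` output are the categories that fall through to LANG.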
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon wrote:

> > On Oct 17, 2016, at 6:13 PM, David Levine wrote:
> >
> > I'm not a fan of environment variables when there's an alternative.
> > The profile seems like a good home.
>
> $NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy
> that covers pretty much any scenario.

$NMH_LANG just seems like overkill to me. And having all these ways to set locale invites (even more) confusion.

David
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 6:20 PM, Lyndon Nerenberg wrote:
>
> $NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy that
> covers pretty much any scenario.

I.e., the standard (MH-modified) UNIX model is:

1) command line (-locale)
2) mh-command-specific profile entry (.mh-profile)
3) environment ($NMH_LANG)
4) profile default override (.mh-profile)
5) OS environment default (locale())

It's arguable about the ordering of (2) and (3). But if I really needed this level of control in real life, I can't see how I would have both in play.

--lyndon
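The five-level ordering above can be sketched as a simple "first one that is set wins" lookup. This is only an illustration of the proposal, not nmh code: the function name and its parameters are hypothetical, and the `-locale` switch, `$NMH_LANG`, and the profile entries are ideas from this thread rather than existing nmh features.

```python
def resolve_locale(cli=None, command_profile=None, env=None,
                   profile_default=None, os_default="C"):
    """Return the first locale that is set, highest priority first.

    Argument order mirrors the list in the message above:
    1) command line, 2) command-specific profile entry,
    3) environment, 4) profile default, 5) OS default.
    All names here are hypothetical.
    """
    for value in (cli, command_profile, env, profile_default, os_default):
        if value:  # skip None and empty strings
            return value
    return "C"  # nothing set at all: fall back to the C locale


# The environment beats the profile default, and the command line beats both.
print(resolve_locale(env="en_GB.UTF-8", profile_default="en_US.UTF-8"))
print(resolve_locale(cli="sv_SE.UTF-8", env="en_GB.UTF-8"))
```

Swapping (2) and (3), as discussed, would only mean reordering the tuple in the loop.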
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon Nerenberg writes: > At this point, I think the fact that nobody seems to be able to give a > simple, clear, and coherent description of the problem suggests that nobody > really knows what the actual problem is, yet. Well, my problem is that I want the prevailing session locale to be C, primarily because I'm used to seeing output from e.g. "ls" in ASCII ordering. But I'm finding that nmh, or at least send, is effectively broken in that locale --- unwillingness to cope with non-ASCII data at all counts as "broken" for me. So I want a way of controlling the locale used by nmh without side-effects on other programs. Obviously you could define this as being ls' problem not nmh's problem, but I respectfully disagree. It was fine for the last twenty years or thereabouts, and nmh changes are what made it not fine. regards, tom lane ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 6:13 PM, David Levine wrote: > > I'm not a fan of environment variables when there's an alternative. > The profile seems like a good home. $NMH_LANG -> <.mh_profile> -> locale() seems like a reasonable hierarchy that covers pretty much any scenario. --lyndon ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 6:08 PM, Ken Hornstein wrote: > >> What about an $NMH_LANG environment variable that the runtime could >> look for *very* early on, and use to seed $LANG. Seems like minimal >> impact to the rest of the code that's locale aware. Assuming that is >> all consolidated in the library routines. Which I *think* we are ... > > Uhhh ... see my previous email on this subject. Personally, not > interested. At this point, I think the fact that nobody seems to be able to give a simple, clear, and coherent description of the problem suggests that nobody really knows what the actual problem is, yet. I think we are conflating two or three different issues, solely because their side effects look the same. --lyndon ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Lyndon wrote: > What about an $NMH_LANG environment variable that the runtime > could look for *very* early on, and use to seed $LANG. I'm not a fan of environment variables when there's an alternative. The profile seems like a good home. > Seems like minimal impact to the rest of the code that's locale > aware. Assuming that is all consolidated in the library > routines. Which I *think* we are ... Yes, set_locale() is called from one place. David ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken wrote: > >[Tom wrote:] > >I thought Laura's suggestion was to be able to put something like > > > >locale: en_GB.utf8 > > > >into ~/.mh_profile, which seems eminently sensible to me. > > Sigh. I won't object if someone does this work ... but it's not something > I want to tackle, for the aforementioned reasons. Note that it's not > completely straightforward, since a few programs (like post(8)) don't > read the profile. I'll do it. post doesn't call iconv, so I'll assume it doesn't need it. mh-install(1) and slocal(1) don't either, all other MH/nmh programs do. David ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>What about an $NMH_LANG environment variable that the runtime could >look for *very* early on, and use to seed $LANG. Seems like minimal >impact to the rest of the code that's locale aware. Assuming that is >all consolidated in the library routines. Which I *think* we are ... Uhhh ... see my previous email on this subject. Personally, not interested. --Ken ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
> On Oct 17, 2016, at 5:54 PM, Ken Hornstein wrote: > >> I thought Laura's suggestion was to be able to put something like >> >> locale: en_GB.utf8 >> >> into ~/.mh_profile, which seems eminently sensible to me. > > Sigh. I won't object if someone does this work ... but it's not something > I want to tackle, for the aforementioned reasons. Note that it's not > completely straightforward, since a few programs (like post(8)) don't > read the profile. What about an $NMH_LANG environment variable that the runtime could look for *very* early on, and use to seed $LANG. Seems like minimal impact to the rest of the code that's locale aware. Assuming that is all consolidated in the library routines. Which I *think* we are ... --lyndon ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>I thought Laura's suggestion was to be able to put something like > >locale: en_GB.utf8 > >into ~/.mh_profile, which seems eminently sensible to me. Sigh. I won't object if someone does this work ... but it's not something I want to tackle, for the aforementioned reasons. Note that it's not completely straightforward, since a few programs (like post(8)) don't read the profile. --Ken ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken Hornstein writes:
>> I believe that nmh users are sophisticated enough that they actually
>> know what a Locale is, and an encoding, and if they google for more
>> information they will understand what they read. I think that they
>> are going to want a way to specify what they want in their mh_profile
>> though.

> Well ... my reluctance there is I don't want to duplicate Unix
> functionality without a very good reason. And Unix already has umpteen
> ways to do this; you could wrap all of the nmh commands with aliases
> or shell wrappers, for one. It just feels unnecessary.

I thought Laura's suggestion was to be able to put something like

locale: en_GB.utf8

into ~/.mh_profile, which seems eminently sensible to me.

Yes, there are other ways to get the same result, but they're hacks. Wrapping every MH command with a shell wrapper in order to force its locale is surely a hack. The alternative of a session-wide LC/LANG setting may have side-effects that the user doesn't want, so I'd rate that as a hack too.

regards, tom lane
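For illustration, here is a minimal sketch of how a `locale:` entry could be read out of profile text. It assumes the usual "Component: value" layout of ~/.mh_profile and treats component names as case-insensitive; the `locale` component is the proposal from this thread, and `profile_locale` is a hypothetical helper, not an nmh function.

```python
def profile_locale(profile_text, component="locale"):
    """Return the value of the given profile component, or None.

    Assumes one "Component: value" entry per line; continuation
    lines and other profile subtleties are ignored in this sketch.
    """
    for line in profile_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == component.lower():
            return value.strip() or None
    return None


sample = "Path: Mail\nEditor: vi\nlocale: en_GB.utf8\n"
print(profile_locale(sample))
```

A program that finds such an entry could then pass it to its locale setup instead of relying on the session's LANG or LC_* settings.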
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>I believe that nmh users are sophisticated enough that they actually
>know what a Locale is, and an encoding, and if they google for more
>information they will understand what they read. I think that they
>are going to want a way to specify what they want in their mh_profile
>though.

Well ... my reluctance there is I don't want to duplicate Unix functionality without a very good reason. And Unix already has umpteen ways to do this; you could wrap all of the nmh commands with aliases or shell wrappers, for one. It just feels unnecessary.

>We have a terminal room with shared workstations that all have a
>very restricted number of Mathematica licenses. I haven't been there
>for a few years, but it used to be the case that Mathematica wanted
>LC_ALL=C which had the result that people couldn't send mail from
>those terminals using some of their favourite mailers, and moreover
>had absolutely no clue as to what was wrong.

Really? Way to blow it, Wolfram!

>One thing I do not
>know is how common it is, these days, for people to share computers.
>
>A long time ago it was common. Then it became uncommon, as everybody
>used their own personal laptop. These days, at least around here, there
>is a sizable segment of the population which doesn't own anything
>larger than a cell phone or a tablet, so the demand is on, again
>for shared spaces.

We have a very different environment, which is not surprising. For stuff like email, everyone has their own workstation. For higher-power computing, there are a few centralized systems; email works on those systems, but it's generally not used.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Mon, 17 Oct 2016 17:21:18 -0400, Ken Hornstein writes: >I agree that we can't reasonably know what the character set is supposed >to be in that case. But I would say that given the choice between >sending 'something wrong' and 'erroring out', 'erroring out' is the more >correct option. But I would be interested in hearing what other people >think. I believe that nmh users are sophisticated enough that they actually know what a Locale is, and an encoding, and if they google for more information they will understand what they read. I think that they are going to want a way to specify what they want in their mh_profile though. We have a terminal room with shared workstations that all have a very restricted number of Mathematica licenses. I haven't been there for a few years, but it used to be the case that Mathematica wanted LC_ALL=C which had the result that people couldn't send mail from those terminals using some of their favourite mailers, and moreover had absolutely no clue as to what was wrong. One thing I do not know is how common it is, these days, for people to share computers. A long time ago it was common. Then it became uncommon, as everybody used their own personal laptop. These days, at least around here, there is a sizable segment of the population which doesn't own anything larger than a cell phone or a tablet, so the demand is on, again for shared spaces. True where you are? Laura ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Ken,

> > Valid UTF-8 and valid GB2312 can share the same sequences,
> > especially if it's just the odd `£' or `拢' in ASCII text.
>
> It was just a suggestion, not one I was particularly crazy about ...
> but not all arbitrary 8-bit sequences are valid UTF-8.

Oh, agreed.

> And it looks like for GB2312 (using the EUC-CN encoding, right?) it
> would be harder, but there are certainly invalid sequences for GB2312.

Yep. But there's a lot of valid sequences for both that look like each other. UTF-8 for U+00a3, that `£', is U+62e2, `拢', if the UTF-8 0xc2 0xa3 is treated as (EUC-CN) GB2312.

    $ printf '\x00\xa3' |
    > iconv -f ucs-2be -t utf-8 |
    > iconv -f gb2312 -t ucs-2be |
    > hd
    62 e2 |b.|
    0002
    $

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
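The same round trip can be reproduced without iconv. The sketch below uses Python's built-in `utf-8` and `gb2312` codecs to show that the two bytes which encode `£' in UTF-8 are also a valid GB2312 character; the variable names are only illustrative.

```python
# U+00A3 POUND SIGN encoded as UTF-8 is the two bytes 0xc2 0xa3.
pound_utf8 = "\u00a3".encode("utf-8")

# The very same bytes are also a valid (EUC-CN) GB2312 sequence.
as_gb2312 = pound_utf8.decode("gb2312")

print(pound_utf8.hex())          # c2a3
print(hex(ord(as_gb2312)))       # 0x62e2
```

Both decodes succeed, so validity alone cannot tell a receiver which of the two character sets the sender meant.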
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Mon, 17 Oct 2016 21:41:08 +0100, Ralph Corderoy writes: >Hi Laura, > >> Just giving them utf-8 even though that wasn't what they asked for has >> fixed a huge number of headaches when running mailing lists around >> here. > >Does that mailing-list software check that what they're sending out is >valid for the encoding they claim, UTF-8? Or when replying to an ISO >8859-13 do they send invalid UTF-8 back? > >-- >Cheers, Ralph. >https://plus.google.com/+RalphCorderoy It does some checking. I'm pretty sure that I could construct some cases that would break it, but there was no great demand for that. While, on the other hand, mail that claims to be US-ASCII but is really UTF-8 happens all the time. The weekly report from the Python bug tracker, for instance, insists that its mail is US-ASCII no matter how many bug reports that I send about the fact that people are signing their bug reports with their non-ASCII names. Things may be different where you are, but around here, when there is a difference of opinion between what the encoding says, and what the content has inside it, and the encoding is US-ASCII, the encoding is always wrong. "You got an 8-bit char in your mail by mistake" is not a common problem here, and, when it does occur, it's not a problem that people care about, or rather they care about it just as much as they do about any other typo in their mail -- if they didn't run the thing through a spell checker, then it wasn't one of those important pieces of mail where perfection in content matters, but more like this one, where if I make a typo, I will trust that you will suffer through it with no long term ill effects. Laura ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a >goer. Valid UTF-8 and valid GB2312 can share the same sequences, >especially if it's just the odd `£' or `拢` in ASCII text. It was just a suggestion, not one I was particularly crazy about ... but not all arbitrary 8-bit sequences are valid UTF-8. And it looks like for GB2312 (using the EUC-CN encoding, right?) it would be harder, but there are certainly invalid sequences for GB2312. Although I do not think this is a business we should be in; pick your locale properly or explicitly specify a character set in the draft. --Ken ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>Well, whether you intentionally type any and whether some happen to creep >into your email are two different things. As an example: I am suspicious >now that my problem really stemmed from exmh choosing to use both -push >and -forward; the latter is documented as "If -forward is given, then a >copy of the draft will be attached to this failure notice." So I am >thinking that it stuck the UTF8-containing text onto the failure notice, >and then that send attempt failed for exactly the same reason, ie it was >rejected by the character set strictness check. Even if you're right that >there was no send attempt at all, I'm expecting that once it's there >it will fail like this :-( Well, I looked. It's important to understand the workflow here, and how things have evolved over time. First, you have to have special handling to handle -push in send; I didn't implement that for mhbuild. So yeah, mhbuild failing and not sending a notification email is a bug. At least if that had worked, you would have gotten something in dead.letter if it couldn't send it. So, current workflow. A user creates a "draft file" by whatever means. Then it gets passed to send(1). send's job is to turn the draft file into a RFC 5322-compliant message and then send it to post(8). That is done by calling mhbuild(1) on the draft. This used to be optional; the result was that nmh users could very easily be sending out messages that weren't MIME compliant (and that happened a lot). >So basically the problem here is one of robustness. Yeah, it would be >nice to be sure that what you are sending is 100% valid. But I don't >really agree with the tradeoff that's been made of failing when you >can't be sure of that. Especially since, if you think you know what >non-ASCII encoding a bit of text is in, you're just fooling yourself >anyway. It's impossible to distinguish the ISO 8859 variants from >each other, and at best heuristic to tell whether text is in UTF-8 >or an ISO 8859 variant. 
I agree that we can't reasonably know what the character set is supposed to be in that case. But I would say that given the choice between sending 'something wrong' and 'erroring out', 'erroring out' is the more correct option. But I would be interested in hearing what other people think.

>Maybe we could just leave off the character set spec if it turns out to
>be definitely wrong?

As Ralph pointed out, that means the same as us-ascii ... and we know that's wrong. Before, it looks like we would generate a character set of x-unknown; I'm not in love with that either. Really, it seems like this exposes something wrong that the user should correct.

Also, if you're forwarding messages with 8-bit characters, you should really be using forw -mime.

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>The problem is that most of the time people who have the locale US-ASCII
>set do so when what they want is 'English, US or Brit doesn't matter,
>but keep all the extra characters that other people are using when,
>for instance writing their names'. They don't want their mail to fall over
>because they are replying to Åsa Krigström in Nyköping.

But it seems like en_US.UTF-8 (or en_GB.UTF-8) is what they really want, then. I just don't feel comfortable about assuming a character set. Also, at least with nmh if they just have LANG=C, they're not going to be able to SEE any 8-bit characters.

>Just giving them utf-8 even though that wasn't what they asked for
>has fixed a huge number of headaches when running mailing lists
>around here.

That's not exactly the same problem, is it? Also, how did that work? Did the mailing lists unilaterally just convert everything to utf-8?

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Laura, > Just giving them utf-8 even though that wasn't what they asked for has > fixed a huge number of headaches when running mailing lists around > here. Does that mailing-list software check that what they're sending out is valid for the encoding they claim, UTF-8? Or when replying to an ISO 8859-13 do they send invalid UTF-8 back? -- Cheers, Ralph. https://plus.google.com/+RalphCorderoy ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Mon, 17 Oct 2016 13:42:37 -0400, Ken Hornstein writes: >Also, it kind of strikes me as the wrong solution, and not just because >of the additional complexity. The locale setting is supposed to >indicate to utilities which character set you're using. So we (rather >reasonably, I would argue) use that in nmh to determine the character >set for input and display. If you're putting an 8-bit character into >a message when you've told us that you are always going to be sending >US-ASCII ... well, what are we supposed to do? That seems like an error >condition to me. You can explicitly override the character set in your >draft for a single message (see mhbuild(1)) if you want to do something >different for individual messages, but absent that I think going with the >locale character set is the only solution. > >--Ken The problem is that most of the time people who have the locale US-ASCII set do so when what they want is 'English, US or Brit doesn't matter, but keep all the extra characters that other people are using when, for instance writing their names'. They don't want their mail to fall over because they are replying to Åsa Krigström in Nyköping. Just giving them utf-8 even though that wasn't what they asked for has fixed a huge number of headaches when running mailing lists around here. Laura ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Tom,

> But I don't really agree with the tradeoff that's been made of failing
> when you can't be sure of that.

I don't think nmh should be complicit in your putting an RFC-invalid email onto the wire; there's enough of them in the world already. :-)

> Especially since, if you think you know what non-ASCII encoding a bit
> of text is in, you're just fooling yourself anyway.

Perhaps I'm misremembering, but nmh is trusting you to say what the encoding is, through your locale, and checking the content against that. If you're lying and get caught out, isn't it reasonable for nmh to get shirty?

> Maybe we could just leave off the character set spec if it turns out
> to be definitely wrong?

IIRC the RFCs say an email on the wire like that is US-ASCII. :-)

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
On Mon, 17 Oct 2016 14:35:57 -0400, Tom Lane said:

> Maybe we could just leave off the character set spec if it turns out to
> be definitely wrong?

Non-starter. What position does that leave the recipient's MUA in?
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Ken, > I just have a hard time adding a switch to send (or really, any nmh > utility) when there's already OS-supported mechanism for overriding > the locale for individual commands by changing the environment > variable. I'm surprised whatnow(1) hasn't grown the ability to prefix the command with environment-variable assignments; `LANG=en_GB.utf8 send'. Oh, and a `shell' would be nice. Perhaps a '!' prefix for one-off commands? -- Cheers, Ralph. https://plus.google.com/+RalphCorderoy ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Hi Ken,

> > (3) assume charset=utf-8 (maybe allow this to be overridden in
> > profile)
>
> We already do (1) and (2). (3) is the problem. Other people who have
> thoughts on this topic are free to weigh in. Personally, I believe
> that if you're doing LANG=C, you shouldn't be dealing with any 8-bit
> characters at all. Isn't that what that means?

Agreed. I eventually moved from LC_ALL=C to LANG=en_GB.utf8 and it isn't too painful these days. GNU grep and others have worked on the performance hit they had initially and for those times when I do want, e.g. sort(1), to be in the C locale I use

    $ cat ~/bin/C
    #! /bin/sh

    # LC_ALL has precedence over LANG according to POSIX[1], but we may as
    # well stamp out any traces by setting LANG too.
    # 1. The Open Group Base Specifications, Ch. 8 Environment Variables.

    LC_ALL=C LANG=C exec -- "$@"
    $

BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a goer. Valid UTF-8 and valid GB2312 can share the same sequences, especially if it's just the odd `£' or `拢' in ASCII text.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>The problem I've had with it in the past is that in a situation where *no*
>mail can be sent, you don't get a notification back. Not much surprise
>there, and I've found that the error message does get left behind in
>a file in the drafts folder. But this mhbuild failure neither sends
>warning mail nor leaves any file that I can find.

Yeah, that was an oversight on my part; I'll fix that.

>I generally run with LANG=C, which I suppose would have that effect.
>I could probably arrange to override that environment setting while
>calling "send", but it'd be easier if send had a command line switch
>for it.

I just have a hard time adding a switch to send (or really, any nmh utility) when there's already an OS-supported mechanism for overriding the locale for individual commands by changing the environment variable.

--Ken
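The per-command override mentioned here amounts to running one child process with a modified copy of the environment, leaving the session itself untouched. A minimal sketch, assuming a front end that launches commands itself; `run_with_locale` is a hypothetical helper, and the child command below is a stand-in for something like send.

```python
import os
import subprocess
import sys


def run_with_locale(argv, locale_name):
    """Run argv with LC_ALL overridden for that one process only."""
    env = dict(os.environ, LC_ALL=locale_name)  # copy, do not mutate os.environ
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Stand-in child: report the LC_ALL value it was started with.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
result = run_with_locale(child, "C")
print(result.stdout.strip())
```

This is the programmatic equivalent of typing the assignment in front of a single command at the shell prompt.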
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken Hornstein writes: >> Personally I'd love it if send did something like: >> (1) if text is entirely 7-bit: specify charset=us-ascii >> (2) if environment specifies a non-ascii character set, use that >> (3) assume charset=utf-8 (maybe allow this to be overridden in profile) > We already do (1) and (2). OK. > (3) is the problem. Other people who have > thoughts on this topic are free to weigh in. Personally, I believe that > if you're doing LANG=C, you shouldn't be dealing with any 8-bit characters > at all. Isn't that's what that means? Well, whether you intentionally type any and whether some happen to creep into your email are two different things. As an example: I am suspicious now that my problem really stemmed from exmh choosing to use both -push and -forward; the latter is documented as "If -forward is given, then a copy of the draft will be attached to this failure notice." So I am thinking that it stuck the UTF8-containing text onto the failure notice, and then that send attempt failed for exactly the same reason, ie it was rejected by the character set strictness check. Even if you're right that there was no send attempt at all, I'm expecting that once it's there it will fail like this :-( So basically the problem here is one of robustness. Yeah, it would be nice to be sure that what you are sending is 100% valid. But I don't really agree with the tradeoff that's been made of failing when you can't be sure of that. Especially since, if you think you know what non-ASCII encoding a bit of text is in, you're just fooling yourself anyway. It's impossible to distinguish the ISO 8859 variants from each other, and at best heuristic to tell whether text is in UTF-8 or an ISO 8859 variant. Maybe we could just leave off the character set spec if it turns out to be definitely wrong? regards, tom lane ___ Nmh-workers mailing list Nmh-workers@nongnu.org https://lists.nongnu.org/mailman/listinfo/nmh-workers
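The point about the ISO 8859 variants can be shown with a single byte. In the sketch below, which uses Python's built-in `iso8859-1` and `iso8859-15` codecs, the same byte decodes cleanly under both character sets and yields different characters; nothing in the data says which reading was intended.

```python
raw = b"\xa4"  # one 8-bit byte, no other context

as_latin1 = raw.decode("iso8859-1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso8859-15")   # U+20AC EURO SIGN

print(hex(ord(as_latin1)), hex(ord(as_latin9)))
```

Neither decode raises an error, which is why a receiver can only go by the declared charset.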
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>Personally I'd love it if send did something like:
>
>(1) if text is entirely 7-bit: specify charset=us-ascii
>
>(2) if environment specifies a non-ascii character set, use that
>
>(3) assume charset=utf-8 (maybe allow this to be overridden in profile)

We already do (1) and (2). (3) is the problem. Other people who have thoughts on this topic are free to weigh in. Personally, I believe that if you're doing LANG=C, you shouldn't be dealing with any 8-bit characters at all. Isn't that what that means?

--Ken
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Laura Creighton writes:
> Since us-ascii is a perfect subset of utf-8, is there any reason that nmh
> couldn't take a look at the locale, and if it is us-ascii just use utf-8?

All modern character sets are supersets of us-ascii, so that argument doesn't really get us far :-(.

Personally I'd love it if send did something like:

(1) if text is entirely 7-bit: specify charset=us-ascii

(2) if environment specifies a non-ascii character set, use that

(3) assume charset=utf-8 (maybe allow this to be overridden in profile)

but I'm not sure anyone else cares enough about it.

regards, tom lane
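The three steps proposed above can be written down as a small decision function. This is a sketch of the proposal only, not of what send or mhbuild does; `choose_charset` and its parameters are hypothetical, and the list of names treated as "ASCII" (including `ANSI_X3.4-1968`, which glibc reports as the codeset of the C locale) is an assumption.

```python
ASCII_NAMES = {"us-ascii", "ascii", "ansi_x3.4-1968"}  # assumed spellings


def choose_charset(body: bytes, locale_charset: str,
                   fallback: str = "utf-8") -> str:
    """Pick a charset label following steps (1) to (3) above."""
    if all(byte < 0x80 for byte in body):
        return "us-ascii"                  # (1) text is entirely 7-bit
    if locale_charset.lower() not in ASCII_NAMES:
        return locale_charset              # (2) locale names a non-ASCII set
    return fallback                        # (3) assume utf-8 (overridable)


print(choose_charset(b"plain text", "ANSI_X3.4-1968"))
print(choose_charset("Nyk\u00f6ping".encode("utf-8"), "ISO-8859-1"))
print(choose_charset("Nyk\u00f6ping".encode("utf-8"), "ANSI_X3.4-1968"))
```

Step (3) is the contested one: the function labels the text without checking that the bytes really are in the fallback charset.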
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>Since us-ascii is a perfect subset of utf-8, is there any reason that nmh
>couldn't take a look at the locale, and if it is us-ascii just use utf-8?

Well ... us-ascii is ALSO a perfect subset of iso-8859-1. Or a whole lot of character sets, actually. Some people would argue those are more correct :-/

I realize we could check to see if a character is a valid utf-8 multibyte sequence and that's got a very high probability of always being right. But what if it isn't; what should we do then?

Also, it kind of strikes me as the wrong solution, and not just because of the additional complexity. The locale setting is supposed to indicate to utilities which character set you're using. So we (rather reasonably, I would argue) use that in nmh to determine the character set for input and display. If you're putting an 8-bit character into a message when you've told us that you are always going to be sending US-ASCII ... well, what are we supposed to do? That seems like an error condition to me. You can explicitly override the character set in your draft for a single message (see mhbuild(1)) if you want to do something different for individual messages, but absent that I think going with the locale character set is the only solution.

--Ken
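The validity check mentioned here is easy to sketch: try a strict UTF-8 decode and see whether it fails. The helper name `is_valid_utf8` is hypothetical; the decode call is Python's standard one, which rejects malformed sequences by default.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")   # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("Nyk\u00f6ping".encode("utf-8")))       # True
print(is_valid_utf8("Nyk\u00f6ping".encode("iso-8859-1")))  # False
```

A False result settles that the text is not UTF-8; a True result only says it could be, which leaves the "what if it isn't" question open.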
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
Ken Hornstein writes:
>> Apparently, it's also trying to enforce that by rejecting any
>> non-plain-ASCII content. This is a real pain, mainly because whatever
>> it's doing isn't playing well with exmh: the post simply silently doesn't
>> happen. That's several notches below the already pretty awful handling
>> of post errors that I was used to.

> AFAIK, when send doesn't happen you should always get an error, and an
> exit with a non-zero error code. Certainly when a send fails for me
> with exmh I always know about it. This is assuming you don't use -push.
> So if this is failing, then that's a bug. If you're using -push ... well,
> then what is happening is exactly what is supposed to be happening :-/

Yeah, I've been using exmh's "async" mode, which is documented as doing
the send in background and returning errors via email. I see that this
appears to boil down to adding "-push -forward" to the arguments to send.
If I switch exmh to the "wait" mode and try a failing case, I get a popup
window with

    /usr/bin/mhbuild: exit 1
    mhbuild: Text content contains 8 bit characters, but character set is US-ASCII

so I guess I'll be changing over to that.

> Hm, in theory I see that you're supposed to get email back when push
> fails. I'm not sure that's been tested in like forever. I'm not actually
> sure what is supposed to do that. Ah, alright ... I see there's an alert()
> function in uip/sendsbr.c. I suspect we're not calling that if mhbuild
> fails.

The problem I've had with it in the past is that in a situation where
*no* mail can be sent, you don't get a notification back. Not much
surprise there, and I've found that the error message does get left
behind in a file in the drafts folder. But this mhbuild failure neither
sends warning mail nor leaves any file that I can find.

> Which would happen if (a) you put an 8-bit character in your draft, and
> (b) your locale is set to US-ASCII. Nmh takes the character set to use
> out of the user's locale.

I generally run with LANG=C, which I suppose would have that effect.
I could probably arrange to override that environment setting while
calling "send", but it'd be easier if send had a command line switch
for it.

Thanks for responding!

regards, tom lane
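The environment override mentioned above can be limited to the send process alone, leaving the rest of the session at LANG=C. What follows is only a sketch under stated assumptions: the helper names `env_with_ctype` and `run_send` are hypothetical, and the locale name `en_US.UTF-8` is an example that must actually be installed on the system. It relies on the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG), so LC_ALL is removed from the child's environment.

```python
import os
import subprocess


def env_with_ctype(base_env, ctype_locale):
    """Return a copy of base_env in which LC_CTYPE selects ctype_locale.

    LC_ALL takes precedence over LC_CTYPE, so it is dropped from the copy.
    Side effect: the other locale categories then fall back to their own
    LC_* variables or to LANG.
    """
    env = dict(base_env)
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = ctype_locale
    return env


def run_send(args, ctype_locale="en_US.UTF-8"):
    """Hypothetical wrapper: run send with only the character type overridden."""
    env = env_with_ctype(os.environ, ctype_locale)
    # Return the exit status so the caller can notice a failed send.
    return subprocess.run(["send", *args], env=env).returncode
```

Setting only LC_CTYPE keeps message formats, collation and dates in the C locale, which is probably what someone who deliberately runs with LANG=C wants.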
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
In a message of Mon, 17 Oct 2016 13:09:35 -0400, Ken Hornstein writes:
>>So I updated to the new RHEL6 package of nmh 1.6 (had been on 1.5).
>>I've found that it now wants to mime-ify outgoing mail and among
>>other things attaches
>>    Content-type: text/plain; charset="us-ascii"
>>Apparently, it's also trying to enforce that by rejecting any
>>non-plain-ASCII content. This is a real pain, mainly because whatever
>>it's doing isn't playing well with exmh: the post simply silently doesn't
>>happen. That's several notches below the already pretty awful handling
>>of post errors that I was used to.
>
>AFAIK, when send doesn't happen you should always get an error, and an
>exit with a non-zero error code. Certainly when a send fails for me
>with exmh I always know about it. This is assuming you don't use -push.
>So if this is failing, then that's a bug. If you're using -push ... well,
>then what is happening is exactly what is supposed to be happening :-/
>
>Hm, in theory I see that you're supposed to get email back when push
>fails. I'm not sure that's been tested in like forever. I'm not actually
>sure what is supposed to do that. Ah, alright ... I see there's an alert()
>function in uip/sendsbr.c. I suspect we're not calling that if mhbuild
>fails.
>
>>I don't usually compose mail that isn't straight ASCII, but I've already
>>been burnt twice this morning by trying to forward text that included
>>a stray UTF8 character or two.
>>
>>Any suggestions on how to improve this? Ideally I'd like it to pass
>>through what it's told to, perhaps changing the charset marking to
>>utf8 when necessary.
>
>Well, that's what's supposed to happen, and that's what happens for me.
>
>I have a strong suspicion that if you were to get the error back (e.g.,
>not use -push if you are), it might show something like this:
>
>Text content contains 8 bit characters, but character set is US-ASCII
>
>Which would happen if (a) you put an 8-bit character in your draft, and
>(b) your locale is set to US-ASCII. Nmh takes the character set to use
>out of the user's locale. If you're forwarding an email without using
>MIME forwarding, then nmh doesn't have any idea what the character set
>should be; that might be a problem because it could guess wrong.
>
>Solutions include:
>
>- Using MIME forwarding (forw -mime)
>- Setting an 8-bit locale, but you might get the character set wrong there.
>
>If things are really crapping out with no error and you're not using -push,
>clearly that's a bug we need to fix. Also, I guess we should probably send
>an error email if -push is being used and mhbuild fails.
>
>--Ken

Since us-ascii is a perfect subset of utf-8, is there any reason that
nmh couldn't take a look at the locale, and if it is us-ascii just use
utf-8?

Laura
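The fallback asked about here can be stated as a small decision rule. The sketch below illustrates that suggestion only and is not how nmh behaves: the function name `pick_charset` is hypothetical, and the policy of refusing bytes that are not valid UTF-8 is an added assumption. `ANSI_X3.4-1968` is included because glibc reports that codeset name for the C locale.

```python
ASCII_NAMES = {"us-ascii", "ascii", "ansi_x3.4-1968"}


def pick_charset(body: bytes, locale_charset: str) -> str:
    """Choose a charset label for a text part (illustrative policy only)."""
    # Pure 7-bit content can always be labelled us-ascii.
    if all(b < 0x80 for b in body):
        return "us-ascii"
    if locale_charset.lower() in ASCII_NAMES:
        # Suggested fallback: an ASCII locale is treated as UTF-8,
        # but only when the bytes really are well-formed UTF-8.
        try:
            body.decode("utf-8")
        except UnicodeDecodeError:
            raise ValueError("8-bit content is not valid UTF-8; charset unknown")
        return "utf-8"
    # Any other locale charset is trusted as-is.
    return locale_charset.lower()
```

The validity check matters because us-ascii being a subset of utf-8 only guarantees that 7-bit text is safe; stray 8-bit bytes from, say, an ISO-8859-1 message would be mislabelled if the fallback were applied blindly.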
Re: [Nmh-workers] nmh 1.6: character set checks and exmh compatibility
>So I updated to the new RHEL6 package of nmh 1.6 (had been on 1.5).
>I've found that it now wants to mime-ify outgoing mail and among
>other things attaches
>    Content-type: text/plain; charset="us-ascii"
>Apparently, it's also trying to enforce that by rejecting any
>non-plain-ASCII content. This is a real pain, mainly because whatever
>it's doing isn't playing well with exmh: the post simply silently doesn't
>happen. That's several notches below the already pretty awful handling
>of post errors that I was used to.

AFAIK, when send doesn't happen you should always get an error, and an
exit with a non-zero error code. Certainly when a send fails for me
with exmh I always know about it. This is assuming you don't use -push.
So if this is failing, then that's a bug. If you're using -push ... well,
then what is happening is exactly what is supposed to be happening :-/

Hm, in theory I see that you're supposed to get email back when push
fails. I'm not sure that's been tested in like forever. I'm not actually
sure what is supposed to do that. Ah, alright ... I see there's an alert()
function in uip/sendsbr.c. I suspect we're not calling that if mhbuild
fails.

>I don't usually compose mail that isn't straight ASCII, but I've already
>been burnt twice this morning by trying to forward text that included
>a stray UTF8 character or two.
>
>Any suggestions on how to improve this? Ideally I'd like it to pass
>through what it's told to, perhaps changing the charset marking to
>utf8 when necessary.

Well, that's what's supposed to happen, and that's what happens for me.

I have a strong suspicion that if you were to get the error back (e.g.,
not use -push if you are), it might show something like this:

Text content contains 8 bit characters, but character set is US-ASCII

Which would happen if (a) you put an 8-bit character in your draft, and
(b) your locale is set to US-ASCII. Nmh takes the character set to use
out of the user's locale. If you're forwarding an email without using
MIME forwarding, then nmh doesn't have any idea what the character set
should be; that might be a problem because it could guess wrong.

Solutions include:

- Using MIME forwarding (forw -mime)
- Setting an 8-bit locale, but you might get the character set wrong there.

If things are really crapping out with no error and you're not using -push,
clearly that's a bug we need to fix. Also, I guess we should probably send
an error email if -push is being used and mhbuild fails.

--Ken
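The failure condition described above, (a) an 8-bit character in the draft and (b) a locale whose character set is US-ASCII, can be reproduced outside nmh. The following is a rough sketch, not mhbuild's actual code: `locale_charset` and `check_draft` are hypothetical names, and `locale.nl_langinfo` is only available on POSIX systems.

```python
import locale

ASCII_CODESETS = {"us-ascii", "ascii", "ansi_x3.4-1968"}


def locale_charset() -> str:
    """Return the codeset of the user's locale (POSIX only)."""
    # Adopt the locale from the environment, then ask for its codeset.
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def check_draft(body: bytes, charset: str) -> None:
    """Raise if the body has 8-bit bytes but the charset is US-ASCII."""
    has_8bit = any(b >= 0x80 for b in body)
    if has_8bit and charset.lower() in ASCII_CODESETS:
        raise ValueError(
            "Text content contains 8 bit characters, "
            "but character set is US-ASCII"
        )
```

Calling `check_draft(draft_bytes, locale_charset())` under LANG=C with a draft containing a stray UTF-8 character should raise, while the same draft under a UTF-8 locale should pass.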
[Nmh-workers] nmh 1.6: character set checks and exmh compatibility
So I updated to the new RHEL6 package of nmh 1.6 (had been on 1.5).
I've found that it now wants to mime-ify outgoing mail and among
other things attaches

    Content-type: text/plain; charset="us-ascii"

Apparently, it's also trying to enforce that by rejecting any
non-plain-ASCII content. This is a real pain, mainly because whatever
it's doing isn't playing well with exmh: the post simply silently doesn't
happen. That's several notches below the already pretty awful handling
of post errors that I was used to.

I don't usually compose mail that isn't straight ASCII, but I've already
been burnt twice this morning by trying to forward text that included
a stray UTF8 character or two.

Any suggestions on how to improve this? Ideally I'd like it to pass
through what it's told to, perhaps changing the charset marking to
utf8 when necessary.

regards, tom lane