Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Barry Warsaw
On Apr 18, 2014, at 03:07 AM, Stephen J. Turnbull wrote:

>Getting it right by design ... well, that's why we need Mailman 3.

And really, Python 3.  The email package in Python 3.4 rocks.

-Barry
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: https://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Stephen J. Turnbull
I see you've already responded, but there are a few things I'd like to
clarify.

Laura Creighton writes:

 > But you and I could quite easily both want English(USA) as the
 > default language for our lists, but you also want us-ascii while I
 > want utf-8.  The way things stand now, we cannot both use the same
 > mailman host, and both get what we want, correct?

In this particular case, you can choose UTF-8, and you both get what
you want.  US-ASCII is a subset of UTF-8.  Anybody else *may* have a
problem if they have sufficiently ancient software that it can't
handle UTF-8, but that's a rapidly vanishing issue.
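
The subset relationship can be checked directly. The following is a small Python 3 sketch, not anything taken from Mailman: it confirms that every US-ASCII code point has the same single-byte encoding in UTF-8, so a pure-ASCII footer is valid under either label. The footer text is only an example.

```python
# Every US-ASCII code point (0-127) has the same one-byte encoding in UTF-8.
ascii_text = "".join(chr(i) for i in range(128))
same_bytes = ascii_text.encode("us-ascii") == ascii_text.encode("utf-8")

# So bytes written as US-ASCII can be relabelled and decoded as UTF-8 unchanged.
footer = b"EuroPython 2014 -- Berlin"
relabelled_ok = footer.decode("utf-8") == footer.decode("us-ascii")
```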

In fact *right now* you and I and Wang Han Lo can use English,
Japanese, and Mandarin on the same list at the same time, each posting
and setting our subscription options in our preferred language.  It's
only footers and headers that have this issue, and that's at least
partly because even today there's no reliable way to mix charsets in a
message (too many users still use MUAs-that-suck).  For most purposes,
Mailman is pretty well internationalized; it's just that some corner
cases remain ugly.

The other cause is historical accident.  Email is very messy -- it's
one of the oldest Internet protocols.  Mailman itself goes back to a
time when neither Python nor its email package had a coherent way of
dealing with multilingual applications.  So we've been overhauling
various parts of Mailman 2 as necessary.  And users -- well, many
Mailman list admins think that they type Japanese or German rather
than EUC-JP or ISO-8859-15, and undoubtedly the "charset-per-language"
architecture was intended to make life easy for them.
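
A rough model of the "charset-per-language" idea, as a Python sketch. The table, its entries, and the function name are assumptions made for illustration; they are not Mailman's actual defaults or API. The point is only that the charset is looked up by language, so two lists with the same language cannot differ.

```python
# Hypothetical model of a site-wide "one charset per language" table.
# The entries are illustrative, not Mailman's actual defaults.
LANGUAGE_CHARSETS = {
    "en": "us-ascii",
    "ja": "euc-jp",
    "de": "iso-8859-15",
}

def charset_for_list(preferred_language):
    """Return the charset implied by a list's language (no per-list override)."""
    return LANGUAGE_CHARSETS[preferred_language]

# Two English-language lists on the same host necessarily share a charset.
list_a = charset_for_list("en")
list_b = charset_for_list("en")

# A site-wide change (the analogue of editing mm_cfg.py) affects both at once.
LANGUAGE_CHARSETS["en"] = "utf-8"
after_change = (charset_for_list("en"), charset_for_list("en"))
```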

Why nobody ever got around to properly internationalizing the headers
and footers (i.e., allowing charsets defined per list) I'm not sure.  I
suspect it's because few users ever tried on international lists: they
just use English in the footers as lingua franca.  This is the first
time I've seen somebody reporting issues with the internationalization
of the footer, and I've been following Mailman since 1999 or so.

Getting it right by design ... well, that's why we need Mailman 3.  We
know a lot more about lots of things than we did when Mailman 2 was
designed.

Steve


Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Laura Creighton
In a message of Thu, 17 Apr 2014 15:54:19 +0200, Laura Creighton writes:
>In a message of Thu, 17 Apr 2014 05:41:15 -0700, Mark Sapiro writes:
>>The issue is msg_footer is assumed to be in the character set of the
>>list's language, us-ascii by default for English. I don't think Mailman
>>does the right thing in this case.
>
>From my perspective, the problem is that by having these things
>defined in mm_cfg.py, all mailman administrators are stuck with 
>whatever decisions their mailman host made for whatever language
>they chose as the default language for their list.  But you and I
>could quite easily both want English(USA) as the default language
>for our lists, but you also want us-ascii while I want utf-8.  The
>way things stand now, we cannot both use the same mailman host, and
>both get what we want, correct?
>
>Now that my problem has gone from 'getting the EP footers to work' to
>'understanding what exactly is going on here'.  And right now I do not
>see why the charset for the lists' language has to be hard coded in
>mm_cfg.py, nor why there has to be exactly one value for any given language
>which mailman supports.
>
>Thank you for your patience,
>Still trying to understand here,
>Laura

Sorry about this note -- mail is arriving in an odd order here.  The mail
where you explained this perfectly arrived after your other mail, so I
was still confused when I wrote this note.

Laura



Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Mark Sapiro
On 04/17/2014 06:54 AM, Laura Creighton wrote:
> In a message of Thu, 17 Apr 2014 05:41:15 -0700, Mark Sapiro writes:
> 
> Now that my problem has gone from 'getting the EP footers to work' to
> 'understanding what exactly is going on here'.  And right now I do not
> see why the charset for the lists' language has to be hard coded in
> mm_cfg.py, nor why there has to be exactly one value for any given language
> which mailman supports.


I tried to explain that for any given language, the message catalog and
templates are encoded in some specific character set. If you simply
change the character set for the list, you must change it to one which
is a strict superset or everything breaks unless you also recode the
message catalog and templates. This works for changing us-ascii to,
e.g., iso-8859-1 or utf-8, but not in general.
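
To make the "recode or break" point concrete, here is a minimal Python 3 sketch. The catalog string is invented for the example and is not from a real Mailman catalog; only the standard bytes/str codec calls are used.

```python
# A catalog string as it might be stored on disk in ISO-8859-1 (invented text).
catalog_entry = "Abonnement bestätigt".encode("iso-8859-1")

# Relabelling as UTF-8 without recoding: the stored bytes are not valid UTF-8.
try:
    catalog_entry.decode("utf-8")
    relabel_only_works = True
except UnicodeDecodeError:
    relabel_only_works = False

# Recoding: decode with the old charset, then encode with the new one.
recoded = catalog_entry.decode("iso-8859-1").encode("utf-8")
round_trip = recoded.decode("utf-8")

# Pure US-ASCII data needs no recoding, since UTF-8 and ISO-8859-1 both extend it.
ascii_entry = b"Subscription confirmed"
ascii_ok = ascii_entry.decode("utf-8") == ascii_entry.decode("iso-8859-1")
```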

Things will be easier in MM 3. Most things will be unicode internally
and utf-8 will be the preferred encoding.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Barry Warsaw
On Apr 17, 2014, at 03:54 PM, Laura Creighton wrote:

>Now that my problem has gone from 'getting the EP footers to work' to
>'understanding what exactly is going on here'.  And right now I do not
>see why the charset for the lists' language has to be hard coded in
>mm_cfg.py, nor why there has to be exactly one value for any given language
>which mailman supports.

The big problem is that you can't have multilingual footers.  I'm hoping to
fix this in MM3 by supporting a lookup scheme that would allow different
footers (and other decorations, templates, and messages) per language.
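
One way such a lookup could work, sketched in Python. Everything here (the table, the key shape, the fallback order, the footer texts) is hypothetical and only illustrates the idea of a per-language lookup with fallbacks; it does not describe how MM3 does or will implement it.

```python
# Hypothetical lookup: all names and the fallback order are assumptions.
FOOTERS = {
    ("europython", "de"): "EuroPython 2014 \u2013 Berlin, 21.\u201327. Juli",
    ("europython", "en"): "EuroPython 2014 \u2013 Berlin, 21\u201327 July",
    (None, "en"): "Sent via the mailing list",
}

def find_footer(list_name, language, default_language="en"):
    """Try list+language, then list+default language, then site-wide entries."""
    candidates = [
        (list_name, language),
        (list_name, default_language),
        (None, language),
        (None, default_language),
    ]
    for key in candidates:
        if key in FOOTERS:
            return FOOTERS[key]
    return ""
```

With this table, a German-speaking subscriber of the example list gets the German footer, a Swedish-speaking one falls back to the list's English footer, and any other list falls back to the site-wide English text.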

-Barry


Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Laura Creighton
In a message of Thu, 17 Apr 2014 05:41:15 -0700, Mark Sapiro writes:
>The issue is msg_footer is assumed to be in the character set of the
>list's language, us-ascii by default for English. I don't think Mailman
>does the right thing in this case.

From my perspective, the problem is that by having these things
defined in mm_cfg.py, all mailman administrators are stuck with 
whatever decisions their mailman host made for whatever language
they chose as the default language for their list.  But you and I
could quite easily both want English(USA) as the default language
for our lists, but you also want us-ascii while I want utf-8.  The
way things stand now, we cannot both use the same mailman host, and
both get what we want, correct?

Now that my problem has gone from 'getting the EP footers to work' to
'understanding what exactly is going on here'.  And right now I do not
see why the charset for the lists' language has to be hard coded in
mm_cfg.py, nor why there has to be exactly one value for any given language
which mailman supports.

Thank you for your patience,
Still trying to understand here,
Laura



Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Mark Sapiro
On 04/17/2014 05:28 AM, Stephen J. Turnbull wrote:
> 
> Mailman already has about 200 lines of logic to handle cases where the
> footer charset is incompatible with the message's charset.  Have you
> tried simply changing the Python escape to a literal EN DASH in the
> web interface?  I hope Mailman is smart enough to convert that to
> Unicode internally, and all should Just Work[tm].


I see Stephen and I are talking over each other again.

The issue is msg_footer is assumed to be in the character set of the
list's language, us-ascii by default for English. I don't think Mailman
does the right thing in this case.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Mark Sapiro
On 04/17/2014 04:19 AM, Laura Creighton wrote:
> 
> But unless I have overlooked something, there is no way to make a charset
> change on a per-list basis through the mailman administrative interface.
> Instead you have to edit mm_cfg.py 


Correct.


> Even if I had root access on python.org, I wouldn't really want to inflict
> utf-8 on everybody else just because it makes things more convenient for
> the EuroPython mailing list.
> 
> But needing to edit mm_cfg.py strikes me as a very odd design choice, odd
> enough that I figure either a) this isn't so and I have overlooked something,
> or b) it absolutely must be done this way for a reason I do not understand.


There are a couple of issues.

Mailman was designed a long time ago in a galaxy far, far away (SciFi
ObRef). There was no Unicode or MIME in common use and email was ASCII
text. Everything was English and us-ascii.

When Mailman was internationalized, other languages required different
character sets so a scheme was developed where each translation had its
own character set, but that for English was retained as us-ascii.

We can't give a list owner the ability to change the character set for a
list independent of the list language, because the templates and message
catalog for that language are encoded with a particular encoding, and
changing the character set without recoding the message catalog and
templates in the new character set would break everything.

The one exception to this is English. Because utf-8 is a strict superset
of us-ascii, one can change the charset for English to utf-8 and things
will continue to work. We haven't done that for reasons of superstition,
and because we use the Python email library which base64 encodes utf-8
text in message bodies rendering it unreadable by someone with a
non-MIME MUA.
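
The base64 behaviour can be seen with the standard library's email package. This sketch is written for Python 3 (Mailman 2.1 itself runs on Python 2) and the message text is just an example.

```python
from email.charset import BASE64, Charset
from email.mime.text import MIMEText

# The email package's charset registry asks for base64 bodies for utf-8 ...
utf8_body_encoding = Charset("utf-8").body_encoding
# ... and no body encoding at all for us-ascii.
ascii_body_encoding = Charset("us-ascii").body_encoding

utf8_msg = MIMEText("EuroPython 2014 \u2013 Berlin\n", "plain", "utf-8")
ascii_msg = MIMEText("EuroPython 2014 -- Berlin\n", "plain", "us-ascii")

utf8_cte = utf8_msg["Content-Transfer-Encoding"]
ascii_cte = ascii_msg["Content-Transfer-Encoding"]

# The stored utf-8 payload is base64 text, not readable without MIME decoding.
raw_payload = utf8_msg.get_payload()
```

A reader whose MUA does not understand MIME would see `raw_payload` as is, which is the readability concern described above.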

Thus, a site can change the encoding for English to utf-8 if it chooses,
but there is no mechanism to do this per-list.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


[Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Stephen J. Turnbull
Hi, Laura!

Laura Creighton writes:

 > But the Europython mailing list is configured so that its messages
 > come out
 > 
 > Content-Type: text/plain; charset="us-ascii"

This isn't from the list or site configuration, this is from the
poster's mail user agent (MUA).  The mailing list does not choose the
charset for the message; the MUA does.  For example, grepping my
archive of python-dev messages I see 3 different variants of UTF-8
(capitalization and quoting), us-ascii, iso-8859-1, and windows-1252
(each in several variants).
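
As an illustration of how those label variants collapse, here is a Python 3 sketch using the standard email and codecs modules. The sample headers are made up to mirror the variants mentioned; they are not taken from the python-dev archive.

```python
import codecs
from email import message_from_string

# Illustrative headers only; the label variants mirror those mentioned above.
samples = [
    'Content-Type: text/plain; charset="UTF-8"',
    "Content-Type: text/plain; charset=utf-8",
    'Content-Type: text/plain; charset="us-ascii"',
    "Content-Type: text/plain; charset=ISO-8859-1",
    "Content-Type: text/plain; charset=windows-1252",
]

def declared_charset(header_line):
    """Parse one header line and return the charset the MUA declared."""
    msg = message_from_string(header_line + "\n\nbody\n")
    return msg.get_content_charset()

# get_content_charset() strips the quotes and lowercases the label.
labels = [declared_charset(line) for line in samples]
# codecs.lookup maps each label to Python's canonical codec name.
codec_names = [codecs.lookup(label).name for label in labels]
```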

Mailman already has about 200 lines of logic to handle cases where the
footer charset is incompatible with the message's charset.  Have you
tried simply changing the Python escape to a literal EN DASH in the
web interface?  I hope Mailman is smart enough to convert that to
Unicode internally, and all should Just Work[tm].
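
The general shape of that kind of logic can be sketched in a few lines of Python 3. This is not Mailman's actual handler; `append_footer` and its fallback order are assumptions chosen to show the idea of working in Unicode and picking a charset afterwards.

```python
def append_footer(body_bytes, body_charset, footer_text):
    """Return (payload_bytes, charset) for body plus footer, or None.

    Hypothetical helper: decode the body to Unicode, append the footer, then
    try the original charset first and fall back to UTF-8. None means the
    body could not be decoded, in which case a caller might attach the
    footer as a separate MIME part instead.
    """
    try:
        body_text = body_bytes.decode(body_charset)
    except (UnicodeDecodeError, LookupError):
        return None
    combined = body_text + "\n" + footer_text
    for charset in (body_charset, "utf-8"):
        try:
            return combined.encode(charset), charset
        except UnicodeEncodeError:
            continue
    return None

plain = append_footer(b"Hello\n", "us-ascii", "EuroPython 2014 -- Berlin")
dashed = append_footer(b"Hello\n", "us-ascii", "EuroPython 2014 \u2013 Berlin")
broken = append_footer(b"Hello \x96\n", "us-ascii", "footer")
```

In this sketch an ASCII footer leaves the message as us-ascii, a footer with an EN DASH upgrades the combined text to utf-8, and an undecodable body is left alone.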

If that doesn't work, change the EN DASH to "--", and report it as a
bug.  We'll see what we can do in 2.1.19, before EuroPython is held in
Göteborg or Łódź. :-/

 > Since \x96 is an unrecognised character in us-ascii,

It's not even a character here, it's a raw byte, which may or may not
get recognized correctly by Mailman depending on the list's preferred
charset.  Somebody was way too tricky for their own good.
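
What a raw 0x96 byte means under different charsets can be shown with a short Python 3 sketch (the footer bytes are a shortened version of the one reported):

```python
raw_footer = b"EuroPython 2014 \x96 Berlin"

# As Windows-1252, byte 0x96 is EN DASH (U+2013).
as_cp1252 = raw_footer.decode("windows-1252")

# As ISO-8859-1, the same byte maps to U+0096, a C1 control character.
as_latin1 = raw_footer.decode("iso-8859-1")

# As US-ASCII, the byte is outside the 7-bit range and cannot be decoded.
try:
    raw_footer.decode("us-ascii")
    ascii_decodes = True
except UnicodeDecodeError:
    ascii_decodes = False

# As UTF-8, a lone 0x96 is a continuation byte with no lead byte, also invalid.
try:
    raw_footer.decode("utf-8")
    utf8_decodes = True
except UnicodeDecodeError:
    utf8_decodes = False
```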

 > But unless I have overlooked something, there is no way to make a charset
 > change on a per-list basis through the mailman administrative interface.

There's no way to make a charset change in posts at all; it's not
Mailman's job to do that, really.  I suppose we could convert all
posts to UTF-8, which would make the logic mentioned above a lot
simpler, but that would probably annoy a few people and might not work
for some variant charsets.

Steve

[Mailman-Users] Trying to understand charset encoding in mailman

2014-04-17 Thread Laura Creighton
I am trying to understand how charset encoding works, and I get the
distinct idea that I must be missing one small, vital piece of
information.

Background: The problem arose as follows:

Somebody changed the footer of the EuroPython Mailing list which is hosted
at python.org to be:

EuroPython 2014 \x96 Berlin, 21th\x9627th July

Note the two \x96 s.  The intent was almost certainly to have this string
interpreted by the windows-1252 charset, where \x96 means an en dash.  But
the Europython mailing list is configured so that its messages come out

Content-Type: text/plain; charset="us-ascii"

Since \x96 is an unrecognised character in us-ascii, my mailer complained
bitterly every time I read an EP message.  Being a list admin, this
bothered me, and I thought it would be my job to fix things.

I thought I would change the charset to utf-8.  After all, most European
languages do not fit into "us-ascii" in any event.  What if the conference
had been held in my home town of Göteborg, for instance?

But unless I have overlooked something, there is no way to make a charset
change on a per-list basis through the mailman administrative interface.
Instead you have to edit mm_cfg.py 

Even if I had root access on python.org, I wouldn't really want to inflict
utf-8 on everybody else just because it makes things more convenient for
the EuroPython mailing list.

But needing to edit mm_cfg.py strikes me as a very odd design choice, odd
enough that I figure either a) this isn't so and I have overlooked something,
or b) it absolutely must be done this way for a reason I do not understand.

Can somebody please explain this?

Thank you very much,
Laura Creighton