Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-29 Thread Richard Stallman
 You can represent one of Emacs' supported Latin alphabets in
 (unencoded) unibyte strings, and Emacs will automatically convert to
 and from multibyte.

AFAIK, Latin-N unibyte strings and iso-8859-N text encoded in Latin-N
use the same numerical codes for the same characters, so they are
indistinguishable.

I think that is true, but if that's what you're doing, you'll
understand it better if you think of them as unibyte representations of
these Emacs characters rather than as text encoded in a coding system.


___
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug


Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-28 Thread Eli Zaretskii
 From: Richard Stallman [EMAIL PROTECTED]
 CC: [EMAIL PROTECTED], emacs-pretest-bug@gnu.org,
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
 Date: Fri, 27 Oct 2006 09:33:35 -0400
 
  There is a big difference between unibyte strings and encoded unibyte
  strings.
 
 What is that difference?
 
 You can represent one of Emacs' supported Latin alphabets in
 (unencoded) unibyte strings, and Emacs will automatically convert to
 and from multibyte.

AFAIK, Latin-N unibyte strings and iso-8859-N text encoded in Latin-N
use the same numerical codes for the same characters, so they are
indistinguishable.
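
The claim that the byte values coincide can be checked directly.  What follows is a minimal sketch, assuming a current Emacs (23 or later, which postdates this thread); the function name `sketch-latin1-bytes-match-p' is made up for illustration and is not part of Emacs.

```elisp
;; Sketch only; assumes Emacs 23 or later.
(defun sketch-latin1-bytes-match-p ()
  "Return non-nil if Latin-1 encoded text equals the raw unibyte string."
  (let* (;; U+00E9 (e with acute accent) as a decoded, multibyte string.
         (decoded (string #xe9))
         ;; The same character explicitly encoded with iso-8859-1.
         (encoded (encode-coding-string decoded 'iso-8859-1))
         ;; A unibyte string literal holding the single byte 0xE9.
         (unibyte "\351"))
    (and (not (multibyte-string-p encoded))   ; encoded text is unibyte
         (string= encoded unibyte))))         ; and holds the same byte
```

Nothing in either string records whether it holds "a Latin-1 character" or "iso-8859-1 encoded text"; only the surrounding code knows which was meant.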

Handa-san, am I right?




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-28 Thread Richard Stallman
 However, if you store encoded text in unibyte strings, you are
 responsible for decoding and encoding when necessary.  You have to
 keep track, everywhere, of whether the data is encoded or not.

It's pretty easy to keep track of it: unibyte == encoded, multibyte
== decoded.

What you're proposing is a convention which a certain program could
use internally.  It might be a workable convention for some purposes.
But it is not automatic, and not required by Emacs.

 You can represent one of Emacs' supported Latin alphabets in
 (unencoded) unibyte strings, and Emacs will automatically convert to
 and from multibyte.

And this use was very convenient for Emacs-20 where we wanted to keep some
backward compatibility with code that was not MULE-aware.

But nowadays any code which relies on this is simply broken, AFAIC, because
it'll only work in environments using an iso-8859 encoding (more or less)

I think you're mistaken.  The conversion between unibyte and multibyte
involves internal Emacs characters.  It concerns character sets, not
coding systems.

However, it is true that the use of unibyte strings is only applicable
to alphabets such as could be represented in unibyte strings.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-27 Thread Eli Zaretskii
 From: Richard Stallman [EMAIL PROTECTED]
 CC: [EMAIL PROTECTED], emacs-pretest-bug@gnu.org,
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
 Date: Thu, 26 Oct 2006 04:52:56 -0400
 
 There is a big difference between unibyte strings and encoded unibyte
 strings.

What is that difference?




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-27 Thread Richard Stallman
 There is a big difference between unibyte strings and encoded unibyte
 strings.

What is that difference?

You can represent one of Emacs' supported Latin alphabets in
(unencoded) unibyte strings, and Emacs will automatically convert to
and from multibyte.

However, if you store encoded text in unibyte strings, you are
responsible for decoding and encoding when necessary.  You have to
keep track, everywhere, of whether the data is encoded or not.

We implemented the ability to do encoding manually because sometimes
it is necessary to decode parts of a file in different ways (e.g.,
mailboxes).




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-27 Thread Stefan Monnier
 You can represent one of Emacs' supported Latin alphabets in
 (unencoded) unibyte strings, and Emacs will automatically convert to
 and from multibyte.

And this use was very convenient for Emacs-20 where we wanted to keep some
backward compatibility with code that was not MULE-aware.

But nowadays any code which relies on this is simply broken, AFAIC, because
it'll only work in environments using an iso-8859 encoding (more or less) and
will thus be unusable in Asian environments or in utf-8 (which is very
quickly taking over the iso-8859 world).

 However, if you store encoded text in unibyte strings, you are
 responsible for decoding and encoding when necessary.  You have to
 keep track, everywhere, of whether the data is encoded or not.

It's pretty easy to keep track of it: unibyte == encoded, multibyte
== decoded.
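
A minimal sketch of that convention, assuming a current Emacs (23 or later).  The helper name `sketch-encode-then-decode' is hypothetical; `encode-coding-string', `decode-coding-string' and `multibyte-string-p' are the standard primitives.

```elisp
;; Sketch only: "unibyte == encoded, multibyte == decoded".
(defun sketch-encode-then-decode (text coding)
  "Encode TEXT with CODING, then decode it again; return both forms."
  (let* ((encoded (encode-coding-string text coding))      ; unibyte: bytes
         (decoded (decode-coding-string encoded coding)))  ; multibyte: chars
    (list encoded decoded)))

;; Two Chinese characters, written as code points to keep this text ASCII.
(let* ((text (string #x4e2d #x6587))
       (result (sketch-encode-then-decode text 'utf-8)))
  (list (multibyte-string-p (nth 0 result))   ; nil: encoded form is unibyte
        (length (nth 0 result))               ; 6 bytes
        (multibyte-string-p (nth 1 result))   ; t: decoded form is multibyte
        (length (nth 1 result))))             ; 2 characters
```

Under this convention the multibyteness of a string doubles as the "is it encoded?" flag.  As the replies point out, that is a discipline a program adopts for itself, not something Emacs enforces.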


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-26 Thread Richard Stallman
Gnus stored a name of a news group in encoded form.

There is a big difference between unibyte strings and encoded unibyte
strings.  The latter indeed requires a lot of special care.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-26 Thread Kenichi Handa
In article [EMAIL PROTECTED], Richard Stallman [EMAIL PROTECTED] writes:

 C-y/M-y uses `insert' somewhere internally.  My suggestion is to make
 `insert' signal an error when faced with the need to insert a multibyte
 string in a unibyte buffer.  This doesn't mean that C-y/M-y should
 propagate this error.

 That might work.  We could try it, after the release.

Stefan, how about starting to try it in emacs-unicode-2 now?  I
generally agree with your view about unibyte-multibyte
problem.  You also proposed to change the current automatic
unibyte-multibyte conversion from string-make-multibyte
method to string-to-multibyte method a while ago, didn't
you?  I think that change is good too.
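
To illustrate what the string-to-multibyte method does, here is a minimal sketch, assuming a current Emacs (23 or later).  It deliberately does not call `string-make-multibyte', whose result depended on the language-environment settings discussed elsewhere in this thread.  The helper name `sketch-raw-bytes-preserved-p' is hypothetical.

```elisp
;; Sketch only; assumes Emacs 23 or later.
(defun sketch-raw-bytes-preserved-p (bytes)
  "Return non-nil if BYTES survives a round trip through `string-to-multibyte'."
  (let ((multi (string-to-multibyte bytes)))
    (and (multibyte-string-p multi)
         ;; Each 8-bit byte became a raw-byte character, not a Latin
         ;; letter, so converting back yields the original bytes.
         (string= (string-to-unibyte multi) bytes))))

(sketch-raw-bytes-preserved-p "\351")
```

Because the bytes are kept as raw bytes and are not reinterpreted as characters of some alphabet, the conversion does not depend on the locale.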

---
Kenichi Handa
[EMAIL PROTECTED]




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-25 Thread Richard Stallman
If one uses the default multibyte session, using unibyte strings is
prone to subtle problems as described in this thread.

I was not following the thread.  Could you explain the problem
that was encountered?




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-25 Thread Richard Stallman
Code which implicitly converts text from multibyte to unibyte (and vice
versa), using nonascii-*, will presumably be used in all kinds of locales,
including BIG5 ones.  So knowing what happens in this case is
still relevant.

It is not hard to know what happens--that is documented in the Lisp
Manual.  (Do you think any of it is not clear?)

Meanwhile, I think that the presumption of the above text is incorrect.
Unibyte text can only handle certain European alphabets.  If you use
unibyte text, you should make sure to use it only for them.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-25 Thread Richard Stallman
C-y/M-y uses `insert' somewhere internally.  My suggestion is to make
`insert' signal an error when faced with the need to insert a multibyte
string in a unibyte buffer.  This doesn't mean that C-y/M-y should propagate
this error.

That might work.  We could try it, after the release.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-24 Thread Stefan Monnier
  My point was simply if you stay 100% within multibyte, it all works,
  and if you stay 100% in unibyte it all works
 
  The former is true, the latter isn't, AFAIK.  ``Normal'' Emacs
  primitives and subroutines always do TRT with multibyte strings, while
  with unibyte you need to be careful which ones you call.
 
 Care to give an example of what you're thinking about, where purely unibyte
 strings and buffers are not properly handled?

 Are you talking about a unibyte Emacs session?  If so, that's not what
 I had in mind.  I'm talking about using unibyte strings in a multibyte
 session.

I'm not quite sure what a unibyte session is, but I think "stay 100% in
unibyte" is fairly clear: only use unibyte buffers and strings in the
relevant code (while other unrelated buffers and strings may be multibyte).
So I think we're thinking about the same situation.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-24 Thread Eli Zaretskii
 Cc: [EMAIL PROTECTED],  emacs-pretest-bug@gnu.org,  [EMAIL PROTECTED],
 [EMAIL PROTECTED]
 From: Stefan Monnier [EMAIL PROTECTED]
 Date: Tue, 24 Oct 2006 11:22:51 -0400
 
 I'm not quite sure what is a unibyte session

A.k.a. emacs --unibyte.

 but I think stay 100% in
 unibyte is fairly clear: only use unibyte buffers and strings in the
 relevant code (while other unrelated buffers and strings may be multibyte).

I think it's practically impossible to use only unibyte buffers for
any serious work, and therefore I don't consider this a feasible
solution.

If one uses the default multibyte session, using unibyte strings is
prone to subtle problems as described in this thread.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-24 Thread Richard Stallman
 It works correctly, provided the characters in that string can be
 expressed in the unibyte buffer.

But which characters can be expressed is poorly specified.  E.g. Tell me
which chars can be expressed in a unibyte buffer in a BIG5 locale?

Mentioning the locale is somewhat of a red herring, since what controls
this conversion is (effectively) nonascii-insert-offset.

Mentioning BIG5 is a second red herring.  You can't represent Chinese
in 8-bit characters, but that is not Emacs' fault.

Do you think that we need to document nonascii-insert-offset more
prominently?  If so, where else should we talk about it?

 If people generally agree it would be better to signal an error,
 we could do that.  However, that would cause trouble trying to use
 M-y to move past multibyte entries in the kill ring to reach the
 unibyte entry you really want.

When the insertion is a user-level operation, the elisp code should make
sure to manually do the encoding/decoding, using e.g. the default file
coding-system.

I don't understand -- could you be more specific?




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-24 Thread Stefan Monnier
 I'm not quite sure what is a unibyte session
 A.k.a. emacs --unibyte.

I know that, but I'm not quite sure what it entails.  This discussion is
within the scope of code such as Gnus's, i.e. code which should work
either way.

 but I think stay 100% in
 unibyte is fairly clear: only use unibyte buffers and strings in the
 relevant code (while other unrelated buffers and strings may be multibyte).

 I think it's practically impossible to use only unibyte buffers for
 any serious work, and therefore I don't consider this a feasible
 solution.

The operative term there is "in the relevant code".  E.g. Gnus could easily
(as opposed to "practically impossible") use unibyte for all its buffers
and strings.  It's also very common (and often necessary) to use unibyte
buffers and strings to interact with underlying processes or network
connections, typically because the data passed back and forth may use a
mix of various encodings.

 If one uses the default multibyte session, using unibyte strings is
 prone to subtle problems as described in this thread.

But those problems are not specific to unibyte, but to the mix of unibyte
and multibyte.  In most packages such as Gnus it's just as hard/impossible
to use only multibyte as it is to use only unibyte.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-24 Thread Stefan Monnier
 It works correctly, provided the characters in that string can be
 expressed in the unibyte buffer.

 But which characters can be expressed is poorly specified.  E.g. Tell me
 which chars can be expressed in a unibyte buffer in a BIG5 locale?

 Mentioning the locale is somewhat of a red herring, since what controls
 this conversion is (effectively) nonascii-insert-offset.

The nonascii-insert-offset and nonascii-translation-table are AFAIK
initialized differently depending on the locale (and/or language
environment), and users typically don't fiddle with that table directly but
via their locale setting instead.

 Mentioning BIG5 is a second red herring.  You can't represent Chinese
 in 8-bit characters, but that is not Emacs' fault.

Code which implicitly converts text from multibyte to unibyte (and vice
versa), using nonascii-*, will presumably be used in all kinds of locales,
including BIG5 ones.  So knowing what happens in this case is
still relevant.

 Do you think that we need to document nonascii-insert-offset more
 prominently?  If so, where else should we talk about it?

No, I think we should kill it instead and declare in error any code which
tries to use it.  It made sense in Emacs-20 when the multibyte support was
weaker, but nowadays it just encourages sloppy code which breaks down in
different language environments.

 If people generally agree it would be better to signal an error,
 we could do that.  However, that would cause trouble trying to use
 M-y to move past multibyte entries in the kill ring to reach the
 unibyte entry you really want.

 When the insertion is a user-level operation, the elisp code should make
 sure to manually do the encoding/decoding, using e.g. the default file
 coding-system.

 I don't understand -- could you be more specific?

C-y/M-y uses `insert' somewhere internally.  My suggestion is to make
`insert' signal an error when faced with the need to insert a multibyte
string in a unibyte buffer.  This doesn't mean that C-y/M-y should propagate
this error.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Richard Stallman
I agree with making Gnus encode non-ASCII group names only when
communicating with nntp servers, and I (or someone?) will try it
in the future.  I think it should be done in the Gnus trunk
first, and it will take time for coding, testing, and possibly
bug fixing.

If the existing code works for the users, I'd prefer that we not
install a further redesign before the Emacs 22 release.

Thanks.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Richard Stallman
This said, I agree that Emacs should help more.  E.g. by signalling an error
when trying to insert multibyte text into a unibyte buffer.

This operation converts the string to unibyte.  It works correctly,
provided the characters in that string can be expressed in the unibyte
buffer.

If people generally agree it would be better to signal an error,
we could do that.  However, that would cause trouble trying to use
M-y to move past multibyte entries in the kill ring to reach the
unibyte entry you really want.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Stefan Monnier
 Agreed, but note that this problem is as much on the unibyte side as it is
 on the multibyte side

 Not if I never let unibyte strings into my buffers and strings (modulo
 bugs, of course).

I don't follow.  Not that it matters.

My point was simply if you stay 100% within multibyte, it all works, and if
you stay 100% in unibyte it all works, and it's only when you mix the two
that things don't work.  So the problem is neither with unibyte nor with
multibyte but with their interaction: the problem takes its root in the
conflation of the concept of byte and the concept of char.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Stefan Monnier
 This said, I agree that Emacs should help more.  E.g. by signalling an
 error when trying to insert multibyte text into a unibyte buffer.

 This operation converts the string to unibyte.

Indeed.  Using a default (and poorly specified) encoding method.

 It works correctly, provided the characters in that string can be
 expressed in the unibyte buffer.

But which characters can be expressed is poorly specified.  E.g. Tell me
which chars can be expressed in a unibyte buffer in a BIG5 locale?

 If people generally agree it would be better to signal an error,
 we could do that.  However, that would cause trouble trying to use
 M-y to move past multibyte entries in the kill ring to reach the
 unibyte entry you really want.

When the insertion is a user-level operation, the elisp code should make
sure to manually do the encoding/decoding, using e.g. the default file
coding-system.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Eli Zaretskii
 Cc: [EMAIL PROTECTED],  emacs-pretest-bug@gnu.org,  [EMAIL PROTECTED],
 [EMAIL PROTECTED]
 From: Stefan Monnier [EMAIL PROTECTED]
 Date: Mon, 23 Oct 2006 15:11:09 -0400
 
 My point was simply if you stay 100% within multibyte, it all works, and if
 you stay 100% in unibyte it all works

The former is true, the latter isn't, AFAIK.  ``Normal'' Emacs
primitives and subroutines always do TRT with multibyte strings, while
with unibyte you need to be careful which ones you call.  That was my
point, and the case that started this thread is my evidence.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Stefan Monnier
 My point was simply if you stay 100% within multibyte, it all works, and if
 you stay 100% in unibyte it all works

 The former is true, the latter isn't, AFAIK.  ``Normal'' Emacs
 primitives and subroutines always do TRT with multibyte strings, while
 with unibyte you need to be careful which ones you call.

Care to give an example of what you're thinking about, where purely unibyte
strings and buffers are not properly handled?
After all, such cases are probably bugs.

 That was my point, and the case that started this thread is my evidence.

I must have misunderstood because from what I read in this thread I thought
the problem was due to the fact that one part of the code is using unibyte
strings (for group names) and it's apparently messed up somewhere because it
gets mixed with multibyte data.

Sorry I misunderstood and went on with a rant.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-23 Thread Eli Zaretskii
 Cc: [EMAIL PROTECTED],  emacs-pretest-bug@gnu.org,  [EMAIL PROTECTED],
 [EMAIL PROTECTED]
 From: Stefan Monnier [EMAIL PROTECTED]
 Date: Mon, 23 Oct 2006 16:49:59 -0400
 
  My point was simply if you stay 100% within multibyte, it all works, and if
  you stay 100% in unibyte it all works
 
  The former is true, the latter isn't, AFAIK.  ``Normal'' Emacs
  primitives and subroutines always do TRT with multibyte strings, while
  with unibyte you need to be careful which ones you call.
 
 Care to give an example of what you're thinking about, where purely unibyte
 strings and buffers are not properly handled?

Are you talking about a unibyte Emacs session?  If so, that's not what
I had in mind.  I'm talking about using unibyte strings in a multibyte
session.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-22 Thread Katsumi Yamaoka
 In [EMAIL PROTECTED] Richard Stallman wrote:

 I'd say this design decision will certainly cause subtle bugs, such as
 the one we are discussing in this thread.  I suggest to modify the
 design to not use encoded strings internally.

 I hastened to change the nndoc code so that it uses encoded group
 names, but I agree with you.  Although implementing this will take
 effort and a long time, I think it is an issue that will have to be
 solved in the future anyway.

 I don't entirely understand that statement.
 Are you about to fix this now, or do you think it should be
 delayed?

I've already fixed the nndoc code in both the Gnus CVS trunk and
the v5-10 branch (it will be merged into the Emacs CVS soon).
Although I haven't yet changed the handling of non-ASCII group
names (that is, Gnus still represents them in the utf-8 encoded
style internally), it won't trouble users.

I agree with making Gnus encode non-ASCII group names only when
communicating with nntp servers, and I (or someone?) will try it
in the future.  I think it should be done in the Gnus trunk
first, and it will take time for coding, testing, and possibly
bug fixing.  So importing it into Emacs will inevitably be delayed.
At present, I don't know whether that means days, weeks, or years.

Regards,




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-22 Thread Stefan Monnier
  It could be, although it would make sense to manipulate group names in
  encoded form, in the sense of not decoded.
 
  It could ``make sense'', but it's IMO a bad idea, since, as we both
  know, Emacs is not well suited to handling unibyte strings.
 
 Huh?  Unibyte strings are perfectly well supported as far as I know.
 
 You have to be careful to remember which strings are unibyte and which are
 multibyte, so you don't decode multibyte strings or encode unibyte strings,
 and especially not implicitly (by inserting a unibyte string in a multibyte
 buffer or vice versa).  So if you mean that it requires discipline, then
 I agree, but otherwise I don't know what you're referring to.

 To me, the second paragraph is precisely the meaning of ``not well
 suited'' and ``not perfectly supported''.  What kind of ``well
 supported'' is that if I as a programmer need to carry with each
 string additional information, and make sure I know _exactly_ what
 primitives are invoked by every function I call, to take care that I
 don't inadvertently call something that deep inside assumes I passed a
 multibyte string?

 That way lies madness.

Agreed, but note that this problem is as much on the unibyte side as it is
on the multibyte side, so that seems to imply that you also think that Emacs
is not well suited to handling multibyte strings.

This said, I agree that Emacs should help more.  E.g. by signalling an error
when trying to insert multibyte text into a unibyte buffer.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-21 Thread Eli Zaretskii
 Cc: [EMAIL PROTECTED],  emacs-pretest-bug@gnu.org,  [EMAIL PROTECTED],
 [EMAIL PROTECTED]
 From: Stefan Monnier [EMAIL PROTECTED]
 Date: Fri, 20 Oct 2006 18:06:09 -0400
 
  It could be, although it would make sense to manipulate group names in
  encoded form, in the sense of not decoded.
 
  It could ``make sense'', but it's IMO a bad idea, since, as we both
  know, Emacs is not well suited to handling unibyte strings.
 
 Huh?  Unibyte strings are perfectly well supported as far as I know.
 
 You have to be careful to remember which strings are unibyte and which are
 multibyte, so you don't decode multibyte strings or encode unibyte strings,
 and especially not implicitly (by inserting a unibyte string in a multibyte
 buffer or vice versa).  So if you mean that it requires discipline, then
 I agree, but otherwise I don't know what you're referring to.

To me, the second paragraph is precisely the meaning of ``not well
suited'' and ``not perfectly supported''.  What kind of ``well
supported'' is that if I as a programmer need to carry with each
string additional information, and make sure I know _exactly_ what
primitives are invoked by every function I call, to take care that I
don't inadvertently call something that deep inside assumes I passed a
multibyte string?

That way lies madness.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-20 Thread Eli Zaretskii
 Cc: Katsumi Yamaoka [EMAIL PROTECTED],  emacs-pretest-bug@gnu.org,
 [EMAIL PROTECTED],  [EMAIL PROTECTED]
 From: Stefan Monnier [EMAIL PROTECTED]
 Date: Fri, 20 Oct 2006 15:19:43 -0400
 
  I'd say this design decision will certainly cause subtle bugs, such as
  the one we are discussing in this thread.  I suggest to modify the
  design to not use encoded strings internally.
 
 It could be, although it would make sense to manipulate group names in
 encoded form, in the sense of not decoded.

It could ``make sense'', but it's IMO a bad idea, since, as we both
know, Emacs is not well suited to handling unibyte strings.




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-20 Thread Stefan Monnier
  I'd say this design decision will certainly cause subtle bugs, such as
  the one we are discussing in this thread.  I suggest to modify the
  design to not use encoded strings internally.
 
 It could be, although it would make sense to manipulate group names in
 encoded form, in the sense of not decoded.

 It could ``make sense'', but it's IMO a bad idea, since, as we both
 know, Emacs is not well suited to handling unibyte strings.

Huh?  Unibyte strings are perfectly well supported as far as I know.

You have to be careful to remember which strings are unibyte and which are
multibyte, so you don't decode multibyte strings or encode unibyte strings,
and especially not implicitly (by inserting a unibyte string in a multibyte
buffer or vice versa).  So if you mean that it requires discipline, then
I agree, but otherwise I don't know what you're referring to.


Stefan




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-20 Thread Kenichi Handa
In article [EMAIL PROTECTED], Eli Zaretskii [EMAIL PROTECTED] writes:

  Date: Fri, 20 Oct 2006 15:21:53 +0900
  From: Katsumi Yamaoka [EMAIL PROTECTED]
  Cc: emacs-pretest-bug@gnu.org, [EMAIL PROTECTED], [EMAIL PROTECTED]
  
  IIRC, nntp servers understand utf-8 encoded group names.  So
  someone might have thought that making Gnus use them internally
  would be convenient for communicating with nntp servers.

 I'd say this design decision will certainly cause subtle bugs, such as
 the one we are discussing in this thread.  I suggest to modify the
 design to not use encoded strings internally.

I agree.  Keeping around encoded strings quite easily leads
to bugs.  String/buffer should be encoded only just before
writing out.
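
A minimal sketch of that rule, assuming utf-8 on the wire as mentioned above.  The two function names are hypothetical and do not exist in Gnus; they only mark where the conversions would sit.

```elisp
;; Sketch only: keep group names decoded internally and convert
;; at the I/O boundary.  Both function names are made up.
(defun sketch-group-name-to-wire (name)
  "Encode the decoded group NAME just before it is written out."
  (encode-coding-string name 'utf-8))

(defun sketch-group-name-from-wire (bytes)
  "Decode BYTES read from a server or file into a multibyte group name."
  (decode-coding-string bytes 'utf-8))

;; Everything between these two calls would see only decoded strings.
(let ((name (string #x4e2d #x6587)))
  (string= name
           (sketch-group-name-from-wire
            (sketch-group-name-to-wire name))))
```

With the conversions confined to two places, no other code has to track whether a given string is encoded.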

---
Kenichi Handa
[EMAIL PROTECTED]




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-20 Thread Richard Stallman
 I'd say this design decision will certainly cause subtle bugs, such as
 the one we are discussing in this thread.  I suggest to modify the
 design to not use encoded strings internally.

I hastened to change the nndoc code so that it uses encoded group
names, but I agree with you.  Although implementing this will take
effort and a long time, I think it is an issue that will have to be
solved in the future anyway.

I don't entirely understand that statement.
Are you about to fix this now, or do you think it should be
delayed?




Re: `.newsrc.eld' saves chinese group name in wrong coding

2006-10-19 Thread Reiner Steib
On Thu, Oct 19 2006, Katsumi Yamaoka wrote:

 Gnus uses utf-8 encoded non-ASCII group names internally, those
 encoded names are saved in the .newsrc.eld file, and they are
 decoded by utf-8 when displaying.  I had no problem when I once
 tried nnrss groups with Japanese names.  So, I cannot imagine
 what is happening with Zhang Wei, sorry.
[...]
 (push '("\\`nndoc\\(?:\\+[^:]+\\)?:")
   gnus-group-name-charset-group-alist)

 In addition, just now I noticed it is insufficient to solve the
 problem.  Maybe we need to do the fix here and there in Gnus to
 enable it to work with non-ASCII nndoc group names.

The default value of `gnus-group-name-charset-group-alist' is
((".*" . utf-8)), so it should cover all groups, IIUC.  Or am I
misunderstanding the issue?

Why is setting it to nil for nndoc necessary?  Is nndoc handled
differently than other backends?

Bye, Reiner.
-- 
   ,,,
  (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




`.newsrc.eld' saves chinese group name in wrong coding

2006-10-17 Thread Zhang Wei

`.newsrc.eld' does not save Chinese group names in the proper coding. When
Gnus is restarted, all of the articles in groups with Chinese names are
marked unread. But when you enter such a group, you find that all of the
articles are old articles (marked with an `O'). The attached file is the
wrongly formatted `.newsrc.eld'; I hope it will be helpful.

Same problem with Emacs22 and emacs-unicode-2.

;; -*- mode:emacs-lisp; coding: utf-8-emacs-dos; -*-
;; Gnus startup file.
;; Never delete this file -- if you want to force Gnus to read the
;; .newsrc file (if you have one), touch .newsrc instead.
(setq gnus-newsrc-file-version "Gnus v5.11")
(setq gnus-newsrc-last-checked-date '"Sun, 20 Aug 2006 14:34:25 +0800")
(setq gnus-newsrc-alist 
'((\301\367\320\30799.\261\276\265\330\262\342\312\324 3 ((1 . 8)) ((seen (1 
. 8 (#(nnml:2006-10.list.emacs-devel 5 29 (auto-composed nil)) 3 ((1 . 
176)) ((seen (1 . 176))) nnml:) (#(nnml:2006-09.list.emacs-devel 5 29 
(auto-composed nil)) 3 ((1 . 1428)) ((seen (534 . 852) (855 . 1428))) nnml:) 
(#(nnml:sent.news 5 14 (auto-composed nil)) 3 ((1 . 14)) ((seen (1 . 14))) 
nnml:) (#(nnml:list.xemacs 5 16 (auto-composed nil)) 3 ((1 . 56)) ((seen (1 
. 56))) nnml:) (#(nnml:list.debian 5 16 (auto-composed nil)) 3 ((1 . 164)) 
((seen (1 . 164))) nnml:) (#(nnml:sent.mail 5 14 (auto-composed nil)) 3 ((1 
. 19)) ((seen (1 . 19))) nnml:) (#(nnml:mail.gmail 5 15 (auto-composed 
nil)) 3 ((1 . 71)) ((seen (1 . 71)) (reply 10 13 (15 . 16) 22 25 38 49 52)) 
nnml:) (#(nnml:mail.tsinghua 5 18 (auto-composed nil)) 3 ((1 . 927)) 
((reply 896) (seen (881 . 927))) nnml:) (#(nnml:2006-08.list.emacs-devel 5 
29 (auto-composed nil)) 3 ((1 . 854)) ((seen (1 . 600) (602 . 854))) nnml:) 
(#(nnml:mail.misc 5 14 (auto-composed nil)) 3 ((1 . 245)) ((seen (61 . 245))) 
nnml:) (cn.comp.os.linux 3 ((1 . 1129)) ((seen (119 . 1129)) (reply 789 791 
795 797 851 854 867 880 894 975 990 1007 1014 1105 1107 1110 1127))) 
(nndoc+gnus-help:gnus-help 3 ((1 . 9)) ((seen (1 . 9))) (nndoc gnus-help 
(nndoc-address c:/Emacs/etc/gnus-tut.txt) (nndoc-article-type mbox))) 
(nndraft:queue 1 nil nil (nndraft ) ((gnus-dummy (gnus-draft-mode 
(nndraft:drafts 1 nil nil (nndraft ) ((gnus-dummy (gnus-draft-mode))
(setq
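
One way to inspect the escaped group name in the first entry of `gnus-newsrc-alist' above is to treat it as a unibyte string and decode it explicitly.  This is a sketch under stated assumptions: the thread does not say which coding the bytes are in, so `gb2312' below is only a guess for illustration, and the names `sketch-group-bytes' and `sketch-decode-group-bytes' are made up.  What can be said with certainty is that the bytes are not valid utf-8, since the byte 0xC1 (octal 301) never occurs in utf-8.

```elisp
;; Sketch only.  The escaped group name exactly as it appears in the file.
(defconst sketch-group-bytes
  "\301\367\320\30799.\261\276\265\330\262\342\312\324"
  "Unibyte group name copied from the attached `.newsrc.eld'.")

(defun sketch-decode-group-bytes (bytes coding)
  "Decode the unibyte string BYTES using CODING and return the result."
  (decode-coding-string bytes coding))

;; `gb2312' is an assumption; try other Chinese coding systems if the
;; result does not look like a plausible group name.
(sketch-decode-group-bytes sketch-group-bytes 'gb2312)
```

If the decoded result is readable, the file holds the name in a legacy Chinese coding and not in the utf-8 form that Gnus uses internally, which would match the subject of the report.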