
Re: [gentoo-user] Kernel update messed up console encoding

2009-03-01 Thread Florian v. Savigny


Hi Sebastian,

   That is a problem of the consolefont, since the console can't display it 
   with cp1250...

Maybe - if this font has codepage 1250, as one would assume, it should
normally display a capital A with a short accent (I think that's a
Slavonic letter) in position hex c3. True, that is different from the
capital A tilde it should have in iso-8859-1. But this is hardly the
heart of the matter: the c3 shouldn't be there in the first place.

   echo äöüÄÖÜß > console.test
   then write the same in emacs and save as emacs.test.
   
   And then compare the output of
   
   file console.test
   and
   file emacs.test
   
   If there are differences, the problem lies somewhere here.

But I have already described the result of the first procedure in my
first posting (UTF-8 when echoed under the new kernel, iso-8859-1 when
echoed under the old kernel) and the result of the second one - IN
DETAIL - in my last posting (too long to repeat; see there), which I
assume you have read. Have I missed something?

   locale
   should show it to you

Thanks. $LANG and $LC_ALL are not set (i.e. locale simply shows
LANG= and LC_ALL= with no values). All other LC_... variables are
set to POSIX.
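
For readers following along, the lookup order behind that output can be sketched in a few lines of Python. This is only an illustration and not part of the original exchange: the function name effective_ctype is made up here, and the sketch assumes the usual POSIX precedence (LC_ALL first, then LC_CTYPE, then LANG, with empty values counting as unset).

```python
import os


def effective_ctype(env):
    """Return (variable, value) that decides the character-type locale.

    Assumed precedence (POSIX): LC_ALL, then LC_CTYPE, then LANG.
    An empty string counts as unset; with nothing set, the result is POSIX.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return var, value
    return None, "POSIX"


if __name__ == "__main__":
    source, value = effective_ctype(os.environ)
    origin = source or "the built-in default"
    print(f"LC_CTYPE resolves to {value!r} (from {origin})")
```

With LANG and LC_ALL both empty, as reported above, this falls through to POSIX, which would match what locale prints for the remaining categories.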

Does nobody know where the kernel controls what the keys of the
console keyboard send when pressed?

(BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)
   
   Exactly there.

Could you explain that, please (do you perhaps mean this is where the
kernel's behaviour IS CONTROLLED)? As I have repeatedly said, all
variable settings are of course the same under both kernels, so both
definitely behave differently with the same settings.

Regards,
Florian

PS: Just one thing: do you think you could cite only those portions of
postings that you are replying to? Having to wade through tons of
cited material to find any replies is quite hard on the eyes,
especially when understanding one another seems to be difficult.

Re: [gentoo-user] Kernel update messed up console encoding

2009-02-28 Thread Florian v. Savigny



Dear Sebastian,

thank you for your thoughts. I am afraid switching to UTF-8 for
everything, although I see that this is the sound thing to do
eventually, is not currently an option for me - there are far too many
things which depend on that.  (Also, it would tend to obscure or
complicate the problem rather than fix it, since Emacs obviously gets
confused by the console behaviour).

 there still is /etc/conf.d/consolefont that could mess up things

The only variable that's set there is CONSOLEFONT=cp1250. I would
not understand how the font could have an influence on the characters
*produced* by the console, and it seems also difficult to explain why
the shell and Emacs, which of course use the same console font, behave
differently. (Under the shell, it looks fine while you type it,
i.e. you cannot tell that your u umlaut actually consists of two
bytes. But Emacs displays the lower-case umlauts followed by a space
(i.e. two characters, but not those that most of us are probably quite
familiar with, i.e. which you see when UTF-8 is displayed as if it
were ASCII), while for upper-case umlauts and the eszett it complains
that e.g. \204 is undefined.)

It definitely looks to me as if the core of the problem is what the
console produces, not what it shows, i.e. what a keypress
produces. The variable CONSOLETRANSLATION is commented out, meaning I
am using the default one, whichever that is.

As to the locale, where can I look that up ... ? I seem to remember I
purposely use no locale (or C, I think), but I don't remember where
I set that.

CONFIG_NLS_DEFAULT is indeed different for the two kernels, but not in
a way that seems to explain anything, as those two encodings differ
only on a few positions (not umlauts or eszett):

linux-2.6.17-gentoo-r7: iso8859-15
linux-2.6.27-gentoo-r8: iso8859-1

Also, I think what I said last time holds: that only applies to
filenames in the filesystem, doesn't it?

I'll follow your suggestion and re-post the problem on gentoo-user-de,
although I think running into that sort of problem might happen to
anybody who uses a European language other than English (one of those
covered by iso-8859-1, more precisely), so comments here are still
welcome! But who still sometimes uses the console, except me?

I think I'll also write a small script that compares the settings in
the two kernel .configs systematically. Could also be of use for later
kernel updates ...
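
Such a comparison script could look roughly like the following Python sketch. It is only an illustration: the function names parse_kconfig and diff_kconfig are made up here, the sample text is modelled on the values quoted in this thread, and the assumption is that a .config consists of CONFIG_NAME=value lines plus "# CONFIG_NAME is not set" comments.

```python
import re
import sys

# Assumed .config line shapes: "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value; an explicitly unset option becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip('"')
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kconfig(old_text, new_text):
    """Return {option: (old, new)} for options whose values differ.

    An option missing from one file shows up as None on that side.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Sample input modelled on the settings mentioned in this thread.
OLD = '# CONFIG_NLS_ASCII is not set\nCONFIG_NLS_DEFAULT="iso8859-15"\nCONFIG_NLS_UTF8=y\n'
NEW = 'CONFIG_NLS_ASCII=y\nCONFIG_NLS_DEFAULT="iso8859-1"\nCONFIG_NLS_UTF8=y\n'

if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: script OLD_CONFIG NEW_CONFIG
        with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
            OLD, NEW = f_old.read(), f_new.read()
    for name, (before, after) in diff_kconfig(OLD, NEW).items():
        print(f"{name}: {before} -> {after}")
```

Run with two .config paths as arguments, it lists only the options that differ, which keeps the output short enough to read after each kernel update.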

Thanks very much!

Florian

Re: [gentoo-user] Kernel update messed up console encoding

2009-02-28 Thread Eray Aslan

On 28.02.2009 12:34, Florian v. Savigny wrote:
[...]
 I'll follow your suggestion and re-post the problem on gentoo-user-de,
 although I think running into that sort of problem might happen to
 anybody who uses a European language other than English (one of those
 covered by iso-8859-1, more precisely), so comments here are still
 welcome! But who still sometimes uses the console, except me?

A lot of people use the console.  I certainly do.  But I, and I would
assume the majority of console users, switched to UTF-8 quite some time ago,
as was suggested earlier in the thread.  Hence the lack of useful advice.

Good luck.
-- 
Eray

Re: [gentoo-user] Kernel update messed up console encoding

2009-02-28 Thread Sebastian Günther

* Florian v. Savigny (lor...@fsavigny.de) [28.02.09 11:35]:
 
 
 Dear Sebastian,
 
 
  there still is /etc/conf.d/consolefont that could mess up things
 
 The only variable that's set there is CONSOLEFONT=cp1250. I would
 not understand how the font could have an influence on the characters
 *produced* by the console, and it seems also difficult to explain why
 the shell and Emacs, which of course use the same console font, behave
 differently. (Under the shell, it looks fine while you type it,
 i.e. you cannot tell that your u umlaut actually consists of two
 bytes. But Emacs displays the lower-case umlauts followed by a space
 (i.e. two characters, but not those that most of us are probably quite
 familiar with, i.e. which you see when UTF-8 is displayed as if it
 were ASCII), while for upper-case umlauts and the eszett complains
 that e.g. \204 is undefined.)
 
What does file say about the offending files?
Emacs always uses the encoding of the file, whereas a redirect uses 
the locale, iirc.

I assume you know the options-mule menu in emacs, there is a lot to 
help with encoding issues...

 As to the locale, where can I look that up ... ? I seem to remember I
 purposely use no locale (or C, I think), but I don't remember where
 I set that.
 
.bashrc

 
 Thanks very much!
 
 Florian
 
 

Sebastian

-- 
  Religion ist das Opium des Volkes.   Karl Marx

 s...@sti@N GÜNTHER mailto:sam...@guenther-roetgen.de



Re: [gentoo-user] Kernel update messed up console encoding

2009-02-28 Thread Florian v. Savigny


Hi Sebastian,

But Emacs displays the lower-case umlauts followed by a space
etc. etc. ...

   what does file say about the offending files?

I was not actually talking about files when I mentioned Emacs, but
what I see when I *type* into Emacs (such as in this mail
message). But in case you mean what that produces when I save the
result of what I typed into a file, I ran a few tests, and the results
were mixed:

For the 3 lower-case umlauts, file reports UTF-8, consistent with the
number of bytes (i.e. the file length): 3 characters, 6 bytes. The hex
representation of the 6 bytes is: c3 a4 c3 b6 c3 bc.

For the three upper-case umlauts and for the eszett, file reports
iso-8859, also consistent with the number of bytes: 3 characters, 3
bytes. The code position is, however, definitely wrong: it is always
hex c3 (which would be the upper-case A tilde in iso-8859-1, and four
different letters can hardly have the same code position.)

To me this looks as if Emacs puts the first half of the byte sequences
(always the hex c3) into the buffer, while trying to interpret the
other half (see list below) as a command: it will say something like
\204 is undefined. I am quite certain \nnn is an octal number.

eszett: \237 (hex 9f, dec 159)
A uml: \204 (hex 84, dec 132)
O uml: \226 (hex 96, dec 150)
U uml: \234 (hex 9c, dec 156)

If I am right, the keys thus send:

eszett: c3 9f
A uml: c3 84
O uml: c3 96
U uml: c3 9c
a uml: c3 a4
o uml: c3 b6
u uml: c3 bc

I would assume that these sequences are the UTF-8 representation of
the respective characters (but I don't have a table to figure that
out).
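
In place of a table, the assumption can be checked with a short Python sketch (illustrative only, not part of the original exchange; the helper names utf8_hex and octal_escape are made up here). It prints, for each of the seven characters, the single iso-8859-1 byte, the UTF-8 byte sequence, and the second UTF-8 byte in the \nnn octal notation Emacs uses.

```python
# Characters from the list above, written as escapes so the source stays ASCII.
KEYS = {
    "eszett": "\u00df",
    "A uml": "\u00c4",
    "O uml": "\u00d6",
    "U uml": "\u00dc",
    "a uml": "\u00e4",
    "o uml": "\u00f6",
    "u uml": "\u00fc",
}


def utf8_hex(ch):
    """Hex bytes of the UTF-8 encoding of one character, e.g. 'c3 9f'."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def octal_escape(byte):
    """Backslash plus three octal digits for a single byte value."""
    return f"\\{byte:03o}"


for name, ch in KEYS.items():
    second = ch.encode("utf-8")[1]
    latin1 = ch.encode("iso-8859-1").hex()
    print(f"{name:7} iso-8859-1: {latin1}   utf-8: {utf8_hex(ch)}   "
          f"2nd byte: {octal_escape(second)}")
```

All seven characters encode to two bytes beginning with c3. For the eszett and the upper-case umlauts the second bytes come out as 9f, 84, 96 and 9c, i.e. octal 237, 204, 226 and 234, which matches what Emacs reported as undefined.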

Sorry if the whole thing was difficult to follow. I should perhaps have
mentioned that for the upper-case umlauts and the eszett, Emacs not
only complains, but also inputs an unknown character into the
buffer, represented by a '?' in reverse video. That's apparently the
hex c3 byte.

   Emacs always uses the encoding of the file, whereas a redirect
   uses the locale, iirc.

I know; normally it can figure it out - I think this ability is not
compromised in any way (I can e.g. open an XML file encoded in utf-8,
and will see 11u in the mode line). Also, please note that under X,
Emacs behaves completely as before.

By redirect, you mean shell redirection?  Does that do any character
conversion?

   I assume you know the options-mule menu in emacs, there is a lot to
   help with encoding issues...

Yes, I know, but I don't see how set-input-method would fix this. Do you?

As to the locale, where can I look that up ... ?
   .bashrc

Neither ~/.bashrc nor /etc/bash/bashrc contain any locale setting
... hmm.

But very frankly, would the solution not focus on the kernel, at least
partly? As I said, I can reverse the phenomenon by simply booting the
old kernel!

Does nobody know where the kernel controls what the keys of the
console keyboard send when pressed?

(BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)

Regards, Florian

Re: [gentoo-user] Kernel update messed up console encoding

2009-02-28 Thread Sebastian Günther

* Florian v. Savigny (lor...@fsavigny.de) [28.02.09 18:39]:
 
 Hi Sebastian,
 
 But Emacs displays the lower-case umlauts followed by a space
 etc. etc. ...
 
what does file say about the offending files?
 
 I was not actually talking about files when I mentioned Emacs, but
 what I see when I *type* into Emacs (such as in this mail
 message). But in case you mean what that produces when I save the
 result of what I typed into a file, I ran a few tests, and the results
 were mixed:
 
 For the 3 lower-case umlauts, file reports UTF-8, consistent with the
 number of bytes (i.e. the file length): 3 characters, 6 bytes. The hex
 representation of the 6 bytes is: c3 a4 c3 b6 c3 bc.
 
 For the three upper-case umlauts and for the eszett, file reports
 iso-8859, also consistent with the number of bytes: 3 characters, 3
 bytes. The code position is, however, definitely wrong: it is always
 hex c3 (which would be the upper-case A tilde in iso-8859-1, and four
 different letters can hardly have the same code position.)
 
 To me this looks as if Emacs puts the first half of the byte sequences
 (always the hex c3) into the buffer, while trying to interpret the
 other half (see list below) as a command: it will say something like
 \204 is undefined. I am quite certain \nnn is an octal number.
 
 eszett: \237 (hex 9f, dec 159)
 A uml: \204 (hex 84, dec 132)
 O uml: \226 (hex 96, dec 150)
 U uml: \234 (hex 9c, dec 156)
 
 If I am right, the keys thus send:
 
 eszett: c3 9f
 A uml: c3 84
 O uml: c3 96
 U uml: c3 9c
 a uml: c3 a4
 o uml: c3 b6
 u uml: c3 bc
 
 I would assume that these sequences are the UTF-8 representation of
 the respective characters (but I don't have a table to figure that
 out).
 
 Sorry if the whole thing was difficult to follow. I should perhaps have
 mentioned that for the upper-case umlauts and the eszett, Emacs not
 only complains, but also inputs an unknown character into the
 buffer, represented by a '?' in reverse video. That's apparently the
 hex c3 byte.
 
That is a problem of the consolefont, since the console can't display it 
with cp1250...


Emacs always uses the encoding of the file, whereas a redirect
uses the locale, iirc.
 
 I know; normally it can figure it out - I think this ability is not
 compromised in any way (I can e.g. open an XML file encoded in utf-8,
 and will see 11u in the mode line). Also, please note that under X,
 Emacs behaves completely as before.
 
 By redirect, you mean shell redirection?  Does that do any character
 conversion?

yes.

echo äöüÄÖÜß > console.test
then write the same in emacs and save as emacs.test.

And then compare the output of

file console.test
and
file emacs.test

If there are differences, the problem lies somewhere here.

 
I assume you know the options-mule menu in emacs, there is a lot to
help with encoding issues...
 
 Yes, I know, but I don't see how set-input-method would fix this. Do you?
 
No, but set-buffer-file-coding-system (C-x RET f) before saving the 
file might help to achieve the right encoding.

 As to the locale, where can I look that up ... ?
.bashrc
 
 Neither ~/.bashrc nor /etc/bash/bashrc contain any locale setting
 ... hmm.

locale
should show it to you

 
 But very frankly, would the solution not focus on the kernel, at least
 partly? As I said, I can reverse the phenomenon by simply booting the
 old kernel!
 
 Does nobody know where the kernel controls what the keys of the
 console keyboard send when pressed?
 
 (BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)

Exactly there.

 
 Regards, Florian
 
 
 

Sebastian

-- 
  Religion ist das Opium des Volkes.   Karl Marx

 s...@sti@N GÜNTHER mailto:sam...@guenther-roetgen.de



[gentoo-user] Kernel update messed up console encoding

2009-02-27 Thread Florian v. Savigny


Dear listmates,

(I did try to use a more specific mailing list, and tried
gentoo-admin, but it seems there's nobody around.)

I recently updated my kernel from 2.6.17 to 2.6.27, and it seems that
the new kernel causes the encoding of the console to behave weirdly:

I used to use the default Unix encoding, i.e. iso-8859-1, because this
was fine for German (now I want to stick to it because I have so much
legacy material in that encoding).  Now, when I type a string with
non-ASCII characters on the command line, it looks normal, but when I
redirect this to a file, the file command identifies the contents of
that file (correctly, it seems to me) as UTF-8. When I boot the old
kernel (which I kept), the same procedure results in a file identified
as iso-8859-1 (and with accordingly fewer bytes). Here are the
contents (the same sentence):

Kernel 2.6.17:

Ich kann es außerdem nicht ändern

Kernel 2.6.27:

Ich kann es auÃerdem nicht Ã¤ndern
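
The two results can be reproduced away from the console with a small Python sketch. It is illustrative only: guess_encoding is a made-up helper that merely imitates the kind of answer the file command gives, not its actual algorithm, and the sentence is written with escapes so the source stays ASCII.

```python
# The test sentence, with the eszett and a-umlaut written as escapes.
SENTENCE = "Ich kann es au\u00dferdem nicht \u00e4ndern"


def guess_encoding(data):
    """Very rough stand-in for file(1): ASCII, else valid UTF-8, else ISO-8859."""
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ISO-8859"
    return "UTF-8"


latin1_bytes = SENTENCE.encode("iso-8859-1")  # what the old kernel's console produced
utf8_bytes = SENTENCE.encode("utf-8")         # what the new kernel's console produced

print(len(latin1_bytes), guess_encoding(latin1_bytes))
print(len(utf8_bytes), guess_encoding(utf8_bytes))

# Reading the UTF-8 bytes as if they were iso-8859-1 gives the second rendering.
print(repr(utf8_bytes.decode("iso-8859-1")))
```

The iso-8859-1 version is 33 bytes and the UTF-8 version 35, one extra byte for each of the two non-ASCII letters. The byte that follows the first c3 is 9f, a control code in iso-8859-1, which is why nothing visible appears after the A tilde in that word.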

I grepped the .config files for any options that might have a bearing
on this. The only difference I found was in the first of these four
lines:

linux-2.6.17:

# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

linux-2.6.27

CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

So I set $CONFIG_NLS_ASCII differently for the new kernel. But as far
as I understand, these refer to the handling of file names (it's in
the section file systems), and only specify what is supported, so I
don't see how this could have an effect on console encoding.

The only thing I am dead sure about is that the kernel itself must be
the culprit, because when I boot the old kernel, this behaviour goes
away. There is absolutely no change in the system otherwise. (The
$UNICODE variable in /etc/rc.conf is set to no.)

Can anyone give me a hint where to look what I have messed up? Emacs,
which I sometimes like to use on the console, is particularly
uncomfortable with this, and I seem to write confusing e-mails.

Many thanks in advance for any hint,

Florian

Re: [gentoo-user] Kernel update messed up console encoding

2009-02-27 Thread Sebastian Günther

* Florian v. Savigny (lor...@fsavigny.de) [27.02.09 18:30]:
 
 Dear listmates,
 
 (I did try to use a more specific mailing list, and tried
 gentoo-admin, but it seems there's nobody around.)
 
 I recently updated my kernel from 2.6.17 to 2.6.27, and it seems that
 the new kernel causes the encoding of the console to behave weird: 
 
 I used to use the default Unix encoding, i.e. iso-8859-1, because this
 was fine for German (now I want to stick to it because I have so much
 legacy material in that encoding).  Now, when I type a string with
 Non-ASCII characters on the commandline, it looks normal, but when I
 redirect this to a file, the file command identifies the contents of
 that file (correctly, it seems to me) as UTF-8. When I boot the old
 kernel (which I kept), the same procedure results in a file identified
 as iso-8859-1 (and with accordingly fewer bytes). Here are the
 contents (the same sentence):
 
 Kernel 2.6.17:
 
 Ich kann es außerdem nicht ändern
 
 Kernel 2.6.27:
 
 Ich kann es auÃerdem nicht Ã¤ndern
 
 I grepped the .config files for any options that might have a bearing
 on this. The only difference I found was in the first of these four
 lines:
 
 linux-2.6.17:
 
 # CONFIG_NLS_ASCII is not set
 CONFIG_NLS_ISO8859_1=y
 CONFIG_NLS_ISO8859_15=y
 CONFIG_NLS_UTF8=y
 
 linux-2.6.27
 
 CONFIG_NLS_ASCII=y
 CONFIG_NLS_ISO8859_1=y
 CONFIG_NLS_ISO8859_15=y
 CONFIG_NLS_UTF8=y
 
 So I set $CONFIG_NLS_ASCII differently for the new kernel. But as far
 as I understand, these refer to the handling of file names (it's in
 the section file systems), and only specify what is supported, so I
 don't see how this could have an effect on console encoding.
 
 The only thing I am dead sure about is that the kernel itself must be
 the culprit, because when I boot the old kernel, this behaviour goes
 away. There is absolutely no change in the system otherwise. (The
 $UNICODE variable in /etc/rc.conf is set to no.)
 
 Can anyone give me a hint where to look what I have messed up? Emacs,
 which I sometimes like to use on the console, is particularly
 uncomfortable with this, and I seem to write confusing e-mails.
 
 Many thanks in advance for any hint,
 
 Florian
 
 

Generally speaking: switch to UTF-8! There are many tools which can 
convert your files automatically.

To your issue:

Well, there still is /etc/conf.d/consolefont which could mess up things. 
Or the locales...

But the different behavior of the two kernels is strange...
Is CONFIG_NLS_DEFAULT different between the two kernels? Maybe it's also 
related to the kernel's built-in keymap...

Maybe you should try the gentoo-user-de list, maybe there is someone 
who ran into the same problem...

HTH
Sebastian

-- 
  Religion ist das Opium des Volkes.   Karl Marx

 s...@sti@N GÜNTHER mailto:sam...@guenther-roetgen.de


