Re: [gentoo-user] Kernel update messed up console encoding
Hi Sebastian,

> That is a problem of the consolefont, since the console can't display it with cp1250...

Maybe. If this font really implements codepage 1250, as one would assume, it should normally display a capital A with a short accent (a Slavonic letter, I think) at position hex c3. True, that differs from the capital A with tilde which iso-8859-1 has there. But this is hardly the heart of the matter: the c3 shouldn't be there in the first place.

> echo äöüÄÖÜß > console.test
> then write the same in emacs and save as emacs.test. And then compare the output of
> file console.test
> and
> file emacs.test
> If there are differences, somewhere here lies the problem.

But I have already described the result of the first procedure in my first posting (UTF-8 when echoed under the new kernel, iso-8859-1 when echoed under the old kernel) and the result of the second one - IN DETAIL - in my last posting (too long to repeat; see there), which I assume you have read. Have I missed something?

> locale should show it to you

Thanks. $LANG and $LC_ALL are not set (i.e. locale simply shows LANG= and LC_ALL= with no values). All other LC_... variables are set to POSIX.

>> Does nobody know where the kernel controls what the keys of the console keyboard send when pressed? (BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)
>
> Exactly there.

Could you explain that, please (do you perhaps mean this is where the kernel's behaviour IS CONTROLLED)? As I have repeatedly said, all variable settings are of course the same under both kernels, so the two kernels definitely behave differently with the same settings.

Regards,
Florian

PS: Just one thing: do you think you could cite only those portions of postings that you are replying to? Having to wade through tons of cited material to find any replies is quite hard on the eyes, especially when understanding one another seems to be difficult.
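
[Editorial illustration, not part of the original posting.] The point about position hex c3 can be checked without touching the console. The following is a minimal sketch that only relies on the codec tables shipped with Python; the helper name `describe` is made up for this example. It shows what the lone byte c3 means in each 8-bit codepage mentioned in the thread, and that it only forms an umlaut together with the byte that follows it in UTF-8.

```python
# Sketch: what does the single byte 0xC3 mean in the codepages discussed here?
import unicodedata

BYTE = b"\xc3"


def describe(encoding: str) -> str:
    """Return the Unicode name of byte 0xC3 when decoded with `encoding`."""
    return unicodedata.name(BYTE.decode(encoding))


for enc in ("cp1250", "iso-8859-1", "iso-8859-15"):
    print(f"{enc:12} 0xC3 -> {describe(enc)}")

# In UTF-8, 0xC3 is only the first half of a two-byte sequence:
print(b"\xc3\xa4".decode("utf-8"))  # the a umlaut
```

So a font with a cp1250 layout would draw a different glyph for a stray c3 than an iso-8859-1 font, but in both cases the byte is already wrong before any font is involved.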
Re: [gentoo-user] Kernel update messed up console encoding
Dear Sebastian,

thank you for your thoughts. I am afraid switching to UTF-8 for everything, although I see that this is the sound thing to do eventually, is not currently an option for me - there are far too many things which depend on the old encoding. (Also, it would tend to obscure or complicate the problem rather than fix it, since Emacs obviously gets confused by the console behaviour.)

> there still is /etc/conf.d/consolefont that could mess up things

The only variable that's set there is CONSOLEFONT=cp1250. I do not understand how the font could have an influence on the characters *produced* by the console, and it also seems difficult to explain why the shell and Emacs, which of course use the same console font, behave differently. (Under the shell, it looks fine while you type, i.e. you cannot tell that your u umlaut actually consists of two bytes. But Emacs displays the lower-case umlauts followed by a space (i.e. two characters, but not the two that most of us are probably quite familiar with, i.e. those you see when UTF-8 is displayed as if it were an 8-bit encoding), while for upper-case umlauts and the eszett it complains that e.g. \204 is undefined.) It definitely looks to me as if the core of the problem is what the console produces, not what it shows, i.e. what a keypress produces.

The variable CONSOLETRANSLATION is commented out, meaning I am using the default one, whichever that is.

As to the locale, where can I look that up ... ? I seem to remember I purposely use no locale (or C, I think), but I don't remember where I set that.

CONFIG_NLS_DEFAULT is indeed different for the two kernels, but not in a way that seems to explain anything, as those two encodings differ only in a few positions (not umlauts or eszett):

linux-2.6.17-gentoo-r7: iso8859-15
linux-2.6.27-gentoo-r8: iso8859-1

Also, I think what I said last time holds: that only applies to filenames in the filesystem, doesn't it?

I'll follow your suggestion and re-post the problem on gentoo-user-de, although I think running into that sort of problem might happen to anybody who uses a European language other than English (one of those covered by iso-8859-1, more precisely), so comments here are still welcome! But who still sometimes uses the console, except me?

I think I'll also write a small script that compares the settings in the two kernel .configs systematically. Could also be of use for later kernel updates ...

Thanks very much!
Florian
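
[Editorial illustration, not part of the original posting.] The small comparison script mentioned above could look roughly like this. It is only a sketch under the assumption that both files use the usual .config line shapes (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`); the function names `parse_config` and `diff_configs` are invented for this example.

```python
# Sketch: list every option whose value differs between two kernel .config files.
import re
import sys

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; options marked 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old, new):
    """Return {option: (old_value, new_value)}; None means the option is absent."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


if __name__ == "__main__" and len(sys.argv) == 3:
    configs = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            configs.append(parse_config(handle.read()))
    for name, (before, after) in diff_configs(*configs).items():
        print(f"{name}: {before} -> {after}")
```

Called with the two .config paths as arguments, it prints one line per differing option. Options that exist in only one of the two kernels show up with `None` on the other side, which is common across such a large version jump.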
Re: [gentoo-user] Kernel update messed up console encoding
On 28.02.2009 12:34, Florian v. Savigny wrote:

> [...] I'll follow your suggestion and re-post the problem on gentoo-user-de, although I think running into that sort of problem might happen to anybody who uses a European language other than English (one of those covered by iso-8859-1, more precisely), so comments here are still welcome! But who still sometimes uses the console, except me?

A lot of people use the console. I certainly do. But I, and I would assume the majority of console users, switched to UTF-8 quite some time ago, as was suggested earlier in the thread. Hence the lack of useful advice. Good luck.

--
Eray
Re: [gentoo-user] Kernel update messed up console encoding
* Florian v. Savigny (lor...@fsavigny.de) [28.02.09 11:35]:

> Dear Sebastian,
>
>> there still is /etc/conf.d/consolefont that could mess up things
>
> The only variable that's set there is CONSOLEFONT=cp1250. I would not understand how the font could have an influence on the characters *produced* by the console, and it seems also difficult to explain why the shell and Emacs, which of course use the same console font, behave differently. (Under the shell, it looks fine while you type it, i.e. you cannot tell that your u umlaut actually consists of two bytes. But Emacs displays the lower-case umlauts followed by a space (i.e. two characters, but not those that most of us are probably quite familiar with, i.e. which you see when UTF-8 is displayed as if it were ASCII), while for upper-case umlauts and the eszett complains that e.g. \204 is undefined.)

What does file say about the offending files? Emacs always uses the encoding of the file, whereas a redirect uses the locale, iirc. I assume you know the options-mule menu in Emacs; there is a lot in there to help with encoding issues...

> As to the locale, where can I look that up ... ? I seem to remember I purposely use no locale (or C, I think), but I don't remember where I set that.

.bashrc

> Thanks very much!
> Florian

Sebastian

--
Religion is the opium of the people. Karl Marx
s...@sti@N GÜNTHER mailto:sam...@guenther-roetgen.de
Re: [gentoo-user] Kernel update messed up console encoding
Hi Sebastian,

>> But Emacs displays the lower-case umlauts followed by a space etc. etc. ...
>
> what does file say about the offending files?

I was not actually talking about files when I mentioned Emacs, but about what I see when I *type* into Emacs (such as in this mail message). But in case you mean what that produces when I save what I typed into a file: I ran a few tests, and the results were mixed.

For the three lower-case umlauts, file reports UTF-8, consistent with the number of bytes (i.e. the file length): 3 characters, 6 bytes. The hex representation of the 6 bytes is: c3 a4 c3 b6 c3 bc.

For the three upper-case umlauts and for the eszett, file reports iso-8859, also consistent with the number of bytes: one byte per character. The code position is, however, definitely wrong: it is always hex c3 (which would be the upper-case A tilde in iso-8859-1, and four different letters can hardly have the same code position).

To me this looks as if Emacs puts the first half of each byte sequence (always the hex c3) into the buffer, while trying to interpret the other half (see list below) as a command: it will say something like "\204 is undefined". I am quite certain \nnn is an octal number.

eszett: \237 (hex 9f, dec 159)
A uml: \204 (hex 84, dec 132)
O uml: \226 (hex 96, dec 150)
U uml: \234 (hex 9c, dec 156)

If I am right, the keys thus send:

eszett: c3 9f
A uml: c3 84
O uml: c3 96
U uml: c3 9c
a uml: c3 a4
o uml: c3 b6
u uml: c3 bc

I would assume that these sequences are the UTF-8 representation of the respective characters (but I don't have a table to figure that out).

Sorry if the whole thing was difficult to follow. I should perhaps have mentioned that for the upper-case umlauts and the eszett, Emacs not only complains, but also inputs an unknown character into the buffer, represented by a '?' in reverse video. That's apparently the hex c3 byte.

> Emacs always uses the encoding of the file, whereas a redirect uses the locale, iirc.

I know; normally it can figure it out - I think this ability is not compromised in any way (I can e.g. open an XML file encoded in utf-8, and will see 11u in the mode line). Also, please note that under X, Emacs behaves completely as before. By redirect, you mean shell redirection? Does that do any character conversion?

> I assume you know the options-mule menu in emacs, there is a lot to help with encoding issues...

Yes, I know, but I don't see how set-input-method would fix this. Do you?

>> As to the locale, where can I look that up ... ?
>
> .bashrc

Neither ~/.bashrc nor /etc/bash/bashrc contains any locale setting ... hmm.

But very frankly, should the solution not focus on the kernel, at least partly? As I said, I can reverse the phenomenon by simply booting the old kernel! Does nobody know where the kernel controls what the keys of the console keyboard send when pressed? (BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)

Regards,
Florian
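
[Editorial illustration, not part of the original posting.] The assumption that the keys send UTF-8 can be checked against a table generated on the spot. This is a minimal sketch using only Python's built-in UTF-8 codec; the helper name `utf8_hex` is invented for this example. It prints the byte pair for each character, plus the second byte in octal and decimal so it can be compared with the `\nnn` values Emacs reports.

```python
# Sketch: UTF-8 byte pairs for the German special characters,
# with the second byte also shown in the octal form Emacs complains about.
CHARS = "ßÄÖÜäöü"


def utf8_hex(char: str) -> str:
    """Return the UTF-8 encoding of `char` as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in char.encode("utf-8"))


for char in CHARS:
    second = char.encode("utf-8")[1]
    print(f"{char}: {utf8_hex(char)}   second byte: \\{second:o} (dec {second})")
```

Every one of these characters encodes to c3 followed by a second byte in the range 80 to bf, which is consistent with Emacs storing the c3 and then rejecting the second byte as an unknown key.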
Re: [gentoo-user] Kernel update messed up console encoding
* Florian v. Savigny (lor...@fsavigny.de) [28.02.09 18:39]:

> [...] I should perhaps have mentioned that for the upper-case umlauts and the eszett, Emacs not only complains, but also inputs an unknown character into the buffer, represented by a '?' in reverse video. That's apparently the hex c3 byte.

That is a problem of the consolefont, since the console can't display it with cp1250...

> By redirect, you mean shell redirection? Does that do any character conversion?

Yes. Run

echo äöüÄÖÜß > console.test

then write the same in Emacs and save it as emacs.test. Then compare the output of

file console.test

and

file emacs.test

If there are differences, the problem lies somewhere here.

> Yes, I know, but I don't see how set-input-method would fix this. Do you?

No, but set-buffer-file-coding-system for saving the file might help to achieve the right encoding.

> Neither ~/.bashrc nor /etc/bash/bashrc contain any locale setting ... hmm.

locale should show it to you

> Does nobody know where the kernel controls what the keys of the console keyboard send when pressed? (BTW, KEYMAP=de-latin1-nodeadkeys, in /etc/conf.d/keymaps.)

Exactly there.

Sebastian
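
[Editorial illustration, not part of the original posting.] The comparison of the two test files can also be done with a few lines of code. The sketch below is a rough classifier invented for this example; it is not how the file utility itself is implemented, it only distinguishes pure ASCII, valid UTF-8, and "some other 8-bit encoding".

```python
# Sketch: crude encoding guess for a byte string (ASCII / UTF-8 / other 8-bit).
import sys


def guess_encoding(data: bytes) -> str:
    """Classify `data` by trying the strictest interpretation first."""
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "8-bit (e.g. iso-8859)"
    return "utf-8"


if __name__ == "__main__":
    # Usage: pass file names such as console.test emacs.test as arguments.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(f"{path}: {guess_encoding(handle.read())}")
```

A file consisting only of repeated c3 bytes is not valid UTF-8, because each c3 must be followed by a continuation byte. A classifier of this kind therefore falls back to reporting an 8-bit encoding for it, which fits the iso-8859 verdict described in the thread.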
[gentoo-user] Kernel update messed up console encoding
Dear listmates,

(I did try to use a more specific mailing list, and tried gentoo-admin, but it seems there's nobody around.)

I recently updated my kernel from 2.6.17 to 2.6.27, and it seems that the new kernel causes the encoding of the console to behave weirdly: I used to use the default Unix encoding, i.e. iso-8859-1, because this was fine for German (now I want to stick to it because I have so much legacy material in that encoding). Now, when I type a string with non-ASCII characters on the command line, it looks normal, but when I redirect this to a file, the file command identifies the contents of that file (correctly, it seems to me) as UTF-8. When I boot the old kernel (which I kept), the same procedure results in a file identified as iso-8859-1 (and with accordingly fewer bytes). Here are the contents (the same sentence):

Kernel 2.6.17: Ich kann es außerdem nicht ändern
Kernel 2.6.27: Ich kann es auÃerdem nicht ändern

I grepped the .config files for any options that might have a bearing on this. The only difference I found was in the first of these four lines:

linux-2.6.17:
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

linux-2.6.27:
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

So I set $CONFIG_NLS_ASCII differently for the new kernel. But as far as I understand, these options refer to the handling of file names (they are in the file systems section), and only specify what is supported, so I don't see how this could have an effect on console encoding.

The only thing I am dead sure about is that the kernel itself must be the culprit, because when I boot the old kernel, this behaviour goes away. There is absolutely no change in the system otherwise. (The $UNICODE variable in /etc/rc.conf is set to no.)

Can anyone give me a hint where to look for what I have messed up? Emacs, which I sometimes like to use on the console, is particularly uncomfortable with this, and I seem to write confusing e-mails.

Many thanks in advance for any hint,
Florian
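
[Editorial illustration, not part of the original posting.] The "accordingly fewer bytes" observation is easy to reproduce. The following minimal sketch encodes the test sentence from the posting in both encodings, compares the lengths, and shows what the UTF-8 bytes look like when they are misread as iso-8859-1. The variable names are invented for this example.

```python
# Sketch: the same sentence in iso-8859-1 and in UTF-8.
SENTENCE = "Ich kann es außerdem nicht ändern"

latin1_bytes = SENTENCE.encode("iso-8859-1")
utf8_bytes = SENTENCE.encode("utf-8")

print(len(SENTENCE), "characters")
print(len(latin1_bytes), "bytes as iso-8859-1")
print(len(utf8_bytes), "bytes as UTF-8")

# Reading the UTF-8 bytes as if they were iso-8859-1 yields two characters
# per umlaut or eszett; repr() makes the invisible control character visible.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))
```

The UTF-8 version is two bytes longer, one extra byte each for the eszett and the a umlaut. In the misread form the second byte of the eszett (hex 9f) falls into the control-character range of iso-8859-1, so it is normally invisible.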
Re: [gentoo-user] Kernel update messed up console encoding
* Florian v. Savigny (lor...@fsavigny.de) [27.02.09 18:30]:

> [...] The only thing I am dead sure about is that the kernel itself must be the culprit, because when I boot the old kernel, this behaviour goes away. There is absolutely no change in the system otherwise. (The $UNICODE variable in /etc/rc.conf is set to no.) Can anyone give me a hint where to look what I have messed up?

Generally speaking: switch to UTF-8! There are many tools which can convert your files automatically.

As to your issue: well, there still is /etc/conf.d/consolefont, which could mess things up. Or the locales... But the different behaviour of the two kernels is strange... Is CONFIG_NLS_DEFAULT different between the two kernels? Maybe it's also related to the kernel's built-in keymap...

Maybe you should try the gentoo-user-de list; maybe there is someone who ran into the same problem...

HTH
Sebastian
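
[Editorial illustration, not part of the original posting.] As a rough idea of what such a conversion involves, here is a minimal sketch that re-encodes iso-8859-1 data as UTF-8. The function names are invented for this example, and it assumes the input really is iso-8859-1: a file that is already UTF-8 would be double-encoded, so each file should be checked first and the original kept.

```python
# Sketch: convert iso-8859-1 data to UTF-8 (assumes the input is iso-8859-1).
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as UTF-8; every byte value is accepted."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(source: str, target: str) -> None:
    """Write a UTF-8 copy of `source` to `target`, leaving `source` untouched."""
    with open(source, "rb") as handle:
        data = handle.read()
    with open(target, "wb") as handle:
        handle.write(latin1_to_utf8(data))
```

On the command line, `iconv -f ISO-8859-1 -t UTF-8` performs the same kind of conversion.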