Re: [gentoo-dev] enable UTF8 per default?
On Tuesday 28 February 2006 12:58, Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?

I've been hit by a bug in egroupware that's related to unicode. A
unicode-enabled mysql reserves three bytes per character for string keys,
and egroupware assumes (wrongly) that its keys won't cross the 1000-byte
key length boundary... But that's really not a big deal.

--
Eldad Zack <[EMAIL PROTECTED]>
Key/Fingerprint at pgp.mit.edu, ID 0x96EA0A93
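To make the key-length arithmetic in the message above concrete, here is a minimal Python sketch. The three-bytes-per-character factor and the 1000-byte limit are the figures quoted in the message; the column widths are hypothetical and not taken from egroupware's schema.

```python
# Sketch of the key-length arithmetic described above.
# Assumptions: 3 bytes reserved per character in a unicode-enabled build
# and a 1000-byte limit on index keys (both figures from the message).
# The column widths below are hypothetical.
MAX_KEY_BYTES = 1000


def key_bytes(column_chars, bytes_per_char):
    """Bytes reserved for an index over columns of the given character widths."""
    return sum(n * bytes_per_char for n in column_chars)


columns = [255, 100]  # hypothetical two-column string key

single_byte_size = key_bytes(columns, 1)  # single-byte charset
utf8_size = key_bytes(columns, 3)         # unicode-enabled build

print(single_byte_size, single_byte_size <= MAX_KEY_BYTES)
print(utf8_size, utf8_size <= MAX_KEY_BYTES)
```

A key that fits comfortably with a single-byte charset can therefore exceed the limit once every character is budgeted at three bytes, which is the assumption egroupware apparently got wrong.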
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 28 Feb 2006 11:58:03 +0100
Patrick Lauer <[EMAIL PROTECTED]> wrote:
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?

Enabling support for utf-8 should be fine, but I'd like to sound a note
of caution about using a utf-8 locale as a system-wide setting.

Since UTF-8 contains "holes" in the representation (i.e. some sequences
of 8-bit values are invalid), when something is asked to parse such
invalid data, unexpected results can ensue. For an example, see bug
#125375 - it turns out that invalid sequences do not match '.' in sed
regular expressions (sed-4.1.4). The other GNU tools probably behave
similarly.

Up to a point this is in line with the UTF-8 spec, which says, "When a
process interprets a code unit sequence which purports to be in a
Unicode character encoding form, it shall treat ill-formed code unit
sequences as an error condition, and shall not interpret such sequences
as characters." (chapter 3 para 2 rule C12a). This clearly means that
the invalid bytes cannot match "." (or anything else for that matter).
However, sed should either generate an error, filter the illegal bytes
out of its input, or replace them with a marker (replacement character) -
instead it leaves the non-conformant bytes alone.

--
Kevin F. Quinn
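The three behaviours listed in the message above (raise an error, filter the bytes out, substitute a replacement character) can be sketched with Python's built-in UTF-8 decoder. This is only a stand-in for illustration; it says nothing about how sed is implemented, and the sample bytes are made up.

```python
# Three ways a tool could treat ill-formed UTF-8, using Python's built-in
# decoder as a stand-in. This is not how sed works internally.
data = b"abc\xffdef"  # 0xFF can never appear in well-formed UTF-8

# 1. Treat the ill-formed sequence as an error.
try:
    data.decode("utf-8")  # errors="strict" is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 2. Filter the offending bytes out of the input.
filtered = data.decode("utf-8", errors="ignore")

# 3. Substitute U+FFFD REPLACEMENT CHARACTER for each bad byte.
replaced = data.decode("utf-8", errors="replace")

print(strict_failed)
print(filtered)
print(replaced)
```

Any of the three keeps the invalid byte from ever being interpreted as a character, which is what the quoted conformance rule asks for.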
Re: [gentoo-dev] enable UTF8 per default?
Alexander Simonov wrote:
> On Wed, Mar 01, 2006 at 01:24:26AM +0900, Kalin KOZHUHAROV wrote:
>> Well there are a few problems, but yes I cannot name them now.
>> Using Japanese, Cyrillic and English in a few encodings each is a big
>> nightmare.
>
> It's true! We in the ex-USSR use KOI8-R, KOI8-U, CP1251 (aka
> Windows-1251) and CP866.
>
>> Nowadays I try to move everything to UTF-8, but there are those
>> windoze users and webdevs that make all Japanese in Shift_JIS ...
>> So support of wide range of encodings is a must, but UTF-8 is the
>> truth.
>>
>>> The only thing that's nasty: we don't have any good utf8-fonts for
>>> the console.
>> And not only the console.
>> Even for xterm there are not many good fonts (known to me) that
>> display both Japanese and Cyrillic in regular and bold. Currently
>> there is only one combination that works for me.
>
> What about terminus and UniCyr (unicode font from
> console-tools-cyrillic)? I use these fonts, and most Russian-speaking
> people say they are the best fonts for Cyrillic charsets.
> I don't see any font issues myself.

Yes, then the problem is with Japanese...

>> So fonts, font config and related stuff is what has to be fixed first.

Kalin.

--
|[ ~~ ]| +-> http://ThinRope.net/ <-+ |[ __ ]|
--
gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] enable UTF8 per default?
On Wed, Mar 01, 2006 at 01:24:26AM +0900, Kalin KOZHUHAROV wrote:
> Well there are a few problems, but yes I cannot name them now.
> Using Japanese, Cyrillic and English in a few encodings each is a big
> nightmare.

It's true! We in the ex-USSR use KOI8-R, KOI8-U, CP1251 (aka
Windows-1251) and CP866.

> Nowadays I try to move everything to UTF-8, but there are those
> windoze users and webdevs that make all Japanese in Shift_JIS ...
> So support of wide range of encodings is a must, but UTF-8 is the
> truth.
>
>> The only thing that's nasty: we don't have any good utf8-fonts for
>> the console.
> And not only the console.
> Even for xterm there are not many good fonts (known to me) that
> display both Japanese and Cyrillic in regular and bold. Currently
> there is only one combination that works for me.

What about terminus and UniCyr (unicode font from
console-tools-cyrillic)? I use these fonts, and most Russian-speaking
people say they are the best fonts for Cyrillic charsets.
I don't see any font issues myself.

> So fonts, font config and related stuff is what has to be fixed first.

--
WBR, Alexander Simonov (DEVL-UANIC)
Ukrainian Gentoo Community Coordinator
Re: [gentoo-dev] enable UTF8 per default?
Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
>
> Patrick

I think it would be nice to have it enabled too :-) You got my vote.
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 2006-02-28 at 20:18 +0100, Kevin F. Quinn (Gentoo) wrote:
> On Tue, 28 Feb 2006 12:47:33 -0500
> solar <[EMAIL PROTECTED]> wrote:
>
> > I forget where I read it but I thought that unicode led to overflows
> > and was considered a general security risk. I wish I knew where I read
> > that but I'm unable to find it.
>
> Well, stuff I could find includes:
>
> http://www.kde.org/info/security/advisory-20060119-1.txt
> buggy UTF-8 decoder in KDE - this is an overflow error, which as
> ciaranm says is a risk applicable to anything. It's a bug in KDE, not
> in UTF-8 as such. Perhaps this is what was at the back of your mind.
>
> http://www.izerv.net/idwg-public/archive/0181.html
> risks of using UTF-8; in particular the use of separate validators
> which won't process things exactly the same way the application does.
> Also homograph risks associated with allowing more than one encoding for
> a character.
>
> http://www.eeye.com/html/Research/Advisories/AD20010705.html
> example of UTF-8(ish) used to fool IDSs by using alternative
> non-standard encodings that IDSs aren't aware of.
> This actually is another example of issues with secondary validators
> described in the link above - they're not guaranteed to parse things
> exactly the same way the application does.
>
> http://www.microsoft.com/mspress/books/sampchap/5612b.asp
> describes a number of risks of accepting UTF-8, including the above.
>
> So far I haven't found anything that could be considered a general
> security risk, but that doesn't prove much :)

Thanks Kevin.

I think whatever I was thinking of had to do with widechar support.
Maybe on phrack, vuln-dev, DD, I forget. But the second link was a
pretty good read and perhaps can give us some sort of reasonable checks
that we can use before we opt to allow the use flag to be enabled in
our hardened profiles. Think we can automate any checks using the
UTF-8-test.txt?

--
solar <[EMAIL PROTECTED]>
Gentoo Linux
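One possible shape for the automated check asked about in the message above, as a rough Python sketch. It only reports which lines of a byte stream fail strict UTF-8 decoding. The function name and the in-memory sample are hypothetical; a real run would read the bytes of UTF-8-test.txt from disk, and comparing the results against what a given tool does with the same lines is left open here.

```python
def malformed_lines(data):
    """Return the 1-based numbers of lines that are not well-formed UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")  # strict decoding rejects ill-formed input
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Tiny in-memory stand-in for a test file (hypothetical content).
sample = (
    b"plain ascii\n"
    + "caf\u00e9\n".encode("utf-8")
    + b"lone continuation byte \x80\n"
    + b"truncated sequence \xe2\x82\n"
)

print(malformed_lines(sample))
```

The same list of line numbers could then be compared with what the tool under test accepts or rejects, which is where disagreements between a validator and an application would show up.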
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 28 Feb 2006 12:47:33 -0500
solar <[EMAIL PROTECTED]> wrote:
> I forget where I read it but I thought that unicode led to overflows
> and was considered a general security risk. I wish I knew where I read
> that but I'm unable to find it.

Well, stuff I could find includes:

http://www.kde.org/info/security/advisory-20060119-1.txt
buggy UTF-8 decoder in KDE - this is an overflow error, which as
ciaranm says is a risk applicable to anything. It's a bug in KDE, not
in UTF-8 as such. Perhaps this is what was at the back of your mind.

http://www.izerv.net/idwg-public/archive/0181.html
risks of using UTF-8; in particular the use of separate validators
which won't process things exactly the same way the application does.
Also homograph risks associated with allowing more than one encoding for
a character.

http://www.eeye.com/html/Research/Advisories/AD20010705.html
example of UTF-8(ish) used to fool IDSs by using alternative
non-standard encodings that IDSs aren't aware of.
This actually is another example of issues with secondary validators
described in the link above - they're not guaranteed to parse things
exactly the same way the application does.

http://www.microsoft.com/mspress/books/sampchap/5612b.asp
describes a number of risks of accepting UTF-8, including the above.

So far I haven't found anything that could be considered a general
security risk, but that doesn't prove much :)

--
Kevin F. Quinn
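The "separate validator" risk mentioned in the message above can be illustrated with an overlong encoding. The Python sketch below is hypothetical and does not reproduce any of the linked advisories: a byte-level filter looks for "../", a deliberately non-conformant decoder (written here only for the demonstration) accepts the overlong form of "/", and a conformant decoder rejects it.

```python
# 0xC0 0xAF is an overlong (non-shortest-form) two-byte encoding of "/".
overlong_path = b"..\xc0\xaf..\xc0\xafetc"

# A naive byte-level validator searching for "../" finds nothing to block.
naive_filter_passes = b"../" not in overlong_path


def lenient_decode(data):
    """Deliberately non-conformant decoder that accepts overlong 2-byte forms.

    Hypothetical, for illustration only: real decoders must reject these.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # Combine the payload bits of lead and continuation byte
            # without checking that the result needed two bytes.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


decoded = lenient_decode(overlong_path)

# A conformant decoder refuses the sequence outright.
try:
    overlong_path.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(naive_filter_passes, decoded, strict_rejects)
```

The validator and the lenient decoder disagree about what the bytes mean, which is exactly the gap the linked write-ups warn about; strict decoding closes it.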
Re: [gentoo-dev] enable UTF8 per default?
On Tue, Feb 28, 2006 at 12:47:33PM -0500, solar wrote:
> I forget where I read it but I thought that unicode led to overflows
> and was considered a general security risk. I wish I knew where I read
> that but I'm unable to find it.
>
> Any list readers know anything relating to that?

It's true that many overflows have been found in unicode-aware
applications, like the zillion unicode overflows in Internet Explorer
for example. But in my mind that shouldn't lead to considering unicode
a general security risk, even though the apache team uses ascii in the
default configuration to protect against bugs in poorly written
applications.

Regards,
Bryan Østergaard
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 28 Feb 2006 12:47:33 -0500
solar <[EMAIL PROTECTED]> wrote:
| I forget where I read it but I thought that unicode led to overflows
| and was considered a general security risk. I wish I knew where I read
| that but I'm unable to find it.
|
| Any list readers know anything relating to that?

Eh, not really. With non-utf-8 you could argue that it's an increased
risk, since you get non-string-terminating nulls, but with utf-8 those
aren't an issue.

It's not really a very well substantiated claim. It's like saying "GUI
programming leads to bugs" or "internationalisation leads to program
crashes". Yes, it's possible (in C, anyway) to screw up your buffer
routines when converting code to handle utf-8, but then it's always
possible to screw up buffer routines.

--
Ciaran McCreesh : Gentoo Developer (Wearer of the shiny hat)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
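The remark about non-string-terminating nulls in the message above can be shown in a few lines of Python, assuming UTF-16 is the kind of non-UTF-8 encoding meant. The `c_strlen` helper is a hypothetical stand-in for how a C routine that stops at the first zero byte would measure the buffer.

```python
def c_strlen(buf):
    """Length as C's strlen would see it: bytes before the first zero byte."""
    zero = buf.find(b"\x00")
    return len(buf) if zero == -1 else zero


text = "abc"
utf16 = text.encode("utf-16-le")  # every ASCII character gains a zero byte
utf8 = text.encode("utf-8")       # ASCII stays one non-zero byte each

print(c_strlen(utf16))
print(c_strlen(utf8))

# Non-ASCII text in UTF-8 still contains no zero bytes.
mixed = "日本語 кириллица".encode("utf-8")
print(b"\x00" in mixed)
```

In UTF-8 a zero byte only ever encodes U+0000, so byte-oriented C string routines keep working, whereas the UTF-16 buffer appears to end after its first byte.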
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 2006-02-28 at 11:58 +0100, Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.

I forget where I read it but I thought that unicode led to overflows
and was considered a general security risk. I wish I knew where I read
that but I'm unable to find it.

Any list readers know anything relating to that?

--
solar <[EMAIL PROTECTED]>
Gentoo Linux
Re: [gentoo-dev] enable UTF8 per default?
Ooh, I'm very much in favor of unicode being enabled by default. It's
not like users would be limited *only* to UTF-8 on their new installs,
anyway. I'd love to see this implemented. ++ for the suggestion. :)

Patrick Lauer wrote:
> Hi all,
>
> at FOSDEM we had a nice discussion about languages, translations etc.
> Having people from the US (wolf31o2) who never have problems and people
> from Japan (usata) who always have problems with encodings /
> charsets / ... was quite interesting.
>
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.
>
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?
>
> If there are no objections this should be a small but helpful change.
>
> On a tangent I wonder if pulling in extra fonts as a dependency of X
> makes sense (useflag controlled, enabled by default) - that way the
> unicode capabilities are available without any configuration.
>
> Patrick
Re: [gentoo-dev] enable UTF8 per default?
Lars Weiler wrote:
> * Patrick Lauer <[EMAIL PROTECTED]> [06/02/28 11:58 +0100]:
>> Enabling the unicode useflag in the profiles should help our
>> international users and should not cause any problems. Are there any
>> known bugs / problems this would trigger? Any reasons against that?
>
> It is enabled by default. At least on ppc. And that since,
> uhm, summer 2004?
>
> I can't say if there are any problems, as I haven't received
> a bug for a long time.

Well, there are a few problems, but I cannot name them right now.
Using Japanese, Cyrillic and English in a few encodings each is a big
nightmare.

Nowadays I try to move everything to UTF-8, but there are those windoze
users and webdevs that make all Japanese in Shift_JIS ... So support
for a wide range of encodings is a must, but UTF-8 is the truth.

> The only thing that's nasty: we don't have any good utf8-fonts for the
> console.

And not only the console. Even for xterm there are not many good fonts
(known to me) that display both Japanese and Cyrillic in regular and
bold. Currently there is only one combination that works for me.

So fonts, font config and related stuff is what has to be fixed first.

Kalin.

P.S. And before it can be fixed, it has to be filed... I promise to
take notes (again) when I see something.

--
|[ ~~ ]| +-> http://ThinRope.net/ <-+ |[ __ ]|
Re: [gentoo-dev] enable UTF8 per default?
>> I can't say if there are any problems, as I haven't received
>> a bug for a long time. The only thing that's nasty: we
>> don't have any good utf8-fonts for the console.
>
> I think that's acceptable.

The only issue related to that we really have is this bug, which is
annoying but not fatal:

http://bugs.gentoo.org/show_bug.cgi?id=107235

-Joe
Re: [gentoo-dev] enable UTF8 per default?
On Tuesday 28 February 2006 06:47, Patrick Lauer wrote:
> On Tue, 2006-02-28 at 12:32 +0100, Diego 'Flameeyes' Pettenò wrote:
> > On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> > > During that discussion we realized that having utf-8 not enabled by
> > > default and no utf8 fonts available by default causes lots of
> > > recompilation and reconfiguration.
> >
> > At the same time, you'll probably hear people bitching about UTF-8 being
> > enabled because "it's not needed for me, should be the rest of the world
> > to change"
>
> It is still optional, just enabled by default :-)
> All the people with non-ASCII charsets will have less work, only that we
> switch the load from, say, 75% of the users fixing their environment to
> 25% of users having to switch.

Hopefully people will fix their packages to respect USE=unicode when
deciding whether to link against libncurses or libncursesw.

-mike
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 2006-02-28 at 13:50 +0100, Lars Weiler wrote:
> * Patrick Lauer <[EMAIL PROTECTED]> [06/02/28 11:58 +0100]:
> > Enabling the unicode useflag in the profiles should help our
> > international users and should not cause any problems. Are there any
> > known bugs / problems this would trigger? Any reasons against that?
>
> It is enabled by default. At least on ppc.

As far as I can tell that's not the case on x86.

> And that since,
> uhm, summer 2004?

Ok, so it should be quite well-tested.

> I can't say if there are any problems, as I haven't received
> a bug for a long time. The only thing that's nasty: we
> don't have any good utf8-fonts for the console.

I think that's acceptable.

--
Stand still, and let the rest of the universe move
Re: [gentoo-dev] enable UTF8 per default?
* Patrick Lauer <[EMAIL PROTECTED]> [06/02/28 11:58 +0100]:
> Enabling the unicode useflag in the profiles should help our
> international users and should not cause any problems. Are there any
> known bugs / problems this would trigger? Any reasons against that?

It is enabled by default. At least on ppc. And that since,
uhm, summer 2004?

I can't say if there are any problems, as I haven't received
a bug for a long time. The only thing that's nasty: we
don't have any good utf8-fonts for the console.

Regards, Lars

--
Lars Weiler <[EMAIL PROTECTED]> +49-171-1963258
Gentoo Linux PowerPC: Developer and Release Engineer
Gentoo Infrastructure : CVS Administrator
Gentoo Foundation : Trustee
Re: [gentoo-dev] enable UTF8 per default?
On Tuesday 28 February 2006 12:47, Patrick Lauer wrote:
> It is still optional, just enabled by default :-)

That would probably be enough to draw criticism, mainly from
English-speaking users who don't care about extended characters.

Still, this would follow the direction of both Apple and Microsoft: the
first already provides, and the other says it will provide, an
always-Unicode system. It is probably also the way to make Gentoo more
accessible to non-English-speaking people.

> So - apart from some users maybe not wanting it, any technical reasons
> against?

I'll wait for the "clutter" comment from users and maybe devs. I was
criticized for forcing unicode on in vlc because of a source-code bug
that prevented it from being built against non-unicode wxGTK; since
then I always expect some sort of problem :P

> > I'd be the first to be interested in having it enabled by default, tho.
> Yes, otherwise spelling your name is almost impossible :-)

That, and I'm actually trying to find time to learn Japanese :P But
time is something I don't have in abundance :|

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
Re: [gentoo-dev] enable UTF8 per default?
On Tue, 2006-02-28 at 12:32 +0100, Diego 'Flameeyes' Pettenò wrote:
> On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> > During that discussion we realized that having utf-8 not enabled by
> > default and no utf8 fonts available by default causes lots of
> > recompilation and reconfiguration.
> At the same time, you'll probably hear people bitching about UTF-8 being
> enabled because "it's not needed for me, should be the rest of the world to
> change"

It is still optional, just enabled by default :-)
All the people with non-ASCII charsets will have less work, only that we
switch the load from, say, 75% of the users fixing their environment to
25% of users having to switch.

And who doesn't want UTF-8? Just being able to see a Japanese Website
as it was intended (even if I can't read it) is a nice feature.

So - apart from some users maybe not wanting it, any technical reasons
against?

> I'd be the first to be interested in having it enabled by default, tho.
Yes, otherwise spelling your name is almost impossible :-)

Patrick

--
Stand still, and let the rest of the universe move
Re: [gentoo-dev] enable UTF8 per default?
On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
> During that discussion we realized that having utf-8 not enabled by
> default and no utf8 fonts available by default causes lots of
> recompilation and reconfiguration.

At the same time, you'll probably hear people bitching about UTF-8 being
enabled because "it's not needed for me, should be the rest of the world
to change"

I'd be the first to be interested in having it enabled by default, tho.

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
[gentoo-dev] enable UTF8 per default?
Hi all,

at FOSDEM we had a nice discussion about languages, translations etc.
Having people from the US (wolf31o2) who never have problems and people
from Japan (usata) who always have problems with encodings /
charsets / ... was quite interesting.

During that discussion we realized that having utf-8 not enabled by
default and no utf8 fonts available by default causes lots of
recompilation and reconfiguration.

Enabling the unicode useflag in the profiles should help our
international users and should not cause any problems. Are there any
known bugs / problems this would trigger? Any reasons against that?

If there are no objections this should be a small but helpful change.

On a tangent I wonder if pulling in extra fonts as a dependency of X
makes sense (useflag controlled, enabled by default) - that way the
unicode capabilities are available without any configuration.

Patrick

--
Stand still, and let the rest of the universe move