Re: [gentoo-dev] enable UTF8 per default?

2006-03-11 Thread Eldad Zack
On Tuesday 28 February 2006 12:58, Patrick Lauer wrote:
 Hi all,

 at FOSDEM we had a nice discussion about languages, translations etc.
 Having people from the US (wolf31o2) who never have problems and people
 from Japan (usata) who always have problems with encodings /
 charsets / ... was quite interesting.

 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration.

 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?

I've been hit by a bug in egroupware that's related to unicode.
A unicode-enabled mysql reserves three times as many bytes for string keys, and
egroupware assumes (wrongly) that its keys won't cross the 1000-byte key length
boundary...

But that's really not a big deal.
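
To make the arithmetic concrete, here is a minimal sketch in Python. The 1000-byte limit and the factor of 3 are taken from the message above; the column lengths and the function names are made up for illustration and are not egroupware's actual schema.

```python
# Illustrative numbers only: 1000 bytes is the key length limit named above,
# and 3 is the per-character multiplier of a unicode-enabled build.
MAX_KEY_BYTES = 1000
BYTES_PER_CHAR_UNICODE = 3


def key_bytes(column_chars, bytes_per_char=BYTES_PER_CHAR_UNICODE):
    """Bytes reserved for a key over columns of the given character lengths."""
    return sum(column_chars) * bytes_per_char


def fits(column_chars, bytes_per_char=BYTES_PER_CHAR_UNICODE):
    """True if the key stays within the key length limit."""
    return key_bytes(column_chars, bytes_per_char) <= MAX_KEY_BYTES


# A hypothetical two-column key of 255 + 100 characters:
print(key_bytes([255, 100], 1), fits([255, 100], 1))  # 355 True  (single-byte charset)
print(key_bytes([255, 100]), fits([255, 100]))        # 1065 False (unicode build)
print(MAX_KEY_BYTES // BYTES_PER_CHAR_UNICODE)        # 333 characters at most
```

So a key that was comfortably small with a single-byte charset can cross the limit once every character is budgeted at three bytes.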


-- 
Eldad Zack [EMAIL PROTECTED]
Key/Fingerprint at pgp.mit.edu, ID 0x96EA0A93




Re: [gentoo-dev] enable UTF8 per default?

2006-03-09 Thread Kevin F. Quinn (Gentoo)
On Tue, 28 Feb 2006 11:58:03 +0100
Patrick Lauer [EMAIL PROTECTED] wrote:

 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration. 
 
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?

Enabling support for utf-8 should be fine, but I'd like to sound a note
of caution about using a utf-8 locale as a system-wide setting.  Since
UTF-8 contains holes in the representation (i.e. some sequences of
8-bit values are invalid), when something is asked to parse such
invalid data unexpected results can ensue.

For an example, see bug #125375 - it turns out that invalid sequences
do not match '.' in sed regular expressions (sed-4.1.4).  The other GNU
tools probably behave similarly.  Up to a point this is in line with the
UTF-8 spec, which says, "When a process interprets a code unit sequence
which purports to be in a Unicode character encoding form, it shall
treat ill-formed code unit sequences as an error condition, and shall
not interpret such sequences as characters." (chapter 3 para 2 rule
C12a).  This clearly means that the invalid bytes cannot match '.' (or
anything else for that matter).  However, sed should either generate an
error, filter the illegal bytes out of its input, or replace them with
a marker (replacement character) - instead it leaves the non-conformant
bytes alone.
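
The three behaviours listed above (error, filter, replacement marker) can be sketched with Python's built-in UTF-8 decoder. This only illustrates the options; it says nothing about how sed is implemented, and the sample bytes and function names are made up.

```python
# 0xFF can never appear in well-formed UTF-8, so this input is ill-formed.
data = b"abc\xffdef"


def decode_strict(raw):
    """Option 1: treat ill-formed input as an error (None signals failure)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_filtered(raw):
    """Option 2: silently drop the bytes that do not form valid sequences."""
    return raw.decode("utf-8", errors="ignore")


def decode_marked(raw):
    """Option 3: substitute U+FFFD REPLACEMENT CHARACTER for each bad byte."""
    return raw.decode("utf-8", errors="replace")


print(decode_strict(data))          # None
print(decode_filtered(data))        # abcdef
print(ascii(decode_marked(data)))   # 'abc\ufffddef'
```

In all three cases the stray byte is never interpreted as a character, which is the part the quoted rule requires.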

-- 
Kevin F. Quinn




Re: [gentoo-dev] enable UTF8 per default?

2006-03-04 Thread Alexander Simonov

On Wed, Mar 01, 2006 at 01:24:26AM +0900, Kalin KOZHUHAROV wrote:
 Well there are a few problems, but yes I cannot name them now.
 Using Japanese, Cyrillic and English in a few encodings each is a big nightmare.

It's true! We in the xUSSR use KOI8-R, KOI8-U, CP1251 (aka Windows-1251) and
CP866.

 Nowadays I try to move everything to UTF-8, but there are those windoze users
 and webdevs that make all Japanese in Shift_JIS ... So support for a wide range
 of encodings is a must, but UTF-8 is the truth.

  The only thing that's nasty: we don't have any good utf8-fonts for the console.

 And not only the console.
 Even for xterm there are not many good fonts (known to me) that display both
 Japanese and Cyrillic in regular and bold. Currently there is only one
 combination that works for me.

What about terminus and UniCyr (the unicode font from console-tools-cyrillic)?
I use these fonts, and most Russian-speaking people say that this font is the
best one for Cyrillic charsets.
I don't see any font issues myself.

 So fonts, font config and related stuff is what has to be fixed first.



--
   WBR, Alexander Simonov (DEVL-UANIC)
   Ukrainian Gentoo Community Coordinator
--
gentoo-dev@gentoo.org mailing list



Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Diego 'Flameeyes' Pettenò
On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration.
At the same time, you'll probably hear people bitching about UTF-8 being
enabled because "it's not needed for me, it should be the rest of the world
that changes".
I'd be the first to be interested in having it enabled by default, tho.

-- 
Diego Flameeyes Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Diego 'Flameeyes' Pettenò
On Tuesday 28 February 2006 12:47, Patrick Lauer wrote:
 It is still optional, just enabled by default :-)
That would probably be enough to draw criticism, mainly from English-speaking
users who don't care about extended characters.
Still, this would also follow the direction of both Apple and Microsoft,
the first providing, the other saying it will provide, an always-Unicode
system.
It is probably also the way to make Gentoo more accessible to
non-English-speaking people.

 So - apart from some users maybe not wanting it, any technical reasons
 against?
I'll wait for the "clutter" comments from users and maybe devs.
I was criticized for forcibly enabling unicode on vlc because of a
source-code bug that prevented non-unicode wxGTK from being used to build it;
since then I'm always expecting some sort of problem :P

  I'd be the first to be interested in having it enabled by default, tho.
 Yes, otherwise spelling your name is almost impossible :-)
That, and I'm actually trying to find time to learn Japanese :P
But time is something I don't have in abundance :|

-- 
Diego Flameeyes Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Lars Weiler
* Patrick Lauer [EMAIL PROTECTED] [06/02/28 11:58 +0100]:
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?

It is enabled by default.  At least on ppc.  And that since,
uhm, summer 2004?

I can't say if there are any problems, as I haven't received
a bug for a long time.  The only thing that's nasty: we
don't have any good utf8-fonts for the console.

Regards, Lars

-- 
Lars Weiler  [EMAIL PROTECTED]  +49-171-1963258
Gentoo Linux PowerPC: Developer and Release Engineer
Gentoo Infrastructure   : CVS Administrator
Gentoo Foundation   : Trustee



Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Patrick Lauer
On Tue, 2006-02-28 at 13:50 +0100, Lars Weiler wrote:
 * Patrick Lauer [EMAIL PROTECTED] [06/02/28 11:58 +0100]:
  Enabling the unicode useflag in the profiles should help our
  international users and should not cause any problems. Are there any
  known bugs / problems this would trigger? Any reasons against that?
 
 It is enabled by default.  At least on ppc.
As far as I can tell that's not the case on x86
   And that since,
 uhm, summer 2004?
Ok, so it should be quite well-tested.

 I can't say if there are any problems, as I haven't received
 a bug for a long time.  The only thing that's nasty: we
 don't have any good utf8-fonts for the console.
I think that's acceptable.
-- 
Stand still, and let the rest of the universe move




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Mike Frysinger
On Tuesday 28 February 2006 06:47, Patrick Lauer wrote:
 On Tue, 2006-02-28 at 12:32 +0100, Diego 'Flameeyes' Pettenò wrote:
  On Tuesday 28 February 2006 11:58, Patrick Lauer wrote:
   During that discussion we realized that having utf-8 not enabled by
   default and no utf8 fonts available by default causes lots of
   recompilation and reconfiguration.
 
  At the same time, you'll probably hear people bitching about UTF-8 being
  enabled because it's not needed for me, should be the rest of the world
  to change

 It is still optional, just enabled by default :-)
 All the people with non-ASCII charsets will have less work, only that we
 switch the load from, say, 75% of the users fixing their environment to
 25% of users having to switch.

hopefully people will fix their packages to respect USE=unicode as to whether 
they link against libncurses or libncursesw
-mike




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Joseph Jezak

I can't say if there are any problems, as I haven't received
a bug for a long time.  The only thing that's nasty: we
don't have any good utf8-fonts for the console.


I think that's acceptable.


The only issue related to that we really have is this bug, which is 
annoying but not fatal:

http://bugs.gentoo.org/show_bug.cgi?id=107235

-Joe





Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Kalin KOZHUHAROV
Lars Weiler wrote:
 * Patrick Lauer [EMAIL PROTECTED] [06/02/28 11:58 +0100]:
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?
 
 It is enabled by default.  At least on ppc.  And that since,
 uhm, summer 2004?
 
 I can't say if there are any problems, as I haven't received
 a bug for a long time.
Well there are a few problems, but yes I cannot name them now.
Using Japanese, Cyrillic and English in a few encodings each is a big nightmare.

Nowadays I try to move everything to UTF-8, but there are those windoze users
and webdevs that make all Japanese in Shift_JIS ... So support for a wide range
of encodings is a must, but UTF-8 is the truth.

 The only thing that's nasty: we don't have any good utf8-fonts for the
 console.
And not only the console.
Even for xterm there are not many good fonts (known to me) that display both
Japanese and Cyrillic in regular and bold. Currently there is only one
combination that works for me.

So fonts, font config and related stuff is what has to be fixed first.

Kalin.

P.S. And before it can be fixed, it has to be filed... I promise to take notes
(again) when I see something.
-- 
|[ ~~ ]|
+- http://ThinRope.net/ -+
|[ __ ]|




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Josh

Ooh, I'm very much in favor of unicode being enabled by default. It's not like
users would be limited *only* to UTF-8 on their new installs, anyway. I'd love
to see this implemented.

++ for the suggestion. :)

Patrick Lauer wrote:
 Hi all,
 
 at FOSDEM we had a nice discussion about languages, translations etc.
 Having people from the US (wolf31o2) who never have problems and people
 from Japan (usata) who always have problems with encodings /
 charsets / ... was quite interesting.
 
 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration. 
 
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?
 
 If there are no objections this should be a small but helpful change.
 
 On a tangent I wonder if pulling in extra fonts as a dependency of X
 makes sense (useflag controlled, enabled by default) - that way the
 unicode capabilities are available without any configuration.
 
 Patrick



Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread solar
On Tue, 2006-02-28 at 11:58 +0100, Patrick Lauer wrote:
 Hi all,
 
 at FOSDEM we had a nice discussion about languages, translations etc.
 Having people from the US (wolf31o2) who never have problems and people
 from Japan (usata) who always have problems with encodings /
 charsets / ... was quite interesting.
 
 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration. 
 
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?
 
 If there are no objections this should be a small but helpful change.
 
 On a tangent I wonder if pulling in extra fonts as a dependency of X
 makes sense (useflag controlled, enabled by default) - that way the
 unicode capabilities are available without any configuration.


I forget where I read it, but I thought that unicode led to overflows
and was considered a general security risk. I wish I knew where I read
that but I'm unable to find it.

Any list readers know anything relating to that?

-- 
solar [EMAIL PROTECTED]
Gentoo Linux




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Ciaran McCreesh
On Tue, 28 Feb 2006 12:47:33 -0500 solar [EMAIL PROTECTED] wrote:
| I forget where I read it, but I thought that unicode led to overflows
| and was considered a general security risk. I wish I knew where I read
| that but I'm unable to find it.
| 
| Any list readers know anything relating to that?

Eh, not really. With non-utf-8 you could argue that it's an increased
risk, since you get non-string-terminating nulls, but with utf-8 those
aren't an issue.

It's not really a very well substantiated claim. It's like saying GUI
programming leads to bugs or internationalisation leads to program
crashes. Yes, it's possible (in C, anyway) to screw up your buffer
routines when converting code to handle utf-8, but then it's always
possible to screw up buffer routines.
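
The point about nulls can be shown with a small Python sketch. Reading "non-utf-8" as a wide encoding such as UTF-16 is an assumption here, and `c_strlen` is a made-up stand-in for how a C string routine finds the end of a buffer.

```python
text = "usr"
utf16 = text.encode("utf-16-le")  # two bytes per character for this text
utf8 = text.encode("utf-8")       # one byte per character for ASCII


def c_strlen(buf):
    """Length a C-style routine would see: bytes before the first zero byte."""
    return buf.index(0) if 0 in buf else len(buf)


print(utf16)            # b'u\x00s\x00r\x00'
print(c_strlen(utf16))  # 1 - the string appears to end after the first byte
print(c_strlen(utf8))   # 3 - no embedded zero bytes

# In UTF-8 a zero byte only ever encodes U+0000 itself, so non-ASCII text
# does not introduce one either.
print(0 in "Pettenò".encode("utf-8"))  # False
```

This is why byte-oriented C code generally keeps working on UTF-8 data, while the same code sees truncated strings when handed UTF-16.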

-- 
Ciaran McCreesh : Gentoo Developer (Wearer of the shiny hat)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm





Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Bryan Østergaard
On Tue, Feb 28, 2006 at 12:47:33PM -0500, solar wrote:
 I forget where I read it, but I thought that unicode led to overflows
 and was considered a general security risk. I wish I knew where I read
 that but I'm unable to find it.
 
 Any list readers know anything relating to that?
 
It's true that many overflows have been found in unicode aware
applications, like the zillion unicode overflows in Internet Explorer
for example. But that shouldn't lead to considering unicode a general
security risk in my mind even though the apache team uses ascii in the
default configuration to protect against bugs in poorly written
applications.

Regards,
Bryan Østergaard



Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Kevin F. Quinn (Gentoo)
On Tue, 28 Feb 2006 12:47:33 -0500
solar [EMAIL PROTECTED] wrote:

 I forget where I read it, but I thought that unicode led to overflows
 and was considered a general security risk. I wish I knew where I read
 that but I'm unable to find it.

Well, stuff I could find includes:

http://www.kde.org/info/security/advisory-20060119-1.txt
buggy UTF-8 decoder in KDE - this is an overflow error, which as
ciaranm says is a risk applicable to anything. It's a bug in KDE, not
in UTF-8 as such.  Perhaps this is what was at the back of your mind.


http://www.izerv.net/idwg-public/archive/0181.html
risks of using UTF-8; in particular the use of separate validators
which won't process things exactly the same way the application does.
Also homograph risks associated with allowing more than one encoding for
a character.

http://www.eeye.com/html/Research/Advisories/AD20010705.html
example of UTF-8(ish) used to fool IDSs by using alternative
non-standard encodings that IDSs aren't aware of.
This actually is another example of issues with secondary validators
described in the link above - they're not guaranteed to parse things
exactly the same way the application does.

http://www.microsoft.com/mspress/books/sampchap/5612b.asp
describes a number of risks of accepting UTF-8, including the above.


So far I haven't found anything that could be considered a general
security risk, but that doesn't prove much :)
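
As a rough illustration of the "secondary validator" problem, here is a Python sketch using an overlong encoding, one well-known kind of non-standard alternative encoding. Whether this is the exact trick used in the advisories linked above is an assumption; `naive_decode_2byte` is a deliberately sloppy decoder written for this example, not code from any real product.

```python
def naive_decode_2byte(raw):
    """Decode a two-byte sequence WITHOUT rejecting overlong forms (unsafe)."""
    lead, cont = raw
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def strict_decode(raw):
    """Decode with Python's conformant decoder; None means 'rejected'."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


overlong_slash = b"\xc0\xaf"  # an ill-formed two-byte spelling of "/"

print(naive_decode_2byte(overlong_slash))  # /
print(strict_decode(overlong_slash))       # None
print(b"/" in overlong_slash)              # False - a byte-level filter sees no slash

# On well-formed input the two decoders agree:
print(naive_decode_2byte(b"\xc3\xb2"), strict_decode(b"\xc3\xb2"))  # ò ò
```

A filter that scans the raw bytes for "/" passes this input, while a sloppy decoder further down the line turns it into a slash. The two components disagree about what the data means, and that disagreement is the risk.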

-- 
Kevin F. Quinn




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread solar
On Tue, 2006-02-28 at 20:18 +0100, Kevin F. Quinn (Gentoo) wrote:
 On Tue, 28 Feb 2006 12:47:33 -0500
 solar [EMAIL PROTECTED] wrote:
 
  I forget where I read it, but I thought that unicode led to overflows
  and was considered a general security risk. I wish I knew where I read
  that but I'm unable to find it.
 
 Well, stuff I could find includes:
 
 http://www.kde.org/info/security/advisory-20060119-1.txt
 buggy UTF-8 decoder in KDE - this is an overflow error, which as
 ciaranm says is a risk applicable to anything. It's a bug in KDE, not
 in UTF-8 as such.  Perhaps this is what was at the back of your mind.
 
 
 http://www.izerv.net/idwg-public/archive/0181.html
 risks of using UTF-8; in particular the use of separate validators
 which won't process things exactly the same way the application does.
 Also homograph risks associated with allowing more than one encoding for
 a character.
 
 http://www.eeye.com/html/Research/Advisories/AD20010705.html
 example of UTF-8(ish) used to fool IDSs by using alternative
 non-standard encodings that IDSs aren't aware of.
 This actually is another example of issues with secondary validators
 described in the link above - they're not guaranteed to parse things
 exactly the same way the application does.
 
 http://www.microsoft.com/mspress/books/sampchap/5612b.asp
 describes a number of risks of accepting UTF-8, including the above.
 
 
 So far I haven't found anything that could be considered a general
 security risk, but that doesn't prove much :)

Thanks Kevin. I think whatever I was thinking of had to do with widechar
support. Maybe on phrack, vuln-dev, or DD; I forget.

But the second link was a pretty good read and perhaps can give us some
sort of reasonable checks that we can use before we opt to allow the use
flag to be enabled in our hardened profiles.

Think we can automate any checks using the UTF-8-test.txt ?
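
One possible starting point for such a check, as a sketch only: feed a file to a decoder line by line and report the lines it rejects. Python's built-in decoder stands in here for whatever program is actually under test, the sample bytes are made up, and a real run against UTF-8-test.txt would still have to compare the result with the expectations documented in that file.

```python
def invalid_utf8_lines(raw):
    """Return the 1-based numbers of lines that are not well-formed UTF-8."""
    bad = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Made-up sample input: two good lines followed by two malformed ones.
sample = (
    b"plain ascii\n"
    + "Pettenò".encode("utf-8") + b"\n"
    + b"overlong \xc0\xaf\n"
    + b"lone continuation \x80\n"
)

print(invalid_utf8_lines(sample))  # [3, 4]
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.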

-- 
solar [EMAIL PROTECTED]
Gentoo Linux




Re: [gentoo-dev] enable UTF8 per default?

2006-02-28 Thread Bjarke Istrup Pedersen

Patrick Lauer wrote:
 Hi all,
 
 at FOSDEM we had a nice discussion about languages, translations etc.
 Having people from the US (wolf31o2) who never have problems and people
 from Japan (usata) who always have problems with encodings /
 charsets / ... was quite interesting.
 
 During that discussion we realized that having utf-8 not enabled by
 default and no utf8 fonts available by default causes lots of
 recompilation and reconfiguration. 
 
 Enabling the unicode useflag in the profiles should help our
 international users and should not cause any problems. Are there any
 known bugs / problems this would trigger? Any reasons against that?
 
 If there are no objections this should be a small but helpful change.
 
 On a tangent I wonder if pulling in extra fonts as a dependency of X
 makes sense (useflag controlled, enabled by default) - that way the
 unicode capabilities are available without any configuration.
 
 Patrick

I think it would be nice to have it enabled too :-)
You got my vote.