Re: login.conf -- UTF-8
A few explanations to clarify some possibly non-obvious points:

On 05.04.2014 7:35, Andrey Chernov wrote:
> big hack and slowing sorting down up to 10 times.

That is because our search for chains is linear. It was written that way because a typical single-byte table has no more than 2-3 chains. I don't think it is worth the effort to optimize the search here; a better way to spend that effort is to implement UCA: http://unicode.org/reports/tr10/

> No code situation doesn't mean wrong code can be committed.

Since we plan to change the defaults from KOI8-R to UTF-8 (russian login class), breaking the sort order for non-alphabetic characters would violate POLA. The sort order will be broken because only CP1251 is used to construct Alex's chain collation, without merging in the KOI8-R table. Merging the KOI8-R collation is the absolute minimum, but the proper hack would merge CP866 and ISO8859-5 too, as I already mentioned.

--
http://ache.vniz.net/
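The point about linear chain search can be illustrated with a small Python model. This is a hypothetical sketch, not FreeBSD's colldef(1) or libc code: the table layout, the weight values and the names `CHAINS`, `weights` and `collate` are assumptions made for illustration. It shows that once every Cyrillic letter in UTF-8 becomes a two-byte chain, each character costs a scan over dozens of chains instead of the 2-3 a typical single-byte table has.

```python
# Hypothetical sketch, NOT the FreeBSD libc/colldef(1) code: a byte-oriented
# collation table where multi-byte UTF-8 characters are "chains" found by a
# linear scan, giving the order ABC..Z АБВ..Я abc..z абв..я.
UPPER_CYR = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
LOWER_CYR = UPPER_CYR.lower()

# Each chain is (UTF-8 byte sequence, weight); every Cyrillic letter is two bytes.
CHAINS = [(ch.encode("utf-8"), 200 + i) for i, ch in enumerate(UPPER_CYR)] + [
    (ch.encode("utf-8"), 400 + i) for i, ch in enumerate(LOWER_CYR)
]


def single_byte_weight(b: int) -> int:
    """Weight for a byte that starts no chain (illustrative values only)."""
    if ord("a") <= b <= ord("z"):
        return 300 + (b - ord("a"))  # Latin lowercase after Cyrillic capitals
    return b  # Latin capitals and other bytes keep their byte value


def weights(data: bytes, stats: dict) -> list:
    """Turn a byte string into collation weights, counting chain probes."""
    out = []
    i = 0
    while i < len(data):
        for seq, w in CHAINS:  # linear search: cost grows with len(CHAINS)
            stats["probes"] += 1
            if data.startswith(seq, i):
                out.append(w)
                i += len(seq)
                break
        else:  # no chain matched at this position
            out.append(single_byte_weight(data[i]))
            i += 1
    return out


def collate(words: list) -> tuple:
    """Sort words with the table above; return (sorted words, total probes)."""
    stats = {"probes": 0}
    ordered = sorted(words, key=lambda s: weights(s.encode("utf-8"), stats))
    return ordered, stats["probes"]


ordered, probes = collate(["яблоко", "Zebra", "apple", "Ёж", "Апельсин"])
print(ordered)  # ['Zebra', 'Апельсин', 'Ёж', 'apple', 'яблоко']
print(probes)
```

With 66 chains in this toy table, every plain ASCII byte costs 66 failed probes before falling back to its single-byte weight. A linear search tuned for 2-3 chains was not designed for that load; the real slowdown in libc would depend on its actual data structures.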
Re: login.conf -- UTF-8
On Thu, Apr 03, 2014 at 01:34:33AM +0400, Andrey Chernov wrote:
A On 02.04.2014 21:15, Gleb Smirnoff wrote:
A S + :lang=en_US.UTF-8:\
A S + :charset=UTF-8:
A
A And I'd like to do same change for the 'russian' login class
A in /etc/login.conf.
A
A Please everybody remember that we don't have UTF-8 collation
A implemented, just fallback to bytecode comparison.

Any objections to checking in the FreeBSD-compatible[1] UTF-8 collation implementation from Alex Tutubalin?

http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html

[1] ABC..ZАБВГД...Я...abcd...zабвгд...я

--
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: login.conf -- UTF-8
On Fri, 2014-04-04 at 16:46 +0400, Gleb Smirnoff wrote:
> Any objections on checking in FreeBSD-compatible[1] UTF-8 collation
> implementation from Alex Tutubalin?
>
> http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html
>
> [1] ABC..ZАБВГД...Я...abcd...zабвгд...я

I say go for it, so no objection here.

sean
Re: login.conf -- UTF-8
On 04.04.2014 16:46, Gleb Smirnoff wrote:
> Any objections on checking in FreeBSD-compatible[1] UTF-8 collation
> implementation from Alex Tutubalin?
> http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html

Even his version 2 has my objections. I already replied to Alex about this in 2008. In short:

1) There is an error there: almost all single characters above ASCII should be chains, i.e. two bytes minimum, since there are almost no intersections with ISO8859-1 as a UTF-8 subset.

2) The table itself is very incomplete, e.g. it covers neither the whole of KOI8-R, nor ISO8859-5, nor CP866. It is made from CP1251, with all its restrictions. So switching from, e.g., KOI8-R to UTF-8 will cause a sorting regression. A Russian UTF-8 collation should be able to sort the characters of all the major Russian charsets mentioned, i.e. we need a combined table.

3) The charmap map.ISO8859-1 declaration is missing (it is needed mainly to use the mnemonic names of pure ASCII characters).

Even if the errors mentioned above are fixed and the code is committed afterwards, we should understand that this approach (implementing multibyte collation via a single-byte one), while possible, is a big hack and slows sorting down by up to 10 times. A proper Unicode collation algorithm is already implemented by ICU and other projects. See http://unicode.org/reports/tr10/ It would be better if someone adopted it instead of hacks.

--
http://ache.vniz.net/
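To see why a table built only from CP1251 (objection 2) can regress sorting for KOI8-R users, the following Python sketch lists the characters in the upper halves of KOI8-R, ISO8859-5 and CP866 that CP1251 does not contain, using Python's standard codecs. It only compares character repertoires; it says nothing about the actual weights in Alex Tutubalin's table, which are not reproduced here. The helper name `high_half` is hypothetical.

```python
# Sketch: which characters of the other major Russian charsets are missing
# from a table derived only from CP1251? Uses Python's built-in codecs.
CHARSETS = ["cp1251", "koi8_r", "iso8859_5", "cp866"]


def high_half(codec: str) -> set:
    """Characters encoded by bytes 0x80-0xFF in a single-byte codec."""
    chars = set()
    for b in range(0x80, 0x100):
        ch = bytes([b]).decode(codec, errors="ignore")  # "" for undefined bytes
        if ch:
            chars.add(ch)
    return chars


cp1251 = high_half("cp1251")
combined = set().union(*(high_half(c) for c in CHARSETS))
missing_from_cp1251 = {c: sorted(high_half(c) - cp1251) for c in CHARSETS[1:]}

for codec, chars in missing_from_cp1251.items():
    print(codec, len(chars), "characters not in CP1251")
print("combined table needs", len(combined), "non-ASCII characters")
```

For example, the KOI8-R box-drawing characters (such as U+2500 at byte 0x80) have no CP1251 counterpart, so a CP1251-derived table would presumably have no specific weights for them; these are exactly the kind of non-alphabetic characters whose order could change.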
Re: login.conf -- UTF-8
On Sat, 2014-04-05 at 05:35 +0400, Andrey Chernov wrote:
> [...]
> Even in case above mentioned errors will be removed and the code will be
> committed afterwards, we should understand that this way (implementing
> multibyte collation via single byte one) even while being possible is a
> big hack and slowing sorting down up to 10 times. Proper Unicode
> collation algorithm is already implemented by ICU and other projects.
> See http://unicode.org/reports/tr10/
> It will be better if someone adopt it instead of hacks.

If you have a different patch, I'd appreciate seeing it.

Sean
Re: login.conf -- UTF-8
On 05.04.2014 6:39, Sean Bruno wrote:
> If you have a different patch, I'd appreciate seeing it.

I don't have a different patch. If you have enough time to fix the obstacles I mentioned, I can review your patch (or somebody else's). Having no code doesn't mean wrong code can be committed. Do it properly even when it is a hack.

--
http://ache.vniz.net/
Re: login.conf -- UTF-8
On 05.04.2014 5:35, Andrey Chernov wrote:
> Even his version 2 have my objections. I already reply Alex about this
> in 2008. In short:
> 1) It is error there: almost all single chars above ASCII should be
> chains, i.t. two bytes minimum
> ...

I checked my whole correspondence with Alexey and withdraw objection #1, which was related to the \x80;...;\xff line in his table. While those bytes are illegal sequences in UTF-8, our colldef(1) wants all single-byte characters mapped.

--
http://ache.vniz.net/
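The distinction behind the withdrawn objection can be checked directly: in UTF-8 a lone byte in the range 0x80-0xFF is never a complete character, while each Cyrillic letter is a two-byte sequence (a chain, in collation-table terms). The Python sketch below uses only the standard `bytes.decode`; the helper `is_valid_utf8` is a hypothetical name, and the sketch does not model colldef(1) itself.

```python
# Sketch: bytes 0x80-0xFF on their own are never valid UTF-8, yet a
# colldef(1)-style single-byte table still needs an entry for each of them.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


lone_high_bytes = [b for b in range(0x80, 0x100) if is_valid_utf8(bytes([b]))]
print(lone_high_bytes)  # []: no single high byte is a complete character
print("Ж".encode("utf-8"))  # b'\xd0\x96': a Cyrillic letter is a two-byte chain
```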
login.conf -- UTF-8
I'd like to make this change to login.conf for default installs.

This removes some amount of hackery in the ports system that is working around our lack of UTF-8 in the base.

This should be step 0 in a language-agnostic installer project that is beyond the scope of making the system more usable.

--- login.conf	2013-10-21 15:51:14.553992170 -0700
+++ /etc/login.conf	2014-03-31 09:26:17.588503798 -0700
@@ -45,7 +45,9 @@
 	:kqueues=unlimited:\
 	:priority=0:\
 	:ignoretime@:\
-	:umask=022:
+	:umask=022:\
+	:lang=en_US.UTF-8:\
+	:charset=UTF-8:
 #
Re: login.conf -- UTF-8
Sean,

On Wed, Apr 02, 2014 at 09:53:49AM -0700, Sean Bruno wrote:
S I'd like to make this change to login.conf for default installs.
S [...]
S -	:umask=022:
S +	:umask=022:\
S +	:lang=en_US.UTF-8:\
S +	:charset=UTF-8:

And I'd like to make the same change for the 'russian' login class in /etc/login.conf. I've got a few things that still need to be fixed before this change, but I definitely aim to get it done before stable/11.

--
Totus tuus, Glebius.
Re: login.conf -- UTF-8
On Wed, 02 Apr 2014 09:53:49 -0700, Sean Bruno sbr...@ignoranthack.me wrote:
> I'd like to make this change to login.conf for default installs.
> [...]
> -	:umask=022:
> +	:umask=022:\
> +	:lang=en_US.UTF-8:\
> +	:charset=UTF-8:

Changing the default LC_COLLATE is risky; how about keeping LC_COLLATE=C by default?

--- /usr/src/etc/login.conf	2013-09-30 19:04:24.0 +
+++ /etc/login.conf	2013-09-30 19:02:22.0 +
@@ -26,7 +26,7 @@
 	:passwd_format=sha512:\
 	:copyright=/etc/COPYRIGHT:\
 	:welcome=/etc/motd:\
-	:setenv=MAIL=/var/mail/$,BLOCKSIZE=K:\
+	:setenv=MAIL=/var/mail/$,BLOCKSIZE=K,LC_COLLATE=C:\
 	:path=/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin ~/bin:\
 	:nologin=/var/run/nologin:\
 	:cputime=unlimited:\
@@ -44,7 +44,9 @@
 	:pseudoterminals=unlimited:\
 	:priority=0:\
 	:ignoretime@:\
-	:umask=022:
+	:umask=022:\
+	:charset=UTF-8:\
+	:lang=en_US.UTF-8:
 #

--
Benjamin Lee
http://www.b1c1l1.com/
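This variant relies on the standard precedence among locale environment variables: LC_ALL, if set and non-empty, overrides everything; otherwise a category variable such as LC_COLLATE overrides LANG. The Python sketch below models that rule with a hypothetical helper `effective_locale` (not a real library function). Per login.conf(5), the `lang` capability sets LANG and `setenv` adds the listed variables, so a user in this class would get UTF-8 character handling with C (byte-order) collation, unless LC_ALL is set.

```python
# Sketch of the POSIX rule setlocale(category, "") follows when reading the
# environment: LC_ALL overrides LC_<category>, which overrides LANG.
def effective_locale(category: str, env: dict) -> str:
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # implementation-defined default; "C"/"POSIX" assumed here


# Environment a user in the modified class might end up with
# (lang= sets LANG; the setenv= list adds LC_COLLATE=C and BLOCKSIZE=K).
env = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C", "BLOCKSIZE": "K"}

print(effective_locale("LC_COLLATE", env))  # C: byte-order sorting kept
print(effective_locale("LC_CTYPE", env))    # en_US.UTF-8: UTF-8 characters
```

A user who exports LC_ALL (for example LC_ALL=en_US.UTF-8) would override both LANG and LC_COLLATE, so the C collation default would no longer apply.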
Re: login.conf -- UTF-8
On Wed, Apr 2, 2014 at 6:53 PM, Sean Bruno sbr...@ignoranthack.me wrote:
> I'd like to make this change to login.conf for default installs.
> [...]
> +	:lang=en_US.UTF-8:\
> +	:charset=UTF-8:

Hi,

Don't forget to request an exp-run from portmgr before committing this change; we never know what can break.

Cheers,
Antoine
Re: login.conf -- UTF-8
On 02.04.2014 21:15, Gleb Smirnoff wrote:
> S + :lang=en_US.UTF-8:\
> S + :charset=UTF-8:
> And I'd like to do same change for the 'russian' login class in
> /etc/login.conf.

Please, everybody, remember that we don't have UTF-8 collation implemented, just a fallback to bytecode comparison.

--
http://ache.vniz.net/
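A quick way to see what a fallback to bytecode comparison means for Russian text is to sort UTF-8 strings by their raw bytes. The Python sketch below does that with an arbitrary word list; it models byte-wise comparison in general, not FreeBSD's strcoll(3) code path specifically.

```python
# Sketch: with no UTF-8 collation table, comparison falls back to raw bytes.
# UTF-8 preserves code point order, so byte order equals code point order here.
words = ["яблоко", "Zebra", "apple", "Яблоко", "Ёж", "ёж", "Апельсин", "zoo"]

byte_order = sorted(words, key=lambda s: s.encode("utf-8"))
print(byte_order)
# ['Zebra', 'apple', 'zoo', 'Ёж', 'Апельсин', 'Яблоко', 'яблоко', 'ёж']
```

All Latin words, lowercase included, sort before any Cyrillic word, and Ё/ё fall outside the А..Я and а..я runs. This differs from the ABC..Z АБВ..Я abc..z абв..я order described in footnote [1] earlier in the thread.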
Re: login.conf -- UTF-8
In article 1396457629.2280.2.ca...@powernoodle.corp.yahoo.com, sbr...@freebsd.org writes:
> I'd like to make this change to login.conf for default installs.
> This removes some amount of hackery in the ports system that is
> working around our lack of UTF-8 in the base.

I'm not sure what the connection is here. Surely the ports system runs with the locale of the user running make (which in my case is going to be C). Any port that requires a specific locale to build properly needs to set that locale explicitly.

-GAWollman
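The approach suggested here, a build step that sets the locale it needs instead of inheriting the user's, can be sketched in Python. This is a hypothetical illustration, not how the ports framework (which is make-based) or any actual port does it; `run_build_step` and the default locale name are assumptions, and a real locale still has to be installed for the child program to use it.

```python
import os
import subprocess
import sys


# Sketch (hypothetical build step): instead of inheriting the user's locale,
# the step sets the locale it needs in the child environment explicitly.
def run_build_step(cmd: list, locale_name: str = "en_US.UTF-8") -> str:
    env = dict(os.environ, LC_ALL=locale_name)  # overrides LANG and LC_* alike
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in for a real build command: report which LC_ALL the child sees.
probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_build_step(probe))  # en_US.UTF-8, whatever the user's LANG is
```

Setting LC_ALL rather than LANG is the stronger choice, since LC_ALL takes precedence over any LC_* variables the user may have exported.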
Re: login.conf -- UTF-8
On Wed, 2014-04-02 at 18:06 -0400, Garrett Wollman wrote:
> I'm not sure what the connection is here. Surely the ports system runs
> with the locale of the user running make (which in my case is going to
> be C). Any port that requires a specific locale to build properly needs
> to be setting that locale explicitly.
>
> -GAWollman

I have been informed by folks that the change I suggest would help in cases where ports have to declare UTF-8 support explicitly, or something like that. I'm hand-wavy on the details and ignorant of the hacks in place. I only know that I've been *told* this.

sean
Re: login.conf -- UTF-8
On Wed, Apr 02, 2014 at 03:56:35PM -0700, Sean Bruno wrote:
> I have been informed by folks that this change I suggest would help in
> the case of ports having to declare UTF-8 support explicitly or something.

Clearly ports that need to do this are broken -- but this is very much an edge case of brokenness. There's so much other, lower-hanging fruit than fixing the individual offenders.

mcl
Re: login.conf -- UTF-8
On Wed, Apr 02, 2014 at 03:56:35PM -0700, Sean Bruno wrote:
> On Wed, 2014-04-02 at 18:06 -0400, Garrett Wollman wrote:
> > I'm not sure what the connection is here. Surely the ports system runs
> > with the locale of the user running make (which in my case is going to
> > be C). Any port that requires a specific locale to build properly needs
> > to be setting that locale explicitly.

You'd think so, but that's not what's happening. What's happening is that the software builds only as long as the locale isn't C. Hence ugly hacks like this:

http://svnweb.freebsd.org/ports/head/Mk/bsd.ruby.mk?annotate=348863#l257

Why? Because the people writing it have never encountered a system where LANG isn't set or is set to C. Yes, it's a bug in their software. No, they never have encountered it and never will, because every other operating system sets LANG to whatever the user specifies. So they have no interest in fixing it: neither they nor anyone they know will ever encounter it, and even if you report it to them, they will tell you it's a bug in your system for not having LANG specified. And I have no interest in patching it hundreds of times.

And this is just one example. There are others, I think, that aren't Ruby-related at all.

> I have been informed by folks that this change I suggest would help in
> the case of ports having to declare UTF-8 support explicitly or
> something. I'm hand-wavy on the details and ignorant of the hacks in
> place. I only know that I've been *told* this.

I think we should join the club of asking the user, but that's more work, and until then having a reasonable default and letting people change it seems sane.

Steve
Re: login.conf -- UTF-8
On Thu, 3 Apr 2014, Steve Wills wrote:
> You'd think so, but that's not what's happening. What's happening is the
> software builds as long as the locale isn't C. Hence, ugly hacks like this:
> [...]
> Because every other operating system sets LANG to whatever the user
> specifies. [...]

The first thing I do when I get a Linux system is set LANG to C. I hate all the colorizations and incorrect ordering from ls when LANG isn't C. So you are saying that ports will be broken when I set LANG back to C again?

> I think we should join the club of asking the user, but that's more work
> and until then having a reasonable default and having people change it
> seems sane.

A default is fine, but saying that ports will be broken when not using the default is not fine. This is LANG, not a gcc/clang machine-specific optimization that someone has set to get an extra 0.001% improvement but that happens to break the compiler for some ports.

--
DE
Re: login.conf -- UTF-8
On Wed, Apr 02, 2014 at 09:31:08PM -0400, Daniel Eischen wrote:
> A default is fine, but saying that ports will be broken when not using
> the default is not fine. This is LANG, not a gcc/clang machine-specific
> optimization that someone has set to get an extra 0.001% improvement,
> but happens to break the compiler for some ports.

I suppose you're right. Ugly hacks to work around ugly hacks will stay. :) (Not that I'd planned to remove them any time soon anyway, because such a change would take a long time to propagate to all supported versions.)

Steve