Re: NetBSD sort l10n: I give up!
-On [20020407 07:00], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> So, I plan to remove all vestiges of NetBSD sort and ask to restore GNU
> sort from the Attic. Reasons are:

Better option:

1) leave NetBSD sort
2) unhook from build
3) add GNU sort back for now
4) fix up NetBSD sort

That you are unable doesn't mean others are unable as well. :)

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
Resolve to find thyself; and to know that he who finds himself, loses
his misery...

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 11:48:15 +0200, Jeroen Ruigrok/asmodai wrote:
> -On [20020407 07:00], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> > So, I plan to remove all vestiges of NetBSD sort and ask to restore GNU
> > sort from the Attic. Reasons are:
>
> Better option:
>
> 1) leave NetBSD sort
> 2) unhook from build
> 3) add GNU sort back for now

It is not better but the same as mine. I don't plan to remove inactive
contrib stuff.

> 4) fix up NetBSD sort
>
> That you are unable doesn't mean others are unable as well. :)

In theory, yes, but in practice I am sure that nobody can ever fix it
without a total code flow redesign.

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: NetBSD sort l10n: I give up!
-On [20020407 12:00], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> On Sun, Apr 07, 2002 at 11:48:15 +0200, Jeroen Ruigrok/asmodai wrote:
> > -On [20020407 07:00], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> > > So, I plan to remove all vestiges of NetBSD sort and ask to restore
> > > GNU sort from the Attic. Reasons are:
> >
> > Better option:
> >
> > 1) leave NetBSD sort
> > 2) unhook from build
> > 3) add GNU sort back for now
>
> It is not better but the same as mine. I don't plan to remove inactive
> contrib stuff.

That was not what you said in your initial suggestion: ``I plan to remove
all vestiges of NetBSD sort'', that really sounds, to me, as if you were
going to cvs rm it.

So, to get it clear, it will remain in contrib?

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
And I'm learning the highs and lows of the fake promises...
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 12:13:50 +0200, Jeroen Ruigrok/asmodai wrote:
> > It is not better but the same as mine. I don't plan to remove inactive
> > contrib stuff.
>
> That was not what you said in your initial suggestion: ``I plan to remove
> all vestiges of NetBSD sort'', that really sounds, to me, as if you were
> going to cvs rm it.
>
> So, to get it clear, it will remain in contrib?

Sorry if I was unclear, I meant functionality. Yes, it will remain in
contrib if somebody needs it; I am not picky about inactive stuff. If you
notice my second (after give up) message, I even suggest installing it
under a different name, if someone wants it.

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: NetBSD sort l10n: I give up!
Here is a patch to make NetBSD's sort(1) sort by the locale's collating
order. The table should not be called ascii[] anymore, but I can't think
of a better name, and supplying a patch to change the name would be
pointless.

It works. It assumes the string strxfrm() outputs is the same length as
its input, which is always possible, and true on FreeBSD.

$ env LC_COLLATE=fr_FR.ISO8859-1 sort test.fr | rs
Ète  elle
$ env LC_COLLATE=fr_FR.ISO8859-1 ./sort test.fr | rs
Ète  elle
$ rs test.fr
elle  Ète

Enjoy (?)

Tim

Index: init.c
===================================================================
RCS file: /home/ncvs/src/contrib/sort/init.c,v
retrieving revision 1.2
diff -u -r1.2 init.c
--- init.c	2002/04/07 00:49:00	1.2
+++ init.c	2002/04/07 10:29:59
@@ -46,6 +46,7 @@
 #endif /* not lint */
 
 #include <ctype.h>
+#include <err.h>
 #include <string.h>
 
 static void insertcol __P((struct field *));
@@ -291,8 +292,7 @@
  * Note: when sorting in forward order, to encode character zero in a key,
  * use \001\001; character 1 becomes \001\002. In this case, character 0
  * is reserved for the field delimiter. Analagously for -r (fld_d = 255).
- * Note: this is only good for ASCII sorting. For different LC 's,
- * all bets are off. See also num_init in number.c
+ * See also num_init in number.c
  */
 void
 settables(gflags)
@@ -300,8 +300,20 @@
 {
 	u_char *wts;
 	int i, incr;
+	static int warned;
+	char abuf[2], xbuf[8];
+
+	abuf[1] = '\0';
 	for (i=0; i < 256; i++) {
-		ascii[i] = i;
+		if (i != 0) {
+			*abuf = i;
+			if (strxfrm(xbuf, abuf, sizeof(xbuf)) > 1 && !warned) {
+				warnx("collating order too complicated");
+				warned = 1;
+			}
+			ascii[i] = *xbuf;
+		} else
+			ascii[i] = 0;
 		if (i > REC_D && i < 255 - REC_D+1)
 			Rascii[i] = 255 - i + 1;
 		else
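For readers following the idea rather than the diff, here is a minimal standalone sketch of the same technique: derive a 256-entry weight table from strxfrm() applied to one-byte strings. The function name build_collation_weights() and its return convention are assumptions made for illustration and are not part of sort(1); the caller is assumed to have selected a collating order with setlocale(LC_COLLATE, ...) beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of sort(1)): fill weights[] with the first
 * byte of the strxfrm() output for each one-byte string, as the patch does
 * for ascii[]. Returns the number of byte values whose transformed form is
 * not exactly one byte long, i.e. where the one-byte assumption fails.
 */
size_t
build_collation_weights(unsigned char weights[256])
{
	char in[2], out[8];
	size_t i, n, mismatches = 0;

	assert(weights != NULL);
	in[1] = '\0';
	weights[0] = 0;		/* byte 0 cannot be passed as a C string */
	for (i = 1; i < 256; i++) {
		in[0] = (char)i;
		n = strxfrm(out, in, sizeof(out));
		if (n != 1)
			mismatches++;
		/* out[] is only usable when the result fit and is non-empty. */
		if (n == 0 || n >= sizeof(out))
			weights[i] = (unsigned char)i;
		else
			weights[i] = (unsigned char)out[0];
	}
	return mismatches;
}
```

A non-zero return value corresponds to the case where the patch prints "collating order too complicated"; falling back to the raw byte value in that case is a choice made for this sketch only.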
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 20:40:13 +1000, Tim J. Robbins wrote:
> It works. It assumes the string strxfrm() outputs is the same length as
> its input, which is always possible, and true on FreeBSD.

It seems you are trying to follow the same path as me :-) No, it does not
work, since it breaks so many other places. Please run some tests before
posting the first idea that comes to mind. I suggest the following test
first: none, -r, -f, -n combinations for all FreeBSD locales compared to
GNU sort. The next test is the -R option in the 0..255 range for all
locales.

Before you end up building correct tables for ascii, Rascii, Ftable,
RFtable, I can inform you that correct tables for them break -n badly.

-- 
Andrey A. Chernov
http://ache.pp.ru/
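The thread proposes GNU sort as the reference for such tests. As a rough sketch of a second, independent reference, assuming the C library's strcoll() is trusted for the locale under test, lines can be ordered with qsort() and a strcoll() comparator. The name reference_sort() and the reverse parameter (standing in for -r) are hypothetical; -f, -n and -R are not modelled here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two strings by the current LC_COLLATE. */
static int
collate_cmp(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcoll(*sa, *sb);
}

/* Same comparison with the operands swapped, which reverses the order. */
static int
collate_cmp_reverse(const void *a, const void *b)
{
	return collate_cmp(b, a);
}

/* Hypothetical helper: sort lines[] in place; reverse != 0 mimics -r. */
void
reference_sort(const char **lines, size_t n, int reverse)
{
	assert(lines != NULL || n == 0);
	qsort(lines, n, sizeof(lines[0]),
	    reverse ? collate_cmp_reverse : collate_cmp);
}
```

A test driver could call setlocale(LC_COLLATE, ...) for each locale, run reference_sort() on the test lines, and compare the result with the output of the sort binary under test.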
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 14:55:37 +0400, Andrey A. Chernov wrote:
> Before you end up building correct tables for ascii, Rascii, Ftable,
> RFtable, I can inform you that correct tables for them break -n badly.

I can additionally note that building correct tables for Ftable and
RFtable is especially hard because conflicts appear due to duplicated
lower-upper character ranges. These must be resolved by additional
shifting away from REC_D in an unknown direction, which may not be
possible as a single-pass operation (i.e. one that does not overwrite
REC_D again) for a given locale.

-- 
Andrey A. Chernov
http://ache.pp.ru/
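The kind of conflict described here can be illustrated with a small check. This is only a sketch under one stated assumption: that the weight value equal to the record delimiter REC_D is reserved, so no other byte may be assigned it. The helpers fold_weights() and count_delimiter_conflicts() are hypothetical names and do not exist in sort(1); the folding uses toupper() in the current locale as a stand-in for how Ftable is built.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: build a case-folded table by giving every byte the
 * weight of its uppercase form in the current locale.
 */
void
fold_weights(const unsigned char weights[256], unsigned char folded[256])
{
	int i;

	assert(weights != NULL && folded != NULL);
	for (i = 0; i < 256; i++)
		folded[i] = weights[(unsigned char)toupper(i)];
}

/*
 * Hypothetical check: count byte values other than rec_d whose weight
 * equals rec_d, the value assumed to be reserved for the delimiter.
 */
size_t
count_delimiter_conflicts(const unsigned char table[256], unsigned char rec_d)
{
	size_t conflicts = 0;
	int i;

	assert(table != NULL);
	for (i = 0; i < 256; i++)
		if (i != rec_d && table[i] == rec_d)
			conflicts++;
	return conflicts;
}
```

Each conflict found this way would have to be shifted to a free weight, and a shift can itself land on another occupied or reserved value, which is the single-pass difficulty the message refers to.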
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 02:55:37PM +0400, Andrey A. Chernov wrote:
> I suggest the following test first: none, -r, -f, -n combinations for all
> FreeBSD locales compared to GNU sort. The next test is the -R option in
> the 0..255 range for all locales.

Perhaps you could make a test suite and commit to
[gnu/]usr.bin/sort/testsuite ?
Try: cd /usr/src/usr.bin/bzip2 ; make all test

This way one would know when you would be happy with a GNU sort
replacement.
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 04:30:31 -0700, David O'Brien wrote:
> On Sun, Apr 07, 2002 at 02:55:37PM +0400, Andrey A. Chernov wrote:
> > I suggest the following test first: none, -r, -f, -n combinations for
> > all FreeBSD locales compared to GNU sort. The next test is the -R
> > option in the 0..255 range for all locales.
>
> Perhaps you could make a test suite and commit to
> [gnu/]usr.bin/sort/testsuite ?

Yes, after GNU sort is restored. I have already sent a request to cvs@

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 10:00:08AM +0400, Andrey A. Chernov wrote:
> On Sun, Apr 07, 2002 at 08:52:21 +0400, Andrey A. Chernov wrote:
> > It is sad news, but I tried my best to l10n NetBSD sort in vain; it is
> > tied to ASCII so closely that it is almost impossible to handle all
> > possible cases without embedding AI code far bigger than the whole
> > sort. So, I plan to remove all vestiges of NetBSD sort and ask to
> > restore GNU sort from the Attic. Reasons are:
>
> For people who need exact NetBSD sort functionality and don't need l10n
> (if they exist), NetBSD sort can be installed under a different name
> like ascii_sort or bsort.

... and from ports.
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 21:49:44 +1000, Tim J. Robbins wrote:
> On Sun, Apr 07, 2002 at 02:55:37PM +0400, Andrey A. Chernov wrote:
> > No, it does not work, since it breaks so many other places.
>
> I guess I have to agree with you there, that it does break -n and -f and
> does not handle (for example) German correctly. I still do believe that a
> similar approach could correctly handle all the ISO8859 character sets,
> only it's not as simple as it seems.

I thought so too, initially. I even have correct ascii, Rascii, Ftable,
RFtable and gweights tables in my last committed variant (but not for
various -R). Nope. -n broke them all because it is hardcoded to ASCII but
sorted in the modified (collated) order. A back-permutation table does not
help because of the different forms of the main (collated) order
corresponding to the -r and -f flags. Via some hacking I even made the
variant without -R work for -n too, but that does not mean it will not be
broken for any future locale we may have.

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: NetBSD sort l10n: I give up!
Andrey A. Chernov [EMAIL PROTECTED] writes:
> So, I plan to remove all vestiges of NetBSD sort and ask to restore GNU
> sort from the Attic.

Fair enough. I don't care as long as it sorts right.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: NetBSD sort l10n: I give up!
Dag-Erling Smorgrav [EMAIL PROTECTED] writes:
> Andrey A. Chernov [EMAIL PROTECTED] writes:
> > So, I plan to remove all vestiges of NetBSD sort and ask to restore GNU
> > sort from the Attic.
>
> Fair enough. I don't care as long as it sorts right.

I must apologize for reacting the way I did, BTW. I shouldn't have made
those commits; I realize now that I was acting in anger and with
prejudice, which is never a good frame of mind for doing FreeBSD work.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: NetBSD sort l10n: I give up!
-On [20020407 12:30], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> Sorry if I was unclear, I meant functionality. Yes, it will remain in
> contrib if somebody needs it; I am not picky about inactive stuff. If you
> notice my second (after give up) message, I even suggest installing it
> under a different name, if someone wants it.

Tim J Robbins whipped up some code which seems to take us to the same
level as GNU sort, as far as we could see.

At present the GNU sort we have doesn't seem to be able to handle
multibyte and/or shift states, does it? As far as he and I could see it
was only 8-bit limited. And his work gives that to the NetBSD sort as
well.

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
Life can only be understood backwards, but it must be lived forwards...
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 15:32:18 +0200, Jeroen Ruigrok/asmodai wrote:
> -On [20020407 12:30], Andrey A. Chernov ([EMAIL PROTECTED]) wrote:
> > Sorry if I was unclear, I meant functionality. Yes, it will remain in
> > contrib if somebody needs it; I am not picky about inactive stuff. If
> > you notice my second (after give up) message, I even suggest
> > installing it under a different name, if someone wants it.
>
> Tim J Robbins whipped up some code which seems to take us to the same
> level as GNU sort, as far as we could see.

What code do you mean? If you mean the patch he posted, the patch is
obviously wrong and does not pass even the simplest tests. It reminds me
of my very early attempts.

> At present the GNU sort we have doesn't seem to be able to handle
> multibyte and/or shift states, does it? As far as he and I could see it
> was only 8-bit limited. And his work gives that to the NetBSD sort as
> well.

Yes, both variants (i.e. NetBSD sort too, if it is fixed) are limited to
8 bits, like 99% of other localized base software. Multibyte l10n is a
completely different thing. Which work of his do you mean? I answered
him; read this discussion to the end. He admitted that his change is
wrong.

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: NetBSD sort l10n: I give up!
Andrey A. Chernov wrote:
> On Sun, Apr 07, 2002 at 04:30:31 -0700, David O'Brien wrote:
> > On Sun, Apr 07, 2002 at 02:55:37PM +0400, Andrey A. Chernov wrote:
> > > I suggest the following test first: none, -r, -f, -n combinations
> > > for all FreeBSD locales compared to GNU sort. The next test is the
> > > -R option in the 0..255 range for all locales.
> >
> > Perhaps you could make a test suite and commit to
> > [gnu/]usr.bin/sort/testsuite ?
>
> Yes, after GNU sort is restored. I have already sent a request to cvs@

There is no need for cvs@ to be involved here. Just get the old pre-rm
files and 'cvs add' them back again. There is nothing significant that is
still on the vendor branch that is worth messing around with.

Cheers,
-Peter
-- 
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5
Re: NetBSD sort l10n: I give up!
On Sun, Apr 07, 2002 at 08:52:21 +0400, Andrey A. Chernov wrote:
> It is sad news, but I tried my best to l10n NetBSD sort in vain; it is
> tied to ASCII so closely that it is almost impossible to handle all
> possible cases without embedding AI code far bigger than the whole sort.
> So, I plan to remove all vestiges of NetBSD sort and ask to restore GNU
> sort from the Attic. Reasons are:

For people who need exact NetBSD sort functionality and don't need l10n
(if they exist), NetBSD sort can be installed under a different name like
ascii_sort or bsort.

-- 
Andrey A. Chernov
http://ache.pp.ru/