Re: Duplicate OPT_ entries in gcc/options.h
On 09 Jun 2016, at 00:30, Jung-uk Kim wrote:
>
> On 06/ 8/16 06:16 PM, Dimitry Andric wrote:
>> On 08 Jun 2016, at 23:54, Jung-uk Kim wrote:
>>>
>>> On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
>>>> On 08 Jun 2016, at 21:11, Gerald Pfeifer wrote:
>>>>>
>>>>> I got a user report, and could reproduce this, that building
>>>>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>>>>> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
>>>>> conflicting entries in $BUILDDIR/gcc/options.h such as
>>>> ...
>>>> Note that GNU awk does *not* produce a different optionlist file when
>>>> used with either LANG=C or LANG=en_US.UTF-8.
>>>> ...
>>>> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
>>>> differently in our awk, depending on the LANG settings.  No idea when
>>>> that changed, though, if it changed at all...
>>>
>>> This behaviour has been known for a very long time:
>>>
>>> https://svnweb.freebsd.org/changeset/base/173731
>>>
>>> and it is not our fault:
>>>
>>> https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html
>>
>> Indeed, so the real question is: why did this only start coming up
>> now, if it has been known since 2007?  I have been building gcc ports
>> for ages, and never ran into this problem, but I also have never
>> actively used a persistent LANG environment variable, let alone with
>> UTF-8 in it.
>>
>> Is this because more people started using UTF-8 recently?
>
> We are doing more correct collation now:
>
> https://svnweb.freebsd.org/changeset/base/290494

Indeed.  This problem has come up before on the ports mailing list,
almost immediately after that commit:

https://lists.freebsd.org/pipermail/freebsd-ports/2015-November/101034.html

Apparently some proposals were made to set LANG and LC_ALL to C globally
for port builds, but they were never implemented?

I guess more people are now noticing it, because they are trying out the
11.0-ALPHA installers.

-Dimitry
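As an illustration of the proposal mentioned above (forcing LANG and LC_ALL to C for build steps), here is a minimal Python sketch. The helper name `c_locale_env` is hypothetical and is not part of the ports framework or of GCC's build system; it only shows the idea of handing a child process an environment in which both variables are pinned to C, so that locale-dependent string comparison falls back to byte order.

```python
import os


def c_locale_env(base=None):
    """Return a copy of an environment mapping with LANG and LC_ALL set to C.

    Hypothetical helper for illustration only. LC_ALL takes precedence over
    LANG and over the individual LC_* variables, so setting it is enough to
    force the C locale; LANG is set as well to mirror the proposal.
    """
    env = dict(os.environ if base is None else base)  # never mutate the input
    env["LANG"] = "C"
    env["LC_ALL"] = "C"
    return env


if __name__ == "__main__":
    user_env = {"LANG": "en_US.UTF-8", "PATH": "/usr/bin:/bin"}
    print(c_locale_env(user_env))
```

Such a mapping could then be passed as the `env` argument of `subprocess.run` when invoking awk on opt-gather.awk; the exact command line is left out here because it depends on the build tree.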
Re: Duplicate OPT_ entries in gcc/options.h
On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
> On 08 Jun 2016, at 21:11, Gerald Pfeifer wrote:
>>
>> I got a user report, and could reproduce this, that building
>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
>> conflicting entries in $BUILDDIR/gcc/options.h such as
> ...
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.
> ...
> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
> differently in our awk, depending on the LANG settings.  No idea when
> that changed, though, if it changed at all...

This behaviour has been known for a very long time:

https://svnweb.freebsd.org/changeset/base/173731

and it is not our fault:

https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html

GNU awk produces the same output with the "--posix" option.

FYI...

Jung-uk Kim
Re: Duplicate OPT_ entries in gcc/options.h
On 08 Jun 2016, at 23:15, Dimitry Andric wrote:
>
> On 08 Jun 2016, at 21:11, Gerald Pfeifer wrote:
...
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.

And that phenomenon is explained here:

http://www.gnu.org/software/gawk/manual/gawk.html#POSIX-String-Comparison

"6.3.2.3 String Comparison with POSIX Rules

The POSIX standard says that string comparison is performed based on the
locale's collating order.  This is the order in which characters sort,
as defined by the locale (for more discussion, see Locales).  This order
is usually very different from the results obtained when doing straight
character-by-character comparison.

Because this behavior differs considerably from existing practice, gawk
only implements it when in POSIX mode (see Options)."

-Dimitry
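The difference between collating order and straight character-by-character comparison quoted above can be tried out with Python's standard `locale` module. This is only a sketch: the helper names are made up for illustration, the result for en_US.UTF-8 depends on the locale data installed on the machine (the helper returns None if that locale is missing), and it says nothing about how any particular awk implements its comparison.

```python
import locale


def compare_code_points(a, b):
    """Straight character-by-character comparison: returns -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_in_locale(a, b, locale_name):
    """Compare a and b using locale_name's collating order.

    Returns -1, 0 or 1, or None if the locale is not available here.
    The previous LC_COLLATE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        result = locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
    return (result > 0) - (result < 0)


if __name__ == "__main__":
    print("code points:", compare_code_points("D", "d"))
    for name in ("C", "en_US.UTF-8"):
        print(name + ":", compare_in_locale("D", "d", name))
```

In the C locale the two comparisons agree for these ASCII letters; under a UTF-8 locale the collation result may differ, which is the effect described in the manual excerpt.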
Re: Duplicate OPT_ entries in gcc/options.h
On 08 Jun 2016, at 21:11, Gerald Pfeifer wrote:
>
> I got a user report, and could reproduce this, that building
> GCC (lang/gcc, but also current HEAD, so probably pretty much
> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
> conflicting entries in $BUILDDIR/gcc/options.h such as
>
>   OPT_d = 135, /* -d */
>   OPT_D = 136, /* -D */
>   OPT_d = 137, /* -d */
>   OPT_D = 138, /* -D */
>   OPT_d = 141, /* -d */
>   OPT_D = 142, /* -D */
>   OPT_d = 143, /* -d */
>
> Using LANG = en_US (without UTF-8), everything works fine.
>
> Any ideas what might be going on here?  (This is done via
> AWK scripts from what I can tell, does this trigger any
> ideas?)

It is definitely something caused by our awk in base, in any case.
First opt-gather.awk is run to generate a flat list of all options:

/usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-gather.awk \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/ada/gcc-interface/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/fortran/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/go/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/java/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/lto/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/c-family/c.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/common.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/fused-madd.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/i386/i386.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/rpath.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/freebsd.opt \
    > tmp-optionlist

Then opt-functions.awk is run to process optionlist into options.h:

/usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-functions.awk \
    -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-read.awk \
    -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opth-gen.awk \
    < optionlist > options.h

If I run the first step using LANG=C, or without any LANG setting, both
optionlist and options.h are as expected.  If I run the first step using
LANG=en_US.UTF-8, the optionlist is sorted differently.  For example,
the "good" optionlist has the uppercase D options first, and much later
the lowercase d options:

D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing
  after %qs)^\-D<macro>[=<val>] Define a <macro> with <val> as its
  value.  If just <macro> is given, <val> is taken to be 1
D^\Driver Joined Separate
D^\Fortran Joined Separate
... much later in the file, after all options starting with an uppercase
letter ...
d^\C ObjC C++ ObjC++ Joined
d^\Common Joined^\-d<letters> Enable dumps from specific passes of the
  compiler
d^\Fortran Joined
d^\Java Separate SeparateAlias Alias(foutput-class-dir=)

The "bad" optionlist has the upper and lower case d options sorted
together:

d^\C ObjC C++ ObjC++ Joined
D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing
  after %qs)^\-D<macro>[=<val>] Define a <macro> with <val> as its
  value.  If just <macro> is given, <val> is taken to be 1
d^\Common Joined^\-d<letters> Enable dumps from specific passes of the
  compiler
D^\Driver Joined Separate
defsym=^\Driver JoinedOrMissing
defsym^\Driver Separate
d^\Fortran Joined
D^\Fortran Joined Separate
d^\Java Separate SeparateAlias Alias(foutput-class-dir=)

Note that GNU awk does *not* produce a different optionlist file when
used with either LANG=C or LANG=en_US.UTF-8.

opt-gather.awk's sorting function looks like this:

function sort(ARRAY, ELEMENTS)
{
	for (i = 2; i <= ELEMENTS; ++i) {
		for (j = i; ARRAY[j-1] > ARRAY[j]; --j) {
			temp = ARRAY[j]
			ARRAY[j] = ARRAY[j-1]
			ARRAY[j-1] = temp
		}
	}
	return
}

So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
differently in our awk, depending on the LANG settings.  No idea when
that changed, though, if it changed at all...

-Dimitry
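To make the effect of the comparison on the sort concrete, here is a small Python sketch of the same insertion-sort shape, run with two different comparison keys over shortened records taken from the listings above. It is not the GCC code: `byte_order` models the LANG=C behaviour, while `caseless` is only a hypothetical stand-in for UTF-8 locale collation (real collation rules are more involved), and the explicit `j > 0` guard replaces the awk version's reliance on ARRAY[0] being empty.

```python
def insertion_sort(records, key):
    """Insertion sort with the loop shape of the awk sort() shown above."""
    items = list(records)
    for i in range(1, len(items)):
        j = i
        # Move the element left while its left neighbour compares greater.
        while j > 0 and key(items[j - 1]) > key(items[j]):
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return items


def byte_order(record):
    """Byte-wise comparison, as in the C locale."""
    return record.encode("utf-8")


def caseless(record):
    """Hypothetical stand-in for locale collation: letter case is ignored."""
    return record.casefold()


# Shortened records; \x1c is the separator shown as ^\ in the listings.
RECORDS = [
    "d\x1cC ObjC C++ ObjC++ Joined",
    "D\x1cDriver Joined Separate",
    "d\x1cFortran Joined",
    "D\x1cFortran Joined Separate",
]

if __name__ == "__main__":
    for key in (byte_order, caseless):
        names = [r.split("\x1c")[0] for r in insertion_sort(RECORDS, key)]
        print(key.__name__, names)
```

With the byte-order key all D records come before the d records, as in the "good" optionlist; with the caseless key they interleave, as in the "bad" one.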