Re: Duplicate OPT_ entries in gcc/options.h

2016-06-08 Thread Dimitry Andric
On 09 Jun 2016, at 00:30, Jung-uk Kim  wrote:
> 
> On 06/ 8/16 06:16 PM, Dimitry Andric wrote:
>> On 08 Jun 2016, at 23:54, Jung-uk Kim  wrote:
>>> 
>>> On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
>>>> On 08 Jun 2016, at 21:11, Gerald Pfeifer  wrote:
>>>>> 
>>>>> I got a user report, and could reproduce this, that building
>>>>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>>>>> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
>>>>> conflicting entries in $BUILDDIR/gcc/options.h such as
>> ...
>>>> Note that GNU awk does *not* produce a different optionlist file when
>>>> used with either LANG=C or LANG=en_US.UTF-8.
>> ...
>>>> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
>>>> differently in our awk, depending on the LANG settings.  No idea when
>>>> that changed, though, if it changed at all...
>>> 
>>> This behaviour has been known for a very long time:
>>> 
>>> https://svnweb.freebsd.org/changeset/base/173731
>>> 
>>> and it is not our fault:
>>> 
>>> https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html
>> 
>> 
>> Indeed, so the real question is: why did this only start coming up
>> now, if it has been known since 2007?  I have been building gcc ports for
>> ages, and never ran into this problem, but I also have never actively
>> used a persistent LANG environment variable, let alone with UTF-8 in it.
>> 
>> Is this because more people started using UTF-8 recently?
> 
> We are doing more correct collation now:
> 
> https://svnweb.freebsd.org/changeset/base/290494

Indeed.  This problem has come up before on the ports mailing list,
almost immediately after that commit:

https://lists.freebsd.org/pipermail/freebsd-ports/2015-November/101034.html

Apparently some proposals were made to set LANG and LC_ALL to C globally
for port builds, but they were never implemented?
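
For illustration only, here is a minimal Python sketch of what forcing the C locale for a single build step could look like. The helper name run_in_c_locale is hypothetical, and this is not how the ports framework does (or would do) it; it only shows the idea of overriding LANG and LC_ALL in the child's environment.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    """Run argv with LANG and LC_ALL forced to C and return its stdout as text."""
    # Copy the current environment and override only the two locale variables.
    env = dict(os.environ, LANG="C", LC_ALL="C")
    result = subprocess.run(argv, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout


# Stand-in for a build step: a child process that reports the locale it sees.
child = [sys.executable, "-c",
         "import os; print(os.environ['LANG'], os.environ['LC_ALL'])"]
output = run_in_c_locale(child)
```

Whatever LANG the caller has set, the child only ever sees C, so a locale-sensitive comparison in that child would behave the same for every user.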

I guess more people are now noticing it, because they are trying out the
11.0-ALPHA installers.

-Dimitry





Re: Re: Duplicate OPT_ entries in gcc/options.h

2016-06-08 Thread Jung-uk Kim
On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
> On 08 Jun 2016, at 21:11, Gerald Pfeifer  wrote:
>>
>> I got a user report, and could reproduce this, that building
>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
>> conflicting entries in $BUILDDIR/gcc/options.h such as
>>
>>  OPT_d = 135,   /* -d */
>>  OPT_D = 136,   /* -D */
>>  OPT_d = 137,   /* -d */
>>  OPT_D = 138,   /* -D */
>>  OPT_d = 141,   /* -d */
>>  OPT_D = 142,   /* -D */
>>  OPT_d = 143,   /* -d */
>>
>> Using LANG = en_US (without UTF-8), everything works fine.
>>
>> Any ideas what might be going on here?  (This is done via
>> AWK scripts from what I can tell, does this trigger any
>> ideas?)
> 
> It is definitely something caused by our awk in base, in any case.
> First opt-gather.awk is run to generate a flat list of all options:
> 
>   /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-gather.awk \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/ada/gcc-interface/lang.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/fortran/lang.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/go/lang.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/java/lang.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/lto/lang.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/c-family/c.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/common.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/fused-madd.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/i386/i386.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/rpath.opt \
>     /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/freebsd.opt > tmp-optionlist
> 
> Then opt-functions.awk is run to process optionlist into options.h:
> 
>   /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-functions.awk \
>     -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-read.awk \
>     -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opth-gen.awk < optionlist > options.h
> 
> If I run the first step using LANG=C, or without any LANG setting, both
> optionlist and options.h are as expected.  If I run the first step using
> LANG=en_US.UTF-8, the optionlist is sorted differently, for example the
> "good" optionlist has the uppercase d options first, and much later the
> lowercase d options:
> 
>   D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
>   D^\Driver Joined Separate
>   D^\Fortran Joined Separate
>   ... much later in the file, after all options starting with an uppercase letter ...
>   d^\C ObjC C++ ObjC++ Joined
>   d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
>   d^\Fortran Joined
>   d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
> 
> The "bad" optionlist has the upper and lower case d options sorted
> together:
> 
>   d^\C ObjC C++ ObjC++ Joined
>   D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
>   d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
>   D^\Driver Joined Separate
>   defsym=^\Driver JoinedOrMissing
>   defsym^\Driver Separate
>   d^\Fortran Joined
>   D^\Fortran Joined Separate
>   d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
> 
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.
> 
> opt-gather.awk's sorting function looks like this:
> 
>   function sort(ARRAY, ELEMENTS)
>   {
>       for (i = 2; i <= ELEMENTS; ++i) {
>           for (j = i; ARRAY[j-1] > ARRAY[j]; --j) {
>               temp = ARRAY[j]
>               ARRAY[j] = ARRAY[j-1]
>               ARRAY[j-1] = temp
>           }
>       }
>       return
>   }
> 
> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
> differently in our awk, depending on the LANG settings.  No idea when
> that changed, though, if it changed at all...

This behaviour has been known for a very long time:

https://svnweb.freebsd.org/changeset/base/173731

and it is not our fault:

https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html

GNU awk produces the same output when run with the "--posix" option.

FYI...

Jung-uk Kim





Re: Duplicate OPT_ entries in gcc/options.h

2016-06-08 Thread Dimitry Andric
On 08 Jun 2016, at 23:15, Dimitry Andric  wrote:
> 
> On 08 Jun 2016, at 21:11, Gerald Pfeifer  wrote:
...
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.

And that phenomenon is explained here:

http://www.gnu.org/software/gawk/manual/gawk.html#POSIX-String-Comparison

"6.3.2.3 String Comparison with POSIX Rules

The POSIX standard says that string comparison is performed based on the
locale's collating order. This is the order in which characters sort, as
defined by the locale (for more discussion, see Locales). This order is
usually very different from the results obtained when doing straight
character-by-character comparison.

Because this behavior differs considerably from existing practice, gawk
only implements it when in POSIX mode (see Options)."
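
To make the difference concrete, here is a small Python sketch (not awk) that sorts the same strings using the collating order of a chosen locale. The helper name sorted_in_locale is hypothetical. The en_US.UTF-8 locale may not be installed on every system, so that part is guarded, and its result depends on the platform's collation tables.

```python
import functools
import locale


def sorted_in_locale(items, locale_name):
    """Sort items by the collation rules of locale_name, then restore the old locale."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # locale.strcoll compares two strings using the active LC_COLLATE rules.
        return sorted(items, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


names = ["d", "D", "defsym", "Driver"]

# The C locale gives plain character-by-character order: uppercase before lowercase.
posix_order = sorted_in_locale(names, "C")

# A UTF-8 locale may interleave upper and lower case; skip it if unavailable.
try:
    utf8_order = sorted_in_locale(names, "en_US.UTF-8")
except locale.Error:
    utf8_order = None
```

In the C locale this yields D, Driver, d, defsym. What utf8_order contains is left to the system's locale data, which is exactly the dependency described in the manual text above.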

-Dimitry





Re: Duplicate OPT_ entries in gcc/options.h

2016-06-08 Thread Dimitry Andric
On 08 Jun 2016, at 21:11, Gerald Pfeifer  wrote:
> 
> I got a user report, and could reproduce this, that building
> GCC (lang/gcc, but also current HEAD, so probably pretty much
> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
> conflicting entries in $BUILDDIR/gcc/options.h such as
> 
>  OPT_d = 135,   /* -d */
>  OPT_D = 136,   /* -D */
>  OPT_d = 137,   /* -d */
>  OPT_D = 138,   /* -D */
>  OPT_d = 141,   /* -d */
>  OPT_D = 142,   /* -D */
>  OPT_d = 143,   /* -d */
> 
> Using LANG = en_US (without UTF-8), everything works fine.
> 
> Any ideas what might be going on here?  (This is done via
> AWK scripts from what I can tell, does this trigger any
> ideas?)

It is definitely something caused by our awk in base, in any case.
First opt-gather.awk is run to generate a flat list of all options:

  /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-gather.awk \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/ada/gcc-interface/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/fortran/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/go/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/java/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/lto/lang.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/c-family/c.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/common.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/fused-madd.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/i386/i386.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/rpath.opt \
    /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/freebsd.opt > tmp-optionlist

Then opt-functions.awk is run to process optionlist into options.h:

  /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-functions.awk \
    -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-read.awk \
    -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opth-gen.awk < optionlist > options.h

If I run the first step using LANG=C, or without any LANG setting, both
optionlist and options.h are as expected.  If I run the first step using
LANG=en_US.UTF-8, the optionlist is sorted differently, for example the
"good" optionlist has the uppercase d options first, and much later the
lowercase d options:

  D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
  D^\Driver Joined Separate
  D^\Fortran Joined Separate
  ... much later in the file, after all options starting with an uppercase letter ...
  d^\C ObjC C++ ObjC++ Joined
  d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
  d^\Fortran Joined
  d^\Java Separate SeparateAlias Alias(foutput-class-dir=)

The "bad" optionlist has the upper and lower case d options sorted
together:

  d^\C ObjC C++ ObjC++ Joined
  D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
  d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
  D^\Driver Joined Separate
  defsym=^\Driver JoinedOrMissing
  defsym^\Driver Separate
  d^\Fortran Joined
  D^\Fortran Joined Separate
  d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
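
A minimal Python sketch of why this interleaving would lead to duplicate enumerators. It assumes (this is not stated in this message) that the header generator merges identical option names only when they sit next to each other in optionlist. The function name enum_entries is hypothetical, and the real name mangling (for example of "defsym=") is left out.

```python
from itertools import groupby


def enum_entries(option_names):
    """Emit one OPT_ entry per run of adjacent identical option names."""
    # groupby only groups *consecutive* equal items, which mirrors the
    # assumed "merge adjacent duplicates" behaviour of the generator.
    return ["OPT_" + name for name, _run in groupby(option_names)]


# First fields of the records, in the order of the "good" and "bad" lists.
good = ["D", "D", "D", "d", "d", "d", "d"]
bad = ["d", "D", "d", "D", "defsym=", "defsym", "d", "D", "d"]

good_entries = enum_entries(good)
bad_entries = enum_entries(bad)
```

With the good order each name collapses to a single entry. With the bad order OPT_d comes out four times and OPT_D three times, split by the two defsym entries, which matches the pattern and the gap in numbering (135 to 138, then 141 to 143) in the reported options.h.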

Note that GNU awk does *not* produce a different optionlist file when
used with either LANG=C or LANG=en_US.UTF-8.

opt-gather.awk's sorting function looks like this:

  function sort(ARRAY, ELEMENTS)
  {
      for (i = 2; i <= ELEMENTS; ++i) {
          for (j = i; ARRAY[j-1] > ARRAY[j]; --j) {
              temp = ARRAY[j]
              ARRAY[j] = ARRAY[j-1]
              ARRAY[j-1] = temp
          }
      }
      return
  }

So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
differently in our awk, depending on the LANG settings.  No idea when
that changed, though, if it changed at all...
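
As a rough illustration of that assumption, here is a Python port of the same insertion sort with the comparison made pluggable. The byte_order key models LANG=C. The collation_like key is a hypothetical stand-in for a UTF-8 locale's collation (case-insensitive first, lowercase winning ties), not the actual FreeBSD collation tables.

```python
def insertion_sort(items, key):
    """Port of the sort() loop above; key stands in for awk's > comparison."""
    arr = list(items)
    for i in range(1, len(arr)):
        j = i
        # Move the element left while its left neighbour compares greater.
        while j > 0 and key(arr[j - 1]) > key(arr[j]):
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1
    return arr


def byte_order(s):
    # Plain character-by-character comparison, as with LANG=C.
    return s.encode("utf-8")


def collation_like(s):
    # Hypothetical model of a UTF-8 locale: compare case-insensitively,
    # and let the lowercase variant sort first when that is a tie.
    return (s.casefold(), s.swapcase())


records = ["D C", "D Driver", "D Fortran", "d C", "d Common", "d Fortran"]
c_sorted = insertion_sort(records, byte_order)
utf8_like_sorted = insertion_sort(records, collation_like)
```

The same loop leaves the records in the "good" order under byte_order and produces the interleaved d/D order under collation_like, so the sort function itself need not have changed for the output to differ.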

-Dimitry


