RE: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread Martin Dorey
Putting OP's reply on the record.


From: Rob Holbert [mailto:robholb...@gmail.com]
Sent: Tuesday, July 19, 2011 04:49
To: Martin Dorey
Subject: Re: $(sort) - what is lexical order? (was RE: Follow-up)

Wow,

Just putting your sources in order yourself will be much better and make more 
sense I guess. Calling external sort is unwieldy and quite ridiculous (what was 
that about Mandarin?). The built-in sort has no use since it does not actually 
sort. I guess you answered my question, make is a very poor sorter, and chooses 
to remain that way. Another tool that does almost what you need, maybe we 
should brand it Microsoft since it is so inflexible.

At least describe in the manual that it DOES NOT sort lexically. In fact, I 
am not quite sure what you would call it (useless?) I would definitely strive 
to make my sort function work like the native Unix sort.  Do away with sort 
altogether then, since it doesn't work. Knowing you can't do it is much better 
than knowing you can almost do it.



RE: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread Paul Smith
There is no standard definition of "lexical order" that I'm aware of
that means only, and exactly, "sorted according to the current locale
collation definition".  The Free Dictionary defines it as:

the arrangement of a set of items in accordance with a recursive
algorithm, such as the entries in a dictionary whose order
depends on their first letter unless these are the same in which
case it is the second which decides, and so on

Which seems reasonable to me.


GNU make uses the standard C runtime function qsort(3) to perform its
sorting, with a comparison function of the standard C runtime function
strcmp().  The strcmp() function is defined by both the ISO C and POSIX
standards to perform byte-wise comparisons; that is, it uses ASCII
sorting order (on systems where the characters are stored as ASCII
chars, which is just about all of them these days).
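
The byte-wise ordering described above can be reproduced outside make. What follows is only a sketch: it uses sort(1) in the C locale as a stand-in for qsort(3) with strcmp(), and byte_sort is a hypothetical helper name, not something make provides.

```shell
#!/bin/sh
# Sketch only: emulate byte-wise (strcmp-style) ordering with sort(1).
# byte_sort is a hypothetical helper, not part of GNU make.
byte_sort() {
    # Put each word on its own line, sort by raw byte value (C locale),
    # then join the lines back into one space-separated list.
    printf '%s\n' "$@" | LC_ALL=C sort | paste -sd' ' -
}

# The list from the original report: the capitalized name sorts first.
byte_sort widget.c main.c ad.c Buzzer.c
# prints: Buzzer.c ad.c main.c widget.c
```

Because LC_ALL=C is set for the sort command itself, the result should not depend on the caller's own locale settings.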

The builtin sort function DOES sort.  It may not sort the way you would
prefer, but it sorts in a standard, repeatable, well-defined way that
does not change based on a particular user's environment settings...
most creators of build systems consider this reproducibility to be MUCH
more important than locale-specific collation, which is just visual
flourish.


I agree that the manual should document the fact that the sort function
does not sort according to the current LC_COLLATE value but instead always
uses the standard ASCII (or LC_COLLATE=C) order.

But I will not say that it doesn't sort lexically, because that's not
true: it does.


___
Bug-make mailing list
Bug-make@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-make


Re: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread David Boyce
On Tue, Jul 19, 2011 at 3:00 PM, Paul Smith psm...@gnu.org wrote:

 I agree that the manual should document the fact that the sort function
 does not sort according to the current LC_COLLATE value but instead always
 uses the standard ASCII (or LC_COLLATE=C) order.

 But I will not say that it doesn't sort lexically, because that's not
 true: it does.

Agree completely, and add a note to the OP that the sort function has
an extremely important side effect of removing repeated words. Another
reason to keep it, if that was meant seriously.
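
The duplicate-removal side effect can be sketched with sort(1) as well. This is an illustration under assumptions, not make's implementation: sort_unique is a hypothetical helper name, and the C locale is pinned so the order matches the byte-wise order discussed in this thread.

```shell
#!/bin/sh
# Sketch only: sort a word list and drop repeated words in one step.
# sort_unique is a hypothetical helper, not part of GNU make.
sort_unique() {
    # -u keeps a single copy of each word after sorting.
    printf '%s\n' "$@" | LC_ALL=C sort -u | paste -sd' ' -
}

sort_unique foo bar lose foo bar
# prints: bar foo lose
```

In a makefile this is commonly what $(sort) is used for: the ordering matters less than getting each word exactly once.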

Why not trivially s/lexical/ASCII/ on the affected line in the manual?
Lexical may be technically correct but ASCII is more precise.

David Boyce



Re: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread Edward Welbourne
 GNU make uses the standard C runtime function qsort(3) to perform its
 sorting, with a comparison function of the standard C runtime function
 strcmp().
...
 The builtin sort function DOES sort.  It may not sort the way you would
 prefer, but it sorts in a standard, repeatable, well-defined way that
 does not change based on a particular user's environment settings...
 most creators of build systems consider this reproducibility to be MUCH
 more important than locale-specific collation, which is just visual
 flourish.

So the short story is: we *could* use strcoll() and have make adapt
itself to the locale preferences of users, but doing that would
probably lead to subtle ways that make files would work for their
authors and be broken for other users, which would be seriously
difficult to debug.  We would then need a way for make to let authors
specify, in a make-file, that it should use some particular locale
instead of whatever happens to please the user (so as to avoid such
bugs).  Every responsible make-file author would then exercise this
and choose POSIX, i.e. C, so we'd be back where we are today.  Ergo,
it's really not worth making such a change.
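
The trade-off above can be made concrete with a small shell sketch. Both helper names are hypothetical, and sort(1) stands in for whatever comparison make would use; the point is only that the first result depends on the caller's environment while the second does not.

```shell
#!/bin/sh
# Sketch only: contrast an environment-dependent sort with a pinned one.
# Both helper names are hypothetical.

# Order follows whatever LC_ALL / LC_COLLATE / LANG the caller has set.
locale_sort() {
    printf '%s\n' "$@" | sort | paste -sd' ' -
}

# Order is pinned to the C (POSIX) locale, whatever the caller has set.
pinned_sort() {
    printf '%s\n' "$@" | LC_ALL=C sort | paste -sd' ' -
}

locale_sort B a    # result varies between users
pinned_sort B a    # prints: B a
```

An author who wants a build to behave the same for everyone would presumably reach for the pinned form, which is the behavior make already has.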

Eddy.



RE: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread Martin Dorey
 Why not trivially s/lexical/ASCII/ on the affected line in the manual?

Because that could mislead someone who uses non-ASCII characters?  How about:

Index: doc/make.texi
===
RCS file: /sources/make/make/doc/make.texi,v
retrieving revision 1.72
diff -u -r1.72 make.texi
--- doc/make.texi   2 May 2011 15:11:23 -   1.72
+++ doc/make.texi   19 Jul 2011 19:20:36 -
@@ -6846,6 +6846,8 @@
 
 @noindent
 returns the value @samp{bar foo lose}.
+Results are returned in the C locale's collation order,
+regardless of LC_COLLATE's value.
 
 @cindex removing duplicate words
 @cindex duplicate words, removing



Re: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-19 Thread Paul Smith
Please don't reply only to me: discussions belong on the mailing lists.

On Tue, 2011-07-19 at 15:32 -0400, Rob Holbert wrote:
 The key in that definition is "depends on their first letter", not
 the capitalization of their first letter. But in any event, if you
 don't have a clear definition for Lexical, then that's probably a big
 clue it should not be in the manual. Otherwise, the definition would
 have to mention case. So, using a sort function to provide an unsorted
 list does not sound reasonable to me. Especially when the UNIX command
 line does it right, which is quite odd because I thought the kernel
 was built with make and gcc. Hmmm. Well, maybe somebody ought to get
 coherent on what it does mean instead of reiterating the obvious.

I think there's some misunderstanding of how things really work shown by
this paragraph.  When I use sort on MY command line, it sorts the same
way that GNU make does:

~$ (echo aardvark; echo Zebra) | sort
Zebra
aardvark

This is exactly what I'd expect from my UNIX command line tools, because
I've configured my environment to use the built-in sorting order (which
was the only sorting order for much of the history of UNIX), ASCII.

You've configured your environment to use a different sorting order,
which is fine.  The ability to show directory listings, etc. using the
user's personal sorting order preferences is a great enhancement from
when UNIX was originally created.

But it's not at all obvious that having a tool like make follow the
user's wishes here is the right thing to do.

 I would not really expect to find Zebra before a bear in an index? The
 only use for a sorted list is to quickly identify the presence or
 absence of an item by quickly scanning the list. (i.e. Damn why is
 lcd.o not present in the build?) . If not for this, what useful
 purpose does the broken sort serve?

If you understand the sorting order used by make, it's no more or less
difficult to determine whether a file is missing than any other sort
order.  You simply have to look with the other lowercase filenames and
see if it's there.  The sort is not broken.  It's not random; that would
be silly.  It's just not what you're used to.

Consider another scenario: make uses your sorting preferences, and you
run a build and it fails.  You send the log to me and I'm looking at it,
but I use different sorting preferences.  Now I'm confused by your log
because the sorting order is different and I can't find things.  Or,
your build fails because of a subtlety related to the sorting order, but
I can't help you as things work fine for me because I use a different
order.

Having a completely deterministic sorting order is a good thing, for a
build system.  Any ordering which is modified by the user's environment
is by definition not deterministic.




RE: $(sort) - what is lexical order? (was RE: Follow-up)

2011-07-18 Thread Martin Dorey
If I had check-in privs, I'd at least make this issue explicit in the 
documentation.  Even that, though, will require others on the list to be 
persuaded.

 Maybe $(alphabetize list)?

Creative, but the name doesn't really work for writing systems, like Mandarin 
written in hanzi, without an alphabet.

Did you realize that your makefile can delegate the sorting to sort(1)?  You 
wouldn't want Little Bobby Tables using this but who uses $(sort) on lists 
containing shell meta-characters like quotes?

martind@whitewater:~$ { echo 'strcoll = $(shell echo $(1) | fmt -w1 | sort)'; 
echo 'L:=$(call strcoll,B a)'; } | make -f - -p 2>&1 | grep '^L '
L := a B
martind@whitewater:~$


From: Rob Holbert [mailto:robholb...@gmail.com]
Sent: Sunday, July 17, 2011 10:13
To: Martin Dorey
Subject: Re: $(sort) - what is lexical order? (was RE: Follow-up)

I contend that the only useful purpose for the sort function is to alphabetize 
a list of items correctly. I realize that the out-of-the-box C strcmp function 
doesn't give us what we want exactly. The simple and obvious solution to sort 
the alphabet correctly in the ASCII world would be to put all strings in the 
same case prior to the comparison. Maybe $(alphabetize list)? Just don't see 
any real use for the quasi-sort that presently exists. Why would you want to 
almost alphabetize a list of files or words? It's like a tease. lol.
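
The "same case prior to the comparison" idea can be sketched with sort(1)'s -f option. This is an illustration only: fold_sort is a hypothetical helper name, not an existing or proposed make function, and the C locale is pinned so that only the case folding changes the result.

```shell
#!/bin/sh
# Sketch only: compare words as if they were all in the same case.
# fold_sort is a hypothetical helper, not an existing make function.
fold_sort() {
    # -f folds lowercase to uppercase for comparison only; the output
    # keeps each word's original spelling.
    printf '%s\n' "$@" | LC_ALL=C sort -f | paste -sd' ' -
}

fold_sort widget.c main.c ad.c Buzzer.c
# prints: ad.c Buzzer.c main.c widget.c
```

Folding in the C locale only helps with ASCII letters, and words that differ only in case still need some tie-break rule, so this is not a general answer to collation.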




On Tue, Jul 12, 2011 at 5:17 PM, Martin Dorey 
mdo...@bluearc.com wrote:
OP has something of a point: contrast the locale-dependent behavior of sort(1) 
with make's $(sort):

$ echo 'L:=$(sort B a)' | make -f - -p 2>&1 | grep '^L '
L := B a
$ { echo B; echo a; } | sort
a
B
$ { echo B; echo a; } | LC_ALL=C sort
B
a
$

I present this more to provoke "we can't change that!" and clarified 
documentation than as a serious suggestion:

Index: configure.in
===
RCS file: /sources/make/make/configure.in,v
retrieving revision 1.157
diff -u -r1.157 configure.in
--- configure.in 29 Aug 2010 23:05:27 -   1.157
+++ configure.in 12 Jul 2011 21:11:28 -
@@ -166,6 +167,7 @@
 AC_CHECK_FUNCS(strcasecmp strncasecmp strcmpi strncmpi stricmp strnicmp)

 # strcoll() is used by the GNU glob library
+# and by $(sort)
 AC_FUNC_STRCOLL

 AC_FUNC_ALLOCA
Index: misc.c
===
RCS file: /sources/make/make/misc.c,v
retrieving revision 1.84
diff -u -r1.84 misc.c
--- misc.c  6 Nov 2010 21:56:24 -  1.84
+++ misc.c   12 Jul 2011 21:11:28 -
@@ -51,6 +51,10 @@
 # define VA_END(args)
 #endif

+#if !defined(HAVE_STRCOLL)
+# define strcoll strcmp
+#endif
+

 /* Compare strings *S1 and *S2.
Return negative if the first is less, positive if it is greater,
@@ -62,9 +66,7 @@
   const char *s1 = *((char **)v1);
   const char *s2 = *((char **)v2);

-  if (*s1 != *s2)
-    return *s1 - *s2;
-  return strcmp (s1, s2);
+  return strcoll (s1, s2);
 }

 /* Discard each backslash-newline combination from LINE.
cvs diff: Diffing config
cvs diff: Diffing doc
Index: doc/make.texi
===
RCS file: /sources/make/make/doc/make.texi,v
retrieving revision 1.72
diff -u -r1.72 make.texi
--- doc/make.texi 2 May 2011 15:11:23 - 1.72
+++ doc/make.texi 12 Jul 2011 21:11:28 -
@@ -6846,6 +6846,8 @@

 @noindent
 returns the value @samp{bar foo lose}.
+In a change from previous versions, make now sorts in locale-dependent order.
+Run with LC_ALL=C in the environment to select the previous behavior.

 @cindex removing duplicate words
 @cindex duplicate words, removing


From: bug-make-bounces+mdorey=bluearc.com@gnu.org 
[mailto:bug-make-bounces+mdorey=bluearc.com@gnu.org] On Behalf Of Rob Holbert
Sent: Monday, July 11, 2011 12:24
To: bug-make@gnu.org
Subject: Follow-up

Wanted to follow up on my earlier email. Attached is the smallest makefile I 
could create to demonstrate the issue.

or

# Does not sort lexically as expected.
LIST = $(sort widget.c main.c ad.c Buzzer.c)

# Default target: print the sorted list.
all: list
list:
	@echo $(LIST)
.PHONY: all list

Previous email:
Hello,

I ran across perhaps a bug or need for another feature at least. If a list of 
items has words beginning with both upper and lower case letters, the resulting 
$(sort $(LIST)) will result in all capital letter words coming before the lower 
case words. In this case, Zebra.c would appear before apple.c. This is dictated 
by the ASCII chart of course. However, it is not lexical order as the manual 
explains the function is. Lexical would be apple.c Zebra.c.
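
The "ASCII chart" point can be checked directly from a shell. This is a small sketch, assuming an ASCII-compatible character set; char_code is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: show the numeric code of a character.
# char_code is a hypothetical helper; values shown assume ASCII.
char_code() {
    # A leading quote makes printf report the character's numeric value.
    printf '%d\n' "'$1"
}

char_code Z    # prints: 90
char_code a    # prints: 97
```

Since 90 is less than 97, any comparison on raw byte values places Zebra.c ahead of apple.c.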

This is solved easily by making the sort