[PATCH] mktemp: cut the output info when suffix contains XXX.

2024-02-26 Thread lvgenggeng
* src/mktemp.c: When --suffix was supply in cmdline, the template
points to the meraged buffer(dest_name). As X is counted from
template which not contains suffix, So the output should be cut.

Signed-off-by: lvgenggeng 
---
 src/mktemp.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mktemp.c b/src/mktemp.c
index 5f8e5155d..d194aeff5 100644
--- a/src/mktemp.c
+++ b/src/mktemp.c
@@ -156,7 +156,7 @@ main (int argc, char **argv)
   int status = EXIT_SUCCESS;
   size_t x_count;
   size_t suffix_len;
-  char *dest_name;
+  char *dest_name = nullptr;
 
   initialize_main (, );
   set_program_name (argv[0]);
@@ -255,7 +255,12 @@ main (int argc, char **argv)
 }
   x_count = count_consecutive_X_s (template, suffix - template);
   if (x_count < 3)
-error (EXIT_FAILURE, 0, _("too few X's in template %s"), quote (template));
+{
+  /* when suffix was appended to template, template should be cut. */
+  if (template == dest_name)
+template[suffix - template] = '\0';
+  error (EXIT_FAILURE, 0, _("too few X's in template %s"), quote 
(template));
+}
 
   if (use_dest_dir)
 {
-- 
2.20.1




bug#69418: test failure when no french locale is installed

2024-02-26 Thread Bruno Haible
Testing the current coreutils git master:

On a Debian 12 system, in which I have not installed a French UTF-8 locale,
I see a test failure of tests/misc/join-utf8.

The essential lines from test-suite.log:

+ test set = set
+ LC_ALL=none
../tests/misc/join-utf8.sh: line 24: warning: setlocale: LC_ALL: cannot change 
locale (none): No such file or directory

The cause is that on such a system, LOCALE_FR_UTF8, as determined by
gnulib/m4/locale-fr.m4, is 'none', not empty or absent.

The attached patch fixes the failure.

>From 74be78b1efba477016c153e0eacc93a3c661e748 Mon Sep 17 00:00:00 2001
From: Bruno Haible 
Date: Mon, 26 Feb 2024 22:33:18 +0100
Subject: [PATCH] join: Avoid test failure on systems which have no French
 UTF-8 locale

* tests/misc/join-utf8.sh: Test the value of LOCALE_FR_UTF8 against
'none', not against a missing value.
---
 tests/misc/join-utf8.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/misc/join-utf8.sh b/tests/misc/join-utf8.sh
index 2dbe2bb81..9af9e55ce 100755
--- a/tests/misc/join-utf8.sh
+++ b/tests/misc/join-utf8.sh
@@ -19,7 +19,7 @@
 . "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
 print_ver_ join
 
-test "${LOCALE_FR_UTF8+set}" = set || skip_ "French UTF-8 locale not available"
+test "$LOCALE_FR_UTF8" != none || skip_ "French UTF-8 locale not available"
 
 LC_ALL=$LOCALE_FR_UTF8
 export LC_ALL
-- 
2.34.1



Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 18:06, Andreas Schwab wrote:

On Feb 26 2024, Pádraig Brady wrote:


https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938


It's still bad as it adds a hidden dependency that is invisible to rpm.


Right. In practice though since coreutils also links libcrypto
for cksum and the separate digest utilities, this should be OK.
In edge cases with a separate package per util, the dependency
can be added manually.


Also, the regexp in the sed command contains unquoted uses of '.' that
are supposed to be matched literally.


Fixed.

thanks,
Pádraig




Re: sort dynamic linking overhead

2024-02-26 Thread Paul Eggert

On 2024-02-26 06:12, Pádraig Brady wrote:

On 26/02/2024 06:44, Yann Collet wrote:
  * xxhash128 is not a cryptographic hash function, so it doesn't 
attempt tobe random.


Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.


It’s even tested with PractRand, and it could be used as a good random 
number generator.


Being non-cryptographic means that what it doesn’t try is to make sure 
no one can intentionally forge a hash collision from 2 different files 
(other than brute-forcing, which is impractical).


But that’s different, and I wouldn’t call this property “randomness”, 
even though randomness is a pre-requisite (but not sufficient in 
itself) to collision resistance.


Fair enough, I should have called it a different name since it's not 
really random. However, 'sort -R' does have problems when hashes have 
collisions, as it falls back on ordinary comparisons and thus ceases to 
be a "random" sort, so collision resistance is a good property to have 
if 'sort -R' is given adversarial input.




md5 shouldn't be considered as cryptographic anyway since it's broken.


Although MD5 is broken as a hash for a string, it's not clear to me that 
it's broken in the way that GNU 'sort -R' uses MD5. GNU 'sort -R' uses a 
random salt, does not report hashes to the user, and does not output 
anything until it's read all its input. It seems to me that breaking gnu 
'sort -R' would be harder than breaking HMAC-MD5, which itself is 
significantly harder than breaking MD5.


That being said, if MD5 is broken for GNU 'sort' then we should use a 
better hash.




Re: sort dynamic linking overhead

2024-02-26 Thread Andreas Schwab
On Feb 26 2024, Pádraig Brady wrote:

> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

It's still bad as it adds a hidden dependency that is invisible to rpm.

Also, the regexp in the sed command contains unquoted uses of '.' that
are supposed to be matched literally.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 17:39, Bruno Haible wrote:

Pádraig Brady wrote:

+  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);


That only works if libopenssl-devel is installed.


Good spot.
I'd already pushed a fix for this at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938


Does this work for all the various names of libcrypto in various distros?

Debian 12   libcrypto.so.3
Ubuntu 22.04libcrypto.so.1.1 libcrypto.so.3
Slackware 15libcrypto.so.1.1
openSUSE 15.5   libcrypto.so.1.1
CentOS Stream 9 libcrypto.so.3
Guix 1.4libcrypto.so.1.1
Alpine 3.19 libcrypto.so.3
FreeBSD 14.0libcrypto.so.38
NetBSD 9.3  libcrypto.so.14
OpenBSD 7.4 libcrypto.so.52.0



I only tested with libcrypto.so.3, but it should match all of the above.
It matches libcrypto.so.[.0-9]*

cheers,
Pádraig



Re: sort dynamic linking overhead

2024-02-26 Thread Bruno Haible
Pádraig Brady wrote:
> >> +  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
> > 
> > That only works if libopenssl-devel is installed.
> 
> Good spot.
> I'd already pushed a fix for this at:
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

Does this work for all the various names of libcrypto in various distros?

Debian 12   libcrypto.so.3
Ubuntu 22.04libcrypto.so.1.1 libcrypto.so.3
Slackware 15libcrypto.so.1.1
openSUSE 15.5   libcrypto.so.1.1
CentOS Stream 9 libcrypto.so.3
Guix 1.4libcrypto.so.1.1
Alpine 3.19 libcrypto.so.3
FreeBSD 14.0libcrypto.so.38
NetBSD 9.3  libcrypto.so.14
OpenBSD 7.4 libcrypto.so.52.0

Bruno






Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 17:11, Andreas Schwab wrote:

On Feb 25 2024, Paul Eggert wrote:


+/* Dynamically link the crypto library, if it needs linking.  */
+static void
+link_libcrypto (void)
+{
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);


That only works if libopenssl-devel is installed.


Good spot.
I'd already pushed a fix for this at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

thanks,
Pádraig



Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 06:25, Paul Eggert wrote:

On 2023-10-09 06:48, Pádraig Brady wrote:


An incremental patch attached to use xxhash128 (0.8.2)
shows a good improvement (note avx2 being used on this cpu):


xxhash128 is not a cryptographic hash function, so it doesn't attempt to
be random. Of course most people won't care - it's random "enough" - but
it would be a functionality change.

blake2 is cryptographic and would be random, but would bloat the 'sort'
executable with code that's hardly ever used.

To attack the problem in a more conservative way, I installed the
attached patch into coreutils. With it, 'sort -R' continues to use MD5
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
performance, and reduces the startup overhead of plain 'sort' to be what
it was before we started passing -lcrypto to gcc by default (in
coreutils 8.32).

I also toyed with changing MD5 to SHA512, but that hurt performance. For
what it's worth, although I tested with an Intel Xeon W-1350, which
supports SHA-NI as well as various AVX-512 options, I didn't see where
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes
advantage of these special-purpose instructions.


I noticed that the dlopen() used an unversioned lib,
which generally won't work as that's what's available
at build time, not at run time.
Attached is a fix.
Note that also includes a comment for why we
check for SHA512 and not MD5 (to avoid deprecation warnings).

Also attached is a fix to a sc_tight_scope failure.

cheers,
PádraigFrom 09fed593048f744f5a76e23e58b232fd8922bf2e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 26 Feb 2024 14:42:40 +
Subject: [PATCH 1/2] maint: avoid sc_tight_scope failure in sort.c

* cfg.mk: Exclude the ptr_MD5_* symbols added in
commit v9.4-130-g7f57ac2d2, as there is no way
to declare these static given they way they're defined.
---
 cfg.mk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cfg.mk b/cfg.mk
index c0a131b2b..7da1a4b4a 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -927,6 +927,9 @@ _gl_TS_other_headers = $(srcdir)/src/*.h src/*.h
 # Normally, the rule would detect its declaration, but that uses a
 # different name, __clz_tab.
 _gl_TS_unmarked_extern_vars = factor_clz_tab
+# Avoid tight_scope rule stating these should be static
+# as there is no way to achieve that with the way these are defined.
+_gl_TS_unmarked_extern_vars += ptr_MD5_.*
 # Other tight_scope settings
 _gl_TS_dir = .
 _gl_TS_obj_files = src/*.$(OBJEXT)
-- 
2.43.0

From e8d92ab0aa57fa79afed52ecfebfb2d3a0f6c3ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 26 Feb 2024 16:38:41 +
Subject: [PATCH 2/2] build: fix libcrypto version linked by sort at runtime

One should link the versioned lib at runtime,
and the unversioned lib at build time,
as the unversioned lib may not be installed,
and better couples the binary with the required version.

* configure.ac: Define LIBCRYPTO_SONAME, determined from
the test binary linked with -lcrypto.  Also document
why we use SHA512() in the check, rather than MD5().
* src/sort.c (link_libcrypto): Use the versioned lib in dlopen().
---
 configure.ac | 13 ++---
 src/sort.c   |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/configure.ac b/configure.ac
index 043081b90..eb6c33aa4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -365,12 +365,16 @@ AS_CASE([$LIB_CRYPTO],
   [AC_LANG_PROGRAM(
  [[#include 
#include 
+   /* Use SHA512 rather than MD5 here to avoid deprecation warnings.
+  So need to check HAVE_OPENSSL_MD5.. with DLOPEN_LIBCRYPTO. */
  ]],
  [[return !(dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL)
 && SHA512 (0, 0, 0));]])],
   [# readelf works with cross-builds; ldd works on more platforms.
-   AS_CASE([`(readelf -d conftest$EXEEXT || ldd conftest$EXEEXT
- ) 2>/dev/null`],
+   LIBCRYPTO_SONAME="`(readelf -d conftest$EXEEXT || ldd conftest$EXEEXT
+  ) 2>/dev/null |
+  sed -n 's/.*\(libcrypto.so.[[.0-9]]*\).*/\1/p'`"
+   AS_CASE([$LIBCRYPTO_SONAME],
  [*libcrypto*],
[utils_cv_dlopen_libcrypto=yes])])
 LIBS=$saved_LIBS])
@@ -378,7 +382,10 @@ AS_CASE([$LIB_CRYPTO],
[yes],
  [AC_DEFINE([DLOPEN_LIBCRYPTO], [1],
 [Define to 1 if dlopen exists and libcrypto is
- linked dynamically.])])])
+ linked dynamically.])
+  AC_DEFINE_UNQUOTED([LIBCRYPTO_SONAME], ["$LIBCRYPTO_SONAME"],
+ [versioned libcrypto])
+ ])])
 
 # macOS >= 10.12
 AC_CHECK_FUNCS([fclonefileat])
diff --git a/src/sort.c b/src/sort.c
index cefe381bf..2d8324ca4 100644
--- a/src/sort.c
+++ 

Re: sort dynamic linking overhead

2024-02-26 Thread Andreas Schwab
On Feb 25 2024, Paul Eggert wrote:

> +/* Dynamically link the crypto library, if it needs linking.  */
> +static void
> +link_libcrypto (void)
> +{
> +#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
> +  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);

That only works if libopenssl-devel is installed.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: sort dynamic linking overhead

2024-02-26 Thread Yann Collet via GNU coreutils General Discussion
  *   xxhash128 is not a cryptographic hash function, so it doesn't attempt to 
be random.

Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.
It’s even tested with PractRand, and it could be used as a good random number 
generator.

Being non-cryptographic means that what it doesn’t try is to make sure no one 
can intentionally forge a hash collision from 2 different files (other than 
brute-forcing, which is impractical).
But that’s different, and I wouldn’t call this property “randomness”, even 
though randomness is a pre-requisite (but not sufficient in itself) to 
collision resistance.


From: Paul Eggert 
Date: Sunday, February 25, 2024 at 10:25 PM
To: Pádraig Brady , Bruno Haible , 
bug-gnu...@gnu.org , Coreutils 
Cc: Yann Collet 
Subject: Re: sort dynamic linking overhead
On 2023-10-09 06:48, Pádraig Brady wrote:

> An incremental patch attached to use xxhash128 (0.8.2)
> shows a good improvement (note avx2 being used on this cpu):

xxhash128 is not a cryptographic hash function, so it doesn't attempt to
be random. Of course most people won't care - it's random "enough" - but
it would be a functionality change.

blake2 is cryptographic and would be random, but would bloat the 'sort'
executable with code that's hardly ever used.

To attack the problem in a more conservative way, I installed the
attached patch into coreutils. With it, 'sort -R' continues to use MD5
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
performance, and reduces the startup overhead of plain 'sort' to be what
it was before we started passing -lcrypto to gcc by default (in
coreutils 8.32).

I also toyed with changing MD5 to SHA512, but that hurt performance. For
what it's worth, although I tested with an Intel Xeon W-1350, which
supports SHA-NI as well as various AVX-512 options, I didn't see where
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes
advantage of these special-purpose instructions.


Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 06:44, Yann Collet wrote:

  * xxhash128 is not a cryptographic hash function, so it doesn't attempt tobe 
random.

Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.

It’s even tested with PractRand, and it could be used as a good random number 
generator.

Being non-cryptographic means that what it doesn’t try is to make sure no one 
can intentionally forge a hash collision from 2 different files (other than 
brute-forcing, which is impractical).

But that’s different, and I wouldn’t call this property “randomness”, even 
though randomness is a pre-requisite (but not sufficient in itself) to 
collision resistance.


Right. I was looking at both md5 and xxhash128 having a 10 quality score in the 
SMHasher metric.
I even saw a comment from you Yann that xxhas128 may have slightly better 
dispersion than md5.
Also md5 shouldn't be considered as cryptographic anyway since it's broken.
I.e. I don't think users would need to be informed of this change if made.

Re Paul's committed patch, it's a good improvement, and does not add a new 
(xxhash) dependency.
Paul, should the configure check, be testing for the MD5 routines rather than 
SHA512?
Also an entry in the Improvements section of NEWS would be appropriate.

thanks!
Pádraig




Re: sort dynamic linking overhead

2024-02-26 Thread Paul Eggert

On 2023-10-09 06:48, Pádraig Brady wrote:


An incremental patch attached to use xxhash128 (0.8.2)
shows a good improvement (note avx2 being used on this cpu):


xxhash128 is not a cryptographic hash function, so it doesn't attempt to 
be random. Of course most people won't care - it's random "enough" - but 
it would be a functionality change.


blake2 is cryptographic and would be random, but would bloat the 'sort' 
executable with code that's hardly ever used.


To attack the problem in a more conservative way, I installed the 
attached patch into coreutils. With it, 'sort -R' continues to use MD5 
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is 
used (Bruno's suggestion). This doesn't significantly affect 'sort -R' 
performance, and reduces the startup overhead of plain 'sort' to be what 
it was before we started passing -lcrypto to gcc by default (in 
coreutils 8.32).


I also toyed with changing MD5 to SHA512, but that hurt performance. For 
what it's worth, although I tested with an Intel Xeon W-1350, which 
supports SHA-NI as well as various AVX-512 options, I didn't see where 
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes 
advantage of these special-purpose instructions.From 7f57ac2d20c144242953a8dc7d95b02df0244751 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 25 Feb 2024 17:13:12 -0800
Subject: [PATCH] sort: dynamically link -lcrypto if -R

This saves time in the usual case, which does not need -lcrypto.
* configure.ac (DLOPEN_LIBCRYPTO): New macro.
* src/sort.c [DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5]: New macros
MD5_Init, MD5_Update, MD5_Final.  Include "md5.h" after defining
them.  Include , and define new functions link_failure
and symbol_address.
(link_libcrypto): New function.
(random_md5_state_init): Call it before using crypto functions.
---
 configure.ac | 29 +
 src/sort.c   | 52 +++-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c7eca1b8d..043081b90 100644
--- a/configure.ac
+++ b/configure.ac
@@ -351,6 +351,35 @@ if test $utils_cv_localtime_cache = yes; then
   AC_DEFINE([LOCALTIME_CACHE], [1], [FIXME])
 fi
 
+# Should 'sort' link libcrypto dynamically?
+AS_CASE([$LIB_CRYPTO],
+  [-lcrypto],
+[# Check for dlopen and libcrypto dynamic linking in one program,
+ # as there's little point to checking them separately.
+ AC_CACHE_CHECK([for dlopen and whether libcrypto is linked dynamically],
+   [utils_cv_dlopen_libcrypto],
+   [utils_cv_dlopen_libcrypto=no
+saved_LIBS=$LIBS
+LIBS="$LIBS $LIB_CRYPTO"
+AC_LINK_IFELSE(
+  [AC_LANG_PROGRAM(
+ [[#include 
+   #include 
+ ]],
+ [[return !(dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL)
+&& SHA512 (0, 0, 0));]])],
+  [# readelf works with cross-builds; ldd works on more platforms.
+   AS_CASE([`(readelf -d conftest$EXEEXT || ldd conftest$EXEEXT
+ ) 2>/dev/null`],
+ [*libcrypto*],
+   [utils_cv_dlopen_libcrypto=yes])])
+LIBS=$saved_LIBS])
+ AS_CASE([$utils_cv_dlopen_libcrypto],
+   [yes],
+ [AC_DEFINE([DLOPEN_LIBCRYPTO], [1],
+[Define to 1 if dlopen exists and libcrypto is
+ linked dynamically.])])])
+
 # macOS >= 10.12
 AC_CHECK_FUNCS([fclonefileat])
 
diff --git a/src/sort.c b/src/sort.c
index dea7be45d..cefe381bf 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -39,7 +39,6 @@
 #include "hash.h"
 #include "heap.h"
 #include "ignore-value.h"
-#include "md5.h"
 #include "mbswidth.h"
 #include "nproc.h"
 #include "physmem.h"
@@ -2085,6 +2084,56 @@ getmonth (char const *month, char **ea)
   return 0;
 }
 
+/* When using the OpenSSL implementation, dynamically link only if -R.
+   This saves startup time in the usual (sans -R) case.  */
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+/* In the typical case where md5.h does not #undef HAVE_OPENSSL_MD5,
+   trick md5.h into declaring and using pointers to functions not functions.
+   This causes the compiler's -lcrypto option to have no effect,
+   as sort.o no longer uses any crypto symbols statically.  */
+# define MD5_Init (*ptr_MD5_Init)
+# define MD5_Update (*ptr_MD5_Update)
+# define MD5_Final (*ptr_MD5_Final)
+#endif
+
+#include "md5.h"
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+# include 
+
+/* Diagnose a dynamic linking failure.  */
+static void
+link_failure (void)
+{
+  error (SORT_FAILURE, 0, "%s", dlerror ());
+}
+
+/* Return a function pointer in HANDLE for SYMBOL.  */
+static void *
+symbol_address (void *handle, char const *symbol)
+{
+  void *address = dlsym (handle, symbol);
+  if (!address)
+link_failure ();
+  return address;
+}
+#endif
+
+/* Dynamically link the crypto library, if it needs linking.  */
+static void
+link_libcrypto (void)
+{
+#if