gnulib-tool.py: Follow gnulib-tool changes, part 29, 30.

2024-02-26 Thread Collin Funk
These two TODO entries were pretty simple. One of the ChangeLog
entries listed to a test case Bruno wrote that worked for testing both
of them.

[collin@debian gnulib]$ gnulib-tool.py --create-testdir --dir=testdir-all 
--single-configure
module parse-datetime2 depends on a module with an incompatible license: gettime
module parse-datetime2 depends on a module with an incompatible license: 
nstrftime
module parse-datetime2 depends on a module with an incompatible license: 
parse-datetime
module parse-datetime2 depends on a module with an incompatible license: time_rz
module parse-datetime2 depends on a module with an incompatible license: 
timespec
module parse-datetime2 depends on a module with an incompatible license: tzset
module ucs4-utf16 depends on a module with an incompatible license: 
unistr/u16-uctomb
module utf16-ucs4 depends on a module with an incompatible license: 
unistr/u16-mbtouc
module utf16-ucs4-unsafe depends on a module with an incompatible license: 
unistr/u16-mbtouc-unsafe

The first patch fixes the parse-datetime2 warnings in the same way
that gnulib-tool does (ready for the release of parse-datetime3). The
second fixes the libunistring ones. Let me know if I missed anything.

CollinFrom 3d99a44c67c1a7db5bfe210f46ede1eb0700e664 Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Mon, 26 Feb 2024 21:22:23 -0800
Subject: [PATCH 1/2] gnulib-tool.py: Follow gnulib-tool changes, part 29.

Follow gnulib-tool change
2021-03-06  Paul Eggert  
parse-datetime2: fix licensing

* pygnulib/GLModuleSystem.py (GLModule.getLicense): Handle the special
licensing case for parse-datetime2 or any other module starting with
"parse-datetime". Update comment.
---
 ChangeLog  | 10 ++
 gnulib-tool.py.TODO| 14 --
 pygnulib/GLModuleSystem.py |  4 ++--
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d399841e18..7692ac13bc 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2024-02-26  Collin Funk  
+
+	gnulib-tool.py: Follow gnulib-tool changes, part 29.
+	Follow gnulib-tool change
+	2021-03-06  Paul Eggert  
+	parse-datetime2: fix licensing
+	* pygnulib/GLModuleSystem.py (GLModule.getLicense): Handle the special
+	licensing case for parse-datetime2 or any other module starting with
+	"parse-datetime". Update comment.
+
 2024-02-26  Bruno Haible  
 
 	gnulib-tool.py: Add more comments.
diff --git a/gnulib-tool.py.TODO b/gnulib-tool.py.TODO
index c72746941e..7be0e243e7 100644
--- a/gnulib-tool.py.TODO
+++ b/gnulib-tool.py.TODO
@@ -605,20 +605,6 @@ Date:   Mon Apr 26 23:31:29 2021 -0700
 
 
 
-commit 487b9551b63ef936a6be6df38d1c9484cd97810c
-Author: Paul Eggert 
-Date:   Sat Mar 6 08:23:48 2021 -0800
-
-parse-datetime2: fix licensing
-
-Problem reported by Bruno Haible in:
-https://lists.gnu.org/r/bug-gnulib/2021-03/msg00017.html
-* gnulib-tool (func_get_license): Treat parse-datetime2
-(actually, anything starting with "parse-datetime")
-like parse-datetime, as far as licenses go.
-
-
-
 commit 0be855ee827bf7e9043eeb626c4fd847704be2e6
 Author: Bruno Haible 
 Date:   Tue Dec 29 02:48:31 2020 +0100
diff --git a/pygnulib/GLModuleSystem.py b/pygnulib/GLModuleSystem.py
index c6f0eb15c7..787614d38c 100644
--- a/pygnulib/GLModuleSystem.py
+++ b/pygnulib/GLModuleSystem.py
@@ -665,8 +665,8 @@ class GLModule(object):
 raise GLError(18, str(self))
 else:  # if not self.config['errors']
 sys.stderr.write('gnulib-tool: warning: module %s lacks a License\n' % str(self))
-if str(self) == 'parse-datetime':
-# This module is under a weaker license only for the purpose of some
+if str(self).startswith('parse-datetime'):
+# These modules are under a weaker license only for the purpose of some
 # users who hand-edit it and don't use gnulib-tool. For the regular
 # gnulib users they are under a stricter license.
 result = 'GPL'
-- 
2.39.2

From 9411cafcac1cec3dd6d7e8d6476df27070d02269 Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Mon, 26 Feb 2024 21:46:30 -0800
Subject: [PATCH 2/2] gnulib-tool.py: Follow gnulib-tool changes, part 30.

Follow gnulib-tool change
2021-05-30  Bruno Haible  
Write 'LGPLv3+ or GPLv2+' instead of 'LGPLv3+ or GPLv2'.

* pygnulib/GLImport.py (GLImport.prepare): Change.
* pygnulib/GLTestDir.py (GLTestDir.execute): Likewise.
---
 ChangeLog |  9 +
 gnulib-tool.py.TODO   | 11 ---
 pygnulib/GLImport.py  |  4 ++--
 pygnulib/GLTestDir.py | 10 +-
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7692ac13bc..fdf08ad934 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-02-26  Collin

Re: pycodestyle configuration

2024-02-26 Thread Collin Funk
Hi Bruno,

On 2/26/24 6:38 PM, Bruno Haible wrote:
> But now we have the desired configurations of pycodestyle and pylint
> in two places: in comments and in the config files. With the promise that
> they will get out of sync in the future. To avoid this, let me
> update the comments...

True. I thought about adding comments to the configuration files, but
your idea is probably better. Less chance for statements to disagree
with each other. Better to just tailor the files to those comments.

> I don't see a problem here, or anything to "clean up". It is on purpose
> that these fields are stored in different objects:
>   - in the cmdargs, to satisfy the requirements of the argparser,
>   - in a GLConfig object, for the tool's logic.
> For example, the way argparser works, 'mode_list', 'mode_find' etc.
> need to be different booleans in cmdargs; however in GLConfig, we don't
> want to obey a dictate from argparser; instead it should be a single
> field 'mode', with several possible values.

I see, it makes more sense with that explanation. Thanks. I saw your
email about the sorting of modules and GLImport. I see what you mean
and I'll work on that later.

It seems that gnulib-tool.py likes to mess with copyright headers on
source files copied from lib/. I'll probably look into that first
since it makes comparing the result of gnulib-tool and gnulib-tool.py
take more work. It seems unrelated to the licensing items in
gnulib-tool.py.TODO but those seem trivial so I might as well do them
too.

Thanks,
Collin



Re: pycodestyle configuration

2024-02-26 Thread Bruno Haible
Hi Collin,

> > You can now submit the two configuration files in the pygnulib/ directory.
> > That's the proper place, not the root directory of gnulib.
> 
> Done.

Thanks; I applied both patches.

But now we have the desired configurations of pycodestyle and pylint
in two places: in comments and in the config files. With the promise that
they will get out of sync in the future. To avoid this, let me
update the comments...

> I've confirmed that pylint recognizes .pylintrc so I've used
> that file name instead. One thing to note is that you must run pylint
> from the pygnulib/ subdirectory. Pycodestyle doesn't have this
> restriction. For example:
> 
>  # Works
>  [collin@debian pygnulib]$ pylint main.py
>  # Doesn't work, doesn't pick up configuration file.
>  [collin@debian pygnulib]$ cd ../
>  [collin@debian gnulib]$ pylint pygnulib/main.py

Interesting. Also worth documenting. Done through the patch below.

> That file should probably be
> cleaned up at a later date. For some reason all of the fields in the
> object returned by argparse are used before being assigned to local
> variables and then used again.

I don't see a problem here, or anything to "clean up". It is on purpose
that these fields are stored in different objects:
  - in the cmdargs, to satisfy the requirements of the argparser,
  - in a GLConfig object, for the tool's logic.
For example, the way argparser works, 'mode_list', 'mode_find' etc.
need to be different booleans in cmdargs; however in GLConfig, we don't
want to obey a dictate from argparser; instead it should be a single
field 'mode', with several possible values.

Bruno


2024-02-26  Bruno Haible  

gnulib-tool.py: Add more comments.
* pygnulib/main.py: Add comments regarding code style. Mention the
pycodestyle and pylint configurations.

diff --git a/pygnulib/main.py b/pygnulib/main.py
index cfa59c6562..443300c439 100644
--- a/pygnulib/main.py
+++ b/pygnulib/main.py
@@ -24,11 +24,28 @@
 # - Place line breaks to help reading the code:
 #   Avoid line breaks inside expressions, if they can be avoided.
 #   Do use line breaks to separate the parts of [ ... for ... ] iterations.
+# - String literals: Use 'single-quotes'.
+#   Rationale: 

+# - Comparison operators:
+#   Use == and != for variables which can contain only strings, numbers, lists,
+#   and None.
+#   Use 'is' and 'is not' for variables which can contain only class instances
+#   (e.g. GLModule instances) and None.
+#   Rationale: 

+
 # You can use this command to check the style:
-#   $ pycodestyle --max-line-length=136 
--ignore=E265,W503,E241,E711,E712,E201,E202,E221 pygnulib/*.py
+#   $ pycodestyle *.py
+# The pycodestyle configuration is found in the file setup.cfg.
+# Documentation:
+# 
 
 # You can use this command to check for mistakes:
-#   $ pylint 
--disable=C0103,C0114,C0121,C0209,C0301,C0302,R0902,R0912,R0913,R0914,R0915,R1705,R1702,R1720
 pygnulib/*.py
+#   $ pylint *.py
+# The pylint configuration is found in the file .pylintrc.
+# Note: This configuration is only effective when you invoke pylint from this
+# directory.
+# Documentation:
+# 

 
 
 
#===






Re: pycodestyle configuration

2024-02-26 Thread Collin Funk
Hi Bruno,

On 2/26/24 2:54 PM, Bruno Haible wrote:
> I tend to agree. Even after it is finished, it will still be useful
> to have all Python sources in one directory: it makes it easy to do
> a "grep something *.py".

True. I've already run into this a few times. :)

> Done:
> 
> 2024-02-26  Bruno Haible  
> 
>   gnulib-tool.py: Reorganize code.
>   * pygnulib/main.py: New file, moved here from gnulib-tool.py.
>   * pygnulib/constants.py: Change the way APP['name'] and DIRS['root'] are
>   computed.
>   * gnulib-tool.py: New file, based on gnulib-tool.
> 
> (The patch is best viewed using gitk.)

I took a brief look at the patch and everything looks good. I've run
the merge-emacs script and confirmed that it works just as before.

> You can now submit the two configuration files in the pygnulib/ directory.
> That's the proper place, not the root directory of gnulib.

Done. I've confirmed that pylint recognizes .pylintrc so I've used
that file name instead. One thing to note is that you must run pylint
from the pygnulib/ subdirectory. Pycodestyle doesn't have this
restriction. For example:

 # Works
 [collin@debian pygnulib]$ pylint main.py
 # Doesn't work, doesn't pick up configuration file.
 [collin@debian pygnulib]$ cd ../
 [collin@debian gnulib]$ pylint pygnulib/main.py

This isn't a problem for me but I figured I'd mention it in case
someone gets scared by a ton of warnings that we don't care about.

The second patch fixes an undefined variable access that was uncovered
by the typos of mine that you fixed.

   [collin@debian pygnulib]$ gnulib-tool.py --create-testdir --dir test dummy
   Traceback (most recent call last):
 File "/home/collin/.local/src/gnulib/pygnulib/main.py", line 1180, in 

   main()
 File "/home/collin/.local/src/gnulib/pygnulib/main.py", line 608, in main
   if mode != None and 'test' in mode and gnu_make:
   
UnboundLocalError: cannot access local variable 'gnu_make' where it is not 
associated with a value

That was my fault, sorry about that. That file should probably be
cleaned up at a later date. For some reason all of the fields in the
object returned by argparse are used before being assigned to local
variables and then used again.

Thanks,
CollinFrom 900ded124da7b8f5c267dd58b423f119ac1eda2a Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Mon, 26 Feb 2024 16:02:34 -0800
Subject: [PATCH 1/2] gnulib-tool.py: Add configuration files for Python tools.

* pygnulib/.pylintrc: New file, used by pylintrc.
* pygnulib/setup.cfg: New file, currently only used for pycodestyle
options.
---
 ChangeLog  |  7 +++
 pygnulib/.pylintrc |  8 
 pygnulib/setup.cfg | 10 ++
 3 files changed, 25 insertions(+)
 create mode 100644 pygnulib/.pylintrc
 create mode 100644 pygnulib/setup.cfg

diff --git a/ChangeLog b/ChangeLog
index af84986424..5b40d684ce 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2024-02-26  Collin Funk  
+
+	gnulib-tool.py: Add configuration files for Python tools.
+	* pygnulib/.pylintrc: New file, used by pylintrc.
+	* pygnulib/setup.cfg: New file, currently only used for pycodestyle
+	options.
+
 2024-02-26  Bruno Haible  
 
 	gnulib-tool.py: Reorganize code.
diff --git a/pygnulib/.pylintrc b/pygnulib/.pylintrc
new file mode 100644
index 00..7c4c1a338c
--- /dev/null
+++ b/pygnulib/.pylintrc
@@ -0,0 +1,8 @@
+# .pylintrc
+
+[MESSAGES CONTROL]
+disable=C0103,C0114,C0121,C0209,C0301,C0302,R0902,R0912,R0913,R0914,R0915,R1705,R1702,R1720
+
+# Local Variables:
+# mode: conf
+# End:
diff --git a/pygnulib/setup.cfg b/pygnulib/setup.cfg
new file mode 100644
index 00..5d4a17171a
--- /dev/null
+++ b/pygnulib/setup.cfg
@@ -0,0 +1,10 @@
+# setup.cfg
+
+[pycodestyle]
+ignore = E265,W503,E241,E711,E712,E201,E202,E221
+max-line-length = 136
+statistics = True
+
+# Local Variables:
+# mode: conf
+# End:
-- 
2.39.2

From c995d3334c9822ca9496025c0de674c0c041d183 Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Mon, 26 Feb 2024 16:13:46 -0800
Subject: [PATCH 2/2] gnulib-tool.py: Fix undefined variable access.

* pygnulib/main.py (main): Don't use gnu_make before it is defined.
---
 ChangeLog| 5 +
 pygnulib/main.py | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 5b40d684ce..035eda5c25 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2024-02-26  Collin Funk  
+
+	gnulib-tool.py: Fix undefined variable access.
+	* pygnulib/main.py (main): Don't use gnu_make before it is defined.
+
 2024-02-26  Collin Funk  
 
 	gnulib-tool.py: Add configuration files for Python tools.
diff --git a/pygnulib/main.py b/pygnulib/main.py
index 3d615ddbac..cfa59c6562 100644
--- a/pygnulib/main.py
+++ b/pygnulib/main.py
@@ -605,7 +605,7 @@ def main():
 if cmdargs.pobase == None and cmdargs.podomain != None:
 message = '%s: warning: --po-domain has no effect without a --po-base option\n

Re: pycodestyle configuration

2024-02-26 Thread Bruno Haible
Hi Collin,

> I think that it would be best to just shove all the Python mess in
> that subdirectory since it is a work-in-progress and is safe to be
> ignored by gnulib users.

I tend to agree. Even after it is finished, it will still be useful
to have all Python sources in one directory: it makes it easy to do
a "grep something *.py".

Done:

2024-02-26  Bruno Haible  

gnulib-tool.py: Reorganize code.
* pygnulib/main.py: New file, moved here from gnulib-tool.py.
* pygnulib/constants.py: Change the way APP['name'] and DIRS['root'] are
computed.
* gnulib-tool.py: New file, based on gnulib-tool.

(The patch is best viewed using gitk.)

You can now submit the two configuration files in the pygnulib/ directory.
That's the proper place, not the root directory of gnulib.

Bruno






Re: pycodestyle configuration

2024-02-26 Thread Collin Funk
Hi Bruno,

On 2/26/24 12:38 PM, Bruno Haible wrote:
> For pycodestyle, a file named '.pycodestyle' would be OK with me as well,
> if that works. If not, then how about
>   - submitting a feature request to the pycodestyle developers, so that
> '.pycodestyle' is accepted in addition to of 'pycodestyle' or 'setup.cfg'?
>   - or, alternatively, can we move the gnulib-tool.py into the pygnulib/
> directory, leaving only a small redirector gnulib-tool.py at the top
> level? Then we could have the 'pycodestyle' or 'setup.cfg' in the
> pygnulib/ directory.

I think that option 2 would be the best idea. It seems like all the
different Python tools could never agree on a way to configure things.
>From the pycodestyle documentation, per project configuration must be
placed in setup.cfg or tox.ini [1]. It seems that setup.cfg is a file
from setuptools (Python packaging tool) that has mostly superseded by
pyproject.toml [2]. Most tools seem to support this pyproject.toml
format, but it seems that the pycodestyle people are not fans [3]. The
tox.ini file is used to configure tox which is a test automation tool
that allows you to run packages with different virtual environments
[4]. Seems useful, but a strange place to store configurations.

It is my first time learning all this stuff so sorry if it is hard to
follow. It is for me as well...

I think that it would be best to just shove all the Python mess in
that subdirectory since it is a work-in-progress and is safe to be
ignored by gnulib users. As long as some gnulib-tool.py file is left
in the root directory, it should be easy enough to modify the
bootstrap script for testing.

[1] https://pycodestyle.pycqa.org/en/latest/intro.html#configuration
[2] https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
[3] https://github.com/PyCQA/pycodestyle/issues/813#issuecomment-953736162
[4] https://tox.wiki/en/4.13.0/

Collin



Re: gnulib-tool.py: Follow gnulib-tool changes, part 28.

2024-02-26 Thread Bruno Haible
Hi Collin,

> From what I could understand, it seems that upon seeing that the mode
> is "import", the specified_modules is set with the unsorted list of
> modules [1]. Then that variable is sorted before it is used [2]. Could
> you confirm that I am understanding this correctly?

Yes, I confirm. So, in the Python code, the corresponding place to do
the canonicalization (sorting and removing duplicates) would be in
GLImport.py line 258.

> If so, I feel that this patch should be the correct way to handle it.
> My initial ideas where to sort the modules in main() before the
> GLConfig object was created, or to modify the GLConfig class's methods
> which are used to set the modules in use.

Well, GLConfig applies to all modes (not just 'import', but also
'create-testdir' etc.). Since on the bash side, you found that the
sorting is specifically in the func_import(), the right place to do it
is in GLImport.py, not GLConfig.py.

The second part of your patch (removal of sorting from GLImport.actioncmd)
is good, but I guess you will want to have it combined with another patch
to GLImport.py.

Bruno






Re: pycodestyle configuration

2024-02-26 Thread Bruno Haible
Collin Funk wrote:
> Adding the two attached files to the root directory of gnulib should
> adjust the warnings to the agreed upon coding style (as far as
> pycodestyle and pylint are concerned). It works with Emacs and Eglot
> using pylsp. I assume other editors should work similarly.
> 
> Not sure if they are worth adding to the repository but I figured I
> would share them just incase anyone else wants to work on stuff in the
> future.
> 
> More information about these files here:
> https://pycodestyle.pycqa.org/en/latest/intro.html#configuration
> https://pylint.readthedocs.io/en/stable/user_guide/configuration/index.html#

It's a good idea to have these in the git repository, rather than to
have each developer install them privately.

For pylint, according to
https://pylint.pycqa.org/en/stable/user_guide/usage/run.html#command-line-options
it's OK to have the file .pylintrc in the gnulib root directory.
If this is true (i.e. if it works with the file name '.pylintrc' instead of
'pylintrc' in your environment), feel free to submit this file as a patch.
It's OK with me because the file name is descriptive enough and because it's
hidden.

For pycodestyle, a file named '.pycodestyle' would be OK with me as well,
if that works. If not, then how about
  - submitting a feature request to the pycodestyle developers, so that
'.pycodestyle' is accepted in addition to of 'pycodestyle' or 'setup.cfg'?
  - or, alternatively, can we move the gnulib-tool.py into the pygnulib/
directory, leaving only a small redirector gnulib-tool.py at the top
level? Then we could have the 'pycodestyle' or 'setup.cfg' in the
pygnulib/ directory.

Bruno






Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 18:06, Andreas Schwab wrote:

On Feb 26 2024, Pádraig Brady wrote:


https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938


It's still bad as it adds a hidden dependency that is invisible to rpm.


Right. In practice though since coreutils also links libcrypto
for cksum and the separate digest utilities, this should be OK.
In edge cases with a separate package per util, the dependency
can be added manually.


Also, the regexp in the sed command contains unquoted uses of '.' that
are supposed to be matched literally.


Fixed.

thanks,
Pádraig




Re: sort dynamic linking overhead

2024-02-26 Thread Paul Eggert

On 2024-02-26 06:12, Pádraig Brady wrote:

On 26/02/2024 06:44, Yann Collet wrote:
  * xxhash128 is not a cryptographic hash function, so it doesn't 
attempt tobe random.


Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.


It’s even tested with PractRand, and it could be used as a good random 
number generator.


Being non-cryptographic means that what it doesn’t try is to make sure 
no one can intentionally forge a hash collision from 2 different files 
(other than brute-forcing, which is impractical).


But that’s different, and I wouldn’t call this property “randomness”, 
even though randomness is a pre-requisite (but not sufficient in 
itself) to collision resistance.


Fair enough, I should have called it a different name since it's not 
really random. However, 'sort -R' does have problems when hashes have 
collisions, as it falls back on ordinary comparisons and thus ceases to 
be a "random" sort, so collision resistance is a good property to have 
if 'sort -R' is given adversarial input.




md5 shouldn't be considered as cryptographic anyway since it's broken.


Although MD5 is broken as a hash for a string, it's not clear to me that 
it's broken in the way that GNU 'sort -R' uses MD5. GNU 'sort -R' uses a 
random salt, does not report hashes to the user, and does not output 
anything until it's read all its input. It seems to me that breaking gnu 
'sort -R' would be harder than breaking HMAC-MD5, which itself is 
significantly harder than breaking MD5.


That being said, if MD5 is broken for GNU 'sort' then we should use a 
better hash.




Re: sort dynamic linking overhead

2024-02-26 Thread Andreas Schwab
On Feb 26 2024, Pádraig Brady wrote:

> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

It's still bad as it adds a hidden dependency that is invisible to rpm.

Also, the regexp in the sed command contains unquoted uses of '.' that
are supposed to be matched literally.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 17:39, Bruno Haible wrote:

Pádraig Brady wrote:

+  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);


That only works if libopenssl-devel is installed.


Good spot.
I'd already pushed a fix for this at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938


Does this work for all the various names of libcrypto in various distros?

Debian 12   libcrypto.so.3
Ubuntu 22.04libcrypto.so.1.1 libcrypto.so.3
Slackware 15libcrypto.so.1.1
openSUSE 15.5   libcrypto.so.1.1
CentOS Stream 9 libcrypto.so.3
Guix 1.4libcrypto.so.1.1
Alpine 3.19 libcrypto.so.3
FreeBSD 14.0libcrypto.so.38
NetBSD 9.3  libcrypto.so.14
OpenBSD 7.4 libcrypto.so.52.0



I only tested with libcrypto.so.3, but it should match all of the above.
It matches libcrypto.so.[.0-9]*

cheers,
Pádraig



Re: sort dynamic linking overhead

2024-02-26 Thread Bruno Haible
Pádraig Brady wrote:
> >> +  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);
> > 
> > That only works if libopenssl-devel is installed.
> 
> Good spot.
> I'd already pushed a fix for this at:
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

Does this work for all the various names of libcrypto in various distros?

Debian 12   libcrypto.so.3
Ubuntu 22.04libcrypto.so.1.1 libcrypto.so.3
Slackware 15libcrypto.so.1.1
openSUSE 15.5   libcrypto.so.1.1
CentOS Stream 9 libcrypto.so.3
Guix 1.4libcrypto.so.1.1
Alpine 3.19 libcrypto.so.3
FreeBSD 14.0libcrypto.so.38
NetBSD 9.3  libcrypto.so.14
OpenBSD 7.4 libcrypto.so.52.0

Bruno






Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 17:11, Andreas Schwab wrote:

On Feb 25 2024, Paul Eggert wrote:


+/* Dynamically link the crypto library, if it needs linking.  */
+static void
+link_libcrypto (void)
+{
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);


That only works if libopenssl-devel is installed.


Good spot.
I'd already pushed a fix for this at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=3bbdb3938

thanks,
Pádraig



Re: sort dynamic linking overhead

2024-02-26 Thread Andreas Schwab
On Feb 25 2024, Paul Eggert wrote:

> +/* Dynamically link the crypto library, if it needs linking.  */
> +static void
> +link_libcrypto (void)
> +{
> +#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
> +  void *handle = dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL);

That only works if libopenssl-devel is installed.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: sort dynamic linking overhead

2024-02-26 Thread Yann Collet
  *   xxhash128 is not a cryptographic hash function, so it doesn't attempt to 
be random.

Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.
It’s even tested with PractRand, and it could be used as a good random number 
generator.

Being non-cryptographic means that what it doesn’t try is to make sure no one 
can intentionally forge a hash collision from 2 different files (other than 
brute-forcing, which is impractical).
But that’s different, and I wouldn’t call this property “randomness”, even 
though randomness is a pre-requisite (but not sufficient in itself) to 
collision resistance.


From: Paul Eggert 
Date: Sunday, February 25, 2024 at 10:25 PM
To: Pádraig Brady , Bruno Haible , 
bug-gnulib@gnu.org , Coreutils 
Cc: Yann Collet 
Subject: Re: sort dynamic linking overhead
On 2023-10-09 06:48, Pádraig Brady wrote:

> An incremental patch attached to use xxhash128 (0.8.2)
> shows a good improvement (note avx2 being used on this cpu):

xxhash128 is not a cryptographic hash function, so it doesn't attempt to
be random. Of course most people won't care - it's random "enough" - but
it would be a functionality change.

blake2 is cryptographic and would be random, but would bloat the 'sort'
executable with code that's hardly ever used.

To attack the problem in a more conservative way, I installed the
attached patch into coreutils. With it, 'sort -R' continues to use MD5
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is
used (Bruno's suggestion). This doesn't significantly affect 'sort -R'
performance, and reduces the startup overhead of plain 'sort' to be what
it was before we started passing -lcrypto to gcc by default (in
coreutils 8.32).

I also toyed with changing MD5 to SHA512, but that hurt performance. For
what it's worth, although I tested with an Intel Xeon W-1350, which
supports SHA-NI as well as various AVX-512 options, I didn't see where
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes
advantage of these special-purpose instructions.


Re: sort dynamic linking overhead

2024-02-26 Thread Pádraig Brady

On 26/02/2024 06:44, Yann Collet wrote:

  * xxhash128 is not a cryptographic hash function, so it doesn't attempt tobe 
random.

Just a correction : xxh128 does try to be random. And quite hardly: a 
significant amount of development is spent on ensuring this property.

It’s even tested with PractRand, and it could be used as a good random number 
generator.

Being non-cryptographic means that what it doesn’t try is to make sure no one 
can intentionally forge a hash collision from 2 different files (other than 
brute-forcing, which is impractical).

But that’s different, and I wouldn’t call this property “randomness”, even 
though randomness is a pre-requisite (but not sufficient in itself) to 
collision resistance.


Right. I was looking at both md5 and xxhash128 having a 10 quality score in the 
SMHasher metric.
I even saw a comment from you Yann that xxhas128 may have slightly better 
dispersion than md5.
Also md5 shouldn't be considered as cryptographic anyway since it's broken.
I.e. I don't think users would need to be informed of this change if made.

Re Paul's committed patch, it's a good improvement, and does not add a new 
(xxhash) dependency.
Paul, should the configure check, be testing for the MD5 routines rather than 
SHA512?
Also an entry in the Improvements section of NEWS would be appropriate.

thanks!
Pádraig




Re: sort dynamic linking overhead

2024-02-26 Thread Paul Eggert

On 2023-10-09 06:48, Pádraig Brady wrote:


An incremental patch attached to use xxhash128 (0.8.2)
shows a good improvement (note avx2 being used on this cpu):


xxhash128 is not a cryptographic hash function, so it doesn't attempt to 
be random. Of course most people won't care - it's random "enough" - but 
it would be a functionality change.


blake2 is cryptographic and would be random, but would bloat the 'sort' 
executable with code that's hardly ever used.


To attack the problem in a more conservative way, I installed the 
attached patch into coreutils. With it, 'sort -R' continues to use MD5 
but on GNUish platforms 'sort' links libcrypto dynamically only if -R is 
used (Bruno's suggestion). This doesn't significantly affect 'sort -R' 
performance, and reduces the startup overhead of plain 'sort' to be what 
it was before we started passing -lcrypto to gcc by default (in 
coreutils 8.32).


I also toyed with changing MD5 to SHA512, but that hurt performance. For 
what it's worth, although I tested with an Intel Xeon W-1350, which 
supports SHA-NI as well as various AVX-512 options, I didn't see where 
libcrypto (at least on Ubuntu 23.10, which has OpenSSL 3.0.10) takes 
advantage of these special-purpose instructions.From 7f57ac2d20c144242953a8dc7d95b02df0244751 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 25 Feb 2024 17:13:12 -0800
Subject: [PATCH] sort: dynamically link -lcrypto if -R

This saves time in the usual case, which does not need -lcrypto.
* configure.ac (DLOPEN_LIBCRYPTO): New macro.
* src/sort.c [DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5]: New macros
MD5_Init, MD5_Update, MD5_Final.  Include "md5.h" after defining
them.  Include , and define new functions link_failure
and symbol_address.
(link_libcrypto): New function.
(random_md5_state_init): Call it before using crypto functions.
---
 configure.ac | 29 +
 src/sort.c   | 52 +++-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c7eca1b8d..043081b90 100644
--- a/configure.ac
+++ b/configure.ac
@@ -351,6 +351,35 @@ if test $utils_cv_localtime_cache = yes; then
   AC_DEFINE([LOCALTIME_CACHE], [1], [FIXME])
 fi
 
+# Should 'sort' link libcrypto dynamically?
+AS_CASE([$LIB_CRYPTO],
+  [-lcrypto],
+[# Check for dlopen and libcrypto dynamic linking in one program,
+ # as there's little point to checking them separately.
+ AC_CACHE_CHECK([for dlopen and whether libcrypto is linked dynamically],
+   [utils_cv_dlopen_libcrypto],
+   [utils_cv_dlopen_libcrypto=no
+saved_LIBS=$LIBS
+LIBS="$LIBS $LIB_CRYPTO"
+AC_LINK_IFELSE(
+  [AC_LANG_PROGRAM(
+ [[#include 
+   #include 
+ ]],
+ [[return !(dlopen ("libcrypto.so", RTLD_LAZY | RTLD_GLOBAL)
+&& SHA512 (0, 0, 0));]])],
+  [# readelf works with cross-builds; ldd works on more platforms.
+   AS_CASE([`(readelf -d conftest$EXEEXT || ldd conftest$EXEEXT
+ ) 2>/dev/null`],
+ [*libcrypto*],
+   [utils_cv_dlopen_libcrypto=yes])])
+LIBS=$saved_LIBS])
+ AS_CASE([$utils_cv_dlopen_libcrypto],
+   [yes],
+ [AC_DEFINE([DLOPEN_LIBCRYPTO], [1],
+[Define to 1 if dlopen exists and libcrypto is
+ linked dynamically.])])])
+
 # macOS >= 10.12
 AC_CHECK_FUNCS([fclonefileat])
 
diff --git a/src/sort.c b/src/sort.c
index dea7be45d..cefe381bf 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -39,7 +39,6 @@
 #include "hash.h"
 #include "heap.h"
 #include "ignore-value.h"
-#include "md5.h"
 #include "mbswidth.h"
 #include "nproc.h"
 #include "physmem.h"
@@ -2085,6 +2084,56 @@ getmonth (char const *month, char **ea)
   return 0;
 }
 
+/* When using the OpenSSL implementation, dynamically link only if -R.
+   This saves startup time in the usual (sans -R) case.  */
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+/* In the typical case where md5.h does not #undef HAVE_OPENSSL_MD5,
+   trick md5.h into declaring and using pointers to functions not functions.
+   This causes the compiler's -lcrypto option to have no effect,
+   as sort.o no longer uses any crypto symbols statically.  */
+# define MD5_Init (*ptr_MD5_Init)
+# define MD5_Update (*ptr_MD5_Update)
+# define MD5_Final (*ptr_MD5_Final)
+#endif
+
+#include "md5.h"
+
+#if DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5
+# include 
+
+/* Diagnose a dynamic linking failure.  */
+static void
+link_failure (void)
+{
+  error (SORT_FAILURE, 0, "%s", dlerror ());
+}
+
+/* Return a function pointer in HANDLE for SYMBOL.  */
+static void *
+symbol_address (void *handle, char const *symbol)
+{
+  void *address = dlsym (handle, symbol);
+  if (!address)
+link_failure ();
+  return address;
+}
+#endif
+
+/* Dynamically link the crypto library, if it needs linking.  */
+static void
+link_libcrypto (void)
+{
+#if DLOPEN_LIBCRYP