Re: Unused parameter warnings

2020-10-11 Thread Jeffrey Walton
On Sun, Oct 11, 2020 at 7:11 PM Paul Eggert  wrote:
>
> > -static size_t
> > +static size_t _GL_ATTRIBUTE_PURE
> >  gl_tree_size (gl_list_t list)
>
> Although changes like these don't break anything, I generally don't bother 
> doing
> them if only older GCCs warn about a missing attribute, under the theory that
> people stuck with older compilers can build with --disable-gcc-warnings and it
> lessens the maintenance burden if we don't bloat the code with obvious detail
> that a decent compiler should deduce automatically.

For this particular warning, I find it is easier to forgo the compiler
specific attributes and then use an UNUSED macro. The UNUSED macro
works everywhere with all compilers. It also does not pollute the
function signature.

#ifndef GNULIB_UNUSED
# define GNULIB_UNUSED(x) ((void)(x))
#endif

And then:

const char*
gl_locale_name_thread_unsafe (int category, const char *categoryname)
{
GNULIB_UNUSED(category);
...
}

Jeff



*printf: Avoid "expanded before it was required" warning

2020-10-11 Thread Bruno Haible
In GNU libtextstyle I'm seeing this warning:

configure.ac:71: warning: AC_REQUIRE: `gl_SNPRINTF_TRUNCATION_C99' was expanded 
before it was required
configure.ac:71: 
http://www.gnu.org/software/autoconf/manual/autoconf.html#Expanded-Before-Required
gnulib-m4/vasnprintf.m4:15: gl_REPLACE_VASNPRINTF is expanded from...
gnulib-m4/snprintf-posix.m4:7: gl_FUNC_SNPRINTF_POSIX is expanded from...
gnulib-m4/gnulib-comp.m4:222: lts_INIT is expanded from...
configure.ac:71: the top level

This patch fixes it.


2020-10-11  Bruno Haible  

*printf: Avoid "expanded before it was required" warning.
* m4/printf.m4 (gl_SNPRINTF_TRUNCATION_C99): Define through
AC_DEFUN_ONCE.

diff --git a/m4/printf.m4 b/m4/printf.m4
index c6a7ef6..eeb792b 100644
--- a/m4/printf.m4
+++ b/m4/printf.m4
@@ -1,4 +1,4 @@
-# printf.m4 serial 69
+# printf.m4 serial 70
 dnl Copyright (C) 2003, 2007-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -1198,7 +1198,7 @@ dnl Test whether the string produced by the snprintf 
function is always NUL
 dnl terminated. (ISO C99, POSIX:2001)
 dnl Result is gl_cv_func_snprintf_truncation_c99.
 
-AC_DEFUN([gl_SNPRINTF_TRUNCATION_C99],
+AC_DEFUN_ONCE([gl_SNPRINTF_TRUNCATION_C99],
 [
   AC_REQUIRE([AC_PROG_CC])
   AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles




Re: Unused parameter warnings

2020-10-11 Thread Paul Eggert

-static size_t
+static size_t _GL_ATTRIBUTE_PURE
 gl_tree_size (gl_list_t list)


Although changes like these don't break anything, I generally don't bother doing 
them if only older GCCs warn about a missing attribute, under the theory that 
people stuck with older compilers can build with --disable-gcc-warnings and it 
lessens the maintenance burden if we don't bloat the code with obvious detail 
that a decent compiler should deduce automatically.




Re: stack bounds

2020-10-11 Thread Paul Eggert

On 10/11/20 3:08 PM, Bruno Haible wrote:

But do you have an overview which targets are meant in the doc?


Unfortunately not. I expect it'd have to be determined from the comments in GCC, 
and it might also need info from various OSes and/or linkers.




Re: stack bounds

2020-10-11 Thread Bruno Haible
Paul Eggert wrote:
> That being said, it does look like a reliability win if we start using 
> -fstack-clash-protection on platforms like Fedora x86-64 that support it and 
> do 
> not enable it by default. Perhaps we should have a Gnulib or Autoconf macro 
> that 
> does that.

Yes, such a macro would be useful. To be used at the discretion of the package
maintainer (e.g. through a Gnulib module, similar to the 'largefile' module).

But do you have an overview which targets are meant in the doc?

  "old-style stack checking is also the fallback method for ‘specific’ if no
   target support has been added in the compiler."

  "Most targets do not fully support stack clash protection."

Bruno




hash, xhash: modernize

2020-10-11 Thread Bruno Haible
Hi,

It has been reported today that looking at the 'hash' module made Marc guess
incorrectly what is desired coding style and terminology in Gnulib.

1) regarding where to documented exported functions of a module
   
2) regarding C++ interoperability,
3) regarding terminology ("delete" vs. "remove")
   

Here are proposed patches to modernize the 'hash' and 'xhash' modules in
this respect.


Objections?

Bruno


2020-10-11  Bruno Haible  

hash: Rename hash_delete to hash_remove.
* lib/hash.h (hash_remove): Renamed from hash_delete.
(hash_delete): New declaration.
* lib/hash.c (hash_remove): Renamed from hash_delete.
(hash_delete): New function.
* tests/test-hash.c (main): Update.
* lib/fts-cycle.c (leave_dir): Likewise.
* NEWS: Mention the change.

2020-10-11  Bruno Haible  

hash, xhash: Make usable from C++.
* lib/hash.h: Add extern "C".

2020-10-11  Bruno Haible  

hash, xhash: Move comments to the .h file.
* lib/hash.c: Move comments meant for the user from here...
* lib/xhash.c: ... and here...
* lib/hash.h: ... to here.


>From 7159b249f613ec38fea1d29103b03de0e18834ad Mon Sep 17 00:00:00 2001
From: Bruno Haible 
Date: Sun, 11 Oct 2020 23:24:22 +0200
Subject: [PATCH 1/3] hash, xhash: Move comments to the .h file.

* lib/hash.c: Move comments meant for the user from here...
* lib/xhash.c: ... and here...
* lib/hash.h: ... to here.
---
 ChangeLog   |   7 ++
 lib/hash.c  | 125 ---
 lib/hash.h  | 239 ++--
 lib/xhash.c |   6 --
 4 files changed, 206 insertions(+), 171 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6a0e15d..83518c7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2020-10-11  Bruno Haible  
+
+	hash, xhash: Move comments to the .h file.
+	* lib/hash.c: Move comments meant for the user from here...
+	* lib/xhash.c: ... and here...
+	* lib/hash.h: ... to here.
+
 2020-10-11  Benji Wiebe  
 
 	getprogname: Add support for OpenServer 6 and UnixWare 7.
diff --git a/lib/hash.c b/lib/hash.c
index 7aaf106..6b7b76a 100644
--- a/lib/hash.c
+++ b/lib/hash.c
@@ -138,38 +138,24 @@ static const Hash_tuning default_tuning =
 
 /* Information and lookup.  */
 
-/* The following few functions provide information about the overall hash
-   table organization: the number of entries, number of buckets and maximum
-   length of buckets.  */
-
-/* Return the number of buckets in the hash table.  The table size, the total
-   number of buckets (used plus unused), or the maximum number of slots, are
-   the same quantity.  */
-
 size_t
 hash_get_n_buckets (const Hash_table *table)
 {
   return table->n_buckets;
 }
 
-/* Return the number of slots in use (non-empty buckets).  */
-
 size_t
 hash_get_n_buckets_used (const Hash_table *table)
 {
   return table->n_buckets_used;
 }
 
-/* Return the number of active entries.  */
-
 size_t
 hash_get_n_entries (const Hash_table *table)
 {
   return table->n_entries;
 }
 
-/* Return the length of the longest chain (bucket).  */
-
 size_t
 hash_get_max_bucket_length (const Hash_table *table)
 {
@@ -194,9 +180,6 @@ hash_get_max_bucket_length (const Hash_table *table)
   return max_bucket_length;
 }
 
-/* Do a mild validation of a hash table, by traversing it and checking two
-   statistics.  */
-
 bool
 hash_table_ok (const Hash_table *table)
 {
@@ -254,9 +237,6 @@ safe_hasher (const Hash_table *table, const void *key)
   return table->bucket + n;
 }
 
-/* If ENTRY matches an entry already in the hash table, return the
-   entry from the table.  Otherwise, return NULL.  */
-
 void *
 hash_lookup (const Hash_table *table, const void *entry)
 {
@@ -275,15 +255,6 @@ hash_lookup (const Hash_table *table, const void *entry)
 
 /* Walking.  */
 
-/* The functions in this page traverse the hash table and process the
-   contained entries.  For the traversal to work properly, the hash table
-   should not be resized nor modified while any particular entry is being
-   processed.  In particular, entries should not be added, and an entry
-   may be removed only if there is no shrink threshold and the entry being
-   removed has already been passed to hash_get_next.  */
-
-/* Return the first data in the table, or NULL if the table is empty.  */
-
 void *
 hash_get_first (const Hash_table *table)
 {
@@ -299,10 +270,6 @@ hash_get_first (const Hash_table *table)
   return bucket->data;
 }
 
-/* Return the user data for the entry following ENTRY, where ENTRY has been
-   returned by a previous call to either 'hash_get_first' or 'hash_get_next'.
-   Return NULL if there are no more entries.  */
-
 void *
 hash_get_next (const Hash_table *table, const void *entry)
 {
@@ -328,10 +295,6 @@ hash_get_next (const Hash_table *table, const void 

Re: _Atomic

2020-10-11 Thread Bruno Haible
Paul Eggert wrote:
> Oh, I was thinking the other way: a compiler that says it's C11 but doesn't 
> support _Atomic.

I haven't found such a compiler in my testing [1]. If one appears, we can add
an ad-hoc test or an Autoconf test.

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2020-10/msg00068.html




Re: _Atomic

2020-10-11 Thread Paul Eggert

On 10/10/20 3:39 PM, Bruno Haible wrote:

GCC 4.9.x is such a compiler that is non-C11 but supports _Atomic.


Oh, I was thinking the other way: a compiler that says it's C11 but doesn't 
support _Atomic. I wrote my thoughts backwards, though.


If Gnulib code won't use _Atomic when compiled with GCC 4.9 (even though _Atomic 
would work), that doesn't sound like a big deal. For example, Debian 8 Jessie 
uses GCC 4.9 but long term support for Jessie ended in June and if Debian 
doesn't support it we don't need to worry about it.




Re: Port getprogname module to SCO OpenServer

2020-10-11 Thread Bruno Haible
Hi Benji,

> > It is better, but there are still (minor) things to tweak.
> Minor things tweaked.
> 
> I'm not sure why I cast getpid to int.
> 
> Including string.h did fix the strrchr warning.
> 
> Now using __SCO_SERVER__ || __sysv5__ for OpenServer6/UnixWare7 and SCO 
> cc/GCC detection.

Thanks.

> Also as far as the indentation goes, it is correct as far as I can tell. 

No, these lines were not indented right:
+   {
+ progname = buf;
+   }

I fixed the indentation and committed the patch in your name.

Bruno




Re: Draft #3 (with iterators)

2020-10-11 Thread Bruno Haible
Hi Marc,

> I have implemented everything we have discussed today. The patch
> versus the current master is attached so it can be reviewed.
> 
> The changes versus the previous draft can be summarized as follows:
> 
> * Bug fixes.
> * Use _Atomic on GCC 4.9+.
> * Implement a lightweight iterator type akin to the iterators of gl_list.
> * Expose element initialization to the user so than an element can be
> inserted in more than one hamt.
> * Rename delete to remove.
> * Improve documentation.

OK for me. Thanks for having listened to the many remarks.

Awesome work!!

Bruno




Re: another stdio patch for UnixWare

2020-10-11 Thread Tim Rice


Hi Bruno,

On Sun, 11 Oct 2020, Bruno Haible wrote:

> Tim Rice wrote:
> > ./gnulib-tool --create-testdir --dir=/var/tmp --with-tests \
> > --single-configure --avoid=havelib-tests fseterr freadable fwritable \
> > fbufmode freading fwriting freadptr freadseek freadahead fpurge fseeko
> > ftello fpending fflush 
> 
> Yes, this is the way to test these modules.
> 
> > and the attached patch, I am down to 2 failures in the stdio tests.
> 
> Hmm, there are three things I find suspicious:
> 
> 1) __freadable is defined in a way that is inconsistent with Gnulib:
>#define __freadable(p) (__freading(p) && __fpending(p))
>[in Gnulib, a stream that is not reading nor writing is considered
>to be readable]
>but you don't ensure that freadable doesn't get defined other than
>__freadable.
> 
> 2) Likewise for __fwritable, which is defined in a way inconsistent with 
> Gnulib:
>#define __fwritable(p) (__fwriting(p) && __fpending(p) < __fbufsiz(p))

Indeed. And after the post with the defines I found tests for
for __freadable and __fwritable were failing.
It looks like we just need to check for _IOREAD, _IORW, _IOWRT flags.
Sorry I didn't update the post with the defines. This is what I'm working
with now which does pass __freadable and __fwritable tests.

#define __freadable(p) ((void)sizeof(__filbuf(p)), ((FILE *)(p))->__flag & 
(_IOREAD | _IORW))
#define __fwritable(p) ((void)sizeof(__filbuf(p)), ((FILE *)(p))->__flag & 
(_IOWRT | _IORW))

> 3) In freadahead.c you add:
>return __fpending(fp_);
>But __fpending looks at the pending output, whereas freadahead shall look
>at the available input. Should be different things, no?

I had that change for so long I had to research why it was done that way.
Looking back at a 2009 an e-mail from one of the SCO engineers, I see
"... using the __fpending(), which in our libc.so.1 will work for
both input or output files."

> 
> Bruno
> 

-- 
Tim RiceMultitalents
t...@multitalents.net





Re: nap() name space clash

2020-10-11 Thread Bruno Haible
Hi Tim,

> I've prepared 2 patch options for your consideration.
> 
> The attached nap2gnulib_nap.patch renames nap to gnulib_nap and leaves
> the tests/nap.h name alone.
> 
> The attached nap2gnulib_nap_with_nap_h_rename.patch assumes a
> "git mv tests/nap.h tests/gnulib_nap.h"

It smells too much of a workaround. I would prefer
  (a) either a rename like in your second patch, with a symbol name
  that looks clean (e.g. 'short_sleep' instead of 'gnulib_nap'),
  (b) or a workaround that is limited in scope: few lines of code, and
  no effect on other platforms.

> > } > > In which .h file is this function declared on UnixWare?
> > } > 
> > } > unistd.h

Since the function defined by tests/nap.h is of storage class 'static',
there is no conflict at the linker level, and the conflict at the source
code level can be resolved by a simple '#define'.

So I'm picking the approach (b).


2020-10-11  Bruno Haible  

tests: Avoid a name clash on UnixWare.
Reported by Tim Rice  in
.
* tests/nap.h (nap): Define as gl_nap on OpenServer and UnixWare.

diff --git a/tests/nap.h b/tests/nap.h
index 5dd264f..c27e538 100644
--- a/tests/nap.h
+++ b/tests/nap.h
@@ -24,6 +24,13 @@
 
 # include 
 
+/* Avoid a conflict with a function called nap() on UnixWare.  */
+# if defined _SCO_DS || (defined __SCO_VERSION__ || defined __sysv5__)  /* 
OpenServer, UnixWare */
+#  include 
+#  undef nap
+#  define nap gl_nap
+# endif
+
 /* Name of the witness file.  */
 #define TEMPFILE BASE "nap.tmp"
 




Re: stack bounds

2020-10-11 Thread Paul Eggert

On 10/10/20 2:49 PM, Bruno Haible wrote:

gcc -fstack-clash-protection -m32 -O2 stackish.c fixes this issue.


Yes. However, the GCC manual says this about -fstack-clash-protection:

 Most targets do not fully support stack clash protection.  However,
 on those targets '-fstack-clash-protection' will protect dynamic
 stack allocations.  '-fstack-clash-protection' may also provide
 limited protection for static stack allocations if the target
 supports '-fstack-check=specific'.

which is not as close to "it should just work" as I'd like, especially when I go 
read the section on -fstack-check. I suppose I need to look at the output of gcc 
-S -fstack-clash-protection on my platform (and understand what the OS does) to 
know whether stack overflow is detected reliably.


That being said, it does look like a reliability win if we start using 
-fstack-clash-protection on platforms like Fedora x86-64 that support it and do 
not enable it by default. Perhaps we should have a Gnulib or Autoconf macro that 
does that.




Re: Draft #3 (with iterators)

2020-10-11 Thread Marc Nieper-Wißkirchen
I have implemented everything we have discussed today. The patch
versus the current master is attached so it can be reviewed.

The changes versus the previous draft can be summarized as follows:

* Bug fixes.
* Use _Atomic on GCC 4.9+.
* Implement a lightweight iterator type akin to the iterators of gl_list.
* Expose element initialization to the user so than an element can be
inserted in more than one hamt.
* Rename delete to remove.
* Improve documentation.

Future options for when the code has matured:

* Inline a number of subtrie procedures to get rid of forward
declarations to help compilers.
* Implement purely non-pure hamts to replace ordinary hash tables.
* Add _nx versions of the procedures that won't call xalloc_die.

Many thanks to Bruno for his support, guidance and his great suggestions.

Marc

Am So., 11. Okt. 2020 um 19:32 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
> :
> >
> > Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :
>
> [...]
>
> > > * hamt_lookup: If the caller is allowed to modify the payload stored in 
> > > the
> > >   returned entry, this function should return a 'Hamt_entry *', not a
> > >   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
> > >   not a 'const char *'.
> >
> > Unless the caller knows what it does, modifying the payload is not a
> > good idea because the entry is shared between different hamts. If the
> > caller really wants to modify the payload, it has to do an explicit
> > type cast (which is safe).
>
> I have noticed a problem with the current design: While an element can
> be in more than one hamt (because copies of hamts are created through
> various operations), an element cannot be actively inserted in more
> than one hamt. The reason is that the reference counter of the element
> is initialized whenever the element is inserted.
>
> The way out is to expose the initialization function to the user, who
> becomes responsible for initializing each element exactly once.
>
> As soon as it is possible to insert an element more than once, another
> observation will be made: The insert procedure does not accept a
> pointer to a const element because it must be able to change the
> reference counter internally. Thus it is more convenient if procedures
> like hamt_lookup do not return const versions.
From ece7b9e3cd090e9c084efd72677669130e80dd9c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc=20Nieper-Wi=C3=9Fkirchen?= 
Date: Sat, 10 Oct 2020 23:38:21 +0200
Subject: [PATCH] hamt: New module.

This module provides (persistent) hash array mapped tries.
* MODULES.html.sh: Add hamt.
* lib/hamt.c: New file.
* lib/hamt.h: New file.
* modules/hamt: New file.
* modules/hamt-tests: New file.
* tests/test-hamt.c: New file.
---
 ChangeLog  |   11 +
 MODULES.html.sh|1 +
 lib/hamt.c | 1083 
 lib/hamt.h |  253 +++
 modules/hamt   |   29 ++
 modules/hamt-tests |   11 +
 tests/test-hamt.c  |  378 
 7 files changed, 1766 insertions(+)
 create mode 100644 lib/hamt.c
 create mode 100644 lib/hamt.h
 create mode 100644 modules/hamt
 create mode 100644 modules/hamt-tests
 create mode 100644 tests/test-hamt.c

diff --git a/ChangeLog b/ChangeLog
index 22f79fb09..7263e628f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2020-10-11  Marc Nieper-Wißkirchen  
+
+	hamt: New module.
+	This module provides (persistent) hash array mapped tries.
+	* MODULES.html.sh: Add hamt.
+	* lib/hamt.c: New file.
+	* lib/hamt.h: New file.
+	* modules/hamt: New file.
+	* modules/hamt-tests: New file.
+	* tests/test-hamt.c: New file.
+
 2020-10-11  Bruno Haible  
 
 	stdioext: Update comments regarding UnixWare.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index 7e7cdae3e..2907eb741 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -2028,6 +2028,7 @@ func_all_modules ()
   func_module hash-pjw
   func_module hash-pjw-bare
   func_module hash
+  func_module hamt
   func_module readline
   func_module readtokens
   func_module readtokens0
diff --git a/lib/hamt.c b/lib/hamt.c
new file mode 100644
index 0..f92d9c4e8
--- /dev/null
+++ b/lib/hamt.c
@@ -0,0 +1,1083 @@
+/* (Persistent) hash array mapped tries.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see 

Re: patch: stdio fiex for UnixWare

2020-10-11 Thread Tim Rice


Hi Bruno,

On Sun, 11 Oct 2020, Bruno Haible wrote:

> I'm applying the patch below, to fix this. The main differences to yours are:
>   * No need to touch the modules freadahead, freadptr, fseterr, since the
> functions __freadahead, __freadptr, __fseterr don't exist on your 
> platform,
True as things stand today. I figured it would be more future proof
so add those now in case __freadahead, __freadptr, __fseterr are added
to UnixWare later.

> .
>   * When we modify a .m4 file, we also need to bump its serial number.

Sorry, I've seen enough patches come by on this list, I should have
remembered that. 

>   * It does not make the code clearer to remove the HAVE___FLBF and 
> HAVE___FPURGE
> tests in fbufmode.c and fpurge.c.

Fine with me.

Thank you for the commit.

-- 
Tim RiceMultitalents(707) 456-1146
t...@multitalents.net





Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
:
>
> Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :

[...]

> > * hamt_lookup: If the caller is allowed to modify the payload stored in the
> >   returned entry, this function should return a 'Hamt_entry *', not a
> >   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
> >   not a 'const char *'.
>
> Unless the caller knows what it does, modifying the payload is not a
> good idea because the entry is shared between different hamts. If the
> caller really wants to modify the payload, it has to do an explicit
> type cast (which is safe).

I have noticed a problem with the current design: While an element can
be in more than one hamt (because copies of hamts are created through
various operations), an element cannot be actively inserted in more
than one hamt. The reason is that the reference counter of the element
is initialized whenever the element is inserted.

The way out is to expose the initialization function to the user, who
becomes responsible for initializing each element exactly once.

As soon as it is possible to insert an element more than once, another
observation will be made: The insert procedure does not accept a
pointer to a const element because it must be able to change the
reference counter internally. Thus it is more convenient if procedures
like hamt_lookup do not return const versions.



Re: another stdio patch for UnixWare

2020-10-11 Thread Bruno Haible
Tim Rice wrote:
> ./gnulib-tool --create-testdir --dir=/var/tmp --with-tests \
> --single-configure --avoid=havelib-tests fseterr freadable fwritable \
> fbufmode freading fwriting freadptr freadseek freadahead fpurge fseeko
> ftello fpending fflush 

Yes, this is the way to test these modules.

> and the attached patch, I am down to 2 failures in the stdio tests.

Hmm, there are three things I find suspicious:

1) __freadable is defined in a way that is inconsistent with Gnulib:
   #define __freadable(p) (__freading(p) && __fpending(p))
   [in Gnulib, a stream that is not reading nor writing is considered
   to be readable]
   but you don't ensure that freadable doesn't get defined other than
   __freadable.

2) Likewise for __fwritable, which is defined in a way inconsistent with Gnulib:
   #define __fwritable(p) (__fwriting(p) && __fpending(p) < __fbufsiz(p))

3) In freadahead.c you add:
   return __fpending(fp_);
   But __fpending looks at the pending output, whereas freadahead shall look
   at the available input. Should be different things, no?

Bruno




Re: another stdio patch for UnixWare

2020-10-11 Thread Bruno Haible
Tim Rice wrote:
> * lib/fflush.c: Update comments for UnixWare
> * lib/fpending.c: Likewise.
> * lib/freadable.c: Likewise.
> * lib/freadptr.c: Likewise.
> * lib/freadseek.c: Likewise.
> * lib/fseterr.c: Likewise.
> * lib/fwritable.c: Likewise.
> * lib/fwriting.c: Likewise.

Regarding the comments, I prefer to keep UnixWare and OpenServer close
together, since UnixWare 7 seems to be quite similar to OpenServer 6.


2020-10-11  Bruno Haible  

stdioext: Update comments regarding UnixWare.
Reported by Tim Rice  in
.
* lib/fbufmode.c: Update comments.
* lib/fflush.c: Likewise.
* lib/fpending.c: Likewise.
* lib/fpurge.c: Likewise.
* lib/freadable.h: Likewise.
* lib/freadable.c: Likewise.
* lib/freadahead.c: Likewise.
* lib/freading.h: Likewise.
* lib/freading.c: Likewise.
* lib/freadptr.c: Likewise.
* lib/freadseek.c: Likewise.
* lib/fseeko.c: Likewise.
* lib/fseterr.c: Likewise.
* lib/fwritable.h: Likewise.
* lib/fwritable.c: Likewise.
* lib/fwriting.h: Likewise.
* lib/fwriting.c: Likewise.

diff --git a/lib/fbufmode.c b/lib/fbufmode.c
index a528a63..87fa3a6 100644
--- a/lib/fbufmode.c
+++ b/lib/fbufmode.c
@@ -56,7 +56,7 @@ fbufmode (FILE *fp)
   return fp->_flags & (_IOLBF | _IONBF | _IOFBF);
 #elif defined __minix   /* Minix */
   return fp->_flags & (_IOLBF | _IONBF | _IOFBF);
-#elif defined _IOERR/* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, mingw, MSVC, NonStop Kernel, OpenVMS */
+#elif defined _IOERR/* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, UnixWare, mingw, MSVC, NonStop Kernel, OpenVMS */
 # if defined WINDOWS_OPAQUE_FILE
   if (fp_->_flag & 0x100)
 return _IOFBF; /* Impossible to distinguish _IOFBF and _IOLBF.  */
diff --git a/lib/fflush.c b/lib/fflush.c
index b3a40e8..2161b22 100644
--- a/lib/fflush.c
+++ b/lib/fflush.c
@@ -64,7 +64,7 @@ clear_ungetc_buffer (FILE *fp)
   fp->_ungetc_count = 0;
   fp->_rcount = - fp->_rcount;
 }
-# elif defined _IOERR   /* Minix, AIX, HP-UX, IRIX, OSF/1, 
Solaris, OpenServer, mingw, MSVC, NonStop Kernel, OpenVMS */
+# elif defined _IOERR   /* Minix, AIX, HP-UX, IRIX, OSF/1, 
Solaris, OpenServer, UnixWare, mingw, MSVC, NonStop Kernel, OpenVMS */
   /* Nothing to do.  */
 # else  /* other implementations */
   fseeko (fp, 0, SEEK_CUR);
diff --git a/lib/fpending.c b/lib/fpending.c
index 4cc0ea7..836a3a9 100644
--- a/lib/fpending.c
+++ b/lib/fpending.c
@@ -25,7 +25,8 @@
 #include "stdio-impl.h"
 
 /* This file is not used on systems that already have the __fpending function,
-   namely glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android API >= 23.  */
+   namely glibc >= 2.2, Solaris >= 7, UnixWare >= 7.1.4.MP4, Cygwin >= 1.7.34,
+   Android API >= 23.  */
 
 /* Return the number of pending (aka buffered, unflushed)
bytes on the stream, FP, that is open for writing.  */
@@ -45,7 +46,7 @@ __fpending (FILE *fp)
   return fp->_ptr - fp->_buffer;
 #elif defined __minix/* Minix */
   return fp_->_ptr - fp_->_buf;
-#elif defined _IOERR /* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, mingw, MSVC, NonStop Kernel, OpenVMS */
+#elif defined _IOERR /* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, UnixWare, mingw, MSVC, NonStop Kernel, OpenVMS */
   return (fp_->_ptr ? fp_->_ptr - fp_->_base : 0);
 #elif defined __UCLIBC__ /* uClibc */
   return (fp->__modeflags & __FLAG_WRITING ? fp->__bufpos - fp->__bufstart : 
0);
diff --git a/lib/fpurge.c b/lib/fpurge.c
index 9469ce4..9df90de 100644
--- a/lib/fpurge.c
+++ b/lib/fpurge.c
@@ -19,7 +19,7 @@
 /* Specification.  */
 #include 
 
-#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23, musl libc */
+#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
UnixWare >= 7.1.4.MP4, Cygwin >= 1.7.10, Android API >= 23, musl libc */
 # if HAVE_STDIO_EXT_H
 #  include 
 # endif
@@ -31,7 +31,7 @@
 int
 fpurge (FILE *fp)
 {
-#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23, musl libc */
+#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
UnixWare >= 7.1.4.MP4, Cygwin >= 1.7.10, Android API >= 23, musl libc */
 
   __fpurge (fp);
   /* The __fpurge function does not have a return value.  */
@@ -101,7 +101,7 @@ fpurge (FILE *fp)
   if (fp->_ptr != NULL)
 fp->_count = 0;
   return 0;
-# elif defined _IOERR   /* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, mingw, MSVC, NonStop Kernel, OpenVMS */
+# elif defined _IOERR   /* AIX, HP-UX, IRIX, OSF/1, Solaris, 
OpenServer, UnixWare, mingw, 

Re: another stdio patch for UnixWare

2020-10-11 Thread Bruno Haible
Tim Rice wrote:

> 2020-09-30  Tim Rice  
> * lib/stdio-impl.h: Add support for UnixWare

I prefer to use the ifdefs that we found a couple of days ago.

What did you mean by "best not to use __base on these platforms",
when 4 lines below, the code does
  #  define _base __base


2020-10-11  Bruno Haible  

stdioext: Treat OpenServer 6 and UnixWare 7 like OpenServer 5.
Reported Tim Rice  in
.
Uses the info from
.
* lib/stdio-impl.h: Test also __SCO_VERSION__ and __sysv5__.

diff --git a/lib/stdio-impl.h b/lib/stdio-impl.h
index 067b95e..15066aa 100644
--- a/lib/stdio-impl.h
+++ b/lib/stdio-impl.h
@@ -175,7 +175,7 @@
 #  define fp_ fp
 # endif
 
-# if defined _SCO_DS/* OpenServer */
+# if defined _SCO_DS || (defined __SCO_VERSION__ || defined __sysv5__)  /* 
OpenServer 5, OpenServer 6, UnixWare 7 */
 #  define _cnt __cnt
 #  define _ptr __ptr
 #  define _base __base




Re: terminology

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 16:14 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > I have attached an improved version of the HAMT module to this email.
>
> How about terminology: "delete" vs. "remove"?

> In this sense, 'hamt_delete' is triggering the wrong associations.
> How about renaming 'hamt_remove'?
>
> Deleting an entry from a hash table or HAMT does not always mean to
> delete the object that the entry references.
>
> The Java collections [1], C# collections [2], Python collections [3]
> all use the verb "remove".

I'm fine with this change and agree with your reasoning. I used the
hash module (see your comment below) as a guide for the interface,
that's why I called the procedure hamt_delete in the first place.

> Yes, we still have hash_delete (in module 'hash') and 'argz_delete' (in
> module 'argz'); these are very old modules.

Marc



Re: terminology

2020-10-11 Thread Bruno Haible
Hi Marc,

> I have attached an improved version of the HAMT module to this email.

How about terminology: "delete" vs. "remove"?

In C++ the verb "delete" is more or less the same as "free": It means
"deallocate" and "free memory".

Likewise in some C APIs, e.g. pthread_key_delete.

In this sense, 'hamt_delete' is triggering the wrong associations.
How about renaming 'hamt_remove'?

Deleting an entry from a hash table or HAMT does not always mean to
delete the object that the entry references.

The Java collections [1], C# collections [2], Python collections [3]
all use the verb "remove".

Yes, we still have hash_delete (in module 'hash') and 'argz_delete' (in
module 'argz'); these are very old modules.

Bruno

[1] https://docs.oracle.com/javase/7/docs/api/java/util/Set.html
[2] 
https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1
[3] https://docs.python.org/3/library/stdtypes.html#set






Re: HAMT for gl_set and gl_map

2020-10-11 Thread Bruno Haible
Hi Marc,

> > How would a HAMT implementation look like that does not support persistence,
> > but is instead optimized for destructive updates? Probably the reference
> > counters would go away. Anything else?
> 
> The reference counter would go away as would all tests for "shared" in the 
> code.
> 
> Without a reference counter, an element can be any "void *" (instead
> of a pointer to a struct starting with Hamt_entry), which is a
> substantial change.

Thanks for explaining.

> We can add a number of #if's to the code so that it either compiles to
> an implementation of persistent or of transient hamts, but I wouldn't
> rather do it right now because it would make the code harder to read.
> But when the hamt code has become stable (and has been put to use and,
> therefore, to real-world testing), this sounds like a feasible option.

I agree that stabilizing the current code should come first.

Then, whether the adaptations will be doable with a small set of #ifs
or whether it will be best to copy the code, will remain to be seen.
I mean, there are different red-black tree implementations in Gnulib
and in the Linux kernel, and it does not cause maintenance problems
because algorithmic code only very rarely needs updates.

Bruno




Re: HAMT iterators

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> For a bucket, in the worst case, we need the full size_t range for
> position

Right. I missed that. When the hash function is very bad (maps everything
to a single hash code), the bucket is very large.

Bruno




Re: HAMT iterator

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> >   - If someone creates a derivative of the HAMT, the iterator won't
> > be affected, right? ("persistence")
> 
> Yes.

OK, that's one thing to document.

> > So, what is the scenario where increasing the reference count will make
> > a difference?
> 
> If you have a language with GC like Lisp or Scheme, say, the hamt may
> be GC'd while the iterator is still live.

And in C, when someone calls hamt_free on the HAMT, the iterator won't be
affected. IMO, that's another thing to document (because it's unexpected).

Bruno




Re: patch: stdio fiex for UnixWare

2020-10-11 Thread Bruno Haible
Tim Rice wrote in
:
> 
> I've attached a patch to address the fact that UnixWare does not have
> stdio_ext.h but has some __X() stdio helper functions.
> 
> Wrap "#include " in HAVE_STDIO_EXT_H insead of HAVE___X

I'm applying the patch below, to fix this. The main differences to yours are:
  * No need to touch the modules freadahead, freadptr, fseterr, since the
functions __freadahead, __freadptr, __fseterr don't exist on your platform,
as you reported in
.
  * When we modify a .m4 file, we also need to bump its serial number.
  * It does not make the code clearer to remove the HAVE___FLBF and 
HAVE___FPURGE
tests in fbufmode.c and fpurge.c.


2020-10-11  Bruno Haible  

stdioext: Avoid compilation errors on UnixWare 7.
Reported by Tim Rice  in
.
* lib/fbufmode.c: Don't include  if it does not exist.
* lib/fpurge.c: Likewise.
* lib/freadable.h: Likewise.
* lib/freading.h: Likewise.
* lib/fwritable.h: Likewise.
* lib/fwriting.h: Likewise.
* m4/fbufmode.m4 (gl_FUNC_FBUFMODE): Test whether  exists.
* m4/fpurge.m4 (gl_FUNC_FPURGE): Likewise.
* m4/freadable.m4 (gl_FUNC_FREADABLE): Likewise.
* m4/freading.m4 (gl_FUNC_FREADING): Likewise.
* m4/fwritable.m4 (gl_FUNC_FWRITABLE): Likewise.
* m4/fwriting.m4 (gl_FUNC_FWRITING): Likewise.

diff --git a/lib/fbufmode.c b/lib/fbufmode.c
index 65a5cfc..a528a63 100644
--- a/lib/fbufmode.c
+++ b/lib/fbufmode.c
@@ -20,7 +20,9 @@
 #include "fbufmode.h"
 
 #if HAVE___FLBF
-# include 
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 #endif
 
 #include "stdio-impl.h"
diff --git a/lib/fpurge.c b/lib/fpurge.c
index fc88646..9469ce4 100644
--- a/lib/fpurge.c
+++ b/lib/fpurge.c
@@ -19,8 +19,10 @@
 /* Specification.  */
 #include 
 
-#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23 */
-# include 
+#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23, musl libc */
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 #endif
 #include 
 
diff --git a/lib/freadable.h b/lib/freadable.h
index f05e5fb..9139865 100644
--- a/lib/freadable.h
+++ b/lib/freadable.h
@@ -24,7 +24,9 @@
 
 #if HAVE___FREADABLE /* glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android 
API >= 23, musl libc */
 
-# include 
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 # define freadable(stream) (__freadable (stream) != 0)
 
 #else
diff --git a/lib/freading.h b/lib/freading.h
index 1891f5a..7bef18e 100644
--- a/lib/freading.h
+++ b/lib/freading.h
@@ -35,7 +35,9 @@
 #if HAVE___FREADING && (!defined __GLIBC__ || defined __UCLIBC__ || __GLIBC__ 
> 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 7))
 /* Solaris >= 7, Cygwin >= 1.7.34, Android API >= 29, not glibc >= 2.2, but 
glibc >= 2.7, or musl libc  */
 
-# include 
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 # define freading(stream) (__freading (stream) != 0)
 
 #else
diff --git a/lib/fwritable.h b/lib/fwritable.h
index 509e069..fb9c871 100644
--- a/lib/fwritable.h
+++ b/lib/fwritable.h
@@ -24,7 +24,9 @@
 
 #if HAVE___FWRITABLE /* glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android 
API >= 23, musl libc */
 
-# include 
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 # define fwritable(stream) (__fwritable (stream) != 0)
 
 #else
diff --git a/lib/fwriting.h b/lib/fwriting.h
index 2d16f42..3aa5d7f 100644
--- a/lib/fwriting.h
+++ b/lib/fwriting.h
@@ -35,7 +35,9 @@
 
 #if HAVE___FWRITING /* glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android 
API >= 29, musl libc */
 
-# include 
+# if HAVE_STDIO_EXT_H
+#  include 
+# endif
 # define fwriting(stream) (__fwriting (stream) != 0)
 
 #else
diff --git a/m4/fbufmode.m4 b/m4/fbufmode.m4
index 87e76e0..e2a146f 100644
--- a/m4/fbufmode.m4
+++ b/m4/fbufmode.m4
@@ -1,4 +1,4 @@
-# fbufmode.m4 serial 2
+# fbufmode.m4 serial 3
 dnl Copyright (C) 2007, 2009-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -7,5 +7,6 @@ dnl with or without modifications, as long as this notice is 
preserved.
 AC_DEFUN([gl_FUNC_FBUFMODE],
 [
   dnl Prerequisites of lib/fbufmode.c.
+  AC_CHECK_HEADERS_ONCE([stdio_ext.h])
   AC_CHECK_FUNCS_ONCE([__flbf __fbufsize])
 ])
diff --git a/m4/fpurge.m4 b/m4/fpurge.m4
index 0796a6f..9a486e0 100644
--- a/m4/fpurge.m4
+++ b/m4/fpurge.m4
@@ -1,4 +1,4 @@
-# fpurge.m4 serial 11
+# fpurge.m4 serial 12
 dnl Copyright (C) 2007, 2009-2020 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -8,6 +8,7 @@ AC_DEFUN([gl_FUNC_FPURGE],
 [
   

Re: HAMT iterators

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 12:29 Uhr schrieb Bruno Haible :

> The recursion across hamt_do_while / entry_do_while / subtrie_do_while /
> bucket_do_while builds up a stack whose essential contents is a set of
> indices, each at most 5 bits large, with a total of SIZE_WIDTH bits.
> In other words, the iterator object can be a

Yes, this is roughly how the implementation I worked on this morning
works. However, I haven't yet implemented the 5-bit compression.

> struct
>   {
> const Hamt *hamt;
> size_t position;
> int depth;
> const Hamt_entry *entry;
>   };
>
> where during the iteration
>   - hamt stays unmodified,
>   - position is incremented by some amount at each round,
>   - depth is the recursion depth.
>   - entry is a cache of hamt->nodes[(position >> 59) & 31][(position >> 54) & 
> 31]...
> with depth-1 indexing operations, and is changed at each round,
>
> HASH-TABLE-ITERATOR sets position, depth, entry to point to the first entry.
> HASH-TABLE-ITERATE works as follows:
>   - in a bucket, it increments position and fetches the next entry of the 
> bucket,

For a bucket, in the worst case, we need the full size_t range for
position, so we have to store the path in the tree that leads to the
bucket in another size_t variable.

>   - if the bucket is done, it decreases depth and goes to
> hamt->nodes[(position >> 59) & 31][(position >> 54) & 31]... (with depth-1
> indexing operations) until it finds a subtrie that is not yet done,

This means a lot of pointer operations if large subtries are processed
that are deep in the trie. The hamt_do_while operation stores that
chain of entries from the root implicitly on the stack. The same
should be done for the iterator, meaning that an array of MAX_DEPTH
entries has to be cached. On a 32 bit system, this means 28 bytes,
and, on a 64 bit system, 130 bytes.

> then it increases depth, entry to point to the first entry in that 
> subtrie.
>
> If done this way, the iterator will fit into the gl_set_iterator_t and
> gl_map_iterator_t that we already have in gl_set.h and gl_map.h.
>
> Bruno



Re: lib/fwriting.c

2020-10-11 Thread Bruno Haible
Tim Rice wrote:
> On line 24 of lib/fwriting.c we see
> /* This file is not used on systems that have the __fwritable function,
> 
> Should that not be __fwriting instead of __fwritable ?

Indeed. Thanks for the report. Fixed as part of this larger commit:


2020-10-11  Bruno Haible  

stdioext: Update comments regarding Cygwin.
* lib/fpending.c: Update comments.
* lib/fpurge.c: Likewise.
* lib/freadable.h: Likewise.
* lib/freadable.c: Likewise.
* lib/freading.h: Likewise.
* lib/freading.c: Likewise.
* lib/fwritable.h: Likewise.
* lib/fwritable.c: Likewise.
* lib/fwriting.h: Likewise.
* lib/fwriting.c: Likewise.

diff --git a/lib/fpending.c b/lib/fpending.c
index 802ebcb..4cc0ea7 100644
--- a/lib/fpending.c
+++ b/lib/fpending.c
@@ -25,7 +25,7 @@
 #include "stdio-impl.h"
 
 /* This file is not used on systems that already have the __fpending function,
-   namely glibc >= 2.2, Solaris >= 7, Android API >= 23.  */
+   namely glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android API >= 23.  */
 
 /* Return the number of pending (aka buffered, unflushed)
bytes on the stream, FP, that is open for writing.  */
@@ -39,7 +39,7 @@ __fpending (FILE *fp)
   /* GNU libc, BeOS, Haiku, Linux libc5 */
   return fp->_IO_write_ptr - fp->_IO_write_base;
 #elif defined __sferror || defined __DragonFly__ || defined __ANDROID__
-  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin, Minix 3, Android */
+  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin < 1.7.34, Minix 3, 
Android */
   return fp->_p - fp->_bf._base;
 #elif defined __EMX__/* emx+gcc */
   return fp->_ptr - fp->_buffer;
diff --git a/lib/fpurge.c b/lib/fpurge.c
index f05da5a..fc88646 100644
--- a/lib/fpurge.c
+++ b/lib/fpurge.c
@@ -19,7 +19,7 @@
 /* Specification.  */
 #include 
 
-#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Android API >= 23 */
+#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23 */
 # include 
 #endif
 #include 
@@ -29,13 +29,13 @@
 int
 fpurge (FILE *fp)
 {
-#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Android API >= 23, musl libc */
+#if HAVE___FPURGE   /* glibc >= 2.2, Haiku, Solaris >= 7, 
Cygwin >= 1.7.10, Android API >= 23, musl libc */
 
   __fpurge (fp);
   /* The __fpurge function does not have a return value.  */
   return 0;
 
-#elif HAVE_FPURGE   /* FreeBSD, NetBSD, OpenBSD, DragonFly, 
Mac OS X, Cygwin 1.7 */
+#elif HAVE_FPURGE   /* FreeBSD, NetBSD, OpenBSD, DragonFly, 
Mac OS X, Cygwin >= 1.7 */
 
   /* Call the system's fpurge function.  */
 # undef fpurge
diff --git a/lib/freadable.c b/lib/freadable.c
index 160fc30..b7110cb 100644
--- a/lib/freadable.c
+++ b/lib/freadable.c
@@ -26,7 +26,8 @@
 #endif
 
 /* This file is not used on systems that have the __freadable function,
-   namely glibc >= 2.2, Solaris >= 7, Android API >= 23, musl libc.  */
+   namely glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android API >= 23,
+   musl libc.  */
 
 bool
 freadable (FILE *fp)
@@ -38,7 +39,7 @@ freadable (FILE *fp)
   /* GNU libc, BeOS, Haiku, Linux libc5 */
   return (fp->_flags & _IO_NO_READS) == 0;
 #elif defined __sferror || defined __DragonFly__ || defined __ANDROID__
-  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin, Minix 3, Android */
+  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin < 1.7.34, Minix 3, 
Android */
   return (fp_->_flags & (__SRW | __SRD)) != 0;
 #elif defined __EMX__   /* emx+gcc */
   return (fp->_flags & (_IORW | _IOREAD)) != 0;
diff --git a/lib/freadable.h b/lib/freadable.h
index 5a17e4a..f05e5fb 100644
--- a/lib/freadable.h
+++ b/lib/freadable.h
@@ -22,7 +22,7 @@
STREAM must not be wide-character oriented.
The result doesn't change until the stream is closed or re-opened.  */
 
-#if HAVE___FREADABLE /* glibc >= 2.2, Solaris >= 7, Android API >= 23, musl 
libc */
+#if HAVE___FREADABLE /* glibc >= 2.2, Solaris >= 7, Cygwin >= 1.7.34, Android 
API >= 23, musl libc */
 
 # include 
 # define freadable(stream) (__freadable (stream) != 0)
diff --git a/lib/freading.c b/lib/freading.c
index f4dab78..f046676 100644
--- a/lib/freading.c
+++ b/lib/freading.c
@@ -37,7 +37,7 @@ freading (FILE *fp)
   || ((fp->_flags & (_IO_NO_READS | _IO_CURRENTLY_PUTTING)) == 0
   && fp->_IO_read_base != NULL));
 # elif defined __sferror || defined __DragonFly__ || defined __ANDROID__
-  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin, Minix 3, Android */
+  /* FreeBSD, NetBSD, OpenBSD, DragonFly, Mac OS X, Cygwin < 1.7.34, Minix 3, 
Android */
   return (fp_->_flags & __SRD) != 0;
 # elif defined __EMX__   /* emx+gcc */
   return (fp->_flags & _IOREAD) != 0;
diff --git a/lib/freading.h b/lib/freading.h
index 6c5592e..1891f5a 100644
--- a/lib/freading.h
+++ b/lib/freading.h

Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 14:04 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > > > /* Return an independent copy of ITER that is initially in the same
> > > >state.  */
> > > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> > >
> > > Then a copy function is not needed, because the user's program can do
> > >
> > >   Hamt_iterator iter_clone = iter;
> >
> > The hamt itself has to be copied (to increase the reference counter).
>
> Then the comment should clarify what "independent" means. I thought it
> means that both iterators (the original one and the copy) can be used
> simultaneously, as long as the HAMT does not change. Do you mean
> something broader?
>   - If someone creates a derivative of the HAMT, the iterator won't
> be affected, right? ("persistence")

Yes.

>   - If someone makes destructive modifications to the HAMT (through the
> *_x functions), the iterator will be affected if it has not yet
> passed the point of modification, right?

No, the iterator shouldn't be affected. (The point of modification is
not well-defined without exposing implementation details.)

> So, what is the scenario where increasing the reference count will make
> a difference?

If you have a language with GC like Lisp or Scheme, say, the hamt may
be GC'd while the iterator is still live.



Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 14:14 Uhr schrieb Bruno Haible :

[...]

> > > > /* Return an independent copy of ITER that is initially in the same
> > > >state.  */
> > > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> > >
> > > Then a copy function is not needed, because the user's program can do
> > >
> > >   Hamt_iterator iter_clone = iter;
> >
> > The hamt itself has to be copied (to increase the reference counter).
>
> Whether copying an iterator can be done by assignment or requires a function
> call, is of second importance.
>
> The more important point I wanted to make: Does allocating an iterator
> require a (heap) memory allocation?
>
> If you iterate with do_while, no memory allocation is needed, because all
> information is stored on the stack, between the various function calls.
>
> If you iterate with hamt_iterator_create, and the Hamt_iterator type
> has bounded size (I guess it will not require more than 15 pointers and
> 13 indices), it can be stored on the caller's stack.

That's a very good point you make. The hamt iterator has, of course,
(low) bounded size, so it can (and should) be allocated on the stack.



Re: out-of-memory handling

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 13:56 Uhr schrieb Bruno Haible :
>
> Marc Nieper-Wißkirchen wrote:
> > I am not so convinced of whether it makes sense to create a
> > gl_set/gl_map frontend for this hamt implementation.
>
> Wikipedia [1] lists some advantages of HAMTs even without the persistence.
>
> > It is optimized
> > for the persistence (otherwise, use ordinary hash tables), while the
> > gl_set/gl_map interface is for destructive updates.
>
> How would a HAMT implementation look like that does not support persistence,
> but is instead optimized for destructive updates? Probably the reference
> counters would go away. Anything else?

The reference counter would go away as would all tests for "shared" in the code.

Without a reference counter, an element can be any "void *" (instead
of a pointer to a struct starting with Hamt_entry), which is a
substantial change.

We can add a number of #if's to the code so that it either compiles to
an implementation of persistent or of transient hamts, but I wouldn't
rather do it right now because it would make the code harder to read.
But when the hamt code has become stable (and has been put to use and,
therefore, to real-world testing), this sounds like a feasible option.



Re: HAMT iterator

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> > > I am going to implement the following iterator API as well:
> > >
> > > /* An alternative interface to iterating through the entry of a hamt
> > >that does not make use of a higher-order function like
> > >hamt_do_while uses the opaque Hamt_iterator type.  */
> > > typedef struct hamt_iterator Hamt_iterator;
> > >
> > > /* Return a newly created iterator for HAMT.  */
> > > extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);
> >
> > The pointer return here is overkill. A prototype
> >
> >   extern Hamt_iterator hamt_iterator_create (const Hamt *hamt);
> >
> > is sufficient, because the way such an iterator is used is:
> >
> >   Hamt_iterator iter = hamt_iterator_create (hamt);
> >
> >   for (...)
> > {
> >   ...
> >   Hamt_entry *elt;
> >   if (hamt_iterator_next (, ))
> > {
> >   ...
> > }
> >   ...
> > }
> >
> >   hamt_iterator_free ();
> >
> > > /* Return an independent copy of ITER that is initially in the same
> > >state.  */
> > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> >
> > Then a copy function is not needed, because the user's program can do
> >
> >   Hamt_iterator iter_clone = iter;
> 
> The hamt itself has to be copied (to increase the reference counter).

Whether copying an iterator can be done by assignment or requires a function
call, is of second importance.

The more important point I wanted to make: Does allocating an iterator
require a (heap) memory allocation?

If you iterate with do_while, no memory allocation is needed, because all
information is stored on the stack, between the various function calls.

If you iterate with hamt_iterator_create, and the Hamt_iterator type
has bounded size (I guess it will not require more than 15 pointers and
13 indices), it can be stored on the caller's stack.

Bruno




Re: HAMT iterator

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> > > /* Return an independent copy of ITER that is initially in the same
> > >state.  */
> > > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
> >
> > Then a copy function is not needed, because the user's program can do
> >
> >   Hamt_iterator iter_clone = iter;
> 
> The hamt itself has to be copied (to increase the reference counter).

Then the comment should clarify what "independent" means. I thought it
means that both iterators (the original one and the copy) can be used
simultaneously, as long as the HAMT does not change. Do you mean
something broader?
  - If someone creates a derivative of the HAMT, the iterator won't
be affected, right? ("persistence")
  - If someone makes destructive modifications to the HAMT (through the
*_x functions), the iterator will be affected if it has not yet
passed the point of modification, right?
So, what is the scenario where increasing the reference count will make
a difference?

Bruno




Re: out-of-memory handling

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> I am not so convinced of whether it makes sense to create a
> gl_set/gl_map frontend for this hamt implementation.

Wikipedia [1] lists some advantages of HAMTs even without the persistence.

> It is optimized
> for the persistence (otherwise, use ordinary hash tables), while the
> gl_set/gl_map interface is for destructive updates.

How would a HAMT implementation look like that does not support persistence,
but is instead optimized for destructive updates? Probably the reference
counters would go away. Anything else?

Bruno

[1] https://en.wikipedia.org/wiki/Hash_array_mapped_trie#Advantages_of_HAMTs




Re: [PATCH] pthread_sigmask: fix implicit declaration of 'pthread_sigmask'

2020-10-11 Thread Bruno Haible
KO Myung-Hun wrote:
> * lib/signal.in.h [__KLIBC__]: Include .
> * lib/sys_select.in.h [__KLIBC__]: Do not include .

Thanks. Applied.

Bruno




Re: HAMT iterator

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 13:02 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> > I am going to implement the following iterator API as well:
> >
> > /* An alternative interface to iterating through the entry of a hamt
> >that does not make use of a higher-order function like
> >hamt_do_while uses the opaque Hamt_iterator type.  */
> > typedef struct hamt_iterator Hamt_iterator;
> >
> > /* Return a newly created iterator for HAMT.  */
> > extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);
>
> The pointer return here is overkill. A prototype
>
>   extern Hamt_iterator hamt_iterator_create (const Hamt *hamt);
>
> is sufficient, because the way such an iterator is used is:
>
>   Hamt_iterator iter = hamt_iterator_create (hamt);
>
>   for (...)
> {
>   ...
>   Hamt_entry *elt;
>   if (hamt_iterator_next (, ))
> {
>   ...
> }
>   ...
> }
>
>   hamt_iterator_free ();
>
> > /* Return an independent copy of ITER that is initially in the same
> >state.  */
> > extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);
>
> Then a copy function is not needed, because the user's program can do
>
>   Hamt_iterator iter_clone = iter;

The hamt itself has to be copied (to increase the reference counter).

> > /* Return true if ITER is not at the end, place the current element in
> >*ELT_PTR and move the iterator forward.  Otherwise, return
> >*false.  */
> > extern bool hamt_iterator_next (Hamt_iterator *iter,
> > Hamt_entry *const *elt_ptr);
>
> The 'const' here forbids assigning to *ELT_PTR.

Yes. Wrong position of const.



Re: out-of-memory handling

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 12:54 Uhr schrieb Bruno Haible :

> > The bigger problem is that it mustn't make the code slower for the
> > 99,9% of the use cases where there is either enough memory or
> > calling xalloc_die is the only reasonable action.
>
> OK. When we add hamt as a possible implementation for gl_set and gl_map, we
> could document that out-of-memory handling must be done
>   - either through xalloc_die(),
>   - or by defining xmalloc and xrealloc [this is what Emacs does],
> unlike for the other implementations.

I am not so convinced of whether it makes sense to create a
gl_set/gl_map frontend for this hamt implementation. It is optimized
for the persistence (otherwise, use ordinary hash tables), while the
gl_set/gl_map interface is for destructive updates.



Re: HAMT iterator

2020-10-11 Thread Bruno Haible
Hi Marc,

> I am going to implement the following iterator API as well:
> 
> /* An alternative interface to iterating through the entry of a hamt
>that does not make use of a higher-order function like
>hamt_do_while uses the opaque Hamt_iterator type.  */
> typedef struct hamt_iterator Hamt_iterator;
> 
> /* Return a newly created iterator for HAMT.  */
> extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);

The pointer return here is overkill. A prototype

  extern Hamt_iterator hamt_iterator_create (const Hamt *hamt);

is sufficient, because the way such an iterator is used is:

  Hamt_iterator iter = hamt_iterator_create (hamt);

  for (...)
{
  ...
  Hamt_entry *elt;
  if (hamt_iterator_next (, ))
{
  ...
}
  ...
}

  hamt_iterator_free ();

> /* Return an independent copy of ITER that is initially in the same
>state.  */
> extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);

Then a copy function is not needed, because the user's program can do

  Hamt_iterator iter_clone = iter;

> /* Return true if ITER is not at the end, place the current element in
>*ELT_PTR and move the iterator forward.  Otherwise, return
>*false.  */
> extern bool hamt_iterator_next (Hamt_iterator *iter,
> Hamt_entry *const *elt_ptr);

The 'const' here forbids assigning to *ELT_PTR.

Bruno




Re: out-of-memory handling

2020-10-11 Thread Bruno Haible
Marc Nieper-Wißkirchen wrote:
> >   2) We have a dilemma regarding use of malloc vs. xmalloc. While modules
> >  meant for use in programs can readily use xmalloc, modules meant for 
> > use
> >  (also) in libraries cannot use xmalloc, since a library is not supposed
> >  to terminate the program, even when memory is tight.
> 
> GMP more or less has this behavior.

Indeed, GMP's out-of-memory options [1] are poor in this respect:
  "There’s currently no defined way for the allocation functions to recover
   from an error such as out of memory, they must terminate program execution.
   A longjmp or throwing a C++ exception will have undefined results."
Likewise for libffcall, alas.

> >  The solution we found for this dilemma is that the *-set and *-map 
> > modules
> >  use just malloc, and we have thin wrapper modules (xset, xmap) that do
> >  call xalloc_die() in case of out-of-memory.
> >
> >  Unfortunately, the API of many of the functions need to be adjusted to
> >  cope with the possibility of an ENOMEM failure. That is tedious work, 
> > and
> >  the code does not look so pretty afterwards... But I see no other way 
> > to
> >  make it fit for use in libraries.
> 
> The bigger problem is that it mustn't make the code slower for the
> 99,9% of the use cases where there is either enough memory or
> calling xalloc_die is the only reasonable action.

OK. When we add hamt as a possible implementation for gl_set and gl_map, we
could document that out-of-memory handling must be done
  - either through xalloc_die(),
  - or by defining xmalloc and xrealloc [this is what Emacs does],
unlike for the other implementations.

Bruno

[1] https://gmplib.org/manual/Custom-Allocation




Re: HAMT iterators

2020-10-11 Thread Bruno Haible
Hi Marc,

> >  You can see that there is an initial function call 
> > HASH-TABLE-ITERATOR
> >  upfront, and then a function call to HASH-TABLE-ITERATE in each 
> > round
> >  of the loop.
> >
> >  This principle makes it possible to iterate e.g. through a hash 
> > table
> >  and a list in parallel.
> 
> In Scheme, it is even more relevant because you have call/cc so any
> iteration can be suspended and restored any number of times.
> 
> I thought this could be solved by storing a "next" pointer in the
> payload but this doesn't work with hamts sharing entries... So I will
> add some iterator interface as well. Preferably one that can also be
> used to implement Hamts in Scheme.

The recursion across hamt_do_while / entry_do_while / subtrie_do_while /
bucket_do_while builds up a stack whose essential contents is a set of
indices, each at most 5 bits large, with a total of SIZE_WIDTH bits.
In other words, the iterator object can be a

struct
  {
const Hamt *hamt;
size_t position;
int depth;
const Hamt_entry *entry;
  };

where during the iteration
  - hamt stays unmodified,
  - position is incremented by some amount at each round,
  - depth is the recursion depth.
  - entry is a cache of hamt->nodes[(position >> 59) & 31][(position >> 54) & 
31]...
with depth-1 indexing operations, and is changed at each round,

HASH-TABLE-ITERATOR sets position, depth, entry to point to the first entry.
HASH-TABLE-ITERATE works as follows:
  - in a bucket, it increments position and fetches the next entry of the 
bucket,
  - if the bucket is done, it decreases depth and goes to
hamt->nodes[(position >> 59) & 31][(position >> 54) & 31]... (with depth-1
indexing operations) until it finds a subtrie that is not yet done,
then it increases depth, entry to point to the first entry in that subtrie.

If done this way, the iterator will fit into the gl_set_iterator_t and
gl_map_iterator_t that we already have in gl_set.h and gl_map.h.

Bruno




Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
I am going to implement the following iterator API as well:

/* An alternative interface to iterating through the entry of a hamt
   that does not make use of a higher-order function like
   hamt_do_while uses the opaque Hamt_iterator type.  */
typedef struct hamt_iterator Hamt_iterator;

/* Return a newly created iterator for HAMT.  */
extern Hamt_iterator *hamt_iterator_create (const Hamt *hamt);

/* Free the iterator ITER created through hamt_iterator_create or
   hamt_iterator_copy.  */
extern void hamt_iterator_free (Hamt_iterator *iter);

/* Return an independent copy of ITER that is initially in the same
   state.  */
extern Hamt_iterator *hamt_iterator_copy (Hamt_iterator *iter);

/* Return true if ITER is not at the end, place the current element in
   *ELT_PTR and move the iterator forward.  Otherwise, return
   *false.  */
extern bool hamt_iterator_next (Hamt_iterator *iter,
Hamt_entry *const *elt_ptr);

Am So., 11. Okt. 2020 um 10:20 Uhr schrieb Marc Nieper-Wißkirchen
:

> Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :

> >- Creating a closure in C (unlike Scheme or Lisp) is tedious 
> > hand-work.
> >  (The GNU C extension that allows local functions inside functions 
> > [1]
> >  is not a solution, because
> >- it is unportable,
> >- it forces an executable stack on many platforms, which is
> >  considered bad for security.)

PS Conceptually, hamt_do_while takes a closure, which consists of the
function pointer PROC and the closure environment DATA.

If this interface is sufficient for an application, it is more
lightweight than the alternative interface above because all
intermediate state is stored on the stack.



Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)

2020-10-11 Thread Marc Nieper-Wißkirchen
Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible :
>
> Hi Marc,
>
> Some comments:
>
> * GL_HAMT_THREAD_SAFE can be defined to 1 not only if
> __STDC_VERSION__ >= 201112 && ! defined __STD_NO_ATOMICS__
>   but also if
> __GNUC__ + (__GNUC_MINOR >= 9) > 4

Fixed.

>   (see the other mail).
>
> * Still '#ifdef GL_HAMT_THREAD_SAFE'.

Fixed.

> * Typo s/comparision/comparison/

Fixed.

> * Hamt_processor: The comment does not say what the return value means. Maybe 
> it
>   would be clearer if this type was moved down to the "Iteration" section.

Moved.

> * hamt_lookup: If the caller is allowed to modify the payload stored in the
>   returned entry, this function should return a 'Hamt_entry *', not a
>   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
>   not a 'const char *'.

Unless the caller knows what it does, modifying the payload is not a
good idea because the entry is shared between different hamts. If the
caller really wants to modify the payload, it has to do an explicit
type cast (which is safe).

> * In trienode_count, you don't need count_one_bits_l, only count_one_bits.
>   It's OK to assume that 'uint32_t' and 'unsigned int' are the same type.

Okay.

> * If subtrie_lookup was inlined in entry_lookup, you could get rid of the
>   forward declaration of entry_lookup, and compilers would be more likely
>   to turn the recursion of entry_lookup into a loop (tail-recursion
>   elimination).

I would prefer to have smaller functions. Are there any relevant
compilers who do not inline the function at higher recursion levels?

> * If subtrie_do_while was inlined in entry_do_while, you could get rid of the
>   forward declaration of entry_do_while.

> * It would be useful to be able to construct modules 'hamt-set' and 
> 'hamt-map',
>   analogous to 'hash-set' and 'hash-map', but the hamt API currently has two
>   issues that block it:
>
>   1) The iterator API is such that you cannot write the iteration using a 
> loop;
>  it involves a function call that is active during the entire iteration.
>  This is also problematic for two other reasons:
>
>- Creating a closure in C (unlike Scheme or Lisp) is tedious hand-work.
>  (The GNU C extension that allows local functions inside functions [1]
>  is not a solution, because
>- it is unportable,
>- it forces an executable stack on many platforms, which is
>  considered bad for security.)
>
>- In Common Lisp, iteration through a hash table is available not only
>  through a function-call interface (MAPHASH [2]) but also through an
>  iteration-loop interface (WITH-HASH-TABLE-ITERATOR [3], LOOP [4]).
>
>  For example, in GNU clisp, LOOP expands to
>
>  [1]> (macroexpand-1 '(loop for x being each hash-key of h do (foo 
> x)))
>  (MACROLET ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-ERROR)))
>   (BLOCK NIL
>(LET ((#:WHTI-3214 (SYSTEM::HASH-TABLE-ITERATOR H)))
> (MULTIPLE-VALUE-BIND (#:MORE?3215 #:HASH-KEY-3216 
> #:HASH-VALUE-3217)
>  (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214)
>  (DECLARE (IGNORE #:HASH-VALUE-3217))
>  (LET ((X NIL))
>   (LET NIL
>(MACROLET ((LOOP-FINISH NIL '(GO SYSTEM::END-LOOP)))
> (TAGBODY SYSTEM::BEGIN-LOOP (UNLESS #:MORE?3215 (LOOP-FINISH))
>  (SETQ X #:HASH-KEY-3216) (PROGN (PROGN (FOO X)))
>  (MULTIPLE-VALUE-SETQ
>   (#:MORE?3215 #:HASH-KEY-3216 #:HASH-VALUE-3217)
>   (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214))
>  (GO SYSTEM::BEGIN-LOOP) SYSTEM::END-LOOP
>  (MACROLET
>   ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-WARN)
> '(GO SYSTEM::END-LOOP ;
>
>  You can see that there is an initial function call 
> HASH-TABLE-ITERATOR
>  upfront, and then a function call to HASH-TABLE-ITERATE in each round
>  of the loop.
>
>  This principle makes it possible to iterate e.g. through a hash table
>  and a list in parallel.

In Scheme, it is even more relevant because you have call/cc so any
iteration can be suspended and restored any number of times.

I thought this could be solved by storing a "next" pointer in the
payload but this doesn't work with hamts sharing entries... So I will
add some iterator interface as well. Preferably one that can also be
used to implement Hamts in Scheme.

>   2) We have a dilemma regarding use of malloc vs. xmalloc. While modules
>  meant for use in programs can readily use xmalloc, modules meant for use
>  (also) in libraries cannot use xmalloc, since a library is not supposed
>  to terminate the program, even when memory is tight.

GMP more or less has this behavior.

>  The solution we found for this dilemma is that the *-set and *-map 
> modules
>  use just