Re: [PATCH 2/2] maint.mk: Replace grep with $(GREP)

2018-12-02 Thread Bruno Haible
Roman Bolshakov wrote:
> I'm quite new to gnulib but thanks to Eric and your comments that should do it:
> 
> diff --git a/modules/maintainer-makefile b/modules/maintainer-makefile
> index 39b51583c..13b8c546a 100644
> --- a/modules/maintainer-makefile
> +++ b/modules/maintainer-makefile
> @@ -14,6 +14,7 @@ configure.ac:
>  AC_CONFIG_COMMANDS_PRE([m4_ifdef([AH_HEADER],
>    [AC_SUBST([CONFIG_INCLUDE], m4_defn([AH_HEADER]))])])
>  AC_REQUIRE([AC_PROG_SED])
> +AC_REQUIRE([AC_PROG_GREP])
> 
>  Makefile.am:
>  EXTRA_DIST += $(top_srcdir)/maint.mk
> 

Yes, this will do it.

Can you please resubmit the entire patch 'maint.mk: Replace grep with $(GREP)'
as a whole?

Bruno




Re: [PATCH 1/2] maint.mk: Split long argument lists

2018-12-02 Thread Bruno Haible
Hi Roman,

> May I ask you to review what way we should go with ARG_MAX?
> 
> I'm okay with both ways whether it's:
>  * computing effective argument length and passing it to "-s" option;
>  * or exploiting behaviour of GNU/BSD xargs and specifying "-n" beyond
>    the limit.

Use the approach that makes the fewest undocumented assumptions. In other
words, rely on what the documentation says and on nothing else. The
relevant documentation here is POSIX [1].

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html




Re: gnulib's translation

2018-12-02 Thread Bruno Haible
Hi Akim,

> > This shift of work from the maintainers to the translation coordinator
> > would require
> >  1. that the translation coordinator installs whatever tooling is needed
> > to build a particular package,
> 
> It is true that some of the messages might not be in the package
> itself, but I would expect that it's a common case that all the
> _() mark up is already available.

Yes, the POT file can usually be generated from the sources found in a git
repository. But I argue that it's not the job of the TP coordinator to
build all kinds of packages.

In other words:
  - In a version control repository you find: sources.
  - In a tarball you find: sources + generated files.
  - The POT file is a generated file.
  - Thus the interface between the package maintainers and the TP is simpler if
    the package maintainers submit a tarball to the TP.

> I was not suggesting that the
> TP would run code from the repo.  I guess I'm too naive thinking
> that enough information is available from a commit.

Ah, you were assuming that the POT file is stored in the version control system?
This is a practice that produces problems, and will become rare after the
next gettext release.

Bruno




Re: [RFC] Adding a real HashTable implementation to gnulib

2018-12-02 Thread Bruno Haible
Hi,

Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure. 
> 
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.

I agree that the gnulib 'hash' module is just a particular case, and
probably the module name is not very descriptive.

> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?

There's not only the one from wget
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c

but also the one from gettext
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c

and the one from glib
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c

and the one from libxml
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c

and the ones from CLN
  https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash

and many more.

The implementation you are proposing is an "open-addressed table with linear
probing collision resolution". To me that is unacceptable. When I used Kyoto
Common Lisp (KCL) many years ago, I got an endless loop during a hash table
access, and it was precisely because of this open-addressed table structure.
I don't want code that requires careful setting of parameters in order
not to run into an endless loop. Better to have code that cannot run
into an endless loop *by design*.

The hash_string function that you propose shifts by 5 bits at each step;
I suspect that it has the same problem as the one I tested and discussed in
https://haible.de/bruno/hashfunc.html .

For Gnulib, I would want a generic container, i.e. a "map", like we have
"list" and "ordered set" already (modules 'list' and 'oset'). Other GNU
maintainers have reported that they like this approach.

However, this will still not fit all possible needs because there are
special cases that people want to see optimized:
  - The case when the key is a string; additionally when the key is
    allocated in an obstack and there is no remove.
  - The struniq function (as in localename.c).

Then, what about extra requirements?
  - The existing gnulib 'hash' module is pretty unique: it keeps statistics.
    But is anyone really using this feature?
  - malloc vs. xmalloc.
  - Multithread-safety should IMO not be considered as an extra requirement.
    This is better done in application logic, because typically in the scope
    of the lock the application will do more than just the hash table lookup.

Bruno




Re: [RFC] Adding a real HashTable implementation to gnulib

2018-12-02 Thread LRN
On 02.12.2018 16:41, Bruno Haible wrote:
> Hi,
> 
> Darshit Shah wrote:
>> I recently tried to use the hash table implementation in gnulib which
>> resides in the "hash" module. However, I quickly realised that the hash
>> table in gnulib seems to be what is otherwise popularly known as a hash
>> set, i.e., it supports storing and retrieving just values from the
>> structure.
>> 
>> On the other hand, a hash table is usually expected to have a key->value 
>> mapping that is stored.
> 
> I agree that the gnulib 'hash' module is just a particular case, and 
> probably the module name is not very descriptive.
> 
>> Within GNU Wget, we have a fairly portable version of a hash table
>> implemented which I think would be a good addition for gnulib. What do you
>> think?
> 
> There's not only the one from wget but also the one from gettext and the one
> from glib https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h 
> https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
> 
> and the one from libxml and the ones from CLN and many more.
> 

There was a hashtable shootout[1] recently, with a followup[2] (although that
one is glib-specific):

[1]: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
[2]: https://hpjansson.org/blag/2018/08/29/what-ails-ghashtable/





Re: gnulib's translation

2018-12-02 Thread Paul Smith
On Sun, 2018-12-02 at 15:38 +0100, Benno Schulenberg wrote:
> > Ah, you were assuming that the POT file is stored in the version
> > control system?  This is a practice that produces problems,
> 
> What problems does this produce?  (Probably this was discussed
> earlier and elsewhere?  Maybe have a URL of an archived message?)

I'm not sure what Bruno was referring to but one issue is that POT
files contain timestamps, so source control systems consider them
"modified" every time they're rebuilt even if nothing else changes.
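
To illustrate the timestamp point, here is a minimal sketch. The two files and their dates are made up for the example; only the "POT-Creation-Date" header field is taken from real POT files. The files differ in nothing but that line, yet a plain comparison reports a change.

```shell
tmp=$(mktemp -d)

# Two hypothetical POT headers that differ only in the creation timestamp.
printf '%s\n' 'msgid ""' 'msgstr ""' \
  '"POT-Creation-Date: 2018-12-01 17:00+0100\n"' > "$tmp/old.pot"
printf '%s\n' 'msgid ""' 'msgstr ""' \
  '"POT-Creation-Date: 2018-12-02 13:10+0100\n"' > "$tmp/new.pot"

# Helper (name is an assumption): print a POT file without the timestamp line.
strip_date() { grep -v '^"POT-Creation-Date:' "$1"; }
strip_date "$tmp/old.pot" > "$tmp/old.stripped"
strip_date "$tmp/new.pot" > "$tmp/new.stripped"

# Compare once as-is and once with the timestamp line removed.
if cmp -s "$tmp/old.pot" "$tmp/new.pot"; then raw=same; else raw=differ; fi
if cmp -s "$tmp/old.stripped" "$tmp/new.stripped"; then
  stripped=same
else
  stripped=differ
fi
echo "as committed: $raw, without timestamp: $stripped"
rm -rf "$tmp"
```

A version control system sees the first comparison, not the second, which is why a rebuilt POT file shows up as modified.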

It's also possible that different versions of gettext could generate
slightly different POT files, so if committers have different versions
installed you get "warring commits" which change them back and forth. 
I don't know if POT files have this issue but other files generated by
autotools certainly do.




Re: [PATCH 1/2] maint.mk: Split long argument lists

2018-12-02 Thread Bruno Haible
Roman Bolshakov wrote:
> But then we will need to correct calculation of VC_ARG_MAX. We can take
> formulae from [2]:
> expr `getconf ARG_MAX` - `env|wc -c` - `env|egrep '^[^ ]+='|wc -l` \* 4 - 2048

This formula assumes that a pointer in the 'environ' array is 4 bytes long.
On 64-bit platforms it surely is 8 bytes long.

More generally, I find this formula too fragile. It assumes so many things.
I would prefer a formula which does not attempt to produce the highest possible
value, but makes fewer assumptions. How about
  expr `getconf ARG_MAX` / 2
?
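
As a sketch of how that value might be used (the variable names and the file list are placeholders, and "echo" stands in for the real tool; none of this is from the patch):

```shell
# Half of ARG_MAX: deliberately not the largest usable value, but it needs
# no knowledge of the environment size or of the pointer width.
arg_max=$(getconf ARG_MAX)
vc_arg_max=$(expr "$arg_max" / 2)

# Hypothetical use: hand a file list to a command in batches whose command
# lines stay below that budget.
batches=$(printf '%s\n' a.c b.c c.c | xargs -s "$vc_arg_max" echo | wc -l)
echo "budget=$vc_arg_max batches=$batches"
```

As far as I read POSIX, an implementation that supports only a smaller "-s" value is expected to fall back to the largest one it supports rather than fail, so asking for half of ARG_MAX should be harmless there too.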

Bruno




Re: gnulib's translation

2018-12-02 Thread Bruno Haible
Paul Smith wrote:
> It's also possible that different versions of gettext could generate
> slightly different POT files, so if committers have different versions
> installed you get "warring commits" which change them back and forth. 
> I don't know if POT files have this issue but other files generated by
> autotools certainly do.

Yes, POT files have this issue as well. Each time there is a change to
xgettext - even if it's merely a bug fix -, different xgettext versions
will produce different POT files.

Bruno




Re: gnulib's translation

2018-12-02 Thread Bruno Haible
Hi Benno,

> > Ah, you were assuming that the POT file is stored in the version control
> > system?  This is a practice that produces problems,
> 
> What problems does this produce?



There are basically three ways to deal with generated files in the context
of a version controlled repository, such as ‘configure’ generated from
‘configure.ac’, parser.c generated from parser.y, or po/Makefile.in.in
autoinstalled by gettextize or autopoint.

“Never”
    Generated files are never committed into the repository.
“Occasionally”
    All generated files are committed into the repository occasionally,
    for example each time a release is made.
“Always”
    All generated files are always committed into the repository.

Each of these three approaches has different advantages and drawbacks.

“Never”
    The advantage is less work for the maintainers. In particular, the
    handling of branches becomes much easier, both for release branches
    and “topic branches” as in Git. The drawback is that anyone who
    checks out the source not only needs tools like GNU automake, GNU
    autoconf, GNU m4 installed in his PATH, but also that he needs to
    perform a package specific pre-build step before being able to
    "./configure; make".
“Occasionally”
    The advantage is that anyone can check out the source, and the usual
    "./configure; make" will work. The drawbacks are: 1. The one who
    checks out the repository needs tools like GNU automake, GNU
    autoconf, GNU m4 installed in his PATH; sometimes he even needs
    particular versions of them. 2. When a release is made and a commit
    is made on the generated files, the other developers get conflicts
    on the generated files when merging the local work back to the
    repository. Although these conflicts are easy to resolve, they are
    annoying. 3. Copying a change to a branch is not easy, because it
    involves separating the patch into a hand-made part, to be applied
    literally, and a set of commands or Makefile targets which will
    produce the other part. 4. Working with branches is time-consuming
    and error-prone.
“Always”
    The advantage is that anyone can check out the source at any moment
    and gets a working build. The drawbacks are: 1. It requires some
    frequent "push" actions by the maintainers. 2. The repository grows
    in size quite fast. 3. and 4. as for “Occasionally”.

The “Never” approach is the predominant one nowadays, especially in projects 
that use branches like in Git. 



> (Probably this was discussed earlier
> and elsewhere?  Maybe have a URL of an archived message?)

It will be documented in the next release of the GNU gettext manual.

Bruno




Re: gnulib's translation

2018-12-02 Thread Benno Schulenberg


> Akim Demaille wrote:
>> maybe the translation project could work on top of git now?

No, I'm not going to do that.  The TP is geared toward packages that
make releases.  So I need a (prerelease) tarball that contains the
corresponding POT file.

Benno




Re: gnulib's translation

2018-12-02 Thread Benno Schulenberg


Op 01-12-18 om 17:00 schreef Bruno Haible:
> Akim has just moved some code from Bison to Gnulib. He writes:
> 
>> Also, I feel sorry for Bison's translators when submitting modules to
>> gnulib: it's kind of throwing away their work; it would be great if there
>> were a means to preserve these translations.  Maybe the translation project
>> is able to fill the translations of one project based on that of another, I
>> don't know.
> 
> What is the recommended procedure, to save translator work, when a number of
> messages have been moved from one domain to another domain (both domains are
> managed by the TP)?

There is no procedure, as such moving of stuff across domains is a rare
affair.  But... it should not result in any Bison strings going untranslated
because Bison will now include the relevant modules from gnulib, and the
strings they contain will still be included in Bison's POT file -- they
will just be marked with "#: lib/..." instead of "#: src/...", no?

(When a new gnulib POT file gets submitted, I could do a one-time msgmerge
to "import" translated strings from Bison to gnulib.)

Benno




Re: gnulib's translation

2018-12-02 Thread Benno Schulenberg


Op 02-12-18 om 13:10 schreef Bruno Haible:
>   - Thus the interface between the package maintainers and the TP is simpler
>     if the package maintainers submit a tarball to the TP.

Precisely.  Furthermore, when following git, how is the translator to
know that a release is approaching and the time has come to update
his/her PO file?  We don't want to prod the translators for every
little string change.

> Ah, you were assuming that the POT file is stored in the version control
> system?  This is a practice that produces problems,

What problems does this produce?  (Probably this was discussed earlier
and elsewhere?  Maybe have a URL of an archived message?)

Benno




Re: [PATCH 1/2] maint.mk: Split long argument lists

2018-12-02 Thread Bernhard Voelker
On 11/30/18 12:14 PM, Roman Bolshakov wrote:
> May I ask you to review what way we should go with ARG_MAX?
> 
> I'm okay with both ways whether it's:
>  * computing effective argument length and passing it to "-s" option;
>  * or exploiting behaviour of GNU/BSD xargs and specifying "-n" beyond
>    the limit.

Actually, xargs (any implementation of it) takes care of the limit
itself.  That's what it is made for.

You would limit the number of args with "-n" if the executed program
can only handle up to that number, or if the logic requires it, e.g.
when input comes in as pairs:
  $ seq 6 | xargs -n2 echo diff -u
  diff -u 1 2
  diff -u 3 4
  diff -u 5 6

There's no need to worry about the other end of the range.
So in your patch, just omit the -n (and getconf).
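
A quick way to see the splitting without "-n" is the sketch below. The argument count is arbitrary, chosen only to be well beyond what I would expect to fit on one command line, and the number of batches will differ between systems.

```shell
# Half a million short arguments, no -n and no -s: xargs decides on its
# own how many arguments each "echo" invocation receives.
batches=$(seq 1 500000 | xargs echo | wc -l)
echo "xargs ran echo $batches times"
```

Each output line corresponds to one invocation of echo, so a count above one shows that xargs split the input by itself.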

Have a nice day,
Berny