Re: [PATCH 2/2] maint.mk: Replace grep with $(GREP)
Roman Bolshakov wrote:
> I'm quite new to gnulib but thanks to Eric and your comments that should do
> it:
>
> diff --git a/modules/maintainer-makefile b/modules/maintainer-makefile
> index 39b51583c..13b8c546a 100644
> --- a/modules/maintainer-makefile
> +++ b/modules/maintainer-makefile
> @@ -14,6 +14,7 @@ configure.ac:
>  AC_CONFIG_COMMANDS_PRE([m4_ifdef([AH_HEADER],
>    [AC_SUBST([CONFIG_INCLUDE], m4_defn([AH_HEADER]))])])
>  AC_REQUIRE([AC_PROG_SED])
> +AC_REQUIRE([AC_PROG_GREP])
>
>  Makefile.am:
>  EXTRA_DIST += $(top_srcdir)/maint.mk

Yes, this will do it. Can you please resubmit the entire patch
'maint.mk: Replace grep with $(GREP)' as a whole?

Bruno
Re: [PATCH 1/2] maint.mk: Split long argument lists
Hi Roman,

> May I ask you to review what way we should go with ARG_MAX?
>
> I'm okay with both ways whether it's:
> * computing effective argument length and passing it to "-s" option;
> * or exploiting behaviour of GNU/BSD xargs and specifying "-n" beyond
>   the limit.

Use the approach that makes the fewest undocumented assumptions. In other
words, rely on what the documentation says and on nothing else. The relevant
documentation here is POSIX [1].

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html
Re: gnulib's translation
Hi Akim,

> > This shift of work from the maintainers to the translation coordinator
> > would require
> > 1. that the translation coordinator installs whatever tooling is needed
> > to build a particular package,
>
> It is true that some of the messages might not be in the package
> itself, but I would expect that it's a common case that all the
> _() mark up is already available.

Yes, the POT file can usually be generated from the sources found in a git
repository. But I argue that it's not the job of the TP coordinator to build
all kinds of packages. In other words:
  - In a version control repository you find: sources.
  - In a tarball you find: sources + generated files.
  - The POT file is a generated file.
  - Thus the interface between the package maintainers and the TP is simpler
    if the package maintainers submit a tarball to the TP.

> I was not suggesting that the
> TP would run code from the repo. I guess I'm too naive thinking
> that enough information is available from a commit.

Ah, you were assuming that the POT file is stored in the version control
system? This is a practice that produces problems, and it will become rare
after the next gettext release.

Bruno
Re: [RFC] Adding a real HashTable implementation to gnulib
Hi,

Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in
> gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure.
>
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.

I agree that the gnulib 'hash' module is just a particular case, and probably
the module name is not very descriptive.

> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?

There's not only the one from wget
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c
but also the one from gettext
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
  https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c
and the one from glib
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
and the one from libxml
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c
and the ones from CLN
  https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash
and many more.

The implementation you are proposing is an "open-addressed table with linear
probing collision resolution". To me that is unacceptable. When I used Kyoto
Common Lisp (KCL) many years ago, I got an endless loop during a hash table
access, and it was precisely because of this open-addressed table structure.
I don't want code that requires careful setting of parameters in order not to
run into an endless loop. Better to have code that cannot run into an endless
loop *by design*.
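(Editorial illustration, not part of the original message.) With open addressing and linear probing, a lookup for an absent key scans slots until it meets an empty one, so a table with no empty slot loops forever unless the code bounds the probe count or resizes in time. Separate chaining is one structure that avoids this by design: each bucket holds a finite, NULL-terminated list, so every lookup ends after walking one list, at any load factor. The sketch below is a minimal key->value map under that assumption; the names `struct map`, `map_init`, `map_get`, `map_put` and `map_free` are hypothetical and are neither the wget nor a gnulib API, and the multiplier-31 hash is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct entry
{
  char *key;
  int value;
  struct entry *next;           /* next entry in the same bucket */
};

struct map
{
  struct entry **buckets;       /* array of singly linked chains */
  size_t nbuckets;
};

/* Placeholder string hash; any hash works for the termination argument.  */
static size_t
map_hash (const char *s)
{
  size_t h = 0;
  for (; *s != '\0'; s++)
    h = h * 31 + (unsigned char) *s;
  return h;
}

int
map_init (struct map *m, size_t nbuckets)
{
  assert (nbuckets > 0);
  m->buckets = calloc (nbuckets, sizeof *m->buckets);
  m->nbuckets = nbuckets;
  return m->buckets != NULL ? 0 : -1;
}

/* Returns a pointer to the value stored for KEY, or NULL.  The loop walks
   one finite NULL-terminated chain, so it terminates for any load factor.  */
int *
map_get (const struct map *m, const char *key)
{
  for (struct entry *e = m->buckets[map_hash (key) % m->nbuckets];
       e != NULL; e = e->next)
    if (strcmp (e->key, key) == 0)
      return &e->value;
  return NULL;
}

/* Inserts KEY -> VALUE, or replaces the value of an existing KEY.
   Returns 0 on success, -1 if memory is exhausted.  */
int
map_put (struct map *m, const char *key, int value)
{
  int *slot = map_get (m, key);
  if (slot != NULL)
    {
      *slot = value;
      return 0;
    }
  size_t len = strlen (key) + 1;
  struct entry *e = malloc (sizeof *e);
  char *copy = malloc (len);
  if (e == NULL || copy == NULL)
    {
      free (e);
      free (copy);
      return -1;
    }
  memcpy (copy, key, len);
  size_t i = map_hash (key) % m->nbuckets;
  e->key = copy;
  e->value = value;
  e->next = m->buckets[i];      /* prepend to the bucket's chain */
  m->buckets[i] = e;
  return 0;
}

void
map_free (struct map *m)
{
  for (size_t i = 0; i < m->nbuckets; i++)
    for (struct entry *e = m->buckets[i], *next; e != NULL; e = next)
      {
        next = e->next;
        free (e->key);
        free (e);
      }
  free (m->buckets);
}
```

Storing 100 keys in 4 buckets (load factor 25) still works, whereas an open-addressed table of 4 slots could not hold them at all; the chains just get longer, which costs time but never termination.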
The hash_string function that you propose shifts by 5 bits at each step; I
suspect that it has the same problem as the one I tested and discussed in
https://haible.de/bruno/hashfunc.html .

For Gnulib, I would want a generic container, i.e. a "map", like we already
have "list" and "ordered set" (modules 'list' and 'oset'). Other GNU
maintainers have reported that they like this approach.

However, this will still not fit all possible needs, because there are special
cases that people want to see optimized:
  - The case when the key is a string; additionally, when the key is allocated
    in an obstack and there is no removal.
  - The struniq function (as in localename.c).

Then, what about extra requirements?
  - The existing gnulib 'hash' module is pretty unique: it keeps statistics.
    But is anyone really using this feature?
  - malloc vs. xmalloc.
  - Multithread-safety should IMO not be considered an extra requirement.
    This is better done in application logic, because typically, within the
    scope of the lock, the application will do more than just the hash table
    lookup.

Bruno
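(Editorial illustration, not part of the original message.) The remark about a hash that "shifts by 5 bits at each step" can be made concrete with one well-known weakness of shift-based string hashes; this is not claimed to be exactly the problem analysed on Bruno's page, nor the exact form of the wget function. Assume a hypothetical hash `h = (h << 5) ^ c` on a 32-bit word: after 7 more characters the first character has been shifted past bit 31 (5 * 7 = 35), so all strings of the same length that differ only in their first character collide. Rotating instead of shifting keeps every bit, because each step is then a bijection on the hash state. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical shift-based hash: bits shifted out on the left are lost.  */
uint32_t
hash_shift5 (const char *s)
{
  uint32_t h = 0;
  for (; *s != '\0'; s++)
    h = (h << 5) ^ (unsigned char) *s;
  return h;
}

/* Same idea with a 5-bit rotation: no bits are discarded, so two strings of
   equal length that differ only in one character always hash differently.  */
uint32_t
hash_rot5 (const char *s)
{
  uint32_t h = 0;
  for (; *s != '\0'; s++)
    h = ((h << 5) | (h >> 27)) ^ (unsigned char) *s;
  return h;
}
```

For example, "a1234567" and "b1234567" get the same `hash_shift5` value but different `hash_rot5` values. Hashes that multiply by an odd constant (such as `h * 31 + c`) do not discard bits in this way, but can have other weaknesses.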
Re: [RFC] Adding a real HashTable implementation to gnulib
On 02.12.2018 16:41, Bruno Haible wrote:
> Hi,
>
> Darshit Shah wrote:
>> I recently tried to use the hash table implementation in gnulib which
>> resides in the "hash" module. However, I quickly realised that the hash
>> table in gnulib seems to be what is otherwise popularly known as a hash
>> set, i.e., it supports storing and retrieving just values from the
>> structure.
>>
>> On the other hand, a hash table is usually expected to have a key->value
>> mapping that is stored.
>
> I agree that the gnulib 'hash' module is just a particular case, and
> probably the module name is not very descriptive.
>
>> Within GNU Wget, we have a fairly portable version of a hash table
>> implemented which I think would be a good addition for gnulib. What do you
>> think?
>
> There's not only the one from wget but also the one from gettext and the one
> from glib https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
> https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
>
> and the one from libxml and the ones from CLN and many more.

There was a hash table shootout [1] recently, with a follow-up [2] (although
that one is glib-specific):

[1]: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
[2]: https://hpjansson.org/blag/2018/08/29/what-ails-ghashtable/
Re: gnulib's translation
On Sun, 2018-12-02 at 15:38 +0100, Benno Schulenberg wrote:
> > Ah, you were assuming that the POT file is stored in the version
> > control system? This is a practice that produces problems,
>
> What problems does this produce? (Probably this was discussed
> earlier and elsewhere? Maybe have a URL of an archived message?)

I'm not sure what Bruno was referring to, but one issue is that POT files
contain timestamps, so source control systems consider them "modified" every
time they're rebuilt, even if nothing else changes.

It's also possible that different versions of gettext could generate slightly
different POT files, so if committers have different versions installed you
get "warring commits" which change them back and forth. I don't know if POT
files have this issue, but other files generated by autotools certainly do.
Re: [PATCH 1/2] maint.mk: Split long argument lists
Roman Bolshakov wrote:
> But then we will need to correct calculation of VC_ARG_MAX. We can take
> formulae from [2]:
> expr `getconf ARG_MAX` - `env|wc -c` - `env|egrep '^[^ ]+='|wc -l` \* 4 - 2048

This formula assumes that a pointer in the 'environ' array is 4 bytes long.
On 64-bit platforms it is surely 8 bytes long.

More generally, I find this formula too fragile. It makes so many assumptions.
I would prefer a formula that does not attempt to produce the highest possible
value, but makes fewer assumptions. How about

  expr `getconf ARG_MAX` / 2

?

Bruno
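(Editorial illustration, not part of the original message.) The two shell formulas can be compared side by side in C, where `sysconf (_SC_ARG_MAX)` plays the role of `getconf ARG_MAX`. The first hypothetical helper is Bruno's conservative `ARG_MAX / 2`; the second reproduces the environment-accounting formula with `sizeof (char *)` in place of the hard-coded 4. Even corrected, the second one remains an estimate: POSIX leaves it implementation-defined whether pointers, terminators and alignment bytes count toward ARG_MAX, which is the kind of undocumented assumption the message warns against. Both function names are assumptions made for this sketch.

```c
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Conservative budget proposed above:  expr `getconf ARG_MAX` / 2  */
long
arg_budget_half (void)
{
  long arg_max = sysconf (_SC_ARG_MAX);
  return arg_max > 0 ? arg_max / 2 : -1;
}

/* The quoted environment-accounting formula, with the pointer size taken
   from the platform instead of assumed to be 4.  Still only an estimate.  */
long
arg_budget_env (void)
{
  long arg_max = sysconf (_SC_ARG_MAX);
  if (arg_max <= 0)
    return -1;
  long env_bytes = 0;           /* like `env | wc -c`: bytes + terminators */
  long env_count = 0;           /* like counting the NAME=value lines */
  for (char **e = environ; *e != NULL; e++)
    {
      env_bytes += (long) strlen (*e) + 1;
      env_count++;
    }
  return arg_max - env_bytes - env_count * (long) sizeof (char *) - 2048;
}
```

The halved value ignores the environment entirely and simply leaves generous headroom, which is why it depends on nothing beyond ARG_MAX itself.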
Re: gnulib's translation
Paul Smith wrote:
> It's also possible that different versions of gettext could generate
> slightly different POT files, so if committers have different versions
> installed you get "warring commits" which change them back and forth.
> I don't know if POT files have this issue but other files generated by
> autotools certainly do.

Yes, POT files have this issue as well. Each time there is a change to
xgettext (even if it's merely a bug fix), different xgettext versions will
produce different POT files.

Bruno
Re: gnulib's translation
Hi Benno,

> > Ah, you were assuming that the POT file is stored in the version control
> > system?
> > This is a practice that produces problems,
>
> What problems does this produce?

There are basically three ways to deal with generated files in the context of
a version-controlled repository, such as ‘configure’ generated from
‘configure.ac’, parser.c generated from parser.y, or po/Makefile.in.in
autoinstalled by gettextize or autopoint.

“Never”
  Generated files are never committed into the repository.

“Occasionally”
  All generated files are committed into the repository occasionally, for
  example each time a release is made.

“Always”
  All generated files are always committed into the repository.

Each of these three approaches has different advantages and drawbacks.

“Never”
  The advantage is less work for the maintainers. In particular, the handling
  of branches becomes much easier, both for release branches and “topic
  branches” as in Git.
  The drawback is that anyone who checks out the source not only needs tools
  like GNU automake, GNU autoconf, GNU m4 installed in his PATH, but also needs
  to perform a package-specific pre-build step before being able to
  "./configure; make".

“Occasionally”
  The advantage is that anyone can check out the source, and the usual
  "./configure; make" will work.
  The drawbacks are:
    1. The one who checks out the repository needs tools like GNU automake,
       GNU autoconf, GNU m4 installed in his PATH; sometimes he even needs
       particular versions of them.
    2. When a release is made and a commit is made on the generated files,
       the other developers get conflicts on the generated files when merging
       their local work back into the repository. Although these conflicts
       are easy to resolve, they are annoying.
    3. Copying a change to a branch is not easy, because it involves
       separating the patch into a hand-made part, to be applied literally,
       and a set of commands or Makefile targets which will produce the other
       part.
    4. Working with branches is time-consuming and error-prone.

“Always”
  The advantage is that anyone can check out the source at any moment and
  gets a working build.
  The drawbacks are:
    1. It requires some frequent "push" actions by the maintainers.
    2. The repository grows in size quite fast.
    3. and 4. as for “Occasionally”.

The “Never” approach is the predominant one nowadays, especially in projects
that use branches like in Git.

> (Probably this was discussed earlier
> and elsewhere? Maybe have a URL of an archived message?)

It will be documented in the next release of the GNU gettext manual.

Bruno
Re: gnulib's translation
> Akim Demaille wrote:
>> maybe the translation project could work on top of git now?

No, I'm not going to do that. The TP is geared toward packages that make
releases. So I need a (prerelease) tarball that contains the corresponding
POT file.

Benno
Re: gnulib's translation
On 01-12-18 at 17:00, Bruno Haible wrote:
> Akim has just moved some code from Bison to Gnulib. He writes:
>
>> Also, I feel sorry for Bison's translators when submitting modules to
>> gnulib: it's kind of throwing away their work; it would be great if there
>> were a means to preserve these translations. Maybe the translation project
>> is able to fill the translations of one project based on that of another, I
>> don't know.
>
> What is the recommended procedure, to save translator work, when a number of
> messages have been moved from one domain to another domain (both domains are
> managed by the TP)?

There is no procedure, as such moving of strings across domains is a rare
affair.

But... it should not result in any Bison strings going untranslated, because
Bison will now include the relevant modules from gnulib, and the strings they
contain will still be included in Bison's POT file; they will just be marked
with "#: lib/..." instead of "#: src/...", no?

(When a new gnulib POT file gets submitted, I could do a one-time msgmerge to
"import" translated strings from Bison to gnulib.)

Benno
Re: gnulib's translation
On 02-12-18 at 13:10, Bruno Haible wrote:
> - Thus the interface between the package maintainers and the TP is simpler
>   if the package maintainers submit a tarball to the TP.

Precisely. Furthermore, when following git, how is the translator to know that
a release is approaching and the time has come to update his/her PO file? We
don't want to prod the translators for every little string change.

> Ah, you were assuming that the POT file is stored in the version control
> system?
> This is a practice that produces problems,

What problems does this produce? (Probably this was discussed earlier and
elsewhere? Maybe have a URL of an archived message?)

Benno
Re: [PATCH 1/2] maint.mk: Split long argument lists
On 11/30/18 12:14 PM, Roman Bolshakov wrote:
> May I ask you to review what way we should go with ARG_MAX?
>
> I'm okay with both ways whether it's:
> * computing effective argument length and passing it to "-s" option;
> * or exploiting behaviour of GNU/BSD xargs and specifying "-n" beyond
>   the limit.

Actually, xargs (any implementation of it) takes care of the limit itself;
that's what it is made for.

You would limit the number of args with "-n" if the executed program can only
handle up to that number, or if the logic requires it, e.g. when input comes
in as pairs:

  $ seq 6 | xargs -n2 echo diff -u
  diff -u 1 2
  diff -u 3 4
  diff -u 5 6

There's no need to worry about the other end of the range.
So in your patch, just omit the -n (and getconf).

Have a nice day,
Berny
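(Editorial illustration, not part of the original message.) Why "-n" is only needed for the program's own logic can be seen in a simplified model of how xargs packs arguments: it greedily appends arguments to the current command until one more would exceed a byte budget, and "-n" merely adds a second, count-based cut. The helper `count_batches` below is hypothetical, not code from any xargs implementation; it ignores the bytes taken by the command itself ("echo diff -u") and gives an oversized argument a command of its own, whereas a real xargs would report an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many commands a greedy packer would run for ARGS, given a
   byte budget MAX_BYTES (each argument counted with its terminating NUL)
   and a count limit MAX_ARGS (0 means no count limit, i.e. no -n).  */
size_t
count_batches (const char *const *args, size_t nargs,
               size_t max_bytes, size_t max_args)
{
  assert (max_bytes > 0);
  size_t batches = 0;           /* commands already flushed */
  size_t bytes = 0;             /* bytes in the current command */
  size_t count = 0;             /* arguments in the current command */
  for (size_t i = 0; i < nargs; i++)
    {
      size_t len = strlen (args[i]) + 1;
      if (count > 0
          && (bytes + len > max_bytes
              || (max_args != 0 && count == max_args)))
        {
          batches++;            /* flush the current command */
          bytes = 0;
          count = 0;
        }
      bytes += len;
      count++;
    }
  return batches + (count > 0 ? 1 : 0);
}
```

With the six arguments from `seq 6`, a count limit of 2 yields the three "diff -u" commands shown above, while with no count limit the byte budget alone decides where to split, which is the behaviour the patch can rely on.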