Re: Command line parsing of ls with genparse: inline code
On Mon, Jun 16, 2008 at 09:43:37PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: ... I think emitting into a .h file would be a good solution. But wouldn't it then be better to directly add a genparse file (e.g. ls.gp) instead of extracting it from ls.c? ls.c would no longer have to be modified then. The disadvantage is that there would then be bits of code in ls.gp that depends on declarations in ls.c. I.e., static, file-scoped variables in ls.c would now also appear in ls.gp. That is too distant for my taste. That would also add ~100 more version-controlled *.gp source files. I uploaded a new patch to http://genparse.sourceforge.net/coreutils/genparse-07-18-2008.patch. With that patch the distribution tarball will include ls.in.c which contains the genparse input file at the bottom. Make will automatically extract it and move the generated parser to ls-gp.h which will also be part of the distribution tarball. ls.c will not be distributed but generating it from ls.in.c only requires sed, not genparse. So there is no code duplication in the distribution. Conceptionally genparse is not a build time requirement but actually it still is because the configure-script doesn't yet check if genparse is available. About translations: I suggest that gettext is applied on the generated parser (ls-gp.h), not on the genparse input at the bottom of ls.in.c. On the rest, shall gettext be applied on ls.c or ls.in.c? The difference between ls.in.c and ls.c is only genparse input file which is attached at the end of ls.c as a comment. ls.c is a generated file now. For this reason make distcheck fails with this patch (because it doesn't find ls.c during the translation checks) although make check succeeds. You will need genparse v0.7.7 in order to process the parser in this patch. It is available from http://genparse.sourceforge.net/ as usual. How about this concept? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: truncate.c fails to compile on make distcheck
On Wed, Jun 25, 2008 at 11:39:50PM +0100, Pádraig Brady wrote: There may be nothing wrong with your system, I'm just speculating. For example on 64 bit systems which I don't have access to, there may be a mismatch between intmax_t and off_t which is not obvious to me? Anyway it would be good to confirm that you can infact create a large file using: `truncate -s3G large_file` I could successfully create a 3GB file with `truncate -s3G large_file` using truncate from the sources I compiled yesterday. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
truncate.c fails to compile on make distcheck
Hi, When I'm running make distcheck on the Coreutils I get an error message ... gcc -std=gnu99 -I. -I../lib -I../lib -Werror -ansi -Wno-long-long -MT truncate.o -MD -MP -MF .deps/truncate.Tpo -c -o truncate.o truncate.c cc1: warnings being treated as errors truncate.c: In function 'parse_len': truncate.c:78: error: passing argument 4 of 'xstrtoimax' from incompatible pointer type make[6]: *** [truncate.o] Error 1 ... I'm running Linux on a PC and I compiled the Coreutils from git from this morning bootstrapped with Gnulib also from git from this morning. Is this a bug in the Coreutils or is there something wrong on my system? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: truncate.c fails to compile on make distcheck
On Wed, Jun 25, 2008 at 05:29:57PM +0100, Pádraig Brady wrote: Michael Geng wrote: Hi, When I'm running make distcheck on the Coreutils I get an error message ... gcc -std=gnu99 -I. -I../lib -I../lib -Werror -ansi -Wno-long-long -MT truncate.o -MD -MP -MF .deps/truncate.Tpo -c -o truncate.o truncate.c cc1: warnings being treated as errors truncate.c: In function 'parse_len': truncate.c:78: error: passing argument 4 of 'xstrtoimax' from incompatible pointer type make[6]: *** [truncate.o] Error 1 ... I'm running Linux on a PC and I compiled the Coreutils from git from this morning bootstrapped with Gnulib also from git from this morning. Is this a bug in the Coreutils or is there something wrong on my system? What version of glibc are you using. I make the assumption that OFF_T_MAX = INTMAX_MAX, but this is only true when _FILE_OFFSET_BITS=64 I'll look into fixing that up this evening. I wonder why that is not set in your lib/config.h ? cheers, Pádraig. I'm using glibc version 2.7 (Debian packages libc6 and libc6-dev version 2.7-12). In my lib/config.h in my sources of the Coreutils I have _FILE_OFFSET_BITS set to 64. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: truncate.c fails to compile on make distcheck
On Wed, Jun 25, 2008 at 07:21:51PM +0200, Jim Meyering wrote: Pádraig Brady [EMAIL PROTECTED] wrote: Michael Geng wrote: Hi, When I'm running make distcheck on the Coreutils I get an error message ... gcc -std=gnu99 -I. -I../lib -I../lib -Werror -ansi -Wno-long-long -MT truncate.o -MD -MP -MF .deps/truncate.Tpo -c -o truncate.o truncate.c cc1: warnings being treated as errors truncate.c: In function 'parse_len': truncate.c:78: error: passing argument 4 of 'xstrtoimax' from incompatible pointer type make[6]: *** [truncate.o] Error 1 ... I'm running Linux on a PC and I compiled the Coreutils from git from this morning bootstrapped with Gnulib also from git from this morning. Is this a bug in the Coreutils or is there something wrong on my system? What version of glibc are you using. I make the assumption that OFF_T_MAX = INTMAX_MAX, Humph. I should have caught that in review. Here's a tentative patch: diff --git a/src/truncate.c b/src/truncate.c index 8febd58..52500d2 100644 --- a/src/truncate.c +++ b/src/truncate.c @@ -74,10 +74,21 @@ static int parse_len (char const *str, off_t *size) { enum strtol_error e; - /* OFF_T_MAX = INTMAX_MAX */ - e = xstrtoimax (str, NULL, 10, size, EgGkKmMPtTYZ0); - errno = (e == LONGINT_OVERFLOW) ? EOVERFLOW : 0; - return (e == LONGINT_OK) ? 0 : -1; + intmax_t tmp_size; + e = xstrtoimax (str, NULL, 10, tmp_size, EgGkKmMPtTYZ0); + if (e == LONGINT_OK + !(OFF_T_MIN = tmp_size tmp_size = OFF_T_MAX)) +e = LONGINT_OVERFLOW; + + if (e == LONGINT_OK) +{ + errno = 0; + *size = tmp_size; + return 0; +} + + errno = (e == LONGINT_OVERFLOW ? EOVERFLOW : 0); + return -1; } static void With your patch I could successfully compile the Coreutils and it also passed make distcheck this time. Thank you very much for providing the patch so quickly! Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: truncate.c fails to compile on make distcheck
On Wed, Jun 25, 2008 at 08:19:56PM +0100, Pádraig Brady wrote: Micheal there still is something wrong with your system I think. Did you perhaps compile glibc from scratch and didn't pass a required configure option for Large File Support? I didn't compile glibc myself. So I don't know what's wrong on my system. But never mind, thanks to Jim's quick patch it works now :-) Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: check-AUTHORS fails because of non ansi characters
On Sat, Jun 21, 2008 at 06:10:28PM +0200, Bruno Haible wrote: Michael must be using a locale in ASCII encoding; if it were a Latin1 encoding, the output would have contained a cedilla, not a question mark. I did not have the en_US.UTF8 locale installed. After installing it the problem disappeared. Thanks, Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
check-AUTHORS fails because of non ansi characters
Hi, when I'm doing make distcheck on the present git version it fails with the following message: ... diff authors-actual authors-dotdot rm -f authors-actual authors-dotdot ... 58c58 ptx: Fran?ois Pinard --- ptx: François Pinard ... make[3]: *** [check-AUTHORS] Error 1 ... It clearly has to do with non ansi characters in names. But what exactly is wrong? Does AUTHORS now have to be updated? Or do the source files (ptx.c in this case) have to be updated? Or is there anything wrong with my Linux installation? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Command line parsing of ls with genparse: inline code
On Sat, Jun 14, 2008 at 06:17:30PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: when I posted a patch around Christmas which showed how genparse could generate the parser code for the ping command of the inetutils Alfred Szmidt replied that in that example there are 2 loops (see http://lists.gnu.org/archive/html/bug-inetutils/2007-12/msg7.html): 1. The switch statement which sets the arg_t struct with the results of getopt_long in ping-clp.c 2. A list of if-clauses which sets the already existing variables in ping.c from the arg_t struct I now released a new version of genparse (0.7.6) which allows to put the generated parser code inline of existing code, replacing almost exactly the existing C code. This version has 2 extensions: 1. A __CODE__(statements) directive which allows to let the user define the code which is executed in the switch statement which evaluates the results of getopt_long. 2. A global directive #no_struct. If specified in the genparse file then no arg_t struct struct is generated. I uploaded a patch which shows how this works for the ls command of the coreutils. It transforms ls.c.in which has the genparse file inline into ls.c which has almost exactly the same command line parser code inline as the present version of ls.c. The patch is targeted to today's git version of the coreutils. I verified that it passes make distcheck. Please use the following links: patch: http://genparse.sourceforge.net/coreutils/genparse-ls-inline-git-01-19-2008.patch ls.c.in: http://genparse.sourceforge.net/coreutils/ls.c.in (included in the patch) ls.c http://genparse.sourceforge.net/coreutils/ls.c(included in the patch) Hi Michael, If you're still interested (and willing to work! ;-) I think we're getting close. I hope you understand the delay. We're contemplating a massive change, and I want to be sure I don't end up regretting it. Now that coreutils-6.12 is out, once two remaining points are addressed, then I think we'll finally be ready to switch a few trial programs. Please add #line directives in the emitted C code, so that debuggers know the source file/line-numbers. I'd really like to avoid having to move the primary source for 100+ C files from .c to .c.in files (or .in.c). I'm not sure how best to address that. Here are some thoughts: We might arrange for automake to emit an augmented .c-to-.o rule that invokes genparse along the way (this is where #line directives become essential -- the postprocessed intermediate .c file may not even exist after it's compiled. Note however that that would add genparse as a *build-time* requirement, so isn't really an option. The usual approach to solving this problem is to distribute both primary sources and generated sources (e.g., yacc and lex sources ship with generated .c files), but that would bloat the distribution tarballs. One possibility would be to emit into a .h file the static definitions for functions like Cmdline. Then there would be far less duplication. Jim Hi Jim, I think emitting into a .h file would be a good solution. But wouldn't it then be better to directly add a genparse file (e.g. ls.gp) instead of extracting it from ls.c? ls.c would no longer have to be modified then. How about simply renaming the genparse generated ls-clp.c into a header file, modifying it the same way the Makefile with my last patch already does and including it in ls.c? Here's a drawing of this idea: genparse sed ls.gp -- ls-clp.c - ls-genparse.h This would mean that ls.c becomes more tiny and readable (because the command line part is removed) but ls.gp will be added and in order not to have genparse as a build-time requirement ls-genparse.h will also have to be added to the distribution. ls-clp.c becomes an intermediate file which need not be distributed, I hope that ls-clp.h will not be needed at all. There will be no code duplication except between the genparse file and the generated command line parser. I haven't yet had time to test my idea, I hope I will find time to do that in the next days. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Command line parsing of ls with genparse: inline code
Hi, when I posted a patch around Christmas which showed how genparse could generate the parser code for the ping command of the inetutils Alfred Szmidt replied that in that example there are 2 loops (see http://lists.gnu.org/archive/html/bug-inetutils/2007-12/msg7.html): 1. The switch statement which sets the arg_t struct with the results of getopt_long in ping-clp.c 2. A list of if-clauses which sets the already existing variables in ping.c from the arg_t struct I now released a new version of genparse (0.7.6) which allows to put the generated parser code inline of existing code, replacing almost exactly the existing C code. This version has 2 extensions: 1. A __CODE__(statements) directive which allows to let the user define the code which is executed in the switch statement which evaluates the results of getopt_long. 2. A global directive #no_struct. If specified in the genparse file then no arg_t struct struct is generated. I uploaded a patch which shows how this works for the ls command of the coreutils. It transforms ls.c.in which has the genparse file inline into ls.c which has almost exactly the same command line parser code inline as the present version of ls.c. The patch is targeted to today's git version of the coreutils. I verified that it passes make distcheck. Please use the following links: patch: http://genparse.sourceforge.net/coreutils/genparse-ls-inline-git-01-19-2008.patch ls.c.in: http://genparse.sourceforge.net/coreutils/ls.c.in (included in the patch) ls.c http://genparse.sourceforge.net/coreutils/ls.c(included in the patch) How do you think about it? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Translator comments and other extensions to genparse
Hi Bruno, Jim, Andreas, hello mailing list, I released genparse version 0.7.2 which addresses some issues which we recently discussed in the mailing list. Splitting usage text in smaller pieces == Regarding http://lists.gnu.org/archive/html/bug-coreutils/2007-10/msg00048.html: genparse now has a new command line option -P / --manyprints which causes the description for every command line parameter to be printed in a seperate print command. See example below. Note that genparse also has a global directive #break_lines which allows to break lines in the usage text automatically to a specified width. Translator comments === Release 0.7.2 also has a new directive __COMMENT__ which allows to add comments to the source code which prints the help text. The example below shows how this can be used especially for translator comments. puts instead of printf == Regarding http://lists.gnu.org/archive/html/bug-coreutils/2007-10/msg00087.html: genparse automatically uses the puts command now if possible. An updated version of genparse, updated docu and examples can be accessed from http://genparse.sourceforge.net/ as usual. Below is a modified version of the genparse file for the wc command: #include coreutils.h c / bytes flagprint the byte counts m / chars flagprint the character counts l / lines flagprint the newline counts NONE / files0-from=Fstring read input from the files specified by NUL-terminated names in file F L / max-line-length flag__COMMENT__(TRANSLATORS: the length here is actually the screen width) print the length of the longest line w / words flagprint the word counts #gp_include help_version.gp #usage_begin __COMMENT__(TRANSLATORS: Keep the colons aligned.) Usage: __PROGRAM_NAME__ [OPTION]... [FILE]... or: __PROGRAM_NAME__ [OPTION]...__NEW_PRINT__ Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. With no FILE, or when FILE is -, read standard input.__NEW_PRINT__ __GLOSSARY_GNU__(25) __COMMAND__(emit_bug_reporting_address ()) #usage_end And below is the usage () function of the parser genparse 0.7.2 generates when invoked with genparse --longmembers --internationalize --static-headers --gnulib --manyprints -o wc-clp wc.gp /* ** ** usage () ** ** Print out usage information, then exit ** **--*/ void usage (int status, char *program_name) { if (status != EXIT_SUCCESS) fprintf (stderr, _(Try `%s --help' for more information.\n), program_name); else { /* TRANSLATORS: Keep the colons aligned. */ printf (_(\ Usage: %s [OPTION]... [FILE]...\n\ or: %s [OPTION]...\n), program_name, program_name); puts (_(\ Print newline, word, and byte counts for each FILE, and a total line if\n\ more than one FILE is specified. With no FILE, or when FILE is -,\n\ read standard input.)); puts (_(\ -c, --bytesprint the byte counts)); puts (_(\ -m, --charsprint the character counts)); puts (_(\ -l, --linesprint the newline counts)); puts (_(\ --files0-from=Fread input from the files specified by\n\ NUL-terminated names in file F)); /* TRANSLATORS: the length here is actually the screen width */ puts (_(\ -L, --max-line-length print the length of the longest line)); puts (_(\ -w, --wordsprint the word counts)); puts (_(\ --help display this help and exit)); puts (_(\ --version output version information and exit)); emit_bug_reporting_address (); } exit (status); } Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Translator comments and other extensions to genparse
On Sat, Oct 13, 2007 at 04:01:09PM +0200, Bruno Haible wrote: Michael Geng wrote: I released genparse version 0.7.2 which addresses some issues which we recently discussed in the mailing list. Thank you very much! The only remaining question is: Why is --manyprints not implied by --internationalize? Or why does the doc of --internationalize not say: When you use this option, the option --manyprints is also recommended.? Bruno Is there really consensus that in future every command line parameter shall be output in a seperate print command? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls, tail and wc with genparse
On Sat, Oct 06, 2007 at 05:35:43PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: ... file you extracted above is from the wc command. You can watch the genparse generated parser for it from http://genparse.sourceforge.net/examples/wc_clp.c. It's nice to see the continuing improvements. I noticed that you transformed the uses of fputs in wc's usage function into uses of fprintf in your generated wc_clp.c. That is probably ok in most cases, but in some (with a string containing %), it's not -- unless you escape them. Besides, I switched from fprintf to fputs for a reason: so that I (and translators) don't have to worry about such escaping. In addition, I don't mind that fputs is lighter-weight than fprintf. Is there a special reason why you use fprintf (... , stdout) and fputs (stdout, ...) instead of printf and puts? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls, tail and wc with genparse
On Sat, Oct 06, 2007 at 05:09:15PM +0200, Bruno Haible wrote: Michael Geng wrote: My intention is in fact to invoke xgettext on the parser files which genparse generates. ... You can watch the genparse generated parser for it from http://genparse.sourceforge.net/examples/wc_clp.c. OK, this kind of generated code is not bad; it's acceptable to have translators look at it. But there are still two problems: 1) The gettext documentation [1] recommends to not make the _() arguments unnecessarily large. printf (_(\ --files0-from=Fread input from the files specified by\n\ NUL-terminated names in file F\n\ -L, --max-line-length print the length of the longest line\n\ -w, --wordsprint the word counts\n)); This is too large. When the maintainer adds another option, the translator should not have to re-proofread the 3 existing options. So you should better generate code like this: The text is partitioned exactly as it is in the existing code of tail.c (I'm looking at a cvs archive copy from sept 9). This is from tail.c: fputs (_(\ --files0-from=Fread input from the files specified by\n\ NUL-terminated names in file F\n\ -L, --max-line-length print the length of the longest line\n\ -w, --wordsprint the word counts\n\ ), stdout); - printf (_(\ --files0-from=Fread input from the files specified by\n\ NUL-terminated names in file F\n)); printf (_(\ -L, --max-line-length print the length of the longest line\n)); printf (_(\ -w, --wordsprint the word counts\n)); Genparse could easily generate the above code if you added more __NEW_PRINT__ directives to the genparse file like this: NONE / files0-from=Fstring read input from the files specified by NUL-terminated names in file F__NEW_PRINT__ L / max-line-length flagprint the length of the longest line__NEW_PRINT__ w / words flagprint the word counts__NEW_PRINT__ 2) What about translator comments? How can a maintainer provide additional information for the translators? Does this work? /* TRANSLATORS: the length here is actually the screen width */ L / max-line-length flagprint the length of the longest line /* TRANSLATORS: Keep the colons aligned. */ Usage: __PROGRAM_NAME__ [OPTION]... [FILE]... or: __PROGRAM_NAME__ [OPTION]... --files0-from=F__NEW_PRINT__ Bruno [1] http://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html That could be a reason for a new directive in the genparse file. When I grep for TRANSLATORS in the coreutils src/ directory I can find only 3 hits however and none of them is directly in the command line parsing code. system.h contains a TRANSLATORS comment in the emit_bug_reporting_address () function but that can stay as it is because genparse only instanstiates the call to emit_bug_reporting_address () (see e.g. http://genparse.sourceforge.net/examples/wc_clp.c). Do you see more examples of translator comments in the coreutils code for which you think genparse should handle them? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls, tail and wc with genparse
On Sat, Oct 06, 2007 at 05:35:43PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: ... file you extracted above is from the wc command. You can watch the genparse generated parser for it from http://genparse.sourceforge.net/examples/wc_clp.c. It's nice to see the continuing improvements. I noticed that you transformed the uses of fputs in wc's usage function into uses of fprintf in your generated wc_clp.c. That is probably ok in most cases, but in some (with a string containing %), it's not -- unless you escape them. Besides, I switched from fprintf to fputs for a reason: so that I (and translators) don't have to worry about such escaping. In addition, I don't mind that fputs is lighter-weight than fprintf. Genparse could issue fputs () calls whenever the printed string doesn't contain arguments. I will probably change this in one of the next genparse releases. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls, tail and wc with genparse
On Sun, Oct 07, 2007 at 02:48:43PM +0200, Bruno Haible wrote: Michael Geng wrote: 1) The gettext documentation [1] recommends to not make the _() arguments unnecessarily large. The text is partitioned exactly as it is in the existing code of tail.c (I'm looking at a cvs archive copy from sept 9). This is from tail.c: fputs (_(\ --files0-from=Fread input from the files specified by\n\ NUL-terminated names in file F\n\ -L, --max-line-length print the length of the longest line\n\ -w, --wordsprint the word counts\n\ ), stdout); This is too large already now in tail.c. I haven't told Jim about it in the past because there are dozens of program maintainers who would have to change hundreds of usage messages manually. The difference is that you are writing a tool, and your tool already knows where the description of each option starts and ends. Genparse could easily generate the above code if you added more __NEW_PRINT__ directives to the genparse file like this: NONE / files0-from=Fstring read input from the files specified by NUL-terminated names in file F__NEW_PRINT__ L / max-line-length flagprint the length of the longest line__NEW_PRINT__ w / words flagprint the word counts__NEW_PRINT__ Why should the maintainer have to write __NEW_PRINT__ everywhere here? It's clear from the input where the description of each option ends. Your tool can add __NEW_PRINT__ implicitly there. I didn't know that it is desired to have the description for every command line argument in a seperate string. I could add a new command line switch to genparse which automatically splits text this way. 2) What about translator comments? How can a maintainer provide additional information for the translators? Does this work? When I grep for TRANSLATORS in the coreutils src/ directory I can find only 3 hits however ... Do you see more examples of translator comments in the coreutils code for which you think genparse should handle them? This is not a strong argument. Bruno I understand that you don't want to loose the possibility to add such translator comments. Maybe I will add a new directive to the genparse file for it in one of the next versions. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls, tail and wc with genparse
On Sun, Oct 07, 2007 at 09:47:31AM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: ... The text is partitioned exactly as it is in the existing code of tail.c (I'm looking at a cvs archive copy from sept 9). This is from tail.c: I think tail.c is still in sync, but... Please don't use cvs anymore (use git instead). Something went wrong with a git-to-cvs commit-export, so that at least the cvs ChangeLog is out of sync with the upstream git repository. I'll fix it eventually, but the point is you shouldn't rely on cvs being up to date. I retrieved the git version today and bootstrapped it with an updated cvs version of Gnulib. Unfortunately I still have problems to compile it. It stops with the following messages: ... gcc -std=gnu99 -I. -I../lib -I../lib -g -O2 -MT cp.o -MD -MP -MF .deps/cp.Tpo -c -o cp.o cp.c mv -f .deps/cp.Tpo .deps/cp.Po gcc -std=gnu99 -I. -I../lib -I../lib -g -O2 -MT copy.o -MD -MP -MF .deps/copy.Tpo -c -o copy.o copy.c copy.c:42:22: error: file-set.h: No such file or directory copy.c:48:25: error: hash-triple.h: No such file or directory copy.c: In function 'dest_info_init': copy.c:908: error: 'triple_hash' undeclared (first use in this function) copy.c:908: error: (Each undeclared identifier is reported only once copy.c:908: error: for each function it appears in.) copy.c:909: error: 'triple_compare' undeclared (first use in this function) copy.c:910: error: 'triple_free' undeclared (first use in this function) copy.c: In function 'src_info_init': copy.c:930: error: 'triple_hash_no_name' undeclared (first use in this function) copy.c:931: error: 'triple_compare' undeclared (first use in this function) copy.c:932: error: 'triple_free' undeclared (first use in this function) copy.c: In function 'copy_internal': copy.c:1040: warning: implicit declaration of function 'seen_file' copy.c:1047: warning: implicit declaration of function 'record_file' make[2]: *** [copy.o] Error 1 make[2]: Leaving directory `/home/linux/tmp/coreutils-git-10-07-2007/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/home/linux/tmp/coreutils-git-10-07-2007/src' make: *** [all-recursive] Fehler 1 ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
tail.c: variable pid not initialized
Hi, I don't understand where the static variable pid in tail.c gets initialized when tail is invoked without --pid option. I'm looking at the latest version in the cvs archive. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls with genparse
On Wed, Aug 29, 2007 at 11:39:46PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: ... of text. In the present version of ls the usage() function calls fputs() several times. The genparse version prints everything in 1 single call to printf(). So the usage() text in the present ls.c is split into multiple _() macros, whereas ls-clp.c uses 1 single _() macro for the whole help screen. Consider separating it into strings no longer than 509 bytes each and printing them separately. That's a portability limitation imposed by some c89 compilers. (see gcc's -Woverlength-strings) If you were to run make distcheck, this and some other problems would be exposed. For example, you added at least one function that was not declared static. With your changes does make check still pass? Yes, I verified that that it passes make check. make distcheck is new to me however, I haven't yet tried that. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls with genparse
On Tue, Aug 28, 2007 at 09:08:51PM -0600, Eric Blake wrote: According to Michael Geng on 8/28/2007 12:33 PM: In the present version of genparse new strings are always printed in new lines. For example (also from the ls commmand): d / directory flaglist directory entries instead of contents, and do not dereference symbolic links Why not make genparse a bit smarter, and let the user supply free-form text as the option description. Genparse should then wrap it to fit an 80-column screen before generating the resulting usage() in the .c file. Then the above example would simply be: d / directory flag \ list directory entries instead of contents, and do not dereference symbolic links with the __GNU_GLOSSARY__(29) being the formatting hint of where the auto-wrapping should occur in the output English text. I think that's a good idea. How about adding a --linebreak[=width] command line switch to genparse which enables breaking lines on the help screen automatically to the specified width or 80 columns if --linebreak is given without argument? Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls with genparse
On Tue, Aug 28, 2007 at 01:21:46PM -0700, Eric Blake-1 wrote: 2. ls.c depends ls-clp.h (the generated parser) ls-clp.h depends on ls.gp (the genparse file) ls.gp depends on ls.c because ls.gp is embedded as a comment in ls.c - There is a circular dependency! That seems wrong to me. Isn't it really: ls$(EXEEXT) directly depends on ls.o and ls-clp.o ls.o directly depends on ls.c and ls-clp.h ls-clp.o directly depends on ls-clp.c and ls-clp.h ls-clp.c directly depends on ls.gp ls-clp.h directly depends on ls.gp ls.gp directly depends on ls.c No cycle there, even though ls.c is an indirect dependency of ls$(EXEEXT) through more than one leg of the transitive closure. You are right. I verified this and it builds properly with the following modifications on src/Makefile.am: --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-08-29 21:14:29.0 +0200 @@ -48,7 +48,7 @@ EXTRA_DIST = dcgen dircolors.hin tac-pipe.c \ groups.sh wheel-gen.pl extract-magic c99-to-c89.diff BUILT_SOURCES = -CLEANFILES = $(SCRIPTS) su +CLEANFILES = $(SCRIPTS) su *.gp *-clp.c *-clp.h AM_CPPFLAGS = -I$(top_srcdir)/lib @@ -185,14 +185,16 @@ __SOURCES = lbracket.c cp_SOURCES = cp.c copy.c cp-hash.c -dir_SOURCES = ls.c ls-dir.c -vdir_SOURCES = ls.c ls-vdir.c -ls_SOURCES = ls.c ls-ls.c +dir_SOURCES = ls.c ls-dir.c ls-clp.c ls-clp.h +vdir_SOURCES = ls.c ls-vdir.c ls-clp.c ls-clp.h +ls_SOURCES = ls.c ls-ls.c ls-clp.c ls-clp.h chown_SOURCES = chown.c chown-core.c chgrp_SOURCES = chgrp.c chown-core.c mv_SOURCES = mv.c copy.c cp-hash.c remove.c rm_SOURCES = rm.c remove.c +tail_SOURCES = tail.c tail-clp.c +wc_SOURCES = wc.c wc-clp.c md5sum_SOURCES = md5sum.c md5sum_CPPFLAGS = -DHASH_ALGO_MD5=1 $(AM_CPPFLAGS) @@ -363,3 +365,14 @@ | grep -Ev -f $$t \ { echo 'the above variables should have static scope' 12; \ exit 1; } || : + +ls.$(OBJEXT): ls-clp.c ls-clp.h +tail.$(OBJEXT): ls-clp.c tail-clp.h +wc.$(OBJEXT): ls-clp.c wc-clp.h + +%-clp.c %-clp.h: %.gp + genparse --longmembers --internationalize -o $(*F)-clp $ + +%.gp: %.c + sed -n -e '/genparse file starts here/,/genparse file ends here/p' $(*F).c | \ + sed -e '/genparse file ends here/d' -n -e '2,$$p' $@ Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls with genparse
On Tue, Aug 28, 2007 at 01:21:46PM -0700, Eric Blake-1 wrote: +++ coreutils-6.9/src/ls.c 2007-08-26 19:58:20.0 +0200 @@ -76,7 +76,6 @@ # define SA_RESTART 0 #endif -#include system.h #include fnmatch.h Why are you deleting this include? Without it, how do you ensure that config.h is pulled in before anything else? If you intend for ls-clp.h to fill this role, then it must be included before any system files. Also, are you sure you are not falling foul of any 'make distcheck' rules in Makefile.maint? I need the following definitions in ls-clp.c: 1. the i18n macro _() 2. the definition of PACKAGE_BUGREPORT 3. the definition of true and false I got everything by including system.h in ls-clp.c. Unfortunately I had to exclude it from ls.c then because there were duplicate definitons. I entered this when I wrote the patch for the wc command and at that time I was happy to get it compiled, that's all. I'm sure there is a better solution. Maybe I have to include other files. I agree that this has to be fixed. + Cmdline(cmdline, argc, argv); GNU coding standards want a space between the function name and open (. Right, thanks. +/* Extract the following section an process it with genparse + (see http://genparse.sourceforge.net) in order to generate a parser + for the command line arguments and a usage function for printing a help + screen. */ + +/* genparse file starts here +#include config.h +#include system.h +#include ls.h + +#exit_value LS_FAILURE I know the C standard requires this, but in practice, are all C preprocessors tolerant of comments that contain lines that look like preprocessing directives but which are not? That's potentially another drawback of embedding the genparse files in the C sources. +NONE / helpflagdisplay this help and exit +NONE / version flagoutput version information and exit It looks like one drawback of using genparse is that you lose the system.h magic that ensures consistency between all the apps with --help and --version, since you can't really use the preprocessor macros *_HELP_OPTION_* here. I could imagine that this can be solved by adding the capability to include parameter definitions in a genparse file, i.e. include genparse files in other genparse files. There could be a shared genparse file with the parameter definitions for help and version which could be included by all other genparse files. +Report bugs to __STRING__(PACKAGE_BUGREPORT). What happened to the TRANSLATOR comment that reminds them to add a second line, including the address to report translation bugs to? Also, it isn't very obvious how this will affect xgettext extraction of strings that need translation. Are you sure you haven't broken things for other locales? Would the generated ls-clp.c need to be added to POTFILES.in, or is your intent still to have all translatable strings reside in ls.c? If I understood the i18n mechanism right then the C preprocesor is needed for the _() macros to take effect. So the genparse files can't be translated directly, even if they are embedded in C files because they are still inside of a comment. So I think ls-clp.c would have to be added to POTFILES.in. I haven't investigated how a genparse based solution affects i18n and I generally have very view experiance with i18n. I would expect that problems are caused by different partitioning of text. In the present version of ls the usage() function calls fputs() several times. The genparse version prints everything in 1 single call to printf(). So the usage() text in the present ls.c is split into multiple _() macros, whereas ls-clp.c uses 1 single _() macro for the whole help screen. Do you agree that this is the main source of trouble? Do you see other problems? I haven't fully thought this through but I think I could change genparse such that the user can control when a new print command should start thus giving control of partitioning translatable text to the user. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: [PATCH] Command line parsing of ls with genparse
On Tue, Aug 28, 2007 at 10:48:55AM -0700, Eric Blake-1 wrote: 1. The usage function for the ls command has the following explanation for the -F, --classify command liune switch: append indicator (one of */=@|) to entries. Can you rewrite it as append indicator (one of * /=@|) to entries, and rely on string concatenation to produce the same output? -- Eric Blake In the present version of genparse new strings are always printed in new lines. For example (also from the ls commmand): d / directory flaglist directory entries instead of contents, and do not dereference symbolic links in the genparse (.gp) file is translated into -d, --directorylist directory entries instead of contents,\n\ and do not dereference symbolic links\n\ in the generated usage() function. Or NONE / group-directories-first flag group directories before files could be used to generate --group-directories-first\n\ group directories before files\n\ in the usage() function. So presently it doesn't work, but your idea is not bad. I will think about it again and maybe change genparse such that multiple comment strings in the same line will simply be concatenated. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
[PATCH] Command line parsing of tail with genparse
Hi Jim, hello everybody, genparse version 0.6.8 has support for command line parameters with optional arguments now. This was the most important missing feature for generating the command line parsing code for the tail command. I focused on the tail command because in a previous email you suggested tail as a candidate for genparse. The attached patch contains the genparse extensions for both the wc command (as previously posted) and the tail command. I verified that it successfully passes make check. I also added tail to the examples on the genparse homepage (http://genparse.sourceforge.net/examples.html) so that you can have a look at the generated code without downloading and building the latest version of genparse. As usual: Any comments welcome! Michael diff -u -N -r coreutils-6.9.orig/doc/Makefile.am coreutils-6.9/doc/Makefile.am --- coreutils-6.9.orig/doc/Makefile.am 2007-03-18 22:36:43.0 +0100 +++ coreutils-6.9/doc/Makefile.am 2007-08-19 13:45:25.0 +0200 @@ -32,10 +32,10 @@ # old systems. AM_MAKEINFOFLAGS = --no-split -constants.texi: $(top_srcdir)/src/tail.c +constants.texi: $(top_srcdir)/src/tail.h LC_ALL=C \ sed -n -e 's/^#define \(DEFAULT_MAX[_A-Z]*\) \(.*\)/@set \1 \2/p' \ - $(top_srcdir)/src/tail.c t-$@ + $(top_srcdir)/src/tail.h t-$@ mv t-$@ $@ MAINTAINERCLEANFILES = constants.texi diff -u -N -r coreutils-6.9.orig/doc/Makefile.in coreutils-6.9/doc/Makefile.in --- coreutils-6.9.orig/doc/Makefile.in 2007-03-22 22:20:21.0 +0100 +++ coreutils-6.9/doc/Makefile.in 2007-08-19 13:46:04.0 +0200 @@ -937,10 +937,10 @@ uninstall-ps-am -constants.texi: $(top_srcdir)/src/tail.c +constants.texi: $(top_srcdir)/src/tail.h LC_ALL=C \ sed -n -e 's/^#define \(DEFAULT_MAX[_A-Z]*\) \(.*\)/@set \1 \2/p' \ - $(top_srcdir)/src/tail.c t-$@ + $(top_srcdir)/src/tail.h t-$@ mv t-$@ $@ $(DVIS): $(EXTRA_DIST) diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-08-19 13:46:56.0 +0200 @@ -363,3 +363,12 @@ | grep -Ev -f $$t \ { echo 'the above variables should have static scope' 12; \ exit 1; } || : + +wc_SOURCES = wc_clp.c wc.c +wc_clp.c: wc.gp + genparse --longmembers --internationalize -o wc_clp wc.gp + + +tail_SOURCES = tail_clp.c tail.c +tail_clp.c: tail.gp + genparse --longmembers --internationalize -o tail_clp tail.gp diff -u -N -r coreutils-6.9.orig/src/Makefile.in coreutils-6.9/src/Makefile.in --- coreutils-6.9.orig/src/Makefile.in 2007-03-22 22:23:20.0 +0100 +++ coreutils-6.9/src/Makefile.in 2007-08-19 13:47:14.0 +0200 @@ -569,8 +569,8 @@ tac_LDADD = $(LDADD) tac_DEPENDENCIES = ../lib/libcoreutils.a $(am__DEPENDENCIES_1) \ ../lib/libcoreutils.a -tail_SOURCES = tail.c -tail_OBJECTS = tail.$(OBJEXT) +am_tail_OBJECTS = tail_clp.$(OBJEXT) tail.$(OBJEXT) +tail_OBJECTS = $(am_tail_OBJECTS) tail_DEPENDENCIES = $(am__DEPENDENCIES_3) tee_SOURCES = tee.c tee_OBJECTS = tee.$(OBJEXT) @@ -636,8 +636,8 @@ vdir_OBJECTS = $(am_vdir_OBJECTS) vdir_DEPENDENCIES = $(am__DEPENDENCIES_2) $(am__DEPENDENCIES_1) \ $(am__DEPENDENCIES_1) $(am__DEPENDENCIES_1) -wc_SOURCES = wc.c -wc_OBJECTS = wc.$(OBJEXT) +am_wc_OBJECTS = wc_clp.$(OBJEXT) wc.$(OBJEXT) +wc_OBJECTS = $(am_wc_OBJECTS) wc_LDADD = $(LDADD) wc_DEPENDENCIES = ../lib/libcoreutils.a $(am__DEPENDENCIES_1) \ ../lib/libcoreutils.a @@ -677,10 +677,10 @@ rmdir.c seq.c setuidgid.c $(sha1sum_SOURCES) \ $(sha224sum_SOURCES) $(sha256sum_SOURCES) $(sha384sum_SOURCES) \ $(sha512sum_SOURCES) shred.c shuf.c sleep.c sort.c split.c \ - stat.c stty.c su.c sum.c sync.c tac.c tail.c tee.c test.c \ - touch.c tr.c true.c tsort.c tty.c uname.c unexpand.c uniq.c \ - unlink.c uptime.c users.c $(vdir_SOURCES) wc.c who.c whoami.c \ - yes.c + stat.c stty.c su.c sum.c sync.c tac.c $(tail_SOURCES) tee.c \ + test.c touch.c tr.c true.c tsort.c tty.c uname.c unexpand.c \ + uniq.c unlink.c uptime.c users.c $(vdir_SOURCES) $(wc_SOURCES) \ + who.c whoami.c yes.c DIST_SOURCES = $(__SOURCES) base64.c basename.c cat.c $(chgrp_SOURCES) \ chmod.c $(chown_SOURCES) chroot.c cksum.c comm.c $(cp_SOURCES) \ csplit.c cut.c date.c dd.c df.c $(dir_SOURCES) dircolors.c \ @@ -693,10 +693,10 @@ rmdir.c seq.c setuidgid.c $(sha1sum_SOURCES) \ $(sha224sum_SOURCES) $(sha256sum_SOURCES) $(sha384sum_SOURCES) \ $(sha512sum_SOURCES) shred.c shuf.c sleep.c sort.c split.c \ - stat.c stty.c su.c sum.c sync.c tac.c tail.c tee.c test.c \ - touch.c tr.c true.c tsort.c tty.c uname.c unexpand.c uniq.c \ - unlink.c uptime.c users.c $(vdir_SOURCES) wc.c who.c whoami.c \ -
Re: [PATCH] Command line parsing of wc with genparse
On Mon, Jul 16, 2007 at 09:55:47PM +0200, Michael Geng wrote: On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: I released version 0.6.6 of genparse which supports internationalization now. A link to the download page and to the updated documentation can be accessed from the genparse project page http://genparse.sourceforge.net/. I also prepared a patch which demonstrates how genparse could be used for parsing the command line arguments of the wc command. The patch is relative to the coreutils version 6.9. In order to compile the coreutils after you applied the patch you will need genparse version 0.6.6 in your path. Does this approach appear reasonable to you? What else should be improved on genparse? I took a quick look only at the patch (not at the code genparse generates) and spotted some minor problems: + if (cmdline.version) +{ + version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS, Here, it should be PROGRAM_NAME, not program_name. I changed it to PROGRAM_NAME. Thanks. diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp --- coreutils-6.9.orig/src/wc.gp 1970-01-01 01:00:00.0 +0100 +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200 @@ -0,0 +1,24 @@ ... +Report bugs to [EMAIL PROTECTED]. +#usage_end s/fileutils/coreutils/ or, perhaps better: PACKAGE_BUGREPORT Again I released a new version of genparse (0.6.7) which now allows to include string macros into the usage() function. So genparse is now able to use the PACKAGE_BUGREPORT macro. Did you ensure that wc still passes make check? I did this now - and I found a bug: wc -l doesn't work because I didn't assign the print_lines variable. That's not a bug in genparse but in my hand written code in wc.c. But after fixing this make check still fails because the usage() function doesn't print a proper PACKAGE_BUGREPORT as you also found. I 'm attaching an updated patch which passes make check. You need genpare version 0.6.7 in your path in order to compile coreutils after you applied the patch. Make sure valgrind wc ... shows no leaks, too. In particular, test that with the --files0-from=F option. valgrind wc --files0-from=F tells me something about 3 not-freed blocks. But it gives exactly the same message if I'm executing valgrind wc --files0-from=F from a coreutils build from the original sources of version 6.9. Is there a known memory leak with the --files0-from=F option of wc? I tried vagrind wc ... with some other options but it didn't report any errors except for the --files0-from option. Is there any significant difference in performance (I doubt it) or binary size? I inspected the wc which was generated using genparse with the original one and found that the .text segment increased by 16 bytes. BTW, how can I build the coreutils without debug information? configure CFLAGS=-O2 -g0 doesn't remove debug information completely. Would you build coreutils differently for comparing binary file sizes? How exactly would you compare binary file sizes? I think comparing the .text segment size is not enough. I suggest you also adapt a program that takes integer arguments. Can genparse handle arguments of all different types, from int through uintmax_t? Presently genparse is able to handle int, float, char, string or flag types. No unsigned int for example. That's another extension which I will have to add. tail looks like a good candidate. It also had an optional argument: --follow[={name|descriptor}] There are quite a few programs in the coreutils that have unusual argument-parsing constraints, so it's hard to point to a small subset and say if you adapt just these few, then that should be proof enough. I'm afraid that if you're interested in having coreutils convert, you'll have to do most of the work, and then demonstrate that there's no resulting regression (and presumably cleaner code). For now, let's start with tail, and maybe ls if you're ambitious :) That's probably a good path to move on, but now that I started with the wc command I wanted to get this one working first. Another detail: IMHO, the contents of your wc.gp file should reside in wc.c. Might as well leave it in a delimited comment in place of the usage function you're eliminating, then have a makefile rule use sed or awk to extract it into wc.gp. Could be done this way. Michael diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-07-10 18:17:23.0 +0200 @@ -363,3 +363,7 @@ | grep -Ev -f $$t \ { echo 'the above variables should have
Re: [PATCH] Command line parsing of wc with genparse
On Mon, Jul 16, 2007 at 10:02:13AM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: I released version 0.6.6 of genparse which supports internationalization now. A link to the download page and to the updated documentation can be accessed from the genparse project page http://genparse.sourceforge.net/. I also prepared a patch which demonstrates how genparse could be used for parsing the command line arguments of the wc command. The patch is relative to the coreutils version 6.9. In order to compile the coreutils after you applied the patch you will need genparse version 0.6.6 in your path. Does this approach appear reasonable to you? What else should be improved on genparse? I took a quick look only at the patch (not at the code genparse generates) and spotted some minor problems: + if (cmdline.version) +{ + version_etc (stdout, program_name, GNU_PACKAGE, VERSION, AUTHORS, Here, it should be PROGRAM_NAME, not program_name. ... diff -u -N -r coreutils-6.9.orig/src/wc.gp coreutils-6.9/src/wc.gp --- coreutils-6.9.orig/src/wc.gp1970-01-01 01:00:00.0 +0100 +++ coreutils-6.9/src/wc.gp 2007-07-10 18:17:33.0 +0200 @@ -0,0 +1,24 @@ ... +Report bugs to [EMAIL PROTECTED]. +#usage_end s/fileutils/coreutils/ or, perhaps better: PACKAGE_BUGREPORT Did you ensure that wc still passes make check? I did this now - and I found a bug: wc -l doesn't work because I didn't assign the print_lines variable. That's not a bug in genparse but in my hand written code in wc.c. But after fixing this make check still fails because the usage() function doesn't print a proper PACKAGE_BUGREPORT as you also found. Make sure valgrind wc ... shows no leaks, too. In particular, test that with the --files0-from=F option. Is there any significant difference in performance (I doubt it) or binary size? I suggest you also adapt a program that takes integer arguments. Can genparse handle arguments of all different types, from int through uintmax_t? tail looks like a good candidate. It also had an optional argument: --follow[={name|descriptor}] There are quite a few programs in the coreutils that have unusual argument-parsing constraints, so it's hard to point to a small subset and say if you adapt just these few, then that should be proof enough. I'm afraid that if you're interested in having coreutils convert, you'll have to do most of the work, and then demonstrate that there's no resulting regression (and presumably cleaner code). For now, let's start with tail, and maybe ls if you're ambitious :) Another detail: IMHO, the contents of your wc.gp file should reside in wc.c. Might as well leave it in a delimited comment in place of the usage function you're eliminating, then have a makefile rule use sed or awk to extract it into wc.gp. I will need some time to think about all the issues you mentioned and find solutions. I appreciate your constructive commments! Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
[PATCH] Command line parsing of wc with genparse
Hi, I released version 0.6.6 of genparse which supports internationalization now. A link to the download page and to the updated documentation can be accessed from the genparse project page http://genparse.sourceforge.net/. I also prepared a patch which demonstrates how genparse could be used for parsing the command line arguments of the wc command. The patch is relative to the coreutils version 6.9. In order to compile the coreutils after you applied the patch you will need genparse version 0.6.6 in your path. Does this approach appear reasonable to you? What else should be improved on genparse? Michael diff -u -N -r coreutils-6.9.orig/src/Makefile.am coreutils-6.9/src/Makefile.am --- coreutils-6.9.orig/src/Makefile.am 2007-03-20 08:24:27.0 +0100 +++ coreutils-6.9/src/Makefile.am 2007-07-10 18:17:23.0 +0200 @@ -363,3 +363,7 @@ | grep -Ev -f $$t \ { echo 'the above variables should have static scope' 12; \ exit 1; } || : + +wc_SOURCES = wc_clp.c wc.c +wc_clp.c: wc.gp + genparse --longmembers --internationalize -o wc_clp wc.gp diff -u -N -r coreutils-6.9.orig/src/wc.c coreutils-6.9/src/wc.c --- coreutils-6.9.orig/src/wc.c 2007-03-18 22:36:43.0 +0100 +++ coreutils-6.9/src/wc.c 2007-07-10 18:49:27.0 +0200 @@ -24,13 +24,13 @@ #include getopt.h #include sys/types.h -#include system.h #include error.h #include inttostr.h #include quote.h #include readtokens0.h #include safe-read.h #include wcwidth.h +#include wc_clp.h #if !defined iswspace !HAVE_ISWSPACE # define iswspace(wc) \ @@ -77,60 +77,6 @@ struct stat st; }; -/* For long options that have no equivalent short option, use a - non-character as a pseudo short option, starting with CHAR_MAX + 1. */ -enum -{ - FILES0_FROM_OPTION = CHAR_MAX + 1 -}; - -static struct option const longopts[] = -{ - {bytes, no_argument, NULL, 'c'}, - {chars, no_argument, NULL, 'm'}, - {lines, no_argument, NULL, 'l'}, - {words, no_argument, NULL, 'w'}, - {files0-from, required_argument, NULL, FILES0_FROM_OPTION}, - {max-line-length, no_argument, NULL, 'L'}, - {GETOPT_HELP_OPTION_DECL}, - {GETOPT_VERSION_OPTION_DECL}, - {NULL, 0, NULL, 0} -}; - -void -usage (int status) -{ - if (status != EXIT_SUCCESS) -fprintf (stderr, _(Try `%s --help' for more information.\n), -program_name); - else -{ - printf (_(\ -Usage: %s [OPTION]... [FILE]...\n\ - or: %s [OPTION]... --files0-from=F\n\ -), - program_name, program_name); - fputs (_(\ -Print newline, word, and byte counts for each FILE, and a total line if\n\ -more than one FILE is specified. With no FILE, or when FILE is -,\n\ -read standard input.\n\ - -c, --bytesprint the byte counts\n\ - -m, --charsprint the character counts\n\ - -l, --linesprint the newline counts\n\ -), stdout); - fputs (_(\ - --files0-from=Fread input from the files specified by\n\ - NUL-terminated names in file F\n\ - -L, --max-line-length print the length of the longest line\n\ - -w, --wordsprint the word counts\n\ -), stdout); - fputs (HELP_OPTION_DESCRIPTION, stdout); - fputs (VERSION_OPTION_DESCRIPTION, stdout); - printf (_(\nReport bugs to %s.\n), PACKAGE_BUGREPORT); -} - exit (status); -} - /* FILE is the name of the file (or NULL for standard input) associated with the specified counters. */ static void @@ -580,6 +526,7 @@ char *files_from = NULL; struct fstatus *fstatus; struct Tokens tok; + struct arg_t cmdline; initialize_main (argc, argv); program_name = argv[0]; @@ -589,44 +536,27 @@ atexit (close_stdout); - print_lines = print_words = print_chars = print_bytes = false; - print_linelength = false; total_lines = total_words = total_chars = total_bytes = max_line_length = 0; - while ((optc = getopt_long (argc, argv, clLmw, longopts, NULL)) != -1) -switch (optc) - { - case 'c': - print_bytes = true; - break; - - case 'm': - print_chars = true; - break; - - case 'l': - print_lines = true; - break; - - case 'w': - print_words = true; - break; - - case 'L': - print_linelength = true; - break; - - case FILES0_FROM_OPTION: - files_from = optarg; - break; - - case_GETOPT_HELP_CHAR; - - case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS); - - default: - usage (EXIT_FAILURE); - } + Cmdline(cmdline, argc, argv); + print_bytes = cmdline.bytes; + print_chars = cmdline.lines; + print_words = cmdline.words; + print_linelength = cmdline.max_line_length; + files_from = cmdline.files0_from; + print_bytes = cmdline.bytes; + print_bytes = cmdline.bytes; + print_bytes = cmdline.bytes; + + if (cmdline.help) +usage (EXIT_SUCCESS,
Re: Simplifying command line parsing with Genparse
On Sat, Jun 16, 2007 at 07:23:09PM +0200, Michael Geng wrote: On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: However, before I even consider it seriously, it'll need some improvements: - it must detect any and all write failures[*] I just added a new release of genparse (http://sourceforge.net/project/showfiles.php?group_id=4341package_id=4354release_id=516584) which now also checks file write and close operations, not only file open operations as it was before. - packaging so that I can run ./configure make make check, and if I don't happen to have cppunit infrastructure, gcj or something similar, it should tell me about it directly, rather than causing harder-to-diagnose build failures. I went through all these tools and updated configure.ac (and related files) such that you get a warning if something is missing but you can still compile genparse. For example if you don't have gcj installed then it automatically removes the java example from the list of build targets. It now checks for cunit, cppunit, junit, gcj, doxygen, texi2html, texi2pdf and man2html. - one of the generated .c files I looked at calls strdup but doesn't check for a NULL return value or (less important) attempt to avoid the leak when the corresponding --backup=S option appears twice or more. I'm working on that and I'll let you know when this is finished. I again released a new version of genparse (0.6.5) which doesn't allocate any memory on the heap any more because I think this is a better concept. So it doesn't generate any calls to strdup() or malloc() any more. As a consequence the parser function changed from struct arg_t *Cmdline(int argc, char *argv[]) to void Cmdline(struct arg_t *my_args, int argc, char *argv[]). See the Genparse examples page (http://genparse.sourceforge.net/examples.html) which is also updated. - not exactly essential, but highly recommended: it should work with the latest autoconf and automake I'm using autoconf version 2.59 and automake 1.9.5. Aditionally I renamed configure.in to configure.ac which appears to be the standard today and removed acconfig.h because it appears to be deprecated. This might have produced some warnings before. [*] with genparse-0.6.3, I get this: $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ echo exit status 0 creating /full/tmp/foo.h...done creating /full/tmp/foo.c...done exit status 0 $ tail -2 log write(3, /***..., 80) = -1 ENOSPC (No space left on device) write(1, creating /full/tmp/foo.c...done\n, 32) = 32 The two files it claims to have created are both empty, and genparse exited successfully. This means genparse is not detecting write or close failures. Note that in the example above, /full/tmp is a full file system that still has free inodes, so open can create new files, but writes always fail. I could verify this behavior and I hope it's fixed now. Now it tells you that it can't write to the file and returns with a non zero exit status. Thanks again for your input, Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Simplifying command line parsing with Genparse
On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: However, before I even consider it seriously, it'll need some improvements: - it must detect any and all write failures[*] I just added a new release of genparse (http://sourceforge.net/project/showfiles.php?group_id=4341package_id=4354release_id=516584) which now also checks file write and close operations, not only file open operations as it was before. - packaging so that I can run ./configure make make check, and if I don't happen to have cppunit infrastructure, gcj or something similar, it should tell me about it directly, rather than causing harder-to-diagnose build failures. I went through all these tools and updated configure.ac (and related files) such that you get a warning if something is missing but you can still compile genparse. For example if you don't have gcj installed then it automatically removes the java example from the list of build targets. It now checks for cunit, cppunit, junit, gcj, doxygen, texi2html, texi2pdf and man2html. - one of the generated .c files I looked at calls strdup but doesn't check for a NULL return value or (less important) attempt to avoid the leak when the corresponding --backup=S option appears twice or more. I'm working on that and I'll let you know when this is finished. - not exactly essential, but highly recommended: it should work with the latest autoconf and automake I'm using autoconf version 2.59 and automake 1.9.5. Aditionally I renamed configure.in to configure.ac which appears to be the standard today and removed acconfig.h because it appears to be deprecated. This might have produced some warnings before. [*] with genparse-0.6.3, I get this: $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ echo exit status 0 creating /full/tmp/foo.h...done creating /full/tmp/foo.c...done exit status 0 $ tail -2 log write(3, /***..., 80) = -1 ENOSPC (No space left on device) write(1, creating /full/tmp/foo.c...done\n, 32) = 32 The two files it claims to have created are both empty, and genparse exited successfully. This means genparse is not detecting write or close failures. Note that in the example above, /full/tmp is a full file system that still has free inodes, so open can create new files, but writes always fail. I could verify this behavior and I hope it's fixed now. Now it tells you that it can't write to the file and returns with a non zero exit status. Thanks again for your input, Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Simplifying command line parsing with Genparse
On Thu, Jun 07, 2007 at 03:49:53PM -0600, Bob Proulx wrote: Michael Geng wrote: I would expect Genparse to generate faster code than argp because it does part of the work at compile time while argp does everything at run time since it's a library function. Is the performance of parsing program arguments really a concern? I would think that the difference would not be significant. I would value clean code over minor performance improvements. For coreutils I would value object code size over performance. Bob I could imagine that especially for the Coreutils speed might be more a concern than for other tools because some of the Coreutils tools are called very often. And I would also expect that with Genparse you also get smaller code than with argp again because it does part of the work while it generates the parser code while argp is a library function and so it does everything at run time. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Simplifying command line parsing with Genparse
Hi, would it be an option to use Genparse (http://genparse.sourceforge.net/) for command line parsing in the GNU Coreutils? I'm one of the developers of Genparse and I recently used some of the well known Coreutils as an exercise for testing Genparse (see http://genparse.sourceforge.net/examples.html). Using Genparse for generating the command line parsing code could reduce the amount of coreutils source code because the input to Genparse is a short config file only. The overhead of writing the parser code would be delegated to the tool then. The Genparse generated parsers call getopt() (or getopt_long()) exactly the same way the Coreutils's command line parsers do it today. This wouldn't be changed. So the code Genparse generates will be very similar to the existing hand written parsers of the individual Coreutils tools. But calling getopt() is only part of the work, preparing and evaluating the results of getopt() also causes coding work which could be delegated to Genparse. Genparse also automatically generates a help screen which would no longer have to be done manually. Michael ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Simplifying command line parsing with Genparse
On Thu, Jun 07, 2007 at 02:44:38PM +0200, Jim Meyering wrote: [EMAIL PROTECTED] (Michael Geng) wrote: would it be an option to use Genparse (http://genparse.sourceforge.net/) for command line parsing in the GNU Coreutils? I'm one of the developers of Genparse and I recently used some of the well known Coreutils as an exercise for testing Genparse (see http://genparse.sourceforge.net/examples.html). Using Genparse for generating the command line parsing code could reduce the amount of coreutils source code because the input to Genparse is a short config file only. The overhead of writing the parser code would be delegated to the tool then. The Genparse generated parsers call getopt() (or getopt_long()) exactly the same way the Coreutils's command line parsers do it today. This wouldn't be changed. So the code Genparse generates will be very similar to the existing hand written parsers of the individual Coreutils tools. But calling getopt() is only part of the work, preparing and evaluating the results of getopt() also causes coding work which could be delegated to Genparse. Genparse also automatically generates a help screen which would no longer have to be done manually. Hi Michael, Genparse looks promising. I like the examples. But there are almost 100 programs in the coreutils. If genparse can really handle all of those use cases without causing any significant degradation in the tools, then it will be hard to object. How does it deal with internationalization? There is no internationalization. I looked at the Coreutils internationalization but I neither fully understood how it works nor do I have an idea right now how to internationalize Genparse's output. internationalization appears to be a difficult thing. However, before I even consider it seriously, it'll need some improvements: I appreciate your concerns very much. All of them are valid and valuable input for Genparse development. At least some of them will need more discussion however which I think is not interesting to everybody listening to the Coreutils mailing list. For this reason I started a new thread on the genparse-users mailing list (http://sourceforge.net/mailarchive/forum.php?thread_name=20070607150633.GA26405%40t-online.deforum_name=genparse-users). Any posts to the genparse-users mailing list from anybody is welcome! I will let you (and the coreutils-bug mailing list) know if there is any progress on ony of the problems you listed. - it must detect any and all write failures[*] - packaging so that I can run ./configure make make check, and if I don't happen to have cppunit infrastructure, gcj or something similar, it should tell me about it directly, rather than causing harder-to-diagnose build failures. - one of the generated .c files I looked at calls strdup but doesn't check for a NULL return value or (less important) attempt to avoid the leak when the corresponding --backup=S option appears twice or more. - not exactly essential, but highly recommended: it should work with the latest autoconf and automake [*] with genparse-0.6.3, I get this: $ strace -o log -e write ./genparse -o /full/tmp/foo ../examples/ls.gp \ echo exit status 0 creating /full/tmp/foo.h...done creating /full/tmp/foo.c...done exit status 0 $ tail -2 log write(3, /***..., 80) = -1 ENOSPC (No space left on device) write(1, creating /full/tmp/foo.c...done\n, 32) = 32 The two files it claims to have created are both empty, and genparse exited successfully. This means genparse is not detecting write or close failures. Note that in the example above, /full/tmp is a full file system that still has free inodes, so open can create new files, but writes always fail. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils