Re: [PATCH] Make comm check order of input files
Hi,

The previous version did not warn if the final record in a file was out of order and `--check-order' was not in effect.

Thanks,

Bo

From dc34eed9e6ee34f473a8d74b98bccaf082fe79c2 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Sun, 20 Apr 2008 21:24:16 -0400
Subject: [PATCH] Make comm check order of input files

* NEWS: List new behavior.
* doc/coreutils.texi (checkOrderOption) New macro for describing
`--check-order' and `--nocheck-order', used in both join and comm.
* src/comm.c (main): Initialize new options.
(usage): Describe new options.
(compare_files): Keep an extra pair of buffers for the previous line
from each file to check the internal order.
(check_order): If an order-check is required, compare and handle the
result appropriately.
(copylinebuffer): Copy a linebuffer; used for copy before read.
* tests/misc/Makefile.am: List new test.
* tests/misc/comm: Tests for the comm program, including the new
order-checking functionality and attendant command-line options.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 NEWS                   |    8 ++
 doc/coreutils.texi     |   39 +++---
 src/comm.c             |  178 +++
 tests/misc/Makefile.am |    1 +
 tests/misc/comm        |  131 +++
 5 files changed, 329 insertions(+), 28 deletions(-)
 create mode 100755 tests/misc/comm

diff --git a/NEWS b/NEWS
index 04893c6..4038da2 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,13 @@
 GNU coreutils NEWS                                    -*- outline -*-
 
+* Noteworthy changes in release ??
+
+** New features
+
+  comm now verifies that the inputs are in sorted order.  This check can
+  be turned off with the --nocheck-order option.
+
+
 * Noteworthy changes in release 6.11 (2008-04-19) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f42e736..5ed7f43 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4342,6 +4342,32 @@ status that does not depend on the result of the comparison.
 Upon normal completion @command{comm} produces an exit code of zero.
 If there is an error it exits with nonzero status.
 
[EMAIL PROTECTED] checkOrderOption{cmd}
+If the @option{--check-order} option is given, unsorted inputs will
+cause a fatal error message.  If the option @option{--nocheck-order}
+is given, unsorted inputs will never cause an error message.  If
+neither of these options is given, wrongly sorted inputs are diagnosed
+only if an input file is found to contain unpairable lines.  If an
+input file is diagnosed as being unsorted, the @command{\cmd\} command
+will exit with a nonzero status (and the output should not be used).
+
+Forcing @command{\cmd\} to process wrongly sorted input files
+containing unpairable lines by specifying @option{--nocheck-order} is
+not guaranteed to produce any particular output.  The output will
+probably not correspond with whatever you hoped it would be.
[EMAIL PROTECTED] macro
[EMAIL PROTECTED]
+
[EMAIL PROTECTED] @samp
+
[EMAIL PROTECTED] --check-order
+Fail with an error message if either input file is wrongly ordered.
+
[EMAIL PROTECTED] --nocheck-order
+Do not check that both input files are in sorted order.
+
[EMAIL PROTECTED] table
+
 @node tsort invocation
 @section @command{tsort}: Topological sort
@@ -5183,18 +5209,7 @@ c c1 c2
 b b1 b2
 @end example
 
-If the @option{--check-order} option is given, unsorted inputs will
-cause a fatal error message.  If the option @option{--nocheck-order}
-is given, unsorted inputs will never cause an error message.  If
-neither of these options is given, wrongly sorted inputs are diagnosed
-only if an input file is found to contain unpairable lines.  If an
-input file is diagnosed as being unsorted, the @command{join} command
-will exit with a nonzero status (and the output should not be used).
-
-Forcing @command{join} to process wrongly sorted input files
-containing unpairable lines by specifying @option{--nocheck-order} is
-not guaranteed to produce any particular output.  The output will
-probably not correspond with whatever you hoped it would be.
[EMAIL PROTECTED]
 
 The defaults are:
 @itemize
diff --git a/src/comm.c b/src/comm.c
index cbda362..0a9e8b9 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -52,8 +52,31 @@ static bool only_file_2;
 /* If true, print lines that are found in both files. */
 static bool both;
 
+/* If nonzero, we have seen at least one unpairable line. */
+static bool seen_unpairable;
+
+/* If nonzero, we have warned about disorder in that file. */
+static bool issued_disorder_warning[2];
+
+/* If nonzero, check that the input is correctly ordered. */
+static enum
+  {
+    CHECK_ORDER_DEFAULT,
+    CHECK_ORDER_ENABLED,
+    CHECK_ORDER_DISABLED
+  } check_input_order;
+
+enum
+{
+  CHECK_ORDER_OPTION = CHAR_MAX + 1,
+  NOCHECK_ORDER_OPTION
+};
+
+
 static struct option const long_options[] =
 {
+  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
+  {"nocheck-order",
Re: coreutils-6.11 released
Elbert Pol [EMAIL PROTECTED] wrote:
> Hi Jim, and the rest
>
> I try to debug some things for os2 and it seems a hell of a job :(
> Especially if you have no background with the debugger.
> But I saw the newer Coreutils 6.11, so I thought I would try it, and the
> configuration went smoothly this time :P
> Then I did make and now it stops at
>
> gcc -std=gnu99 -D__EMX__ -DOS2 -D__ST_MT_ERRNO__ -O3 -mcpu=pentium3 -Zexe -Zomf -Zmap -Zargs-wild -Zbin-files -D__ST_MT_ERRNO__ -s -o su.exe su.o ../lib/libcoreutils.a ../lib/libcoreutils.a
> make.exe[3]: Leaving directory `U:/coreutils-6.11/src'
> make.exe[2]: Leaving directory `U:/coreutils-6.11/src'
> Making all in doc
> make.exe[2]: Entering directory `U:/coreutils-6.11/doc'
> make.exe[2]: Nothing to be done for `all'.
> make.exe[2]: Leaving directory `U:/coreutils-6.11/doc'
> Making all in man
> make.exe[2]: Entering directory `U:/coreutils-6.11/man'
> make.exe[2]: *** No rule to make target `uname.1 ', needed by `all-am'. Stop.
> make.exe[2]: Leaving directory `U:/coreutils-6.11/man'
> make.exe[1]: *** [all-recursive] Error 1
> make.exe[1]: Leaving directory `U:/coreutils-6.11'
> make: *** [all] Error 2

Hi Elbert,

Thanks for the report.
Please show precisely what you have done, starting with the distribution
tarball. E.g., if you ran these commands, say that, and send the log file:

    gzip -dc coreutils-6.11.tar.gz|tar xf -
    ./configure > log 2>&1
    make >> log 2>&1

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: coreutils and i18n
Bruno Haible [EMAIL PROTECTED] wrote:
> Jim Meyering wrote:
>> As for i18n, some students nearly took on the project of implementing
>> a palatable solution recently, but that's been deferred for a few months.
>
> Interesting... In 2001 you set out the following requirements for such a
> solution:
>
>   - Processing in unibyte locales should not become significantly slower
>     than before.
>   - Code duplication should be avoided, for maintainability.
>   - Macros which expand to one thing in the multibyte case and to another
>     thing for the unibyte case are not acceptable.
>
> How will this students' project solve this dilemma?

There's no guarantee, but Paul and I will be supervising.
Re: coreutils and xattr
Mike Frysinger [EMAIL PROTECTED] wrote:
> On Sunday 20 April 2008, Jim Meyering wrote:
>> Mike Frysinger [EMAIL PROTECTED] wrote:
>>> On Sunday 20 April 2008, Mike Frysinger wrote:
>>>> On Sunday 20 April 2008, Jim Meyering wrote:
>>>>> Mike Frysinger [EMAIL PROTECTED] wrote:
>>>>>> has work on merging Andreas' patch just stalled ? that and the
>>>>>> big nasty i18n patch are about the only thing i carry in Gentoo
>>>>>> anymore as everything else has been merged ...
>>>>>
>>>>> I haven't looked at any xattr-related changes for a long time.
>>>>> Do you know if there are other variants of that patch (probably),
>>>>> and if so, how they differ?
>>>>
>>>> so we know we're talking about the same thing, i think you're
>>>> referring to the patch as you cited in an old e-mail and found here:
>>>> http://www.suse.de/~agruen/coreutils/5.91/
>>>>
>>>> the patch has been updated in opensuse since (i'm attaching the
>>>> latest one i can find from their coreutils-6.9-43 version). the only
>>>> thing i have in Gentoo beyond this patch is to add
>>>> AC_ARG_ENABLE(xattr) support.
>>>
>>> hmm, you probably want one that applies nicely ;)
>>
>> Thank you. That's a good first step.
>>
>>> here is the Gentoo version which applies to 6.11
>>
>> Hmmm... I applied the patch and tried to build/compile. No go:
>>
>> copy.c: In function 'copy_xattr_filter':
>> copy.c:191: warning: implicit declaration of function 'attr_copy_action'
>> copy.c:192: error: 'ATTR_ACTION_SKIP' undeclared (first use in this function)
>> copy.c:192: error: (Each undeclared identifier is reported only once
>> copy.c:192: error: for each function it appears in.)
>> copy.c:194: error: 'ATTR_ACTION_PERMISSIONS' undeclared (first use in this function)
>>
>> So maybe another patch is required?
>
> i researched it and opensuse applies a custom patch to their attr package
> which creates a new function / enum and coreutils leverages that (which
> makes for the coreutils code to be much simpler). no idea what Andreas'
> intentions are here though as i cant seem to find any mention with
> upstream attr and these changes.

Let's see what he has to say...

Hi Andreas,

We're looking at integrating some of your xattr patches into upstream
coreutils, and hit a snag. Do you know when/if upstream attr will be
updated to include attr_copy_action and these ATTR_* enum values?

Jim
Re: [OpenAFS] Re: coreutils-6.11 released
> If by unknown you mean nameless, that's not what the patch does.
> Such a patch would not even have been considered.

I agree that hiding this information in some cases might not be optimal,
but the main problem is that through this the 'groups' command becomes
utterly useless and confuses quite a lot of users.

    $ groups users id: cannot find name for group ID 1091323188 1091323188

further

    $ id -Gn users id: cannot find name for group ID 1091323188 1091323188

Because of this I get many scripts that scan /etc/group and /etc/passwd in
a loop, and when I ask why they don't use 'groups' or 'id' I get "Ahh, this
has been broken for a long time" or "Somehow my computer is broken".

Is there a way, instead of giving an error message, to give back a PAG
name? My intent was not to hide the number; it was to hide the error,
because people think their systems are broken even if they are not. I can
see the conflict here between users and sysadmins.

Does someone know if there is a way to find out whether the group is a PAG
group or not? Then I could write a version that still shows the group
number but suppresses the error. Would that be OK?

Cheers

Didi

---
www.cern.ch/ribalba / www.ribalba.de
Email / Jabber: [EMAIL PROTECTED]
Phone (Work) : +41 22 7679376
Skype : ribalba
Address : CERN / IT-FIO-FS / GENEVE 23/ SCHWEIZ
Re: coreutils and i18n
Jim Meyering wrote:
>>   - Processing in unibyte locales should not become significantly slower
>>     than before.
>>   - Code duplication should be avoided, for maintainability.
>>   - Macros which expand to one thing in the multibyte case and to another
>>     thing for the unibyte case are not acceptable.
>>
>> How will this students' project solve this dilemma?
>
> There's no guarantee, but Paul and I will be supervising.

I mean, what is technically the solution to the dilemma?

The typical idiom for keeping the speed of the unibyte case is
- see e.g. gnulib/lib/mbscasecmp.c as an example -

    #if HAVE_MBRTOWC
      if (MB_CUR_MAX > 1)
        ... multibyte case ...
      else
    #endif
        ... unibyte case ...

but it does have code duplication.

What approach are they going after? Put a big

    switch (c)
      {
      case 'A'..'Z':
        ... handle printable ASCII characters ...
      default:
        ... handle multibyte case ...
      }

into every loop? This approach has not even sufficed for lib/mbswidth.c.

Do they want to speed up the multibyte case code by some tricks?

Or are you giving up one of the three requirements? If so, which one?

Bruno
Re: coreutils and i18n
Bruno Haible [EMAIL PROTECTED] wrote:
> Jim Meyering wrote:
>>>   - Processing in unibyte locales should not become significantly slower
>>>     than before.
>>>   - Code duplication should be avoided, for maintainability.
>>>   - Macros which expand to one thing in the multibyte case and to another
>>>     thing for the unibyte case are not acceptable.
>>>
>>> How will this students' project solve this dilemma?
>>
>> There's no guarantee, but Paul and I will be supervising.
>
> I mean, what is technically the solution to the dilemma?
>
> The typical idiom for keeping the speed of the unibyte case is
> - see e.g. gnulib/lib/mbscasecmp.c as an example -
>
>     #if HAVE_MBRTOWC
>       if (MB_CUR_MAX > 1)
>         ... multibyte case ...
>       else
>     #endif
>         ... unibyte case ...
>
> but it does have code duplication.

Right. That is the problem.
When it's just a couple of lines, it's not a big deal, but when each of 10
or more programs ends up with duplicated blocks of core logic, it's much
harder to justify.

> What approach are they going after? Put a big
>
>     switch (c)
>       {
>       case 'A'..'Z':
>         ... handle printable ASCII characters ...
>       default:
>         ... handle multibyte case ...
>       }
>
> into every loop? This approach has not even sufficed for lib/mbswidth.c.
>
> Do they want to speed up the multibyte case code by some tricks?
>
> Or are you giving up one of the three requirements? If so, which one?

I'm open to all reasonable solutions, especially when accompanied with
sample code.

BTW, your #3 looks like something written (or paraphrased?) by you, not me:

  - Macros which expand to one thing in the multibyte case and to another
    thing for the unibyte case are not acceptable.

That does ring a faint bell, but I don't have time to dig right now.
Do you recall the context?
Re: coreutils and i18n
Bruno Haible wrote:
> Jim Meyering wrote:
>>>   - Processing in unibyte locales should not become significantly slower
>>>     than before.
>>>   - Code duplication should be avoided, for maintainability.
>>>   - Macros which expand to one thing in the multibyte case and to another
>>>     thing for the unibyte case are not acceptable.
>>>
>>> How will this students' project solve this dilemma?
>>
>> There's no guarantee, but Paul and I will be supervising.
>
> I mean, what is technically the solution to the dilemma?
>
> The typical idiom for keeping the speed of the unibyte case is
> - see e.g. gnulib/lib/mbscasecmp.c as an example -
>
>     #if HAVE_MBRTOWC
>       if (MB_CUR_MAX > 1)
>         ... multibyte case ...
>       else
>     #endif
>         ... unibyte case ...
>
> but it does have code duplication.

That's the obvious solution that is not really required/desired.
If I was being paid to do it (I have very little free time
unfortunately), then I would do something like...

1. identify filters that require multibyte handling.
2. refactor line input processing etc. to shared code.
3. Intelligently apply multibyte processing.

For illustration, look at the performance of various `uniq`
implementations currently:

    $ rpm -q coreutils
    coreutils-6.9-9.fc8
    $ echo $LANG
    en_IE.UTF-8

    # The default one uses the existing i18n patch
    $ time uniq lines.test /dev/null
    real    0m27.724s
    $ time LC_CTYPE=C uniq lines.test /dev/null
    real    0m1.314s
    $ time ~/git/coreutils/src/uniq lines.test /dev/null
    real    0m1.187s
    $ time ~/myuniq lines.test /dev/null
    real    0m0.827s
    $ time ~/uniq.py lines.test /dev/null
    real    0m2.657s

Yes, the python version (which I nearly wrote in the same time as the
default uniq took to complete the test) is much better!

`myuniq` is a version I implemented from scratch, to understand
some of what the issues involved would be:
http://lists.gnu.org/archive/html/bug-coreutils/2006-07/msg00153.html

It's not just performance. The functionality of the i18n patch for uniq
is buggy in the presence of NUL characters, for example:

    for i in 1 2 3; do echo -e "1234\x0056789"; done | uniq
    123456789
    123456789
    123456789
    for i in 1 2 3; do echo -e "1234\x0056789"; done | LANG=C uniq
    123456789

It's great that Paul & Jim are looking at this interesting project,
as it really is important, as I've mentioned before.

cheers,
Pádraig.
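One way to check a given uniq build for the NUL-handling problem described above is sketched here. It assumes a POSIX shell whose printf accepts the octal escape \000 in its format string; the helper name gen_lines and the variable names are made up for this example. The three generated lines are byte-identical, so a uniq that compares whole lines should emit exactly one of them.

```shell
# Emit three identical lines, each with an embedded NUL byte
# (written as the octal escape \000 in the printf format string).
gen_lines() {
  for i in 1 2 3; do
    printf '1234\00056789\n'
  done
}

# Count the lines uniq emits in the C locale and in the current locale.
c_locale_count=$(gen_lines | LC_ALL=C uniq | wc -l | tr -d '[:space:]')
current_count=$(gen_lines | uniq | wc -l | tr -d '[:space:]')

echo "C locale:       $c_locale_count line(s)"
echo "current locale: $current_count line(s)"
```

If the second count comes out as 3 instead of 1, the uniq in use is probably truncating its comparison at the NUL byte in multibyte locales, which is the symptom reported in the message.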
Re: [PATCH] Make comm check order of input files
Hi,

Pádraig pointed out that there's no reason to copy data around here. This version avoids the copies.

Thanks Pádraig,

Bo

From 49ec3883efc8a89e8a4260f25bb50178aced1be4 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Sun, 20 Apr 2008 21:24:16 -0400
Subject: [PATCH] Make comm check order of input files

* NEWS: List new behavior.
* doc/coreutils.texi (checkOrderOption) New macro for describing
`--check-order' and `--nocheck-order', used in both join and comm.
* src/comm.c (main): Initialize new options.
(usage): Describe new options.
(compare_files): Keep an extra pair of buffers for the previous line
from each file to check the internal order.
(check_order): If an order-check is required, compare and handle the
result appropriately.
(copylinebuffer): Copy a linebuffer; used for copy before read.
* tests/misc/Makefile.am: List new test.
* tests/misc/comm: Tests for the comm program, including the new
order-checking functionality and attendant command-line options.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 NEWS                   |    8 ++
 doc/coreutils.texi     |   39
 src/comm.c             |  166 ++--
 tests/misc/Makefile.am |    1 +
 tests/misc/comm        |  131 ++
 5 files changed, 313 insertions(+), 32 deletions(-)
 create mode 100755 tests/misc/comm

diff --git a/NEWS b/NEWS
index 04893c6..4038da2 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,13 @@
 GNU coreutils NEWS                                    -*- outline -*-
 
+* Noteworthy changes in release ??
+
+** New features
+
+  comm now verifies that the inputs are in sorted order.  This check can
+  be turned off with the --nocheck-order option.
+
+
 * Noteworthy changes in release 6.11 (2008-04-19) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f42e736..5ed7f43 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4342,6 +4342,32 @@ status that does not depend on the result of the comparison.
 Upon normal completion @command{comm} produces an exit code of zero.
 If there is an error it exits with nonzero status.
 
[EMAIL PROTECTED] checkOrderOption{cmd}
+If the @option{--check-order} option is given, unsorted inputs will
+cause a fatal error message.  If the option @option{--nocheck-order}
+is given, unsorted inputs will never cause an error message.  If
+neither of these options is given, wrongly sorted inputs are diagnosed
+only if an input file is found to contain unpairable lines.  If an
+input file is diagnosed as being unsorted, the @command{\cmd\} command
+will exit with a nonzero status (and the output should not be used).
+
+Forcing @command{\cmd\} to process wrongly sorted input files
+containing unpairable lines by specifying @option{--nocheck-order} is
+not guaranteed to produce any particular output.  The output will
+probably not correspond with whatever you hoped it would be.
[EMAIL PROTECTED] macro
[EMAIL PROTECTED]
+
[EMAIL PROTECTED] @samp
+
[EMAIL PROTECTED] --check-order
+Fail with an error message if either input file is wrongly ordered.
+
[EMAIL PROTECTED] --nocheck-order
+Do not check that both input files are in sorted order.
+
[EMAIL PROTECTED] table
+
 @node tsort invocation
 @section @command{tsort}: Topological sort
@@ -5183,18 +5209,7 @@ c c1 c2
 b b1 b2
 @end example
 
-If the @option{--check-order} option is given, unsorted inputs will
-cause a fatal error message.  If the option @option{--nocheck-order}
-is given, unsorted inputs will never cause an error message.  If
-neither of these options is given, wrongly sorted inputs are diagnosed
-only if an input file is found to contain unpairable lines.  If an
-input file is diagnosed as being unsorted, the @command{join} command
-will exit with a nonzero status (and the output should not be used).
-
-Forcing @command{join} to process wrongly sorted input files
-containing unpairable lines by specifying @option{--nocheck-order} is
-not guaranteed to produce any particular output.  The output will
-probably not correspond with whatever you hoped it would be.
[EMAIL PROTECTED]
 
 The defaults are:
 @itemize
diff --git a/src/comm.c b/src/comm.c
index cbda362..b2b2bba 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -52,8 +52,31 @@ static bool only_file_2;
 /* If true, print lines that are found in both files. */
 static bool both;
 
+/* If nonzero, we have seen at least one unpairable line. */
+static bool seen_unpairable;
+
+/* If nonzero, we have warned about disorder in that file. */
+static bool issued_disorder_warning[2];
+
+/* If nonzero, check that the input is correctly ordered. */
+static enum
+  {
+    CHECK_ORDER_DEFAULT,
+    CHECK_ORDER_ENABLED,
+    CHECK_ORDER_DISABLED
+  } check_input_order;
+
+enum
+{
+  CHECK_ORDER_OPTION = CHAR_MAX + 1,
+  NOCHECK_ORDER_OPTION
+};
+
+
 static struct option const long_options[] =
 {
+  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
+  {"nocheck-order",
Re: [PATCH] Make comm check order of input files
Bo Borgerson [EMAIL PROTECTED] wrote:
> Pádraig pointed out that there's no reason to copy data around here.
> This version avoids the copies.
>
> Thanks Pádraig,

Thanks from me, too.

If you guys can help by reviewing others' changes, that would help me.
In the run-up to 6.11, quite a few large patches have accumulated, and
all by myself, it's going to take a while to work through all that.

If you do review something, and find nothing wrong with it, please say
so, publicly.
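For anyone who wants to try the two options the patch describes, a rough sketch follows. It assumes a comm that already accepts --check-order and --nocheck-order (a build with this patch applied, or a later GNU coreutils); the file names and contents are made up for illustration.

```shell
# Scratch directory holding two small inputs (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/sorted"
printf 'b\na\nc\n' > "$tmpdir/unsorted"   # 'a' follows 'b': out of order

# With --check-order, a wrongly ordered input should be a fatal error,
# so comm is expected to exit with a nonzero status.
check_status=0
comm --check-order "$tmpdir/sorted" "$tmpdir/unsorted" > /dev/null 2>&1 \
  || check_status=$?

# With --nocheck-order, no order diagnostic is issued; the output is
# simply not guaranteed to be meaningful.
nocheck_status=0
comm --nocheck-order "$tmpdir/sorted" "$tmpdir/unsorted" > /dev/null 2>&1 \
  || nocheck_status=$?

echo "--check-order exit status:   $check_status"
echo "--nocheck-order exit status: $nocheck_status"
rm -rf "$tmpdir"
```

Running comm on the same files with neither option exercises the default mode, where disorder is only diagnosed once an unpairable line has been seen.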
Re: coreutils-6.11 released
Hi Jim,

I used the lzma tarball, and after that I untarred the tar file.
I attach the log.
Hope it's fixable :)

Jim Meyering wrote:
> Elbert Pol [EMAIL PROTECTED] wrote:
>> Hi Jim, and the rest
>>
>> I try to debug some things for os2 and it seems a hell of a job :(
>> Especially if you have no background with the debugger.
>> But I saw the newer Coreutils 6.11, so I thought I would try it, and the
>> configuration went smoothly this time :P
>> Then I did make and now it stops at
>>
>> gcc -std=gnu99 -D__EMX__ -DOS2 -D__ST_MT_ERRNO__ -O3 -mcpu=pentium3 -Zexe -Zomf -Zmap -Zargs-wild -Zbin-files -D__ST_MT_ERRNO__ -s -o su.exe su.o ../lib/libcoreutils.a ../lib/libcoreutils.a
>> make.exe[3]: Leaving directory `U:/coreutils-6.11/src'
>> make.exe[2]: Leaving directory `U:/coreutils-6.11/src'
>> Making all in doc
>> make.exe[2]: Entering directory `U:/coreutils-6.11/doc'
>> make.exe[2]: Nothing to be done for `all'.
>> make.exe[2]: Leaving directory `U:/coreutils-6.11/doc'
>> Making all in man
>> make.exe[2]: Entering directory `U:/coreutils-6.11/man'
>> make.exe[2]: *** No rule to make target `uname.1 ', needed by `all-am'. Stop.
>> make.exe[2]: Leaving directory `U:/coreutils-6.11/man'
>> make.exe[1]: *** [all-recursive] Error 1
>> make.exe[1]: Leaving directory `U:/coreutils-6.11'
>> make: *** [all] Error 2
>
> Hi Elbert,
>
> Thanks for the report.
> Please show precisely what you have done, starting with the distribution
> tarball. E.g., if you ran these commands, say that, and send the log file:
>
>     gzip -dc coreutils-6.11.tar.gz|tar xf -
>     ./configure > log 2>&1
>     make >> log 2>&1

log.bz2
Description: Binary data
wordcount (wc)
Hello,

I have been using the 'wc' program (version 5.97) to manually verify some
counts outputted by a component part of an application I am developing.

I noticed that:

    echo 12345 | wc -m

Gives me '6' as output. But I don't entirely understand why. On multi-line
input 'wc' seems to add '1' to the character count in each sentence. One
would say then that this '1' is caused by counting 'invisible' newline
characters, but there is no newline in the example above.

This off-by-one is probably intended behaviour (even though I am curious to
find out why). I would expect something about this to be listed in the man
page of 'wc', but could not find it there.

With kind regards,

Almer S. Tigelaar
Re: wordcount (wc)
On Mon, Apr 21, 2008 at 9:27 AM, Almer S. Tigelaar [EMAIL PROTECTED] wrote:
> Hello,
>
> I have been using the 'wc' program (version 5.97) to manually verify some
> counts outputted by a component part of an application I am developing.
>
> I noticed that:
>
>     echo 12345 | wc -m
>
> Gives me '6' as output. But I don't entirely understand why. On multi-line
> input 'wc' seems to add '1' to the character count in each sentence. One
> would say then that this '1' is caused by counting 'invisible' newline
> characters, but there is no newline in the example above.
>
> This off-by-one is probably intended behaviour (even though I am curious to
> find out why). I would expect something about this to be listed in the man
> page of 'wc', but could not find it there.

It's counting the trailing newline.

    $ echo 12345 | wc -m
    6
    $ printf "12345\n" | wc -m
    6
    $ printf 12345 | wc -m
    5
    $ echo -n 12345 | wc -m
    5

Brock
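The same point as a small script, for anyone who wants to capture the counts in variables. This is only a sketch: the variable names are made up, and tr is used to strip the padding that some wc implementations put around the number.

```shell
# echo appends a newline to its argument; printf '%s' does not.
with_newline=$(echo 12345 | wc -m | tr -d '[:space:]')
without_newline=$(printf '%s' 12345 | wc -m | tr -d '[:space:]')

echo "echo 12345        -> $with_newline characters"    # 5 digits + newline
echo "printf '%s' 12345 -> $without_newline characters" # 5 digits only
```

Using printf '%s' instead of echo -n avoids depending on how a particular shell's echo treats options.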
Re: wordcount (wc)
Hi,

On Mon, Apr 21, 2008 at 04:27:35PM +0200, Almer S. Tigelaar wrote:
> I have been using the 'wc' program (version 5.97) to manually verify some
> counts outputted by a component part of an application I am developing.
>
> I noticed that:
>
>     echo 12345 | wc -m
>
> Gives me '6' as output. But I don't entirely understand why. On multi-line
> input 'wc' seems to add '1' to the character count in each sentence. One
> would say then that this '1' is caused by counting 'invisible' newline
> characters, but there is no newline in the example above.

There is a newline added by echo. Use echo -n to avoid this.

Erik
Re: [OpenAFS] Re: coreutils-6.11 released
Didi [EMAIL PROTECTED] wrote:
>> If by unknown you mean nameless, that's not what the patch does.
>> Such a patch would not even have been considered.
>
> I agree that hiding this information in some cases might not be optimal,
> but the main problem is that through this the 'groups' command becomes
> utterly useless and confused quite a lot of users.
>
>     $ groups users id: cannot find name for group ID 1091323188 1091323188
>
> further
>
>     $ id -Gn users id: cannot find name for group ID 1091323188 1091323188

If someone can provide code to determine efficiently whether a nameless
GID is a PAG then we can probably make everyone happy.

If that happens, I'll need to know if there's a standard or accepted
mapping from GID to PAG group name.

Pointers to unencumbered code would be welcome.
Re: [OpenAFS] Re: coreutils-6.11 released
On Apr 21, 2008, at 4:23 , Didi wrote:
>> If by unknown you mean nameless, that's not what the patch does.
>> Such a patch would not even have been considered.
>
> I agree that hiding this information in some cases might not be optimal,
> but the main problem is that through this the 'groups' command becomes
> utterly useless and confused quite a lot of users.
>
>     $ groups users id: cannot find name for group ID 1091323188 1091323188

I touched on a solution to that in the part of my message that you
elided. I admit I was rather confused by what exactly was being cited as
the problem, but this is one I do know about.

(I'd actually love to see some kind of plugin setup so an AFS PAG shower
could be added, but at the same time that seems just a little bit silly/
overblown for simple stuff like id and groups.)

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH
test revamp pushed: new snapshot available: coreutils-6.11.8-a6894
FYI, I've just pushed a big change to the test infrastructure.
It eliminates most of the test/*/Makefile.am files and also solves
Bruno's problem where tests fail if . is too early in PATH.

coreutils snapshot:
  http://meyering.net/cu/coreutils-ss.tar.gz      8.7 MB
  http://meyering.net/cu/coreutils-ss.tar.lzma    3.6 MB
  http://meyering.net/cu/coreutils-ss.tar.gz.sig
  http://meyering.net/cu/coreutils-ss.tar.lzma.sig
aka
  http://meyering.net/cu/coreutils-6.11.8-a6894.tar.gz
  http://meyering.net/cu/coreutils-6.11.8-a6894.tar.lzma

Changes since 6.11:

Jim Meyering (8):
  * .prev-version: Record previous version: 6.11.
  Use env to invoke potential built-ins.
  * tests/misc/Makefile.am (built_programs): Remove. Unused.
  Revamp test-related Makefiles.
  tests: clean up root tests; adapt to new layout
  tests: adjust perl -I to use $top_srcdir/tests, not $srcdir/..
  tests: convert umask-check to a function
  tests: skip (don't fail) rm/one-file-system when mount --bind fails
Re: coreutils and i18n
Jim Meyering wrote:
> I'm open to all reasonable solutions, especially when accompanied with
> sample code.

This is the proposed sample code: the 'expand' program. Here the core of
the program is in the single function expand().

The proposed solution is like this. It uses a set of macros, which - in
uppercase - are like the lowercase functions/macros that are already
present in gnulib. (Attached to this mail.) The patch is relative to
coreutils-6.11.

The transformation of the code from using 'char' to using 'MBF_CHAR' took
me less than half an hour. You could do the 'unexpand' program in another
half an hour.

2008-04-22  Bruno Haible  [EMAIL PROTECTED]

	Make 'expand' work in multibyte locales.
	* src/expandloop.h: New file, extracted from src/expand.c.
	* src/expand.c: Include expandloop.h twice.
	(expand): Dispatch between multibyte and unibyte locales.

*** src/expand.c.bak	2008-04-19 23:34:23.0 +0200
--- src/expand.c	2008-04-22 03:53:20.0 +0200
***************
*** 267,367 ****
  /* Change tabs to spaces, writing to stdout.
     Read each file in `file_list', in order.  */
  
  static void
  expand (void)
  {
!   /* Input stream.  */
!   FILE *fp = next_file (NULL);
! 
!   if (!fp)
!     return;
! 
!   for (;;)
!     {
!       /* Input character, or EOF.  */
!       int c;
! 
!       /* If true, perform translations.  */
!       bool convert = true;
! 
! 
!       /* The following variables have valid values only when CONVERT
!          is true:  */
! 
!       /* Column of next input character.  */
!       uintmax_t column = 0;
! 
!       /* Index in TAB_LIST of next tab stop to examine.  */
!       size_t tab_index = 0;
! 
! 
!       /* Convert a line of text.  */
! 
!       do
!         {
!           while ((c = getc (fp)) < 0 && (fp = next_file (fp)))
!             continue;
! 
!           if (convert)
!             {
!               if (c == '\t')
!                 {
!                   /* Column the next input tab stop is on.  */
!                   uintmax_t next_tab_column;
! 
!                   if (tab_size)
!                     next_tab_column = column + (tab_size - column % tab_size);
!                   else
!                     for (;;)
!                       if (tab_index == first_free_tab)
!                         {
!                           next_tab_column = column + 1;
!                           break;
!                         }
!                       else
!                         {
!                           uintmax_t tab = tab_list[tab_index++];
!                           if (column < tab)
!                             {
!                               next_tab_column = tab;
!                               break;
!                             }
!                         }
! 
!                   if (next_tab_column < column)
!                     error (EXIT_FAILURE, 0, _("input line is too long"));
! 
!                   while (++column < next_tab_column)
!                     if (putchar (' ') < 0)
!                       error (EXIT_FAILURE, errno, _("write error"));
! 
!                   c = ' ';
!                 }
!               else if (c == '\b')
!                 {
!                   /* Go back one column, and force recalculation of the
!                      next tab stop.  */
!                   column -= !!column;
!                   tab_index -= !!tab_index;
!                 }
!               else
!                 {
!                   column++;
!                   if (!column)
!                     error (EXIT_FAILURE, 0, _("input line is too long"));
!                 }
! 
!               convert &= convert_entire_line | !! isblank (c);
!             }
! 
!           if (c < 0)
!             return;
! 
!           if (putchar (c) < 0)
!             error (EXIT_FAILURE, errno, _("write error"));
!         }
!       while (c != '\n');
!     }
  }
  
  int
--- 267,295 ----
  /* Change tabs to spaces, writing to stdout.
     Read each file in `file_list', in order.  */
  
+ #if HAVE_MBRTOWC
+ # define FUNC expand_multi
+ # include "mbfile_multi.h"
+ # include "expandloop.h"
+ # include "mbfile_undef.h"
+ # undef FUNC
+ #endif
+ 
+ #define FUNC expand_8bit
+ #include "mbfile_8bit.h"
+ #include "expandloop.h"
+ #include "mbfile_undef.h"
+ #undef FUNC
+ 
  static void
  expand (void)
  {
! #if HAVE_MBRTOWC
!   if (MB_CUR_MAX > 1)
!     expand_multi ();
!   else
! #endif
!     expand_8bit ();
  }
  
  int
*** /dev/null	2003-09-23 19:59:22.0 +0200
--- src/expandloop.h	2008-04-22 03:52:50.0 +0200
***************
*** 0 ****
--- 1,119 ----
+ /* Working loop for expand.
+    Copyright (C) 89, 91, 1995-2006, 2008 Free Software Foundation, Inc.
+ 
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied