make check (8.1) hangs forever
The last message I get from make check is: PASS: misc/help-version Then it sleeps for a long time. During this time, the inotify-race test is running. There's a timeout 10s gdb ... process that's been running for a lot longer than 10 seconds. After ^C stops the make, the timeout, gdb, and a couple of tail processes are lingering and have to be killed manually. So far I've looked at it with strace, which revealed that timeout has sent a SIGTERM to gdb, but gdb has SIGTERM blocked so nothing happens. This part of the script is doing what it was expected to do: tail --pid=$pid -f tail.out | (read; kill $pid) The kill $pid is executed, but $pid is the timeout process, which responds to the SIGTERM by passing along a SIGTERM+SIGCONT to gdb, and then waiting for gdb to die, which never happens. Therefore $pid doesn't die, the tail --pid never dies, and the script makes no further progress. It seems this entire script depends, in both the fail and pass cases, on the ability to end a gdb process by sending a SIGTERM, which doesn't actually work. My gdb is the one from Debian's 6.8-3 package. If necessary, I'll dig into why it's refusing to deal with SIGTERM. -- Hoping it won't be necessary, Alan Curry
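The failure mode can be sketched without gdb at all: any child that ignores SIGTERM defeats timeout's default signal the same way gdb's blocked SIGTERM does. A minimal reproduction (the 124/137 exit statuses are GNU timeout's documented behavior):

```shell
# A child that ignores SIGTERM outlives timeout's default signal;
# timeout sends TERM at the deadline, then waits for a child that
# won't die until its sleep finishes on its own.
status_term=0
timeout 1 sh -c 'trap "" TERM; sleep 3' || status_term=$?
echo "plain TERM: exit $status_term"    # 124: deadline hit, child lingered

# SIGKILL cannot be blocked or ignored, so the child dies on time:
status_kill=0
timeout -s KILL 1 sh -c 'trap "" TERM; sleep 3' || status_kill=$?
echo "with -s KILL: exit $status_kill"  # 137 = 128 + SIGKILL
```

Here the ignoring child is a `trap "" TERM` shell rather than gdb, but the hang mechanism is the same.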
Re: stable coreutils-8.1 today, fingers crossed
Jim Meyering writes: Gilles Espinasse wrote: ... [chroot-i486] root:/$ umask 0022 [chroot-i486] root:/$ rm -rf /usr/src/coreutils* [chroot-i486] root:/$ cd /usr/src [chroot-i486] root:/usr/src$ tar xf cache/coreutils-8.1.tar.gz [chroot-i486] root:/usr/src$ ls -ld /usr /usr/src /usr/src/coreutils-8.1 ... drwxrwxrwx 13 root root 4096 Nov 18 18:55 /usr/src/coreutils-8.1 ... don't know why. Just the side effect of using tar as root; --no-same-permissions lets umask be applied. Thanks for explaining. That's another good reason to do less as root. So was the drwxrwxrwx in the tarball put there to teach a lesson to those who trust a tarball to have sane permissions? Or is it a bug? -- Alan Curry
Re: make check (8.1) hangs forever
Jim Meyering writes: Pádraig Brady wrote: Alan Curry wrote: SIGTERM to gdb, but gdb has SIGTERM blocked so nothing happens. thanks for investigating. Perhaps we need to use `timeout -sKILL ...` Sounds good to me. I added that and re-ran make check. It worked but gdb's child process (tail -f file) is still lingering afterward until I kill it manually. Why has nobody else noticed this? Are other versions of gdb less stubborn? Maybe I did something to make it stubborn, but I don't know what that could be. In case you're keeping score: Debian 5.0r3, ppc32 All 366 tests passed (45 tests were not run) make[4]: Leaving directory `/tmp/coreutils-8.1/tests' All 177 tests passed (14 tests were not run) make[6]: Leaving directory `/tmp/coreutils-8.1/gnulib-tests' -- Alan Curry
Re: errors on date
Pádraig Brady writes: David Gonzalez Marquez wrote: Hi! I am a student of computer science at the University of Buenos Aires. I am using the date command for calculating days. I need to calculate the following day for any day. Doing that I see an error. I use for example: date --date '1920-05-02 1 days' +%F and for the following days, I see an error: date: invalid date `1920-05-01 1 days' Seems to work fine here with coreutils 7.2 and 8.1. What version of date are you using? The dates in question are timezone transitions (mostly daylight savings time, but the first one was a transition from old-fashioned local time to a modern time zone). date is filling in 00:00:00 since no specific time was specified, and trying to find the time_t corresponding to 1920-05-01 00:00:00, and it fails because that time never existed in Buenos Aires. It never gets as far as trying to add the 1 days. In Argentina, the jump forward happens at 00:00:00 (according to my reading of the tzdata file), so if you use 12:00:00 you should be safe for any day. $ TZ=America/Buenos_Aires date --date '1920-05-01 12:00:00 1 days' +%F 1920-05-02 -- Alan Curry
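The same class of failure can be reproduced with any spring-forward gap, assuming the relevant tzdata zone is installed. Using a US DST transition (02:00 jumped to 03:00 on 2024-03-10 in New York):

```shell
# 02:30 never existed that night, so midnight-relative arithmetic on a
# gap time fails just like the 1920 Buenos Aires case:
TZ=America/New_York date -d '2024-03-10 02:30' +%F 2>/dev/null \
  || echo "invalid date: the clock jumped over 02:30"

# Anchoring at noon, as suggested above, sidesteps every such gap:
TZ=America/New_York date -d '2024-03-10 12:00 1 day' +%F   # 2024-03-11
```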
Re: make check (8.1) hangs forever
Pádraig Brady writes: Note my gdb --version is 6.8.50.20090302-39.fc11 So there's probably a bug in my 6.8-debian that is fixed in that version. To follow up on that theory, I compiled gdb 7.0 and tried again. It passes. I believe the difference is this change from 2008-07-10 (after the 6.8 release, but before your 20090302 snapshot): (linux_nat_kill): Stop lwps before killing them. The ptrace man page says For requests other than PTRACE_KILL, the child process must be stopped. When I strace gdb 6.8, I see it using PTRACE_KILL yet the child process doesn't die. gdb 7.0 does the same thing but inserts a tkill(SIGSTOP) first and it works. Maybe a kernel change slipped past the man page maintainer. Possible solutions: timeout -sKILL (as seen already, leaves a lingering tail process); skip the test if gdb version < 7.0 (except known-patched distro versions?); teach timeout to kill a whole process tree instead of just a pgrp. Over the course of various experiments, I found another flaw in this test: tail_forever_inotify can be inlined, and then gdb can't break on it. -- Alan Curry
Re: git coreutils 'make check' hangs
Ralf Wildenhues writes: Hi Pádraig, * Pádraig Brady wrote on Tue, Dec 08, 2009 at 02:00:20AM CET: Ralf Wildenhues wrote: for some time now, 'make check' in the git coreutils tree hangs for me: [...] I think it may be a gdb issue: http://lists.gnu.org/archive/html/bug-coreutils/2009-12/msg00025.htm For the moment we've marked that test as very expensive so that it will not be run by default. Ah, ok, I didn't see that thread. Sorry about the noise then. If yours is caused by the same thing (gdb failing to kill its child process and exit) could you tell me your uname -a? I'd like to be sure it's not just an arch-dependent kernel bug before reporting it as a man page bug. Something still needs to get fixed, even if coreutils no longer cares about it. -- Alan Curry
Re: btwowc(EOF) hang with gcc 4.4.2
Jim Meyering writes: The code in question is calling btowc(EOF), which uses this definition from wchar.h:

extern wint_t __btowc_alias (int __c) __asm ("btowc");
extern __inline wint_t
__NTH (btowc (int __c))
{ return (__builtin_constant_p (__c) && __c >= '\0' && __c <= '\x7f'
	  ? (wint_t) __c : __btowc_alias (__c)); }

Since I don't even see any code that might loop there, (though I didn't look up what __asm ("btowc") does) I'd suspect a compiler problem. That asm seems to be one of these: http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Asm-Labels.html It's saying that the function __btowc_alias is actually to be named btowc in the generated assembly code. This enables the inline function named btowc to call an external function also named btowc, by going through that alias. Tricky of them! Running gdb on conftest gets a backtrace like this:

#0  0x080491d1 in btowc (__c=-1) at /usr/include/wchar.h:331
#1  0x08049431 in btowc () at /usr/include/wchar.h:332
#2  main () at conftest.c:276

Where that line 331 is:

331	{ return (__builtin_constant_p (__c) && __c >= '\0' && __c <= '\x7f'

This does not happen with other versions of gcc. I can't tell if it's a gcc bug or a system include file bug or something actually in coreutils, but here's the report anyway. It was pretty unsettling to have configure hang. To decide whether a compiler bug is the answer, make a copy of the conftest.c during the hang, run gcc -c -save-temps on it, and publish the resulting .i and .s files for inspection. The conftest programs are already pretty minimal so it should be easy to determine whether the assembly code correctly corresponds to the preprocessed C code.
Re: btwowc(EOF) hang with gcc 4.4.2
Karl Berry writes: Hi Alan, run gcc -c -save-temps on it, and publish the resulting .i and .s files for inspection. The conftest programs are already pretty minimal so it should be easy to determine whether the assembly code correctly corresponds to the preprocessed C code. I'm afraid my x86 assembler knowledge is near-nil, so it's not easy for me :). I did mean easy for the collective mind of the mailing list. Before I try to make this an official gcc bug report, maybe you could take a look? The attached tar file has the .i and the .s both for -O (which gets a seg fault) and -O2 (which hangs). With no -O option at all, the binary exits (successfully). The presence or absence of -g makes no difference. I threw in the .c and .h for the heck of it. It's definitely a compiler problem. That extern inline asm alias trickery failed to work. (Much effort there to optimize a function that according to its own man page should never be used) It ended up as Andreas Schwab suggested: an infinite tail recursion. -O1 segfaults eventually because the recursion grows the stack to infinite size. -O2 optimized the recursion into a jump, eliminating the stack growth. In the future, .s files are usually better without -g (as long as you're not looking for a bug in the part of the compiler that generates the debugging info). The assembler directives that produce the debugging symbols add a lot of clutter. -- Alan Curry
Re: btwowc(EOF) hang with gcc 4.4.2
Karl Berry writes: It's definitely a compiler problem. That extern inline asm alias trickery The gcc people say that the behavior is correct; not a bug. (I don't understand all of their replies, but the conclusion seems clear.) http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42440 OK, I understood the replies so I'll try to sum up: The oxymoron extern inline used to have one interpretation before the inline keyword was standardized in C99. In that standardization, a different interpretation for extern inline was mandated. The inline/alias trick used by glibc here needs the old interpretation, which should be requested with the gnu_inline attribute. Your version of glibc doesn't specify gnu_inline. So the problem boils down to: your gcc is too new for your glibc. Downgrade one or upgrade the other.
Re: rm - bug or user error?
Michael Webb writes: I am within a directory containing directories dir1 and dir2 and *no* files starting with f. shell rm -rf dir1 dir2 f* rm: No match. [...] I suspect the No match is coming from the command line parsing and not rm itself. However, the message starts with rm. That's just how [t]csh reports non-matching globs: prefixed with the name of the command that wasn't run because of the error. This might help you figure out which line in a script had failed if you had multiple commands with globs. Since rm wasn't ever actually run, it had no influence on the format of the error message. -- Alan Curry
Re: rm - bug or user error?
Jon Stanley writes: Yeah, like Eric said, I think that this is a csh problem rather than a coreutils problem. I would even think that csh is behaving wrongly here - rather than refusing to run rm because the glob didn't match, it should pass the f* straight through to rm to deal with as it pleases, unless you explicitly told the shell to fail (as Eric did in his example). I don't have any standards to back that up though, Eric is the POSIX-citing guy around here :) Any standards for that Eric? csh is not Bourne shell. This is one of the things that csh got right, which Bourne shell had wrong[1]. Sadly Bourne's behavior got the blessing of POSIX and csh didn't. But csh isn't being wrong here, just being csh. [1] Consider, in a Bourne/POSIX style shell, how different the two possible behaviors are for: grep a.*b foo depending on whether the word a.*b matches as a glob or not. People write junk code like this, it works because of the old Bourne shell misfeature of passing non-matching globs straight through, and then much later it mysteriously breaks because a file called a.b has been created. We'd be much better off if non-matching globs had always been treated as errors. -- Alan Curry
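The two behaviors can be compared directly; csh errors out unconditionally, but bash's failglob option (a bash-specific shopt) imitates it, which makes for an easy side-by-side sketch:

```shell
# In an empty directory, f* matches nothing:
cd "$(mktemp -d)"
echo f*     # Bourne behavior: the literal string "f*" is passed along

# csh-style refusal, imitated with bash's failglob option:
bash -c 'shopt -s failglob; echo f*' 2>&1 \
  || echo "with failglob the command is never run at all"
```

The second case is exactly what the original poster saw from csh: rm never executed, and the shell printed the No match diagnostic itself.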
Re: Stty bug?
Tom writes: I'm using Ubuntu 9.10 (32-bit). I'm trying to use an ASR-33 Teletype (uppercase only) Are you trying to write the best message this mailing list has ever seen? Because so far, it is. on ttyS0. I have -U specified in getty for uppercase conversions and stty -F /dev/ttyS0 shows this: speed 110 baud; line = 0; -brkint ixoff iuclc -imaxbel olcuc -iexten xcase As you can see, iuclc is set but no commands work I believe -iexten may be the problem. In your tty1 test, was iexten enabled? Maybe a getty bug, it should be turning on iexten if it turns on iuclc.
Re: Ubuntu stty
Bob Proulx writes: Tom Lake wrote: I have an ASR-33 Teletype on ttyS0 which can only output uppercase characters that I'm trying to use as a serial console. I like it. I already answered this the first time it was sent. The archive has my message: http://lists.gnu.org/archive/html/bug-coreutils/2010-02/msg00020.html It didn't get through to Tom directly, though. His part of the Internet doesn't accept mail from my part. Someone in the middle (Bob?) could maybe tell him that he's missing out, by not being subscribed to the list and not checking the archive. -- Alan Curry
Re: Stty bug?
Tom Lake writes: I believe -iexten may be the problem. In your tty1 test, was iexten enabled? Maybe a getty bug, it should be turning on iexten if it turns on iuclc. I was able to recreate the original problem. Reading Alan's response here I was surprised to see that adding iexten to iuclc did enable the desired behavior. This would not have been required on traditional systems and its combination here isn't obvious to me. There doesn't seem to be very much documentation available about iexten. I am curious how you deduced that adding it would produce the desired behavior? Sorry to report that it didn't work for me. I tried both iexten and -iexten for good measure. It seems like whenever getty is respawned, it changes /dev/ttyS0 to whatever it wants no matter what stty does. I still couldn't

To do a successful test without patching getty, you'd have to do the stty from another terminal after entering the username but before the password. Or make a test account with a password that doesn't have any letters in it. Either way, even with the buggy getty, you should at least get an all-uppercase PASSWORD: prompt after typing an uppercase username. Does the password prompt look OK? (What would it look like if the computer was sending it lowercase ASCII codes? Garbage, I assume, so a readable password prompt is a sign things are working up to that point.) By the way, suggested getty diff:

--- agetty.c.orig	2010-02-12 18:12:46.0 -0500
+++ agetty.c	2010-02-12 18:13:38.0 -0500
@@ -1138,7 +1138,7 @@
 	/* General terminal-independent stuff. */
 	tp->c_iflag |= IXON | IXOFF;		/* 2-way flow control */
-	tp->c_lflag |= ICANON | ISIG | ECHO | ECHOE | ECHOK | ECHOKE;
+	tp->c_lflag |= ICANON | ISIG | ECHO | ECHOE | ECHOK | ECHOKE | IEXTEN;
 	/* no longer  | ECHOCTL | ECHOPRT */
 	tp->c_oflag |= OPOST;		/* tp->c_cflag = 0; */

-- Alan Curry
Re: chmod directory mode extention
seaking1 writes: Hello, I would like to suggest and offer the code to extend chmod in a small way. My extension merely allows for a different mode to be applied to directories than the one applied to all other files. I have looked for this utility but never found it, and being such a useful addition I thought it may be possible to add it to the standard release. What it does: Add an option -d|--dirmode to chmod that will give all directories in the files chmod is told to change the mode specified by the -d argument instead of the other mode. Reason: I have found this to be a necessity. Say for example I have a directory structure filled with data files of some sort and they are of assorted permissions; if I chmod -R 664 foo/ or something to that effect it will of course give all the permissions 664 including directories, hence making them inaccessible. With this I could run chmod -R -d 775 644 foo/ and give the directories the permissions 775. chmod -R ug=rwX,o=rX is pretty close to what you're asking for. The only difference is that the X also adds x permission to files that already have at least one x bit. Your suggestion is more generalized, so not necessarily a bad idea. I just mention this because lots of people overlook the +X option. -- Alan Curry
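The difference between the two approaches can be seen on a throwaway tree (names here are hypothetical). The X perm gives directories and already-executable files the x bit; the fully general per-type form the poster asks for is already expressible with find:

```shell
# Build a tiny tree: one directory, one plain file.
cd "$(mktemp -d)"
mkdir -p foo/sub && touch foo/data

# The X trick: directories become 775, non-executable files 664.
chmod -R ug=rwX,o=rX foo
stat -c '%a %n' foo/sub foo/data

# The general per-type pass (distinct modes for dirs and non-dirs):
find foo -type d   -exec chmod 775 {} +
find foo ! -type d -exec chmod 644 {} +
stat -c '%a %n' foo/sub foo/data
```

Because `=` assigns bits absolutely, the result is independent of the umask.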
bug#5970: regex won't do lazy matching
a g writes: This may be a usage problem, but it does not exist with other regex packages (such as slre) and I can't find anything in the documentation to indicate that the syntax should be different for coreutils. I am using coreutils 8.4 on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to do lazy matching. Here is my code: By lazy do you mean non-greedy? Here is the problem. If you execute: regex_test a[^x]*?a a1a2a The non-greedy quantifiers like *? are not part of standard regex, they are extensions found in perl, and in other packages inspired by perl. -- Alan Curry
bug#5958: Sort-8.4 bug
A full investigation has revealed: This bug was introduced between coreutils 7.1 and 7.2, here:

commit 224a69b56b716f57e3a018af5a9b9379f32da3fc
Author: Pádraig Brady <p...@draigbrady.com>
Date: Tue Feb 24 08:37:18 2009 +0000

    sort: Fix two bugs with determining the end of field
    * src/sort.c: When no specific number of chars to skip is specified
    for the end field, always skip the whole field. Also never include
    leading spaces from next field.
    * tests/misc/sort: Add 2 new tests for these cases.
    * NEWS: Mention this bug fix.
    * THANKS: Add bug reporter.
    Reported by Davide Canova.

In the diff of that commit, an eword++ was removed from the case 'k' section of option parsing, where it did not affect traditional options, and added to the limfield() function, where it takes effect regardless of how fields were specified. So it fixed a -k option parsing bug and added a traditional option parsing bug. And on the way, it removed a comment describing the correct correspondence between the two! The following patch moves the eword++ back to its old location (under the case 'k') but keeps the new test for when it should be applied (echar==0, whether by explicit .0 on the field end specifier or by omission of the field end specifier). This allows the -k bug that was fixed to stay fixed, while undoing the damage to the traditional options. With this patch applied, all the sort tests in make check still pass, including the tests added in the above commit, which I take as a sign that I got it right. And the traditional options are back to working again. I'd suggest the following new test case: printf 'a b c\na c b\n' | sort +0 -1 +2 should output 'a c b\na b c\n' I'd put that in the diff too, but the organization of tests/misc/sort is baffling.
--- coreutils-8.4.orig/src/sort.c	2010-04-20 02:45:35.0 -0500
+++ coreutils-8.4/src/sort.c	2010-04-20 03:12:57.0 -0500
@@ -1460,9 +1460,6 @@
   char *ptr = line->text, *lim = ptr + line->length - 1;
   size_t eword = key->eword, echar = key->echar;
 
-  if (echar == 0)
-    eword++;	/* Skip all of end field. */
-
   /* Move PTR past EWORD fields or to one past the last byte on LINE,
      whichever comes first. If there are more than EWORD fields, leave
      PTR pointing at the beginning of the field having zero-based index,
@@ -3424,6 +3421,8 @@
                   s = parse_field_count (s + 1, &key->echar,
                                          N_("invalid number after `.'"));
                 }
+              if (key->echar == 0)
+                key->eword++;	/* Skip all of end field. */
               s = set_ordering (s, key, bl_end);
             }
           if (*s)

OK now let's not say I haven't done any legwork. -- Alan Curry
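The proposed test case can also be written with the equivalent -k keys (+0 -1 corresponds to -k1,1, and +2 to -k3), which work whether or not the obsolete syntax is enabled on a given system:

```shell
# Field 1 ties on "a", so the comparison falls through to the key
# starting at field 3: "b" sorts before "c", so "a c b" comes first.
printf 'a b c\na c b\n' | sort -k1,1 -k3
```

Expected output: `a c b` then `a b c`, the same ordering the traditional +0 -1 +2 form should produce.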
bug#6007: en_US sorting is completely stupid.
Bob Proulx writes: You don't like it and I don't like it but the-powers-that-be have Who's the power here anyway? Who do we have to impeach? Seriously. The en_US locale is an unmitigated disaster. It's officially called not a bug every time it comes up, which seems to be once a week on this list alone, so what volume of complaints is required to tip the balance to all right it's a damn bug let's fix it? From the name en_US one might guess that it represents the behavior expected by English-speaking users in or from the US. But those users have lived with computers for a generation or two. What they expect is ASCIIbetical. The only people who actually expect phone-book-style sorting are old geezers who remember what a phone book was. Most of them have never used a computer and never will, so why do we (and by we I mean whoever makes the locale rules) bend the default to accommodate them? -- Alan Curry
bug#6007: en_US sorting is completely stupid.
Andreas Schwab writes: Alan Curry pacman...@kosh.dhis.org writes: Who's the power here anyway? You are, actually. Everyone can define locales to behave the way he likes, see localedef(1). I avoid this by not having any locales installed. But that doesn't help all the other victims. From the name en_US one might guess that it represents the behavior expected by English-speaking users in or from the US. But those users have lived with computers for a generation or two. What they expect is ASCIIbetical. Nowadays most people don't know what ASCII is. They may not know how to name it, but they do complain when it isn't used, enough that it's a FAQ. People install a GNU/Linux distribution, pick English from the language menu, and get a set of sorting rules that doesn't make sense. Sorry, should have told the installer you speak C. Donna Summer just doesn't belong between Don Adams and Don Pardo, and everyone knows it. Not a bug? Bah. Not a coreutils bug, but it's a bug. If glibc was in the same bug tracking system with coreutils, reports like this one could be reassigned there. -- Alan Curry
bug#5926: feature request: mv -p to create missing target dir
Bob Proulx writes: As a side comment I don't see the point of: $(which mv) "$@" I can guess the point:

bash$ alias mv='mv -i'
bash$ touch a b
bash$ mv a b
mv: overwrite `b'? ^C
bash$ $(which mv) a b
bash$ ls -l a b
ls: cannot access a: No such file or directory
-rw------- 1 pacman users 0 Apr 24 17:55 b

Silly aliases. -- Alan Curry
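The portable way to bypass a wrapper is `command`, which POSIX defines to suppress function (and alias) lookup; `$(which mv)` only works by accident of pathname lookup. Since aliases don't expand in non-interactive scripts, a function stands in for the alias in this sketch:

```shell
# A function wrapper stands in for an intrusive alias like mv='mv -i':
pwd() { echo wrapped; }
pwd            # runs the wrapper: prints "wrapped"
command pwd    # bypasses it: the real pwd runs
unset -f pwd
```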
bug#6056: base32 output for md5sum sha1sum etc.
In the dark ages before the bug tracker (i.e. November), a message was sent: http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00206.html providing an RFC4648 base32 output option for the cryptographic hash utilities. I'm sending this now to 1. endorse the idea 2. get it a bug number so it might be noticed
bug#6056: base32 output for md5sum sha1sum etc.
Jim Meyering writes: tags 6056 + moreinfo I've given all the moreinfo I could. I thought It's a standards-track RFC plus seen in the wild would have been enough. And the applications where it's relevant (Gnutella, Bitzi) are pretty well-known. md5sum ... | perl -anle 'use Convert::Base32;\ $h32 = uc(encode_base32(pack("H40", $F[0]))); print "$h32 $F[1]";' This is a strong argument not to encumber the tool with a new option. Since it's broken, I saw it as an argument the other way. The 16-to-32 converter is just complex enough that attempts to do it freehand will not quite be right. But seeing a GNU maintainer argue against a new option based on the bloat/benefit ratio is a pleasant surprise. From the people who gave us sed -i (that's just a redirect and a mv) and grep -r (why learn to use find when you can just add recursion to every tool). I don't want to fight this trend too hard. -- Alan Curry
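For what it's worth, later coreutils releases make the conversion possible without Perl at all: base32 appeared in 8.25 and basenc (which can decode base16) in 8.31. Assuming those tools are available, the requested output can be assembled like this:

```shell
# Hex digest -> raw bytes -> RFC 4648 base32:
hex=$(printf hello | md5sum | cut -d' ' -f1)
b32=$(printf %s "$hex" | tr a-f A-F | basenc --base16 -d | base32)
echo "$b32"   # base32 of the raw digest bytes
```

The tr step is needed because basenc's base16 decoder expects uppercase hex.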
bug#6104: [Expert] Bug in mv?
Note: I saw this on bug-coreutils, haven't read the whole thread. Gene Heskett writes: On Tuesday 04 May 2010, João Victor Martins wrote: On Tue, May 04, 2010 at 10:36:19PM -0400, Gene Heskett wrote: I tried to mv amanda* /home/amanda/* as root, which I recall I have done successfully several times before. The shell expands * _before_ passing the args to mv. So mv saw all files starting with 'amanda' and all files (besides . hidden ones) in /home/amanda/ as args. It then picked the last one listed (probably /home/amanda/tmp/) as destination. I had two files whose names started with amanda in that directory. I would have assumed it would expand the src pattern of amanda* to match only those It's not the first * that's the problem. The second one (/home/amanda/*) expands to a list of everything that was in /home/amanda (except dotfiles) and that happens before mv is executed. There are several possibilities of what that command can do:

1. /home/amanda contained no files before the move. In that case the /home/amanda/* is passed through literally as the final argument to mv, so mv sees 3 arguments (your 2 files, then /home/amanda/* which doesn't exist) and it fails, because with more than 2 arguments, the last argument must be an existing directory.

2. /home/amanda contained some stuff, and the last item in the expanded list (alphabetically sorted) was not a directory. Same result as #1.

3. /home/amanda contained some stuff, and the last item in the expanded list happened to be a directory (say you have a directory called /home/amanda/tmp): then the list expands, the final argument to mv is an existing directory, so you have success! Your 2 files, plus everything in /home/amanda, get moved into the directory. If this isn't what you meant, you did something wrong. mv just did what it was told.

4. Like #1, but with a nomatch shell option enabled, you get a No match error message.
Your career as a unix wizard isn't complete until you've done something like #3 *on purpose*. -- Alan Curry
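Case 3 can be reproduced harmlessly by putting echo in front of the command, so the expansion is visible and nothing actually moves (file names here are hypothetical):

```shell
# Recreate the layout, then show what mv would actually have received:
cd "$(mktemp -d)"
touch amanda1 amanda2
mkdir -p home/amanda/tmp
echo mv amanda* home/amanda/*
# prints: mv amanda1 amanda2 home/amanda/tmp
# The directory is the last expanded word, so it becomes the target.
```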
bug#6897: date -d '1991-04-14 +1 day' fails
Bob Proulx writes: date -d '1991-04-14 12:00 +1 day' I'm from china by the way, and the time zone I am in and to which the systems were set is GMT+8 (or CST, China Standard Time). Indeed,

TZ=Asia/Shanghai date -d '4/14/1991'
date: invalid date `4/14/1991'
TZ=Asia/Shanghai date -d '4/14/1991 01:00:00'
Sun Apr 14 01:00:00 CDT 1991
TZ=Asia/Shanghai date -d '1/1/1970 GMT + 671558399 sec'
Sat Apr 13 23:59:59 CST 1991
TZ=Asia/Shanghai date -d '1/1/1970 GMT + 671558400 sec'
Sun Apr 14 01:00:00 CDT 1991

According to tzdata, China had DST from 1986 to 1991. This comment in the source file indicates some doubt about correctness:

# From Paul Eggert (2006-03-22):
# Shanks & Pottenger write that China (except for Hong Kong and Macau)
# has had a single time zone since 1980 May 1, observing summer DST
# from 1986 through 1991; this contradicts Devine's
# note about Time magazine, though apparently _something_ happened in 1986.
# Go with Shanks & Pottenger for now. I made up names for the other
# pre-1980 time zones.

Maybe someone who can read Chinese could clear it up by finding the original policy declarations... Please review the FAQ for date. http://www.gnu.org/software/coreutils/faq/#The-date-command-is-not-working-right_002e There might be fewer occurrences of this misunderstanding if we could teach date that -d 4/14/1991 is not actually a request for 4/14/1991 00:00:00, but any time that existed during the day 4/14/1991, or perhaps a more specific the first second of 4/14/1991. Has that been considered and rejected already, or is it just waiting for someone to implement it?
bug#6897: date -d '1991-04-14 +1 day' fails
Paul Eggert writes: On 08/22/10 18:09, Alan Curry wrote: There might be fewer occurrences of this misunderstanding if we could teach date that -d 4/14/1991 is not actually a request for 4/14/1991 00:00:00, but any time that existed during the day 4/14/1991, or perhaps a more specific the first second of 4/14/1991. Has that been considered and rejected already, or is it just waiting for someone to implement it? As far as I know nobody has ever suggested that, and it is a reasonable suggestion. However, it would not fix the problem in general, since in some cases there is no first second of date X, even when X is valid. For example: $ TZ=Pacific/Kwajalein date -d 1993-08-20 date: invalid date `1993-08-20' There's nothing wrong with that error message. It's telling the truth about 1993-08-20 being an invalid date. But TZ=Asia/Shanghai date -d '4/14/1991' says: date: invalid date `4/14/1991' which is a lie. 4/14/1991 is not an invalid date. It made a bad assumption (that midnight was intended, when the user didn't ask for midnight at all) and then reported an error caused by the bad assumption, and didn't even have the courtesy to mention the assumption. Bonus thought: the date command is misnamed. If it actually worked with dates, it wouldn't need to attach an hour, minute, and second to everything. It would understand 4/14/1991 as representing an entire day, and + 1 day added to it would represent the entire next day. But date doesn't work with dates, it works with time_t's. This is not obvious to the casual user. -- Alan Curry
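The "first second of the day" idea can be sketched in shell with a probing loop: try midnight, then step forward until a time is accepted. The helper name is hypothetical, and hourly steps keep it short (a real implementation would probe more finely, and would still come up empty for days skipped entirely, like the Kwajalein example):

```shell
# Probe forward from midnight until date accepts a local time that day.
first_valid() {    # usage: first_valid YYYY-MM-DD
  for h in 00 01 02 03; do
    t=$(date -d "$1 $h:00" '+%F %T' 2>/dev/null) && { echo "$t"; return 0; }
  done
  return 1       # the whole probed range is inside a gap (or skipped day)
}

TZ=UTC0 first_valid 1991-04-14    # UTC has no gaps: 1991-04-14 00:00:00
```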
bug#6949: Uniq command should allow total to be displayed
Miles Duke writes: 'uniq --count' is an excellent tool for summarizing data, but it is missing one useful feature - an overall total. This might be a good idea... It's embarrassing to have to go to excel to bring the totals together. ...but you can't think of any other tool that can add up a bunch of numbers?! You're using dynamite to kill a mosquito. There must be a dozen basic utilities that can do arithmetic. Like awk: ... | uniq -c | awk '{t+=$1}END{print t, "total"}1' -- Alan Curry
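An equivalent expanded form of that awk one-liner, run on sample data (the trailing `1` pattern in the compact version is what prints each input line; `print` does the same job here):

```shell
# Pass each counted line through, then print the grand total at the end:
printf 'a\na\nb\n' | uniq -c \
  | awk '{ t += $1; print } END { print t, "total" }'
# last line of output: 3 total
```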
bug#7182: sort -R slow
Ole Tange writes: I recently needed to randomize some lines. So I tried using 'sort -R'. I was astonished how slow that was. So I tested how slow competing strategies are. GNU sort is two magnitudes slower than unsort and more than one magnitude slower than perl: Never heard of unsort. Why didn't you try shuf(1)? Also, your perl is not valid: $ time perl -e 'print sort { rand() <=> rand() } <>' file real 0m6.621s That comparison function is not consistent (unless very lucky). I would expect sort -R to be faster than sort and faster than perl if not as fast as unsort. How big is your test file? I expect sort(1) to be optimized for big jobs. I bet it would win the contest if you are shuffling a file that's bigger than available RAM.
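shuf is the purpose-built shuffler mentioned above; a quick sanity check shows it emits a true permutation (the same lines, reordered), which the inconsistent rand() comparator is not guaranteed to do:

```shell
# Shuffle, then sort back: the result must match the original input.
cd "$(mktemp -d)"
seq 5 > nums
shuf nums | sort -n | diff - nums && echo "same lines, order aside"
```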
bug#7228: coreutils-8.6 sort-float failure on ppc
Jim Meyering writes: Gilles Espinasse wrote: Just tested 8.6 on linux glibc-2.11.1/gcc-4.4.5 LFS build on x86, sparc and ppc First a good news is that on sparc (32-bits), 8.6 test suite is now passing I didn't report yet a failure on misc/stty which was Failure was + stty -icanon stty: standard input: unable to perform all requested operations Is that consistently reproducible? If so, you might want to investigate, but it's probably not a big deal. I've seen that error message before, and I did investigate. It was caused by glibc's tcsetattr()/tcgetattr() being too clever, trying to support fields that didn't exist in the kernel's termios struct. The kernel struct is arch-specific so it's not surprising that an arch-specific bug would show up here. I've only seen it with speed changes. stty 115200 < /dev/ttyS0 makes the change successfully, but complains. The kernel termios struct may or may not have separate speed fields for input and output, but glibc likes to pretend that they're both there, and somehow stty gets confused by glibc's fakery. strace doesn't give any clues because it shows the real kernel structures. See sysdeps/unix/sysv/linux/{speed,tc[gs]etattr}.c in glibc source for the full ugliness.
bug#7247: readdir obsoleteness?
Ian Martin writes: A message containing only ASCII characters which was nevertheless encoded as quoted-unreadable, with its original newlines senselessly escaped, and then more newlines injected, forming a bricktext with continuation markers. Does yahoo send them out like this or is it a mailing list manager hatchet job? Reformatted for sanity: Hi, just trawling the webpages, I got caught in a loop. The syscalls page states: Don't read man pages you find on the web, unless you're deliberately looking for information on old systems. Up to date man pages for Linux are at ftp.$COUNTRYCODE.kernel.org:/pub/linux/docs/man-pages ... Then there is __NR_readdir corresponding to old_readdir(), which will read at most one directory entry at a time, and is superseded by sys_getdents(). however, on the getdents man page: Description: This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface. You started at readdir(2), you ended at readdir(3). That's not a loop. readdir(3) is the POSIXly portable C-level interface. readdir(2) and getdents(2) are the Linux-specific implementations which you don't need to know about unless you're writing code at or below the libc layer. -- Alan Curry
bug#7433: ls: [manual] description for --directory is insufficient
Eric Blake writes: The wording for -d may not mention it, but the wording at the very beginning of the --help and man page is clear that:

| Usage: ls [OPTION]... [FILE]...
| List information about the FILEs (the current directory by default).

In other words, to correctly predict the behavior of ls -d you must read two pieces of information that are not immediately adjacent to each other, and use a minimal amount of thought to decide whether and how they influence each other.

For people who read documentation all the way through, knowing that a thorough understanding of the available tools will be a long-term benefit, this is not a problem. Let's call these people the smart bears. They'll get the garbage can open easily because they're patient. For people who only skim documentation, and not even that until they have a problem, the obstacle is larger. If there isn't a single sentence that tells them everything they need to know, they're not going to get it. Let's call these people the dumb tourists. They're impatient with the garbage can latch, because they're holding a smelly bag of garbage.

Smart bears see a thick instruction manual and say "Hooray! Proper documentation! I won't have to guess how it works." Dumb tourists see a thick instruction manual and say "Screw that, reading sucks, I can guess how it works." man pages are written by and for smart bears. Dumb tourists don't write documentation. Sometimes they write web pages which they optimistically call documentation.

Making documentation dumb-tourist-friendly inevitably makes it longer, because it has to have a clause for each goal that the reader might want to achieve, instead of just listing the facts and expecting the reader to be able to put them together. The increased length bothers the smart bears since it increases the time required to read the documentation all the way through.
In the case of ls, I suggest that -d is special enough (since it affects how the non-option arguments are used, unlike other ls options) that a little extra length is justified. It would be reasonable to provide 2 separate SYNOPSIS lines, something like this:

SYNOPSIS
       ls [OPTION]... [FILE]...
       ls -d [OPTION]... [FILE]...

DESCRIPTION
       The first form lists the given FILEs, and if any of them are
       directories, the directory contents are listed. If no FILEs are
       given, the contents of the current directory are listed.

       The second form (with -d) lists the given FILEs, but any FILE
       that is a directory will not have its contents listed. With no
       FILEs given, the current directory (not its contents) is listed.

I don't care how you'd translate that to/from --help. I care about man pages, not --help. If that seems like giving too much attention to -d, how about this alternative: add an EXAMPLES section. Dumb tourists love EXAMPLES sections, and smart bears can safely skip them. It's a little bit ridiculous that cat(1) has examples and ls(1) doesn't. ls has a lot more options. And the conflict between -R and -d should be explicitly mentioned. One of them makes the other meaningless, and we should say which one. -- Alan Curry
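[A concrete pair of commands showing the two forms described above; the directory and file names are invented for the demo.]

```shell
# A directory with a single entry.
mkdir -p demo_dir
touch demo_dir/inside

ls demo_dir      # prints: inside     (first form: the contents)
ls -d demo_dir   # prints: demo_dir   (second form: the directory itself)
```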
bug#7450: cp does not check for errno=EISDIR when target is dangling link
Марк writes:

How to reproduce:
$ ln -s non-exist tgt
[...]
$ cp /etc/passwd tgt/
cp: cannot create regular file `tgt/': Is a directory
Novices can not understand this message :)

The same confusing error message also occurs in the simpler case where the target is simply any nonexistent name with a trailing slash.

$ ls -ld tgt
ls: cannot access tgt: No such file or directory
$ cp /etc/passwd tgt/
cp: cannot create regular file `tgt/': Is a directory

strace shows this:

open("tgt/", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = -1 EISDIR (Is a directory)

which I think is just bad kernel behavior. There's no errno (among the classical errno values anyway) which completely expresses "you tried to creat something with a trailing slash", but I'd rather see ENOENT or ENOTDIR than EISDIR. When this open fails, nothing in sight "Is a directory". The problem is that tgt/ _would_ be a directory, but since it doesn't exist and isn't about to be created, "Is a directory" is an overstatement. cp should dodge this issue by never calling creat with a trailing slash. It could supply a meaningful error message instead of one that is derived from an errno. -- Alan Curry
bug#7450: cp does not check for errno=EISDIR when target is
Jim Meyering writes: Thanks for the suggestions. Here's the patch I'm considering: [I first patched copy.c's copy_reg, included at end, but didn't like that as much; the core copying code should not be encumbered like that, and other users of copy.c are not affected.]

Cross-filesystem mv is pretty much the same. If one is confusing enough to justify a change, I think the other is too.

$ cd /tmp
$ touch foo
$ ls -ld $HOME/nosuch
ls: cannot access /home/pacman/nosuch: No such file or directory
$ mv foo $HOME/nosuch/
mv: cannot create regular file `/home/pacman/nosuch/': Is a directory
$

-- Alan Curry
bug#8079: rm command problem/suggestion
Luca Daniel writes: Hi there :-) I have a problem and a suggestion: 1) The problem: I can't find an easy way to remove a type of file through all sub-directories with the GNU tool rm (remove). There is no option to search through all sub-folders, only the current working directory. Back when I used Windows this was easy with the command: del /s *.pdf

You misplace the blame on rm; the problem is that the standard unix shell doesn't have recursive globbing. Doing it in the shell means that all utilities benefit. rm is just one of them. zsh does recursive globbing with a double-asterisk, so that for example rm **/*.pdf would get rid of all files named *.pdf anywhere under the current directory. bash also knows about the ** recursive glob, but I recommend zsh because it has a lot more cool features, like **/*.(pdf|ps)(m+30Lk-500) (recursive directory search, all files named *.pdf or *.ps, whose last modification was more than 30 days ago, with a size less than 500k) -- Alan Curry
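[In bash the same thing needs the globstar option switched on first; a throwaway sketch, with scratch files invented for the demonstration.]

```shell
#!/bin/bash
shopt -s globstar   # enable ** recursive globbing (off by default in bash)

# Scratch tree for the demonstration.
mkdir -p a/b/c
touch top.pdf a/mid.pdf a/b/c/deep.pdf a/keep.txt

# **/*.pdf matches *.pdf at any depth under the current directory.
rm ./**/*.pdf

ls a/keep.txt   # the non-PDF survives; the PDFs are gone
```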
bug#8090: strerror(1) and strsignal(1)?
Bruce Korb writes:

Hi Jim,

On 02/20/11 15:20, Jim Meyering wrote: Bruce Korb wrote: Hi Bruce, [your subject mentions strsignal -- you know you can get a list via env kill --table, assuming you have kill from coreutils?]

What's the installation rate of coreutils-kill vs. procps kill? Debian chooses procps kill (except on Hurd and maybe freebsd-kernel).

I've had that itch many times. Here are some handy bash/perl functions I wrote:

Yep. I know one can get to it via perl. OTOH, _you've_ had that itch many times, Padraig's had that itch many times, and I'd take a wild guess that there have been a few others, too. So it still remains for the itchy folks to drag something around to new places whenever they go to a new environment. Were it in coreutils, it would likely be more easily found.

You guys don't perl-golf well. perl -E'say$!=11', or for older perls, perl -le'print$!=11'

It also fits well with my pet theory that library function names ought to have same-named commands lying about. Thus, if you can remember strerror(3p), then by golly there's a strerror(1), too, with obvious options (none, in this case) and operands.

The important thing is that when you need to use this utility, you report a bug on the program that printed a number instead of calling strerror(3) itself. Error numbers are not a user interface, regardless of Microsoft's attempt to train people otherwise.

Nice. I've copied them into my shell functions directory. I still think strerror(3p) ought to imply a strerror(1) command, but I leave it to you to decide. It's just my preference.

Just as write(2) implies write(1), and time(2) implies time(1). Or something like that. -- Alan Curry
bug#8103: NUL terminated lines
Bjartur Thorlacius writes: On 2/24/11, Jim Meyering j...@meyering.net wrote: Bjartur Thorlacius wrote: Maybe we should modify tac to add the -z option. Would you care to write a patch? It would be redundant, as tac -s $'\0' is equivalent. Note that a $'\0' argument in a shell command line is exactly equivalent to an empty string, since it must be passed from the shell to the program using execve() which takes NUL-terminated strings. There is no way to run a program with an actual NUL byte contained in one of its arguments. execve will stop copying at the NUL, and even if it didn't, the new program receives its arguments in int argc, char **argv form so how is it supposed to know that there's a NUL in there that's not a terminator? This limitation can't be avoided. It's not just a C language thing. The execve interface is based on NUL-terminated strings at the asm level too. If tac -s $'\0' did something different from tac -s '', it could only have been a shell builtin. (Assuming the shell supported the $'...' notation at all) -- Alan Curry
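[The collapse described above is easy to demonstrate; the $'...' form is bash/zsh/ksh syntax.]

```shell
# $'\0' collapses to an empty argument before execve() ever runs:
printf '<%s>\n' $'\0'   # prints: <>

# ...exactly like an explicitly empty string:
printf '<%s>\n' ''      # prints: <>
```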
bug#8102: [head] do not return EXIT_SUCCESS upon premature EOF
Bjartur Thorlacius writes: On 2/23/11, Eric Blake ebl...@redhat.com wrote: On 02/23/2011 11:58 AM, Bjartur Thorlacius wrote: That's because this is not a bug, but a POSIX requirement: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html "When a file contains less than number lines, it shall be copied to standard output in its entirety. This shall not be an error." Indeed. Since it's explicitly mentioned, I assume there's a reason for it. I'd be grateful if someone could point out what the rationale behind the decision is (or better yet, where such information can be found). So should I be using a head-alike for iterating over lines, and would such a utility belong in a GNU package, or is awk the right tool for the job?

Here's what an iterate-over-lines loop normally looks like in a shell script:

while read -r line
do
  something "$line"
done

The idea of using head to control a loop means you are either a newbie who didn't know about read, or you are trying to do something subtly different which I didn't understand. Excuse me if I guessed the wrong one. -- Alan Curry
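[The same loop fed from a file rather than stdin, as a runnable sketch; the file name is arbitrary.]

```shell
# Small input file for the demonstration.
printf 'alpha\nbeta\ngamma\n' > lines.txt

# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r line
do
  echo "got: $line"
done < lines.txt
```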
bug#8231: Bug in the linux command: tail
Eric Blake writes: Besides, we already have the convention that long options that require an argument mean that the associated short option also requires an argument. That is, we are already consistent in writing -n, --lines=K as shorthand for: -n K OR --lines=K

And a stupid convention it is. An equals sign that's distributive over a comma! Anywhere else, the comma is a very-low-precedence operator. The fact that the comma has whitespace on one side, while the equals sign has no adjacent whitespace, provides reinforcing evidence that the equals sign should bind tighter. But it doesn't. The reader's only hope is to infer this GNU anti-readability convention by reading the man page for a command they already know how to use, and then apply that convention when reading about other commands. The Linux man-pages project should take on section 1. -- Alan Curry
bug#8231: Bug in the linux command: tail
Eric Blake writes: On 03/11/2011 10:03 AM, Roger N. Clark wrote: This works on HP-UX. Since we're already dealing with two GNU extensions, I don't see why we can't be nice and make the shorter syntax work the way HP-UX is doing things. Patches welcome! Really? tail -N of multiple files used to work with GNU tail, before someone broke it[1]. I have a local patch that I've been using since the breakage first bit me. Before the breakage, a comment explicitly stated that multiple filename arguments were allowed. That comment was replaced with one explicitly stating the opposite. Looks like an intentional feature removal to me. Patches welcome? How about a revert? [1] commit 99f09784cc98732a440de86bb99a46f11f7355d8 -- Alan Curry
bug#8408: A possible tee bug?
George Goffe writes: Howdy, I have run several scripts and seen this behavior in all cases... tee somescript | tee somescript.log 2>&1 The contents of the log is missing a lot of activity... messages and so forth. Is it possible that there are other file descriptors being used for these messages?

I can't tell what you're trying to do from this incomplete example, but it looks like you're expecting the 2>&1 to do something other than what it's actually doing. It's only pointing the second tee's stderr to wherever its stdout was going. If the above pipeline is run in isolation from an interactive shell prompt, the 2>&1 is accomplishing nothing at all, since stderr and stdout will already be going to the same place (the tty) anyway. tee's stderr will normally be empty; it would only print an error message there if it had trouble writing to somescript.log. Post a more complete description of your intent. -- Alan Curry
bug#8408: A possible tee bug?
George Goffe writes: Alan, Oops. I goofed... My apologies. The example would be this: somescript | tee somescript.log 2>&1. The intent is to capture all the output (stdout and stderr) from somescript. somescript runs several commands that may or may not utilize other FDs. I was hoping to get a better output than what you might get from the script command, which records all the messages + a ton of other things like escapes which are a pain to eliminate. Does this make better sense?

Well, you still have the 2>&1 in the wrong place. If you want it to affect the stderr of the command to the left of the pipe, you have to put it to the left of the pipe.
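[A runnable sketch of the corrected pipeline, with a tiny shell function standing in for somescript; the function and its output lines are invented for the demo.]

```shell
# Stand-in for somescript: one line to stdout, one to stderr.
somescript() {
  echo 'normal output'
  echo 'error output' >&2
}

# 2>&1 sits to the LEFT of the pipe, so stderr joins stdout inside
# the pipe and tee records both streams.
somescript 2>&1 | tee somescript.log

cat somescript.log   # both lines are in the log
```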
bug#8511: Sort error in makefile
Harpal Shergill writes: Hello, I have a makefile which does the following: 1. grab a .gz file from a server using wget -- works fine 2. extract the data from the .gz file to a new file based on a filter using zcat -- works fine 3. sort the data based on a specific field and save it into a new file -- DOES NOT WORK - the command looks like this: sort -t: -k2n inputFile outputFile - this command works perfectly on the command line of cygwin BUT fails through the makefile - Error says: Input file specified two times. I tried to search for this online but couldn't get any info. Can you please help and show me what's wrong? In the makefile I have this command enclosed with the ` character. Any feedback on this would be greatly appreciated.

Your makefile is running the DOS/Windows sort command instead of the GNU/cygwin sort. Use a full path like /whatever/cygwin/bin/sort to make it use the right one. cygwin's bug, if a bug at all... -- Alan Curry
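[One way to confirm which sort is being picked up; the cygwin path shown in the comment is the same placeholder used above.]

```shell
# Which sort does the shell resolve first on PATH?
command -v sort

# In the makefile rule, the fix is an absolute path, e.g.:
#   /whatever/cygwin/bin/sort -t: -k2n inputFile outputFile
```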
bug#8423: Questions about checking out 6.7 using git
Jim Meyering writes: tags 8423 notabug close 8423 thanks That's disappointing. I was looking forward to seeing a response to this question. I also recently tried to find the origin of a bug with git bisect and quickly ended up with an uncompilable mess. If you want to see a complete demonstration, I could make another attempt and log it all. But if you're trying to tell us that checking out old versions from the repository and compiling them shouldn't be expected to work... then you're wrong. -- Alan Curry
bug#8578: 8.12 and 8.10 'ls -dl' appends ' ' (0x20: space) to
Eric Blake writes: On 04/28/2011 12:34 PM, Jason Vas Dias wrote: I do:

$ ls --version | grep '[(]G'
ls (GNU coreutils) 8.12

Thanks for the report.

$ ls -dl /. | od -cx

od -cx is not always the best choice in formatting - it depends on the endianness of your machine since it groups two bytes at a time. I personally like 'od -c -tx1z' better for the type of output you are wanting.

0000000   d   r   w   x   r   -   x   r   -   x   .       2   5   r
             7264    7877    2d72    7278    782d    202e    3532    7220

Did anyone else notice the '.' after the drwxr-xr-x part? I bet that's what's confusing python.

The file mode written under the -l, -g, -n, and -o options shall consist of the following format: "%c%s%s%s%c", entry type, owner permissions, group permissions, other permissions, optional alternate access method flag. The optional alternate access method flag shall be a single space if there is no alternate or additional access control method associated with the file; otherwise, a printable character shall be used.
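[The flag character can be picked out mechanically; on SELinux systems it is '.', '+' when an ACL is present, and a space otherwise.]

```shell
# Column 11 of the long listing is the alternate access method flag.
flag=$(ls -ld / | cut -c11)
printf 'flag: [%s]\n' "$flag"
```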
bug#8587: Curious bug.
Francois Boisson writes: On a debian squeeze amd64. francois@totoche:~$ echo ABCD Directory | tr [:lower:] [:upper:] ABCD DIRECTORY francois@totoche:~$ cd /tmp francois@totoche:/tmp$ echo ABCD Directory | tr [:lower:] [:upper:] tr: construit [:upper:] et/ou [:lower:] mal aligné I can't read that error message but I can see what you did wrong. [:upper:] is seen by the shell as a glob which matches these filenames: : e p r u and likewise [:lower:] matches a different set of single-character filenames. In one directory, you don't have any files named like that. In the other directory, you do. When the glob matches nothing, the shell passes the string [:upper:] or [:lower:] literally as an argument to the command. That's a design flaw in the unix shell from its early days, which nobody has the guts to fix. Use '[:upper:]' and '[:lower:]' to make the shell treat them as literal strings and not globs. Switch to zsh for better diagnostics... % echo ABCD Directory | tr [:lower:] [:upper:] zsh: no matches found: [:lower:] % echo ABCD Directory | tr '[:lower:]' '[:upper:]' ABCD DIRECTORY -- Alan Curry
bug#8604: Linux mime help needed
Eric Blake writes: $ file mmencode mmencode: PA-RISC2.0 shared executable dynamically linked - not stripped If you want to run a binary on a different platform, you have to recompile it from source for that platform. Do you have the source for mmencode? If not, then I don't see how you can expect to migrate to a different operating system and hardware. But again, that's outside the scope of Coreutils.

For the record, mmencode is found in the metamail package, and also comes with elm. Source is still findable, even though it's dropped out of the "packaged by the OS distributor" level of popularity. (Curious header watchers will have noticed I'm an elm user. And yep, I used mmencode to decode the mmencode in the original question.) The coreutils equivalent is base64(1). After a rewrite with mutt, the whole script might be a one-liner. -- Alan Curry
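[The base64(1) round trip, for reference; the sample string is invented.]

```shell
printf 'hello' | base64          # prints: aGVsbG8=
printf 'aGVsbG8=' | base64 -d    # prints: hello
```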
bug#8609: $GZIP doesn't mean what you think it means.
Let me show you what happens if I try to clone coreutils from git and compile in the most straightforward way possible:

% git clone git://git.sv.gnu.org/coreutils
Cloning into coreutils...
remote: Counting objects: 151287, done.
remote: Compressing objects: 100% (37539/37539), done.
remote: Total 151287 (delta 113807), reused 150796 (delta 113449)
Receiving objects: 100% (151287/151287), 26.95 MiB | 767 KiB/s, done.
Resolving deltas: 100% (113807/113807), done.
Script started on Tue May 3 00:40:20 2011
% cd coreutils
% ./bootstrap
./bootstrap: Error: '-9' not found
./bootstrap: See README-prereq for how to get the prerequisite programs
%

On seeing that the first time, I immediately knew what happened and worked around it... then quickly got into trouble trying to git bisect something and forgot about it. Now I've repeated the process (including getting into trouble with git bisect but that's for later) and decided that this bug, though easily worked around, deserves to be reported. bootstrap wrongly assumes that if there's a GZIP environment variable, it must contain the pathname of a gzip program. gzip is a tool we use all the time, so I would have hoped that GNU developers would have read its documentation, but apparently not. The GZIP environment variable is used to pass default options, hence the GZIP=-9 which has been in my environment for a long time. If you even tried putting GZIP=/bin/gzip in the environment, you'd find that gzip no longer works properly, because it acts as if an extra /bin/gzip was given on the command line... and if you did it as root, congratulations, you just gzipped your gzip program. Surely gzip has the authority to define the semantics of the GZIP environment variable, and bootstrap should not be making the unwarranted (and obviously untested) assumption that it means something different. I assume this bug resulted from an over-generalization of the pattern CC=gcc, MAKE=gmake, ...
In the environment of my current login shell, there are 10 environment variables with names that (after tr A-Z a-z) are also programs in my PATH. Of those 10, 3 follow the $GZIP pattern where the value of the environment variable is a list of options for the command. Another 3 fit the pattern COMMAND=/path/to/implementation/of/command. Neither pattern is a reliable predictor of the semantics of an arbitrary environment variable. -- Alan Curry
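[The documented meaning of $GZIP is a default option list, which is easy to confirm; note that newer gzip releases deprecate the variable with a warning on stderr, but still honor options like -9.]

```shell
# GZIP holds default options, not a program path: this compresses at
# level 9 and round-trips cleanly.
echo 'payload' | GZIP=-9 gzip 2>/dev/null | gzip -d
# prints: payload
```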
bug#8643: 'who' command bug
Bob Proulx writes: ding bat wrote: I was using Who to list all users connected to pptpd vpn server with maverick 10.10. I put natty on the computer and now the who command does not list out the vpn users. It only seems to list out local logged in user. any thoughts, this issue is killing me. I can still do 'last |grep ppp' and get joy, but 'w' and 'who' were very nice. [...] I am not an Ubuntu user and do not have a system to test with and so I do not know what programs you would be running when you log in with Ubuntu's Natty. You will need to look at your system and determine what login manager you are using. This is probably gdm but might be gdm3 but possibly one of several others. You will need to determine what terminal program you are using. This is probably gnome-terminal but possibly one of several others. Both of those programs either should (or should not) be logging user login information to utmp. Bob apparently doesn't know what pptpd is. Or what VPN means. Or what PPP is. Or didn't read very carefully. But he's probably right anyway. The bug is more likely to be in pppd than anywhere else. It's weird that it would write to wtmp (for last) but not utmp (for who). Check the config files for recent changes, and if you can't find the cause, find someplace that gives help with pppd. An strace of the pppd process during connection setup could be enlightening. -- Alan Curry
bug#8766: Bug in sha1sum?
Theo Band writes: Hi I'm not sure, but I think I found a bug in sha1sum. It's easy to reproduce with any file that contains a backslash (\) in the name:

$ echo test > test
$ sha1sum test
4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
$ mv test 'test\test'
$ sha1sum 'test\test'
\4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test

I expect the file sha1sum to be the same after renaming the file (a backslash is prepended to the otherwise correct result).

This result violated my expectations too, but it turns out to be a documented feature:

For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating a binary or text input file, and the file name. If FILE contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names. If FILE is omitted or specified as `-', standard input is read.

(the sha*sum utilities all refer back to md5sum's description) I better go fix all my scripts that rely on /^[0-9a-f]{32} / -- Alan Curry
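[A minimal reproduction of the escaping, with scratch file names invented for the demo.]

```shell
# Same content, two names; the backslashed name gets the escaped form.
echo test > plain
echo test > 'back\slash'

sha1sum plain
# prints: 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  plain

sha1sum 'back\slash'
# prints: \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  back\\slash
```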
bug#8938: make timeout and CTRL-C
Pádraig Brady writes: On 26/06/11 20:20, shay shimony wrote:

all:
	timeout 12 sleep 10

Note there is a tab before timeout 12 sleep 10. Then run make in the directory where the file is located, and try to press CTRL-C. Notes: CTRL-Z works. When executing timeout without make, CTRL-C works. When executing make without timeout, CTRL-C works.

Drats. That's because SIGINT is sent by the terminal to the foreground group. The issue is that `make` and `timeout` use much the same method to control their jobs, i.e. they create their own process group so they can terminate all sub-processes.

Are you sure? I see no evidence of that. When I run make with the above makefile, the processes look like this:

 PPID   PID  PGID   SID TTY   TPGID STAT  UID  TIME COMMAND
    1  1451  1451  1451 6     16407 S    1000  0:06 -zsh
 1451 16407 16407  1451 6     16407 S    1000  0:00 make
16407 16408 16408  1451 6     16407 S    1000  0:00 timeout 60 sleep 30
16408 16409 16408  1451 6     16407 S    1000  0:00 sleep 30

The first PGID is the login shell. The second PGID is make, which was put into its own process group by the shell, because the shell has job control enabled. The last PGID is timeout, which put itself into a process group. make never noticed any of them. In the source for GNU make 3.82 there are no calls to setpgrp or setpgid (unless obfuscated from grep). There is the following comment:

/* A termination signal won't be sent to the entire process group, but it means we want to kill the children. */

That's above the handling of SIGTERM, which iterates over child processes and passes along the SIGTERM to them. After that is the handling of SIGINT, which doesn't kill child processes (unless they're remote, which is... news to me that make does remote things) but just waits for them. What seems to be happening is that make *doesn't* create a process group, therefore assumes that when it gets a SIGINT, its children have already gotten it too, and it just waits for them to die.
A child that puts itself into a new process group screws this up (as would kill -2 `pidof make`). I think the answer is that timeout should put itself into the foreground. That way it would get the SIGINT. make wouldn't get it, but wouldn't need to. timeout would exit quickly after SIGINT and make would proceed or abort according to the exit code. -- Alan Curry
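[That timeout puts itself into its own process group is easy to observe from a shell, using procps ps; a short-lived sketch.]

```shell
# Start timeout in the background and compare its PID and PGID.
timeout 5 sleep 1 &
pid=$!
sleep 0.3              # give timeout a moment to call setpgid()

pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
echo "pid=$pid pgid=$pgid"   # pgid equals pid: its own group leader
wait
```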
bug#8938: make timeout and CTRL-C
Pádraig Brady writes: On 27/06/11 21:12, Alan Curry wrote: What seems to be happening is that make *doesn't* create a process group, therefore assumes that when it gets a SIGINT, its children have already gotten it too, and it just waits for them to die. A child that puts itself into a new process group screws this up (as would kill -2 `pidof make`). Thanks for the analysis Alan. Yes, you're right I think. In any case the important point is that timeout sets itself as group leader, and is not the foreground group.

Right, we have a tree of process groups that goes roughly shell -> make -> timeout, and the one in the middle of the tree is the foreground, receiving tty-based signals.

I think the answer is that timeout should put itself into the foreground. That way it would get the SIGINT. make wouldn't get it, but wouldn't need to. timeout would exit quickly after SIGINT and make would proceed or abort according to the exit code.

I've a version locally here actually that calls tcsetpgrp(), but I discounted that as it's not timeout's place to call that, I think. timeout sets itself as group leader so that it can kill everything it starts, but it shouldn't need to grab the foreground group as the shell (or make) may be starting it in the background etc.

It seems like this is a misuse of process groups, using them as if they were a handle for killing a whole tree of processes. That's not what they're for. Process groups were invented to support job control, which means the only program that was supposed to mess with them was csh. Only the lack of a "kill process tree" primitive (and the fact that you can't even query the process tree easily) tempts us into using process groups as a shortcut. Any non-job-control-aware parent process will have a problem with timeout's behavior. We've already seen what GNU make does.
pmake simply dies of the SIGINT and leaves the child processes lingering (it probably also assumes they got the SIGINT, and doesn't bother waiting for them). In an interactive shell with job control disabled (set +m in most Bourne-ish shells), the behavior is not good there either. dash, bash, and posh all act like GNU make, appearing to ignore the SIGINT. zsh acts more like pmake, printing a new prompt but leaving the timeout and its child running. timeout's pgrp behavior only appears harmless when the parent process is a shell with job control, which expects its children to be in separate process groups. But in that case, timeout doesn't need to put itself in a new process group because the shell has already done so. So I suggest that if you create a process group, you take on the responsibility of behaving like a job control shell in other ways, including managing the foreground group. (An important piece of that is remembering the original value and restoring it before you exit). -- Alan Curry
bug#8938: make timeout and CTRL-C
Pádraig Brady writes: I'm still not convinced we need to be messing with tcsetpgrp(), but you're right in that the disconnect between the timeout process group and that of whatever starts `timeout` should be bridged. I'm testing the attached patch at the moment (which I'll split into 2). It only creates a separate group for the child that `timeout` execs, leaving the timeout process in the original group to propagate signals down. I'll need to do lots of testing with this before I commit.

With this patch the child is guaranteed to not be in the foreground (as far as the tty knows), so it will be getting SIGTTIN and possibly SIGTTOU on tty operations. I don't think there's anything that will make every scenario happy. (Except for a recursive-kill that doesn't use pgrps!) -- Alan Curry
bug#8938: make timeout and CTRL-C
Bob Proulx writes: Pádraig Brady wrote: Paul Eggert wrote: I'd like to have an option to 'timeout' so that it merely calls alarm(2) and then execs COMMAND. This would be simple and fast, and would avoid the problem in question. This approach has its own issues, but when it works it works great, and it'd be a nice option. I agree. It is nice and simple and well understood. The main problem with that is it would only send the signal to the first process, and any processes it started would keep running. Then that is a problem for that parent process, to keep track of its own children. It is a recursive situation. If all processes are well behaved then it works okay. And if you ask about processes that are not well behaved then my response would be to fix them so that they are better behaved.

That sounds reasonable, but then if something is about to be killed by timeout, there's reason to believe it's not behaving well at the moment. -- Alan Curry
bug#8938: make timeout and CTRL-C
shay shimony writes: With this patch the child is guaranteed to not be in the foreground (as far as the tty knows), so it will be getting SIGTTIN and possibly SIGTTOU on tty operations. You may need to correct me. In practice we see that the timed-out program performs writes to the terminal successfully, though it belongs to a different group than the foreground (in my case make's group is in the foreground and the timeout+compiler/test group is in the background, and all output of the compiler and test seems to appear correctly on the terminal). And regarding read, I think it makes sense enough that users will not use timeout for interactive programs that wait for input from the user. So maybe the fact that the timed-out program will not be able to get SIGTTIN and SIGTTOU is not such a disaster?

Notice that I wrote "possibly" before SIGTTOU. There was a reason for that. A background process that writes to the tty will get SIGTTOU if stty tostop is in effect. This is a user preference thing. You can set it if you get annoyed by processes writing to the terminal after you backgrounded them expecting them to be quiet. It's not enabled by default. If the process ignores SIGTTOU, the write will proceed, overriding the user's expressed preference. SIGTTIN is more forceful. There's no stty flag to turn it off, and ignoring it results in EIO. Keyboard input always belongs exclusively to the foreground job. For completeness I'll also mention that SIGTTOU will also be sent to a background process that attempts to change the tty settings, even if tostop is not enabled. -- Alan Curry
bug#8938: make timeout and CTRL-C
Pádraig Brady writes: Given the above setsid make example (which hangs for 10s ignoring Ctrl-C), I'm leaning towards `make` needing to be more shell like, or at least forward the SIGINT etc. to the job, and not assume jobs run in the foreground group.

I'm a little worried that you're focusing too much on make, which is just one way to demonstrate the problems of process group abuse. This simple shell script:

#!/bin/sh
timeout 12 sleep 10

is also nonresponsive to ^C for the same reason as the original makefile. Are you going to argue that the shell is doing something wrong there too? -- Alan Curry
bug#9102: timeout 0 FOO should timeout right away
Paul Eggert writes: sleep 0 sleeps for zero seconds, and timeout 0 FOO should timeout in zero seconds as well. Currently, it doesn't; it times out in an infinite number of seconds. I see why, from the internals (alarm (0) is a special call intended to cancel alarms). However, 'timeout' shouldn't be exposing those internals to users; it should behave like 'sleep' does, as that's more consistent. What's the difference between running a command with a 0 second timeout and not running the command at all? It could be killed before it even gets scheduled. -- Alan Curry
bug#9531: md5sum: confusing documentation for file type output
Rüdiger Meier writes: Or in other words you can never validate a ' ' md5sum unless you know about the platform which calculated it.

Not exactly. It works transparently if the appropriate translation was done at the time of the transfer from the original platform to the current platform (e.g. FTP'ed in text mode, not binary mode). If you transferred a text file in binary mode, you have something that's bitwise identical, but semantically different, so md5sum is right to complain. Well, at least the above would apply if you believed that having a text file/binary file distinction is a good idea at all. Which I don't. -- Alan Curry
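The bitwise-identical-or-not point is easy to demonstrate: the same text carried with Unix versus DOS line endings hashes differently. A small Python sketch:

```python
import hashlib

# The same line of text, once with a Unix line ending and once with DOS CRLF:
unix_text = b"hello world\n"
dos_text = b"hello world\r\n"

unix_md5 = hashlib.md5(unix_text).hexdigest()
dos_md5 = hashlib.md5(dos_text).hexdigest()
print(unix_md5 == dos_md5)  # False: semantically the same text, different digests
```

This is exactly why a text-mode transfer (which rewrites the line endings) preserves verifiability while a binary-mode transfer of a "text" file does not.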
bug#9620: dd: bogus behavior when interrupted
Pádraig Brady writes: BTW that ^C being displayed (started around Fedora 11 time (2.6.30)) is very annoying, especially when inserted in the middle of an ANSI code. I mentioned that previously here: http://mail.linux.ie/pipermail/ilug/2011-February/106723.html

I've been annoyed by that too. So annoyed that I patched my kernel to get rid of it. It was added between 2.6.24 and 2.6.25. Here's the commit message:

|commit ec5b1157f8e819c72fc93aa6d2d5117c08cdc961
|Author: Joe Peterson j...@skyrush.com
|Date: Wed Feb 6 01:37:38 2008 -0800
|
|    tty: enable the echoing of ^C in the N_TTY discipline
|
|    Turn on INTR/QUIT/SUSP echoing in the N_TTY line discipline (e.g. ctrl-C
|    will appear as ^C if stty echoctl is set and ctrl-C is set as INTR).
|
|    Linux seems to be the only unix-like OS (recently I've verified this on
|    Solaris, BSD, and Mac OS X) that does *not* behave this way, and I really
|    miss this as a good visual confirmation of the interrupt of a program in
|    the console or xterm. I remember this fondly from many Unixs I've used
|    over the years as well. Bringing this to Linux also seems like a good way
|    to make it yet more compliant with standard unix-like behavior.
|
|    [a...@linux-foundation.org: coding-style fixes]
|    Cc: Alan Cox a...@lxorguk.ukuu.org.uk
|    Signed-off-by: Andrew Morton a...@linux-foundation.org
|    Signed-off-by: Linus Torvalds torva...@linux-foundation.org

And here's what I use to kill it (committed to my own git tree which is exported to no one and has been seen by nobody but me until now):

commit 0b76f0a49a52ac37fb220f1481955426b6814f86
Author: Alan Curry pac...@kosh.dhis.org
Date: Wed Sep 22 16:35:01 2010 -0500

    The echoing of ^C when a process is interrupted from tty may be more like
    what the real unixes do, but this is a case where Linux was better. Put it
    back the way it was.
    When a command's output ends with an incomplete line, the shell can do
    one of two things, both of them bad: engage its command line editor with
    the cursor in the wrong column, or force the cursor to the first column
    before printing the prompt, which obliterates the incomplete line, hiding
    actual program output. The echo of ^C immediately followed by process
    death is an instance of this generally bad "command output ends with
    incomplete line" behavior.

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c3954fb..70f5698 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1194,10 +1194,12 @@ send_signal:
 	}
 	if (I_IXON(tty))
 		start_tty(tty);
+#if 0 /* This echoing is a sucky new feature. --Pac. */
 	if (L_ECHO(tty)) {
 		echo_char(c, tty);
 		process_echoes(tty);
 	}
+#endif
 	if (tty->pgrp)
 		kill_pgrp(tty->pgrp, signal, 1);
 	return;

-- Alan Curry
bug#9788: chown gets permission denied
Richard Woolley writes: When trying to change ownership of the files in a directory, I mistakenly had the settings wrong in the command, so I got the following:

$ ls -l
total 16
drw-rw-r--  4 user proj1 4096 Sep 28 14:23 doc/
drw-rw-r-- 24 user proj1 4096 Sep 28 14:27 modules/
drw-rw-r--  3 user proj1 4096 Sep 28 14:23 project/

Your first problem is that you've got some directories here with read permission but no x permission. In that situation, this happens:

$ ls -l project
total 0
?- ? ? ? ?? compile.conf
?- ? ? ? ?? myproject.conf
?- ? ? ? ?? novas_fli.so

ls can read the directory, getting the filenames, but the lack of x permission prevents it from getting any other information. First chmod u+x doc modules project, then see what you get from ls -l on them. -- Alan Curry
bug#9939: Problems with the SIZE description in man pages for ls
abdallah clark writes: I was wondering if you received my very detailed account of the issues I found with the ls -l --block-size=SIZE command. It's been about a week since I sent it, so I wasn't sure what was happening.

I looked over that message and prepared a reply explaining the things that you had misunderstood. Then I tried running your examples and realized that I didn't understand some of them either. According to my understanding, several of the behaviors you observed are bugs. So I deleted my reply and decided to wait along with you for someone else to explain it all. Since that hasn't happened yet, I'll go ahead and cover the main point: You're interested in altering the block size used in the ls output, but you haven't investigated what portions of the output are affected by block size. There are 3 instances of the word "block" in ls(1). 2 of them are in the description of the options that change the block size: --block-size and -k. The 3rd instance is under the only option that actually makes use of the block size: -s.

A quick demonstration of -k working. First I have to set POSIXLY_CORRECT because the default block size when not in POSIXLY_CORRECT mode is already 1K, so -k is normally a no-op.

$ POSIXLY_CORRECT=1 ; export POSIXLY_CORRECT
$ ls -s /bin/ls
224 /bin/ls
$ ls -sk /bin/ls
112 /bin/ls

Since the -l output is not defined in terms of block size, ls -l and ls -lk will produce exactly the same output.

$ ls -l /bin/ls
-rwxr-xr-x 1 root root 107124 Feb 8 2011 /bin/ls
$ ls -lk /bin/ls
-rwxr-xr-x 1 root root 105 Feb 8 2011 /bin/ls

Oops. Well, I know they used to produce the same output. And I think they still should, and this is a bug. Anyone?

On Wed, Nov 2, 2011 at 11:01 AM, Paul Eggert egg...@cs.ucla.edu wrote: [snip] Quote what you're replying to, and put your reply in logical order with it. -- Alan Curry
bug#10016: ls -lk is wrong
I mentioned this already in the bug#9939 thread, but nobody replied and it's really a separate issue so here's an independent report. This behavior:

$ ls -l /bin/ls
-rwxr-xr-x 1 root root 107124 Feb 8 2011 /bin/ls
$ ls -lk /bin/ls
-rwxr-xr-x 1 root root 105 Feb 8 2011 /bin/ls

is awful. -k should not have any effect on the ls -l field that reports st_size. It is only supposed to possibly affect the reporting of st_blocks by -s and the "total" line at the start of a full directory listing. I won't make any claims about what --block-size should do, but -k comes from BSD and it should act like BSD. -- Alan Curry
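The st_size/st_blocks distinction underlying this report can be shown directly from stat. A Python sketch (st_blocks is counted in 512-byte units by POSIX convention; the allocated size is filesystem-dependent, so only the byte count is asserted):

```python
import os
import tempfile

# Create a 1000-byte file and compare the two "sizes" stat reports.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1000)
os.fsync(fd)
os.close(fd)

st = os.stat(path)
size_bytes = st.st_size           # what ls -l prints: the exact byte count
alloc_bytes = st.st_blocks * 512  # what ls -s reports, scaled by the block size

print(size_bytes)  # 1000, regardless of any block-size setting
os.unlink(path)
```

Only the second number is "derived from st_blocks", so only it should ever be rescaled by -k or --block-size.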
bug#10016: ls -lk is wrong
Jim Meyering writes: I'm thinking of making -k comply, but letting any block-size specification (via --block-size= or an envvar) override that to give the behavior we've seen for the last 9 years.

Wow, look what I stirred up. If it's been like this for 9 years, it's been broken for 9 years. As I said originally, BSD is the standard that matters here. It doesn't matter when or even whether POSIX blessed the -k option. Everywhere except GNU, this is simple. The size field of the ls -l output is not defined in terms of blocks, so the block size setting doesn't affect it. Numbers derived from st_blocks are reported in units of blocks, and others aren't. If you're going to define --block-size to have this effect, then you really need to document it as being an option that does 2 separate things:

1. sets the size of a block
2. alters the definition of the -l format

-- Alan Curry
bug#10021: [PATCH id] Add error-checking on GNU
Ludovic Courtès writes: OTOH, on POSIX-conforming systems (which includes GNU/Linux, so it may be the majority of systems in use), -1 may well be a valid UID/GID.

That's a bizarre statement.

    3.428 User ID
    A non-negative integer that is used to identify a system user. When the
    identity of a user is associated with a process, a user ID value is
    referred to as a real user ID, an effective user ID, or a saved
    set-user-ID.

chown(2) uses (uid_t)-1 and (gid_t)-1 as the "don't change" special values. So does setreuid(2)/setregid(2). setuid(-1) isn't documented as special. Trying it out, it seems to be treated as equivalent to setuid(1). Not what I expected, but it doesn't really support your "-1 is a valid uid" theory. -- Alan Curry
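The "don't change" sentinel is visible from Python as well: os.chown documents -1 as leave-unchanged, mirroring chown(2)'s (uid_t)-1/(gid_t)-1. A quick sketch:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
before = os.stat(path)

# -1 maps to the (uid_t)-1/(gid_t)-1 "don't change" sentinels,
# so this call succeeds and alters nothing.
os.chown(path, -1, -1)

after = os.stat(path)
unchanged = (before.st_uid, before.st_gid) == (after.st_uid, after.st_gid)
print(unchanged)  # True

os.close(fd)
os.unlink(path)
```

That a no-op chown succeeds for any caller is exactly what makes -1 unusable as an ordinary UID at this interface.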
bug#10136: Can't view some strange characters in some of the man pages
Harold Raulston writes: Hi, Could you tell me what encoding I need to use to view your man pages? I've tried Unicode, Western, Western ISO, but still get some unreadable characters in the EXAMPLES (I've just looked at the find and du commands so far): =C3=A2=E2=82=AC=C3=A2=E2=82=AC=E2=84=A2 linuxcommand find1 can't display read BTW, I'm using Win7 Pro English, IE9. All latest updates. I have the same problem in Chrome...

man pages are read with the man program. HTML is Not The Way. [c3 a2 e2 82 ac c3 a2 e2 82 ac e2 84 a2] is what you get when you start with U+2019 RIGHT SINGLE QUOTATION MARK in UTF-8, then misinterpret it as windows-1252 and convert it to UTF-8 again. We were *so* unfortunate when we didn't have all these extra kinds of quotation marks. -- Alan Curry
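The decoding described above can be reproduced mechanically. A Python sketch of one round of the UTF-8 → windows-1252 → UTF-8 mangling (it produces the c3 a2 / e2 82 ac / e2 84 a2 byte groups seen in the garbage):

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, encoded to UTF-8, misread as
# windows-1252, and re-encoded to UTF-8:
original = "\u2019"
once = original.encode("utf-8")                        # e2 80 99
mangled = once.decode("windows-1252").encode("utf-8")  # each byte re-expanded

print(once.hex())     # e28099
print(mangled.hex())  # c3a2e282ace284a2
```

Each original byte (0xe2, 0x80, 0x99) becomes the multi-byte UTF-8 encoding of the windows-1252 character at that position (â, €, ™), which is why the text triples in length instead of displaying.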
bug#10281: change in behavior of du with multiple arguments (commit
Paul Eggert writes: Perhaps this is a bug in POSIX, of course, but there is a good argument for why GNU du behaves the way it does: you get useful behavior that you cannot get easily with the Solaris du behavior.

Remind us again... the useful behavior is that du -s returns a column of numbers next to a column of names, and the numbers don't necessarily have any individual meaning relevant to the adjacent names, but you can add them up manually and get something that is a correct total for the group. Meanwhile, if you wanted the total for the group, you would have used -c and not had to add them up manually. Why not let the -c total be correct *and* the -s individual numbers also be correct for the names they are next to? Like this:

$ mkdir a b ; echo hello > a/a ; ln a/a b/b ; du -cs a b
8	a
8	b
12	total

The fact that the numbers on the left don't add up means there is less redundancy in the output. Each number actually tells me something I can't derive from the others. There is higher information content. This is good, not bad. -- Alan Curry
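The accounting being debated is a dedup-by-inode walk: each (device, inode) pair is charged only once across all arguments, so a hard-linked file lands under whichever argument is scanned first. An illustrative Python model (files only, for brevity; this is a sketch of the behavior, not du's actual code):

```python
import os
import tempfile

def du_blocks(paths):
    """Charge each (st_dev, st_ino) only once across all arguments."""
    seen = set()
    total = 0
    for p in paths:
        for dirpath, dirnames, filenames in os.walk(p):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    total += st.st_blocks
    return total

root = tempfile.mkdtemp()
a, b = os.path.join(root, "a"), os.path.join(root, "b")
os.mkdir(a)
os.mkdir(b)
with open(os.path.join(a, "a"), "w") as f:
    f.write("hello\n")
os.link(os.path.join(a, "a"), os.path.join(b, "b"))

# The shared file is charged once, so du_blocks([a, b]) equals either
# directory's standalone size -- which is why the per-name numbers
# in du -s output need not add up to the -c total.
print(du_blocks([a, b]) == du_blocks([a]) == du_blocks([b]))  # True
```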
bug#10281: change in behavior of du with multiple arguments (commit
Paul Eggert writes: For example, suppose I have a bunch of hard links that all reside in three directories A, B, and C, and I want to find out how much disk space I'll reclaim by removing C. (This is a common situation with git clones, for example.) With GNU du, I can run du -s A B C and the output line labeled C will tell me how much disk space I'll reclaim. There's no easy way to do this with Solaris du. The straightforward method would be to simply the directory you intend to remove and keep track of the discrepancy between st_nlink and how many links you've seen. I admit that this straightforward method isn't implemented in any standard tool, but your way involves extra work by both du, which must traverse all the other directories which might share files with the target directory; and the user, who must somehow amass that list of directories ahead of time. As a creative improvised use of pre-existing tools it's a good example, but as a justification for an intentional feature, it's just too inefficient. -- Alan Curry
bug#10281: change in behavior of du with multiple arguments (commit
Paul Eggert writes: On 12/16/11 18:36, Alan Curry wrote: The straightforward method would be to simply the directory you intend to remove and keep track of the discrepancy between st_nlink and how many links you've seen. Sorry, I can't parse that. But whatever it is, it sounds like you're talking about what one could do with a program written in C, not with either GNU or Solaris du.

Yes, I'm saying that du is just not the tool for this job, although you've managed to twist it to fit. The "predict free space after rm -rf foo" operation can be done without searching other directories and without requiring the user to specify a list of other directories that might contain links. What you do with du is kludgy by comparison.

[...] Of course I'd never want to do that in an actual link farm: it's tricky and brittle and could mess up currently-running builds. But the point is that GNU du is not being inefficient here, any more than Solaris du is.

By comparison to a proper tool which doesn't do any unnecessary traversals of extra directories, your use of du is slow and brittle (if the user forgets an alternate directory containing a link, the result is wrong) and has only the slight advantage of already being implemented. Here's a working outline of the single-traversal method. I wouldn't suggest that du should contain equivalent code. A single-purpose perl script, even without pretty output formatting, feels clean enough to me. Since I've gone to the trouble (not much) of writing it, I'll keep it as ~/bin/predict_rm_rf for future use.

#!/usr/bin/perl -W
use strict;
use File::Find;

@ARGV or die "Usage: $0 directory [directory ...]\n";

my $total = 0;
my %pending = ();
File::Find::find({wanted => sub {
    my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
    if(-d _ || $nlink==1) { $total += $blocks; return; }
    if($nlink == ++$pending{$dev.$ino}) {
        delete $pending{$dev.$ino};
        $total += $blocks;
    }
}}, @ARGV);
print "$total blocks would be freed by rm -rf @ARGV\n";
__END__

-- Alan Curry
bug#10349: tail: fix --follow on FhGFS remote file systems
Bob Proulx writes: Jim Meyering wrote: Are there so many new remote file systems coming into use now? That are not listed in /usr/include/linux/magic.h? The past can always be enumerated. The future is always changing. It isn't possible to have a complete list of future items. It is only possible to have a complete list of past items. The future is not yet written. Between past and future is the present, i.e. the currently running kernel. Shouldn't it return an error when you use an interface that isn't implemented by the underlying filesystem? Why doesn't this happen? -- Alan Curry
bug#10355: Add an option to {md5,sha*} to ignore directories
Bob Proulx writes: severity 10355 wishlist tags 10355 + notabug wontfix moreinfo thanks Erik Auerswald wrote: Gilles Espinasse wrote: I was using a way to check md5sum on a lot of files using

for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum /$myfile >> ${ALLFILES}.md5; fi; done

... You could use find $DIR -type f to list regular files only. Yes. Exactly. The capability you ask for is already present.

Do you suppose we can convince GNU grep's maintainer to follow this philosophy?

$ mkdir d
$ touch d/foo
$ grep foo *
$

It opens and reads, gets EISDIR, and intentionally skips printing it. Grr. But wait, there's a -d option with 3 alternatives for what to do with directories! ...and none of the choices is just "print the EISDIR" so I'll know if I accidentally grepped a directory. -- Alan Curry
bug#10363: /etc/mtab - /proc/mounts symlink affects df(1) output for
jida...@jidanni.org writes:

Filesystem           1K-blocks   Used Available Use% Mounted on
rootfs                 1071468 287940    729100  29% /
/dev/disk/by-uuid/551e44e1-2cad-42cf-a716-f2e6caf9dc78
                       1071468 287940    729100  29% /

(I'm replying only on the issue of the duplicate mount point. Someone else can tackle the long ugly name.) The one with rootfs as its device is the initramfs, which you automatically get with all recent kernels. Even if you aren't using an initramfs, there's an empty one built into the kernel which gets mounted as the first root filesystem. The real root gets mounted on top of that.

So this is a special case of a general problem with no easy solution: what should df do when 2 filesystems are mounted at the same location? It can't easily give correct information for both of them, since the later mount obscures the earlier mount from view. If there's a way for df to get the correct information for the lower mount, I don't know what it would be. If you have a process with a leftover cwd or open fd in the obscured filesystem, you can use that. But generally you won't.

But maybe we could do better than reporting incorrectly that the lower mount has size and usage identical to the upper mount! At least df could print a warning at the end if it has seen any duplicate entries. Perhaps there is some way it could figure out which one is on top, and print a bunch of question marks as the lower mount's statistics. If df is running as root, it might be able to unshare(2) the mount namespace, unmount the upper level, and then statfs the mount point again to get the correct results for the lower level. That won't work in all cases (even in a private namespace you can't unmount the filesystem containing your own cwd) and it does nothing for you if you're not root, but still... it would be a cool bonus in the cases where it does work. As a special case, rootfs should probably be excluded from the default listing, since the initramfs is not very interesting most of the time.
It could still be shown with the -a option, although it would always have the wrong statistics. Or if you really want to be impressive, default to showing the initramfs if and only if it is the only thing mounted on / - so you can run df within the initramfs before the real root is mounted and get the right result. Or... (brace yourself for the most bold idea yet)... can you imagine a kernel interface that would *cleanly* give access to obscured mount points? Comments on any of the above? Do the BSDs have any bright ideas we can steal, or is their df as embarrassingly bad at handling obscured mount points as ours? -- Alan Curry
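The warn-on-duplicates idea suggested above could start from nothing more than a scan of the mount table. An illustrative Python sketch, driven by sample /proc/self/mounts-style text rather than the real file (the sample devices and the warning wording are invented for the example):

```python
# Detect mount points that appear more than once; the later entry is
# the upper (visible) mount, the earlier one is obscured.
def duplicate_mount_points(mounts_text):
    seen = {}
    dups = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        dev, mnt = fields[0], fields[1]
        if mnt in seen:
            dups.append((seen[mnt], dev, mnt))  # (lower dev, upper dev, path)
        else:
            seen[mnt] = dev
    return dups

sample = """\
rootfs / rootfs rw 0 0
/dev/sda1 / ext4 rw,errors=remount-ro 0 0
proc /proc proc rw 0 0
"""

for lower, upper, mnt in duplicate_mount_points(sample):
    print(f"warning: {mnt} is mounted twice ({upper} obscures {lower})")
```

Getting *correct statistics* for the obscured mount would still need one of the heavier tricks discussed above (a stashed fd, or unshare plus unmount); this only identifies which lines of df output cannot be trusted.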
bug#10456: bug in du
Lubomir Mateev writes: root@thor:/# fdisk -l Disk /dev/hda: 15.0 GB, 15000330240 bytes ... 9.5T /usr/lib

I'm going to guess filesystem corruption causing a file in /usr/lib (not a subdirectory) to have the wrong block count. Do ls -Ssr /usr/lib and see if you get a big surprise at the end. Unmount and fsck it to fix, if I'm right. -- Alan Curry
bug#11246: Is this a bug in tee?
Adrian May writes:

ad@pub:~/junk$ echo abcde | tee >(tr a 1) | tr b 2
a2cde
12cde

I'd have expected 1bcde instead of 12cde. It seems like the tr b 2 is acting early on the stream going into tr a 1. This is a ubuntu server 10.04 machine. Adrian.

The shell sets up the pipeline, and your shell is doing it stupidly. With zsh, you'd get the correct result:

% echo abcde | tee >(tr a 1) | tr b 2
a2cde
1bcde

ksh and bash recently copied the process substitution feature from zsh, and they haven't got it right yet. -- Alan Curry
bug#11667: problem with command date
amanda sabatini writes: Hi, The following command does not work with these specific dates: 1986-10-25; 1987-10-25; 1989-10-15; 1992-10-25; 1991-10-20; 1995-10-15; 2006-11-05. date +%d --date=1986-10-25

The date command never actually works on dates alone. There is always a time attached to its calculations, even when it's not necessary for the output format you requested. When you don't specify a time with the --date option, the command guesses that you meant 00:00:00. That turns out to be a bad guess in this case, since 00:00:00 didn't exist on those days. All of those are dates on which the Brazil/East time zone shifted into daylight saving time, jumping from 23:59:59 the previous day to 01:00:00 on the day you mentioned. You can avoid this problem by adding 12:00:00 to the requested date.

$ TZ=Brazil/East date +%d --date=1986-10-25
date: invalid date `1986-10-25'
$ TZ=Brazil/East date +%d --date="1986-10-25 12:00:00"
25

-- Alan Curry
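The nonexistent-midnight effect shows up in any timezone-aware library, not just date(1). A Python sketch using zoneinfo (America/Sao_Paulo is the modern tzdata name for the old Brazil/East alias; this assumes the system's tzdata is installed and carries the 1986 transition):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Sao_Paulo")

# Midnight on 1986-10-25 never existed in this zone: the clock jumped
# from 23:59:59 straight to 01:00:00. A round trip through UTC exposes
# that the requested wall time can't be represented.
naive = datetime(1986, 10, 25, 0, 0)
wall = naive.replace(tzinfo=tz)
round_trip = wall.astimezone(timezone.utc).astimezone(tz)
print(round_trip.replace(tzinfo=None) != naive)  # True: the wall time didn't survive

# Noon is safe, which is why appending 12:00:00 to the date fixes things.
noon = datetime(1986, 10, 25, 12, 0).replace(tzinfo=tz)
print(noon.astimezone(timezone.utc).astimezone(tz).replace(tzinfo=None) == noon.replace(tzinfo=None))  # True
```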
bug#11950: cp: Recursively copy ordered for maximal reading speed
Michael writes: Hello, After coding several backup tools there's something in my mind since years. When 'cp' copies files from magnetic harddisks (commonly called after their adapter or bus - SATA, IDE, and the like, i'm not talking about solid state) recursively, it seems to pick up the files in 'raw' order, just as the disk buffer spit them out (like 'in one head move'). Or so. It does not resemble any alphabetical order, for example, it does not even stay within the same parent folder (flingering hither and forth, as the files come in). [grumble at User-Agent: claws-mail.org: One line per paragraph isn't good mail formatting!] It's called directory order. It used to be simply order of creation of files, with deletions creating gaps that could be filled by later creations with same-length or shorter names. But on most new filesystems, directories are stored in a non-linear structure so that lookups in a large directory don't have to scan through every name. For ext2/ext3/ext4, run tune2fs -l on the block device and look for the dir_index option. If you're copying files onto a filesystem with dir_index enabled, the order in which cp creates them should have little effect on the directory's layout afterward. If you're not using dir_index on the destination filesystem, there's your problem! Enable dir_index and all directory lookups will be fast. None of this has anything to do with where the actual data blocks of the file will be allocated. There's no way to control that. If you think that the second file created is going to be adjacent to the first file created... that's never been guaranteed. Filesystem block allocators are way more mysterious than that. If you really think there's something to be gained here, prove it: start with a directory with a lot of files but no subdirectories. 
Do an alphabetical-order copy like this:

$ mkdir other_directory ; cp ./* other_directory

(The glob returns the names in sorted order so this gives you the creation order you want, unlike cp -r.) Then get it all out of cache so the read test will hit the disk as much as possible:

$ sync ; echo 3 > /proc/sys/vm/drop_caches

And read back the files:

$ cd other_directory ; time cat ./* > /dev/null

Now repeat, but using cp -r to create the other directory so the files get copied in the source directory order. And repeat again, but using

$ find . -type f -exec cat '{}' + > /dev/null

instead of the cat ./* (the glob will cat the files in sorted order, the find will use directory order). If there are any significant differences in the times, and dir_index is enabled, you're onto something. With dir_index disabled, you should get worse times all around, but not a lot worse if the files are big enough that the time spent reading their contents overshadows the time spent on directory lookups. -- Alan Curry
bug#12019: join command - wrong column moved to start of line with
Eric Blake writes: On 07/21/2012 12:20 PM, Jean-Pierre Tosoni wrote: Hello Maintainer, I am using join v8.5 from debian squeeze. Now, the command: join -v 2 -1 2 -2 3 a b produces

==== wrong output ====
zzz222 zzz111 keyZ zzz333

I tried reproducing this with coreutils 8.17:

$ cat a b
axx111 keyX axx222
ayy111 keyY ayy222
xxx111 xxx222 keyX xxx333
zzz111 zzz222 keyZ zzz333
$ join -v2 -1 2 -2 3 a b
keyZ zzz111 zzz222 zzz333

but I get the expected order. I don't see a specific mention of a fix for this in NEWS, so I have to wonder if this might be a bug in a debian-specific patch. Can you do some more investigating, such as compiling upstream coreutils to see if the problem still persists for you?

It's not a Debian-specific problem. I can reproduce the bug with unaltered coreutils 8.9. It was apparently fixed by accident as a side effect of some other work on the join program.

commit d4db0cb1827730ed5536c12c0ebd024283b3a4db
Author: Pádraig Brady p...@draigbrady.com
Date: Wed Jan 5 11:52:54 2011 +

    join: add -o 'auto' to output a constant number of fields per line

d4db0cb1827730ed5536c12c0ebd024283b3a4db can be cherry-picked and applied to older coreutils to fix the bug. I tested this with upstream 8.9 and Debian's 8.5; both applied with fuzz but worked correctly. -- Alan Curry
bug#12339: Bug: rm -fr . doesn't dir depth first deletion yet it is
Bob Proulx writes: Jim Meyering wrote: Could you be thinking of some other rm? Coreutils' rm has rejected that for a long time: ... POSIX requires rm to reject any attempt to delete an explicitly specified . or .. argument (or any argument whose last component is one of those): Hmm... Wow. I decided to check HP-UX 11.11, a now rather old release from twelve years ago in 2000, the oldest easily available to me, and got this:

$ /usr/bin/rm -rf .
rm: cannot remove .. or .

So I guess GNU coreutils is in good company with traditional Unix systems! It has definitely been that way for a long time.

Linux has the ability to actually remove a directory that is empty but still referenced as the cwd of some process. This ability is non-traditional (my fuzzy memory says it showed up some time in the 2.2 or 2.4 era). It's worth considering whether this change should be reflected by a relaxation of rm's traditional behavior. rm -rf $PWD, meaning basically the same thing as rm -rf ., works, and leaves you in a directory so empty that ls -a reports no . or .. entries, and no file can be created in the current directory. (open and stat and chdir still work on . and .. though. They're magic.) -- Alan Curry
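The empty-ghost-directory state is easy to reproduce from Python on Linux. A sketch using os.rmdir on an empty cwd (the same kernel operation rm would perform last; Linux-specific, since traditional Unices refuse to remove a directory in use as a cwd):

```python
import errno
import os
import shutil
import tempfile

# Linux-specific: remove a directory that is still our cwd.
base = tempfile.mkdtemp()
doomed = os.path.join(base, "doomed")
os.mkdir(doomed)
os.chdir(doomed)
os.rmdir(doomed)                 # succeeds on Linux even though it's our cwd

ghost_listing = os.listdir(".")  # []: no entries, not even "." or ".."
try:
    open("x", "w")
    create_errno = None
except OSError as e:
    create_errno = e.errno       # ENOENT: nothing can be created here

print(ghost_listing, create_errno == errno.ENOENT)

os.chdir(base)                   # escape the ghost directory before cleanup
shutil.rmtree(base)
```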
bug#12339: Bug: rm -fr . doesn't dir depth first deletion yet it is
Jim Meyering writes: Alan Curry wrote: rm -rf $PWD, meaning basically the same thing as rm -rf ., works, and leaves If you use that, in general you would want to add quotes, in case there are spaces or other shell meta-characters: rm -rf "$PWD"

Well, when I do it I'm in zsh, which has fixed that particular Bourne shell design error. -- Alan Curry
bug#12339: Bug: rm -fr . doesn't dir depth first deletion yet it
Eric Blake writes: Indeed, reading the original V7 source code from 1979: http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/rm.c [...] shows that _only_ .. was special; . was attempted in-place and didn't fail until the unlink(".") after the directory itself had been emptied. It wasn't until later versions of the code that . also became special.

I also decided to look around there, and found some of the turning points: Up to 4.2BSD, the V7 behavior was kept. (http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/bin/rm.c) rm -rf . was forbidden in 4.3BSD (26 years ago). http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD/usr/src/bin/rm.c The removal of dir/. (and dir/..) was not forbidden until Reno. http://minnie.tuhs.org/cgi-bin/utree.pl?file=4.3BSD-Reno/src/bin/rm/rm.c

	cp = rindex(arg, '/');
	if (cp == NULL)
		cp = arg;
	else
		++cp;
	if (isdot(cp)) {
		fprintf(stderr, "rm: cannot remove `.' or `..'\n");
		return (0);
	}

Maybe the classical behavior stuck around longer in the more SysV-ish Unices. The Ultrix-11 3.1 tree on TUHS from 1988 has a rm that looks very much like V7, but I can't find anything to compare it to until OpenSolaris. Did POSIX force BSD to change their rm in 1988? I think it's more likely that POSIX simply documents a restriction that BSD had already added. Either way the latest POSIX revisions certainly can't be blamed. -- Alan Curry
bug#12339: Bug: rm -fr . doesn't dir depth first deletion yet it
Linda Walsh writes: So far no one has addressed when the change in '-f' went in NOT to ignore the non-deletable dir . and continue recursive delete,

In the historic sources I pointed out earlier (4.3BSD and 4.3BSD-Reno) the -f option is not consulted before rejecting removal of . so I don't think the change you're referring to is a change at all. -f never had the effect you think it should have. -- Alan Curry
bug#12339: Bug: rm -fr . doesn't dir depth first deletion yet it
Linda Walsh writes: Alan Curry wrote: Linda Walsh writes: So far no one has addressed when the change in '-f' went in NOT to ignore the non-deletable dir . and continue recursive delete, In the historic sources I pointed out earlier (4.3BSD and 4.3BSD-Reno) the -f option is not consulted before rejecting removal of . so I don't think the change you're referring to is a change at all. -f never had the effect you think it should have. If I was using BSD, I would agree. --- But most of my usage has been on SysV compats Solaris, SGI, Linux, a short while on SunOS back in the late 80's, but that would have been before it changed anyway.

SGI is dead, Sun is dead, the game's over, we're the winners, and our rm has been this way forever.

For all I know it could have been a vendor add-in, but that's not the whole point here. Do you want to support making . illegal for all gnu utils for addressing content?

I don't think "addressing content" is a clearly defined operation, no matter how many times you repeat it. Consistency between tools is a good thing, but consistency between OSes is also good, and we'd be losing that if any change was made to GNU rm's default behavior. Even OpenSolaris has the restriction: see lines 160-170 of http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/rm/rm.c

I think you'll find many more people against the idea and wondering why it's in 'rm' and why -f doesn't really mean ignore all the errors it can and why that one should be specially treated. Of course they also might wonder why rm doesn't follow the necessary algorithm for deleting files -- and delete contents before dying issuing an error for being unable to delete a parent. Which might also raise why -f shouldn't be usable to silence permission or access errors as it was designed to.

Look, I agree it's not logical or elegant. But we have a standard that all current Unices are obeying, and logic and elegance alone aren't enough to justify changing that.
A new option that you can put in an alias is really the most realistic goal. -- Alan Curry
bug#12421: Issue of the cp on Ubuntu 10.10
owen.z...@alitech.com writes: Dear Sir, A strange issue happens when I use the cp tool on two directory. [snip - summary: after recursive cp, some file has the wrong size] I don't have any good ideas about the cause of this problem, but since I didn't see anyone else replying, I'll suggest some investigation techniques. Run cp --version so we know how far back in history we should look for similar bugs. cmp the two versions of the file to see if the short one is just truncated, or if there are other differences. Run df -T on the source and destination. If it's reproducible, run strace -o cptrace cp ... and publish the cptrace for others to look at. (If the files being copied are private, the names and contents will be in the trace so you will have to inspect it yourself.) -- Alan Curry
bug#12478: cat SEGV when I press Ctrl-Alt-SysRq-1 on text console
Rafal W. writes: $ cat /dev/zero ^\Quit (core dumped) Steps to reproduce: 1. Switch to any text console (it doesn't happen in X). 2. Login. 3. Run: cat /dev/zero 4. Press: Ctrl-Alt-SysRq-1 (or any number, except letters :) What's that supposed to do? Ctrl isn't normally used with SysRq. 5. You'll see: ^\Quit (core dumped)

The ^\ character generates a QUIT signal (the same way ^C generates INT), and death with core dump is the default response to SIGQUIT. Ctrl-4 is an alternate way of typing Ctrl-\ so this is all perfectly normal for a key combination involving Ctrl and 4. By adding SysRq into the mix I don't know what exactly you accomplished. Maybe you confused the keyboard. Most keyboards don't have every key wired separately, and weird combinations can send events for keys that weren't pressed. To investigate further, try running 'stty -isig' to disable signal generation, then 'cat > /dev/null' or maybe 'od -c' and type your key combinations. Ctrl-D should still work for EOF to get you out; EOF is not a signal, so it's not disabled by stty -isig.
bug#12494: 0 exit status even when chmod fails
Georgiy Treyvus writes: Finally I had him show me the mount options of the relevant partitions. Many I recognized. Some I did not. I started researching those I did Did you notice this one?: Mount options for fat (Note: fat is not a separate filesystem, but a common part of the msdos, umsdos and vfat filesystems.) [...] quiet Turn on the quiet flag. Attempts to chown or chmod files do not return errors, although they fail. Use with caution! If you're getting the quiet behavior without the quiet mount option, I'd say that's a kernel bug. -- Alan Curry
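[Archive note: the effective mount options — including "quiet", if it is in force — can be read straight out of /proc/mounts on any Linux system. The mount point below is just an example; substitute the FAT partition from the report.]

    # Print filesystem type and effective options for one mount point; a
    # vfat mount carrying the quiet flag would show "quiet" in the options.
    awk '$2 == "/" { print $3, $4 }' /proc/mounts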
bug#12494: 0 exit status even when chmod fails
Sven Joachim writes: On 2012-09-24 08:37 +0200, Alan Curry wrote: Georgiy Treyvus writes: Finally I had him show me the mount options of the relevant partitions. Many I recognized. Some I did not. I started researching those I did Did you notice this one?: Mount options for fat (Note: fat is not a separate filesystem, but a common part of the msdos, umsdos and vfat filesystems.) [...] quiet Turn on the quiet flag. Attempts to chown or chmod files do not return errors, although they fail. Use with caution! If you're getting the quiet behavior without the quiet mount option, I'd say that's a kernel bug. Actually, it's the default unless you're using Linux 2.6.25. This kernel reported an error to the caller, but since that broke rsync[1,2], 2.6.26 reverted to the previous behavior of silently ignoring chmod attempts which do not work on FAT filesystems[3]. This bug report should probably be closed. If the mount man page disagrees with the kernel, it's still a bug in the man page at least. (Also, the rest of the world needs to work around extra stupidity because of rsync?) -- Alan Curry
bug#12478: cat SEGV when I press Ctrl-Alt-SysRq-1 on text console
Rafal W. writes: But if Control-4 is sending QUIT signal, why: Control-1 does kill the process? I've checked again and actually it's not even about the number. When I press only: Control-SysRq it kills the process as well. Sometimes it happens on press, sometimes on release.

Is your SysRq key also the PrtSc key? It will be if your keyboard is a descendant of the IBM PC/AT design. With Alt, it's the SysRq key. Without Alt, it's the PrtSc key. So if your Control-SysRq combination doesn't include Alt, then it's really Control-PrtSc and you should call it that instead of Control-SysRq, which is just confusing. For other keys, the interpretation of modifiers (including Alt) is done in software. The PrtSc/SysRq key is the only one in which a distinction is made in hardware: PrtSc and SysRq are different scancodes. This specialness probably influenced the decision to use SysRq as a magic key for talking to the Linux kernel.

Now, on to why you got your SIGQUIT. Well, the default keymap for the Linux console generates ^\ when you press PrtSc. That's not a reason, that's just a fact. I don't know the reason. The Ctrl-4 thing is, I believe, a matter of accurate vt100 emulation. At least it's part of a neat pattern. Ctrl-2 through Ctrl-8 generate all the control codes that aren't ^A through ^Z alphabeticals, in numerical order:

    key     byte  echoprt  ASCII name
    Ctrl-2    0     ^@     NUL
    Ctrl-3   27     ^[     ESC
    Ctrl-4   28     ^\     FS
    Ctrl-5   29     ^]     GS
    Ctrl-6   30     ^^     RS
    Ctrl-7   31     ^_     US
    Ctrl-8  127     ^?     DEL

Notice that one of them, Ctrl-6 for ^^, actually makes sense: Ctrl-^ is Ctrl-Shift-6, after all. Perhaps the others were simply built around that one as a logical extension.

Oops, I got sidetracked. Why does PrtSc generate ^\ on the Linux console? I don't know. Looking at the historical source code, it seems that it has been this way since Linux-0.99.10 (June 7, 1993), in which the keyboard driver was massively overhauled to support loadable keymaps.
In 0.99.9 there is this:

    /* Print screen key sends E0 2A E0 37 and puts the
       VT100-ESC sequence ESC [ i into the queue, ChN */
    puts_queue("\033\133\151");

So in conclusion, the PrtSc ^\ mapping snuck in as part of a large patch that wasn't supposed to change any defaults, but did. Accident... or sabotage? Insert your conspiracy theory here. History says Risto Kankkunen did the loadable keymap patch, so that's who to blame. ChN appears to be:

    * Some additional features added by Christoph Niemann (ChN), March 1993

Whatever the reason behind this annoying ^\, fixing it isn't hard:

    # It's too easy to hit PrtSc by accident. mapping it to ^\ hurts!
    loadkeys <<'EOF'
    keycode 99 = VoidSymbol
    EOF

I've had that in my system startup for a long time. Actually it's a bit more complicated since I have a few other keys I like to remap, but the comment is exactly as I wrote it at least 15 years ago. (I don't hit PrtSc by accident much since I got my Happy Hacking keyboard!) VoidSymbol makes the key do nothing at all when pressed. I suppose you could map it to ESC [ i like it used to be in 0.99.9 if you feel like you must right this historical wrong. The remapping has no effect on the usability of the magic SysRq functions, because they magically bypass the remapping table.
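[Archive note: the octal escapes in that puts_queue call decode to exactly the ESC [ i sequence the kernel comment describes, which is easy to verify with od.]

    # \033, \133 and \151 are octal for ESC, '[' and 'i' -- the VT100
    # "print screen" escape sequence mentioned in the 0.99.9 comment.
    printf '\033\133\151' | od -An -c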
bug#12478: cat SEGV when I press Ctrl-Alt-SysRq-1 on text console
Rafal W. writes: So in example if I want to check all currently held Locks with SysRq-D (which doesn't work anyway), so: When I press SysRq-D, I've KSnapshot popping up. In the text console it doesn't work at all. ksnapshot sounds like something that might respond to a PrtSc keypress. This is a sign that you aren't using Alt, so what you've really done is PrtSc-D. Didn't I tell you already to stop using SysRq to describe key combinations that don't include Alt? WITHOUT ALT IT IS NOT A SYSRQ KEY. Got that yet? Reread it until you do. When I press Control-SysRq-D, my session is getting logout. Well, Ctrl-D is EOF and PrtSc+D is a meaningless combination (as meaningless as pressing D and Q at the same time, it's anyone's guess which will take precedence). When I press Control-Alt-SysRq-D my processes are killed. Too many keys there, I can't guess what they're all doing. Get rid of the Control. And make sure your kernel has CONFIG_LOCKDEP, otherwise the SysRq+D function is disabled. Also, based on the Subject line, you think SEGV is a synonym for core dump. Stop thinking that. Nothing segfaulted. SIGSEGV is one of many signals that can cause a core dump. SIGQUIT is another one. -- Alan Curry
bug#13912: Feedback on coreutils 8.13
the generic --with-PACKAGE and --without-PACKAGE lines are included in the help output. Their presence seems to imply that the list of valid values for PACKAGE is open-ended, but then you immediately get a complete list. I think the help would be more helpful if those top 2 lines were deleted. The same goes for the enable/disable section. (I also think the distinction between with/without and enable/disable is something that isn't helpful to anyone but the people maintaining the autoconf scripts. If there was an upper layer that made --enable an alias for --with and --disable an alias for --without, the users would probably be grateful for it.) So, the specific things you tried were wrong because: Initially tried ./configure --enable-md5sum but this gave configure: WARNING: unrecognized options: --enable-md5sum md5sum isn't a feature you can enable/disable (--enable-FEATURE). The help output lists them all. and proceeded anyway. When `make' was run, many errors were reported, concerning expr.c (see next section). Since these concern the expr command which was not really needed, tried repeating ./configure using --without-PACKAGE ./configure --without-expr expr isn't a package you can with/without (--without-PACKAGE). The help output lists them all. It's great that after all this trouble you've held on to your optimistic belief that there must be some way of configuring just the program you want, if only you can just find the right syntax. Sadly, it just isn't so. Ideally, make src/md5sum or make -C src md5sum would work, but the Makefiles in coreutils aren't quite good enough for it to work out of the box. Some dependencies are missing. But if your first attempt got far enough to blow up on expr.c, make -C src md5sum might actually work afterward. 3. Some problems with configure Retried make, redirecting the output to a log file. The errors in expr were more extensive than realised before. 
The first error is expr.c:54:18: error: gmp.h: No such file or directory It seems that configure has made an incorrect decision about the availability of gmp, which is not available (but is placed ready to be installed along with the gcc sources. It had previously been established that it was a Prerequisite). Noted that config.status has D[HAVE_GMP]= 1 It sounds like configure found your not-yet-installed gmp and tried to use it, with disastrous results. This is the part of the bug report where you should include your config.log, so we can see exactly how that HAVE_GMP became 1. And I won't be surprised if it turns out to be a bug that's already fixed in 8.21. and the expr.c source tests this. It seems that configure has incorrectly decided that gmp is available, and expr.c fails to find the header, and all other errors arise from this. Since the expr.c source allowed for the test failing, it seemed possible to proceed without gmp. So config.status was modified so that D[HAVE_GMP]= 0 Editing a config.status by hand? That sure shows bravery and determination. I'm quitting here. The rest of the story needs to be read by someone who actually knows MacOS. -- Alan Curry
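[Archive note: there are less drastic ways to override a misdetection than hand-editing config.status. This is a sketch only: ac_cv_header_gmp_h is the conventional Autoconf cache variable for an AC_CHECK_HEADERS([gmp.h]) probe, and it is an assumption that this coreutils release uses that probe — grep config.log to confirm the exact variable name before relying on it.]

    # Re-run configure telling it gmp.h is absent, instead of patching the
    # generated config.status afterward.  The cache-variable name below is
    # an assumption (standard for an AC_CHECK_HEADERS([gmp.h]) test);
    # config.log shows the name this release actually uses.
    ./configure ac_cv_header_gmp_h=no
    # Then try building only the program that is actually wanted:
    make -C src md5sum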