Jim Meyering <[EMAIL PROTECTED]> writes:

> http://article.gmane.org/gmane.comp.gnu.coreutils.bugs/1464
>
> but no one has begun work on that, as far as I know.

Here's a proposed implementation for that idea.  It was a bit trickier
than I thought it would be, mostly because I thought of more
optimizations.

2004-03-27  Paul Eggert  <[EMAIL PROTECTED]>

	* NEWS: cp -pu and mv -u (when copying) now take the destination
	file system time stamp resolution into account.
	* doc/coreutils.texi (mv invocation): Document this.
	(cp invocation): Document -u (it was missing!) with new behavior.
	* lib/utimecmp.c, lib/utimecmp.h, m4/utimecmp.m4: New files.
	* lib/Makefile.am (libfetish_a_SOURCES): Add utimecmp.c, utimecmp.h.
	* m4/prereq.m4 (jm_PREREQ): Require gl_UTIMECMP.
	* src/copy.c: Include "utimecmp.h".
	(copy_internal): Compare time stamps using utimecmp rather than
	MTIME_CMP.

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.194
diff -p -u -r1.194 NEWS
--- NEWS	24 Mar 2004 17:38:58 -0000	1.194
+++ NEWS	24 Mar 2004 23:15:42 -0000
@@ -4,6 +4,12 @@ GNU coreutils NEWS
 ** New features
 
+  cp -pu and mv -u (when copying) now don't bother to update the
+  destination if the resulting time stamp would be no newer than the
+  preexisting time stamp.  This saves work in the common case when
+  copying or moving multiple times to the same destination in a file
+  system with a coarse time stamp resolution.
+
   'df', 'du', and 'ls' now take the default block size from the
   BLOCKSIZE environment variable if the BLOCK_SIZE, DF_BLOCK_SIZE,
   DU_BLOCK_SIZE, and LS_BLOCK_SIZE environment variables are not set.
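To make the NEWS entry concrete: the comparison is done after the source time stamp has been cut down to what the destination file system can actually store. The following is only a minimal sketch of that idea, not code from the patch; `truncate_timespec` and `compare_truncated` are hypothetical names, and the resolution is passed in by the caller rather than deduced from the file system as utimecmp does.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not part of the patch): truncate TS to a
   resolution of RES_NS nanoseconds.  RES_NS is assumed to be a power
   of 10 no greater than one billion, or two billion for file systems
   that store time stamps in 2-second units.  */
static struct timespec
truncate_timespec (struct timespec ts, long res_ns)
{
  assert (0 < res_ns);
  if (res_ns == 2000000000L)
    {
      /* Clear the low-order bit of the seconds and drop the fraction.  */
      ts.tv_sec -= ts.tv_sec & 1;
      ts.tv_nsec = 0;
    }
  else
    ts.tv_nsec -= ts.tv_nsec % res_ns;
  return ts;
}

/* Return -1, 0, or 1 if DST is older than, the same age as, or newer
   than SRC once SRC has been truncated to RES_NS nanoseconds.  */
static int
compare_truncated (struct timespec dst, struct timespec src, long res_ns)
{
  src = truncate_timespec (src, res_ns);
  if (dst.tv_sec != src.tv_sec)
    return dst.tv_sec < src.tv_sec ? -1 : 1;
  if (dst.tv_nsec != src.tv_nsec)
    return dst.tv_nsec < src.tv_nsec ? -1 : 1;
  return 0;
}
```

With a nanosecond-exact comparison, a destination written by an earlier `cp -pu` looks older than its source whenever the file system dropped low-order digits; comparing against the truncated source reports the two as the same age, so the second copy can be skipped.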
Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.173
diff -p -u -r1.173 coreutils.texi
--- doc/coreutils.texi	24 Mar 2004 17:38:17 -0000	1.173
+++ doc/coreutils.texi	24 Mar 2004 23:11:13 -0000
@@ -6402,6 +6402,19 @@ results in an error message on systems t
 
 @optTargetDirectory
 
+@item -u
+@itemx --update
+@opindex -u
+@opindex --update
+@cindex newer files, copying only
+Do not copy a non-directory that has an existing destination with the
+same or newer modification time.  If time stamps are being preserved,
+the comparison is to the source time stamp truncated to the
+resolutions of the destination file system and of the system calls
+used to update time stamps; this avoids duplicate work if several
+@samp{cp -pu} commands are executed with the same source and
+destination.
+
 @item -v
 @itemx --verbose
 @opindex -v
@@ -6798,6 +6811,11 @@ about each existing destination file.
 @cindex newer files, moving only
 Do not move a non-directory that has an existing destination with the
 same or newer modification time.
+If the move is across file system boundaries, the comparison is to the
+source time stamp truncated to the resolutions of the destination file
+system and of the system calls used to update time stamps; this avoids
+duplicate work if several @samp{mv -u} commands are executed with the
+same source and destination.
 
 @item -v
 @itemx --verbose
Index: lib/Makefile.am
===================================================================
RCS file: /home/meyering/coreutils/cu/lib/Makefile.am,v
retrieving revision 1.182
diff -p -u -r1.182 Makefile.am
--- lib/Makefile.am	23 Mar 2004 17:34:05 -0000	1.182
+++ lib/Makefile.am	24 Mar 2004 20:20:57 -0000
@@ -115,6 +115,7 @@ libfetish_a_SOURCES = \
   unistd-safer.h \
   unlocked-io.h \
   userspec.c userspec.h \
+  utimecmp.c utimecmp.h \
   utimens.c utimens.h \
   version-etc.c version-etc.h \
   xalloc.h \
Index: m4/prereq.m4
===================================================================
RCS file: /home/meyering/coreutils/cu/m4/prereq.m4,v
retrieving revision 1.83
diff -p -u -r1.83 prereq.m4
--- m4/prereq.m4	18 Dec 2003 10:33:39 -0000	1.83
+++ m4/prereq.m4	24 Mar 2004 22:21:02 -0000
@@ -103,6 +103,7 @@ AC_DEFUN([jm_PREREQ],
   AC_REQUIRE([gl_UNICODEIO])
   AC_REQUIRE([gl_UNISTD_SAFER])
   AC_REQUIRE([gl_USERSPEC])
+  AC_REQUIRE([gl_UTIMECMP])
   AC_REQUIRE([gl_UTIMENS])
   AC_REQUIRE([gl_XALLOC])
   AC_REQUIRE([gl_XGETCWD])
Index: src/copy.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/copy.c,v
retrieving revision 1.159
diff -p -u -r1.159 copy.c
--- src/copy.c	12 Mar 2004 11:53:18 -0000	1.159
+++ src/copy.c	28 Mar 2004 00:09:07 -0000
@@ -39,6 +39,7 @@
 #include "quote.h"
 #include "same.h"
 #include "savedir.h"
+#include "utimecmp.h"
 #include "utimens.h"
 #include "xreadlink.h"
 
@@ -945,16 +946,28 @@ copy_internal (const char *src_path, con
               return 1;
             }
 
-          if (x->update && MTIME_CMP (src_sb, dst_sb) <= 0)
+          if (x->update)
             {
-              /* We're using --update and the source file is older
-                 than the destination file, so there is no need to
-                 copy or move.  */
-              /* Pretend the rename succeeded, so the caller (mv)
-                 doesn't end up removing the source file.  */
-              if (rename_succeeded)
-                *rename_succeeded = 1;
-              return 0;
+              /* When preserving time stamps (but not moving within a file
+                 system), don't worry if the destination time stamp is
+                 less than the source merely because of time stamp
+                 truncation.  */
+              int options = ((x->preserve_timestamps
+                              && ! (x->move_mode
+                                    && dst_sb.st_dev == src_sb.st_dev))
+                             ? UTIMECMP_TRUNCATE_SOURCE
+                             : 0);
+
+              if (0 <= utimecmp (dst_path, &dst_sb, &src_sb, options))
+                {
+                  /* We're using --update and the destination is not older
+                     than the source, so do not copy or move.  Pretend the
+                     rename succeeded, so the caller (if it's mv) doesn't
+                     end up removing the source file.  */
+                  if (rename_succeeded)
+                    *rename_succeeded = 1;
+                  return 0;
+                }
             }
         }
 
--- /dev/null	Tue Mar 18 13:55:57 2003
+++ lib/utimecmp.c	Sat Mar 27 16:22:09 2004
@@ -0,0 +1,340 @@
+/* utimecmp.c -- compare file time stamps
+
+   Copyright (C) 2004 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Written by Paul Eggert.  */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "utimecmp.h"
+
+#if HAVE_INTTYPES_H
+# include <inttypes.h>
+#endif
+#if HAVE_STDINT_H
+# include <stdint.h>
+#endif
+
+#include <limits.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include "hash.h"
+#include "timespec.h"
+#include "utimens.h"
+#include "xalloc.h"
+
+/* Verify a requirement at compile-time (unlike assert, which is runtime).  */
+#define verify(name, assertion) struct name { char a[(assertion) ? 1 : -1]; }
+
+#ifndef MAX
+# define MAX(a, b) ((a) > (b) ? (a) : (b))
+#endif
+
+#ifndef SIZE_MAX
+# define SIZE_MAX ((size_t) -1)
+#endif
+
+/* The extra casts work around common compiler bugs.  */
+#define TYPE_SIGNED(t) (! ((t) 0 < (t) -1))
+/* The outer cast is needed to work around a bug in Cray C 5.0.3.0.
+   It is necessary at least when t == time_t.  */
+#define TYPE_MINIMUM(t) ((t) (TYPE_SIGNED (t) \
+                              ? ~ (t) 0 << (sizeof (t) * CHAR_BIT - 1) : (t) 0))
+#define TYPE_MAXIMUM(t) ((t) (~ (t) 0 - TYPE_MINIMUM (t)))
+
+enum { BILLION = 1000 * 1000 * 1000 };
+
+/* Best possible resolution that utimens can set and stat can return,
+   due to system-call limitations.  It must be a power of 10 that is
+   no greater than 1 billion.  */
+#if HAVE_WORKING_UTIMES && defined ST_MTIM_NSEC
+enum { SYSCALL_RESOLUTION = 1000 };
+#else
+enum { SYSCALL_RESOLUTION = BILLION };
+#endif
+
+/* Describe a file system and its time stamp resolution in nanoseconds.  */
+struct fs_res
+{
+  /* Device number of file system.  */
+  dev_t dev;
+
+  /* An upper bound on the time stamp resolution of this file system,
+     ignoring any resolution that cannot be set via utimens.  It is
+     represented by an integer count of nanoseconds.  It must be
+     either 2 billion, or a power of 10 that is no greater than a
+     billion and is no less than SYSCALL_RESOLUTION.  */
+  int resolution;
+
+  /* True if RESOLUTION is known to be exact, and is not merely an
+     upper bound on the true resolution.  */
+  bool exact;
+};
+
+/* Hash some device info.  */
+static size_t
+dev_info_hash (void const *x, size_t table_size)
+{
+  struct fs_res const *p = x;
+
+  /* Beware signed arithmetic gotchas.  */
+  if (TYPE_SIGNED (dev_t) && SIZE_MAX < MAX (INT_MAX, TYPE_MAXIMUM (dev_t)))
+    {
+      uintmax_t dev = p->dev;
+      return dev % table_size;
+    }
+
+  return p->dev % table_size;
+}
+
+/* Compare two dev_info structs.  */
+static bool
+dev_info_compare (void const *x, void const *y)
+{
+  struct fs_res const *a = x;
+  struct fs_res const *b = y;
+  return a->dev == b->dev;
+}
+
+/* Return -1, 0, 1 based on whether the destination file (with name
+   DST_NAME and status DST_STAT) is older than SRC_STAT, the same age
+   as SRC_STAT, or newer than SRC_STAT, respectively.
+
+   If OPTIONS & UTIMECMP_TRUNCATE_SOURCE, do the comparison after SRC is
+   converted to the destination's timestamp resolution as filtered through
+   utimens.  In this case, return -2 if the exact answer cannot be
+   determined; this can happen only if the time stamps are very close and
+   there is some trouble accessing the file system (e.g., the user does not
+   have permission to futz with the destination's time stamps).  */
+
+int
+utimecmp (char const *dst_name,
+          struct stat const *dst_stat,
+          struct stat const *src_stat,
+          int options)
+{
+  /* Things to watch out for:
+
+     The code uses a static hash table internally and is not safe in the
+     presence of signals, multiple threads, etc.
+
+     int and long int might be 32 bits.  Many of the calculations store
+     numbers up to 2 billion, and multiply by 10; they have to avoid
+     multiplying 2 billion by 10, as this exceeds 32-bit capabilities.
+
+     time_t might be unsigned.  */
+
+  verify (time_t_is_integer, (time_t) 0.5 == 0);
+  verify (twos_complement_arithmetic, -1 == ~1 + 1);
+
+  /* Destination and source time stamps.  */
+  time_t dst_s = dst_stat->st_mtime;
+  time_t src_s = src_stat->st_mtime;
+  int dst_ns = TIMESPEC_NS (dst_stat->st_mtim);
+  int src_ns = TIMESPEC_NS (src_stat->st_mtim);
+
+  if (options & UTIMECMP_TRUNCATE_SOURCE)
+    {
+      /* Look up the time stamp resolution for the destination device.  */
+
+      /* Hash table for devices.  */
+      static Hash_table *ht;
+
+      /* Information about the destination file system.  */
+      static struct fs_res *new_dst_res;
+      struct fs_res *dst_res;
+
+      /* Time stamp resolution in nanoseconds.  */
+      int res;
+
+      if (! ht)
+        ht = hash_initialize (16, NULL, dev_info_hash, dev_info_compare, free);
+      if (! new_dst_res)
+        {
+          new_dst_res = xmalloc (sizeof *new_dst_res);
+          new_dst_res->resolution = 2 * BILLION;
+          new_dst_res->exact = false;
+        }
+      new_dst_res->dev = dst_stat->st_dev;
+      dst_res = hash_insert (ht, new_dst_res);
+      if (! dst_res)
+        xalloc_die ();
+
+      if (dst_res == new_dst_res)
+        {
+          /* NEW_DST_RES is now in use in the hash table, so allocate a
+             new entry next time.  */
+          new_dst_res = NULL;
+        }
+
+      res = dst_res->resolution;
+
+      if (! dst_res->exact)
+        {
+          /* This file system's resolution is not known exactly.
+             Deduce it, and store the result in the hash table.  */
+
+          time_t dst_a_s = dst_stat->st_atime;
+          time_t dst_c_s = dst_stat->st_ctime;
+          time_t dst_m_s = dst_s;
+          int dst_a_ns = TIMESPEC_NS (dst_stat->st_atim);
+          int dst_c_ns = TIMESPEC_NS (dst_stat->st_ctim);
+          int dst_m_ns = dst_ns;
+
+          /* Set RES to an upper bound on the file system resolution
+             (after truncation due to SYSCALL_RESOLUTION) by inspecting
+             the atime, ctime and mtime of the existing destination.
+             We don't know of any file system that stores atime or
+             ctime with a higher precision than mtime, so it's valid to
+             look at them too.  */
+          {
+            bool odd_second = (dst_a_s | dst_c_s | dst_m_s) & 1;
+
+            if (SYSCALL_RESOLUTION == BILLION)
+              {
+                if (odd_second | dst_a_ns | dst_c_ns | dst_m_ns)
+                  res = BILLION;
+              }
+            else
+              {
+                int a = dst_a_ns;
+                int c = dst_c_ns;
+                int m = dst_m_ns;
+
+                /* Write it this way to avoid mistaken GCC warning
+                   about integer overflow in constant expression.  */
+                int SR10 = SYSCALL_RESOLUTION;  SR10 *= 10;
+
+                if ((a % SR10 | c % SR10 | m % SR10) != 0)
+                  res = SYSCALL_RESOLUTION;
+                else
+                  for (res = SR10, a /= SR10, c /= SR10, m /= SR10;
+                       (res < dst_res->resolution
+                        && (a % 10 | c % 10 | m % 10) == 0);
+                       res *= 10, a /= 10, c /= 10, m /= 10)
+                    if (res == BILLION)
+                      {
+                        if (! odd_second)
+                          res *= 2;
+                        break;
+                      }
+              }
+
+            dst_res->resolution = res;
+          }
+
+          if (SYSCALL_RESOLUTION < res)
+            {
+              struct timespec timespec[2];
+              struct stat dst_status;
+
+              /* Ignore source time stamp information that must necessarily
+                 be lost when filtered through utimens.  */
+              src_ns -= src_ns % SYSCALL_RESOLUTION;
+
+              /* If the time stamps disagree widely enough, there's no need
+                 to interrogate the file system to deduce the exact time
+                 stamp resolution; return the answer directly.  */
+              {
+                time_t s = src_s & ~ (res == 2 * BILLION);
+                if (src_s < dst_s || (src_s == dst_s && src_ns <= dst_ns))
+                  return 1;
+                if (dst_s < s
+                    || (dst_s == s && dst_ns < src_ns - src_ns % res))
+                  return -1;
+              }
+
+              /* Determine the actual time stamp resolution for the
+                 destination file system (after truncation due to
+                 SYSCALL_RESOLUTION) by setting the access time stamp of the
+                 destination to the existing access time, except with
+                 trailing nonzero digits.  */
+
+              timespec[0].tv_sec = dst_a_s;
+              timespec[0].tv_nsec = dst_a_ns;
+              timespec[1].tv_sec = dst_m_s | (res == 2 * BILLION);
+              timespec[1].tv_nsec = dst_m_ns + res / 9;
+
+              /* Set the modification time.  But don't try to set the
+                 modification time of symbolic links; on many hosts this sets
+                 the time of the pointed-to file.  */
+              if (S_ISLNK (dst_stat->st_mode)
+                  || utimens (dst_name, timespec) != 0)
+                return -2;
+
+              /* Read the modification time that was set.  It's safe to call
+                 'stat' here instead of worrying about 'lstat'; either the
+                 caller used 'stat', or the caller used 'lstat' and found
+                 something other than a symbolic link.  */
+              {
+                int stat_result = stat (dst_name, &dst_status);
+
+                if (stat_result
+                    | (dst_status.st_mtime ^ dst_m_s)
+                    | (TIMESPEC_NS (dst_status.st_mtim) ^ dst_m_ns))
+                  {
+                    /* The modification time changed, or we can't tell whether
+                       it changed.  Change it back as best we can.  */
+                    timespec[1].tv_sec = dst_m_s;
+                    timespec[1].tv_nsec = dst_m_ns;
+                    utimens (dst_name, timespec);
+                  }
+
+                if (stat_result != 0)
+                  return -2;
+              }
+
+              /* Determine the exact resolution from the modification time
+                 that was read back.  */
+              {
+                int old_res = res;
+                int a = (BILLION * (dst_status.st_mtime & 1)
+                         + TIMESPEC_NS (dst_status.st_mtim));
+
+                res = SYSCALL_RESOLUTION;
+
+                for (a /= res; a % 10 != 0; a /= 10)
+                  {
+                    if (res == BILLION)
+                      {
+                        res *= 2;
+                        break;
+                      }
+                    res *= 10;
+                    if (res == old_res)
+                      break;
+                  }
+              }
+            }
+
+          dst_res->resolution = res;
+          dst_res->exact = true;
+        }
+
+      /* Truncate the source's time stamp according to the resolution.  */
+      src_s &= ~ (res == 2 * BILLION);
+      src_ns -= src_ns % res;
+    }
+
+  /* Compare the time stamps and return -1, 0, 1 accordingly.  */
+  return (dst_s < src_s ? -1
+          : dst_s > src_s ? 1
+          : dst_ns < src_ns ? -1
+          : dst_ns > src_ns);
+}
--- /dev/null	Tue Mar 18 13:55:57 2003
+++ lib/utimecmp.h	Sat Mar 27 16:08:36 2004
@@ -0,0 +1,38 @@
+/* utimecmp.h -- compare file time stamps
+
+   Copyright (C) 2004 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Written by Paul Eggert.  */
+
+#ifndef UTIMECMP_H
+#define UTIMECMP_H 1
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+/* Options for utimecmp.  */
+enum
+{
+  /* Before comparing, truncate the source time stamp to the
+     resolution of the destination file system and to the resolution
+     of utimens.  */
+  UTIMECMP_TRUNCATE_SOURCE = 1
+};
+
+int utimecmp (char const *, struct stat const *, struct stat const *, int);
+
+#endif
--- /dev/null	Tue Mar 18 13:55:57 2003
+++ m4/utimecmp.m4	Wed Mar 24 14:22:15 2004
@@ -0,0 +1,14 @@
+dnl Copyright (C) 2004 Free Software Foundation, Inc.
+dnl This file is free software, distributed under the terms of the GNU
+dnl General Public License.  As a special exception to the GNU General
+dnl Public License, this file may be distributed as part of a program
+dnl that contains a configuration script generated by Autoconf, under
+dnl the same distribution terms as the rest of that program.
+
+AC_DEFUN([gl_UTIMECMP],
+[
+  dnl Prerequisites of lib/utimecmp.c.
+  AC_REQUIRE([gl_TIMESPEC])
+  AC_REQUIRE([gl_FUNC_UTIMES])
+  :
+])
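
For readers following the resolution-deduction step in lib/utimecmp.c, here is a simplified, standalone sketch of the trailing-zero heuristic: if the atime, ctime, and mtime nanosecond fields all end in the same run of zero digits, the file system probably cannot store anything finer. It is an illustration under stated assumptions rather than the patch's code; `resolution_upper_bound` is a hypothetical name, the scan starts at 1 ns instead of SYSCALL_RESOLUTION, and the 2-second (odd-second) case and the cached per-device bound are left out.

```c
#include <assert.h>

enum { BILLION = 1000 * 1000 * 1000 };

/* Hypothetical simplification of the heuristic in utimecmp: return the
   largest power of 10, at most BILLION, that divides all three
   nanosecond fields.  The result is only an upper bound on the true
   resolution, since the fields might end in zeros by coincidence.  */
static int
resolution_upper_bound (int a_ns, int c_ns, int m_ns)
{
  int res = 1;

  assert (0 <= a_ns && a_ns < BILLION);
  assert (0 <= c_ns && c_ns < BILLION);
  assert (0 <= m_ns && m_ns < BILLION);

  /* Strip one decimal digit per iteration while every field's lowest
     remaining digit is zero.  RES never exceeds BILLION, so the
     multiplication cannot overflow a 32-bit int.  */
  while (res < BILLION && (a_ns % 10 | c_ns % 10 | m_ns % 10) == 0)
    {
      res *= 10;
      a_ns /= 10;
      c_ns /= 10;
      m_ns /= 10;
    }

  return res;
}
```

Because the result is only an upper bound, the patch goes on to probe the file system with utimens and stat when the source and destination time stamps are too close for the bound alone to decide the comparison.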