Re: direct write patch

2001-11-12 Thread Dave Dykstra

Oh boy, I think you're getting into quite a can of worms there.

At a minimum this option should imply the --partial option, because if the
operation is aborted the file will be left partially transferred.  Note
that if you're trying to use the rsync rolling-checksum algorithm to
minimize bandwidth, an interrupted transfer will destroy all of the
previous data, so there will be less data to compare against when the
transfer is retried.

Next, keep in mind that the receiving side of an rsync transfer uses two
independent processes, one for generating checksums and one for creating
files.  I'm not knowledgeable enough to know whether the file-creation
step is guaranteed to begin only after checksum generation has completed
for each file, but if it isn't, then overwriting a file could be a big
problem.  Have you tried transferring any very large files?

I'm surprised you didn't need to do anything to finish_transfer to keep
the robust_rename from returning an error.

The last thing that comes to mind is that overwriting files that are open
by another process, such as a running executable, can be a problem.  An
unlink/rename works much better for that.
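
For anyone who hasn't hit that failure mode, here is a minimal sketch
contrasting the two approaches (my illustration, not rsync code; the helper
names are hypothetical).  Opening a running executable for write fails with
ETXTBSY, and overwriting in place exposes concurrent readers to a
half-written file; writing a temp file and rename(2)-ing it over the
destination is atomic, and a process still holding the old inode open keeps
reading the old contents:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* In place: the open itself fails with ETXTBSY on a running
   * executable, and any concurrent reader can see a half-written file. */
  static int replace_in_place(const char *dst, const char *buf, size_t len)
  {
      int fd = open(dst, O_WRONLY | O_TRUNC);
      if (fd < 0)
          return -1;
      ssize_t n = write(fd, buf, len);
      close(fd);
      return n == (ssize_t)len ? 0 : -1;
  }

  /* Temp file + rename(2): the rename is atomic, so a reader sees
   * either the complete old file or the complete new one. */
  static int replace_by_rename(const char *dst, const char *buf, size_t len)
  {
      char tmp[1024];
      snprintf(tmp, sizeof tmp, "%s.tmp", dst);
      int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
      if (fd < 0)
          return -1;
      if (write(fd, buf, len) != (ssize_t)len) {
          close(fd);
          unlink(tmp);
          return -1;
      }
      close(fd);
      return rename(tmp, dst);
  }

  int main(void)
  {
      const char msg[] = "new contents\n";
      (void)replace_in_place;             /* unsafe variant, shown only */
      return replace_by_rename("demo.txt", msg, sizeof msg - 1);
  }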

I'm sure there are more issues too.

- Dave Dykstra


On Mon, Nov 12, 2001 at 10:12:54AM -0800, Don Mahurin wrote:
 I have attached a patch that supports a new --direct-write option.
 
 The result of using this option is to write directly to the destination
 files, instead of to a temporary file first.
 
 The reason this patch is needed is for rsyncing to a device that is
 full or nearly full.
 
 Say that I am writing to a device that has 1 MB free, and a 2 MB file
 on that device is out of date.
 Rsync will first attempt to write a new temp file, fail, SIGUSR1 itself,
 and exit with error 20.
 
 Specifically, I am writing a Linux root fs to a 32 MB compact flash,
 libc needs to be updated, and rsync fails.
 
 This patch simply sets fnametmp to fname.
 
 Two issues with the patch, that hopefully developers can answer:
 
 - In direct-write mode, I open without O_EXCL, as the file likely does
   exist.  Should the destination file be deleted instead? (I do not know
   exactly what race condition O_EXCL guards against; see the sketch after
   this list.)
 
 - There is a section after the assignment of fnametmp, and before the
   open, that does do_mktemp and then receive_data.  What is the purpose
   of this part? I skip it for direct-write, and it works, but what do I
   know?
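 
 On the first question: a minimal sketch of the classic race that
 O_CREAT|O_EXCL guards against when a predictable temp name is used
 (my illustration, not rsync code):
 
   #include <errno.h>
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>
 
   int main(void)
   {
       const char *tmpname = "dest.tmp";   /* hypothetical temp name */
 
       /* Someone else (an attacker, or a stale run) plants a symlink
        * at the predictable name before we get there. */
       symlink("victim.file", tmpname);
 
       /* Without O_EXCL, open() follows the planted symlink, so we
        * create and clobber victim.file instead of our temp file. */
       int fd = open(tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0600);
       if (fd >= 0) {
           printf("no O_EXCL: wrote through the planted symlink\n");
           close(fd);
       }
 
       /* With O_EXCL, open() refuses any pre-existing name, symlink
        * or regular file, so the race is detected rather than hit. */
       fd = open(tmpname, O_WRONLY | O_CREAT | O_EXCL, 0600);
       if (fd < 0 && errno == EEXIST)
           printf("O_EXCL: refused with EEXIST\n");
 
       unlink(tmpname);
       unlink("victim.file");
       return 0;
   }
 
 In direct-write mode the destination is of course expected to exist, so
 O_EXCL cannot apply there; the question is whether losing that guard on a
 non-temp path matters.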
 
 
 -don

 Only in rsync-2.4.6-direct-write/lib: dummy
 diff -ru rsync-2.4.6/options.c rsync-2.4.6-direct-write/options.c
 --- rsync-2.4.6/options.c Tue Sep  5 19:46:43 2000
 +++ rsync-2.4.6-direct-write/options.c	Sun Nov 11 10:40:01 2001
 @@ -22,6 +22,7 @@
  
  
  int make_backups = 0;
 +int direct_write = 0;
  int whole_file = 0;
  int copy_links = 0;
  int preserve_links = 0;
 @@ -147,6 +148,7 @@
	rprintf(F,"     --ignore-errors         delete even if there are IO errors\n");
	rprintf(F,"     --max-delete=NUM        don't delete more than NUM files\n");
	rprintf(F,"     --partial               keep partially transferred files\n");
 +	rprintf(F,"     --direct-write          write directly to the destination files\n");
	rprintf(F,"     --force                 force deletion of directories even if not empty\n");
	rprintf(F,"     --numeric-ids           don't map uid/gid values by user/group name\n");
	rprintf(F,"     --timeout=TIME          set IO timeout in seconds\n");
 @@ -188,7 +190,7 @@
OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS,
OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, 
OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO,
 -  OPT_MODIFY_WINDOW};
 +  OPT_MODIFY_WINDOW, OPT_DIRECT_WRITE};
  
 static char *short_options = "oblLWHpguDCtcahvqrRIxnSe:B:T:zP";
  
 @@ -227,6 +229,7 @@
	{"perms",            0, 0, 'p'},
	{"links",            0, 0, 'l'},
	{"copy-links",       0, 0, 'L'},
 +	{"direct-write",     0, 0, OPT_DIRECT_WRITE},
	{"copy-unsafe-links",0, 0, OPT_COPY_UNSAFE_LINKS},
	{"safe-links",       0, 0, OPT_SAFE_LINKS},
	{"whole-file",       0, 0, 'W'},
 @@ -400,6 +403,10 @@
   safe_symlinks=1;
   break;
  
 + case OPT_DIRECT_WRITE:
 + direct_write = 1;
 + break;
 +
   case 'h':
   usage(FINFO);
   exit_cleanup(0);
 @@ -554,6 +561,8 @@
   keep_partial = 1;
   break;
  
 +
 +
   case OPT_IGNORE_ERRORS:
   ignore_errors = 1;
   break;
 @@ -691,6 +700,9 @@
	slprintf(mdelete,sizeof(mdelete),"--max-delete=%d",max_delete);
   args[ac++] = mdelete;
   }
 +
 +	if (direct_write)
 +		args[ac++] = "--direct-write";
  
   if (io_timeout) {
	slprintf(iotime,sizeof(iotime),"--timeout=%d",io_timeout);
 diff -ru 

Re: direct write patch

2001-11-12 Thread Don Mahurin

Perhaps all that I need is a --delete-before-update option that just
unlinks the file before it starts to write the temp file.  Then we avoid
the possible issues that you raised.
I can still see a case where --direct-write may be useful (a read-write
file in a read-only dir), but this is probably not a common situation,
and I don't want to tackle those issues yet.

-don

Dave Dykstra wrote:

 Oh boy, I think you're getting into quite a can of worms there.

 At a minimum this option should imply the --partial option, because if the
 operation is aborted the file will be left partially transferred.  Note
 that if you're trying to use the rsync rolling-checksum algorithm to
 minimize bandwidth, an interrupted transfer will destroy all of the
 previous data, so there will be less data to compare against when the
 transfer is retried.

 Next, keep in mind that the receiving side of an rsync transfer uses two
 independent processes, one for generating checksums and one for creating
 files.  I'm not knowledgeable enough to know whether the file-creation
 step is guaranteed to begin only after checksum generation has completed
 for each file, but if it isn't, then overwriting a file could be a big
 problem.  Have you tried transferring any very large files?

 I'm surprised you didn't need to do anything to finish_transfer to keep
 the robust_rename from returning an error.

 The last thing that comes to mind is that overwriting files that are open
 by another process, such as a running executable, can be a problem.  An
 unlink/rename works much better for that.

 I'm sure there are more issues too.

 - Dave Dykstra





Re: direct write patch

2001-11-12 Thread Dave Dykstra

On Mon, Nov 12, 2001 at 11:50:01AM -0800, Don Mahurin wrote:
 Perhaps all that I need is a --delete-before-update option that just
 unlinks the file before it starts to write the temp file.  Then we avoid
 the possible issues that you raised.  I can still see a case where
 --direct-write may be useful (a read-write file in a read-only dir), but
 this is probably not a common situation, and I don't want to tackle those
 issues yet.


Wait, I forgot something more fundamental about the way the rsync
implementation works.  One process on the receiver side generates
checksums, but the other one puts together pieces of the old file as well
as the pieces of the new file that get sent across the network.  If it
happens to want pieces that are earlier in the file than what is being
written, your file will be corrupted if you're overwriting it.  For
example, if it needs to move the data at the 50K point in the file
forward to the 100K point, that data will already have been overwritten
by the new data.
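
To make that concrete, here is a self-contained toy (my illustration,
not rsync code): the new file is one literal block followed by a copy of
old block 0, exactly the "move earlier data forward" case.

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* old (basis) file: two 4-byte blocks */
      char basis[9] = "AAAABBBB";

      /* delta for the new file: literal "XXXX", then copy old block 0 */

      /* Safe: reconstruct into a separate buffer (the temp-file way). */
      char out[9];
      memcpy(out, "XXXX", 4);
      memcpy(out + 4, basis + 0, 4);     /* old block 0 is still intact */
      out[8] = '\0';
      printf("temp file: %s\n", out);    /* XXXXAAAA - correct */

      /* Unsafe: reconstruct over the basis itself (direct-write). */
      memcpy(basis, "XXXX", 4);          /* clobbers old block 0 ...    */
      memcpy(basis + 4, basis + 0, 4);   /* ... which we then copy      */
      printf("in place : %s\n", basis);  /* XXXXXXXX - corrupted */
      return 0;
  }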

If on the other hand you use your proposed --delete-before-update option,
it won't do you any good because the operating system will not actually
delete the file until rsync closes it after it is completed, because rsync
will hold the file descriptor open the whole time.  So it won't save you
any space.
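
This is easy to demonstrate (my sketch, not rsync code; it assumes a
large file named big.old in the current directory): the free-block count
does not move at the unlink, only at the close.

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/statvfs.h>
  #include <unistd.h>

  int main(void)
  {
      struct statvfs sv;
      int fd = open("big.old", O_RDONLY); /* receiver-style: fd stays open */

      unlink("big.old");                  /* "delete before update" */
      statvfs(".", &sv);
      printf("free blocks after unlink: %lu\n",
             (unsigned long)sv.f_bavail); /* unchanged: inode still live */

      close(fd);                          /* last reference dropped */
      statvfs(".", &sv);
      printf("free blocks after close : %lu\n",
             (unsigned long)sv.f_bavail); /* now the blocks come back */
      return 0;
  }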

Sorry, but I don't think it's possible to do what you want and still use
the rsync algorithm.  You could probably do it in conjunction with the
--whole-file option, which turns off the rsync algorithm, but then you
give up a lot.

- Dave Dykstra