[Bug 8265] Various corruption of devices and sockets

2013-06-03 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=8265

--- Comment #11 from grarp...@gmail.com 2013-06-03 07:11:56 UTC ---
Way back then... I think I modified the patch a bit, ran against just a few
files in a dir, and reconfirmed it. The workproduct is still here somewhere...
a bet from memory says don't close this till someone can recheck. Perhaps me
when that box gets rebuilt.



Announce: Tool for replicating filesystem changes

2013-06-03 Thread Dan Shearer
Hello rsync-ers,

Announcing 'should', a GPLv3 tool for recording and reacting to
filesystem events that has synergies with and differences from common
rsync use cases:

   https://github.com/gladserv/should

Written by Claudio, it is in use at Gladserv (a hosting company,
http://gladserv.com/ ) and The PODFather (a SaaS provider,
http://thepodfather.com ) and a few other places. This is a call for some
more general testing.

The current 'should' design uses an efficient and careful algorithm
around inotify on the collection side and a batching/replay system to
implement other functionality, including optionally linking to librsync.
Additional technologies may be implemented in the future, perhaps as
discussed below.

Comparison
----------

rsync is known to be highly reliable after years of testing. 'should' is not.

rsync saves network bandwidth. 'should' saves disk bandwidth as
well as network bandwidth.

rsync cannot reliably and quickly detect subtree renames. 'should' does both.

rsync doesn't care when a file is modified, only that there is a
difference between two files. 'should' detects filesystem events in
real-time, which may (or may not) be applied to a similar filesystem
elsewhere.

rsync on a live filesystem is lossy due to the time it takes to traverse
a tree. 'should' watches the whole tree at once and records all changes.

rsync scales to an arbitrary number of changes between filesystems.
'should' can potentially be overwhelmed by too many changes happening
too quickly in real-time, although the total size of the filesystem or
total number of changes is probably not a concern.

rsync has one main use: file copying. 'should' can be used for many
things, including monitoring, notifying, mirroring, snapshotting and more.

rsync can optionally batch changes for later replay (--write-batch),
although this is rarely used. 'should' always batches changes, although
the replay can take place immediately.

rsync can deal with all Unix file-like objects such as fifos, device
files, etc. 'should' can too.

rsync knows about filesystem boundaries. So does 'should'.

rsync cannot mirror live filesystems: there will always be a lag and
potentially unresolvable files due to them constantly changing. 'should'
is designed to mirror live filesystems, and may handle the
constantly-changing case fairly well (do let us know :)

rsync knows about ACLs. 'should' does not.

rsync has its own protocol, a simple security system and can run over
any transparent shell such as ssh. So does 'should'.

rsync cannot connect over SSL (although it can be tunneled [1]). 'should'
uses SSL by default.

rsync is ~88k LOC. 'should' is ~26k LOC.

What 'should' Consists of
-------------------------

'should' consists of a C utility (various modes including client and
server), a library, and a feature-complete Perl module as an example of
language bindings. The program tries very hard to avoid the traditional
issues of a kernel API like inotify, including: not handling
subdirectories recursively; race conditions (what happens if you mkdir
or rmdir a watched directory, many times a second?); how do you store
and catch up events if the kernel has been too busy; how do you prevent
event overruns etc.  Optionally, differential rsync file transfer can be
used to replicate events where they affect large files. 
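
To make the inotify issues above a bit more concrete, here is a minimal
sketch (in Python, and NOT taken from the 'should' source) of what any
inotify consumer has to do with the raw event stream: decode the
variable-length records, pair IN_MOVED_FROM/IN_MOVED_TO by cookie to
recognise a rename, notice new directories that need their own watch,
and notice IN_Q_OVERFLOW, after which the only safe recovery is a rescan.
The mask values and record layout come from <sys/inotify.h>; the function
names decode_events and summarise are made up for this illustration.

```python
import struct
from collections import namedtuple

# Mask bits from <sys/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_Q_OVERFLOW = 0x00004000
IN_ISDIR = 0x40000000

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
HEADER = struct.Struct("iIII")

Event = namedtuple("Event", "wd mask cookie name")


def decode_events(buf):
    """Split a buffer read from an inotify fd into Event tuples."""
    events = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, cookie, name_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        raw = buf[offset:offset + name_len]
        offset += name_len
        # The kernel pads the name with NUL bytes for alignment.
        name = raw.split(b"\0", 1)[0].decode("utf-8", "surrogateescape")
        events.append(Event(wd, mask, cookie, name))
    return events


def summarise(events):
    """Return (renames, new_dirs, overflowed) for one batch of events."""
    pending = {}    # cookie -> (wd, name) seen in IN_MOVED_FROM
    renames = []    # ((from_wd, from_name), (to_wd, to_name))
    new_dirs = []   # (wd, name) of directories that need their own watch
    overflowed = False
    for ev in events:
        if ev.mask & IN_Q_OVERFLOW:
            overflowed = True   # events were dropped by the kernel
        elif ev.mask & IN_MOVED_FROM:
            pending[ev.cookie] = (ev.wd, ev.name)
        elif ev.mask & IN_MOVED_TO and ev.cookie in pending:
            renames.append((pending.pop(ev.cookie), (ev.wd, ev.name)))
        elif ev.mask & IN_ISDIR and ev.mask & (IN_CREATE | IN_MOVED_TO):
            new_dirs.append((ev.wd, ev.name))
    return renames, new_dirs, overflowed
```

In real use the buffer would come from os.read() on a descriptor obtained
from inotify_init(); the sketch deliberately leaves that part out. Note
that it only pairs renames within one batch, and that a directory can be
populated between its creation and the moment a watch is added, which is
exactly the kind of race mentioned above.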

Documentation is complete and Unix-like, although currently more suited
to readers of this list than first-time users. In some cases (eg what
exactly is the dirsync option?) you also need to read the protocol
summary.

There is one corner case where 'should' is expected to work really well,
which is replication of email, especially Maildirs.  'should' also works
with multiple replicants and cascading replication architectures.  Some
experiments have been done to implement multimaster in the general case
- to the extent that multimaster is useful, and there are theoretical
questions that this raises - but none of these are in the git tree. In the
Maildir-and-similar case, multimaster probably works reliably (ie
multiple inbound MTA and multiple imapd servers) due to the way Maildir
works in theory, but nobody is promising 'should' will definitely work
in this configuration. 'should' has had basic testing in a ring
architecture; it might work, and at least is not expected to deadlock,
resonate or explode.

Problems
--------

* inotify is Linux-specific, although porting 'should' to the existing
  inotify-like APIs on most mainstream operating systems is very possible
  and expected.

* The maximum useful value for /proc/sys/fs/inotify/max_user_watches is
  32768, which is quickly exceeded on any moderate Maildir deployment.

* inotify has no concept of byteranges to indicate where a modification
  happened.
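
As a rough way to judge the watch-limit problem on a given box: inotify
watches are per directory, so watching a whole tree costs one watch per
directory, and each Maildir folder brings its own cur, new and tmp
subdirectories. The sketch below (Python, not part of 'should'; the
function names are invented for this example) counts the directories
under a root and compares that with the limit the kernel reports.

```python
import os

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=LIMIT_PATH):
    """Return the per-user watch limit, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root plus every directory beneath it (symlinks not followed)."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def watch_budget(root, limit=None):
    """Return (needed, limit, fits); fits is None if the limit is unknown."""
    if limit is None:
        limit = read_watch_limit()
    needed = count_directories(root)
    fits = None if limit is None else needed <= limit
    return needed, limit, fits
```

Calling watch_budget() on a mail spool gives a quick idea of how close
the tree is to the limit. It is only an estimate: the limit is per user,
so other inotify consumers running under the same uid share it.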

Architecture Discussion
-----------------------

* inotify is about extending specific Linux filesystems. But replication
  ought to be as filesystem-independent as possible as well as
  independent of operating systems and