On  4 Aug 2002, Wayne Davison <[EMAIL PROTECTED]> wrote:

> Your previous proposal sounded quite a bit more fine-grained than what
> rZync is doing.  For instance, it sounded like you would have much more
> primitive building-block messages and move much of the controlling
> smarts into something like a python-language scripting layer.  While
> rZync allows ftp-level control (such as "send this file", "send this
> directory tree", "delete this file", "create this directory") it does
> this with a small number of higher-level command messages.

OK, good.

> I think that's a good idea.  My rZync app currently operates on each arg
> independently, but I recently discovered that this makes it incompatible
> with rsync when merging directories and such.  For instance, the command
> "rsync -r dir1/ dir2/ dir3" merges the file list and removes duplicates
> before starting the transfer to dir3.

This is a substantial source of cruft in the current code, and one of
the reasons given for why an up-front traversal is necessary.

I think a more efficient, and possibly simpler, solution would be to
first examine all of the source directories and determine their
relationships.  Basically, you might discover that dir2 is in fact a
subdirectory of dir1, or the same directory (or vice versa), in which
case you can eliminate it.  Or you might discover that they're
disjoint.  Given that directories are trees, I don't think there are
any other possibilities.
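Once both arguments have been canonicalized (absolute, no "." or ".." components, no trailing slash, symlinks already resolved), telling these cases apart is a string comparison. A minimal sketch under exactly that assumption; `enum rz_dir_rel`, `rz_path_within` and `rz_dir_relation` are hypothetical names:

```c
#include <string.h>

/* Hypothetical result type: how two source directories relate. */
enum rz_dir_rel {
    RZ_DIR_DISJOINT,
    RZ_DIR_SAME,
    RZ_DIR_A_CONTAINS_B,   /* b lies below a */
    RZ_DIR_B_CONTAINS_A    /* a lies below b */
};

/* Nonzero if `inner` equals or lies below `outer`.  Both must already
 * be canonical absolute paths with no trailing slash (except "/"). */
static int rz_path_within(const char *outer, const char *inner)
{
    size_t n = strlen(outer);

    if (strncmp(outer, inner, n) != 0)
        return 0;
    if (n == 1 && outer[0] == '/')
        return 1;                      /* the root contains everything */
    /* "/a/b" must not match "/a/bc": the prefix has to end at a
     * path-component boundary. */
    return inner[n] == '\0' || inner[n] == '/';
}

static enum rz_dir_rel rz_dir_relation(const char *a, const char *b)
{
    int b_in_a = rz_path_within(a, b);
    int a_in_b = rz_path_within(b, a);

    if (a_in_b && b_in_a)
        return RZ_DIR_SAME;
    if (b_in_a)
        return RZ_DIR_A_CONTAINS_B;
    if (a_in_b)
        return RZ_DIR_B_CONTAINS_A;
    return RZ_DIR_DISJOINT;
}
```

Since it is pure string handling, this is easy to exercise thoroughly as a unit test; for instance rz_dir_relation("/src/dir1", "/src/dir10") gives RZ_DIR_DISJOINT, because the prefix match has to end at a "/".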

Doing this in a way that properly respects various symlink options
will be a little complex, but I think it is in principle possible.  It
is also something quite amenable to being thoroughly exercised in
isolation as a unit test.

I am pretty sure that you can do this by just examining dir1 and dir2.
You do need to look at the filesystem to find out about symlinks and
so on, but I think you do not need to traverse their contents.
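One way to do that without reading any directory contents is to identify directories by (st_dev, st_ino) and walk upwards from one argument via "..", comparing each step against the other argument. A minimal sketch, assuming POSIX stat() semantics and a fixed-size path buffer; `rz_dir_is_within` and `RZ_PATH_BUF` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/stat.h>

#define RZ_PATH_BUF 4096   /* hypothetical fixed buffer size */

/* Hypothetical helper: returns 1 if directory `outer` is the same
 * directory as `inner` or one of its ancestors, 0 if not, -1 on error.
 * Only `inner` and its parents are stat()ed; no directory contents are
 * read.  Directories are compared by (st_dev, st_ino), so different
 * spellings of one directory compare equal, and because the kernel
 * resolves "x/.." from wherever x really is, symlinks are followed. */
static int rz_dir_is_within(const char *outer, const char *inner)
{
    struct stat target, cur, up;
    char path[RZ_PATH_BUF];
    size_t len;

    if (stat(outer, &target) != 0)
        return -1;
    len = strlen(inner);
    if (len + 1 > sizeof path)
        return -1;
    memcpy(path, inner, len + 1);
    if (stat(path, &cur) != 0)
        return -1;

    for (;;) {
        if (cur.st_dev == target.st_dev && cur.st_ino == target.st_ino)
            return 1;
        /* Move up one level by appending "/..". */
        if (len + 4 > sizeof path)
            return -1;
        memcpy(path + len, "/..", 4);   /* copies the NUL as well */
        len += 3;
        if (stat(path, &up) != 0)
            return -1;
        /* At the root, ".." is the directory itself: stop. */
        if (up.st_dev == cur.st_dev && up.st_ino == cur.st_ino)
            return 0;
        cur = up;
    }
}
```

Calling it in both directions classifies a pair: 1 both ways means the same directory, 1 in one direction means nesting, and 0 both ways means disjoint.  This follows the physical tree with symlinks resolved; a symlink option that wants an argument treated as typed would need extra handling on top.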

It is pretty complex, so there might be some case I've missed.

> I got rid of the "multi-IO" idiom of rsync in favor of sending all
> data via messages and limiting each chunk to 32K to allow other
> messages to be mixed into the middle of a large file's data-stream
> (such as verbose output).

OK, that makes sense.  I guess 32k is as good a number as any.
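For illustration, splitting file data into bounded messages so that other messages can slip in between might look like this.  The 1-byte type plus 4-byte length header, the type values and all the names are hypothetical, chosen for this sketch only, not rZync's actual format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RZ_MAX_CHUNK (32 * 1024)   /* largest payload per data message */

/* Hypothetical message types for this sketch only. */
enum rz_stream_msg { RZ_MSG_FILE_DATA = 1, RZ_MSG_VERBOSE = 2 };

/* Hypothetical in-memory output queue, standing in for the buffer
 * that the single I/O routine drains to the socket. */
struct rz_outbuf {
    unsigned char *buf;
    size_t len, cap;
};

/* Append one framed message: 1-byte type, 4-byte big-endian length,
 * then the payload.  Returns 0 on success, -1 if it doesn't fit. */
static int rz_put_msg(struct rz_outbuf *out, enum rz_stream_msg type,
                      const void *data, uint32_t n)
{
    unsigned char *p;

    if (out->cap - out->len < 5 + (size_t)n)
        return -1;
    p = out->buf + out->len;
    p[0] = (unsigned char)type;
    p[1] = (unsigned char)(n >> 24);
    p[2] = (unsigned char)(n >> 16);
    p[3] = (unsigned char)(n >> 8);
    p[4] = (unsigned char)n;
    memcpy(p + 5, data, n);
    out->len += 5 + (size_t)n;
    return 0;
}

/* Queue `len` bytes of file data as messages of at most RZ_MAX_CHUNK
 * bytes.  After each chunk, a pending verbose line (if any) is queued,
 * so it need not wait until the whole file has been sent. */
static int rz_send_file_data(struct rz_outbuf *out,
                             const unsigned char *data, size_t len,
                             const char **pending_note)
{
    while (len > 0) {
        uint32_t n = len > RZ_MAX_CHUNK ? RZ_MAX_CHUNK : (uint32_t)len;

        if (rz_put_msg(out, RZ_MSG_FILE_DATA, data, n) != 0)
            return -1;
        data += n;
        len -= n;
        if (*pending_note != NULL) {
            if (rz_put_msg(out, RZ_MSG_VERBOSE, *pending_note,
                           (uint32_t)strlen(*pending_note)) != 0)
                return -1;
            *pending_note = NULL;
        }
    }
    return 0;
}
```

With this framing the per-message overhead is 5 bytes, about 0.015% of a full 32K chunk, while an interleaved verbose line waits behind at most one chunk.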

> I think the basic idea of how rZync envisions a new protocol working is
> a good one -- not so much the specifics of the bytes sent in the
> message-header format, but how the messages flow, how each side handles
> the messages in a single process, how all I/O is handled by a single
> function, etc.  There's certainly lots of room for improvement,
> though.

I've started looking at the code, and it looks very nice.  It's
certainly easier to read than rsync.  Would you mind putting in some
more comments to help me along though?

I had a couple of "internal" thoughts about how the code for a next
release ought to go.  Please don't take them as criticisms of your
right to write experimental code however you want, or as an attempt to
dictate how we run things.  I just want to raise the issues.

Global names should be distinguished with some kind of prefix, as in
librsync: "rz_" or whatever.  If this ever turns into a library that
gets linked into something else it will help; in the meantime it helps
keep clear what is part of the project and what's pulled in from
elsewhere.

I really liked mkproto.awk when I first saw it, but now I'm not so
keen.  I think maintaining header files by hand is in some ways a
good thing, because it forces you to think about whether a particular
function really needs to be exported to the rest of the program, or to the
world at large.
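As an illustration of both points, here is one small module squeezed into a single listing: a hand-written header that declares only the `rz_`-prefixed public functions, and a .c part whose internal helper stays `static`.  The `rz_refcnt` module is entirely hypothetical:

```c
/* ---- rz_refcnt.h: maintained by hand, exports only the public API ---- */
#ifndef RZ_REFCNT_H
#define RZ_REFCNT_H

struct rz_refcnt {
    int count;
};

void rz_refcnt_init(struct rz_refcnt *rc);
void rz_refcnt_inc(struct rz_refcnt *rc);
/* Returns 1 when the count drops to zero, 0 otherwise. */
int rz_refcnt_dec(struct rz_refcnt *rc);

#endif /* RZ_REFCNT_H */

/* ---- rz_refcnt.c ---- */
#include <assert.h>

/* Internal helper: static, so it is neither declared in the header
 * nor visible to the linker outside this file. */
static void rz_refcnt_check(const struct rz_refcnt *rc)
{
    assert(rc->count >= 0);
}

void rz_refcnt_init(struct rz_refcnt *rc)
{
    rc->count = 0;
}

void rz_refcnt_inc(struct rz_refcnt *rc)
{
    rz_refcnt_check(rc);
    rc->count++;
}

int rz_refcnt_dec(struct rz_refcnt *rc)
{
    rz_refcnt_check(rc);
    rc->count--;
    rz_refcnt_check(rc);
    return rc->count == 0;
}
```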

From rzync.h:

> #define MSG_HELLO                     1

> #define MSG_QUIT                      3
> #define MSG_NO_QUIT_YET               4 // XXX needed??
> #define MSG_ABORT                     5

> #define MSG_NOTE_DIRNAME              6
> #define MSG_NOTE_FILENAME             7
> #define MSG_DEC_REFCNT                8

These might work better as an enum, so that gdb can show symbolic
values.
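A sketch of what that could look like, keeping the values from the quoted #defines so nothing changes on the wire (2 is skipped because it is not in the excerpt).  `rz_msg_name` is a hypothetical helper; note that gdb only prints names for variables declared with the enum type, not for plain ints:

```c
/* Same values as the quoted #defines.  A variable of this type prints
 * symbolically in gdb, e.g. "$1 = MSG_QUIT" rather than "$1 = 3". */
enum rz_msg_type {
    MSG_HELLO         = 1,
    MSG_QUIT          = 3,
    MSG_NO_QUIT_YET   = 4,   /* XXX needed?? */
    MSG_ABORT         = 5,
    MSG_NOTE_DIRNAME  = 6,
    MSG_NOTE_FILENAME = 7,
    MSG_DEC_REFCNT    = 8
};

/* With no default: label, gcc's -Wswitch (part of -Wall) warns if a
 * new message type is added to the enum but not handled here. */
static const char *rz_msg_name(enum rz_msg_type t)
{
    switch (t) {
    case MSG_HELLO:         return "HELLO";
    case MSG_QUIT:          return "QUIT";
    case MSG_NO_QUIT_YET:   return "NO_QUIT_YET";
    case MSG_ABORT:         return "ABORT";
    case MSG_NOTE_DIRNAME:  return "NOTE_DIRNAME";
    case MSG_NOTE_FILENAME: return "NOTE_FILENAME";
    case MSG_DEC_REFCNT:    return "DEC_REFCNT";
    }
    return "unknown";   /* a value read off the wire that isn't in the enum */
}
```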

> typedef struct {
>     char *names[MAX_ID_LIST_LEN];
>     long nums[MAX_ID_LIST_LEN];
>     int count;
> } ID;

Linus has a rule about not using typedefs for structures, because it's
good to be clear about whether something is a structure or some other
type.  I'm inclined to agree, so I would refer to that thing as
"struct rz_id" or something similar.
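The quoted declaration rewritten that way might look like the following.  MAX_ID_LIST_LEN's real value isn't in the excerpt, so the fallback below is an assumption, and `rz_id_add` is a hypothetical helper:

```c
#ifndef MAX_ID_LIST_LEN
#define MAX_ID_LIST_LEN 64   /* hypothetical; the real value isn't shown */
#endif

/* The ID list without the typedef: every use site has to say
 * "struct rz_id", so it is obvious a structure is involved. */
struct rz_id {
    char *names[MAX_ID_LIST_LEN];
    long nums[MAX_ID_LIST_LEN];
    int count;
};

/* Hypothetical helper: append a (name, number) pair.
 * Returns 0 on success, -1 if the list is already full. */
static int rz_id_add(struct rz_id *ids, char *name, long num)
{
    if (ids->count >= MAX_ID_LIST_LEN)
        return -1;
    ids->names[ids->count] = name;
    ids->nums[ids->count] = num;
    ids->count++;
    return 0;
}
```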

Being 64-bit clean probably implies declaring rz_time_t, rz_uid_t and
so on, and using those rather than the native types, whose sizes vary
from platform to platform.
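For example, the types could be pinned to <stdint.h> widths and given an explicit byte order on the wire.  A minimal sketch; the widths chosen and the 8-byte big-endian encoding are assumptions, not anything rZync specifies:

```c
#include <stdint.h>

/* Hypothetical project-wide types with the same width on every
 * platform, instead of whatever time_t and uid_t happen to be. */
typedef int64_t  rz_time_t;
typedef uint32_t rz_uid_t;

/* Hypothetical wire encoding: 8 bytes, big-endian, two's complement. */
static void rz_put_time(unsigned char out[8], rz_time_t t)
{
    uint64_t u = (uint64_t)t;

    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(u & 0xff);
        u >>= 8;
    }
}

static rz_time_t rz_get_time(const unsigned char in[8])
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | in[i];
    if (u <= (uint64_t)INT64_MAX)
        return (rz_time_t)u;
    /* Negative value: decode without relying on implementation-defined
     * unsigned-to-signed conversion. */
    return -(rz_time_t)(~u) - 1;
}
```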

> This also reminds me that I hadn't responded to jw's question about why
> I thought his pipelined approach was more conducive to a batch protocol
> than an interactive protocol.  To make the pipelined protocol as
> efficient as rsync will require the complexity of his backchannel
> implementation, which I think will be harder to get right than a
> single-process message-oriented protocol.  If every stage is a separate
> process, it seems less clear how to implement something like an
> interactive "mkdir" or a "delete" command. (What process handles this?
> How do we signal that process?  Do we need yet another socket path for a
> control stream in some circumstances?)  It also seems to me that the
> extra processes/threads and socket-channels will make a less portable
> interactive app than a single select-using interactive app.

I wasn't clear whether the pipes were meant to be real pipes (with
socketpair() or pipe()), or just conceptual flows of information.  I
agree that the first is interesting but probably not a good idea.

The second one is probably a good way to visualize things, and
something along those lines ought to be in the programmer's
documentation.
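For contrast with a process per stage, the core of a single select()-using process is small.  A minimal sketch, assuming a POSIX system; `rz_poll_once` is a hypothetical name, and parsing and dispatching messages from the buffer is left to the caller:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical single iteration of a select()-based event loop: wait
 * until one of `fds` is readable, read what is there, and return that
 * fd's index so the caller can parse and dispatch complete messages.
 * Returns -1 on timeout and -2 on error.  A real loop would also add
 * descriptors with queued output to a write set. */
static int rz_poll_once(const int *fds, int nfds, long timeout_ms,
                        char *buf, size_t bufsize, ssize_t *nread)
{
    fd_set rset;
    struct timeval tv;
    int maxfd = -1, i, rc;

    FD_ZERO(&rset);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return -1;
    for (i = 0; i < nfds; i++) {
        if (FD_ISSET(fds[i], &rset)) {
            *nread = read(fds[i], buf, bufsize);
            return *nread < 0 ? -2 : i;
        }
    }
    return -2;
}
```

Calling this in a loop and handing each buffer to a message parser gives the "all I/O in one function" structure; an interactive mkdir or delete is then just another message type handled by the same process, with no extra process or socket to signal.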

(By the way, my mail client was temporarily confused and sent mail as
"[EMAIL PROTECTED]", which is obviously wrong; please use samba or
sourcefrog.)

-- 
Martin 

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
