Re: superlifter design notes and a new proposal

2002-08-04 Thread Wayne Davison

On Sun, 4 Aug 2002, Martin Pool wrote:
 My first draft was proposing what you might call a fine-grained rpc
 system, with operations like list this directory, delete this
 file, calculate the checksum of this file.  I think Wayne's rzync
 system was kind of like that too.

Your previous proposal sounded quite a bit more fine-grained than what
rZync is doing.  For instance, it sounded like you would have much more
primitive building-block messages and move much of the controlling
smarts into something like a python-language scripting layer.  While
rZync allows ftp-level control (such as send this file, send this
directory tree, delete this file, create this directory) it does
this with a small number of higher-level command messages.

Rsync, as you know, is a much more modal protocol.  It has a strict set
of steps that must be specified in order and nothing else.  This saves
bytes because so much of the protocol is determined by context, but is
very limiting.

My rZync protocol opens this up by using message numbers for everything
that gets sent, but it still keeps some context-oriented smarts when
transferring files.  There is no micro-management of a file transfer
from start to finish.  The messages cascade from side to side as the
sig, delta, patch sequence of events unfolds.  The most CISC-like message
in rZync is the recursive-directory-send message.  Using this is very
much like starting an entire rsync -r src/ dest transfer sequence via
a single message.

 So the client will send something more or less equivalent to its whole
 command line.

I think that's a good idea.  My rZync app currently operates on each arg
independently, but I recently discovered that this makes it incompatible
with rsync when merging directories and such.  For instance, the command
rsync -r dir1/ dir2/ dir3 merges the file list and removes duplicates
before starting the transfer to dir3.  rZync currently just transfers
the contents of dir1 to dir3 and then transfers the contents of dir2 to
dir3.  Fortunately, this is not going to be hard to fix.
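
As a rough illustration of the merge-and-deduplicate behaviour described
above, here is a minimal C sketch.  It assumes the names gathered from all
source arguments are already relative paths sitting in one array.
rz_name_cmp and rz_merge_names are hypothetical helpers, not code from
rsync or rZync, and the sketch ignores which source should win when two
arguments supply the same name.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int rz_name_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: sort the combined name list in place and drop
 * duplicate names.  Returns the number of unique names, which end up
 * at the front of the array.
 */
size_t rz_merge_names(const char **names, size_t count)
{
    size_t in, out;

    if (count == 0)
        return 0;
    qsort(names, count, sizeof names[0], rz_name_cmp);
    out = 1;
    for (in = 1; in < count; in++) {
        /* Keep a name only if it differs from the last one kept. */
        if (strcmp(names[in], names[out - 1]) != 0)
            names[out++] = names[in];
    }
    return out;
}
```

Calling rz_merge_names() once on the names from dir1/ and dir2/ together
yields a single list to transfer to dir3, rather than two back-to-back
transfers.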

 While staying with that overall approach, we may still be able to make
 some improvements in
 
  - documenting the protocol
 
  - doing one directory at a time
 
  - possibly, doing librsync deltas of directories
 
  - just one process on either end
 
  - getting rid of interleaved streams on top of TCP
 
  - sending errors as distinct packets, including a reference to the
    file that caused them (if any)
 
  - handling ACLs, EAs, and other incidental things
 
  - holding the connection open and doing more operations afterwards

This is very much in keeping with what I've been fiddling with in rZync
(which nearly implements this whole list).  I like the simplicity of one
process per side, which makes it easy to cache data that will be used
later and discard it when it is no longer needed.  I got rid of the
multi-IO idiom of rsync in favor of sending all data via messages and
limiting each chunk to 32K to allow other messages to be mixed into the
middle of a large file's data-stream (such as verbose output).
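
To make the 32K chunking concrete, here is a minimal C sketch of framing
one data message.  The 4-byte header (one message-number byte followed by
a 3-byte big-endian length) is purely an assumption for illustration and
is not rZync's actual header format.  rz_frame_chunk, RZ_MAX_CHUNK and
RZ_HDR_LEN are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define RZ_MAX_CHUNK ((size_t)32 * 1024) /* the 32K limit mentioned above */
#define RZ_HDR_LEN   4                   /* assumed: 1 type byte + 3 length bytes */

/*
 * Frame the next piece of data as one message in out, which must have
 * room for RZ_HDR_LEN + RZ_MAX_CHUNK bytes.  Returns the number of
 * payload bytes consumed, which is never more than RZ_MAX_CHUNK.
 */
size_t rz_frame_chunk(unsigned char *out, unsigned char msg_num,
                      const unsigned char *data, size_t remaining)
{
    size_t n = remaining < RZ_MAX_CHUNK ? remaining : RZ_MAX_CHUNK;

    out[0] = msg_num;
    out[1] = (unsigned char)((n >> 16) & 0xff);
    out[2] = (unsigned char)((n >> 8) & 0xff);
    out[3] = (unsigned char)(n & 0xff);
    memcpy(out + RZ_HDR_LEN, data, n);
    return n;
}
```

A sender would call this in a loop until nothing remains, and between
calls it is free to emit any other pending message (verbose output, for
example) before continuing with the file data.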

I think the basic idea of how rZync envisions a new protocol working is
a good one -- not so much the specifics of the bytes sent in the
message-header format, but how the messages flow, how each side handles
the messages in a single process, how all I/O is handled by a single
function, etc.  There's certainly lots of room for improvement, though.

This also reminds me that I hadn't responded to jw's question about why
I thought his pipelined approach was more conducive to a batch protocol
than an interactive protocol.  To make the pipelined protocol as
efficient as rsync will require the complexity of his backchannel
implementation, which I think will be harder to get right than a
single-process message-oriented protocol.  If every stage is a separate
process, it seems less clear how to implement something like an
interactive mkdir or a delete command. (What process handles this?
How do we signal that process?  Do we need yet another socket path for a
control stream in some circumstances?)  It also seems to me that the
extra processes/threads and socket-channels will make a less portable
interactive app than a single select-using interactive app.

..wayne..


-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Rsync from windows to unix

2002-08-04 Thread jw schultz

On Sat, Aug 03, 2002 at 08:44:00AM -0700, S Peram wrote:
 Hi,
 I'm trying to use rsync from Windows to Unix. 
 I've followed the directions on :
 http://optics.ph.unimelb.edu.au/help/rsync/rsync_pc1.html

Start by reading the manpage.  Look first in the GENERAL
section where it lists the 6 (soon to be 8) ways of using
rsync and then pick one.

 
 I can see ssh running on the windows machine, because
 I tested it using telnet localhost 22 and I can see
 the result SSH-2.0-OpenSSH_3.4p1.
 But when I try 
 $ rsync -avz -e ssh user@windowsserver::/rsync/* .

At this point in time your arguments are in conflict.
Using :: to connect to an rsyncd negates -e ssh.

 from the Linux machine I'm getting the error message
  ERROR: The remote path must start with a module
 name

If you are going to connect to an rsync daemon the path
(after the ::) must start with a module name as defined in
the rsyncd.conf on the windowsserver.

 Even when I try 
 $ rsync -avz -e ssh user@windowsserver:/rsync/* .
 I'm getting the error
 $ rsync -avz -e ssh [EMAIL PROTECTED]:/testrsync/* .
 user@windows server password: i enter domain passwd 
 unexpected EOF in read_timeout

You don't have ssh configured to allow connection without a
password yet.  Read up on ssh for windows and configure the
account to know about your public keys.  I don't use windows
so i don't know the details; on UNIX this means putting the
public keys from .ssh/*.pub into .ssh/authorized_keys on the
server.  Once you have that done you should be able to do
$ rsync -avz -e ssh user@windowsserver:/rsync/* .

 The rsync daemon seems to be running too, since 
 $telnet localhost 873 gave me the result 
 @RSYNCD: 25
 I'd appreciate if any of you gurus can guide me where
 I'm going wrong.
 
 Thanks, 
 Peram

Good luck.

Once rsync supports 8 ways, using -e ssh together with :: will
require both a configured module and the key management, which
will confuse matters further.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt




Re: new rsync release needed soon?

2002-08-04 Thread jw schultz

On Sun, Aug 04, 2002 at 12:50:04AM +1000, Martin Pool wrote:
 On 31 Jul 2002, Dave Dykstra [EMAIL PROTECTED] wrote:
 
  Yes I think a new release is needed soon, but there's more patches than
  that that should get in.  
 
 We need to weigh up getting functions in vs making steps small enough
 that the chance of breakage is acceptable.  
 
 I am afraid that at the moment our only means of getting really good
 cross-platform test coverage for rsync is to throw a release out, and
 so that inclines me towards being conservative in what we put in.
 Hopefully we can try to get people on the list testing -rc releases
 more aggressively.

Was an RC announced recently?  I don't recall seeing it
here.  For what it's worth (my usage isn't very varied) i'm
now running from friday's cvs + patches.

  A bunch of them have been posted and I was hoping you were keeping
  track of them and would be putting more of them in.
 
 I will try to read back through the list and see about merging them
 this week, with a view to a release candidate on about the 11th, and a
 release about a week after that.
 
  The patch that I'd most like to see get in is JD Paul's patch for using SSH
  and daemon mode together.  We still don't have an agreement on what the
  syntax should be.  I think the combination of -e ssh and :: which he
  implemented is the most understandable syntax and we should just go with
  it.
 
 I agree that it would be really good to support it.  
 
 However, -e and :: seem to be a persistent source of confusion for new
 users.  I'm not sure if this change will help those people, or what if
 anything would be better.  (More later on this.)

I concur on the confusion issue.  Never got it wrong myself 
but it took a little puzzling over.  Perhaps if the manpage
changed the terminology a bit so that instead of calling it an
rsync server we called it an rsync daemon it would reduce
the confusion of when :: is needed.  After all with --rsh
you are connecting to a server.

Another thing that may help would be to restructure USAGE
and GENERAL so that GENERAL became more of a TOC of USAGE and
we had USAGE subsections on
    using locally
    using without rsync daemon on server
    using with rsync daemon on server
    using with rsync daemon over ssh transport
    running rsync daemon on server
and fold the examples in.  Reading over it again just now i
get the impression that the manpage has suffered from the
documentation equivalent of code spaghettification.

It might also help if either [user@]host::module or
rsync://[user@]host[:port]/module were deprecated and moved
to an errata section.  I know, this is a whole flame-fest
and i'm sorry but i think it needs to be said that the
extra two possible invocation syntaxes are making support
more difficult than it needs to be.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt




Re: superlifter design notes and a new proposal

2002-08-04 Thread Martin Pool

On  4 Aug 2002, Wayne Davison [EMAIL PROTECTED] wrote:

 Your previous proposal sounded quite a bit more fine-grained than what
 rZync is doing.  For instance, it sounded like you would have much more
 primitive building-block messages and move much of the controlling
 smarts into something like a python-language scripting layer.  While
 rZync allows ftp-level control (such as send this file, send this
 directory tree, delete this file, create this directory) it does
 this with a small number of higher-level command messages.

OK, good.

 I think that's a good idea.  My rZync app currently operates on each arg
 independently, but I recently discovered that this makes it incompatible
 with rsync when merging directories and such.  For instance, the command
 rsync -r dir1/ dir2/ dir3 merges the file list and removes duplicates
 before starting the transfer to dir3.

This is a substantial source of cruft in the current code, and one of
the reasons claimed to make an up-front traversal necessary.

I think a more efficient, and possibly simpler solution, would be to
first examine all of the source directories and determine their
relationships.  Basically, you might discover that dir2 is in fact a
subdirectory of dir1, or the same (or vice versa), in which case you
can eliminate it.  Or you might discover that they're disjoint.  Given
that directories are trees, I don't think there are any other
possibilities.

Doing this in a way that properly respects various symlink options
will be a little complex, but I think it is in principle possible.  It
is also something quite amenable to being thoroughly exercised in
isolation as a unit test.

I am pretty sure that you can do this by just examining dir1 and dir2.
You do need to look at the filesystem to find out about symlinks and
so on, but I think you do not need to traverse their contents.

It is pretty complex, so there might be some case I've missed.
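
A minimal C sketch of the classification described above, under a strong
assumption: both paths have already been made absolute and canonical (for
example with POSIX realpath()), so symlinks, "." and ".." are resolved and
there is no trailing slash except on "/" itself.  The sketch does not
attempt to honour the various symlink options.  The names rz_dir_rel,
rz_is_below and rz_dir_relation are hypothetical.

```c
#include <string.h>

enum rz_dir_rel {
    RZ_DIR_SAME,     /* a and b name the same directory */
    RZ_DIR_A_IN_B,   /* a lies somewhere below b */
    RZ_DIR_B_IN_A,   /* b lies somewhere below a */
    RZ_DIR_DISJOINT  /* neither contains the other */
};

/* Return nonzero if path lies strictly below dir ("/a/b" is below "/a"). */
static int rz_is_below(const char *path, const char *dir)
{
    size_t n = strlen(dir);

    if (strncmp(path, dir, n) != 0)
        return 0;
    if (n > 0 && dir[n - 1] == '/')  /* dir is the root directory */
        return path[n] != '\0';
    /* Require a separator so "/a" does not claim "/ab". */
    return path[n] == '/';
}

/* Classify two canonical absolute paths without traversing them. */
enum rz_dir_rel rz_dir_relation(const char *a, const char *b)
{
    if (strcmp(a, b) == 0)
        return RZ_DIR_SAME;
    if (rz_is_below(a, b))
        return RZ_DIR_A_IN_B;
    if (rz_is_below(b, a))
        return RZ_DIR_B_IN_A;
    return RZ_DIR_DISJOINT;
}
```

Only the two path names are examined here.  The filesystem work, such as
resolving symlinks, is assumed to have happened when the paths were
canonicalized.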

 I got rid of the multi-IO idiom of rsync in favor of sending all
 data via messages and limiting each chunk to 32K to allow other
 messages to be mixed into the middle of a large file's data-stream
 (such as verbose output).

OK, that makes sense.  I guess 32k is as good a number as any.

 I think the basic idea of how rZync envisions a new protocol working is
 a good one -- not so much the specifics of the bytes sent in the
 message-header format, but how the messages flow, how each side handles
 the messages in a single process, how all I/O is handled by a single
 function, etc.  There's certainly lots of room for improvement,
 though.

I've started looking at the code, and it looks very nice.  It's
certainly easier to read than rsync.  Would you mind putting in some
more comments to help me along though?

I had a couple of internal thoughts about how the code for a next
release ought to go.  Please don't take them as criticisms of your
right to write experimental code however you want, or as an attempt to
dictate how we run things.  I just want to raise the issues.

Global names should be distinguished with some kind of prefix, as in
librsync: rz_ or whatever.  If this ever turns into a library that
gets linked into something else it will help; in the meantime it helps
keep clear what is part of the project and what's pulled in from
elsewhere.

I really liked mkproto.awk when I first saw it, but now I'm not so
keen.  I think maintaining header files by hand is in some ways a
good thing, because it forces you to think about whether a particular
function really needs to be exported to the rest of the program, or to the
world at large.

From rzync.h:

 #define MSG_HELLO         1

 #define MSG_QUIT          3
 #define MSG_NO_QUIT_YET   4 // XXX needed??
 #define MSG_ABORT         5

 #define MSG_NOTE_DIRNAME  6
 #define MSG_NOTE_FILENAME 7
 #define MSG_DEC_REFCNT    8

These might work better as an enum, so that gdb can show symbolic
values.
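
For instance, the excerpt could be written along these lines.  This is
only a sketch: it lists just the values visible in the excerpt, and the
enum tag rz_msg and the helper rz_msg_name are hypothetical names, not
part of rzync.h.

```c
#include <stddef.h>

/* Message numbers from the rzync.h excerpt, expressed as an enum. */
enum rz_msg {
    MSG_HELLO         = 1,

    MSG_QUIT          = 3,
    MSG_NO_QUIT_YET   = 4,  /* XXX needed?? */
    MSG_ABORT         = 5,

    MSG_NOTE_DIRNAME  = 6,
    MSG_NOTE_FILENAME = 7,
    MSG_DEC_REFCNT    = 8
};

/* Hypothetical helper for log output; returns NULL for unknown numbers. */
const char *rz_msg_name(enum rz_msg msg)
{
    switch (msg) {
    case MSG_HELLO:         return "MSG_HELLO";
    case MSG_QUIT:          return "MSG_QUIT";
    case MSG_NO_QUIT_YET:   return "MSG_NO_QUIT_YET";
    case MSG_ABORT:         return "MSG_ABORT";
    case MSG_NOTE_DIRNAME:  return "MSG_NOTE_DIRNAME";
    case MSG_NOTE_FILENAME: return "MSG_NOTE_FILENAME";
    case MSG_DEC_REFCNT:    return "MSG_DEC_REFCNT";
    }
    return NULL;
}
```

With a variable declared as enum rz_msg, gdb prints MSG_ABORT rather than
a bare 5.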

 typedef struct {
     char *names[MAX_ID_LIST_LEN];
     long nums[MAX_ID_LIST_LEN];
     int count;
 } ID;

Linus has a rule about not using typedefs for structures, because it's
good to be clear about whether something is a structure or whatever.
I'm inclined to agree.  So I would refer to that thing as struct rz_id
or something.

Being 64-bit clean probably implies declaring rz_time_t, rz_uid_t and
so on, and using that rather than the native types, which will be
pretty random.
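
A rough C sketch of what the two suggestions might look like together.
The value of MAX_ID_LIST_LEN, the widths chosen for rz_time_t and
rz_uid_t, and the helper rz_id_add are all assumptions made for
illustration; none of them is shown in the rzync.h excerpt.

```c
#include <stdint.h>

#define MAX_ID_LIST_LEN 64  /* assumed value; the excerpt does not show it */

/* Fixed-width project types in place of the native time_t, uid_t, ... */
typedef int64_t  rz_time_t;  /* assumed width */
typedef uint32_t rz_uid_t;   /* assumed width */

/* The ID list as a plain tagged struct, with no typedef hiding it. */
struct rz_id {
    char *names[MAX_ID_LIST_LEN];
    int64_t nums[MAX_ID_LIST_LEN];  /* was long; fixed width here */
    int count;
};

/* Hypothetical helper: append an entry.  Returns 0, or -1 if full. */
int rz_id_add(struct rz_id *ids, char *name, int64_t num)
{
    if (ids->count >= MAX_ID_LIST_LEN)
        return -1;
    ids->names[ids->count] = name;
    ids->nums[ids->count] = num;
    ids->count++;
    return 0;
}
```

Declarations then read struct rz_id ids; so it is obvious at the point of
use that the thing is a structure.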

 This also reminds me that I hadn't responded to jw's question about why
 I thought his pipelined approach was more conducive to a batch protocol
 than an interactive protocol.  To make the pipelined protocol as
 efficient as rsync will require the complexity of his backchannel
 implementation, which I think will be harder to get right than a
 single-process message-oriented protocol.  If every stage is a separate
 process, it seems less clear how to implement something like an
 interactive mkdir or a delete 

Re: superlifter design notes and a new proposal

2002-08-04 Thread Martin Pool

I think there was some confusion earlier in the thread about the
redo thing in rsync 2.  It's not for handling files that have
changed during the transfer.  My understanding is that it is
used when the whole-file md4 hash shows that the block checksums
made a mistake (a false match) in transferring the file.

-- 
Martin 




rsync-like ftping

2002-08-04 Thread Zoltan HERPAI

hi,

first of all, sorry about bothering you all on such a nice weekend ;)
i'm looking for a solution to the $subject: ftping up and down only the
files that differ, i.e. those that have changed in size, date, etc. i know
that rsync does much more than this, but that's all i need for now.

is there any out-of-the-box solution for this, or should i sit down and write
some scripts which list all the files on the ftp server, diff that against
the local structure, choose which ones need to be uploaded, then upload
them? (i'm not a very experienced programmer, and uploading half a gig on a
per-day basis is not an option for me). i think this is not rtfm://, but
if it is, please point me to the correct doc ;)

thanks in advance,
-w-
Zoltan HERPAI





rsync-like ftping (addendum)

2002-08-04 Thread Zoltan HERPAI

sorry, just a small addendum on what this is all about :)  i can't run an
rsyncd, since the site is hosted on a serverfarm, i have only ftp, http
and mysql ;)

thanks,
-w-
Zoltan HERPAI

