Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-27 Thread jw schultz

On Fri, Jul 26, 2002 at 09:03:32AM -0400, Bennett Todd wrote:
 2002-07-26-03:37:51 jw schultz:
  All that matters is that we can represent the timestamps in
  a way that allows consistent comparison, restoration and
  transfer.
 
 A very good statement indeed. There are complications, though. Some
 time representations used by computer systems have ambiguities, two
 different times that are represented with the same number, or two
 different representations (created at different times) that actually
 end up representing the same time.
 
  [...] we can pick as an epoch any time in recorded human history.
  I don't feel qualified to impose any epoch myself.  I would be
  inclined to stick with the UNIX epoch for the sake of convenience.
 
 Which Unix epoch? 1970-01-01 00:00:00? 1970-01-01 00:00:10, and
 changing every time they issue a new leap second?

Hey, the whole leap second issue is a matter for the
libraries.  If the library is wrong then the system time
might be off by 10 seconds or so to compensate.  We need not
care.

It doesn't matter as long as on the same platform it is the
same for converting back and forth.  We aren't determining
whether a file on one machine or filesystem is newer or
older than the corresponding file at the other end.  We are
determining whether it is the same or not (modulo precision).  Newer
or older are immaterial unless the system clocks are and
have always been in perfect sync.  Hey, some systems will be
running with a timezone offset of 0 and the clock set to
localtime.

 
  Conversion with any other time representation should be a matter
  of t * scale + offset.
 
 The trick is that offset. Given the different timekeeping systems
 in use, you can't correctly translate from one to another over
 a range of dates extending over years unless you either have a
 leap-second table of your own and convert to an absolute time
 format, or else you choose something like ISO 8601, and use local
 routines on each platform to convert to and from YYYY-MM-DD
 HH:MM:SS in UTC, recognizing that SS can exceed 59 when there are
 leap-seconds, and that sometimes, converting back to a machine's
 internal representation, you may have to fudge for that if the local
 conversion routines don't know about leap seconds.

We only need to deal with YMD... time if that is what the
system uses.  POSIX platforms do not.  We are talking about
conversion and comparison between binary values where
leap-seconds don't matter.
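jw's "t * scale + offset" rule is easy to make concrete. As a quick Python sketch (the protocol has no implementation language yet, so Python here is purely illustrative), converting between Unix seconds and Windows FILETIME, which counts 100-nanosecond intervals from 1601-01-01, is exactly one scale and one offset:

```python
# jw's "t * scale + offset" in practice: Unix seconds <-> Windows FILETIME
# (100-ns intervals since 1601-01-01 UTC).  Illustrative sketch only.

SCALE = 10_000_000                 # 100-ns intervals per second
OFFSET = 11_644_473_600 * SCALE    # 1601-01-01 .. 1970-01-01 in 100-ns units

def unix_to_filetime(t: int) -> int:
    return t * SCALE + OFFSET

def filetime_to_unix(ft: int) -> int:
    # Integer division truncates sub-second precision -- the
    # "modulo precision" caveat from the text.
    return (ft - OFFSET) // SCALE

assert unix_to_filetime(0) == 116_444_736_000_000_000
assert filetime_to_unix(unix_to_filetime(1_027_700_000)) == 1_027_700_000
```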

 
 TAI has the advantage that while various platforms have troubles
 getting to and from it, those have often been solved by other people
 (djb for Unix systems), and once you get to TAI you know where
 you're at:-).

I don't wish to disparage TAI but i've yet to see any
pragmatic reason why we should use it in this context.  If
you are aware of one please tell us.  Forget the
advertising, tell us about the technical details of TAI and why,
for the purposes of file tree synchronization, TAI is preferable
to something more closely related to the most-common native
form.  I have better things to do than spelunk some obscure
library implementing a time format not native to any
platform.  What is the actual format of TAI?  The docs you
point to talk of a structure and two packed formats but
do not define these.  TAI may be wonderful but not suitable
for this purpose.





-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: superlifter design notes (was Re: ...

2002-07-27 Thread John E. Malmberg

 From: jw schultz [EMAIL PROTECTED]
 
 On Fri, Jul 26, 2002 at 09:03:32AM -0400, Bennett Todd wrote:
 
2002-07-26-03:37:51 jw schultz:

All that matters is that we can represent the timestamps in
a way that allows consistent comparison, restoration and
transfer.

A very good statement indeed. There are complications, though. Some
time representations used by computer systems have ambiguities, two
different times that are represented with the same number, or two
different representations (created at different times) that actually
end up representing the same time.

There is potential loss of precision in converting timestamps.

A program serving source files for distribution does not need to be that 
concerned with preserving exact file attributes, but may need to track 
suggested file attributes for the various client platforms.

A program that is replicating for backup purposes must not have any 
loss of data, including any operating-system-specific file attributes.

That is why I posted previously that they should be designed as two 
separate but related programs.


Each application has unique requirements that needlessly complicate an 
application that does both.

-John
[EMAIL PROTECTED]
Personal Opinion Only






Re: superlifter design notes (was Re: ...

2002-07-27 Thread Martin Pool

On 27 Jul 2002, John E. Malmberg [EMAIL PROTECTED] wrote:

 A program serving source files for distribution does not need to be that 
 concerned with preserving exact file attributes, but may need to track 
 suggested file attributes for the various client platforms.
 
 A program that is replicating for backup purposes must not have any 
 loss of data, including any operating-system-specific file attributes.
 
 That is why I posted previously that they should be designed as two 
 separate but related programs.

I'm not sure that the application space for rsync really divides
neatly into two parts like that.  Can you expand a bit more on how
you think they would be used?

-- 
Martin 




Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-27 Thread Martin Pool

I'm inclined to agree with jw that truthfully representing time and
leap seconds is a problem for the operating system, not for us.  We
just need to be able to accurately represent whatever it tells us,
without thinking very much about the meaning.

Somebody previously pointed out that timestamp precision is not a
property of the kernel, but rather of the filesystem on which the
files are stored.  In general there may be no easy way to determine it
ahead of time: you can (if you squint) imagine a network filesystem
with nanosecond resolution that's served by something with rather
less.  I suspect the only way to know may be to set the time and then
read it back.
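That set-and-read-back probe is easy to sketch (Python for illustration only; `os.utime(ns=...)` and `st_mtime_ns` are modern interfaces, not anything available in 2002):

```python
import os
import tempfile

def probe_mtime_granularity(path: str) -> int:
    """Set an mtime of 1 s + 1 ns past the epoch and return what survives."""
    os.utime(path, ns=(1_000_000_001, 1_000_000_001))
    return os.stat(path).st_mtime_ns

with tempfile.NamedTemporaryFile() as f:
    stored = probe_mtime_granularity(f.name)
# A nanosecond filesystem hands back 1_000_000_001; coarser ones
# truncate, e.g. to 1_000_000_000 for one-second granularity.
```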

You can also imagine that in the next few years some platform may
change to a format that accurately represents leap seconds, whether by
TAI or something else.  (I'm not sure if I'd put money on it.)
Presumably that machine's POSIX interface will do a lossy conversion
back to regular Unix time to support old apps.  If we merely used that
information, then when replicating between two such machines, files
whose mtime happened to fall near a leap second would be inaccurate.
That would contradict our goal of preserving precision as much as
possible, even if we can't tell if it is accurate.

Ideally, we would use the native interface so as to be able to get the
machine's full precision, and that would imply something like TAI
internally.

Whether this is worth doing depends on whether you reckon any platform
will actually move to a filesystem that can represent leap seconds.
As jw says, practically all machines have clocks with more than one
second of inaccuracy, so handling leap seconds is not practically
important.  Certainly platforms might handle leap seconds within their
NTP code, but I don't know if they'll expose that to applications.

 What is the actual format of TAI?

64-bit signed seconds-since-1970, plus optionally nanoseconds, plus
optionally attoseconds.  (There's something rather fascinating about
using attoseconds.)
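For reference, djb's external TAI64N label (per the page Bennett linked) packs that as a big-endian 8-byte seconds field, biased by 2^62 so that pre-1970 times stay non-negative, followed by 4 bytes of nanoseconds; a sketch:

```python
import struct

BIAS = 1 << 62   # TAI64 labels place 1970-01-01 TAI at 2**62

def tai64n_pack(secs: int, nanos: int = 0) -> bytes:
    # 8-byte big-endian biased seconds + 4-byte nanoseconds = 12 bytes.
    return struct.pack(">QI", BIAS + secs, nanos)

def tai64n_unpack(label: bytes):
    s, n = struct.unpack(">QI", label)
    return s - BIAS, n

assert len(tai64n_pack(0)) == 12
assert tai64n_unpack(tai64n_pack(-1, 500)) == (-1, 500)   # pre-epoch works too
```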

To be fair, it seems that TAI is an international standard, and djb
just made up libtai, not the whole thing.  (Mind you, from some
standards I've seen, that would be a good reason to walk briskly
away.)

One drawback, which is not really djb's fault, is that if you
inadvertently use a TAI value as a Unix value it will be about 10
seconds off -- almost, but not quite, correct.  I'd hate to have bugs
like that but presumably they can be avoided by using the interface
correctly.

On the other hand, sint32 unix time is clearly running out, and if we
have to use something perhaps it might as well be TAI. 

I would kind of prefer just a single 64-bit quantity measured in (say)
nanoseconds, and compromise on being able to time the end of the
universe, but I don't think I care enough to invent a new standard.
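For what it's worth, the arithmetic on that compromise: a signed 64-bit nanosecond counter from 1970 reaches about 292 years either way, which gives up the end of the universe but comfortably outlives uint32 seconds:

```python
# Range of a signed 64-bit count of nanoseconds centred on 1970.
span_seconds = (2**63 - 1) / 1_000_000_000
span_years = span_seconds / (365.25 * 24 * 3600)
assert round(span_years) == 292   # i.e. roughly 1678..2262 AD
```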

-- 
Martin 




Re: superlifter design notes (was Re: ...

2002-07-27 Thread John E. Malmberg

Martin Pool wrote:
 On 27 Jul 2002, John E. Malmberg [EMAIL PROTECTED] wrote:
 
A program serving source files for distribution does not need to be that 
concerned with preserving exact file attributes, but may need to track 
suggested file attributes for the various client platforms.

A program that is replicating for backup purposes must not have any 
loss of data, including any operating-system-specific file attributes.

That is why I posted previously that they should be designed as two 
separate but related programs.
 
 I'm not sure that the application space for rsync really divides
 neatly into two parts like that.  Can you expand a bit more on how
 you think they would be used?

Well remember, I am on the outside looking in, and of course I could be 
missing things. :-)

I did post this previously, but the message apparently got buried under 
the large number of messages posted that day.


The two uses for rsync that I am seeing discussed on this list are:

Backup:  A low overhead and possibly distance backup of disks or directory.

In the case of a backup, usually it is the same platform, or one that is 
very close to being the same.  Also it is important that security 
information and file attributes all be properly maintained.

The mapping of security information is platform specific, so this is 
going to be an ongoing problem.  It is also critical that timestamps be 
maintained.

Since this is usually the same or closely similar platforms, a VFS layer 
can be used to store and retrieve attributes.  No special attribute 
files or host based translations should be needed.

The downsides are that as far as I can see there are no portable 
standard APIs to retrieve the security information, and as more variants 
are discovered, it may be hard to work them in for backward compatibility.

Because you are distributing an arbitrary set of directories, it is 
usually not permitted to add files to assist in the transfer.


This also seems to be an addition to rsync's original mission.

Also using something like rsync for backup of binary files has the 
potential for undetected corruption.  While the checksumming algorithm 
is good, it is not guaranteed to be perfect.  And no, I do not want to 
recycle the old arguments about this.

With a text file, the set of possible values is restricted enough that 
it is unlikely that the checksum method would fail, and if it did, the 
resulting corruption is more easily detected.


File Distribution:  A low overhead method of keeping local source 
directory trees synchronized with remote distributions.

In this case, strict binary preservation of time stamps is not needed 
and maintaining security attributes is usually not desired.  So that is 
two problems eliminated.

What rsync does not do now, is differentiate between text files and 
binary files.  A client that uses a different internal format for text 
files than binary files needs to do extra work.

And unless the server tells it what type of file is coming, it must 
guess based on the filename.

But you are specifically distributing a special tree of files in this 
case, not an arbitrary directory.  That gives you the ability to add 
special attribute files to assist in the transfer.


So while the two uses have a lot in common, there are significant 
differences, and having one program attempt to do both can lead to 
greater complexity.

-John
[EMAIL PROTECTED]
Personal Opinion Only






Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-25 Thread Bennett Todd

2002-07-21-04:12:55 jw schultz:
 On Thu, Jul 11, 2002 at 07:06:29PM +1000, Martin Pool wrote:
 6. No arbitrary limits: this is related to scalability.
Filesizes and times should be 64-bit; names should be
arbitrarily long. 
 
 File sizes, yes.  Times, no.  unsigned 32 bit integers will
 last us for another 90 years.  I suspect that by the time we
 need 64 bit timestamps the units will be milliseconds.
 I just don't see the need to waste an extra 4 bytes per
 timestamp per file.

If bandwidth is of any interest at all, compress; any compression
algorithm will have no trouble making hay with bulky, redundant
timestamp formats. Rather than trying to optimize the protocol for
bandwidth without compression, wouldn't it be better to try to
optimize to future-proof in the face of changing time
representations across systems?

If I were designing a protocol at this level, I'd be using TAI;
there's 64-bit time with 1 second resolution covering pretty much
all time (more or less, depending on the whimsies of
cosmologists:-); there are also longer variations with finer
resolution. TAI, with appropriately fine resolution, should be able
to represent any time that any other representation can, closer than
anyone could care.

TAI can be converted to other formats with more or less pain,
depending on how demented the other formats are; djb's libtai is a
reasonable starting point.

URL:http://cr.yp.to/time.html has links to some pages discussing
time formats.

In short, though, "time since the epoch" has a complication:
leap-seconds. Either you end up having to move the epoch every time
you bump into a leap-second, thereby redefining all times before
that; or else you have duplicate times, where two different seconds
have the same representation in seconds-since-the-epoch. Well,
there's a third possibility, you could also let the current time
drift further and further from what everybody else is using, but
nobody seems to go for that one.
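The duplicate-times case is easy to demonstrate: a POSIX-style seconds count (here via Python's `calendar.timegm`, which does plain arithmetic with no leap-second table) assigns the leap second 1998-12-31T23:59:60Z and the following 1999-01-01T00:00:00Z the same number:

```python
import calendar

leap = calendar.timegm((1998, 12, 31, 23, 59, 60, 0, 0, 0))
after = calendar.timegm((1999, 1, 1, 0, 0, 0, 0, 0, 0))
# Two distinct real-world seconds, one seconds-since-epoch value.
assert leap == after == 915_148_800
```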

-Bennett





Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-21 Thread Martin Pool

On 21 Jul 2002, jw schultz [EMAIL PROTECTED] wrote:
 From what i can see rsync is very clever.  The biggest
 problems i see are its inability to scale for large trees,
 a little bit of accumulated cruft and featuritis, and
 excessively tight integration.

Yes, I think that's basically the problem.

One question that may (or may not) be worth considering is to what
degree you want to be able to implement new features by changing only
the client.  So with NFS (I'm not proposing we use it, only an
example), you can implement any kind of VM or database or whatever on
the client, and the server doesn't have to care.  The current protocol
is just about the opposite: the two halves have to be quite intimately
involved, so adding rename detection would require not just small
additions but major surgery on the server.

 What i am seeing is a multi-stage pipeline.  Instead of one
 side driving the other with command and response codes, each
 side (client/server) would set up a pipeline containing
 those components that are needed with the appropriate
 plumbing.  Each stage would largely look like a simple
 utility reading from input; doing one thing; writing to
 output, error and log.  The output of each stage is sent to
 the next uni-directionally with no handshake required.

So it's like a Unix pipeline?  (I realize you're proposing pipelines
as a design idea, rather than as an implementation.)

So, we could in fact prototype it using plain Unix pipelines?

That could be interesting.

  Choose some files:
find ~ | lifter-makedirectory > /tmp/local.dir
  Do an rdiff transfer of the remote directory to here:
rdiff sig /tmp/local.dir /tmp/local.dir.sig
scp /tmp/local.dir.sig othermachine:/tmp
ssh othermachine 'find ~ | lifter-makedirectory | rdiff delta /tmp/local.dir.sig -' > /tmp/remote.dir.delta
rdiff patch /tmp/local.dir /tmp/remote.dir.delta /tmp/remote.dir

  For each of those files, do whatever
for file in $(lifter-dirdiff /tmp/local.dir /tmp/remote.dir)
do
  ...
done

Of course the commands I've sketched there don't fix one of the key
problems, which is that of traversing the whole directory up front,
but you could equally well write them as a pipeline that is gradually
consumed as it finds different files.  Imagine

  lifter-find-different-files /home/mbp/ othermachine:/home/mbp/ | \
xargs -n1 lifter-move-file 

(I'm just making up the commands as I go along; don't take them too
seriously.)

That could be very nice indeed.

I am just a little concerned that a complicated use of pipelines in
both directions will make us prone to deadlock.  It's possible to
cause local deadlocks if e.g. you have a child process with both stdin
and stdout connected to its parent by pipes.  It gets potentially more
hairy when all the pipes are run through a single TCP connection.

I don't think that concern rules this design out by any means, but we
need to think about it.
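The local form of that deadlock is worth spelling out. In this Python sketch (illustrative only), a parent that wrote the whole payload before reading anything would block once the child's output filled the ~64 KiB pipe buffer while the child blocked writing it; servicing both pipes at once, as `communicate()` does, avoids that:

```python
import subprocess
import sys

# Child: copy stdin to stdout, like the far end of a pipeline stage.
child = [sys.executable, "-c",
         "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"]

payload = b"x" * (1 << 20)   # far larger than a pipe buffer
p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# p.stdin.write(payload) followed by p.stdout.read() can deadlock here;
# communicate() interleaves reads and writes instead.
out, _ = p.communicate(payload)
assert out == payload
```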

One of the design criteria I'd like to add is that it should
preferably be obvious by inspection that deadlocks are not possible.

   timestamps should be represented as seconds from
   Epoch (SuS) as unsigned 32 int.  It will be 90 years
   before we exceed this by which time the protocol
   will be extended to use uint64 for milliseconds.

I think we should go to milliseconds straight away: if I remember
correctly, NTFS already stores files with sub-second precision, and
some Linux filesystems are going the same way.  A second is a long
time in modern computing!  (For example, it's possible for a command
started by Make to complete in less than a second, and therefore
apparently not change a timestamp.)  

I think there will be increasing pressure for sub-second precision in
much less than 90 years, and it would be sensible for us to support it
from the beginning.  The Java file APIs, for example, already work in
nanoseconds(?).

Transmitting the precision of the file sounds good.

   I think by default users and groups should only be handled
   numerically.

I think by default we should use names, because that will be least
surprising to most people.  I agree we need to support both.

Names are not universally unique, and need to be qualified by a NIS
domain, an NT domain, or some other means.  I want to be able to say:

  map MAPOOL2@ASIAPAC - [EMAIL PROTECTED] - [EMAIL PROTECTED]

when transferring across machines.

We probably cannot assume UIDs are any particular length; on NT they
correspond to SIDs (?) which are 128-bit(?) things, typically
represented by strings like

 S1-212-123-2323-232323

So on the whole I think I would suggest following NFSv4 and just using
strings, with the interpretation of them up to the implementation,
possibly with guidance from the admin.

   When textual names are used a special chunk in the
 datastream would specify a node+ID -> name
   equivalency immediately before the first use of that
   number.

It seems like in general there is a need to have 

Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-21 Thread jw schultz

On Mon, Jul 22, 2002 at 02:00:21PM +1000, Martin Pool wrote:
 On 21 Jul 2002, jw schultz [EMAIL PROTECTED] wrote:
  From what i can see rsync is very clever.  The biggest
  problems i see are its inability to scale for large trees,
  a little bit of accumulated cruft and featuritis, and
  excessively tight integration.
 
 Yes, I think that's basically the problem.
 
 One question that may (or may not) be worth considering is to what
 degree you want to be able to implement new features by changing only
 the client.  So with NFS (I'm not proposing we use it, only an
 example), you can implement any kind of VM or database or whatever on
 the client, and the server doesn't have to care.  The current protocol
 is just about the opposite: the two halves have to be quite intimately
 involved, so adding rename detection would require not just small
 additions but major surgery on the server.
 
  What i am seeing is a multi-stage pipeline.  Instead of one
  side driving the other with command and response codes, each
  side (client/server) would set up a pipeline containing
  those components that are needed with the appropriate
  plumbing.  Each stage would largely look like a simple
  utility reading from input; doing one thing; writing to
  output, error and log.  The output of each stage is sent to
  the next uni-directionally with no handshake required.
 
 So it's like a Unix pipeline?  (I realize you're proposing pipelines
 as a design idea, rather than as an implementation.)

I'm kinda, sorta proposing both.  What i'm looking at is to
keep each stage as simple as possible without sharing
data structures with other stages.  And that it should be
possible to break/intercept the pipeline at any point.

 
 So, we could in fact prototype it using plain Unix pipelines?

For local-to-local yes.

 
 That could be interesting.
 
   Choose some files:
 find ~ | lifter-makedirectory > /tmp/local.dir
   Do an rdiff transfer of the remote directory to here:
 rdiff sig /tmp/local.dir /tmp/local.dir.sig
 scp /tmp/local.dir.sig othermachine:/tmp
 ssh othermachine 'find ~ | lifter-makedirectory | rdiff delta /tmp/local.dir.sig -' > /tmp/remote.dir.delta
 rdiff patch /tmp/local.dir /tmp/remote.dir.delta /tmp/remote.dir
 
   For each of those files, do whatever
 for file in $(lifter-dirdiff /tmp/local.dir /tmp/remote.dir)
 do
   ...
 done
 
 Of course the commands I've sketched there don't fix one of the key
 problems, which is that of traversing the whole directory up front,
 but you could equally well write them as a pipeline that is gradually
 consumed as it finds different files.  Imagine
 
   lifter-find-different-files /home/mbp/ othermachine:/home/mbp/ | \
 xargs -n1 lifter-move-file 
 
 (I'm just making up the commands as I go along; don't take them too
 seriously.)
 
 That could be very nice indeed.

I'm not seriously suggesting that each stage be a separate
utility but there would be times when being able to treat
them as such would be advantageous.

 
 I am just a little concerned that a complicated use of pipelines in
 both directions will make us prone to deadlock.  It's possible to
 cause local deadlocks if e.g. you have a child process with both stdin
 and stdout connected to its parent by pipes.  It gets potentially more
 hairy when all the pipes are run through a single TCP connection.

Where in+out are connected to the same parent (multiplexing
TCP) that parent would have to use poll or select.  In the
ssh case it might be possible to use the port forwarding
features of ssh or borrow the code from there.  We should
plagiarise where sensible.

One key advantage of the looser coupling and of stages is that
they are immune to changes in the plumbing.

 
 I don't think that concern rules this design out by any means, but we
 need to think about it.

Absolutely! 

 
 One of the design criteria I'd like to add is that it should
 preferably be obvious by inspection that deadlocks are not possible.
 
  timestamps should be represented as seconds from
  Epoch (SuS) as unsigned 32 int.  It will be 90 years
  before we exceed this by which time the protocol
  will be extended to use uint64 for milliseconds.
 
 I think we should go to milliseconds straight away: if I remember
 correctly, NTFS already stores files with sub-second precision, and
 some Linux filesystems are going the same way.  A second is a long
 time in modern computing!  (For example, it's possible for a command
 started by Make to complete in less than a second, and therefore
 apparently not change a timestamp.)  
 
 I think there will be increasing pressure for sub-second precision in
 much less than 90 years, and it would be sensible for us to support it
 from the beginning.  The Java file APIs, for example, already work in
 nanoseconds(?).
 
 Transmitting the precision of the file sounds good.
 
  I think by default users and groups should only be handled
  numerically.
 
 I think by default 

Re: superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-21 Thread Martin Pool

People have proposed network-endianness, ascii fields, etc.  

Here's a straw-man proposal on handling this for people to criticize,
ignite, feed to horses, etc.  I don't have any specific numbers to
back it up, so take it with a grain of salt.  Experiments would be
pretty straightforward.

Swabbing to/from network endianness is very cheap.  On 486s and higher
it is a single inlined instruction and I think takes about one cycle.
On non-x86 it is free.  The cost is barely worth considering: if you
are flipping words as fast as you can you will almost certainly be
limited by memory bandwidth, not by the work of swapping them.
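To make the comparison concrete, here is what swabbing amounts to at the protocol level (Python's `struct` standing in for `htonl`/`ntohl`): a fixed-width big-endian field is one pack and one unpack, with no per-byte logic:

```python
import struct

def to_net(n: int) -> bytes:      # cf. htonl()
    return struct.pack(">I", n)

def from_net(b: bytes) -> int:    # cf. ntohl()
    return struct.unpack(">I", b)[0]

assert to_net(0x12345678) == b"\x12\x34\x56\x78"
assert from_net(to_net(0xDEADBEEF)) == 0xDEADBEEF
```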

BER-style variable-length fields, on the other hand, are
comparatively expensive to parse, because you need to look at the top
bit, mask it, shift, and continue for every byte.
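For contrast, the BER-style loop being criticized looks like this (a generic base-128 varint sketch, not any particular library's code):

```python
def varint_encode(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def varint_decode(data: bytes) -> int:
    n = shift = 0
    for byte in data:            # the per-byte test/mask/shift work
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return n

assert varint_encode(1) == b"\x01"
assert varint_decode(varint_encode(300)) == 300
assert len(varint_encode(1 << 62)) == 9   # wide values cost many iterations
```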

If you're going to use a protocol that difficult, I think you might as
well use ASCII hex or decimal numbers.  

All other things being equal having a readable protocol is good. A
little redundancy in the protocol can help make it readable and also
help detect errors.  For example, distcc's 4-char commands make it
easy for humans to visually parse a packet, and they make errors in
transmission almost always immediately cause an error.  At the same
time they're cheap to process -- it's just a uint32 compare.
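That dual readable/cheap property is easy to see: a 4-character ASCII command is simultaneously a string in a packet dump and a single 32-bit integer to the parser. (The token "DIST" below is just an illustration, not distcc's actual wire command set.)

```python
import struct

def token(cmd: bytes) -> int:
    # Reinterpret 4 ASCII bytes as one big-endian uint32.
    return struct.unpack(">I", cmd)[0]

packet = b"DIST00000001"
assert token(packet[:4]) == token(b"DIST")       # one integer compare
assert packet[:4].decode("ascii") == "DIST"      # still readable by eye
```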

Arguably we should use x86-endianness because it's the most common
architecture at the moment, but I don't think the performance
justifies using something non-standard.  Anyhow, I would hope that if
it gets off the ground, this protocol might still be in use in ten
years, in which time x86 may no longer be dominant.  Bigendian also
has the minor advantage that it's easier to read in packet dumps.

Negotiated protocols are a bad idea because they needlessly multiply
the test domain.  Samba has to deal with Microsoft protocols which are
in theory negotiated-endian, but in practice of course Microsoft never
test anything but Intel, so BE support is broken and people writing
non-x86 servers need to negotiate Intel endianness.  Even assuming
we're smarter than they are, I don't think we need to make our lives
difficult in this way.

Lempel-Ziv is ideal for the exact case of compressing
0x0001 into a couple of bits.  Even a very cheap
compressor such as lzo (about half the speed of memcpy) will do well
on that kind of case; presumably numbers like uint64 0, 1, 2, etc will
occur often in packet headers and get tightly compressed.  I think it
will probably deal with filenames for us too.

So, as a straw man:

 - use XDR-like network-endian 32 and 64 bit fields 

 - keep all fields 4-byte aligned

 - make strings int32 length-preceded, and padded to a 4-byte boundary 

 - don't worry about interning or compressing filenames, just send
   them as plain UTF-8 relative to a working directory

 - send things like usernames as strings too

 - make operation names (or whatever) be human-readable, either
   variable-length strings or 4-byte tokens that happen to be readable
   as ascii
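A quick sketch of the string rule from that straw man (Python for illustration): int32 length prefix, UTF-8 bytes, NUL-padded to a 4-byte boundary, all network-endian:

```python
import struct

def pack_string(s: str) -> bytes:
    data = s.encode("utf-8")
    pad = (-len(data)) % 4                    # round up to a 4-byte boundary
    return struct.pack(">i", len(data)) + data + b"\x00" * pad

def unpack_string(buf: bytes):
    """Return (string, bytes consumed including padding)."""
    (n,) = struct.unpack_from(">i", buf)
    return buf[4:4 + n].decode("utf-8"), 4 + n + (-n) % 4

assert pack_string("mbp") == b"\x00\x00\x00\x03mbp\x00"
assert unpack_string(pack_string("mbp")) == ("mbp", 8)
```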

-- 
Martin 




superlifter design notes (was Re: Latest rZync release: 0.06)

2002-07-11 Thread Martin Pool

I've put a cleaned-up version of my design notes up here

  http://samba.org/~mbp/superlifter/design-notes.html

It's very early days, but (gentle :-) feedback would be welcome.  It
has some comments on Wayne's rzync design, which on the whole looks
pretty clever.

I don't have any worthwhile code specifically towards this yet, but I
have been experimenting with the protocol ideas in distcc

  http://distcc.samba.org/

I like the way it has worked out there: the protocol is simple and
easy to understand, the bugs more or less found themselves, and it
feels like I'm using TCP in a natural way -- all of these much more so
than rsync at the moment.  (Of course, the rsync problem is much more
complicated.)

-- 
Martin 
