Hi,

I've been given the task of implementing the rsync algorithm 
(http://rsync.samba.org/tech_report/)
for OpenAFS (modifying the "vos release" behaviour).
I've made an initial design - see attachment.
We would like to get the implementation into CVS.
Any comments appreciated.

Peter
Using the rsync algorithm in OpenAFS
====================================

Purpose: modify the "vos release" synchronization mechanism to support the 
rsync algorithm, comparing files with a 'rolling checksum' and sending only the 
differences.
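
As an illustration of the 'rolling checksum' idea, here is a minimal Python sketch of the weak checksum described in the rsync technical report. The modulus M = 2^16 follows that report; the function names are made up for this example and are not OpenAFS code.

```python
# Weak rolling checksum in the style of the rsync technical report.
# Function names are illustrative only.
M = 1 << 16

def weak_checksum(block):
    """Compute the (a, b) pair for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the window.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def combine(a, b):
    """Pack the pair into one 32-bit value."""
    return a | (b << 16)
```

The point of roll() is that the checksum at every byte offset of a file can be obtained without re-reading the whole window each time.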

In a few words:
- add a new option to volserver "-rsync"
- implement rolling checksum file compare
- implement a new rsync-like protocol
- combine this with on-the-wire compression (if the -z [...] option was given)

More details:
- modify the function AFSVolForward[Multi] in volserver to support the 
rsync-like protocol
- add the file comparison algorithm ("rsync algorithm")
- add an rsync-like protocol, designed for efficient data transfer and CPU 
utilization
- important: we _don't_ assume that all RO sites have the same volume data
- a configuration file is needed (there will be a couple of settings that 
seriously affect performance)
- add/modify the RPC 'GetCapabilities'
- there should be a debug option for volserver that makes it log 
modifications to VolserLog locally

Protocol design:
- notations:
        'A' means the volserver instance, for which AFSVolForward[Multi] has 
been invoked (source of up-to-date volume data)
        'Bi' [i=1..n] means the other volservers which are the destinations 
(where volume data needs to be updated)

- RPC must be used, with the functions rx_Read and rx_Write. A "channel" here 
means an RPC call in which one endpoint calls rx_Write and the other endpoint 
calls rx_Read. Exactly one RPC call corresponds to each channel. We assume 
that such a channel is uni-directional.

There would be 3 channels:
C1) [A->Bi] 'A' sends file block checksums to 'Bi' [i=1..n]
C2) [Bi->A] Bi sends a list of difference query records to A [i=1..n]
C3) [A->Bi] 'A' sends file difference data chunks to Bi [i=1..n]

For each type of these channels a special data format must be used.
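
To make the idea of per-channel data formats concrete, here is a hypothetical Python sketch of one record layout per channel. None of these layouts is specified in this design; the field choices (vnode number, block index, 32-bit weak checksum, 16-byte strong digest, 64-bit offset) are assumptions for illustration only.

```python
import struct

# Hypothetical record layouts, network byte order, no padding.
C1_REC = struct.Struct("!III16s")  # vnode, block index, weak checksum, strong digest
C2_REC = struct.Struct("!IQI")     # vnode, offset in the new file, length wanted
C3_HDR = struct.Struct("!IQI")     # vnode, offset in the new file, payload length

def pack_c1(vnode, index, weak, strong):
    """Block checksum record sent from 'A' to 'Bi'."""
    return C1_REC.pack(vnode, index, weak, strong)

def pack_c2(vnode, offset, length):
    """Difference query record sent from 'Bi' to 'A'."""
    return C2_REC.pack(vnode, offset, length)

def pack_c3(vnode, offset, data):
    """Difference data chunk sent from 'A' to 'Bi': header plus payload."""
    return C3_HDR.pack(vnode, offset, len(data)) + data

def unpack_c3(buf):
    """Split a C3 record back into (vnode, offset, data)."""
    vnode, offset, length = C3_HDR.unpack_from(buf)
    data = buf[C3_HDR.size:C3_HDR.size + length]
    return vnode, offset, data
```

Fixed-size headers like these would let the reading side issue one rx_Read for the header and a second one for the payload, but that is a design choice, not a requirement of the protocol.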

Each channel must stay alive until all files of the volume have been 
synchronized.

Creation of the channels:
- C1) would reuse the already existing rx calls
- for C2) and C3), 2 new calls should be created on the volserver interface 
(for example: "QueryDiff" and "SendDiffData"), or 2 wrappings
(wrapping means, for example, calling AFSVolRestore with "magic" parameters 
instead of creating new RPCs)

Global algorithm:
'A'-side:
-> AFSVolForward[Multi]:
        - create C1)
        - create C3)
        - enumerate vnodes:
                - send block checksums of a vnode through C1); this may take 
multiple sends
        - wait for the end of C3) and destroy it
        - wait for the end of C1) and destroy it
        - END

-> "StartAFSVolQueryDiff" C2):
        - while [rx_Read] "Vnode-diff request list":
                - read Vnode content parts locally on the disk
                - send diff data through C3)
        - END

'Bi'-side:
-> AFSVolRestore C1):
        - if got a special dump version with magic, continue in rsync-like mode 
(otherwise parse dump normally)
        - while [rx_Read] "Vnode block checksums":
                - generate difference request list (reading the local Vnode 
content, comparing to the input block checksums) using the rsync algorithm
                - send difference request list through C2)
        - END
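
The "generate difference request list" step on the 'Bi' side could look roughly like the Python sketch below. It is a simplified model working on in-memory byte strings, not volserver code: all names are hypothetical, MD5 stands in for the strong checksum (the rsync report uses MD4), and a short trailing block is always requested because the scan only uses full-size windows.

```python
import hashlib

M = 1 << 16

def weak(block):
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a | (b << 16)

def strong(block):
    return hashlib.md5(block).digest()  # stand-in for the strong checksum

def block_checksums(data, block_size):
    """'A' side: one (weak, strong) pair per fixed-size block."""
    return [(weak(data[i:i + block_size]), strong(data[i:i + block_size]))
            for i in range(0, len(data), block_size)]

def diff_requests(local, checksums, block_size, total_len):
    """'Bi' side: find which of A's blocks already exist locally.

    Returns (found, missing): found maps block index -> local offset,
    missing is a list of (offset, length) ranges to request from 'A'.
    """
    table = {}
    for idx, (w, s) in enumerate(checksums):
        table.setdefault(w, []).append((idx, s))
    found = {}
    n = block_size
    if len(local) >= n:
        a = sum(local[:n]) % M
        b = sum((n - i) * x for i, x in enumerate(local[:n])) % M
        pos = 0
        while True:
            # Cheap weak match first, strong checksum only on candidates.
            for idx, s in table.get(a | (b << 16), ()):
                if idx not in found and strong(local[pos:pos + n]) == s:
                    found[idx] = pos
            if pos + n >= len(local):
                break
            out_b, in_b = local[pos], local[pos + n]
            a = (a - out_b + in_b) % M
            b = (b - n * out_b + a) % M
            pos += 1
    missing = [(i * n, min(n, total_len - i * n))
               for i in range(len(checksums)) if i not in found]
    return found, missing
```

Note that the roles are reversed compared to classic rsync: here the side with the up-to-date data sends the checksums, and the side with the stale copy does the rolling scan.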

-> "StartAFSVolSendDiffData" C3):
        - create C2)
        - while [rx_Read] "Vnode-diff data":
                - modify Vnode locally on the disk
        - wait for the end of C2) and destroy it
        - END
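
The "modify Vnode locally" step can be pictured with the following Python sketch. It assumes 'Bi' remembers which blocks it found locally (block index to local offset) and collects the chunks received on C3) keyed by their offset in the new file; both structures and the function name are hypothetical. The sketch builds a fresh copy instead of patching in place, since overwriting in place could destroy local blocks that are still needed as copy sources.

```python
def rebuild_vnode(local, found, received, block_size, total_len):
    """Assemble the new vnode content on the 'Bi' side.

    local     -- current (stale) content as bytes
    found     -- dict: block index -> offset of that block in local
    received  -- dict: offset in the new file -> bytes received from 'A'
    """
    out = bytearray()
    for offset in range(0, total_len, block_size):
        idx = offset // block_size
        length = min(block_size, total_len - offset)
        if idx in found:
            # Block already present locally: copy it, no network transfer.
            start = found[idx]
            out += local[start:start + length]
        else:
            chunk = received[offset]
            if len(chunk) != length:
                raise ValueError("unexpected chunk length at offset %d" % offset)
            out += chunk
    return bytes(out)
```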


