Exactly what I propose. Keep a list of files and their sizes, so that when
somebody asks for a range, you can skip files until you reach the range
they've requested.
Don't worry about new files, files that changed after they were downloaded,
or deleted files. You're not getting a "current" copy of the files; you're
getting a copy of the files that were available when you started your
download, minus the deleted files, which by policy we shouldn't be handing
out anyway.
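
A rough sketch of the skip-ahead idea, in Python. It assumes a plain ustar
layout (one 512-byte header per file, data padded to a 512-byte boundary),
ignores long-name extension headers and the end-of-archive blocks, and uses
made-up names (build_offsets, files_for_range) and made-up file sizes. It is
an illustration of the bookkeeping, not a worked-out server:

```python
from bisect import bisect_right
from itertools import accumulate

BLOCK = 512  # tar block size in bytes


def entry_size(file_size):
    """Bytes one file occupies in the virtual tarball: header plus padded data.

    Assumption: plain ustar entries, no extension headers for long names.
    """
    padded = -(-file_size // BLOCK) * BLOCK  # round up to a whole block
    return BLOCK + padded


def build_offsets(manifest):
    """manifest is a list of (name, size) frozen at the start of the download.

    Returns the start offset of every entry, plus the total length as the
    final element.
    """
    return [0] + list(accumulate(entry_size(size) for _, size in manifest))


def files_for_range(manifest, offsets, start, end):
    """Return (name, entry_start) for each entry overlapping bytes start..end.

    Every file before the first match is skipped without being opened.
    """
    first = bisect_right(offsets, start) - 1
    hits = []
    for i in range(first, len(manifest)):
        if offsets[i] > end:
            break
        hits.append((manifest[i][0], offsets[i]))
    return hits


# Hypothetical manifest, sizes in bytes.
manifest = [("a.jpg", 100), ("b.png", 1024), ("c.svg", 513)]
offsets = build_offsets(manifest)
wanted = files_for_range(manifest, offsets, 1500, 2600)
```

The binary search is the whole point: with the sizes known up front, a
request for a range deep into the archive never has to touch the files that
come before it.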

rsync doesn't have the MW database to consult for changes.

________________________________________
From: [email protected] 
[[email protected]] on behalf of Brion Vibber 
[[email protected]]
Sent: Monday, August 15, 2011 6:31 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] forking media files

On Mon, Aug 15, 2011 at 3:14 PM, Russell N. Nelson - rnnelson <
[email protected]> wrote:

> 
> download protocol or stream format. That's why I suggest tarball and range.
> Standards ... they're not just for breakfast.
>

Range on a tarball assumes that you have a static tarball file -- or else a
predictable, unchanging snapshot of its contents that can be used to
simulate one:

1) every filename in the data set, in order
2) every file's exact size and version
3) every other bit of file metadata that might go into constructing that
tarball

or else actually generating and storing a giant tarball, and then keeping it
around long enough for all clients to download the whole thing -- obviously
not very attractive.

Since every tiny change (*any* new file, *any* changed file, *any* deleted
file) would alter the generated tarball and shift terabytes of data around,
this doesn't seem like it would be a big win for anything other than initial
downloads of the full data set (or else batching up specifically-requested
files).
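
To make the shifting concrete, here is a small Python sketch. It assumes the
same simplified ustar layout (512-byte header, data padded to 512 bytes) and
uses hypothetical file names and sizes; shifted_files is an illustrative
helper, not part of any existing tool:

```python
from itertools import accumulate

BLOCK = 512  # tar block size in bytes


def entry_size(file_size):
    """Header block plus data rounded up to a whole block (plain ustar assumed)."""
    return BLOCK + -(-file_size // BLOCK) * BLOCK


def offsets_by_name(manifest):
    """Map each file name to the byte offset where its tar entry would start."""
    starts = [0] + list(accumulate(entry_size(size) for _, size in manifest))
    return {name: start for (name, _), start in zip(manifest, starts)}


def shifted_files(old_manifest, new_manifest):
    """Names present in both manifests whose entry offset differs between them."""
    old = offsets_by_name(old_manifest)
    new = offsets_by_name(new_manifest)
    return sorted(n for n in old if n in new and old[n] != new[n])


# Hypothetical data: one new file "b" appears between "a" and "c".
old = [("a", 100), ("c", 100), ("d", 100)]
new = [("a", 100), ("b", 100), ("c", 100), ("d", 100)]
moved = shifted_files(old, new)
```

A single insertion moves every entry that sorts after it, so byte ranges
computed against the old listing no longer line up with the new one.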


Anything that involves updating your mirror/copy/fork/backup needs to work
in a more live fashion, that only needs to transfer new data for things that
have changed. rsync can check for differences but still needs to go over the
full file list (and so still takes a Long Time and lots of bandwidth just to
do that).

-- brion
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
