On 6/13/07 10:19 PM, "Alen Peacock" <[EMAIL PROTECTED]> wrote:

> [...]
> What does this have to do with FEC vs. replication?  It has *everything* to
> do with FEC vs. replication.  In the remote backup scenario, the biggest
> bottleneck for most users is still the network.  Taking several days to
> backup a few GB of data will always make the user feel better than taking
> weeks.  And that is what FEC gives you over replication (as zooko already
> effectively pointed out).


Actually, I do not think that statement is correct.  If we compare an FEC
backup that encodes to 4x the original data against 4x replication among
peers, the simple "push from one node" strategy does make the two equivalent
operations.  But replication has one advantage over FEC that should not be
dismissed, given the upstream/downstream bandwidth asymmetry that most users
currently face.

Take a simple scenario where one user on a DSL/cable line is participating
in a network with 8 other peers (also on DSL/cable connections).  We want to
compare sending an FEC-encoded chunk of data (4x data expansion) spread among
the 8 peers against sending a non-encoded chunk of data that will be
replicated among 4 of the 8 nodes.  The FEC publication will take 4x as long
as uploading the original data once, while the replication system needs only
1x time to get the bits off the local system.  While the replication
publisher is pushing different chunks of the file to its 4 peers, those peers
can themselves be sharing what they receive with the other nodes; BitTorrent
is a simple example of such a strategy that most people are familiar with.
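
To put rough numbers on the publisher's side of this, here is a minimal
sketch of the arithmetic.  The data size and upstream rate are made-up
values for illustration, and the model assumes the publisher's upstream link
is the only bottleneck.

```python
def publisher_upload_time(data_bytes, upstream_bytes_per_s, expansion):
    """Seconds the publisher spends pushing `expansion` x the data upstream."""
    return data_bytes * expansion / upstream_bytes_per_s


# Hypothetical values, not taken from any real deployment.
DATA = 1_000_000_000   # 1 GB of source data
UPSTREAM = 48_000      # roughly 384 kbit/s upstream, in bytes per second

# FEC: the publisher itself uploads all 4x of the encoded data.
fec_time = publisher_upload_time(DATA, UPSTREAM, expansion=4)

# Replication with peer-to-peer sync: the publisher uploads each chunk once
# and lets the receiving peers copy it among themselves.
replication_time = publisher_upload_time(DATA, UPSTREAM, expansion=1)

ratio = fec_time / replication_time
print(f"FEC: {fec_time / 3600:.1f} h, replication: {replication_time / 3600:.1f} h")
print(f"publisher is tied up {ratio:.0f}x longer with FEC")
```

This only measures how long the publisher has to stay connected; it says
nothing about how long the peers then need to finish syncing among
themselves.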

If each of the nodes in this example has the same bandwidth asymmetry, then
when the data publisher finishes pushing the last source bits, the
replicating peers will need about 2x more time to finish synchronizing their
blocks.  The replication publisher could disconnect after sending 1x, and the
background sync process would eventually reach 4x replication.  (To continue
with the BitTorrent metaphor: when the seeder disappears, the leechers can
still share bits and reach a complete sync, even if no one has a complete
copy when the seeder disconnects.)

In a sufficiently complicated FEC system the peers could perform this
background sync operation by downloading the bits necessary to reconstruct a
missing error correction block and then resending it out to the peers, but
the additional download steps this requires eat into efficiency.  With a
rateless codec you could avoid the extra complicated steps to perform this
background sync, but you would trade simplicity in the sync step for
additional complexity in the "find the blocks we need to reconstruct this
file" step.

The actual advantage FEC brings is in the data retrieval step, since you can
take advantage of a larger number of upstream connections to get your bits
back.  In the simple scenario I outlined previously it would be possible to
get a 2x speedup in the download process by taking advantage of the upstream
bandwidth of all 8 nodes instead of only being able to use 4 upstream pipes
in the replicated system.  The minor disadvantage here is that publication
is something peers in a backup system do all the time, while data retrieval
is a relatively infrequent operation.
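
The retrieval side can be sketched the same way.  This is a rough model with
hypothetical numbers: it assumes every peer has the same upstream rate, that
the downloader's downstream link is never the bottleneck, and that useful
bits can be fetched from every serving peer in parallel.

```python
def retrieval_time(data_bytes, peer_upstream_bytes_per_s, serving_peers):
    """Seconds to pull the data back when `serving_peers` upload in parallel."""
    return data_bytes / (peer_upstream_bytes_per_s * serving_peers)


# Hypothetical values, not taken from any real deployment.
DATA = 1_000_000_000    # 1 GB to restore
PEER_UPSTREAM = 48_000  # each peer's upstream, in bytes per second

# FEC: shares live on all 8 peers, so all 8 upstream pipes can be used.
fec_restore = retrieval_time(DATA, PEER_UPSTREAM, serving_peers=8)

# Replication: only the 4 peers holding copies can serve the data.
replica_restore = retrieval_time(DATA, PEER_UPSTREAM, serving_peers=4)

speedup = replica_restore / fec_restore
print(f"FEC restore is {speedup:.0f}x faster than restoring from replicas")
```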


Jim


_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers
