This is similar to a very attractive approach for proxy web caching called
"Value-Based Web Caching" by S. C. Rhea, K. Liang and E. Brewer.
Here, instead of caching Web objects by their file names, they are
cached as blocks indexed by their MD5 fingerprints.
The technique used to avoid the alignment problem is explained in
section 2.2 of the paper. Briefly, block boundaries are chosen based on
the value of the data rather than fixed offsets. For example, block
boundaries can be decided by delimiters: say the pattern "xxxx" is
chosen as the delimiter.
So a file with the data:
abcdefxxxxljkjhdslaxxxxjvfdfjasdsaxxxxaasfdsaxxxx
is cached as the blocks:
abcdefxxxx
ljkjhdslaxxxx
jvfdfjasdsaxxxx
aasfdsaxxxx
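A minimal sketch of this delimiter-based chunking in Python (the
delimiter and data are just the toy values from the example above, and
the function name is my own):

    import hashlib

    DELIM = b"xxxx"  # toy delimiter from the example above

    def chunk_by_delimiter(data):
        # Split data into blocks, each ending right after the delimiter.
        blocks, start = [], 0
        while True:
            i = data.find(DELIM, start)
            if i == -1:
                if start < len(data):
                    blocks.append(data[start:])  # trailing partial block
                return blocks
            blocks.append(data[start:i + len(DELIM)])
            start = i + len(DELIM)

    data = b"abcdefxxxxljkjhdslaxxxxjvfdfjasdsaxxxxaasfdsaxxxx"
    for block in chunk_by_delimiter(data):
        print(block, hashlib.md5(block).hexdigest()[:8])

Each block's MD5 fingerprint is what the cache is indexed by, so any two
files that share a block also share the cache entry for it.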
Now if any block is modified, there are two cases:
case (A): the value defining the block boundary does not change:
eg: abcdefxxxx => abcdefABCxxxx
Then only that block needs to be refetched.
case (B): the value defining the block boundary does change
eg: abcdefxxxx => abcdefxxABCxx
Then the file is cached as:
abcdefxxABCxxljkjhdslaxxxx
jvfdfjasdsaxxxx
aasfdsaxxxx
Only the modified block and the adjacent block it absorbed are affected;
the fingerprints of the remaining blocks are unchanged.
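Reusing the chunk_by_delimiter() sketch above, both cases can be checked
directly by seeing which blocks have fingerprints not already in the
cache (the modified strings are the ones from the examples):

    def fps(data):
        return {hashlib.md5(b).hexdigest() for b in chunk_by_delimiter(data)}

    old    = b"abcdefxxxxljkjhdslaxxxxjvfdfjasdsaxxxxaasfdsaxxxx"
    case_a = b"abcdefABCxxxxljkjhdslaxxxxjvfdfjasdsaxxxxaasfdsaxxxx"
    case_b = b"abcdefxxABCxxljkjhdslaxxxxjvfdfjasdsaxxxxaasfdsaxxxx"

    old_fps = fps(old)
    for name, new in [("case A", case_a), ("case B", case_b)]:
        changed = [b for b in chunk_by_delimiter(new)
                   if hashlib.md5(b).hexdigest() not in old_fps]
        print(name, changed)  # the only blocks that must be refetched

This prints one new block per case: the modified block for case (A), and
the merged block for case (B); everything else is served from the cache.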
A key issue is choosing a good block size: too small, and there is a lot
of caching/protocol overhead; too big, and there is very little reuse.
They use Rabin functions to choose block boundaries, which gives a
statistical guarantee that the average block size stays around 2 KB.
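A rough sketch of that idea, with a simple polynomial rolling hash
standing in for the Rabin functions of the paper (WINDOW, BASE, MOD and
the boundary test are my own illustrative choices; real implementations
also impose minimum/maximum block sizes, omitted here):

    WINDOW = 48              # bytes hashed at each position (assumption)
    TARGET = 2048            # gives an expected block size of ~2 KB
    BASE, MOD = 257, (1 << 61) - 1

    def rolling_chunks(data):
        # Cut a block wherever the hash of the last WINDOW bytes hits a
        # fixed value; boundaries move with the content, so an insertion
        # only disturbs the blocks around it.
        blocks, start, h = [], 0, 0
        bw = pow(BASE, WINDOW, MOD)
        for i in range(len(data)):
            h = (h * BASE + data[i]) % MOD
            if i >= WINDOW:
                h = (h - data[i - WINDOW] * bw) % MOD  # slide the window
            if h % TARGET == TARGET - 1:  # ~1 in 2048 chance per byte
                blocks.append(data[start:i + 1])
                start = i + 1
        if start < len(data):
            blocks.append(data[start:])  # final block
        return blocks

Since each byte position becomes a boundary with probability roughly
1/TARGET, block sizes are geometrically distributed with a mean of about
2 KB, regardless of where the data sits in the file.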
On 4/12/07, David Barrett <[EMAIL PROTECTED]> wrote:
I'm not sure this would work in the real world.
Even a movie with different translations would have non-matching 16KB
chunks
because audio and video frames are interleaved. Unless you have separate
audio/video channels, I don't see how this would ever work.
Furthermore, different rips of the same CD track are likely not identical
unless the codec happens to be exactly the same. (Is this true? I'm not
sure how much homogeneity there is between MP3 encoders.)
As for software packages, the contained files are generally compressed,
throwing off all similarity. And even if not compressed, unless they are
perfectly byte aligned (a 1 in 16,384 chance for 16KB chunks), they won't
match up.
So I just don't see how this would work in the real world. Am I
misunderstanding it?
-david
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:p2p-hackers-
> [EMAIL PROTECTED] On Behalf Of Fabrício Barros Cabral
> Sent: Wednesday, April 11, 2007 1:46 PM
> To: [EMAIL PROTECTED]
> Subject: [p2p-hackers] Computer scientists develop P2P system that
> promises faster music, movie downloads
>
> A Carnegie Mellon University computer scientist says transferring large
> data files, such as movies and music, over the Internet could be sped up
> significantly if peer-to-peer (P2P) file-sharing services were
> configured to share not only identical files, but also similar files.
>
> http://www.physorg.com/news95436100.html
>
> []'s
>
> --fx
>
--
Regards
~Ravi
_______________________________________________
p2p-hackers mailing list
[EMAIL PROTECTED]
http://lists.zooko.com/mailman/listinfo/p2p-hackers