> For example, suppose data normally has to go A->B->C->D->E to get from
> A to E.  Freenet caching, to me, will only make sense if data goes from
> A->B->FREENET NODE->D->E without having to take any extra hops in the
> process, or at least only having to take a few extra hops.

Why would A even be *trying* to get the data from E?  Would it find the data
at E before it found it at a Freenet node near C, if the search algorithm
were working reasonably well and C had an opportunity to cache it?  If you
think carefully about the answers to these questions, I think it will help
you see where I and others are coming from.
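
To make the question concrete, here is a minimal Python sketch. It is a toy
model, not Freenet's actual routing: the `fetch` function, the fixed hop
lists, and the extra requester "X" are all assumptions made up for
illustration. A request walks its route until some node holds the data, and
the relay nodes on that route keep a copy as the reply passes back.

```python
def fetch(path, stores, key):
    """Walk `path` from the requester toward the origin.

    Returns (hops, holder) for the first node that has `key`, or None.
    Hypothetical model: every relay strictly between the requester and
    the holder caches the data on the way back.
    """
    for hops, node in enumerate(path):
        if key in stores[node]:
            for relay in path[1:hops]:
                stores[relay].add(key)
            return hops, node
    return None


# Only E holds the data at first; all other stores start empty.
stores = {name: set() for name in ["A", "B", "C", "D", "E", "X"]}
stores["E"].add("doc")

# A's request has to go all the way to E, and B, C, D cache the reply.
first = fetch(["A", "B", "C", "D", "E"], stores, "doc")

# A later requester whose route passes through C never reaches E.
second = fetch(["X", "C", "D", "E"], stores, "doc")

print(first, second)
```

In this model the second request stops at C after one hop, which is the
situation the questions above are pointing at: once C has had a chance to
cache the data, nobody routed through C needs to go to E for it.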

> Well, for one thing it will cause not-so-popular data to fall off the
> network faster, which is one of the key things I would like to avoid.  I
> want DistribNet to serve as a solution for storing long-term data, not data
> which just happens to be popular at the moment.

Good for you.  Anyone here who recognizes my name at all probably knows it
from my criticisms of Freenet on this very point (check my website).
However, wanting to ensure at least one "permanent" copy of the data does
not preclude caching it promiscuously elsewhere.  They're really orthogonal
issues.
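
One way to picture the orthogonality is a node store with two independent
parts. The sketch below is hypothetical (the `NodeStore` class and its
method names are invented for illustration and describe neither Freenet nor
DistribNet): pinned entries are never evicted, while everything else lives
in a bounded LRU cache that can be as promiscuous as you like.

```python
from collections import OrderedDict


class NodeStore:
    """Hypothetical per-node store: permanent copies plus an LRU cache."""

    def __init__(self, cache_size):
        self.pinned = {}            # "permanent" copies, never evicted
        self.cache = OrderedDict()  # opportunistic copies, oldest first
        self.cache_size = cache_size

    def pin(self, key, value):
        self.pinned[key] = value

    def cache_put(self, key, value):
        if key in self.pinned:
            return  # already held permanently, nothing to cache
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # drop least recently used

    def get(self, key):
        if key in self.pinned:
            return self.pinned[key]
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        return None


store = NodeStore(cache_size=2)
store.pin("rare", b"long-term data")
for name in ("popular1", "popular2", "popular3"):
    store.cache_put(name, b"hot data")
```

After the loop, "popular1" has been pushed out of the cache by newer
entries, but "rare" is still retrievable. Cache pressure from popular data
never touches the pinned copy, so the two policies can be chosen separately.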

> > What you should be saying is "the more nodes that *have* the data...".
> > Why limit it to nodes that initiated new requests for the data?
>
> Because it gives more freedom in how data is requested.  The main problem
> with always having to route data through other nodes is that it is
> incompatible with finding not-so-popular data sitting only on a few distant
> nodes.  I don't quite know how else to explain this.

That's really more to do with the search algorithm than the caching.  No
matter how you do caching, you'll need precise search to accomplish this
goal, and once you have precise search you can cache however you want.
Yeah, I know I just said the same thing twice in one sentence.  ;-)

> > I think you'll find that the algorithms by which S "notices" these
> > request patterns either require huge amounts of memory and computation
> > or don't work very well...or both.  Ditto for the algorithms for deciding
> > where to "upload" the data (a.k.a. replica placement).  As I said, a lot
> > of smart people have worked on these problems, but it's remarkably
> > difficult to come up with anything better than sheer opportunistic
> > caching like Freenet does.
>
> Could you please give me some references?  I want to avoid mistakes others
> have made, but I want to know *why* their methods fail and how.  Only then
> can I have any hope of creating anything better.

A quick search on CiteSeer for relevant terms like "replica placement" and
"distributed storage" should turn up bunches of papers.  A couple of the
OceanStore folks (oceanstore.cs.berkeley.edu) have done some particularly
interesting recent work on replica placement, and biblio-backtracking should
lead to other good stuff.  I don't have any particularly good references for
access-pattern detection/prediction (still somewhat of a black art and
closely-held secret for not-so-open companies) off the top of my head, but
maybe when I get in to work tomorrow I'll be able to dig something up.  Feel
free to ping me via email if I forget.


_______________________________________________
freenet-tech mailing list
http://lists.freenetproject.org/mailman/listinfo/tech
