On Aug 12, 2008, at 12:22 PM, Tiago Vignatti wrote:
> Jim McCoy wrote:
>> [...]
>> In an ideal world the full peer would return a failure for the write
>> attempt and then clients would treat that peer (for writes) as if it
>> were a down/unavailable host. You then just re-run the storage
>> placement code to get the new "best" node for storing the datum.
>
> So if the majority of peers are full it means we are really in
> trouble?
Probably :)
As you get closer and closer to being "full" you will start to see
more and more strange effects that can be hard to isolate if you do
not think about asymmetric storage and full nodes when you start coding.
As your peers become more and more full you end up storing the data
further and further away from the optimal location. Provided that you
have some facility for migrating data back to nodes that are closer to
the optimal storage location (which you should have anyway, if only to
deal with new nodes appearing) then everything should eventually work
out. There is a tricky issue you will want to think about regarding
expiring data that is somewhat related to this: a lot of systems will
expire data locally on a node based on some combination of last-access
for a locally stored blob and how far this blob is from the ideal
storage range of the local node (this is usually a feature of systems
which perform pass-along optimistic caching in addition to straight-up
DHT storage); in these situations you will need some small hook to
enable the blob migrator to get to the temporarily stored blob before
the reaper arrives.
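One way to wire up the hook described above, sketched in Python: before the local reaper expires a blob, it first asks whether the migrator still has that blob queued for a move back toward its optimal node. All of the names here (should_expire, the distance weighting, the idle threshold) are illustrative assumptions, not from any particular DHT implementation.

```python
def should_expire(last_access: float, distance_from_ideal: float,
                  pending_migration: bool, now: float,
                  max_idle: float = 3600.0) -> bool:
    """Expire stale, far-from-ideal blobs, but never one awaiting migration."""
    if pending_migration:
        return False  # let the blob migrator get there before the reaper
    idle = now - last_access
    # The further a blob sits outside this node's ideal storage range,
    # the sooner it becomes a candidate for local expiry.
    return idle * (1.0 + distance_from_ideal) > max_idle
```

The point of the hedge is the single pending_migration check: everything else is the usual last-access/distance scoring, but the migrator gets a veto.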
Another situation where this may come up is the case where a really,
really big blob is pushed into the system: a blob so large that it
displaces a lot of smaller blobs and either takes up a large chunk of
the available space on its optimally located node or else exceeds the
available space on that node, so that it lives on nodes just outside
this optimal location waiting in vain for sufficient space to clear.
This particular corner case is why you should always spec out a
maximum blob size within your DHT and have some facility in the write
preparation request to tell the node just how much data the publisher
plans on dumping on it.
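A minimal sketch of that write-preparation check, assuming a made-up system-wide maximum (MAX_BLOB_BYTES) and function name: the publisher declares the blob size up front, and the node refuses before any data moves if the blob is over the spec'd maximum or won't fit locally.

```python
MAX_BLOB_BYTES = 16 * 1024 * 1024  # spec'd maximum blob size for the DHT

def prepare_write(declared_size: int, free_bytes: int) -> bool:
    """Return True if this node should accept the incoming blob."""
    if declared_size > MAX_BLOB_BYTES:
        return False  # oversized blobs are refused everywhere, by design
    if declared_size > free_bytes:
        return False  # full node: the publisher retries the next candidate
    return True
```

Rejecting on declared size rather than on actual bytes received is the whole trick: the giant-blob corner case gets caught before it evicts anything.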
A good way to think about this problem during the design phase is to
consider three situations and figure out how your system would deal
with them:
- what happens if none of the nodes have room for the incoming data
blob?
- what happens if one node (the optimal node for this blob in normal
operation) is so full that it can't accept the incoming data blob but
all of the other nodes have room?
- what happens if every node except for one (which happens to be the
last one that would normally get selected for this blob by your node
selection logic) is so full that it can't accept the incoming data blob?
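The three scenarios above can be walked through with a toy placement loop, hypothetical names and a Kademlia-style XOR distance assumed throughout: visit candidate nodes from closest to furthest, skip any node without room, and report outright failure only when every node is full.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    node_id: int
    free_bytes: int

def closest_nodes(nodes: List[Node], key: int) -> List[Node]:
    # Toy distance metric: XOR of node id and blob key.
    return sorted(nodes, key=lambda n: n.node_id ^ key)

def place_blob(nodes: List[Node], key: int, blob_size: int) -> Optional[Node]:
    """Return the closest node with room, or None if no node has room."""
    for node in closest_nodes(nodes, key):
        if node.free_bytes >= blob_size:
            return node  # scenarios 2 and 3: full nodes are simply skipped
    return None          # scenario 1: nowhere to put the blob at all

# The optimal node (id 0) is nearly full, so the blob lands one hop away.
nodes = [Node(0b000, free_bytes=10),
         Node(0b001, free_bytes=500),
         Node(0b010, free_bytes=500)]
target = place_blob(nodes, key=0b000, blob_size=100)
```

Scenario 3 falls out of the same loop (every node but the last is skipped), which is the argument for making "full" just another kind of node unavailability rather than a special case.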
> BTW, I'm failing to see some p2p simulator that deals with the
> capacity
> of disk in each peer. Anyone already seen such one? It's kinda strange
> that no one bothers with disk capacity in DHTs...
No one bothers because simulators are generally based on the same
fantasy-land in which these academic systems operate :) Every node is
identical, bandwidth is symmetric, nodes always have lots of disk
space, i/o bandwidth is never a factor, etc.
jim
_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers