On Aug 12, 2008, at 12:22 PM, Tiago Vignatti wrote:

> Jim McCoy wrote:
>> [...]
>> In an ideal world the full peer would return a failure for the write
>> attempt and then clients would treat that peer (for writes) as if it
>> were a down/unavailable host.  You then just re-run the storage
>> placement code to get the new "best" node for storing the datum.
>
> So if the majority of peers are full, it means we are really in trouble?

Probably :)

As you get closer and closer to being "full", you will start to see more and more strange effects that can be hard to isolate if you do not think about asymmetric storage and full nodes when you start coding.
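
For concreteness, here is a minimal sketch of the retry-on-full placement described in the quoted text: treat a full peer exactly like a down host for writes and re-run placement. The DHT client API here (find_closest, store, StoreFullError) is assumed for illustration, not taken from any real library:

    # Sketch only: the dht object and its methods are hypothetical.
    class StoreFullError(Exception):
        """Raised by a peer that has no room for the incoming blob."""

    def store_with_fallback(dht, key, blob, max_attempts=8):
        """Walk successively less-optimal peers until one accepts the write."""
        full_peers = set()
        for _ in range(max_attempts):
            peer = dht.find_closest(key, exclude=full_peers)
            if peer is None:
                raise RuntimeError("no peer has room for this blob")
            try:
                peer.store(key, blob)
                return peer
            except StoreFullError:
                # Treat the full peer exactly as if it were down (for
                # writes) and re-run placement for the next-best node.
                full_peers.add(peer.id)
        raise RuntimeError("gave up after %d full peers" % max_attempts)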

As your peers become more and more full, you end up storing the data further and further away from the optimal location.  Provided that you have some facility for migrating data back to nodes that are closer to the optimal storage location (which you should have anyway, if only to deal with new nodes appearing), everything should eventually work out.  There is a tricky, somewhat related issue you will want to think about regarding expiring data: a lot of systems will expire data locally on a node based on some combination of the last-access time of a locally stored blob and how far that blob is from the ideal storage range of the local node (this is usually a feature of systems which perform pass-along optimistic caching in addition to straight-up DHT storage).  In these situations you will need some small hook to enable the blob migrator to get to the temporarily stored blob before the reaper arrives.
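
Here is one minimal sketch of that hook, under the assumption that the node keeps last-access and distance-from-ideal metadata per blob and uses integer XOR distance, Kademlia-style.  Every name (store, blob, pending_migration) is illustrative rather than from any real codebase:

    import time

    def expiry_score(blob, node_id, now=None):
        """Higher score = better eviction candidate: old AND far from home."""
        now = now or time.time()
        age = now - blob.last_access
        distance = blob.key ^ node_id   # assumed integer keys, XOR metric
        return age * (1 + distance.bit_length())

    def reap(store, node_id, bytes_needed):
        """Evict worst-scoring blobs, but never one awaiting migration."""
        candidates = sorted(store.blobs(),
                            key=lambda b: expiry_score(b, node_id),
                            reverse=True)
        freed = 0
        for blob in candidates:
            if blob.pending_migration:
                # The hook: let the migrator push this blob back toward
                # its optimal node before the reaper may delete it.
                continue
            freed += store.delete(blob.key)
            if freed >= bytes_needed:
                break
        return freed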

Another situation where this may come up is the case where a really, really big blob is pushed into the system: a blob so large that it displaces a lot of smaller blobs and either takes up a large chunk of the available space on its optimally located node, or else exceeds the available space on that node entirely and so lives on nodes just outside the optimal location, waiting in vain for sufficient space to clear.  This particular corner case is why you should always spec out a maximum blob size within your DHT and have some facility in the write preparation request to tell the node just how much data the publisher plans on dumping on it.
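
A sketch of what that write-preparation check might look like; MAX_BLOB_SIZE, free_bytes(), and reserve() are all invented names, and the 16 MiB cap is just an example value:

    MAX_BLOB_SIZE = 16 * 1024 * 1024   # hypothetical per-DHT spec: 16 MiB

    def handle_write_prepare(node, key, announced_size):
        """Decide up front whether this node will accept the blob."""
        if announced_size > MAX_BLOB_SIZE:
            return ("reject", "blob exceeds the DHT-wide maximum size")
        if announced_size > node.free_bytes():
            # Refuse rather than displace a pile of smaller blobs; the
            # publisher re-runs placement as if this node were down.
            return ("reject", "insufficient space on this node")
        node.reserve(key, announced_size)   # hold the space until the write
        return ("accept", None)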

A good way to think about this problem during the design phase is to consider three situations and figure out how your system would deal with them (a test sketch follows the list):
        - what happens if none of the nodes have room for the incoming data blob?
        - what happens if one node (the optimal node for this blob in normal operation) is so full that it can't accept the incoming data blob, but all of the other nodes have room?
        - what happens if every node except for one (which happens to be the last one that would normally get selected for this blob by your node selection logic) is so full that it can't accept the incoming data blob?
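
Reusing the store_with_fallback and StoreFullError sketches from earlier in this message, a throwaway in-memory harness can exercise all three situations (FakePeer and FakeDHT are invented for illustration):

    class FakePeer:
        def __init__(self, pid, capacity):
            self.id, self.free = pid, capacity
        def store(self, key, blob):
            if len(blob) > self.free:
                raise StoreFullError()
            self.free -= len(blob)

    class FakeDHT:
        def __init__(self, peers):
            self.peers = peers
        def find_closest(self, key, exclude):
            # Rank by XOR distance; skip peers already known to be full.
            live = [p for p in self.peers if p.id not in exclude]
            return min(live, key=lambda p: p.id ^ key, default=None)

    blob = b"x" * 100
    # 1. no node has room -> expect the write to fail outright
    # 2. only the optimal node is full -> expect the next-closest node
    # 3. every node but the last-selected one is full -> expect that
    #    last node, after the loop walks the whole candidate list
    for caps in ([50] * 4, [50, 200, 200, 200], [50, 50, 50, 200]):
        dht = FakeDHT([FakePeer(i, c) for i, c in enumerate(caps)])
        try:
            peer = store_with_fallback(dht, key=0, blob=blob)
            print("stored on peer", peer.id)
        except RuntimeError as e:
            print("write failed:", e)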


> BTW, I'm failing to find a p2p simulator that models the disk
> capacity of each peer. Has anyone seen one? It's kinda strange
> that no one bothers with disk capacity in DHTs...

No one bothers because simulators are generally based on the same fantasy-land in which these academic systems operate :)  Every node is identical, bandwidth is symmetric, nodes always have lots of disk space, I/O bandwidth is never a factor, etc.
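
If you did want to simulate this, the core change is small: draw each simulated node's disk and bandwidth from skewed distributions instead of assuming identical peers.  A sketch of what such a node population might look like (all the distributions and numbers are made up):

    import random

    def make_nodes(n, seed=1):
        """Simulated peers with skewed disk and asymmetric bandwidth."""
        rng = random.Random(seed)
        nodes = []
        for i in range(n):
            nodes.append({
                "id": i,
                # Pareto-ish skew: a few peers with lots of disk,
                # many with very little.
                "disk_bytes": int(rng.paretovariate(1.5) * 10 * 1024**3),
                # Asymmetric links: uplink much slower than downlink.
                "up_bps": rng.choice([128_000, 1_000_000, 10_000_000]),
                "down_bps": rng.choice([1_000_000, 16_000_000, 100_000_000]),
            })
        return nodes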

jim
