I'm not really into Freenet yet, but I'd like to ask for your comments on 
this idea of mine:

If I understand things correctly:

Currently, if you can read the data stored on a node, you can tell 
whether it contains data you know the hash of. So node owners may get in 
trouble, because they can be proven to possess certain data. The 
possession itself might be illegal or lead to oppression, even if the 
owner cannot really be proven to have knowledge of the data or to have 
requested it.

Anyway, it would be much better if a stolen data store did not reveal 
its content.

My idea of how this could possibly be done is this:

Whenever some new data has to be inserted into the network, another 
chunk already in the network (in the node's cache) is first chosen at 
random, and the new data is XORed with that old chunk, producing a new 
chunk. This new chunk is stored in the network. The information published 
to retrieve the data is a pair of hashes referring to the old and the new 
chunk. A client needs to retrieve both chunks and XOR them to rebuild the 
data (so requiring up to double the time and bandwidth, but not double 
the storage).
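
Just to make the mechanics concrete, here is a rough Python sketch of 
what I mean (this is not how Freenet works today; the dict stands in for 
the datastore, SHA-256 for the chunk hash, all names are made up, and 
chunks are assumed to be padded to one fixed size):

import hashlib
import secrets

store = {}   # hash -> chunk; stands in for the distributed datastore

def put(chunk):
    """Store a chunk under its SHA-256 hash and return the hash."""
    h = hashlib.sha256(chunk).hexdigest()
    store[h] = chunk
    return h

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def insert(data):
    """Insert new data: XOR it with an existing chunk, store the result,
    and return the pair of hashes that has to be published."""
    if store:
        old_hash = secrets.choice(list(store))   # random partner from the cache
        old = store[old_hash]
    else:
        old = secrets.token_bytes(len(data))     # bootstrap with a random chunk
        old_hash = put(old)
    new_hash = put(xor(data, old))
    return old_hash, new_hash

def retrieve(old_hash, new_hash):
    """Fetch both chunks and XOR them to rebuild the original data."""
    return xor(store[old_hash], store[new_hash])

# usage: the published reference is just the pair of hashes
ref = insert(b"some chunk-sized piece of data..")
assert retrieve(*ref) == b"some chunk-sized piece of data.."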

What is the point? The point is that possession of a chunk in the 
datastore is not tied to just the single file the bad guys might know 
about; it might be related to many other pieces of data as well, which 
the bad guys don't know about or don't have a problem with.

You see: CHUNK1 and CHUNK2 might combine to a problematic piece of data. 
But CHUNK1 and CHUNK3 combine to some other data, and so do CHUNK2 and 
CHUNK4. So possession of CHUNK1 and CHUNK2 does not prove that the 
problematic data has been produced, or that the owner knows anything 
about the fact that they could be combined. CHUNK1 and CHUNK2 might 
have been stored to serve some other data, even if the other related 
chunks can no longer be found in the cache because they were displaced. 
All chunks are just random data; they can be XORed to generate just 
about anything. The "poison", the difference between good and bad, is 
only in the references, not in the data itself.
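
A toy example in the same Python style (all chunk contents made up) to 
show that for any target data, any chunk already in the cache can serve 
as a partner, because the matching second chunk can always be 
constructed:

import secrets

CHUNK_SIZE = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# two chunks that happen to sit in a node's cache
chunk1 = secrets.token_bytes(CHUNK_SIZE)
chunk2 = secrets.token_bytes(CHUNK_SIZE)

# the "problematic" data is just whatever their XOR happens to be ...
problematic = xor(chunk1, chunk2)

# ... but the very same chunks also serve completely different data,
# because a matching partner chunk can be built for any target.
other_data = b"harmless text 01"
chunk3 = xor(chunk1, other_data)    # chunk1 XOR chunk3 == other_data
more_data = b"harmless text 02"
chunk4 = xor(chunk2, more_data)     # chunk2 XOR chunk4 == more_data

assert xor(chunk1, chunk3) == other_data
assert xor(chunk2, chunk4) == more_data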

I think that when existing chunks are chosen as partners for new data 
from the inserting node's cache, chunks will cluster together in 
different contexts. If two chunks are "near" each other because they 
were needed together to build one set of data, they will likely be used 
together again to build new sets of data. So later it is quite likely 
that CHUNK1 and CHUNK2 will be together in the cache for multiple sets 
of data.

Is this understandable at all? What do you think?

Sincerely
  D. Buczek

-- 
Donald Buczek
buczek at molgen.mpg.de
Tel: +49 30 8413 1433
