As a "distributed atomspace" is a commonly recurring request, I thought I should put together yet another distributed atomspace variant. This one is super-simple: easy to use, with a tiny implementation (500 lines of code).
Easy-to-use: Take a look at the two examples, here: https://github.com/opencog/atomspace-cog/tree/master/examples

These examples use exactly the same API as the 3-4 other distributed atomspace backends. Some comments about those.

* The Postgres SQL backend. It's part of the core AtomSpace code, provided by default. Large, complex, bullet-proof, production-ready. Kind-of slow: there's a lot of overhead.

* The IPFS backend. Sounds great, right? I tried a very naive implementation, and it's surprisingly terrible. Turns out IPFS is actually "centralized", not "decentralized": to get anything done, one must build an index, and that index must fit into just one file. Whoops. So clearly, my naive design is the wrong way to go. The code "works" (passes unit tests) but is disappointing. https://github.com/opencog/atomspace-ipfs

* The OpenDHT backend. Taking the lessons learned above, I ported the same naive implementation to OpenDHT. DHT stands for "Distributed Hash Table", in this case Kademlia, the same one used in BitTorrent, Ethereum, GNUnet and many others. Much better, but it revealed a different flaw in my naive thinking. Two problems, harder to explain. Problem #1: an Atom sitting in OpenDHT is just taking up RAM, thus competing for RAM with any local AtomSpace. Problem #2: the hash used by the DHT completely randomizes Atoms. So even if they are close to one another, e.g. (List (Concept "a") (Concept "b")), these three atoms (the two concepts and the list) will end up on different servers on opposite sides of the planet. The DHT hashing algorithm has no clue about the locality-of-reference that we want for the atomspace. Again, this code "works" (passes unit tests) but is disappointing. https://github.com/opencog/atomspace-dht

* So what's the right design? Well, it seems that the best bet would be to use OpenDHT to store AtomSpace indexes, but do the actual serving of atoms by "seeders".
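To make Problem #2 above a bit more concrete, here is a small Python sketch. It is not the atomspace-dht code: it simply assumes that each atom is keyed by the SHA-1 hash of its s-expression string and that nodes are picked by Kademlia's XOR distance. The helper names (`dht_key`, `closest_node`, `placement`) and the 16 made-up node IDs are purely illustrative.

```python
import hashlib

def dht_key(sexpr):
    """160-bit key for an atom. ASSUMPTION: SHA-1 of the s-expression text."""
    return int.from_bytes(hashlib.sha1(sexpr.encode("utf-8")).digest(), "big")

def xor_distance(a, b):
    """Kademlia's distance metric: bitwise XOR of two keys, read as an integer."""
    return a ^ b

def closest_node(key, node_ids):
    """Index of the node whose ID is XOR-closest to the given key."""
    return min(range(len(node_ids)),
               key=lambda i: xor_distance(key, node_ids[i]))

# Hypothetical network: 16 nodes with pseudo-random 160-bit IDs.
NODES = [dht_key(f"node-{i}") for i in range(16)]

# The three atoms from the example: two Concepts and the List holding them.
ATOMS = [
    '(Concept "a")',
    '(Concept "b")',
    '(List (Concept "a") (Concept "b"))',
]

def placement(atoms, nodes):
    """Map each atom to the index of the node that would be closest to it."""
    return {atom: closest_node(dht_key(atom), nodes) for atom in atoms}

if __name__ == "__main__":
    for atom, node in placement(ATOMS, NODES).items():
        print(f"{dht_key(atom):040x}  node {node:2d}  {atom}")
```

The keys share no structure even though the List contains both Concepts, so nothing in the placement rule keeps the three atoms together. Whether they happen to share a node in this toy network is down to chance, which is exactly the lack of locality described above.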
And so this is why I wrote this super-simple cogserver-based distributed atomspace. The hope is to use it as a "seeder": https://github.com/opencog/atomspace-cog/

Future plans: I'm hoping that someone interested can build a high-performance server/seeder, based on the prototype here. (Really -- 500 LOC is very simple, very easy to understand, and thus easy to improve upon.) This does NOT require any special skills: if you have basic coding skills, maybe some experience with network I/O, or are willing to explore, it should be possible to build a high-performance variation thereof. So all those people saying "I'm just an ordinary coder, how can I help?" well -- here's your chance.

A more difficult, more conceptual task would be how to wire up a bunch of these servers using the OpenDHT/Kademlia infrastructure. I think this is possible, but it's more cerebral, and requires thinking-work.

--linas
