As a "distributed atomspace" is a commonly recurring request, I thought I
should put together yet another distributed atomspace variant. This one is
very simple: easy to use, with a tiny implementation (500 lines of code).

Easy-to-use: Take a look at the two examples, here:
https://github.com/opencog/atomspace-cog/tree/master/examples
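
The examples boil down to a handful of store/fetch operations. Here's a
hypothetical Python sketch of that shape -- the real examples are Scheme
calls like store-atom, fetch-atom and load-atomspace; the class name and
in-memory dict below are purely illustrative stand-ins, not the actual
implementation:

```python
# Illustrative in-memory stand-in for a remote AtomSpace storage backend.
# The method names mirror the Scheme API (store-atom, fetch-atom,
# load-atomspace); everything else here is made up for illustration.

class StorageNode:
    def __init__(self):
        self._remote = {}          # pretend this dict lives on another machine

    def store_atom(self, atom):
        # Push one atom (and its values) out to the remote store.
        self._remote[atom] = True

    def fetch_atom(self, atom):
        # Pull one atom back; None if the remote has never seen it.
        return atom if atom in self._remote else None

    def load_atomspace(self):
        # Bulk-load everything the remote store holds.
        return list(self._remote)

store = StorageNode()
store.store_atom('(Concept "a")')
print(store.fetch_atom('(Concept "a")'))   # round-trips the stored atom
print(store.fetch_atom('(Concept "b")'))   # None: never stored
```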

These examples use exactly the same API as the 3-4 other distributed
atomspace backends. Some comments about those.

* The Postgres SQL backend. It's part of the core AtomSpace code, provided
by default. Large, complex, bullet-proof, production-ready. Kind-of slow:
there's a lot of overhead.

* The IPFS backend. Sounds great, right? I tried a very, very naive
implementation, and it's surprisingly terrible. It turns out IPFS is
actually "centralized", not "decentralized": to get anything done, one must
build an index, and that index must fit into just one file. Whoops. So
clearly, my naive design is the wrong way to go. The code "works" (passes
unit tests) but is disappointing. https://github.com/opencog/atomspace-ipfs

* The OpenDHT backend. Taking the lessons learned above, I ported the same
naive implementation to OpenDHT. DHT stands for "Distributed Hash Table" --
in this case Kademlia, the same one used in bittorrent and ethereum and
gnunet and many others. Much better, but it revealed a different flaw in my
naive thinking. Two problems, harder to explain. Problem #1: an Atom
sitting in OpenDHT is just taking up RAM, thus competing for RAM with any
local AtomSpace. Problem #2: the hash used by DHTs completely randomizes
Atom placement. So even if atoms are close to one another, e.g. (List
(Concept "a") (Concept "b")) -- these three atoms, the two concepts and
the list, will end up on different servers on opposite sides of the planet.
The DHT hashing algo has no clue about the locality-of-reference that we
want for the atomspace.  Again, this code "works" (passes unit tests) but
is disappointing. https://github.com/opencog/atomspace-dht
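
To make problem #2 concrete, here's a tiny Python sketch (my own
illustration, not code from any of the repos above) of content-hash
placement: the atom's hash alone picks its server, so the list and its two
member concepts will typically scatter across unrelated machines:

```python
import hashlib

def dht_bucket(atom, n_servers=16):
    """Kademlia-style placement sketch: the content hash picks the server."""
    digest = hashlib.sha1(atom.encode()).digest()
    return digest[0] % n_servers

# Three closely related atoms: a list and its two members.
atoms = ['(Concept "a")', '(Concept "b")',
         '(List (Concept "a") (Concept "b"))']
for atom in atoms:
    print(atom, '-> server', dht_bucket(atom))
```

Nothing in the placement knows that these three atoms belong together;
that's exactly the locality-of-reference the hash destroys.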

* So what's the right design? Well, it seems that the best bet would be to
use OpenDHT to store AtomSpace indexes, but do the actual serving of atoms
by "seeders". And so this is why I wrote this super-simple cogserver-based
distributed atomspace. The hope is to use it as a "seeder":
https://github.com/opencog/atomspace-cog/
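
To sketch the proposed split (hypothetical Python; the class names, the
cog:// addresses, and the dict standing in for the DHT are all made up for
illustration): the DHT holds only a small index mapping an atom's hash to
the seeders that hold it, while the seeders keep and serve the atoms
themselves:

```python
import hashlib

class Seeder:
    """Stands in for one cogserver holding a chunk of the AtomSpace."""
    def __init__(self, address):
        self.address = address
        self.atoms = {}

    def publish(self, atom, dht):
        self.atoms[atom] = True
        # Only the tiny index entry goes into the DHT, not the atom itself.
        key = hashlib.sha1(atom.encode()).hexdigest()
        dht.setdefault(key, []).append(self.address)

def fetch(atom, dht, seeders):
    """Look up the seeders for an atom in the DHT, then ask one of them."""
    key = hashlib.sha1(atom.encode()).hexdigest()
    for address in dht.get(key, []):
        if atom in seeders[address].atoms:
            return atom
    return None

dht = {}                                   # index only: hash -> addresses
seeders = {'cog://host-a': Seeder('cog://host-a')}
seeders['cog://host-a'].publish('(Concept "a")', dht)
print(fetch('(Concept "a")', dht, seeders))
```

This way the DHT stores a few bytes per atom instead of the atom itself,
so it no longer competes for RAM with the local AtomSpace.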

Future plans: I'm hoping that someone interested can build a
high-performance server/seeder, based on the prototype here. (really -- 500
LOC is very simple, very easy to understand, and thus easy to improve
upon.) This does NOT require any special skills: if you have basic coding
skills, maybe some experience with network i/o, or are willing to explore,
it should be possible to build a high-performance variation thereof. So all
those people saying "I'm just an ordinary coder, how can I help?" well --
here's your chance.

A more difficult, more conceptual task would be figuring out how to wire up
a bunch of these servers using the OpenDHT/Kademlia infrastructure. I think
this is possible, but it's more cerebral, and requires thinking-work.

--linas

-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva
