> On Jul 29, 2020, at 10:39 AM, Ben Goertzel <[email protected]> wrote:
> 
> On Wed, Jul 29, 2020 at 6:35 AM Abdulrahman Semrie <[email protected]> wrote:
>> 
>>> I think it's a mistake to try to think of a distributed atomspace as one 
>>> super-giant, universe-filling uniform, undifferentiated blob of storage.
>> 
>> It is not clear to me why this is a mistake.
> 
> It's a mistake because making a call from machine A to machine B is
> just sooooooo much slower than making a call from machine A to machine
> A ...
> 
> So if you try to ignore the underlying distributed nature of a
> knowledge store, and treat it as if it was a single knowledge blob
> living in one location, you will wind up making a system that is very,
> very, very slow...
> 
> My Webmind colleagues and I were naive enough to try this in the late
> 1990s using Java 1.1   ;-)

Ah yes, I recall those days and the (in)famous Java 1 with the original broken 
Java Memory Model, not fixed until 2004 
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.7914&rep=rep1&type=pdf 
and http://www.ibm.com/developerworks/library/j-jtp02244.html).
> 
> One challenge though is: From a language and algorithm design
> perspective, it is of course necessary to abstract away many of the
> details of distributed infrastructure, while still respecting the
> difference btw a localized and distributed knowledge store.
> 
> E.g. an AI algorithm may need to be aware that pieces of knowledge can
> have three different statuses: Local, Remote (in RAM on some other
> machine in the distributed Atomspace) or BackedUp (on disk).   So then when
> it issues a query it may need to specify whether its search for an answer
> should be Local only, should include Remote machines, or should also
> include BackedUp data...   Because having an AI algorithm issue all
> its queries across a distributed Atomspace + disk backup will just be
> too slow.   So in this case the existence of a distributed/persistent
> infrastructure requires the AI algorithm to prioritize its queries w/
> at least 3 levels of priority.
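The three-tier scoping Ben describes can be sketched roughly as follows. This is a hypothetical illustration, not the actual AtomSpace API: `QueryScope`, `run_query`, and the tier arguments are all invented names, and the tiers are modeled as plain collections rather than real local/remote/disk stores.

```python
from enum import IntEnum

class QueryScope(IntEnum):
    """Widening (and increasingly slow) search tiers for a query."""
    LOCAL = 1      # RAM on this machine only
    REMOTE = 2     # ...plus RAM on other machines in the distributed Atomspace
    BACKED_UP = 3  # ...plus the on-disk backup store

def run_query(pattern, scope, local, remote=(), disk=()):
    """Search each tier up to and including `scope`, cheapest first."""
    results = [atom for atom in local if pattern(atom)]
    if scope >= QueryScope.REMOTE:
        results += [atom for atom in remote if pattern(atom)]
    if scope >= QueryScope.BACKED_UP:
        results += [atom for atom in disk if pattern(atom)]
    return results
```

The point of the `IntEnum` ordering is that an algorithm pays only for the tiers it explicitly asks for: a `LOCAL` query never touches the network or the disk, which is the prioritization Ben argues the infrastructure forces on the AI algorithm.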
> 
>> I suggest you look into the design docs of Nebula Graph, which is a 
>> strongly typed distributed graph DB. I believe they address the issues 
>> you mentioned above, and it should be possible to implement something 
>> similar for the first version of the distributed Atomspace.   Here are the 
>> links:
>> 
>> [Overview] - 
>> https://docs.nebula-graph.io/manual-EN/1.overview/3.design-and-architecture/1.design-and-architecture/
>> 
>> [Storage Design] - 
>> https://docs.nebula-graph.io/manual-EN/1.overview/3.design-and-architecture/2.storage-design/
>>  - part of this is currently implemented through the Postgres backend, as 
>> demonstrated in this example
>> 
>> [Query Engine] - 
>> https://docs.nebula-graph.io/manual-EN/1.overview/3.design-and-architecture/3.query-engine/
>>  - esp. interesting how they implement access control through sessions, 
>> which partly relates to #1855
>> 
>> They implement sharding somewhat similarly to what you described above, 
>> using edge-cut partitioning: a destination vertex and all its incoming 
>> edges are stored in the same partition, and a source vertex and its 
>> outgoing edges in the same partition. They use Multi-Raft groups 
>> ("Multi-Raft only means we manage multiple Raft consensus groups on one 
>> node") to achieve consistency across partitions for multiple databases. 
>> This differs from what you suggested in that each node doesn't broadcast 
>> its changes; only the elected leader broadcasts changes (i.e., sends log 
>> requests), and the rest of the nodes update their partitions accordingly. 
>> Of course, a new leader can be elected if the current leader fails or its 
>> term ends. This design also solves what you noted as the "unsolved part" 
>> in #2138.
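To make the edge-cut placement concrete, here is a minimal sketch of the idea (my own illustrative code, with hypothetical names and a simple modulo hash, not Nebula's actual implementation): each edge is written twice, as an out-edge in the source vertex's partition and as an in-edge in the destination vertex's partition, so every vertex's full neighborhood lives in one partition.

```python
NUM_PARTITIONS = 4  # illustrative; real deployments configure this

def partition_of(vertex_id: int) -> int:
    """Hash-based vertex placement (Nebula hashes the vertex ID)."""
    return vertex_id % NUM_PARTITIONS

def place_edge(src: int, dst: int):
    """Return the (partition, record) pairs for both copies of an edge."""
    return [
        (partition_of(src), ("out", src, dst)),  # stored with the source vertex
        (partition_of(dst), ("in", dst, src)),   # stored with the destination vertex
    ]
```

The duplication is the price of locality: traversing either all out-edges of a source or all in-edges of a destination never crosses a partition boundary.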
> 
> 
> 
> There are some interesting things in Nebula and maybe some stuff for
> us to learn there.
> 
> However, their assumption of complete consistency across the
> distributed KB does not match our requirements for OpenCog.  We need
> complete consistency only regarding certain sorts of knowledge items
> -- for other cases it's OK for us if different versions of an Atom in
> different parts of a distributed system drift apart a little and are
> then reconciled a little later.
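The "drift a little, reconcile later" policy could look something like the sketch below. To be clear, this is not OpenCog's actual merge rule; it is one plausible reconciliation function I am inventing for illustration, assuming a truth value is a (strength, confidence) pair and merging divergent replicas by confidence-weighted averaging.

```python
def reconcile(tv_a, tv_b):
    """Merge two divergent replicas of an Atom's truth value.

    Each truth value is a (strength, confidence) pair; the merged
    strength weights each replica by its confidence.
    """
    (s1, c1), (s2, c2) = tv_a, tv_b
    total = c1 + c2
    if total == 0:
        return (0.0, 0.0)  # neither replica carries any evidence
    strength = (s1 * c1 + s2 * c2) / total
    return (strength, max(c1, c2))
```

A rule like this lets most Atoms update locally without coordination, reserving strong consistency for the few knowledge items that genuinely need it.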
> 
> The assumption of complete consistency is built into the RocksDB
> infrastructure that they use, btw.
> 
> ben
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/opencog/CACYTDBc5wPvNj-k-%3Dwhx-yntHkZrh%3DEfWi%2BcqjJUMksJ-5LKhA%40mail.gmail.com.
