I don't know your performance requirements, but I always thought one way to do a distributed atomspace would simply be to have a bunch of independent atomspaces that all share one distributed Cassandra database as the "disk" storage layer.
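For concreteness, here is a toy sketch of that architecture: several independent in-memory "atomspaces" writing through to one shared key-value store. The dict stands in for a distributed Cassandra/Scylla keyspace; all class and method names here are illustrative, not from any actual Atomspace API.

```python
# Toy sketch: independent atomspaces sharing one storage layer.
# The dict stands in for the shared Cassandra/Scylla "disk" layer;
# ToyAtomspace and its methods are hypothetical names.

shared_store = {}  # stand-in for the shared distributed store

class ToyAtomspace:
    def __init__(self, store):
        self.store = store   # the shared backing store
        self.local = {}      # this node's in-RAM cache

    def add(self, key, atom):
        self.local[key] = atom
        self.store[key] = atom      # write-through to shared storage

    def get(self, key):
        if key not in self.local:   # cache miss: fault in from storage
            self.local[key] = self.store[key]
        return self.local[key]

a, b = ToyAtomspace(shared_store), ToyAtomspace(shared_store)
a.add("ConceptNode:cat", {"type": "ConceptNode", "name": "cat"})
# b never saw the atom locally, but finds it via the shared layer:
assert b.get("ConceptNode:cat")["name"] == "cat"
```

In the real design, the consistency and replication questions discussed below would all live inside the shared layer, not in the individual atomspaces.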
Note that I continue to reference Cassandra because it is better known, but if you were going to adopt a third-party datastore wholesale, I do recommend the Scylla C++ implementation of Cassandra, having used it in production for realtime ML systems at moderate scale (6+ nodes in my case, though it is documented to scale to hundreds, as I recall).

On Wed, Jul 29, 2020, 11:59 AM Ben Goertzel <[email protected]> wrote:
> Matt,
>
> I looked at Cassandra some time ago, haven't used it in practice though...
>
> You are pointing it out here as a source of design ideas/inspirations, but I'm also wondering: do you think it would be a strong choice as an ingredient in an OpenCog Hyperon (next-gen OpenCog) distributed Atomspace? We have been looking at Apache Ignite, which serves a different purpose, and of course the two have been integrated as well: https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra
>
> It looks like graph databases aren't going to be apropos for the persistent storage component in Hyperon, and key-value stores are probably the right level to be looking at...
>
> I haven't thought through how the various levels of non-ACID consistency in Cassandra might help with a distributed Atomspace: https://blog.yugabyte.com/apache-cassandra-lightweight-transactions-secondary-indexes-tunable-consistency/
>
> ben
>
> On Wed, Jul 29, 2020 at 11:19 AM Matt Chapman <[email protected]> wrote:
> >
> > > Which peers?
> >
> > As determined by a token ring: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archDataDistributeDistribute.html
> >
> > I think you could almost replace "vnode" with "chunk" if you wanted to adopt the Cassandra architecture, although I wouldn't be surprised to see performance problems with a huge number of vnodes, so it might actually need to be a "chunk-hash modulo reasonable number of vnodes".
> >
> > > How do you find them?
> >
> > By calculating the partition token via consistent hash, as Cassandra does with Murmur3. This tells you the authoritative source for the chunk you want. You might also have a local cache of other peers that have had replicas of that chunk, in case any of them are more responsive to you. Cassandra calls this process of finding potential replicas "snitching".
> >
> > > You are thinking Kademlia (as do I, when I think of publishing) or OpenDHT or IPFS.
> >
> > Nope. I've only played with IPFS a bit, but I don't expect it to be performant for the atomspace use case. I'm only vaguely familiar with OpenDHT; it seems worth exploring, but I'm sure you understand it far better than I do.
> >
> > I'm not very familiar with p2p systems like Kademlia, but I suspect they are optimized for consistency & availability over performance, so not the right choice for a distributed atomspace.
> >
> > By this point, it should be clear that I look to Cassandra for how semi-consistent distributed data storage systems should be designed. (Fwiw, my inspiration for distributed messaging systems comes mostly from Apache Kafka.)
> >
> > > Which is great, if all you're doing is publishing small amounts of static, infrequently-changing information. Not so much, if interacting or blasting out millions of updates. Neither system can handle that -- literally -- tried that, been there, done that. They are simply not designed for that.
> >
> > Cassandra is. To be fair, Cassandra is optimized for massive scale, which may involve some trade-offs that are not desirable for present-day atomspace use cases.
> >
> > See also ScyllaDB for a C++ reimplementation of Cassandra.
> >
> > > Now, perhaps using only a hash-driven system, it is possible to overcome these issues. I do not know how to do this. Perhaps someone does -- perhaps there are even published papers ... I admit I did not do a careful literature search.
> >
> > http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
> > http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
> >
> > Matt
> >
> > On Wed, Jul 29, 2020, 9:37 AM Linas Vepstas <[email protected]> wrote:
> >>
> >> On Wed, Jul 29, 2020 at 1:09 AM Matt Chapman <[email protected]> wrote:
> >>>
> >>> > I think it's a mistake to try to think of a distributed atomspace as one super-giant, universe-filling, uniform, undifferentiated blob of storage.
> >>>
> >>> > You don't want broadcast messages going out to the whole universe.
> >>>
> >>> Not sure if you intended to imply it, but the reality of the first statement need not require the second. Hashes of atoms/chunks can be mapped via modulo onto hashes of peer IDs so that messages need only go to one or a few peers.
> >>
> >> Which peers? How do you find them? You are thinking Kademlia (as do I, when I think of publishing) or OpenDHT or IPFS. Which is great, if all you're doing is publishing small amounts of static, infrequently-changing information. Not so much, if interacting or blasting out millions of updates. Neither system can handle that -- literally -- tried that, been there, done that. They are simply not designed for that.
> >>
> >> Now, perhaps using only a hash-driven system, it is possible to overcome these issues. I do not know how to do this. Perhaps someone does -- perhaps there are even published papers ... I admit I did not do a careful literature search.
> >>
> >> But, basically, before we are even out of the gate, we already have a snowball of problems with no obvious solution. Haven't even written any code, and are beset by technical problems. That's not an auspicious beginning.
> >>
> >> If you have something more specific, let me know. Right now, I simply don't know how to do this.
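For concreteness, the token-ring scheme Matt describes can be sketched in a few lines: each peer claims several vnode positions on a hash ring, and a chunk's partition token picks the authoritative peer. Cassandra uses Murmur3; a stdlib hash stands in here, and all names (`TokenRing`, `owner`, the peer IDs) are illustrative, not from Cassandra or any Atomspace API.

```python
# Minimal consistent-hash ring with vnodes (illustrative names only).
import bisect
import hashlib

def token(key: str) -> int:
    """64-bit partition token for a key (Murmur3 stand-in)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class TokenRing:
    def __init__(self, peers, vnodes_per_peer=8):
        # Each peer owns several points ("vnodes") on the ring.
        self.ring = sorted(
            (token(f"{peer}#{i}"), peer)
            for peer in peers
            for i in range(vnodes_per_peer)
        )
        self.tokens = [t for t, _ in self.ring]

    def owner(self, chunk_id: str) -> str:
        """The first vnode at or after the chunk's token owns the chunk."""
        i = bisect.bisect(self.tokens, token(chunk_id)) % len(self.ring)
        return self.ring[i][1]

ring = TokenRing(["peer-a", "peer-b", "peer-c"])
# Every node computes the same owner locally -- no broadcast needed.
assert ring.owner("chunk-42") == ring.owner("chunk-42")
assert ring.owner("chunk-42") in {"peer-a", "peer-b", "peer-c"}
```

This is what makes the lookup broadcast-free: any node with the peer list can compute the owner of any chunk without asking anyone.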
> >>
> >> --linas
> >>>
> >>> Specialization has a cost, in that you need to maintain some central directory or gossip protocol so that peers can learn which other peers are specialized to which purpose.
> >>>
> >>> An ideal general-intelligence network may very well include both a large number of generalist, undifferentiated peers and clusters of highly interconnected specialized peers. If peers are neurons, I think this describes the human nervous system also, no?
> >>>
> >>> To borrow terms from my previous message, generalist peers own many atoms and replicate few, while specialist peers own few or none, but replicate many.
> >>>
> >>> Matt
> >>>
> >>> On Tue, Jul 28, 2020, 10:36 PM Linas Vepstas <[email protected]> wrote:
> >>>>
> >>>> On Tue, Jul 28, 2020 at 11:41 PM Ben Goertzel <[email protected]> wrote:
> >>>>>
> >>>>> Hmm... you are right that OpenCog hypergraphs have natural chunks defined by recursive incoming sets. However, I think these chunks are going to be too small, in most real-life Atomspaces, to serve the purpose of chunking for a distributed Atomspace.
> >>>>>
> >>>>> I.e. it is true that in most cases the recursive incoming set of an Atom should all be in the same chunk. But I think we will probably need to deal with chunks that are larger than the recursive incoming set of a single Atom, in very many cases.
> >>>>
> >>>> I like the abstract of the Ja-be-ja paper; I will read and ponder. It sounds exciting.
> >>>>
> >>>> But ... the properties of a chunk depend on what you want to do with it.
> >>>>
> >>>> For example: if some peer wants to declare a list of everything it holds, then clearly, creating a list of all of its atoms is self-defeating. But if some user wants some specific chunk, well, how does the user ask for that? How does the user know what to ask for?
> >>>> How does the user say "hey, I want that chunk which has these contents"? Should the user say "deliver to me all chunks that contain Atom X"? If the user says this, then how does the peer/server know if it has any chunks with Atom X in it? Does the peer/server keep a giant index of all atoms it has, and what chunks they are in? Is every peer/server obliged to waste some CPU cycles to figure out if it's holding Atom X? This gets yucky, fast.
> >>>>
> >>>> This is where QueryLinks are marvelous: the Query clearly states "this is what I want", and the query is just a single Atom. It can be given an unambiguous, locally-computable (easily-computable; we already do this) 80-bit or 128-bit (or bigger) hash, and that hash can be blasted out to the network (I'm thinking Kademlia, again) in a compact way -- it's not a lot of bytes. The request for the "query chunk" is completely unambiguous, and the user does not have to make any guesses whatsoever about what may be contained in that chunk. Whatever is in there, is in there. This solves the naming problem above.
> >>>>
> >>>>> What happens when the results for that (new) BindLink query are spread among multiple peers on the network in some complex way?
> >>>>
> >>>> I'm going to avoid this question for now, because "it depends" and "not sure" and "I have some ideas".
> >>>>
> >>>> My gut impulse is that the problem splits into two parts: first, find the peers that you want to work with; second, figure out how to work with those peers.
> >>>>
> >>>> The first part needs to be fairly static, where a peer can advertise "hey, this is the kind of data I hold, this is the kind of work I'm willing to perform." Once a group of peers is located, many of the scaling issues go away: groups of peers tend to be small. If they are not, you organize them hierarchically, the way you might organize people, with specialists for certain tasks.
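The "locally-computable hash of a query Atom" idea above can be sketched as a Merkle-style hash over the Atom's structure. The nested-tuple encoding and the use of MD5 to get 128 bits are illustrative choices here, not the Atomspace's actual hashing scheme.

```python
# Sketch: an unambiguous, locally-computable 128-bit content hash
# for an Atom.  Atoms are modeled as nested tuples (Links) and
# strings (Nodes); the encoding is hypothetical, not OpenCog's.
import hashlib

def atom_hash(atom) -> str:
    if isinstance(atom, tuple):
        # A Link: hash its type together with the hashes of its outgoing set.
        parts = " ".join(atom_hash(child) for child in atom[1:])
        text = f"({atom[0]} {parts})"
    else:
        # A Node: hash its printed form directly.
        text = atom
    return hashlib.md5(text.encode("utf-8")).hexdigest()  # 128 bits

query = ("QueryLink",
         ("EvaluationLink", "PredicateNode:likes", "VariableNode:$x"))
h = atom_hash(query)
assert len(h) == 32  # 32 hex digits = 128 bits
```

Any two nodes that hold the same query compute the same hash independently, which is what makes the hash usable as a compact, unambiguous network-wide name for the "query chunk".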
> >>>>
> >>>> I think it's a mistake to try to think of a distributed atomspace as one super-giant, universe-filling, uniform, undifferentiated blob of storage. I think we'll run into all sorts of conceptual difficulties and design problems if you try to do that. If nothing else, it starts smelling like quorum-sensing in bacteria, which is not an efficient way to communicate. You don't want broadcast messages going out to the whole universe. Think instead of atomspaces connecting to one another like dendrites and axons: a limited number, a small number of connections between atomspaces, but point-to-point, sharing only the data that is relevant for that particular peer-group.
> >>>>
> >>>> -- Linas
> >>>>
> >>>> --
> >>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
> >>>> --Peter da Silva
> >>>>
> >>>> --
> >>>> You received this message because you are subscribed to the Google Groups "opencog" group.
> >>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >>>> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAHrUA35zN4aaSrZ2Dpu4qLUL1bYfjAF_rGiS_xxg2-E-SBqY3Q%40mail.gmail.com.
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> “The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn like fabulous yellow roman candles exploding like spiders across the stars.” -- Jack Kerouac
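Returning to the "natural chunks defined by recursive incoming sets" idea quoted above, a toy computation of that closure over a hypergraph stored as a link-to-outgoing-set map might look like this. The graph, the atom names, and the chunking rule are all illustrative, not from any actual Atomspace.

```python
# Toy hypergraph: each link lists the atoms in its outgoing set.
outgoing = {
    "Eval-likes": ["Pred-likes", "List-cat-fish"],
    "List-cat-fish": ["Concept-cat", "Concept-fish"],
    "Member-cat": ["Concept-cat", "Concept-animal"],
}

# Invert it: atom -> links that contain it (the incoming set).
incoming = {}
for link, outs in outgoing.items():
    for atom in outs:
        incoming.setdefault(atom, []).append(link)

def recursive_incoming_set(atom):
    """All links reachable by repeatedly following incoming sets."""
    seen, stack = set(), [atom]
    while stack:
        a = stack.pop()
        for link in incoming.get(a, []):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return seen

# "Concept-cat" is pulled into the chunk of every link that
# (transitively) contains it:
assert recursive_incoming_set("Concept-cat") == {
    "Eval-likes", "List-cat-fish", "Member-cat"}
```

Ben's point in the thread is that such closures tend to be small in practice, so a distributed Atomspace would likely need chunks that merge many of these closures together.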
