Re: [opencog-dev] Distributed Atomspace

Matt Chapman Thu, 13 Aug 2020 10:21:21 -0700

I read it. I feel like you're talking about completely different things, and
misrepresenting what Cassandra is. All those node.js microservices frequently
require a common data store (in addition to a local data store), and that
store is, more often than not, Dynamo, Firebase, or Cassandra.


But I feel like someone said, "Which kind of plane should we take to
China?" and you answered, "Fishing is best done from a boat." True, and
maybe we do want to do some fishing when we get to China, but that's not at
all what I'm talking about here, and I make my living building the kinds of
micro-service based applications that you're describing.

So, I conclude we're talking at levels that are too abstract to resolve our
communication failures, and the only path forward is to build a PoC. It's
unlikely I'll have the time given my current employment, but now I know it
won't happen until I do.

If I can leave only one impression: Tunable Consistency is valuable. Not
Eventual Consistency. Not Cassandra or Scylla or Seastar, or Token Rings.
An effective multi-mind-agent distributed atomspace probably requires
Tunable Consistency, however it's implemented. Arguably, current DB backing
offers a limited form of it, but more powerful forms exist.
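
A minimal sketch of what "tunable" means here, under stated assumptions: N
replicas, a write that waits for W of them, and a read that consults R of
them. The class and method names (TunableStore, write, read) are hypothetical
and are not the API of Cassandra or of any AtomSpace backend. To make the
worst case visible, writes land on the first W replicas and reads consult the
last R, so the two sets overlap only when R + W > N.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (version, value); an in-memory stand-in for one storage node
    store: dict = field(default_factory=dict)


class TunableStore:
    """Toy replica set where each request picks its own consistency level."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.version = 0  # single global counter, a simplification

    def write(self, key, value, w):
        # Synchronously update only the first w replicas. A real system
        # would propagate to the rest in the background; this sketch
        # deliberately does not, so stale reads can be observed.
        self.version += 1
        for rep in self.replicas[:w]:
            rep.store[key] = (self.version, value)

    def read(self, key, r):
        # Consult the last r replicas and return the newest value seen.
        answers = [rep.store.get(key, (0, None)) for rep in self.replicas[-r:]]
        return max(answers, key=lambda a: a[0])[1]


store = TunableStore(n=3)
store.write("atom", "tv=0.9", w=2)
fresh = store.read("atom", r=2)  # R + W > N: the read set overlaps the write set
stale = store.read("atom", r=1)  # R + W <= N: the read may miss the write
```

The point of the sketch is that W and R are arguments of each call rather
than properties of the store, so one agent can pay for overlap on the atoms
it cares about while another accepts drift on the rest.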

Matt

On Tue, Aug 11, 2020, 5:04 PM Linas Vepstas <[email protected]> wrote:

> Sigh.
>
> I dislike writing long emails because I fear no one reads them, or that
> they are viewed as overly aggressive and pugnacious. But until such time as
> we have mind-reading or neural laces, it's ... email.
>
> I want to talk about "service meshes". The problem with shopping for
> cassandra, or any of the other suggested databases, is that they are all
> "monolithic black boxes". You pick one, and you get what you get: whatever
> is provided, that's what it is. Sure, some configuration files somewhere
> allow you to tune this and that, but that's all.
>
> The service mesh idea (and the npm/js idea before that) is to assemble
> your system out of small, self-contained pieces. Sure, the object-oriented
> folks have been talking about this for 3 or 4 decades, and it's cited as
> the raison-d'etre for things like C++. But C++ never lived up to this
> ideal.  There are no generic C++ frameworks. None. At All. (OK, so SGI had
> one or two in the early 1990's ...) Something is ... missing...  in C++.
> Compare this to node.js and npm which are wildly successful over-achievers
> in this category.  People regularly build large applications by assembling
> a cacophony of tiny little javascript parts. Clearly, javascript has
> something that C++ does not.  Something that makes the OO dream achievable
> not just in theory, but regularly validated in practice.
>
> Now, there are some down-sides to npm apps: they contain hundreds or
> thousands of parts, and not all of them are well-maintained, and many have
> published security vulnerabilities that remain unpatched. Worse, patching
> some of them requires incompatible API changes that would break users. So it
> has its own prickly and thorny issues that are unique and different from
> those that other languages (python, scheme, c++) suffer from.
>
> In the cloud world, there has long been, and continues to be a movement to
> meshes of containerized applications. Here, docker is the prototypical
> container -- lxc/lxd/lxe more generally.  Managing these containers
> requires kubernetes, and more: the "service meshes" (istio, microsoft open
> service mesh) provide a layer (a "control plane") that further manages
> deployments, error fallbacks, a/b testing, circuit-breakers,
> load-balancing, etc.   The mental model is that containerized apps are just
> like npm nodes, except they are a million times bigger and beefier
> (literally) and they all have network interfaces instead of javascript
> methods/objects. And since they are so much bigger, they need more active
> management.
>
> Now compare the service-mesh idea to the olde-fashioned ideas of "web
> shopping carts" or "content management systems" or "customer relationship
> management systems".  Those things were single, monolithic black boxes that
> you bought from a vendor (or installed via open-source) that automagically
> did everything for you, once you configured a few templates.   They worked
> great, as long as what you wanted was (a) a web shopping cart, and (b) was
> customizable via some template or config file. If not .. you were SOL.
>
> These monolithic architectures were their downfall, and were the driver to
> containers, kubernetes and service meshes. The founders of cloud startup
> XYZ can't spray-paint some config files onto a monolith and then raise $20M
> in venture funding.  But, give them a bunch of pieces-parts containers,
> that they can hook up in some new, novel and exciting way, plus a little
> secret sauce, and buzzword-bingo, a unicorn is born.
>
> And this is why Cassandra makes me yawn with disinterest, if not a bit of
> hostility. It's a big monolithic block. Sure, I can take the AtomSpace, and
> plaster it onto Cassandra, like wrapping some wet paper around a rock. The
> ultimate shape is still that of the rock, no matter how brightly-colored or
> thoughtful that paper wrapped around it is.
>
> So, I'm trying to grab hold of this idea of pieces-parts.  OpenCog needs
> pieces-parts that can be arranged and re-assembled into that mesh that
> provides the distributed-atomspace attributes and requirements du-jour.
>
> Yes, of course, singularity.net is also pursuing a vision of pieces-parts
> that can be assembled. Which is why I am a bit dumb-founded that we are
> entertaining ideas like Cassandra -- it is the very antithesis of modular
> architecture. It's the opposite of a dapp -- It's a big giant lump, the one
> ring to rule them all. It's kind of exactly the poster-child for what not
> to do ...
>
> For a distributed atomspace, what we really need to focus on is
> inter-operability, so that, like javascript (and unlike c++) it is easy to
> assemble modules out of other modules.  Like containers, there should be
> some fairly regularized API for communications (I nominate
> atomese-as-ascii-strings i.e. s-expressions and maybe plan-B
> atomese-as-json). With this under control, we can move on to creating
> unique, custom services aka agents aka dapps or whatever these other things
> might be.
>
> Again, I nominate the building-blocks idea: I took the earlier email, and
> pasted it into the README, here:
> https://github.com/opencog/atomspace-agents
>
> -- Linas
>
>
> On Tue, Aug 11, 2020 at 5:04 PM Linas Vepstas <[email protected]>
> wrote:
>
>> This appears to evade/avoid acknowledging issue #1, which is the (CPU)
>> overhead of translating between multiple formats, the competition for RAM
>> that those formats entail, and the need to ship the resulting bytes between
>> API's, or, worse, over (network or local) sockets.
>>
>> Sure, maybe cassandra has nice solutions for issues #2 #3 and #4, such as
>> consistency, replication, etc. but until you address issue #1 frontally and
>> completely, the remaining issues are utterly unimportant and even
>> delusional.
>>
>> --linas
>>
>> On Tue, Aug 11, 2020 at 10:35 AM Ben Goertzel <[email protected]> wrote:
>>
>>> Matt,
>>>
>>> So regarding Cassandra, it's clear there are many cool things there...
>>> From what I understand, the key differentiating functionality it seems
>>> potentially able to offer would be: The ability to replicate atoms
>>> locally accompanied by eventual consistency  ...
>>>
>>> As a first step, I wonder if it would make sense to try some simple
>>> experiments w/ Cassandra to see if it really does this effectively for
>>> an OpenCog context?   If you or anyone else w/ Cassandra experience
>>> has time to experiment w/ this, it might be quite interesting...
>>>
>>> Is Cassandra's notion of eventual consistency significantly different
>>> from that in Amazon's DynamoDB ?
>>>
>>> It seems that in some cases in OpenCog we might want to let two
>>> versions of an Atom drift even further/longer than is commonly allowed
>>> to happen in most Dynamo-based systems... but this really comes down
>>> to, how flexible is the eventual consistency management /
>>> configuration in these things?
>>>
>>> ben
>>>
>>> On Wed, Jul 29, 2020 at 12:19 PM Matt Chapman <[email protected]>
>>> wrote:
>>> >
>>> > > Which peers?
>>> > As determined by a token ring:
>>> >
>>> >
>>> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archDataDistributeDistribute.html
>>> >
>>> > I think you could almost replace "vnode" with "chunk" if you wanted to
>>> adopt the Cassandra architecture, although I wouldn't be surprised to see
>>> performance problems with a huge number of vnodes, so it might actually
>>> need to be a "chunk-hash modulo reasonable number of vnodes".
>>> >
>>> >  > How do you find them?
>>> >
>>> > By calculating the partition token via consistent hash, as Cassandra
>>> does with Murmur3. This tells you the authoritative source for the chunk
>>> you want. You might also have a local cache of other peers that have had
>>> replicas of that chunk, in case any of them are more responsive to you.
>>> Cassandra calls this process of finding potential replicas "Snitching".
>>> >
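
The lookup described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not Cassandra's implementation:
SHA-256 from the Python standard library stands in for Murmur3, and the
names TokenRing, token and owners are hypothetical. Each peer is placed on
the ring at several vnode tokens; a chunk is owned by the first peer found
walking clockwise from the chunk's own token, and the following distinct
peers hold replicas.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in partitioner: the first 8 bytes of SHA-256 as an unsigned int.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, peers, vnodes=8):
        # Every peer appears on the ring once per vnode.
        self._ring = sorted(
            (token(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owners(self, chunk_id: str, replicas: int = 2):
        """Return up to `replicas` distinct peers, authoritative owner first."""
        start = bisect.bisect_left(self._tokens, token(chunk_id))
        found = []
        for step in range(len(self._ring)):
            peer = self._ring[(start + step) % len(self._ring)][1]
            if peer not in found:
                found.append(peer)
            if len(found) == replicas:
                break
        return found


ring = TokenRing(["peer-a", "peer-b", "peer-c"])
chunk_owners = ring.owners("chunk-42", replicas=2)
```

Any peer that knows the peer list can compute the same answer locally, with
no directory lookup and no broadcast; that is the property the token ring is
meant to provide.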
>>> >
>>> >  > You are thinking Kademlia (as do I, when I think of publishing) or
>>> OpenDHT or IPFS.
>>> >
>>> > Nope. I've only played with IPFS a bit, but I don't expect it to be
>>> performant for the atomspace use case. I'm only vaguely familiar with
>>> openDHT; it seems worth exploring, but I'm sure you understand it far
>>> better than I do.
>>> >
>>> > I'm not very familiar with p2p systems like kademlia, but I suspect
>>> that's optimized for consistency & availability over performance, so not
>>> the right choice for the atomspace.
>>> >
>>> > By this point, it should be clear that I look to Cassandra for how
>>> semi-consistent distributed data storage systems should be designed. (Fwiw,
>>> my inspiration for distributed messaging systems comes mostly from Apache
>>> Kafka.)
>>> >
>>> >
>>> > > Which is great, if all you're doing is publishing small amounts of
>>> static, infrequently-changing information.  Not so much, if interacting or
>>> blasting out millions of updates.  Neither system can handle that --
>>> literally -- tried that, been there, done that. They are simply not
>>> designed for that.
>>> >
>>> > Cassandra is.  To be fair, Cassandra is optimized for massive scale,
>>> which may involve some trade-offs that are not desirable for present-day
>>> atomspace use cases.
>>> >
>>> > See also ScyllaDB, a C++ reimplementation of Cassandra.
>>> >
>>> > > Now, perhaps using only a hash-driven system, it is possible to
>>> overcome these issues. I do not know how to do this. Perhaps someone does
>>> -- perhaps there are even published papers ... I admit I did not do a
>>> careful literature search.
>>> >
>>> > http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
>>> >
>>> http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
>>> >
>>> > Matt
>>> >
>>> >
>>> >
>>> > On Wed, Jul 29, 2020, 9:37 AM Linas Vepstas <[email protected]>
>>> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Jul 29, 2020 at 1:09 AM Matt Chapman <[email protected]>
>>> wrote:
>>> >>>
>>> >>> >I think it's a mistake to try to think of a distributed atomspace
>>> as one super-giant, universe-filling uniform, undifferentiated blob of
>>> storage.
>>> >>>
>>> >>> > You don't want broadcast messages going out to the whole universe.
>>> >>>
>>> >>> Not sure if you intended to imply it, but the reality of the first
>>> statement need not require the 2nd statement. Hashes of atoms/chunks can be
>>> mapped via modulo onto hashes of peer IDs so that messages need only go to
>>> one or few peers.
>>> >>
>>> >>
>>> >> Which peers?  How do you find them? You are thinking Kademlia (as do
>>> I, when I think of publishing) or OpenDHT or IPFS. Which is great, if all
>>> you're doing is publishing small amounts of static, infrequently-changing
>>> information.  Not so much, if interacting or blasting out millions of
>>> updates.  Neither system can handle that -- literally -- tried that, been
>>> there, done that. They are simply not designed for that.
>>> >>
>>> >> Now, perhaps using only a hash-driven system, it is possible to
>>> overcome these issues. I do not know how to do this. Perhaps someone does
>>> -- perhaps there are even published papers ... I admit I did not do a
>>> careful literature search.
>>> >>
>>> >> But, basically, before we are even out of the gate, we already have a
>>> snowball of problems with no obvious solution.  Haven't even written any
>>> code, and are beset by technical problems. That's not an auspicious
>>> beginning.
>>> >>
>>> >> If you have something more specific, let me know. Right now, I simply
>>> don't know how to do this.
>>> >>
>>> >> --linas
>>> >>>
>>> >>>
>>> >>> Specialization has a cost, in that you need to maintain some central
>>> directory or gossip protocol so that peers can learn which other peers are
>>> specialized to which purpose.
>>> >>>
>>> >>> An ideal general intelligence network may very well include both a
>>> large number of generalist, undifferentiated peers and clusters of highly
>>> interconnected specialized peers. If peers are neurons, I think this
>>> describes the human nervous system also, no?
>>> >>>
>>> >>> To borrow terms from my previous message, generalist peers own many
>>> atoms, and replicate few, while specialist peers own few or none, but
>>> replicate many.
>>> >>>
>>> >>> Matt
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 28, 2020, 10:36 PM Linas Vepstas <[email protected]>
>>> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Jul 28, 2020 at 11:41 PM Ben Goertzel <[email protected]>
>>> wrote:
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Hmm... you are right that OpenCog hypergraphs have natural chunks
>>> >>>>> defined by recursive incoming sets.   However, I think these chunks
>>> >>>>> are going to be too small, in most real-life Atomspaces, to serve
>>> the
>>> >>>>> purpose of chunking for a distributed Atomspace
>>> >>>>>
>>> >>>>> I.e. it is true that in most cases the recursive incoming set of an
>>> >>>>> Atom should all be in the same chunk.  But I think we will probably
>>> >>>>> need to deal with chunks that are larger than the recursive
>>> incoming
>>> >>>>> set of a single Atom, in very many cases.
>>> >>>>
>>> >>>>
>>> >>>> I like the abstract to the Ja-be-ja paper, will read and ponder. It
>>> sounds exciting.
>>> >>>>
>>> >>>> But ... the properties of a chunk depend on what you want to do
>>> with it.
>>> >>>>
>>> >>>> For example: if some peer wants to declare a list of everything it
>>> holds, then clearly, creating a list of all of its atoms is self-defeating.
>>> But if some user wants some specific chunk, well, how does the user ask for
>>> that? How does the user know what to ask for?   How does the user say "hey
>>> I want that chunk which has these contents"?  Should the user say "deliver
>>> to me all chunks that contain Atom X"? If the user says this, then how does
>>> the peer/server know if it has any chunks with Atom X in it?  Does the
>>> peer/server keep a giant index of all atoms it has, and what chunks they
>>> are in? Is every peer/server obliged to waste some CPU cycles to figure out
>>> if it's holding Atom X?  This gets yucky, fast.
>>> >>>>
>>> >>>> This is where QueryLinks are marvelous: the Query clearly states
>>> "this is what I want" and the query is just a single Atom, and it can be
>>> given an unambiguous, locally-computable (easily-computable; we already do
>>> this)  80-bit or a 128-bit (or bigger) hash and that hash can be blasted
>>> out to the network (I'm thinking Kademlia, again) in a compact way - it's
>>> not a lot of bytes.  The request for the "query chunk" is completely
>>> unambiguous, and the user does not have to make any guesses whatsoever
>>> about what may be contained in that chunk.  Whatever is in there, is in
>>> there. This solves the naming problem above.
>>> >>>>
>>> >>>>>
>>> >>>>> What happens when the results for that (new) BindLink query are
>>> spread
>>> >>>>> among multiple peers on the network in some complex way?
>>> >>>>
>>> >>>>
>>> >>>> I'm going to avoid this question for now, because "it depends" and
>>> "not sure" and "I have some ideas".
>>> >>>>
>>> >>>> My gut impulse is that the problem splits into two parts: first,
>>> find the peers that you want to work with, second, figure out how to work
>>> with those peers.
>>> >>>>
>>> >>>> The first part needs to be fairly static, where a peer can
>>> advertise "hey this is the kind of data I hold, this is the kind of work
>>> I'm willing to perform." Once a group of peers is located, many of the
>>> scaling issues go away: groups of peers tend to be small.  If they are not,
>>> you organize them hierarchically, the way you might organize people, with
>>> specialists for certain tasks.
>>> >>>>
>>> >>>> I think it's a mistake to try to think of a distributed atomspace
>>> as one super-giant, universe-filling uniform, undifferentiated blob of
>>> storage. I think we'll run into all sorts of conceptual difficulties and
>>> design problems if you try to do that. If nothing else, it starts smelling
>>> like quorum-sensing in bacteria. Which is not an efficient way to
>>> communicate. You don't want broadcast messages going out to the whole
>>> universe. Think instead of atomspaces connecting to one-another like
>>> dendrites and axons: a limited number, a small number of connections
>>> between atomspaces,  but point-to-point, sharing only the data that is
>>> relevant for that particular peer-group.
>>> >>>>
>>> >>>> -- Linas
>>> >>>>
>>> >>>> --
>>> >>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>> >>>>         --Peter da Silva
>>> >>>>
>>> >>>> --
>>> >>>> You received this message because you are subscribed to the Google
>>> Groups "opencog" group.
>>> >>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to [email protected].
>>> >>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/opencog/CAHrUA35zN4aaSrZ2Dpu4qLUL1bYfjAF_rGiS_xxg2-E-SBqY3Q%40mail.gmail.com
>>> .
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Ben Goertzel, PhD
>>> http://goertzel.org
>>>
>>> “The only people for me are the mad ones, the ones who are mad to
>>> live, mad to talk, mad to be saved, desirous of everything at the same
>>> time, the ones who never yawn or say a commonplace thing, but burn,
>>> burn, burn like fabulous yellow roman candles exploding like spiders
>>> across the stars.” -- Jack Kerouac
>>>
>>>
>>
>>
>>
>>
>
>

