Re: [opencog-dev] Distributed Atomspace

Linas Vepstas Tue, 11 Aug 2020 17:04:42 -0700

Sigh.

I dislike writing long emails because I fear no one reads them, or that
they are viewed as overly aggressive and pugnacious. But until such time as
we have mind-reading or neural laces, it's ... email.


I want to talk about "service meshes". The problem with shopping for
cassandra, or any of the other suggested databases, is that they are all
"monolithic black boxes". You pick one, and you get what you get: whatever
is provided, that's what it is. Sure, some configuration files somewhere
allow you to tune this and that, but that's all.

The service mesh idea (and the npm/js idea before that) is to assemble your
system out of small, self-contained pieces. Sure, the object-oriented folks
have been talking about this for 3 or 4 decades, and it's cited as the
raison-d'etre for things like C++. But C++ never lived up to this ideal.
There are no generic C++ frameworks. None. At All. (OK, so SGI had one or
two in the early 1990's ...) Something is ... missing...  in C++.  Compare
this to node.js and npm which are wildly successful over-achievers in this
category.  People regularly build large applications by assembling a
cacophony of tiny little javascript parts. Clearly, javascript has
something that C++ does not.  Something that makes the OO dream achievable
not just in theory, but regularly validated in practice.

Now, there are some down-sides to npm apps: they contain hundreds or
thousands of parts, and not all of them are well-maintained, and many have
published security vulnerabilities that remain unpatched. Worse, patching
some of them requires incompatible API changes that would break users. So it
has its own prickly and thorny issues that are unique and different from
those that other languages (python, scheme, c++) suffer from.

In the cloud world, there has long been, and continues to be a movement to
meshes of containerized applications. Here, docker is the prototypical
container -- lxc/lxd/lxe more generally.  Managing these containers
requires kubernetes, and more: the "service meshes" (istio, microsoft open
service mesh) provide a layer (a "control plane") that further manages
deployments, error fallbacks, a/b testing, circuit-breakers,
load-balancing, etc.   The mental model is that containerized apps are just
like npm nodes, except they are a million times bigger and beefier
(literally) and they all have network interfaces instead of javascript
methods/objects. And since they are so much bigger, they need more active
management.

Now compare the service-mesh idea to the olde-fashioned ideas of "web
shopping carts" or "content management systems" or "customer relationship
management systems".  Those things were single, monolithic black boxes that
you bought from a vendor (or installed via open-source) that automagically
did everything for you, once you configured a few templates.   They worked
great, as long as what you wanted was (a) a web shopping cart, and (b) was
customizable via some template or config file. If not .. you were SOL.

These monolithic architectures were their downfall, and were the driver behind
containers, kubernetes and service meshes. The founders of cloud startup
XYZ can't spray-paint some config files onto a monolith and then raise $20M
in venture funding.  But, give them a bunch of pieces-parts containers,
that they can hook up in some new, novel and exciting way, plus a little
secret sauce, and buzzword-bingo, a unicorn is born.

And this is why Cassandra makes me yawn with disinterest, if not a bit of
hostility. It's a big monolithic block. Sure, I can take the AtomSpace, and
plaster it onto Cassandra, like wrapping some wet paper around a rock. The
ultimate shape is still that of the rock, no matter how brightly-colored or
thoughtful that paper wrapped around it is.

So, I'm trying to grab hold of this idea of pieces-parts.  OpenCog needs
pieces-parts that can be arranged and re-assembled into that mesh that
provides the distributed-atomspace attributes and requirements du-jour.

Yes, of course, singularity.net is also pursuing a vision of pieces-parts
that can be assembled. Which is why I am a bit dumb-founded that we are
entertaining ideas like Cassandra -- it is the very antithesis of modular
architecture. It's the opposite of a dapp -- It's a big giant lump, the one
ring to rule them all. It's kind of exactly the poster-child for what not
to do ...

For a distributed atomspace, what we really need to focus on is
inter-operability, so that, like javascript (and unlike c++) it is easy to
assemble modules out of other modules.  Like containers, there should be
some fairly regularized API for communications (I nominate
atomese-as-ascii-strings i.e. s-expressions and maybe plan-B
atomese-as-json). With this under control, we can move on to creating
unique, custom services aka agents aka dapps or whatever these other things
might be.
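
To make the "regularized API" suggestion concrete, here is a minimal sketch
of the same Atom written out both ways: as an s-expression string and as
the plan-B JSON. Everything in it is an assumption made for illustration:
the tuple representation, the helper names to_sexpr and to_json, and the
JSON field names "type", "name" and "outgoing" are hypothetical, and none
of this is the AtomSpace API or an agreed wire format.

```python
import json

# Hypothetical in-memory form, for illustration only (not the AtomSpace API):
#   a node is (type, name), with name a string
#   a link is (type, [child atoms])

def to_sexpr(atom):
    """Render an atom as an ASCII s-expression string."""
    atom_type, payload = atom
    if isinstance(payload, str):
        # json.dumps yields a double-quoted, backslash-escaped ASCII literal;
        # using it for s-expression strings is a shortcut for this sketch.
        return f"({atom_type} {json.dumps(payload)})"
    return "(" + " ".join([atom_type] + [to_sexpr(a) for a in payload]) + ")"

def to_json(atom):
    """Render the same atom as a JSON-ready dict (field names are assumed)."""
    atom_type, payload = atom
    if isinstance(payload, str):
        return {"type": atom_type, "name": payload}
    return {"type": atom_type, "outgoing": [to_json(a) for a in payload]}

atom = ("EvaluationLink", [
    ("PredicateNode", "likes"),
    ("ListLink", [("ConceptNode", "Alice"), ("ConceptNode", "Bob")]),
])

wire_sexpr = to_sexpr(atom)                             # plan A
wire_json = json.dumps(to_json(atom), sort_keys=True)   # plan B
print(wire_sexpr)
print(wire_json)
```

Either string can be shipped between modules as-is. The s-expression form
is the shorter of the two here, and both are plain text that any module can
produce or parse without linking against the others.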

Again, I nominate the building-blocks idea: I took the earlier email, and
pasted it into the README, here: https://github.com/opencog/atomspace-agents

-- Linas


On Tue, Aug 11, 2020 at 5:04 PM Linas Vepstas <[email protected]>
wrote:

> This appears to evade/avoid acknowledging issue #1, which is the (CPU)
> overhead of translating between multiple formats, the competition for RAM
> that those formats entail, and the need to ship the resulting bytes between
> API's, or, worse, over (network or local) sockets.
>
> Sure, maybe cassandra has nice solutions for issues #2 #3 and #4, such as
> consistency, replication, etc. but until you address issue #1 frontally and
> completely, the remaining issues are utterly unimportant and even
> delusional.
>
> --linas
>
> On Tue, Aug 11, 2020 at 10:35 AM Ben Goertzel <[email protected]> wrote:
>
>> Matt,
>>
>> So regarding Cassandra, it's clear there are many cool things there...
>> From what I understand, the key differentiating functionality it seems
>> potentially able to offer would be: The ability to replicate atoms
>> locally accompanied by eventual consistency  ...
>>
>> As a first step, I wonder if it would make sense to try some simple
>> experiments w/ Cassandra to see if it really does this effectively for
>> an OpenCog context?   If you or anyone else w/ Cassandra experience
>> has time to experiment w/ this, it might be quite interesting...
>>
>> Is Cassandra's notion of eventual consistency significantly different
>> from that in Amazon's DynamoDB ?
>>
>> It seems that in some cases in OpenCog we might want to let two
>> versions of an Atom drift even further/longer than is commonly allowed
>> to happen in most Dynamo-based systems... but this really comes down
>> to, how flexible is the eventual consistency management /
>> configuration in these things?
>>
>> ben
>>
>> On Wed, Jul 29, 2020 at 12:19 PM Matt Chapman <[email protected]>
>> wrote:
>> >
>> > > Which peers?
>> > As determined by a token ring:
>> >
>> >
>> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archDataDistributeDistribute.html
>> >
>> > I think you could almost replace "vnode" with "chunk" if you wanted to
>> adopt the Cassandra architecture, although I wouldn't be surprised to see
>> performance problems with a huge number of vnodes, so it might actually
>> need to be a "chunk-hash modulo reasonable number of vnodes".
>> >
>> >  > How do you find them?
>> >
>> > By calculating the partition token via consistent hash, as Cassandra
>> does with Murmur3. This tells you the authoritative source for the chunk
>> you want. You might also have a local cache of other peers that have had
>> replicas of that chunk, in case any of them are more responsive to you.
>> Cassandra calls this process of finding potential replicas "Snitching".
>> >
>> >
>> >  > You are thinking Kademlia (as do I, when I think of publishing) or
>> OpenDHT or IPFS.
>> >
>> > Nope. I've only played with IPFS a bit, but I don't expect it to be
>> performant for the atomspace use case. I'm only vaguely familiar with
>> openDHT; it seems worth exploring, but I'm sure you understand it far
>> better than I do.
>> >
>> > I'm not very familiar with p2p systems like kademlia, but I suspect
>> that's optimized for consistency & availability over performance, so not
>> the right choice for datomspace.
>> >
>> > By this point, it should be clear that I look to Cassandra for how
>> semi-consistent distributed data storage systems should be designed. (Fwiw,
>> my inspiration for distributed messaging systems comes mostly from Apache
>> Kafka.)
>> >
>> >
>> > > Which is great, if all you're doing is publishing small amounts of
>> static, infrequently-changing information.  Not so much, if interacting or
>> blasting out millions of updates.  Neither system can handle that --
>> literally -- tried that, been there, done that. They are simply not
>> designed for that.
>> >
>> > Cassandra is.  To be fair, Cassandra is optimized for massive scale,
>> which may involve some trade-offs that are not desirable for present-day
>> atomspace use cases.
>> >
>> > See also, ScyllaDB for a C++ reimplementation of Cassandra.
>> >
>> > > Now, perhaps using only a hash-driven system, it is possible to
>> overcome these issues. I do not know how to do this. Perhaps someone does
>> -- perhaps there are even published papers ... I admit I did not do a
>> careful literature search.
>> >
>> > http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
>> >
>> http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
>> >
>> > Matt
>> >
>> >
>> >
>> > On Wed, Jul 29, 2020, 9:37 AM Linas Vepstas <[email protected]>
>> wrote:
>> >>
>> >>
>> >>
>> >> On Wed, Jul 29, 2020 at 1:09 AM Matt Chapman <[email protected]>
>> wrote:
>> >>>
>> >>> >I think it's a mistake to try to think of a distributed atomspace as
>> one super-giant, universe-filling uniform, undifferentiated blob of storage.
>> >>>
>> >>> > You don't want broadcast messages going out to the whole universe.
>> >>>
>> >>> Not sure if you intended to imply it, but the reality of the first
>> statement need not require the 2nd statement. Hashes of atoms/chunks can be
>> mapped via modulo onto hashes of peer IDs so that messages need only go to
>> one or few peers.
>> >>
>> >>
>> >> Which peers?  How do you find them? You are thinking Kademlia (as do
>> I, when I think of publishing) or OpenDHT or IPFS. Which is great, if all
>> you're doing is publishing small amounts of static, infrequently-changing
>> information.  Not so much, if interacting or blasting out millions of
>> updates.  Neither system can handle that -- literally -- tried that, been
>> there, done that. They are simply not designed for that.
>> >>
>> >> Now, perhaps using only a hash-driven system, it is possible to
>> overcome these issues. I do not know how to do this. Perhaps someone does
>> -- perhaps there are even published papers ... I admit I did not do a
>> careful literature search.
>> >>
>> >> But, basically, before we are even out of the gate, we already have a
>> snowball of problems with no obvious solution.  Haven't even written any
>> code, and are beset by technical problems. That's not an auspicious
>> beginning.
>> >>
>> >> If you have something more specific, let me know. Right now, I simply
>> don't know how to do this.
>> >>
>> >> --linas
>> >>>
>> >>>
>> >>> Specialization has a cost, in that you need to maintain some central
>> directory or gossip protocol so that peers can learn which other peers are
>> specialized to which purpose.
>> >>>
>> >>> An ideal general intelligence network may very well include both a
>> large number of generalist, undifferentiated peers and clusters of highly
>> interconnected specialized peers. If peers are neurons, I think this
>> describes the human nervous system also, no?
>> >>>
>> >>> To borrow terms from my previous message, generalist peers own many
>> atoms, and replicate few, while specialist peers own few or none, but
>> replicate many.
>> >>>
>> >>> Matt
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Jul 28, 2020, 10:36 PM Linas Vepstas <[email protected]>
>> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 28, 2020 at 11:41 PM Ben Goertzel <[email protected]>
>> wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Hmm... you are right that OpenCog hypergraphs have natural chunks
>> >>>>> defined by recursive incoming sets.   However, I think these chunks
>> >>>>> are going to be too small, in most real-life Atomspaces, to serve
>> the
>> >>>>> purpose of chunking for a distributed Atomspace
>> >>>>>
>> >>>>> I.e. it is true that in most cases the recursive incoming set of an
>> >>>>> Atom should all be in the same chunk.  But I think we will probably
>> >>>>> need to deal with chunks that are larger than the recursive incoming
>> >>>>> set of a single Atom, in very many cases.
>> >>>>
>> >>>>
>> >>>> I like the abstract to the Ja-be-ja paper, will read and ponder. It
>> sounds exciting.
>> >>>>
>> >>>> But ... the properties of a chunk depend on what you want to do
>> with it.
>> >>>>
>> >>>> For example: if some peer wants to declare a list of everything it
>> holds, then clearly, creating a list of all of its atoms is self-defeating.
>> But if some user wants some specific chunk, well, how does the user ask for
>> that? How does the user know what to ask for?   How does the user say "hey
>> I want that chunk which has these contents"?  Should the user say "deliver
>> to me all chunks that contain Atom X"? If the user says this, then how does
>> the peer/server know if it has any chunks with Atom X in it?  Does the
>> peer/server keep a giant index of all atoms it has, and what chunks they
>> are in? Is every peer/server obliged to waste some CPU cycles to figure out
>> if it's holding Atom X?  This gets yucky, fast.
>> >>>>
>> >>>> This is where QueryLinks are marvelous: the Query clearly states
>> "this is what I want" and the query is just a single Atom, and it can be
>> given an unambiguous, locally-computable (easily-computable; we already do
>> this)  80-bit or a 128-bit (or bigger) hash and that hash can be blasted
>> out to the network (I'm thinking Kademlia, again) in a compact way - it's
>> not a lot of bytes.  The request for the "query chunk" is completely
>> unambiguous, and the user does not have to make any guesses whatsoever
>> about what may be contained in that chunk.  Whatever is in there, is in
>> there. This solves the naming problem above.
>> >>>>
>> >>>>>
>> >>>>> What happens when the results for that (new) BindLink query are
>> spread
>> >>>>> among multiple peers on the network in some complex way?
>> >>>>
>> >>>>
>> >>>> I'm going to avoid this question for now, because "it depends" and
>> "not sure" and "I have some ideas".
>> >>>>
>> >>>> My gut impulse is that the problem splits into two parts: first,
>> find the peers that you want to work with, second, figure out how to work
>> with those peers.
>> >>>>
>> >>>> The first part needs to be fairly static, where a peer can advertise
>> "hey this is the kind of data I hold, this is the kind of work I'm willing
>> to perform." Once a group of peers is located, many of the scaling issues
>> go away: groups of peers tend to be small.  If they are not, you organize
>> them hierarchically, the way you might organize people, with specialists
>> for certain tasks.
>> >>>>
>> >>>> I think it's a mistake to try to think of a distributed atomspace as
>> one super-giant, universe-filling uniform, undifferentiated blob of
>> storage. I think we'll run into all sorts of conceptual difficulties and
>> design problems if you try to do that. If nothing else, it starts smelling
>> like quorum-sensing in bacteria. Which is not an efficient way to
>> communicate. You don't want broadcast messages going out to the whole
>> universe. Think instead of atomspaces connecting to one-another like
>> dendrites and axons: a limited number, a small number of connections
>> between atomspaces,  but point-to-point, sharing only the data that is
>> relevant for that particular peer-group.
>> >>>>
>> >>>> -- Linas
>> >>>>
>> >>>> --
>> >>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>> >>>>         --Peter da Silva
>> >>>>
>> >>>> --
>> >>>> You received this message because you are subscribed to the Google
>> Groups "opencog" group.
>> >>>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to [email protected].
>> >>>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/opencog/CAHrUA35zN4aaSrZ2Dpu4qLUL1bYfjAF_rGiS_xxg2-E-SBqY3Q%40mail.gmail.com
>> .
>> >>>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Ben Goertzel, PhD
>> http://goertzel.org
>>
>> “The only people for me are the mad ones, the ones who are mad to
>> live, mad to talk, mad to be saved, desirous of everything at the same
>> time, the ones who never yawn or say a commonplace thing, but burn,
>> burn, burn like fabulous yellow roman candles exploding like spiders
>> across the stars.” -- Jack Kerouac
>>
>>
>
>
>
>

-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA362wWYPdp1L5g%3DORV77XYc5aDLh4SEGtn-zPJ-JWWge4g%40mail.gmail.com.
