> Which peers?
As determined by a token ring:

https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archDataDistributeDistribute.html

I think you could almost replace "vnode" with "chunk" if you wanted to
adopt the Cassandra architecture, although I wouldn't be surprised to see
performance problems with a huge number of vnodes, so it might actually
need to be a "chunk-hash modulo reasonable number of vnodes".
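As a minimal sketch of that "chunk-hash modulo reasonable number of vnodes" idea (Python; the 256-vnode count is an arbitrary placeholder, and sha256 is just a stand-in for Cassandra's Murmur3 partitioner):

```python
import hashlib

NUM_VNODES = 256  # the "reasonable number" -- a tuning knob, not a recommendation


def vnode_for_chunk(chunk_id: bytes) -> int:
    """Map a chunk onto a fixed set of vnodes via hash modulo."""
    # sha256 here is a stand-in for Cassandra's Murmur3 partitioner
    token = int.from_bytes(hashlib.sha256(chunk_id).digest()[:8], "big")
    return token % NUM_VNODES
```

The point being that the chunk id alone determines the vnode, with no lookup table and no broadcast.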

> How do you find them?

By calculating the partition token via consistent hash, as Cassandra does
with Murmur3. This tells you the authoritative source for the chunk you
want. You might also keep a local cache of other peers that hold replicas
of that chunk, in case any of them are more responsive to you. In
Cassandra, the "snitch" plays this role, ranking replicas by proximity.
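The lookup above can be sketched as a toy token ring (a minimal sketch, not Cassandra's actual implementation; the peer names, 64-bit tokens, and sha256-for-Murmur3 substitution are all my assumptions):

```python
import bisect
import hashlib


def token(key: bytes) -> int:
    # sha256 as a stand-in for Cassandra's Murmur3 partitioner
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


class TokenRing:
    """Toy token ring: each peer owns the arc of token space ending at its own token."""

    def __init__(self, peers):
        self.ring = sorted((token(p.encode()), p) for p in peers)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, chunk_id: bytes) -> str:
        # first peer whose token is >= the chunk's token, wrapping around the ring
        i = bisect.bisect_left(self.tokens, token(chunk_id)) % len(self.ring)
        return self.ring[i][1]
```

Any peer that knows the ring membership can compute the authoritative owner of any chunk locally, with no directory queries.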


> You are thinking Kademlia (as do I, when I think of publishing) or
> OpenDHT or IPFS.

Nope. I've only played with IPFS a bit, but I don't expect it to be
performant for the atomspace use case. I'm only vaguely familiar with
OpenDHT; it seems worth exploring, but I'm sure you understand it far
better than I do.

I'm not very familiar with p2p systems like Kademlia, but I suspect
they're optimized for consistency and availability over performance, so not
the right choice for a distributed atomspace.

By this point, it should be clear that I look to Cassandra for how
semi-consistent distributed data storage systems should be designed. (Fwiw,
my inspiration for distributed messaging systems comes mostly from Apache
Kafka.)


> Which is great, if all you're doing is publishing small amounts of
> static, infrequently-changing information.  Not so much, if interacting or
> blasting out millions of updates.  Neither system can handle that --
> literally -- tried that, been there, done that. They are simply not
> designed for that.

Cassandra is.  To be fair, Cassandra is optimized for massive scale, which
may involve some trade-offs that are not desirable for present-day
atomspace use cases.

See also ScyllaDB, a C++ reimplementation of Cassandra.

> Now, perhaps using only a hash-driven system, it is possible to overcome
> these issues. I do not know how to do this. Perhaps someone does -- perhaps
> there are even published papers ... I admit I did not do a careful
> literature search.

http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

Matt



On Wed, Jul 29, 2020, 9:37 AM Linas Vepstas <[email protected]> wrote:

>
>
> On Wed, Jul 29, 2020 at 1:09 AM Matt Chapman <[email protected]>
> wrote:
>
>> >I think it's a mistake to try to think of a distributed atomspace as one
>> super-giant, universe-filling uniform, undifferentiated blob of storage.
>>
>> > You don't want broadcast messages going out to the whole universe.
>>
>> Not sure if you intended to imply it, but the first statement need not
>> require the second. Hashes of atoms/chunks can be mapped via modulo onto
>> hashes of peer IDs so that messages need only go to one or a few peers.
>>
>
> Which peers?  How do you find them? You are thinking Kademlia (as do I,
> when I think of publishing) or OpenDHT or IPFS. Which is great, if all
> you're doing is publishing small amounts of static, infrequently-changing
> information.  Not so much, if interacting or blasting out millions of
> updates.  Neither system can handle that -- literally -- tried that, been
> there, done that. They are simply not designed for that.
>
> Now, perhaps using only a hash-driven system, it is possible to overcome
> these issues. I do not know how to do this. Perhaps someone does -- perhaps
> there are even published papers ... I admit I did not do a careful
> literature search.
>
> But, basically, before we are even out of the gate, we already have a
> snowball of problems with no obvious solution.  Haven't even written any
> code, and are beset by technical problems. That's not an auspicious
> beginning.
>
> If you have something more specific, let me know. Right now, I simply
> don't know how to do this.
>
> --linas
>
>>
>> Specialization has a cost, in that you need to maintain some central
>> directory or gossip protocol so that peers can learn which other peers are
>> specialized to which purpose.
>>
>> An ideal general intelligence network may very well include both a large
>> number of generalist, undifferentiated peers and clusters of highly
>> interconnected specialized peers. If peers are neurons, I think this
>> describes the human nervous system also, no?
>>
>> To borrow terms from my previous message, generalist peers own many
>> atoms and replicate few, while specialist peers own few or none, but
>> replicate many.
>>
>> Matt
>>
>>
>>
>> On Tue, Jul 28, 2020, 10:36 PM Linas Vepstas <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jul 28, 2020 at 11:41 PM Ben Goertzel <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> Hmm... you are right that OpenCog hypergraphs have natural chunks
>>>> defined by recursive incoming sets.   However, I think these chunks
>>>> are going to be too small, in most real-life Atomspaces, to serve the
>>>> purpose of chunking for a distributed Atomspace
>>>>
>>>> I.e. it is true that in most cases the recursive incoming set of an
>>>> Atom should all be in the same chunk.  But I think we will probably
>>>> need to deal with chunks that are larger than the recursive incoming
>>>> set of a single Atom, in very many cases.
>>>>
>>>
>>> I like the abstract to the Ja-be-ja paper, will read and ponder. It
>>> sounds exciting.
>>>
>>> But ... the properties of a chunk depend on what you want to do with
>>> it.
>>>
>>> For example: if some peer wants to declare a list of everything it
>>> holds, then clearly, creating a list of all of its atoms is self-defeating.
>>> But if some user wants some specific chunk, well, how does the user ask for
>>> that? How does the user know what to ask for?   How does the user say "hey
>>> I want that chunk which has these contents"?  Should the user say "deliver
>>> to me all chunks that contain Atom X"? If the user says this, then how does
>>> the peer/server know if it has any chunks with Atom X in it?  Does the
>>> peer/server keep a giant index of all atoms it has, and what chunks they
>>> are in? Is every peer/server obliged to waste some CPU cycles to figure out
>>> if it's holding Atom X?  This gets yucky, fast.
>>>
>>> This is where QueryLinks are marvelous: the Query clearly states "this
>>> is what I want" and the query is just a single Atom, and it can be given an
>>> unambiguous, locally-computable (easily-computable; we already do this)
>>> 80-bit or a 128-bit (or bigger) hash and that hash can be blasted out to
>>> the network (I'm thinking Kademlia, again) in a compact way - it's not a lot
>>> of bytes.  The request for the "query chunk" is completely unambiguous, and
>>> the user does not have to make any guesses whatsoever about what may be
>>> contained in that chunk.  Whatever is in there, is in there. This solves
>>> the naming problem above.
>>>
>>>
>>>> What happens when the results for that (new) BindLink query are spread
>>>> among multiple peers on the network in some complex way?
>>>>
>>>
>>> I'm going to avoid this question for now, because "it depends" and "not
>>> sure" and "I have some ideas".
>>>
>>> My gut impulse is that the problem splits into two parts: first, find
>>> the peers that you want to work with, second, figure out how to work with
>>> those peers.
>>>
>>> The first part needs to be fairly static, where a peer can advertise
>>> "hey this is the kind of data I hold, this is the kind of work I'm willing
>>> to perform." Once a group of peers is located, many of the scaling issues
>>> go away: groups of peers tend to be small.  If they are not, you organize
>>> them hierarchically, the way you might organize people, with specialists
>>> for certain tasks.
>>>
>>> I think it's a mistake to try to think of a distributed atomspace as one
>>> super-giant, universe-filling uniform, undifferentiated blob of storage. I
>>> think we'll run into all sorts of conceptual difficulties and design
>>> problems if you try to do that. If nothing else, it starts smelling like
>>> quorum-sensing in bacteria. Which is not an efficient way to communicate.
>>> You don't want broadcast messages going out to the whole universe. Think
>>> instead of atomspaces connecting to one-another like dendrites and axons: a
>>> limited number, a small number of connections between atomspaces,  but
>>> point-to-point, sharing only the data that is relevant for that particular
>>> peer-group.
>>>
>>> -- Linas
>>>
>>> --
>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>>         --Peter da Silva
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "opencog" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/opencog/CAHrUA35zN4aaSrZ2Dpu4qLUL1bYfjAF_rGiS_xxg2-E-SBqY3Q%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/opencog/CAHrUA35zN4aaSrZ2Dpu4qLUL1bYfjAF_rGiS_xxg2-E-SBqY3Q%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>
>
> --
> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>         --Peter da Silva
>
>
