So

> a miracle genius insight to bypass this cruel reality?

So .. hypergraphs really are different from graphs, in some important ways.
The graph-algorithm people struggle to find "chunky" pieces, because they
are limited to just ... vertices and edges. We've got something ... better.
Hypergraphs have "natural chunks". Given an atom X, the corresponding
natural chunk is the entire, recursive incoming set of X.
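As a toy illustration of what "the entire, recursive incoming set" means, here's a sketch in plain Python, with dictionaries standing in for an atomspace. The data structures and function names here are hypothetical, purely for illustration; this is not the actual AtomSpace API.

```python
# Toy hypergraph: each link names the atoms in its outgoing set.
# (Hypothetical structures for illustration; not the AtomSpace API.)
links = {
    "eval-1": ["pred-likes", "list-1"],
    "list-1": ["concept-Anna", "concept-Bob"],
    "member-1": ["concept-Anna", "concept-people"],
}

def incoming_set(atom):
    """Links that directly contain the given atom."""
    return {link for link, outgoing in links.items() if atom in outgoing}

def recursive_incoming(atom):
    """The 'natural chunk' of an atom: every link that contains it,
    directly or transitively."""
    chunk, frontier = set(), {atom}
    while frontier:
        nxt = set()
        for a in frontier:
            for link in incoming_set(a):
                if link not in chunk:
                    chunk.add(link)
                    nxt.add(link)
        frontier = nxt
    return chunk

# "concept-Anna" sits inside list-1 and member-1, and list-1 in turn
# sits inside eval-1 -- so all three land in Anna's natural chunk.
print(sorted(recursive_incoming("concept-Anna")))
```

Note that the recursion terminates on its own: it stops exactly when there are no more containing links, with no judgment call about "well-connected-enough" needed.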

It's worth dwelling on this point for a moment. For an ordinary graph,
given a vertex V, it is tempting to define a "natural chunk" as V plus all
the edges E attached to it. But then one has the overwhelming urge to make
it just a tiny bit larger ... by also adding those edges that would form
complete triangles. If one succumbs to this urge, then, heck, let's include
those edges that make squares, or maybe instead, if an edge is attached to
a dense hub, then include that hub and spokes as well... all of a sudden,
it snowballs out of control, and the simple idea of a "natural chunk" in a
graph is not natural at all, it's just some snowball of things that were
deemed "well-connected-enough".

Somehow, hypergraph chunks don't succumb to this temptation. The incoming
set is very natural, and I can't think of any particular way that I would
want to make it "larger".  I can't think of any heuristic, rule-of-thumb,
common-sense idea to make it larger. So it really feels natural, it's
resistant to snowballing, it's endowed with a naturalness property that
ordinary graphs don't have.  ... and that's a good thing.  We can come back
to this, but it seems reasonable to set aside the chunking problem for now.

OK ... so a fully-decentralized peer-to-peer atomspace network. I think this
might be easier than you imagine, and that we are not very far from having
one. So ... first, a peer-to-peer network needs a way for two peers to chat
with each other. I think that part is "more-or-less" done, with the
https://github.com/opencog/atomspace-cog/ work. Basically, each peer runs a
cogserver, and can download from or upload to the other connected peers
whichever parts of an atomspace it wants. We should talk more about "what
parts each peer wants to up/download and share" (it would be better if you
tell me what you imagine these might be. Right now, I'm thinking mostly
"results of BindLinks or JoinLinks").

Next, a peer-to-peer network needs to have a peer-discovery system. I think
this might be relatively straightforward using either opendht or ipfs.
Several modes can apply.

* Authorities: a peer could publish an "authoritative record" meaning
basically it has the complete entire dataset, and is willing to share it.
e.g. "Genomic Data version 2.0 (Complete)" and the URL it can be found at.
e.g. "cog://example.com:27001/". We might want to stick a time-stamp in
there, maybe some other meta-data. Both IPFS and DHT provide natural ways
to ask "who's got 'Genomic Data version 2.0 (Complete)'?" and get back a
reply with the URL.
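To make the announce/lookup shape concrete, here's a minimal sketch. A plain dict stands in for the DHT; neither the real opendht nor IPFS API is invoked, and all record fields and function names are my own invention for illustration:

```python
import time

# A plain dict standing in for the DHT: dataset name -> list of records.
dht = {}

def announce(dataset, url, **meta):
    """Publish an 'authoritative record': who has the dataset, and where."""
    record = {"url": url, "timestamp": time.time(), **meta}
    dht.setdefault(dataset, []).append(record)

def lookup(dataset):
    """Ask 'who's got this dataset?' and get back the published records."""
    return dht.get(dataset, [])

# A peer announces that it holds the complete dataset ...
announce("Genomic Data version 2.0 (Complete)",
         "cog://example.com:27001/", complete=True)

# ... and anyone else can discover it by name.
for rec in lookup("Genomic Data version 2.0 (Complete)"):
    print(rec["url"], rec["complete"])
```

The real opendht/ipfs glue would replace the dict, but the shape of the exchange (publish a named record with a URL and metadata, query by name) would stay the same.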

* Peer networks. The idea here is that perhaps no one has the full dataset,
but everyone has some part of it. Suppose you want some portion of it.
Suppose that the portion you want can be described by a BindLink. You then
look (in dht/ipfs) to see which peers in the network have recent results
for that bindlink (i.e. they ran it recently, they've got hot, recent
results cached). You then contact those peers (at cog://example.com/ using
the existing cogserver) to get those results. It is up to you to merge them
all together (deduplicate). And then you would announce "hey I've recently
run this bindlink, I've got cached results, in case anyone cares" (this is
all automated under the covers).
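The merge-and-deduplicate step is easy precisely because atoms are unique by content: two peers returning "the same" result return literally identical atoms. A sketch (the string representations here are hypothetical stand-ins for result atoms):

```python
# Hypothetical cached results from three peers, all for the same
# BindLink, represented as sets of result atoms (strings stand in
# for atoms here). Since atoms are globally unique by content,
# merging is just a deduplicating set-union.
peer_results = [
    {"(Inheritance gene-A disease-X)", "(Inheritance gene-B disease-X)"},
    {"(Inheritance gene-B disease-X)", "(Inheritance gene-C disease-X)"},
    {"(Inheritance gene-A disease-X)"},
]

merged = set().union(*peer_results)
print(len(merged))  # five answers arrived, but only three are distinct
```

After merging, the peer would announce its own freshly-cached result set back into the DHT, so the next asker finds it.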

It's possible that none of the peers on the network have recent results for
the bindlink. In that case, you have to ask them "oh pretty please can you
run this query for me?" -- you ask because some might refuse to, because
they may be overloaded. e.g. they may be willing to serve up incoming-sets,
but not bindlinks. You can then punt, or you can ask for the incoming sets
if you think you don't have everything you need.
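The cascade described above (prefer cached results, then politely ask peers to run the query, then punt to incoming sets) can be sketched as follows. The Peer class and every method on it are hypothetical; this is not the cogserver API, just the fallback logic:

```python
# A hypothetical peer: it may hold cached results, and it may refuse
# to run fresh queries when overloaded. Not the real cogserver API;
# this only sketches the fallback cascade.
class Peer:
    def __init__(self, cached=None, overloaded=False):
        self.cached = cached
        self.overloaded = overloaded

    def cached_results(self, query):
        """Return hot cached results, or None if we have none."""
        return self.cached

    def run_query(self, query):
        """Run the query on request -- unless we're overloaded."""
        if self.overloaded:
            return None  # polite refusal
        return {f"fresh result for {query}"}

def fetch_results(peers, query):
    # 1. Prefer peers that already have hot, recent cached results.
    for p in peers:
        r = p.cached_results(query)
        if r:
            return r
    # 2. Otherwise, ask nicely; an overloaded peer may refuse.
    for p in peers:
        r = p.run_query(query)
        if r is not None:
            return r
    # 3. Punt: the caller falls back to fetching incoming sets.
    return None

peers = [Peer(overloaded=True), Peer(cached={"cached result"})]
print(fetch_results(peers, "my-bindlink"))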

I'm glossing over details, but I think all this is quite doable, and just
not that hard.

The meta-issue here really is: "who needs this, and by when?" I know that
Habush wants to have a genomic data server, but he has not expressed how
much data he wants it to store, how many peers he wants to hang off of it,
or whether he wants to have multiple redundant servers, or what. He was
inventing his own server infrastructure to do this ... which is not a bad
idea, but I think a network of cogservers could also do the trick.

Wait, let me rephrase that last sentence: I don't have any clear idea of
what is wanted for some genomic-data-service, so I can't directly claim
that the existing network of cogservers "does the trick". Maybe it does? If
not, why not? What desirable function is missing?

See where I'm going with this? I'm basically saying "let's prototype some
of these basics, and use them in an actual project, and see how it goes".
--linas
