Part the second.

This email only deals with the kinds of communications that two peers may
want to have with each other, in a distributed AtomSpace.  It's a
cut-n-paste of a proposed API (in C++; it is not hard to imagine the
python/scheme equivalents).  I think it is general enough to do everything
that Habush wants for the genomic data servers.

Since it's just an API for now, perhaps it's time to ponder an RFI process
-- Request for Implementation.  It's a cut and paste of some code I plan to
merge "real soon now", mostly because it's almost dirt-simple to
implement.  The API is just ... one!!  new function:

      /**
       * Run the `query` on the remote server, and place the query
       * results onto the `key`, both locally, and remotely.
       * The `query` must be either a JoinLink, MeetLink or QueryLink.
       *
       * Because MeetLinks and QueryLinks can be cpu-intensive, not
       * all backends will honor this request. (JoinLinks will be
       * honored, in general; they can be thought of as a generalized
       * incoming set, and are much faster to process.) Backends are
       * free to return previously-cached results for the search,
       * rather than running a fresh search. If the flag `fresh` is
       * set to `true`, then the server may interpret this as a
       * request to perform a fresh search.  It is not required to
       * honor this request.
       *
       * If the `metadata_key` is provided, then metadata about the
       * search is returned. This may include a time-stamp indicating
       * when the search was last performed. If the search was refused,
       * a value indicating that will be returned.  The metadata is
       * intended to allow the receiver (i.e. the user of this local
       * AtomSpace) to decide what to do next.
       *
       * Note that the remote server may periodically purge search
       * results to save on storage usage. This is why the search
       * results are returned and placed in the local space.
       *
       * Only the Atoms that were the result of the search are returned.
       * Any Values hanging off those Atoms are not transferred from the
       * remote server to the local AtomSpace.
       *
       * FYI Design Note: in principle, I suppose that we could have
       * this method run any atom that has an `execute()` method on
       * it. At this time, this is not allowed, for somewhat vague
       * and arbitrary reasons: (1) we do not want to DDOS the remote
       * server with heavy CPU processing demands (you can use the
       * CogServer directly, if you want to do that); and (2) we want
       * to limit the amount of complexity that the remote server
       * implementation must provide. For example, there's a slim
       * chance that traditional SQL and GraphQL servers might be able
       * to support some of the simpler queries.  If you want a
       * full-function hypergraph query, just use the CogServer
       * directly.
       */
      virtual void runQuery(const Handle& query, const Handle& key,
                            Handle metadata_key = Handle::UNDEFINED,
                            bool fresh=false);

That's it. This should take less than a day to implement (famous last
words).

--linas


On Tue, Jul 28, 2020 at 6:17 PM Linas Vepstas <[email protected]>
wrote:

> So
>
> > a miracle genius insight to bypass this cruel reality?
>
> So .. hypergraphs really are different than graphs, in some important
> ways. The graph-algorithm people struggle to find "chunky" pieces, because
> they are limited to just ... vertexes and edges. We've got something ...
> better. Hypergraphs have "natural chunks". Given an atom X, the
> corresponding natural chunk is the entire, recursive incoming set of X.
>
> It's worth dwelling on this point for a moment. For an ordinary graph,
> given a vertex V, it is tempting to define a "natural chunk" as V plus all
> the edges E attached to it. But then one has the overwhelming urge to make
> it just a tiny bit larger ... by also adding those edges that would form
> complete triangles. If one succumbs to this urge, then, heck, lets include
> those edges that make squares, or maybe instead, if an edge is attached to
> a dense hub, then include that hub and spokes as well... all of a sudden,
> it snowballs out of control, and the simple idea of a "natural chunk" in a
> graph is not natural at all, it's just some snowball of things that were
> deemed "well-connected-enough".
>
> Somehow, hypergraph chunks don't succumb to this temptation. The incoming
> set is very natural, and I can't think of any particular way that I would
> want to make it "larger".  I can't think of any heuristic, rule-of-thumb,
> common-sense idea to make it larger. So it really feels natural, it's
> resistant to snowballing, it's endowed with a naturalness property that
> ordinary graphs don't have.  ... and that's a good thing.  We can come back
> to this, but it seems reasonable to set aside the chunking problem for now.
>
> OK ... so a full-decentralized peer-to-peer atomspace network. I think
> this might be easier than you imagine, and that we are not very far from
> having one.  So ... first, a peer-to-peer network needs to have a way for
> two peers to chat with each-other. I think that part is "more-or-less"
> done, with the https://github.com/opencog/atomspace-cog/ work. Basically,
> each peer runs a cogserver, and it can download or upload those parts of an
> atomspace that it wants with the other connected peers. We should talk more
> about "what parts each peer wants to up/download and share" (it would be
> better if you tell me what you imagine these might be. Right now, I'm
> thinking mostly "results of BindLinks or JoinLinks").
>
> Next, a peer-to-peer network needs to have a peer-discovery system. I
> think this might be relatively straightforward using either opendht or
> ipfs. There are several modes that can apply.
>
> * Authorities: a peer could publish an "authoritative record" meaning
> basically it has the complete entire dataset, and is willing to share it.
> e.g. "Genomic Data version 2.0 (Complete)" and the URL it can be found at.
> e.g. "cog://example.com:27001/". We might want to stick a time-stamp in
> there, maybe some other meta-data. Both IPFS and DHT provide natural ways
> to ask "who's got 'Genomic Data version 2.0 (Complete)'?" and get back a
> reply with the URL.
>
> * Peer networks. The idea here is that (if) no one has the full dataset,
> but everyone has some part of it. Suppose you want some portion of it.
> Suppose that the portion you want can be described by a BindLink. You then
> look (in dht/ipfs) to see which peers in the network have recent results
> for that bindlink (i.e. they ran it recently, they've got hot, recent
> results cached). You then contact those peers (at cog://example.com/
> using the existing cogserver) to get those results. It is up to you to
> merge them all together (deduplicate). And then you would announce "hey
> I've recently run this bindlink, I've got cached results, in case anyone
> cares" (this is all automated under the covers).
>
> It's possible that none of the peers on the network have recent results
> for the bindlink. In which case, you have to ask them "oh pretty please can
> you run this query for me?" -- you ask because some might refuse to,
> because they may be overloaded. e.g. they may be willing to serve up
> incoming-sets, but not bindlinks. You can then punt, or you can ask for the
> incoming sets if you think you don't have everything you need.
>
> I'm glossing over details, but I think all this is quite doable, and just
> not that hard.
>
> The meta-issue here really is: "who needs this, and by when?" I know that
> Habush wants to have a genomic data server, but he has not expressed how
> much data he wants it to store, how many peers he wants to hang off of it,
> or whether he wants to have multiple redundant servers, or what. He was
> inventing his own server infrastructure to do this ... which is not a bad
> idea, but I think a network of cogservers could also do the trick.
>
> Wait, let me rephrase that last sentence: I don't have any clear idea of
> what is wanted for some genomic-data-service, so I can't directly claim
> that the existing network of cogservers "does the trick". Maybe it does? If
> not, why not? What desirable function is missing?
>
> See where I'm going with this? I'm basically saying "let's prototype some
> of these basics, and use them in an actual project, and see how it goes".
> --linas
>


-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva
