We want a large Atomspace, parts of which are in RAM on various machines, parts of which are in persistent storage, and the ability to run a variety of queries and processes across this whole Atomspace. I posted something about the "distributed PLN inference" use-case on this list not long ago.
The current "distributed Atomspace" functionality is cool, but it doesn't do this yet. It could be the foundation for a system doing the above, but it might also hit some serious problems. Matt is pointing out how Cassandra could potentially help work around some of these problems with its adjustable levels of consistency.

Coordinating a network of distributed sub-Atomspaces via a Postgres or RocksDB backing store in a hub-and-spokes architecture seems like it's not going to do what we ultimately need...

The document Matt Chapman linked above in this thread is the result of a lot of thought by a number of us, and I think it explains the above points much more thoroughly than I could in this brief email (plus a bunch of other points I didn't get to here).

ben

On Mon, Aug 10, 2020 at 11:57 AM Matt Chapman wrote:
>>>
>>> Does it meet the 7 business requirements in Ben's document:
>>> https://docs.google.com/document/d/1n0xM5d3C_Va4ti9A6sgqK_RV6zXi_xFqZ2ppQ5koqco/edit
>>> ?
>>
>> I have no clue. I've never seen this document before. It's only the 41st
>> document on this topic, and I'm suffering from reader-fatigue. Care to
>> summarize what it says?
>
> 1. Provide effective management of AtomSpaces that are too big to fit in the
>    RAM of any one available machine.
>
> 2. Decrease the overall processing time required to carry out AI operations,
>    to reduce the cost per AI operation.
>
> 3. Decrease the memory footprint, providing better overall throughput
>    compared with the current implementation, to reduce the cost per AI
>    operation.
>
> 4. Provide the ability to use the AtomSpace as a hierarchical cache
>    structure. In other words, provide a way to look for a specific Atom
>    locally before searching for it among other components and fetching it
>    remotely.
>
> 5. Provide the ability to request Atom(s) based on a given Atom's property.
>
> 6. Provide the ability to request subgraphs based on patterns.
>
> 7. Isolate the application layer from source code modification, keeping the
>    AtomSpace API as-is or with minor changes.
>
> The idea I'm suggesting, which I readily admit is worth 1/1000th of the
> effort required to implement it, is that a Cassandra-like architecture
> provides a very good solution for requirements number 4 & 5, and possibly a
> foundation for #6. It also provides #1 for some definition of "effective,"
> arguably better than any centralized architecture, for some definition of
> "better." :-) It may very likely fail at #2 and #3 compared to current
> alternatives; we won't know until someone builds & benchmarks it, and that's
> 1000x more effort...
>
> If you think that those requirements are already adequately served by
> existing solutions, then I will stop adding noise to the conversation.
> Otherwise, I'm happy to share more of my experiences if it might be helpful
> in formulating an approach.
>
> [Aside: Req. 1 here is in fundamental conflict with 2 & 3; usually "we" accept
> a local performance penalty in exchange for distributed & decentralized
> scalability. But Cassandra's Tunable Consistency model is the only way I know
> to expose this trade-off to the user on a per-query basis, which seems quite
> powerful to me, for the Atomspace use case. The value of Tunable Consistency
> (relative to its cost to implement) may be the thing I've failed to convince
> you of, in which case, I certainly trust your opinion more than mine.]
>
> All the Best,
>
> Matt
>
> --
> Please interpret brevity as me valuing your time, and not as any negative
> intention.
>
> On Thu, Aug 6, 2020 at 11:18 AM Linas Vepstas wrote:
>>
>> On Thu, Aug 6, 2020 at 11:30 AM Matt Chapman wrote:
>>>
>>> I've been hearing people talk about the need for distributed atomspace on
>>> and off for 8+ years,
>>
>> Me too. This was a head-scratcher, since we had a distributed atomspace.
>> So I was never sure why they talked about it.
>>
>>> and I've never seen an answer along the lines of "you can already have a
>>> cluster, here's the documentation on how to set it up."
>>
>> Here's the tutorial for it:
>> https://github.com/opencog/atomspace/blob/master/examples/atomspace/distributed-sql.scm
>>
>> I changed the name of the tutorial 5 days ago, because we now have not one,
>> not two, but four different distributed atomspace solutions (of which two
>> don't scale well)
>>
>> The instructions to set up each of the four are here:
>>
>> The oldest one, which is SQL-based:
>> https://github.com/opencog/atomspace/tree/master/opencog/persist/sql
>>
>> The newest one, which is cogserver-based, and my current favorite:
>> https://github.com/opencog/atomspace-cog
>>
>> The IPFS one, which is the one I love to hate:
>> https://github.com/opencog/atomspace-ipfs
>>
>> The DHT one, which I hope to revive maybe if we get a good chunking solution:
>> https://github.com/opencog/atomspace-dht
>>
>>> Does it meet the 7 business requirements in Ben's document:
>>> https://docs.google.com/document/d/1n0xM5d3C_Va4ti9A6sgqK_RV6zXi_xFqZ2ppQ5koqco/edit
>>> ?
>>
>> I have no clue. I've never seen this document before. It's only the 41st
>> document on this topic, and I'm suffering from reader-fatigue. Care to
>> summarize what it says?
>>
>> Performance: did anyone run any of the benchmarks on any of the distributed
>> AtomSpaces that we currently have? We *do* have benchmarks for them.
>> They're in https://github.com/opencog/benchmark/
>>
>> -- Linas
>>
>> --
>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>> --Peter da Silva
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "opencog" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/opencog/CAHrUA35PqjOXv8uu8QnnAYDx%3D3KDSgoX6w-cRtcnCFLz%3DZKYPw%40mail.gmail.com.

--
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn like fabulous yellow roman candles exploding like spiders across the stars.” -- Jack Kerouac

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CACYTDBdxT_pYSeoMSMD4znFdXjvUOsXOmTr22gQj1xP195SZ9Q%40mail.gmail.com.
