You misunderstood my first comment; I was agreeing with you that Cassandra-backed storage & distribution won't be faster than what I thought you were suggesting: a client-server model where one Rocks-backed atom server is used by many clients who retrieve, manipulate, and return atoms to the central server.
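For concreteness, the client-server model described above can be sketched roughly as follows. This is a toy illustration only, not OpenCog code: `Atom`, `AtomServer`, `Client`, and all of their methods are hypothetical names, and a plain Python dict stands in for the RocksDB-backed store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an atom: a type plus a name."""
    type: str
    name: str


class AtomServer:
    """Toy central server; a dict stands in for the RocksDB-backed store."""

    def __init__(self):
        self._store = {}  # maps Atom -> dict of values (e.g. a count)

    def fetch(self, atom):
        # Hand back a copy, as if the data had crossed the network.
        return dict(self._store.get(atom, {}))

    def store(self, atom, values):
        # Last writer wins; no consistency guarantee is attempted here.
        self._store[atom] = dict(values)


class Client:
    """Toy client that retrieves, manipulates, and returns atoms."""

    def __init__(self, server):
        self.server = server

    def increment(self, atom):
        values = self.server.fetch(atom)               # retrieve
        values["count"] = values.get("count", 0) + 1   # manipulate
        self.server.store(atom, values)                # return to server
        return values["count"]


server = AtomServer()
client_a, client_b = Client(server), Client(server)
cat = Atom("ConceptNode", "cat")
client_a.increment(cat)
client_b.increment(cat)
```

In this sketch every read and write goes through the one server object, so both the total storage and the number of parallel operations are bounded by what that single server can handle.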
Maybe you're suggesting something very different, or I'm just very confused, because I've been hearing people talk about the need for a distributed atomspace on and off for 8+ years, and I've never seen an answer along the lines of "you can already have a cluster, here's the documentation on how to set it up." If that was the answer, and people rejected it because of the lack of disk-backed persistence, then I agree with you that RocksDB may solve much of the problem.

Maybe the unsolved part of the problem is consistency/consensus? I tend to agree with your sentiment that consistency is overrated for Atomspace use cases, often not needed or not desirable, but it seems like maybe Ben and others are seeking something like Tunable Consistency. Maybe this is the big chocolate sprinkle?

> Who is "we"?

Practitioners of the computing arts & sciences, in general.

> We've had the ability to run distributed AtomSpaces that far exceed installed RAM, running on a cluster, for more than a decade.

Does it meet the 7 business requirements in Ben's document: https://docs.google.com/document/d/1n0xM5d3C_Va4ti9A6sgqK_RV6zXi_xFqZ2ppQ5koqco/edit ?

Points 2 & 3 are about performance improvements; do you believe such improvements are impossible, or that they would require more effort than the likely benefits would justify?

Of the other 5, which already exist and which are what you call "chocolate sprinkles"?

All the Best,

Matt

--
Please interpret brevity as me valuing your time, and not as any negative intention.


On Wed, Aug 5, 2020 at 11:32 PM Linas Vepstas <[email protected]> wrote:

> On Wed, Aug 5, 2020 at 11:59 PM Matt Chapman <[email protected]> wrote:
>
>>> I'll bet you a bottle of good wine or other neuroactive substance that the existing atomspace client-server infrastructure is faster than Cassandra.
>>
>> No, it won't be faster,
>
> wrong.
>
>> but you'll never be able to store an atomspace bigger than what you can fit in memory
>
> That is also wrong.
>
>> on that single atomserver, and you'll never be able to perform more operations (on the canonical atomspace) in parallel than what that one atom server can support.
>
> That has been wrong for 12+ years.
>
>> Obviously distributed systems have a performance penalty.
>
> We've had a distributed atomspace for 12+ years.
>
>> We don't build them because we need to go faster (at the level of a single process), we build them because we need to go bigger (in terms of storage space or parallel processes).
>
> Who is "we"?
>
> We've had the ability to run distributed AtomSpaces that far exceed installed RAM, running on a cluster, for more than a decade. People talk about this as if it doesn't exist or it doesn't work or there's something wrong with it, or they want something with more chocolate sprinkles on it.
>
> I'm annoyed. Seriously, is no one actually paying attention to anything? WTF.
>
> --linas
>
>> All the Best,
>>
>> Matt
>>
>> --
>> Please interpret brevity as me valuing your time, and not as any negative intention.
>>
>> On Wed, Aug 5, 2020 at 12:16 AM Linas Vepstas <[email protected]> wrote:
>>
>>> LevelDB/RocksDB and Cassandra are apples and kumquats.
>>>
>>> LevelDB/RocksDB are C++ libraries, single-user, non-networked, non-distributed, link directly into the app, store data directly in files, on the local system. They are "embedded databases". So conceptually, they are like the 50-year old unix dbm, except that they have 50 years of computer science behind them, such as bloom filters and log-structured merge trees and what-not (e.g. Rocks is explicitly optimized for SSD disks.). LevelDB was created by google in 2011. Then facebook took levelDB in 2013 and forked it to create rocksdb, and added a bunch of stuff, made some parts run faster.
>>>
>>> Just like dbm, people use leveldb/rocksdb to build *other* databases on top of it.
>>> (that's the beauty of "embedded") For example, there's some version of MariaDB that uses RocksDB for the actual storage.
>>>
>>> Cassandra is written in Java; it's a network database, so basically, it's like postgres, except it's not postgres, because it uses CQL instead of SQL so it's not actually SQL compatible. Otherwise, it has exactly all of the exact same issues that any other networked client-server database has, including the need for an experienced DB Admin to set it up, run it, administer it. (This is an easily forgotten but important detail -- vs rocksdb just ... writes to a file. No DBAdmin required.)
>>>
>>> For the app developer (i.e. me) one must write in a custom query language -- CQL, convert my data into CQL format, send that data via tcpip to the server, which unpacks it, then runs its interpreter to figure out what I said/wanted, unpacks my data packets, converts them into its own internal format, (so, that's a second format conversion) and actually performs whatever operations I had specified. This is conceptually identical to *any* client-server database.
>>>
>>> For CQL I copy from wikipedia:
>>>
>>> CREATE KEYSPACE MyKeySpace
>>>   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
>>> USE MyKeySpace;
>>> CREATE COLUMNFAMILY MyColumns (id text, Last text, First text, PRIMARY KEY(id));
>>> INSERT INTO MyColumns (id, Last, First) VALUES ('1', 'Doe', 'John');
>>> SELECT * FROM MyColumns;
>>>
>>> Looks identical to SQL except it's not actually compatible. Yuck. This offers exactly zero advantages of SQL that I can see; the fact that it's key-value somewhere in there offers no perceivable advantage that I can make out.
>>>
>>> I'll bet you a bottle of good wine or other neuroactive substance that the existing atomspace client-server infrastructure is faster than Cassandra.
>>> That is -- start a cogserver, as is, today, open the rocksdb backend under it (so everything going to the cogserver gets stored), and then let other atomspaces connect to the cogserver (using the existing client-server code), and you will have a distributed atomspace that runs faster than cassandra.
>>>
>>> OK, it doesn't have any of those other bells-n-whistles in cassandra, but no one really knows how to do anything useful with those other bells-n-whistles, other than to suggest that they might be somehow useful in some way, maybe, for something. That surpasses my attention span.
>>>
>>> --linas
>>>
>>> On Tue, Aug 4, 2020 at 11:39 PM Ben Goertzel <[email protected]> wrote:
>>>
>>>> I wonder how different would be the API for RocksDB vs., say, Cassandra which Matt Chapman has recommended (which may have some advantages in terms of allowing more configurable/flexible notions of consistency?)
>>>>
>>>> On Tue, Aug 4, 2020 at 4:44 PM Linas Vepstas <[email protected]> wrote:
>>>> >
>>>> > On Tue, Aug 4, 2020 at 11:51 AM Ben Goertzel <[email protected]> wrote:
>>>> >>
>>>> >> Wow!
>>>> >
>>>> > You're welcome. Querying from the database is now supported. The demo is in https://github.com/opencog/atomspace-rocks/blob/master/examples/query-storage.scm
>>>> >
>>>> > At the moment it works, but I'm rethinking the API. Do check it out. Feedback, opinions, suggestions, etc. invited.
>>>> >
>>>> > --linas
>>>> >
>>>> >> On Tue, Aug 4, 2020, 8:45 AM Linas Vepstas <[email protected]> wrote:
>>>> >>>
>>>> >>> On Thu, Jul 30, 2020 at 11:20 AM Ben Goertzel <[email protected]> wrote:
>>>> >>>>
>>>> >>>> -- send a Pattern Matcher query to BackingStore
>>>> >>>> -- send the Atom-chunk resulting from the query to Atomspace
>>>> >>>
>>>> >>> So,
>>>> >>>
>>>> >>> Someone needed to prove me wrong, and who better to do that but me. I took the weekend to implement a file-based backing store, using RocksDB (which itself is a variant on LevelDB). It's here: https://github.com/opencog/atomspace-rocks
>>>> >>>
>>>> >>> -- It works, all of the old persistent store unit tests pass (there are 8 of them)
>>>> >>> -- it's faster than the SQL by factors of 2x to 5x depending on dataset. With tuning, maybe one could do better. (I have no plans to tune, right now)
>>>> >>>
>>>> >>> I'm certain I know of a simple/easy way to "send a Pattern Matcher query to BackingStore and send the Atom-chunk resulting from the query to Atomspace" and will implement this afternoon (famous last words...) BTW, you can *already* do this with the cogserver-based network client (i.e. without sql, just the network only) here: https://github.com/opencog/atomspace-cog/blob/master/examples/remote-query.scm
>>>> >>>
>>>> >>> By combining these two backends, I think you can get file-backed storage that is also network-enabled. Or rather, you have two key building blocks for exploring both distributed and also decentralized designs.
>>>> >>>
>>>> >>> Some background info, from the README:
>>>> >>>
>>>> >>> AtomSpace RocksDB Backend
>>>> >>> =========================
>>>> >>>
>>>> >>> Save and restore AtomSpace contents to a RocksDB database. The RocksDB database is a single-user, local-host-only file-backed database.
>>>> >>> That means that only one AtomSpace can connect to it at any given moment.
>>>> >>>
>>>> >>> In ASCII-art:
>>>> >>>
>>>> >>> ```
>>>> >>>  +-------------+
>>>> >>>  |  AtomSpace  |
>>>> >>>  |             |
>>>> >>>  +---- API-----+
>>>> >>>  |             |
>>>> >>>  |   RocksDB   |
>>>> >>>  |    files    |
>>>> >>>  +-------------+
>>>> >>> ```
>>>> >>> RocksDB (see https://rocksdb.org/) is an "embeddable persistent key-value store for fast storage." The goal of layering the AtomSpace on top of it is to provide fast persistent storage for the AtomSpace. There are several advantages to doing this:
>>>> >>>
>>>> >>> * RocksDB is file-based, and so it is straight-forward to make backup copies of datasets, as well as to share these copies with others.
>>>> >>> * RocksDB runs locally, and so the overhead of pushing bytes through the network is eliminated. The remaining inefficiencies/bottlenecks have to do with converting between the AtomSpace's natural in-RAM format, and the position-independent format that all databases need. (Here, we say "position-independent" in that the DB format does not contain any C/C++ pointers; all references are managed with local unique ID's.)
>>>> >>> * RocksDB is a "real" database, and so enables the storage of datasets that might not otherwise fit into RAM. This back-end does not try to guess what your working set is; it is up to you to load, work with and save those Atoms that are important for you. The [examples](examples) demonstrate exactly how that can be done.
>>>> >>>
>>>> >>> This backend, together with the CogServer-based [network AtomSpace](https://github.com/opencog/atomspace-cog) backend provides a building-block out of which more complex distributed and/or decentralized AtomSpaces can be built.
>>>> >>>
>>>> >>> Status
>>>> >>> ------
>>>> >>> This is **Version 0.8.0**. All unit tests pass. All known issues have been fixed. This could effectively be version 1.0; waiting on user feedback.
>>>> >>>
>>>> >>> -- Linas
>>>> >>>
>>>> >>> --
>>>> >>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>>> >>> --Peter da Silva
>>>> >>>
>>>> >>> --
>>>> >>> You received this message because you are subscribed to the Google Groups "opencog" group.
>>>> >>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> >>> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAHrUA37Agw0cg5gJX1fDffvSAjcW1kq4LdMOSuyknaEC_41F1g%40mail.gmail.com .
>>>> >
>>>> > --
>>>> > Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>>> > --Peter da Silva
>>>>
>>>> --
>>>> Ben Goertzel, PhD
>>>> http://goertzel.org
>>>>
>>>> “The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn like fabulous yellow roman candles exploding like spiders across the stars.” -- Jack Kerouac
>>>
>>> --
>>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>> --Peter da Silva
>
> --
> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
> --Peter da Silva
