Re: Decentralized building blocks [was Re: [opencog-dev] Distributed Atomspace

Linas Vepstas Fri, 07 Aug 2020 13:45:53 -0700

Sorry Matt, I was very snippy. Making things work takes a lot of time and
effort. It's frustrating when that effort isn't recognized.


Ideas are cheap -- there's an unbounded supply of ideas, and one can pump
them out rapidly -- new idea every 5 minutes. Converting an idea into
reality takes 1000x or 10000x longer. This is a stereotype in software: the
programmer who says "that's easy, I can do that in no time" and weeks later
they're still working on it. But it's also true of reality in general: the
ideas behind #BLM are not particularly sophisticated or complex, but it
will take 100 million man-years to turn them into reality. (and that's a
lower-bound)

--linas

On Thu, Aug 6, 2020 at 11:30 AM Matt Chapman <[email protected]> wrote:

> You misunderstood my first comment; I was agreeing with you that
> Cassandra-backed storage & distribution won't be faster than what I thought
> you were suggesting: a client-server model where one Rocks-backed atom
> server is used by many clients who retrieve, manipulate, and return atoms
> to the central server.
>
> Maybe you're suggesting something very different, or I'm just very
> confused, because I've been hearing people talk about the need for
> distributed atomspace on and off for 8+ years, and I've never seen an
> answer along the lines of "you can already have a cluster, here's the
> documentation on how to set it up." If that was the answer, and people
> rejected it because of the lack of disk-backed persistence, then I'm agreeing
> with you that RocksDB may solve much of the problem.
>
> Maybe the unsolved part of the problem is consistency/consensus? I tend to
> agree with your sentiment that consistency is overrated for Atomspace use
> cases, often not needed or not desirable, but it seems like maybe Ben and
> others are seeking something like Tunable Consistency. Maybe this is the
> big chocolate sprinkle?
>
> >Who is "we"?
>
> Practitioners of the computing arts & sciences, in general.
>
> > We've had the ability to run distributed AtomSpaces that far exceed
> > installed RAM, running on a cluster, for more than a decade.
>
> Does it meet the 7 business requirements in Ben's document?
> https://docs.google.com/document/d/1n0xM5d3C_Va4ti9A6sgqK_RV6zXi_xFqZ2ppQ5koqco/edit
>
> Points 2 & 3 are about performance improvements; do you believe such
> improvements are impossible, or would require more effort than the likely
> benefits would justify?
>
> Of the other 5, which already exist and which are what you call "chocolate
> sprinkles"?
>
> All the Best,
>
> Matt
>
> --
> Please interpret brevity as me valuing your time, and not as any negative
> intention.
>
>
> On Wed, Aug 5, 2020 at 11:32 PM Linas Vepstas <[email protected]>
> wrote:
>
>>
>>
>> On Wed, Aug 5, 2020 at 11:59 PM Matt Chapman <[email protected]>
>> wrote:
>>
>>> > I'll bet you a bottle of good wine or other neuroactive substance that
>>> > the existing atomspace client-server infrastructure is faster than
>>> > Cassandra.
>>>
>>> No, it won't be faster,
>>>
>>
>> wrong.
>>
>>> but you'll never be able to store an atomspace bigger than what you can
>>> fit in memory
>>>
>>
>> That is also wrong.
>>
>>> on that single atomserver, and you'll never be able to perform more
>>> operations (on the canonical atomspace) in parallel than what that one atom
>>> server can support.
>>>
>>
>> That has been wrong for 12+ years.
>>
>>> Obviously distributed systems have a performance penalty.
>>>
>>
>> We've had a distributed atomspace for 12+ years.
>>
>>> We don't build them because we need to go faster (at the level of a
>>> single process), we build them because we need to go bigger (in terms of
>>> storage space or parallel processes).
>>>
>>
>> Who is "we"?
>>
>> We've had the ability to run distributed AtomSpaces that far exceed
>> installed RAM, running on a cluster, for more than a decade.  People talk
>> about this as if it doesn't exist or it doesn't work or there's something
>> wrong with it, or they want something with more chocolate sprinkles on it.
>>
>> I'm annoyed.  Seriously, is no one actually paying attention to anything?
>> WTF.
>>
>> --linas
>>
>>
>>> All the Best,
>>>
>>> Matt
>>>
>>> --
>>> Please interpret brevity as me valuing your time, and not as any
>>> negative intention.
>>>
>>>
>>> On Wed, Aug 5, 2020 at 12:16 AM Linas Vepstas <[email protected]>
>>> wrote:
>>>
>>>> LevelDB/RocksDB and Cassandra are apples and kumquats.
>>>>
>>>> LevelDB/RocksDB are C++ libraries, single-user, non-networked,
>>>> non-distributed, link directly into the app, store data directly in files,
>>>> on the local system. They are "embedded databases". So conceptually, they
>>>> are like the 50-year old unix dbm, except that they have 50 years of
>>>> computer science behind them, such as bloom filters and log-structured
>>>> merge trees and what-not (e.g. Rocks is explicitly optimized for SSD
>>>> disks.).  LevelDB was created by google in 2011. Then facebook took levelDB
>>>> in 2013 and forked it to create rocksdb, and added a bunch of stuff, made
>>>> some parts run faster.
>>>>
>>>> Just like dbm, people use leveldb/rocksdb to build *other* databases on
>>>> top of it.  (that's the beauty of "embedded") For example, there's some
>>>> version of MariaDB that uses RocksDB for the actual storage.
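To make the "embedded" idea concrete, here is a minimal sketch that uses Python's standard-library `dbm.dumb` module (a descendant of the unix dbm idea) as a stand-in for RocksDB. The `TinyStore` wrapper, the file name and the JSON encoding are hypothetical choices made for illustration; this is not how atomspace-rocks or MariaDB use RocksDB, only the general shape of "build another database on top of an embedded key-value library".

```python
import dbm.dumb
import json
import os
import tempfile


class TinyStore:
    """Hypothetical record store layered on an embedded key-value file."""

    def __init__(self, path):
        # No server and no network: the library opens files on local disk.
        self._db = dbm.dumb.open(path, "c")

    def put(self, key, record):
        # The embedded store only sees bytes; the layer above picks the encoding.
        self._db[key.encode("utf-8")] = json.dumps(record).encode("utf-8")

    def get(self, key):
        raw = self._db.get(key.encode("utf-8"))
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def close(self):
        self._db.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_store")
        store = TinyStore(path)
        store.put("1", {"Last": "Doe", "First": "John"})
        store.close()

        # Reopen the same files: the data lives on disk, not in a server.
        store = TinyStore(path)
        record = store.get("1")
        missing = store.get("2")
        store.close()
        return record, missing
```

Calling `demo()` writes one record, closes the store, reopens it and reads the record back; there is nothing to install, configure or administer beyond the files themselves.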
>>>>
>>>> Cassandra is written in Java; it's a network database, so basically, it's
>>>> like postgres, except it's not postgres, because it uses CQL instead of SQL,
>>>> so it's not actually SQL compatible. Otherwise, it has all of the
>>>> same issues that any other networked client-server database has,
>>>> including the need for an experienced DB Admin to set it up, run it,
>>>> administer it. (This is an easily forgotten but important detail -- vs
>>>> rocksdb, which just ... writes to a file. No DBAdmin required.)
>>>>
>>>> For the app developer (i.e. me), one must write in a custom query
>>>> language -- CQL -- convert my data into CQL format, send that data via TCP/IP
>>>> to the server, which unpacks it, then runs its interpreter to figure out
>>>> what I said/wanted, unpacks my data packets, converts them into its own
>>>> internal format (so, that's a second format conversion), and actually
>>>> performs whatever operations I had specified.  This is conceptually
>>>> identical to *any* client-server database.
>>>>
>>>> For CQL I copy from wikipedia:
>>>>
>>>> CREATE KEYSPACE MyKeySpace
>>>>   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
>>>> USE MyKeySpace;
>>>> CREATE COLUMNFAMILY MyColumns (id text, Last text, First text, PRIMARY KEY(id));
>>>> INSERT INTO MyColumns (id, Last, First) VALUES ('1', 'Doe', 'John');
>>>> SELECT * FROM MyColumns;
>>>>
>>>> Looks identical to SQL except it's not actually compatible. Yuck.  This
>>>> offers exactly zero advantages over SQL that I can see; the fact that it's
>>>> key-value somewhere in there offers no perceivable advantage that I can
>>>> make out.
>>>>
>>>> I'll bet you a bottle of good wine or other neuroactive substance that
>>>> the existing atomspace client-server infrastructure is faster than
>>>> Cassandra. That is -- start a cogserver, as is, today, open the rocksdb
>>>> backend under it (so everything going to the cogserver gets stored), and
>>>> then let other atomspaces connect to the cogserver (using the existing
>>>> client-server code), and you will have a distributed atomspace that runs
>>>> faster than cassandra.
>>>>
>>>> OK, it doesn't have any of those other bells-n-whistles in cassandra,
>>>> but no one really knows how to do anything useful with those other
>>>> bells-n-whistles, other than to suggest that they might be somehow useful
>>>> in some way, maybe, for something.  That surpasses my attention span.
>>>>
>>>> --linas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 4, 2020 at 11:39 PM Ben Goertzel <[email protected]> wrote:
>>>>
>>>>> I wonder how different would be the API for RocksDB vs., say,
>>>>> Cassandra which Matt Chapman has recommended (which may have some
>>>>> advantages in terms of allowing more configurable/flexible notions of
>>>>> consistency?)
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 4, 2020 at 4:44 PM Linas Vepstas <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 4, 2020 at 11:51 AM Ben Goertzel <[email protected]> wrote:
>>>>> >>
>>>>> >> Wow!
>>>>> >
>>>>> >
>>>>> > You're welcome.  Querying from the database is now supported. The
>>>>> > demo is in
>>>>> > https://github.com/opencog/atomspace-rocks/blob/master/examples/query-storage.scm
>>>>> >
>>>>> > At the moment it works, but I'm rethinking the API.  Do check it
>>>>> > out.  Feedback, opinions, suggestions, etc. invited.
>>>>> >
>>>>> > --linas
>>>>> >
>>>>> >>
>>>>> >> On Tue, Aug 4, 2020, 8:45 AM Linas Vepstas <[email protected]> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Thu, Jul 30, 2020 at 11:20 AM Ben Goertzel <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> -- send a Pattern Matcher query to BackingStore
>>>>> >>>> -- send the Atom-chunk resulting from the query to Atomspace
>>>>> >>>>
>>>>> >>>
>>>>> >>> So,
>>>>> >>>
>>>>> >>> Someone needed to prove me wrong, and who better to do that but
>>>>> >>> me. I took the weekend to implement a file-based backing store, using
>>>>> >>> RocksDB (which itself is a variant on LevelDB).  It's here:
>>>>> >>> https://github.com/opencog/atomspace-rocks
>>>>> >>>
>>>>> >>> -- It works, all of the old persistent store unit tests pass
>>>>> >>>    (there are 8 of them)
>>>>> >>> -- It's faster than the SQL backend by factors of 2x to 5x, depending on
>>>>> >>>    the dataset. With tuning, maybe one could do better. (I have no plans
>>>>> >>>    to tune, right now)
>>>>> >>>
>>>>> >>> I'm certain I know of a simple/easy way to "send a Pattern Matcher
>>>>> >>> query to BackingStore and send the Atom-chunk resulting from the query
>>>>> >>> to Atomspace" and will implement this afternoon (famous last words...)
>>>>> >>> BTW, you can *already* do this with the cogserver-based network client
>>>>> >>> (i.e. without sql, just the network only) here:
>>>>> >>> https://github.com/opencog/atomspace-cog/blob/master/examples/remote-query.scm
>>>>> >>>
>>>>> >>> By combining these two backends, I think you can get file-backed
>>>>> >>> storage that is also network-enabled.  Or rather, you have two key
>>>>> >>> building blocks for exploring both distributed and also decentralized
>>>>> >>> designs.
>>>>> >>>
>>>>> >>> Some background info, from the README:
>>>>> >>>
>>>>> >>> AtomSpace RocksDB Backend
>>>>> >>> =========================
>>>>> >>>
>>>>> >>> Save and restore AtomSpace contents to a RocksDB database. The RocksDB
>>>>> >>> database is a single-user, local-host-only file-backed database. That
>>>>> >>> means that only one AtomSpace can connect to it at any given moment.
>>>>> >>>
>>>>> >>> In ASCII-art:
>>>>> >>>
>>>>> >>> ```
>>>>> >>>  +-------------+
>>>>> >>>  |  AtomSpace  |
>>>>> >>>  |             |
>>>>> >>>  +---- API-----+
>>>>> >>>  |             |
>>>>> >>>  |   RocksDB   |
>>>>> >>>  |    files    |
>>>>> >>>  +-------------+
>>>>> >>> ```
>>>>> >>> RocksDB (see https://rocksdb.org/) is an "embeddable persistent key-value
>>>>> >>> store for fast storage." The goal of layering the AtomSpace on top of it
>>>>> >>> is to provide fast persistent storage for the AtomSpace.  There are
>>>>> >>> several advantages to doing this:
>>>>> >>>
>>>>> >>> * RocksDB is file-based, and so it is straight-forward to make backup
>>>>> >>>   copies of datasets, as well as to share these copies with others.
>>>>> >>> * RocksDB runs locally, and so the overhead of pushing bytes through
>>>>> >>>   the network is eliminated. The remaining inefficiencies/bottlenecks
>>>>> >>>   have to do with converting between the AtomSpace's natural in-RAM
>>>>> >>>   format, and the position-independent format that all databases need.
>>>>> >>>   (Here, we say "position-independent" in that the DB format does not
>>>>> >>>   contain any C/C++ pointers; all references are managed with local
>>>>> >>>   unique ID's.)
>>>>> >>> * RocksDB is a "real" database, and so enables the storage of datasets
>>>>> >>>   that might not otherwise fit into RAM. This back-end does not try
>>>>> >>>   to guess what your working set is; it is up to you to load, work with
>>>>> >>>   and save those Atoms that are important for you. The [examples](examples)
>>>>> >>>   demonstrate exactly how that can be done.
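The "position-independent" point in the list above can be sketched in a few lines of Python. This is only an illustration of the general idea (replace in-RAM object references with local unique IDs on the way out, and rebuild the references on the way in). The `Node` and `Link` classes, the JSON layout and the ID numbering are all hypothetical stand-ins; the actual atomspace-rocks on-disk format is not shown here.

```python
import json


class Node:
    """Hypothetical stand-in for an AtomSpace Node."""

    def __init__(self, type_name, name):
        self.type_name = type_name
        self.name = name


class Link:
    """Hypothetical stand-in for an AtomSpace Link."""

    def __init__(self, type_name, outgoing):
        self.type_name = type_name
        self.outgoing = list(outgoing)  # in RAM: direct object references


def serialize(roots):
    """Flatten atoms into records keyed by local unique IDs (no pointers)."""
    ids = {}      # Python object identity -> local unique ID
    records = {}  # local unique ID -> pointer-free record

    def visit(atom):
        key = id(atom)
        if key in ids:
            return ids[key]
        if isinstance(atom, Link):
            # Children are visited first, so they always get smaller IDs.
            body = {"type": atom.type_name,
                    "out": [visit(child) for child in atom.outgoing]}
        else:
            body = {"type": atom.type_name, "name": atom.name}
        uid = len(ids) + 1
        ids[key] = uid
        records[uid] = body
        return uid

    for root in roots:
        visit(root)
    return json.dumps(records, sort_keys=True)


def deserialize(text):
    """Rebuild in-RAM objects, turning IDs back into object references."""
    records = {int(k): v for k, v in json.loads(text).items()}
    atoms = {}
    for uid in sorted(records):  # smaller IDs (children) are rebuilt first
        rec = records[uid]
        if "out" in rec:
            atoms[uid] = Link(rec["type"], [atoms[c] for c in rec["out"]])
        else:
            atoms[uid] = Node(rec["type"], rec["name"])
    return atoms


def demo():
    cat = Node("ConceptNode", "cat")
    animal = Node("ConceptNode", "animal")
    link = Link("InheritanceLink", [cat, animal])
    text = serialize([link])
    return text, deserialize(text)
```

The two conversion passes (`serialize` on save, `deserialize` on load) are the kind of overhead the second bullet refers to: the stored text contains only type names, node names and integer IDs, so it can be copied to another machine and loaded at any memory address.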
>>>>> >>>
>>>>> >>> This backend, together with the CogServer-based
>>>>> >>> [network AtomSpace](https://github.com/opencog/atomspace-cog)
>>>>> >>> backend provides a building-block out of which more complex
>>>>> >>> distributed and/or decentralized AtomSpaces can be built.
>>>>> >>>
>>>>> >>> Status
>>>>> >>> ------
>>>>> >>> This is **Version 0.8.0**.  All unit tests pass. All known issues
>>>>> >>> have been fixed. This could effectively be version 1.0; waiting on
>>>>> >>> user feedback.
>>>>> >>>
>>>>> >>> -- Linas
>>>>> >>>
>>>>> >>> --
>>>>> >>> Verbogeny is one of the pleasurettes of a creatific thinkerizer.
>>>>> >>>         --Peter da Silva
>>>>> >>>
>>>>> >>> --
>>>>> >>> You received this message because you are subscribed to the Google
>>>>> Groups "opencog" group.
>>>>> >>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to [email protected].
>>>>> >>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/opencog/CAHrUA37Agw0cg5gJX1fDffvSAjcW1kq4LdMOSuyknaEC_41F1g%40mail.gmail.com
>>>>> .
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ben Goertzel, PhD
>>>>> http://goertzel.org
>>>>>
>>>>> “The only people for me are the mad ones, the ones who are mad to
>>>>> live, mad to talk, mad to be saved, desirous of everything at the same
>>>>> time, the ones who never yawn or say a commonplace thing, but burn,
>>>>> burn, burn like fabulous yellow roman candles exploding like spiders
>>>>> across the stars.” -- Jack Kerouac
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA375nBmNJfZ9Gb5aPRtEm5E-2ZPC%2BNp3LjhDOHAeC2L8mA%40mail.gmail.com.
