On Wed, Jul 29, 2020 at 6:45 PM Matt Chapman <[email protected]> wrote:
> > If you think this is what I'm saying by describing Cassandra's

Sorry, it was not meant to be a jab at you ... over the last decade, something like a dozen different databases have been proposed, each with different reasons for using them. As I recall -- "nosql databases" -- BASE not ACID -- so we tried memdb (couchdb(?) was recommended). The bitter lesson was that it was optimized for 100MByte mp3's and 1MByte gifs, and had a throughput of about 100 Atoms/second. The memdb developers couldn't care less -- "what kind of moron stores 12 bytes in a database?" was the general reaction.

Then there was the neo4j work. The lesson there was that 95% of the CPU was spent converting Atoms into ZeroMQ packets (using Google protocol buffers, if I recall) and RESTful APIs written in Python using Python decorators ... lord knows how much CPU went into neo4j itself, unpacking the packets. Again, I think this was also about 100 Atoms/second.

This is when the idea of chunks and chunking started getting discussed, since obviously things could run faster if we could ship thousands of Atoms over at a time. Or maybe if we could get neo4j to do the pattern matching, and ship back only the results. How do you send a pattern-matcher query to neo4j?

By comparison, the current ASCII-file-reader for reading Atoms in s-expression format does about 100K Atoms/second (that's on my machine ... I'm told that the latest Apple laptops are maybe 5x faster?) I actually measured: about 45% of the CPU time was spent doing string-compares, string-copying and find-first-character-in-string, and 55% of the CPU time was in the atomspace, actually adding Atoms. Or maybe it was 55/45 the other way around; I forget.

I do have extensive notes on atomspace performance in https://github.com/opencog/benchmark/ -- on my machine, the raw atomspace does 700K Nodes/sec and 200K Links/sec, so maybe a million/sec on something modern. Running at 100 Atoms/sec through some RESTful/zero-mq/whatever interface is embarrassing.
I'm writing in this flippant style because I'm trying to make it fun to read my emails. There's a serious lesson here: converting things that are 12 bytes long into other things has just a huge overhead. I'm not sure how C++ std::string is implemented -- how many CPU cycles it takes to compare a byte, add one, and go to the next byte -- but if you do anything much more complicated than that, you pay a performance penalty. This is where the performance bar is set. It's hard to figure out how to jump over that bar. Or even get near it.

-- Linas

--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAHrUA35WRUonm82pMLDXqgqS7oV339o7KjTDQg4o_gWQJnE7Bw%40mail.gmail.com.
