On Wed, Jul 29, 2020 at 6:45 PM Matt Chapman <[email protected]> wrote:
> > If you think this is what I'm saying by describing Cassandra's

Sorry, it was not meant to be a jab at you ... over the last decade, something like a dozen different databases have been proposed, each with different reasons for using them. As I recall -- "nosql databases" -- BASE not ACID -- so we tried memdb (couchdb(?) was recommended). The bitter lesson was that it was optimized for 100MByte mp3's and 1MByte gifs, and had a throughput of about 100 Atoms/second. The memdb developers couldn't care less -- "what kind of moron stores 12 bytes in a database?" was the general reaction.

Then there was the neo4j work. The lesson there was that 95% of the CPU was spent converting Atoms into ZeroMQ packets (using Google protocol buffers, if I recall) and RESTful APIs written in Python using Python decorators ... lord knows how much CPU went into neo4j itself, unpacking the packets. Again, I think this was also about 100 Atoms/second.

This is when the idea of chunks and chunking started getting discussed, since obviously things could run faster if we could ship thousands of Atoms over at a time. Or maybe if we could get neo4j to do the pattern matching, and ship back only the results. How do you send a pattern-matcher query to neo4j?

By comparison, the current ASCII-file-reader for reading Atoms in s-expression format does about 100K Atoms/second (that's on my machine ... I'm told that the latest Apple laptops are maybe 5x faster?) I actually measured: about 45% of the CPU time was spent doing string-compares, string-copying and find-first-character-in-string, and 55% of the CPU time was in the atomspace, actually adding Atoms. Or maybe it was 55/45 the other way around; I forget.

I do have extensive notes on atomspace performance in https://github.com/opencog/benchmark/ -- on my machine, the raw atomspace does 700K Nodes/sec and 200K Links/sec, so maybe a million/sec on something modern. Running at 100 Atoms/sec through some RESTful/zero-mq/whatever interface is embarrassing.
I'm writing in this flippant style because I'm trying to make it fun to read my emails. There's a serious lesson here: converting things that are 12 bytes long into other things has just a huge overhead. I'm not sure how C++ std::string is implemented -- how many CPU cycles it takes to compare a byte, add one, and go to the next byte -- but if you do anything much more complicated than that, you pay a performance penalty. This is where the performance bar is set. It's hard to figure out how to jump over that bar. Or even get near it.

-- Linas

--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CAHrUA35WRUonm82pMLDXqgqS7oV339o7KjTDQg4o_gWQJnE7Bw%40mail.gmail.com.
