Hi Pedja,

I'm going to reply in small bites, to keep messages short. (Actually, I'm just waiting for compiles to finish...)

On Tue, Aug 4, 2020 at 3:55 PM Predrag Radovic <[email protected]> wrote:
>
> We could use memory-mapped files on SSD.
>

Ohhh! I like that! This is actually a very interesting idea! And with the appropriate programmer-fu, this should not be hard to proof-of-concept, I think ... so I'm guessing, sometime before the AtomSpace starts up, replace the memory allocator with something that allocates out of the mapped memory (I think I've seen libraries out there that simplify this).

To get scientific about it, you'd want to create a heat-map -- load up some large datasets, say, some of the genomics datasets, run one of their standard workloads as a benchmark, and then see which pages are hit the most often. I mean -- what is the actual working-set size of the genomics processing? No one knows -- we know that during graph traversal, memory is hit "randomly" ... but what is the distribution? It's surely not uniform. Maybe 90% of the work is done on 10% of the pages? (Maybe it's Zipfian? I'd love to see those charts...)

A lot of this is all about "how can one get the most function for the least amount of programming effort?" Dinking around with a memory-mapped atomspace is surely worth a few weeks' investment of effort!

'Cause -- let's look at the alternative ... the "classic" opencog design would be to write a "forgetting agent" that scans the atomspace and deletes unused Atoms (maybe after saving them to disk, first). It's not technically hard to write such an agent, but still, it's work. Oh, but for it to work, we have to attach either a time-stamp to each atom, or an "attention-value" (or other data of your choosing) -- this is easy -- it's exactly what the value subsystem is for, but each of these timestamps eats up ... more RAM...! And then creating a heat-map is icky, because we'd have to increment some counter on each atom every time it was accessed... yuck. Indeed, the OS is much better at this kind of stuff.

Flip side ...
if we let the OS do this work, then maybe there is just one Atom on any given 4K page (or 64K page, or 2M page) that is interesting, and everything else on there is a waste of space. The granular attention-values/timestamps mean that we can isolate this one "hot" Atom from all the cold ones. But none of this is known ... we don't know the heat map.

On the third hand, there's probably a need for a forgetting agent for other reasons... so one has to be created anyway (well, there already is one, but it is quite unusable; that code should be discarded so that it stops wasting human attention-span).

Rest of your email in other replies.

--linas

--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva
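
To make the heat-map question a little more concrete, here is a rough sketch (Python, purely for illustration) of the bookkeeping involved: given a trace of accessed memory addresses, bucket them by page and ask what fraction of the touched pages accounts for 90% of the hits. The function names, the 4K default page size, and the assumption that an address trace is available at all are hypothetical; none of this is existing AtomSpace code.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed 4K pages; 64K or 2M pages work the same way


def page_heat_map(addresses, page_size=PAGE_SIZE):
    """Count how many accesses land on each page (page number -> hits)."""
    return Counter(addr // page_size for addr in addresses)


def hot_page_fraction(heat, coverage=0.9):
    """Fraction of touched pages needed to cover `coverage` of all accesses.

    Pages are taken hottest-first, so a small result means a small
    working set relative to the memory that was touched at all.
    """
    total = sum(heat.values())
    if total == 0:
        return 0.0
    running = 0
    for n_pages, (_, hits) in enumerate(heat.most_common(), start=1):
        running += hits
        if running >= coverage * total:
            return n_pages / len(heat)
    return 1.0
```

If `hot_page_fraction(page_heat_map(trace))` comes out near 0.1 for a genomics workload, the "90% of the work on 10% of the pages" guess holds and letting the OS page things in and out looks attractive; if it comes out near 0.9, the accesses are close to uniform and mapped memory would thrash. How to capture the trace in the first place is left open here.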
