Re: Decentralized building blocks [was Re: [opencog-dev] Distributed Atomspace

Linas Vepstas Tue, 04 Aug 2020 14:52:07 -0700

Hi Pedja,

I'm going to reply in small bites, to keep messages short. (Actually, I'm
just waiting for compiles to finish...)

On Tue, Aug 4, 2020 at 3:55 PM Predrag Radovic wrote:

>
> We could use memory-mapped files on SSD.
>

Ohhh! I like that! This is actually a very interesting idea! And with the
appropriate programmer-fu, this should not be hard to proof-of-concept, I
think ... so I'm guessing: sometime before the AtomSpace starts up, replace
the memory allocator with one that allocates out of the mapped memory
(I think I've seen libraries out there that simplify this).
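A minimal proof-of-concept sketch of the idea, assuming a simple bump (arena) allocator is good enough for a first experiment: map a file on the SSD and hand out space from it, letting the OS page it in and out. It is written in Python for brevity; `MappedArena`, `alloc`, `store` and `load` are hypothetical names, not part of the AtomSpace. A real version would replace `malloc`/`operator new` underneath the C++ AtomSpace rather than handing out raw offsets.

```python
import mmap
import os
import tempfile


class MappedArena:
    """Hypothetical bump allocator handing out space from a memory-mapped file.

    Nothing is ever freed; the OS decides which pages of the file stay in
    RAM, which is the point of the experiment.
    """

    def __init__(self, path, size, align=8):
        self.size = size
        self.align = align
        self.top = 0  # offset of the next free byte
        # Create (or resize) the backing file to the requested size.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self._fd, size)
        # On POSIX the default is MAP_SHARED, so writes land in the file.
        self.mem = mmap.mmap(self._fd, size)

    def alloc(self, nbytes):
        """Reserve nbytes and return their offset into the mapping."""
        # Round the current top up to the next alignment boundary.
        start = (self.top + self.align - 1) // self.align * self.align
        if start + nbytes > self.size:
            raise MemoryError("mapped arena exhausted")
        self.top = start + nbytes
        return start

    def store(self, data):
        """Copy bytes into freshly allocated space; return the offset."""
        off = self.alloc(len(data))
        self.mem[off:off + len(data)] = data
        return off

    def load(self, off, nbytes):
        return bytes(self.mem[off:off + nbytes])

    def close(self):
        self.mem.flush()  # push dirty pages back to the file
        self.mem.close()
        os.close(self._fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "atoms.bin")
    arena = MappedArena(path, 1 << 20)  # 1 MiB backing file
    data = b'(Concept "cat")'
    off = arena.store(data)
    print(off, arena.load(off, len(data)))
    arena.close()
```

Because the mapping is backed by a file, the data survives `close()` and the working set is whatever the OS keeps resident.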

To get scientific about it, you'd want to create a heat-map -- load up some
large datasets, say, some of the genomics datasets, run one of their
standard workloads as a benchmark, and then see which pages are hit most
often. I mean -- what is the actual working-set size of the genomics
processing? No one knows. We know that during graph traversal, memory is
hit "randomly" ... but what is the distribution? It's surely not uniform.
Maybe 90% of the work is done on 10% of the pages? (Maybe it's Zipfian?
I'd love to see those charts...)
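A minimal sketch of the heat-map arithmetic, assuming we already have a trace of the byte addresses touched during a benchmark run (collecting such a trace is not shown). The trace below is synthetic, drawn from a Zipf-like distribution purely for illustration; it is not genomics data, and all function names are hypothetical.

```python
import random
from collections import Counter

PAGE_SIZE = 4096  # bytes; 64K or 2M pages would just change this constant


def page_heat(addresses, page_size=PAGE_SIZE):
    """Count how many accesses land on each page (page number -> hits)."""
    return Counter(addr // page_size for addr in addresses)


def share_of_hits_on_hottest(heat, fraction=0.10):
    """Share of all hits landing on the hottest `fraction` of touched pages."""
    counts = sorted(heat.values(), reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


def synthetic_zipf_trace(n_pages, n_accesses, s=1.0, seed=0):
    """Hypothetical trace: page i is chosen with weight 1 / (i + 1) ** s."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) ** s for i in range(n_pages)]
    pages = rng.choices(range(n_pages), weights=weights, k=n_accesses)
    # Turn page numbers into byte addresses somewhere inside each page.
    return [p * PAGE_SIZE + rng.randrange(PAGE_SIZE) for p in pages]


if __name__ == "__main__":
    trace = synthetic_zipf_trace(n_pages=1000, n_accesses=20000)
    heat = page_heat(trace)
    print("pages touched:", len(heat))
    print("share of hits on hottest 10%:", share_of_hits_on_hottest(heat))
```

With a real trace, `share_of_hits_on_hottest(page_heat(trace))` would directly answer the "is 90% of the work done on 10% of the pages?" question. Note that it only counts pages touched at least once.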

A lot of this comes down to "how can one get the most function for the least
amount of programming effort?" Dinking around with a memory-mapped
AtomSpace is surely worth a few weeks of effort!

(Because -- let's look at the alternative ... the "classic" OpenCog design
would be to write a "forgetting agent" that scans the AtomSpace and deletes
unused Atoms (maybe after first saving them to disk). It's not technically
hard to write such an agent, but still, it's work. Oh, but for it to work,
we have to attach either a timestamp or an "attention-value" to each Atom
(or other data of your choosing) -- this is easy, it's exactly what the
value subsystem is for, but each of these timestamps eats up ... more
RAM...! And then creating a heat-map is icky, because we'd have to
increment some counter on each Atom every time it was accessed... yuck.
The OS is much better at this kind of stuff.)
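For comparison, a minimal sketch of the "classic" forgetting agent, assuming one timestamp per Atom and a caller-supplied `save` callback; `ForgettingAgent`, `touch`, `sweep` and `overhead_bytes` are hypothetical names, not existing OpenCog code. The `overhead_bytes` method makes the RAM cost concrete: at an assumed 8 bytes per timestamp, a hypothetical AtomSpace of 100 million Atoms would spend 800 MB on timestamps alone.

```python
import time


class ForgettingAgent:
    """Hypothetical sketch of the "classic" design: one timestamp per Atom."""

    TIMESTAMP_BYTES = 8  # assumption: a 64-bit time value per Atom

    def __init__(self, save, clock=time.monotonic):
        self.last_access = {}  # atom -> time of last access
        self.save = save       # callback that writes an Atom to disk first
        self.clock = clock

    def touch(self, atom):
        """Must be called on *every* access -- the "yuck" part."""
        self.last_access[atom] = self.clock()

    def sweep(self, max_idle):
        """Save, then forget, every Atom idle for longer than max_idle."""
        now = self.clock()
        stale = [a for a, t in self.last_access.items() if now - t > max_idle]
        for atom in stale:
            self.save(atom)
            # Stand-in for removing the Atom from the AtomSpace itself.
            del self.last_access[atom]
        return stale

    def overhead_bytes(self):
        """Extra RAM spent on timestamps alone (ignoring dict overhead)."""
        return len(self.last_access) * self.TIMESTAMP_BYTES
```

The per-access `touch` call is roughly the bookkeeping the OS already does for free, at page rather than Atom granularity.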

Flip side: if we let the OS do this work, then maybe only one Atom on any
given 4K page (or 64K page, or 2M page) is interesting, and everything else
on that page is wasted space. Per-Atom attention-values/timestamps would let
us isolate that one "hot" Atom from all the cold ones. But none of this is
known ... we don't know the heat map.
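The "one hot Atom per page" worry can be put into numbers. A back-of-the-envelope sketch, assuming fixed-size, densely packed Atoms of a hypothetical 64 bytes each (not the real AtomSpace Atom size): it computes what share of a resident page is actually useful when only a few Atoms on it are hot.

```python
def resident_useful_fraction(atom_size, page_size, hot_atoms_per_page=1):
    """Share of a resident page occupied by hot Atoms.

    Assumes fixed-size Atoms packed densely into pages (a simplification).
    """
    atoms_per_page = page_size // atom_size
    return min(hot_atoms_per_page, atoms_per_page) / atoms_per_page


if __name__ == "__main__":
    for page in (4 * 1024, 64 * 1024, 2 * 1024 * 1024):
        print(page, resident_useful_fraction(64, page))
```

For 64-byte Atoms this gives 1/64, 1/1024 and 1/32768 for 4K, 64K and 2M pages respectively, which is why the real heat map matters.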

On the third hand, there's probably a need for a forgetting agent for other
reasons... so one has to be created anyway. (Well, there already is one, but
it is quite unusable; that code should be discarded so that it stops wasting
human attention-span.)

Rest of your email in other replies.
--linas

--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
--Peter da Silva

--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA36CZwQqvzekhDn2CJZgduq7kQTqpusrVHb3UprUmWfByA%40mail.gmail.com.
