On Fri, Aug 7, 2020 at 12:46 PM Predrag Radovic <[email protected]> wrote:

> Hi Linas,
>
> On Aug 4, 2020, at 23:51, Linas Vepstas <[email protected]> wrote:
>
>>
>> We could use memory-mapped files on SSD.
>>
>
> Ohhh! I like that! This is actually a very interesting idea! And with the
> appropriate programmer-fu, this should not be hard to proof-of-concept, I
> think ... so I'm guessing, sometime before the AtomSpace starts up, replace
> the memory allocator by something that is allocating out of the mapped
> memory (I think I've seen libraries out there that simplify this).
>
>
> I'm glad you like the idea. I want to try building the PoC!
>
> We may even use sparse memory-mapped files representing a much, much bigger
> virtual address space than both physical RAM and SSD storage. All nodes
> share the same file, as previously described. Memory-management functionality
> will deallocate unused blocks to keep the file sparse enough, limiting
> actual storage usage. I hope this could simplify the handling of Atom
> identity.
>

Have you ever done anything like this before? Some of what you wrote does
not seem right: the kernel does page-faulting and page flushes as needed.
There's no "deallocation", there is only unmapping. Memory usage may be
fragmented, but it's never going to be sparse, unless the memory allocator
is broken.


> To get scientific about it, you'd want to create a heat-map -- load up
> some large datasets, say, some of the genomics datasets, run one of their
> standard work-loads as a bench-mark, and then see which pages are hit the
> most often. I mean -- what is the actual working-set size of the genomics
> processing? No one knows -- we know that during graph traversal, memory is
> hit "randomly", but what is the distribution? It's surely not uniform.
> Maybe 90% of the work is done on 10% of the pages? (Maybe it's Zipfian?
> I'd love to see those charts...)
>
>
> I would like to see the heat-map for a realistic dataset too. That's the next
> step.
>
> Are you referring to this genomics dataset benchmark:
> https://github.com/opencog/benchmark/tree/master/query-loop or is there
> some bigger and better benchmark and dataset for this kind of experiment?
> What about https://github.com/opencog/agi-bio ?
>
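The page heat-map discussed above could be reduced to numbers along these lines: bucket each memory access by page, then ask what share of accesses lands on the hottest 10% of pages (the "90% of the work on 10% of the pages" question). This is a minimal sketch; `page_heat`, `hot_fraction` and the 4096-byte `PAGE_SIZE` are assumptions for illustration, and the trace below is synthetic, not measured. A real address trace would have to be captured from an actual benchmark run, which is not shown.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size; use the real page size of the traced system


def page_heat(addresses, page_size=PAGE_SIZE):
    """Count accesses per page number: the raw data behind a page heat-map."""
    return Counter(addr // page_size for addr in addresses)


def hot_fraction(heat, top_fraction=0.10):
    """Share of all accesses that land on the hottest top_fraction of touched pages."""
    counts = sorted(heat.values(), reverse=True)
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


# Synthetic, Zipf-like trace: page p is hit floor(1000 / p) times.
trace = [p * PAGE_SIZE + i for p in range(1, 101) for i in range(1000 // p)]
heat = page_heat(trace)
print(f"{len(heat)} pages touched; hottest 10% take {hot_fraction(heat):.0%} of accesses")
```

Sorting `heat` by page number instead of by count gives the per-page rows of the heat-map itself; plotting the sorted counts on log-log axes is one quick way to eyeball whether the distribution looks Zipfian.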

The "query-loop" benchmark is a subset/sample of one of the agi-bio datasets.
It's a good one to experiment with: since it will never change, you can
compare before-and-after results. The agi-bio datasets change all the
time, as features and data sources are added and removed. They're
bigger, but not stable.

--linas

-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA35hiXj6asE9ktXJbhLao5YgCoyLP3J19YsK-d5y9GtiLQ%40mail.gmail.com.
