Hi Linas,

On Fri, Aug 07, 2020 at 01:30:14PM -0500, Linas Vepstas wrote:
> >
> > To get scientific about it, you'd want to create a heat-map -- load up
> > some large datasets, say, some of the genomics datasets, run one of their
> > standard work-loads as a bench-mark, and then see which pages are hit the
> > most often. I mean -- what is the actual working-set size of the genomics
> > processing? No one knows -- we know that during graph traversal, memory is
> > hit "randomly" .. but what is the distribution? It's surely not uniform.
> > Maybe 90% of the work is done on 10% of the pages? (Maybe it's Zipfian?
> > I'd love to see those charts...)
> 
> The "query-loop" is a subset/sample from one of the agi-bio datasets. It's
> a good one to experiment with, since it will never change, so you can
> compare before-and-after results.  The agi-bio datasets change all the
> time, as they add and remove new features, new data sources, etc. They're
> bigger, but not stable.

A VM page heat-map for the query-loop benchmark is here:

https://github.com/crackleware/opencog-experiments/tree/c0cc508dc5757635ce6c069b20f8ae13ccf8ef8a/mmapped-atomspace

Everything gets dirtied during loading. There is a "hot" subset of pages 
referenced during the processing stage: the total size of the pages referenced 
while processing is around 150MB out of 1.6GB (total allocation). The heat-map 
is very crude because it groups pages in linear address order, which is 
probably a bad grouping. I may experiment with other page groupings to get 
more informative graphs (could be useful for chunking research).
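For anyone who wants to reproduce this kind of measurement: on Linux, the kernel exposes per-mapping "Referenced" sizes in /proc/<pid>/smaps, and writing "1" to /proc/<pid>/clear_refs resets the referenced bits, so you can bracket a single phase. A minimal sketch (not the script used for the graphs above, just the same idea):

```python
# Sketch: sum the "Referenced:" fields from /proc/<pid>/smaps.
# Clearing /proc/<pid>/clear_refs before a phase and reading smaps
# afterwards gives the working set touched during that phase.
import os

def referenced_pages_kb(pid="self"):
    """Total size (in kB) of pages the kernel marked as referenced."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Referenced:"):
                total += int(line.split()[1])  # field is in kB
    return total

# Usage: echo 1 > /proc/$PID/clear_refs, run the workload, then:
print(referenced_pages_kb())
```

Splitting smaps entries by address range instead of summing them gives the per-region data needed for a heat-map.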

I also did several experimental runs with swap-space on NFS and on NBD 
(network block device): 2 cores, 1GB RAM, 2GB swap. Performance was not very 
good (~10%). The CPU is simply too fast for this amount of memory. :-)

The intermittent peaks are probably garbage collections.

All in all, I expect much better performance with highly concurrent workloads, 
hundreds of threads. When a processing thread hits a page that is not yet in 
physical RAM, it blocks, and a request for that page is queued to storage. 
Other threads continue to work, and after some time they block too, waiting 
for pages of their own. The storage layer can then collect multiple requests 
and deliver the data in batches, trading some latency for throughput. That's 
why SSD benchmarks show graphs for various queue depths: the deeper the queue, 
the better the throughput.
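The effect is easy to see in miniature: many threads each issuing independent reads keep the device's request queue full, while a single thread leaves it nearly empty. A toy sketch (the data file and sizes are made up for illustration, not from the benchmark):

```python
# Sketch: N reader threads issue random page-sized preads against one
# file, standing in for faulting threads in an mmapped atomspace.
# Each blocked pread is one outstanding request in the device queue.
import os
import random
import tempfile
import threading

PAGE = 4096
NPAGES = 256  # 1 MB toy file; a real store would be gigabytes

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(PAGE * NPAGES))
    path = f.name

def worker(fd, nreads=64):
    for _ in range(nreads):
        off = random.randrange(NPAGES) * PAGE
        data = os.pread(fd, PAGE, off)  # blocks until the page arrives
        assert len(data) == PAGE

fd = os.open(path, os.O_RDONLY)
threads = [threading.Thread(target=worker, args=(fd,)) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)
os.unlink(path)
print("done")
```

With 16 threads in flight the kernel and device can merge and reorder requests; with one thread, queue depth never exceeds 1, which is roughly the situation the single-threaded query-loop benchmark is in.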

The query-loop benchmark is single-threaded. I would like to run a more 
concurrent workload with bigger datasets. Any suggestions?

 
--pedja

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/20200816222406.GA1557615%40intelnuc.localdomain.
