Hi Linas,

On Fri, Aug 07, 2020 at 01:30:14PM -0500, Linas Vepstas wrote:
> > To get scientific about it, you'd want to create a heat-map -- load up
> > some large datasets, say, some of the genomics datasets, run one of their
> > standard work-loads as a bench-mark, and then see which pages are hit the
> > most often. I mean -- what is the actual working-set size of the genomics
> > processing? No one knows -- we know that during graph traversal, memory is
> > hit "randomly" .. but what is the distribution? It's surely not uniform.
> > Maybe 90% of the work is done on 10% of the pages? (Maybe it's Zipfian?
> > I'd love to see those charts...)
>
> The "query-loop" is a subset/sample from one of the agi-bio datasets. It's
> a good one to experiment with, since it will never change, so you can
> compare before-and-after results. The agi-bio datasets change all the
> time, as they add and remove new features, new data sources, etc. They're
> bigger, but not stable.
VM page heat-map for the query-loop benchmark is here:

https://github.com/crackleware/opencog-experiments/tree/c0cc508dc5757635ce6c069b20f8ae13ccf8ef8a/mmapped-atomspace

Everything gets dirtied during loading, but there is a "hot" subset of pages referenced during the processing stage. The total size of the pages referenced during processing is around ~150MB out of 1.6GB (the total allocation). The heat-map is very crude, because it groups pages in linear (address) order, which is probably a bad grouping. I may experiment with other page groupings to get more informative graphs (could be useful for chunking research).

I also did several experimental runs using swap-space on NFS and on NBD (network block device): 2 cores, 1GB RAM, 2GB swap. Performance was not very good (~10%). The CPU is too fast for this amount of memory. :-) The intermittent peaks are probably garbage collections.

All in all, I expect much better performance with very concurrent workloads -- hundreds of threads. When a processing thread hits a page which is not yet in physical RAM, it blocks, and a request for that page is queued. Other threads continue to work, and after some time they too will block, waiting for their own pages to load. The storage layer collects multiple requests and delivers the data in batches, introducing latency but improving throughput. That's why SSD benchmarks include graphs for various queue depths: the deeper the queue, the better the throughput.

The query-loop benchmark is single-threaded. I would like to run a more concurrent workload with bigger datasets. Any suggestions?

--pedja

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/20200816222406.GA1557615%40intelnuc.localdomain.
