Hi,

I came up with this test while trying to replicate the fragmentation scenario 
Rick Payne was experiencing:

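#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <unistd.h>

int main() {
   size_t size = 0x4C1000;            // start at ~4.75MB

   void* mem = malloc(size);
   auto start = std::chrono::high_resolution_clock::now();

   for (int i = 0; i < 100; i++) {
       size = (size * 105) / 100;     // grow the buffer by 5% each round
       malloc(0x4000);                // small 16KB block, never freed, to fragment memory
       printf("%d allocation of %zu bytes\n", i + 1, size);
       void* grown = realloc(mem, size);
       if (!grown) {                  // on failure the old block is still valid
           printf("realloc() of %zu bytes failed\n", size);
           free(mem);
           return 1;
       }
       mem = grown;
   }

   auto end = std::chrono::high_resolution_clock::now();
   std::chrono::duration<double> elapsed = end - start;
   std::cout << "Elapsed time: " << elapsed.count() << " s\n";
   printf("DONE - last size: %zu!\n", size);

   sleep(1000);                       // keep the process alive to inspect memory usage
   free(mem);
}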

So, before the latest patch that implements the mapping-based malloc_large(), 
this test needed at least 1.6GB of memory to pass; the last realloc() tries 
to allocate ~625MB of memory. However, I did not really see the 
fragmentation I expected in physical memory, so I do not think this 
test is very representative.
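To double-check the ~625MB figure, here is a small helper that replays the growth schedule of the test (start at 0x4C1000 bytes, grow by 5% per round with integer truncation). The function name is made up for illustration; it is not part of the test or of OSv.

```cpp
#include <cstddef>

// Hypothetical helper: replays the size schedule of the test above and
// returns the size requested by the last realloc() after `rounds` rounds.
std::size_t final_realloc_size(std::size_t size, int rounds) {
    for (int i = 0; i < rounds; i++) {
        size = (size * 105) / 100;   // same 5% growth as the test loop
    }
    return size;
}
```

final_realloc_size(0x4C1000, 100) comes out at roughly 655 million bytes (about 625MiB), and the size from the 99th round is about 595MiB.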

The latest version of the code, with all patches applied, needs a 
minimum of 1.2GB. That is not surprising, given that realloc() has to keep 
two buffers of similar size alive while it copies the data from the old one 
to the new one, and only then frees the old, smaller one (~595MB + ~625MB). 
But the good news is that it no longer needs 625MB of contiguous physical 
memory to do it.
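To illustrate why both copies have to coexist, here is a minimal copy-based realloc, assuming there is no way to remap pages. It is a hypothetical helper, not OSv's actual realloc() implementation.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical copy-based realloc(). Precondition: old_ptr is non-null and
// points to old_size bytes. Between malloc() and free() both the old and the
// new buffer are alive, so the peak footprint is old_size + new_size.
void* copying_realloc(void* old_ptr, std::size_t old_size, std::size_t new_size) {
    void* new_ptr = std::malloc(new_size);
    if (!new_ptr) {
        return nullptr;   // like realloc(), leave old_ptr untouched on failure
    }
    std::memcpy(new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
    std::free(old_ptr);   // only now can the old, smaller buffer be released
    return new_ptr;
}
```

With the sizes from this test, the last call needs ~595MB and ~625MB at the same time, which is where the ~1.2GB minimum comes from.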

However, with this experiment I noticed that the new malloc_large() is slower: 
this test takes 3 seconds vs. 2 seconds before the patch. As I suspected, the 
culprit was that mapped_malloc_large() calls map_anon() with 
mmap_populate, which pre-faults the entire region; I thought that was 
necessary. But it turns out that when I replaced mmap_populate 
with mmap_uninitialized, all the unit tests and a couple of other apps 
still worked just fine, and the test above took about the same amount of 
time as before, around 2 seconds. So maybe we should use 
mmap_uninitialized. The only concern would be kernel code running with 
preemption disabled that accesses memory allocated with 
mapped_malloc_large(): without pre-faulting, that first access could trigger 
a page fault, which is not allowed with preemption disabled. Could that happen?
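For comparison, the same eager vs. lazy trade-off can be observed on Linux, where MAP_POPULATE plays roughly the role of OSv's mmap_populate. This is a Linux analogue, not OSv code, and the function name is hypothetical: it maps some anonymous pages and uses mincore() to count how many are resident right after mmap().

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Hypothetical Linux-only helper: map `npages` anonymous pages, optionally
// pre-faulted with MAP_POPULATE, and return how many are resident
// immediately. Returns -1 if mmap() or mincore() fails.
long resident_pages_after_map(std::size_t npages, bool populate) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = npages * page;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (populate) {
        flags |= MAP_POPULATE;   // fault in every page at mmap() time
    }
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    std::vector<unsigned char> vec(npages);
    long resident = -1;
    if (mincore(p, len, vec.data()) == 0) {
        resident = 0;
        for (unsigned char v : vec) {
            resident += v & 1;   // low bit set means the page is in memory
        }
    }
    munmap(p, len);
    return resident;
}
```

With populate, all pages should be resident right away; without it, they are only faulted in on first touch, which is the lazy behavior that raises the preemption question.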

Finally, when I ran the same test on a Linux host, it completed in under 2ms 
(milliseconds), so roughly 1000 times faster. Why is that? On Linux, glibc's 
realloc() uses mremap() for large (mmap-backed) allocations, so the pages are 
remapped rather than copied, while OSv copies the data using memcpy(). So 
if we want to make realloc() faster (which could also be useful to 
speed up ramfs), we should implement mremap(). There is an open issue for 
that, https://github.com/cloudius-systems/osv/issues/184, and reportedly 
Pekka sent a patch a while ago which we can build upon.
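A minimal Linux-only sketch of the mremap() idea (the helper name is hypothetical, and this is neither Pekka's patch nor glibc's code): growing an anonymous mapping with MREMAP_MAYMOVE lets the kernel move the mapping to a new virtual address by rearranging page tables instead of copying the data.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper: grow an anonymous mapping from old_size to new_size
// bytes. The existing pages are remapped, not memcpy()'d; the added pages
// are zero-filled. Returns nullptr on failure (old mapping stays valid).
void* grow_mapping(void* old_addr, std::size_t old_size, std::size_t new_size) {
    void* p = mremap(old_addr, old_size, new_size, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? nullptr : p;
}
```

Since the cost is proportional to the number of page table entries moved rather than the bytes copied, this is presumably why the Linux run finishes in milliseconds.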

Waldek 

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/54a9bb96-ea86-4c75-8d86-9a0be128fc58%40googlegroups.com.
