Hi! I think a problem with your test program is that you don't wait for the write() thread to finish before you try to read the mmap(). See how locking in a producer-consumer (or reader-writer) relationship is usually implemented (if you don't have it at hand, I could send you the algorithms).
Regards,
Ulrich

>>> Martin Lucina <[email protected]> wrote on 10.03.2014 at 22:10 in
>>> message <[email protected]>:
> [email protected] said:
>> Martin Lucina wrote:
>> >That still doesn't explain the MIPS issues, any suggestions on how to
>> >proceed there? I can give someone access to a MIPS host if that would help.
>>
>> Copying back to the list:
>>
>> Martin Lucina wrote:
>> > [email protected] said:
>> >> It appears that this system also lacks a coherent FS cache, like
>> >> some BSDs. I changed mtest.c to use MDB_WRITEMAP and it now runs
>> >> fine.
>> >>
>> >> The unmodified mtest.c also worked when single-stepping thru gdb,
>> >> which apparently gives time for the cache to sort itself out between
>> >> mdb function calls.
>> >
>> > Interesting. What you're saying is that without MDB_WRITEMAP pages are
>> > written out separately and it is up to the FS cache to ensure that reading
>> > back via the memory map is consistent, correct?
>>
>> That's the general idea. As the LMDB design paper states, LMDB
>> requires the OS to use a unified buffer cache - so that mmap pages
>> and FS cache pages are the same.
>>
>> > I'll try and dig through the OpenWRT kernel configuration, they must have
>> > changed something that triggers this behaviour.
>>
>> Frankly it seems unlikely that they could have changed something so
>> fundamental to the VM subsystem of the kernel. It's also possible that we're
>> seeing *CPU* cache inconsistencies, and that adding a few
>> MIPS-specific memory barrier instructions here and there may fix
>> things up.
>
> I did some more investigating:
>
> 1) Tried adding calls to sync_file_range() (Linux-specific syscall) and
> in desperation even sync(2) to mdb_txn_commit() just after mdb_page_flush()
> et al. No change.
>
> 2) Compiled the below test program on various platforms.
> This tries (rather unscientifically) to test how "long" it takes for a
> mmap to become consistent after writing to the underlying file through a
> different fd opened with O_DSYNC (what mdb does).
>
> The results are interesting:
>
> x86_64 core i5m (2 cores, 4 threads): gcc -O2: consistently less than 1k
>   iterations
> x86_64 core i5m (2 cores, 4 threads): gcc -O2 -DNOBARRIER: consistently
>   around 10k iterations
> x86_64 dual 4-core xeon, gcc -O2: around 2k iterations
> x86_64 dual 4-core xeon, gcc -O2 -DNOBARRIER: 10-15k iterations
> MIPS target, musl gcc -O2 -mips32r2: varies, mostly 1, in each 10 runs at
>   least one run completes in the high 100k's of iterations
> MIPS target, musl gcc -O2 -mips32r2 -DNOBARRIER: about the same as previous,
>   but when not 1 the result is subjectively higher (around 1m iterations)
> single CPU SPARCv9 solaris 10, Sun cc -fast -mt: always[*] 1
> single CPU SPARCv9 solaris 10, CSW gcc -O2, with or without -DNOBARRIER:
>   always[*] 1
> ia64 dual Itanium 2, Linux gcc -O2: around 2k iterations
> ia64 dual Itanium 2, Linux gcc -O2 -DNOBARRIER: anywhere between 3-8k
>   iterations
>
> [*] very rarely several million iterations
>
> Does this help in any way? It certainly seems to suggest that the MIPS
> target's fs cache is (eventually) consistent.
>
> Any pointers on how to proceed or what else to try/who else to ask will be
> much appreciated.
>
> Martin
>
> ----test program----
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <assert.h>
> #include <stdio.h>
> #include <pthread.h>
> #include <unistd.h>
>
> pthread_barrier_t b;
>
> static void *thread (void *arg)
> {
>     int fd;
>
>     pthread_barrier_wait (&b);
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT | O_DSYNC, 0600);
>     unsigned long v = 1;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     close (fd);
>     return NULL;
> }
>
> int main (int argc, char *argv[])
> {
>     int fd;
>     pthread_barrier_init (&b, NULL, 2);
>
>     unlink ("/tmp/testfile");
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT, 0600);
>     unsigned long v = 0;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
>         MAP_SHARED, fd, 0);
>     assert (p != MAP_FAILED);
>
>     int i = 0;
>     pthread_t thread_id = 0;
>     pthread_create (&thread_id, NULL, thread, NULL);
>
>     while (*p != 1) {
>         if (!i)
>             pthread_barrier_wait (&b);
>         i++;
> #if defined (__GNUC__) && !defined (NOBARRIER)
>         __sync_synchronize ();
> #endif
>     }
>     printf ("%d\n", i);
>
>     munmap ((void *)p, getpagesize ());
>     close (fd);
>     return 0;
> }
