Hi! I am not a memory expert. However, I think that a page containing only zeros is handled specially by the MMU (it does not actually use physical memory). This is why a malloc of a huge amount of memory typically succeeds even if that much physical memory is not available. A malloc followed only by a memset to zero will therefore typically not lead to physical RAM usage (I think this is the "copy-on-write" (COW) mechanism). Thus, I recommend doing a memset with a non-zero value after allocating the memory:
memset(buf1,123,msgsize);
memset(buf2,123,msgsize);

This should lead to a fair comparison.

Regards

Mathias

> > Some interesting insights about my last tests.
> >
> > 1.) The culprit is mlockall(MCL_FUTURE|MCL_CURRENT);
> >
> > As soon as I leave this out, I get much better results:
> >
> > Without mlockall():
> > Test (10) memcpy of sizes (10485760)
> > 10 memcpy. Time per memcpy: 78147209 [nsec] (134 MB/sec)
> > finished.
> >
> > With mlockall():
> > Test (10) memcpy of sizes (10485760) ....
> > 10 memcpy. Time per memcpy: 124194618 [nsec] (84 MB/sec)
> > finished.
> >
> I think you are not measuring the same thing in both cases.
> I ran some tests on 2.6.20 (precompiled debian etch kernel)
> on a 1.6 GHz Pentium M.
>
> I think the fact that you malloc your buffers and then
> immediately memcpy them gives a non-repeatable measurement
> (at least on my side),
> depending on something I do not understand.
>
> Could you try my modified version of your code, which
> adds:
>
> memset(buf1,'\0',msgsize);
> memset(buf2,'\0',msgsize);
>
> just after malloc (you may try calloc too).
>
> With this modification
> I get similar figures for the mlockall version on my (quasi)-vanilla kernel.
>
> That is:
>
> ./memcpy_perf_mlockall
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 35716568 [nsec] (293 MB/sec)
> finished.
>
> ./memcpy_perf_memset
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 36004454 [nsec] (291 MB/sec)
> finished.
>
> ./memcpy_perf
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 23881352 [nsec] (439 MB/sec)
> finished.
>
> I think that without mlockall or memset, the memory pages you
> requested with malloc and did not --really-- get are brought into
> physical memory only when memcpy comes.
>
> What puzzles me is WHY it is faster WITHOUT touching the pages
> BEFORE memcpy???
>
> Any memory handling expert is welcome to answer.
>
> > Then again I cannot use Xenomai without mlockall()
> > :(
>
> And you cannot design a realtime application without
> ensuring you really have the memory you requested,
> this is not a xenomai issue (my opinion though).
>
> PS: compilation command lines used:
>
> gcc memcpy_perf-erk.c -o memcpy_perf -lrt
> gcc -DMLOCK memcpy_perf-erk.c -o memcpy_perf_mlockall -lrt
> gcc -DMEMSET memcpy_perf-erk.c -o memcpy_perf_memset -lrt
>

--
Mathias Koehrer
[EMAIL PROTECTED]

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
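
For anyone who wants to repeat the comparison, here is a rough sketch of how I would time it. This is not the memcpy_perf-erk.c from the thread (that file is not shown here); the helper name time_memcpy_ns, its parameters, and the MEMCPY_SKETCH_MAIN switch are made up for illustration. The only idea taken from the discussion is that both buffers are filled with a non-zero byte before the timed loop, so that page faults are not counted as memcpy time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict -std modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the original test program).
 * Returns the average time in nanoseconds for one memcpy of `size` bytes,
 * measured over `reps` repetitions. If `pretouch` is non-zero, both buffers
 * are first filled with a non-zero byte so that their pages are touched
 * before the timed loop starts. Returns a negative value on bad arguments,
 * allocation failure, or clock failure. */
static double time_memcpy_ns(size_t size, int reps, int pretouch)
{
    struct timespec t0, t1;
    volatile char sink;               /* keeps the copy from being optimized away */
    char *src = malloc(size);
    char *dst = malloc(size);
    double ns = -1.0;

    if (src && dst && size > 0 && reps > 0) {
        if (pretouch) {
            memset(src, 123, size);   /* non-zero value, as recommended above */
            memset(dst, 123, size);
        }
        if (clock_gettime(CLOCK_MONOTONIC, &t0) == 0) {
            for (int i = 0; i < reps; i++)
                memcpy(dst, src, size);
            if (clock_gettime(CLOCK_MONOTONIC, &t1) == 0)
                ns = ((t1.tv_sec - t0.tv_sec) * 1e9
                      + (t1.tv_nsec - t0.tv_nsec)) / reps;
            sink = dst[size - 1];     /* read one copied byte */
            (void)sink;
        }
    }
    free(src);
    free(dst);
    return ns;
}

#ifdef MEMCPY_SKETCH_MAIN
int main(void)
{
    const size_t size = 10485760;     /* same size as in the quoted runs */

    printf("untouched:  %.0f ns per memcpy\n", time_memcpy_ns(size, 10, 0));
    printf("pretouched: %.0f ns per memcpy\n", time_memcpy_ns(size, 10, 1));
    return 0;
}
#endif
```

It should build with something like "gcc -DMEMCPY_SKETCH_MAIN sketch.c -o sketch" (the file name is arbitrary; older glibc versions additionally need -lrt for clock_gettime). I would compile without aggressive optimization, since an optimizing compiler may merge the repeated copies. The sketch does not call mlockall() at all; that would have to be added to reproduce the locked case.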
