Hi!

I am not a memory expert.
However, I think that a zero-only page is handled specially by the MMU
(it does not actually use physical memory).
This is why a malloc for a huge amount of memory typically succeeds
even if there is not that much physical memory available.
With malloc followed only by a memset to zero, this will typically not
lead to any physical RAM usage (I think this is the "copy-on-write"
(COW) stuff).
Thus, I recommend doing a memset with a non-zero value after allocating
the memory.

memset(buf1,123,msgsize);  
memset(buf2,123,msgsize);

This should lead to a fair comparison. 
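
As a rough sketch of what I mean (the helper names time_memcpy_ns and
mb_per_sec are made up for illustration and are not taken from the
original test program): both buffers are written with a non-zero byte
before the timed loop, so every page has been touched by the time memcpy
runs. Whether this changes the numbers on a given kernel is exactly what
would need to be measured.

```c
#define _POSIX_C_SOURCE 200809L   /* clock_gettime under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Throughput in MB/sec (1 MB = 10^6 bytes, which matches the figures
 * printed in the quoted test output). */
static double mb_per_sec(size_t bytes, double nsec)
{
    return (double)bytes / nsec * 1000.0;
}

/* Hypothetical helper: average time in nsec for one memcpy of msgsize
 * bytes. Both buffers are filled with `fill` first, so their pages are
 * written before timing starts. Returns -1.0 if malloc fails. */
static double time_memcpy_ns(size_t msgsize, int loops, int fill)
{
    assert(msgsize > 0 && loops > 0);

    char *buf1 = malloc(msgsize);
    char *buf2 = malloc(msgsize);
    if (buf1 == NULL || buf2 == NULL) {
        free(buf1);
        free(buf2);
        return -1.0;
    }

    memset(buf1, fill, msgsize);
    memset(buf2, fill, msgsize);

    /* Calling through a volatile pointer keeps the compiler from
     * merging or dropping the repeated copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < loops; i++)
        copy(buf2, buf1, msgsize);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                 + (double)(t1.tv_nsec - t0.tv_nsec);

    free(buf1);
    free(buf2);
    return total / loops;
}
```

Calling time_memcpy_ns(10485760, 10, 123) from a main() would mirror the
10 memcpy of 10485760 bytes from the test, and mb_per_sec(10485760, result)
converts the result to the same unit. On older glibc versions
clock_gettime needs -lrt at link time, as in the gcc lines quoted below.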

Regards

Mathias
> > Some interesting insights about my last tests.
> >
> > 1.) The culprit is mlockall(MCL_FUTURE|MCL_CURRENT);
> >
> > As soon as I leave this out, I get much better results:
> >
> > Without mlockall():
> > Test (10) memcpy of sizes (10485760)
> > 10 memcpy. Time per memcpy: 78147209 [nsec] (134 MB/sec)
> >  finished.
> >
> > With mlockall():
> > Test (10) memcpy of sizes (10485760) ....
> > 10 memcpy. Time per memcpy: 124194618 [nsec] (84 MB/sec)
> >  finished.
> 
> 
> I think you are not measuring the same thing in both cases.
> I did some tests on 2.6.20 (precompiled Debian etch kernel)
> on a 1.6 GHz Pentium M.
> 
> I think that mallocing your buffers and then immediately
> memcpying them gives a non-repeatable measurement
> (at least on my side),
> depending on something I do not understand.
> 
> Could you try my modified version of your code which
> adds:
> 
> memset(buf1,'\0',msgsize);
> memset(buf2,'\0',msgsize);
> 
> just after malloc (you may try calloc too).
> 
> With this modification
> I get similar figures for the mlockall version on my (quasi-)vanilla kernel.
> 
> that is:
> 
> ./memcpy_perf_mlockall
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 35716568 [nsec] (293 MB/sec)
>  finished.
> 
> ./memcpy_perf_memset
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 36004454 [nsec] (291 MB/sec)
>  finished.
> 
> ./memcpy_perf
> Test (10) memcpy of sizes (10485760) ....
> 10 memcpy. Time per memcpy: 23881352 [nsec] (439 MB/sec)
>  finished.
> 
> 
> I think that without mlockall or memset, the memory pages you
> requested with malloc but did not --really-- get are brought into
> physical memory only when memcpy runs.
> 
> What puzzles me is WHY it is faster WITHOUT touching the pages
> BEFORE memcpy???
> 
> Any memory handling expert is welcome to answer.
> 
> > Then again I cannot use Xenomai without mlockall()
> > :(
> 
> And you cannot design a realtime application without
> ensuring you really have the memory you requested;
> this is not a Xenomai issue (in my opinion, though).
> 
> PS: command lines used for compilation:
> 
> gcc memcpy_perf-erk.c -o memcpy_perf -lrt
> gcc -DMLOCK memcpy_perf-erk.c -o memcpy_perf_mlockall -lrt
> gcc -DMEMSET memcpy_perf-erk.c -o memcpy_perf_memset -lrt
> 


-- 
Mathias Koehrer
[EMAIL PROTECTED]



_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
