2007/5/15, Daniel Schnell <[EMAIL PROTECTED]>:
Hi,

Some interesting insights about my last tests.

1.) The culprit is mlockall(MCL_FUTURE|MCL_CURRENT);

As soon as I leave it out, I get much better results:

Without mlockall():
Test (10) memcpy of sizes (10485760)
10 memcpy. Time per memcpy: 78147209 [nsec] (134 MB/sec)
 finished.

With mlockall():
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 124194618 [nsec] (84 MB/sec)
 finished.


I think you are not measuring the same thing in both cases.
I ran some tests on 2.6.20 (precompiled Debian etch kernel)
on a 1.6 GHz Pentium M.

I think that malloc'ing your buffers and then immediately
memcpy'ing them gives a non-repeatable measurement
(at least on my side),
depending on something I do not understand.

Could you try my modified version of your code, which
adds:

memset(buf1,'\0',msgsize);
memset(buf2,'\0',msgsize);

just after malloc (you may try calloc too).

With this modification
I get figures similar to the mlockall version on my (quasi-)vanilla kernel,

that is:

./memcpy_perf_mlockall
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 35716568 [nsec] (293 MB/sec)
finished.

./memcpy_perf_memset
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 36004454 [nsec] (291 MB/sec)
finished.

./memcpy_perf
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 23881352 [nsec] (439 MB/sec)
finished.


I think that without mlockall or memset, the memory pages you
requested with malloc (and did not really get yet) are brought into
physical memory only when memcpy touches them.
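One way to check this hypothesis is to count minor page faults around the first and the second memcpy of freshly malloc'ed buffers. The following is only a sketch, assuming Linux and getrusage(); the helper names minor_faults and count_memcpy_faults are made up for illustration and are not part of the test program.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Copy len bytes between two freshly malloc'ed buffers twice and report
 * the minor faults taken by each pass. Returns 0 on success, -1 on error. */
int count_memcpy_faults(size_t len, long *first, long *second)
{
    /* Calling through a volatile pointer keeps the compiler from
     * dropping copies whose result is never read. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    char *dst = malloc(len);
    char *src = malloc(len);
    long f0, f1, f2;

    if (dst == NULL || src == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    f0 = minor_faults();
    copy(dst, src, len);   /* first touch of both buffers */
    f1 = minor_faults();
    copy(dst, src, len);   /* same buffers, touched once already */
    f2 = minor_faults();

    free(src);
    free(dst);

    *first = f1 - f0;
    *second = f2 - f1;
    return 0;
}
```

If the pages really are faulted in lazily, the first count should be clearly larger than the second for a 10 MB buffer. The exact numbers depend on the kernel, the page size and the allocator.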

What puzzles me is WHY it is faster WITHOUT touching the pages
BEFORE memcpy.

Any memory handling expert is welcome to answer.

Then again I cannot use Xenomai without mlockall()
:(

And you cannot design a realtime application without
ensuring you really have the memory you requested;
this is not a Xenomai issue (in my opinion, though).
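A minimal sketch of what "ensuring you really have the memory" could look like on Linux, before any time-critical code runs: lock the address space and check the result, then write to every page of a buffer once. The names lock_all_memory and alloc_prefaulted are hypothetical helpers, not Xenomai or libc functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lock current and future pages into RAM.
 * Returns 0 on success, or the errno value set by mlockall(). */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}

/* Allocate len bytes and write one byte per page, so that every page
 * is mapped before it is used. Returns NULL if malloc fails. */
char *alloc_prefaulted(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;  /* fallback is an assumption */
    volatile char *buf = malloc(len);
    size_t off;

    if (buf == NULL)
        return NULL;
    for (off = 0; off < len; off += page)
        buf[off] = 0;
    if (len > 0)
        buf[len - 1] = 0;  /* the buffer may not be page aligned */
    return (char *)buf;
}
```

Checking the return value matters for the benchmark too: if mlockall() fails (for example because of RLIMIT_MEMLOCK when not running as root), the "mlockall" binary silently measures the same thing as the plain one.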

PS: command lines used for compilation:

gcc memcpy_perf-erk.c -o memcpy_perf -lrt
gcc -DMLOCK memcpy_perf-erk.c -o memcpy_perf_mlockall -lrt
gcc -DMEMSET memcpy_perf-erk.c -o memcpy_perf_memset -lrt

--
Erk
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>	/* clock_gettime(), struct timespec */
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>

int osa_now_timespec(struct timespec* t)
{
	return clock_gettime (CLOCK_REALTIME, t);
}


int osa_timediff(const struct timespec* t1, const struct timespec* t2, struct timespec* diff)
{
	if (t1!=NULL && t2!=NULL && diff!=NULL)
	{
		unsigned long long a_nsec, b_nsec, diff_nsec; 

		// calculate difference time
		a_nsec = t1->tv_sec*1000000000ULL + t1->tv_nsec;
		b_nsec = t2->tv_sec*1000000000ULL + t2->tv_nsec;
		diff_nsec = b_nsec - a_nsec;
		diff->tv_sec =  diff_nsec/1000000000ULL;
		diff->tv_nsec = diff_nsec%1000000000ULL;
		return 0;
	}
	return -1;
}


unsigned long long osa_to_ns(const struct timespec* t)
{
	 return (t->tv_sec*1000000000ULL + t->tv_nsec);
}


int testMemcpy(unsigned long num, size_t msgsize)
{
    unsigned long i;
    unsigned long long nstime;
    struct timespec t1, t2, t3;

    printf("Test (%lu) memcpy of sizes (%zu) ....\n", num, msgsize);

    char *buf1 = malloc(msgsize);
    char *buf2 = malloc(msgsize);
    if (buf1 == NULL || buf2 == NULL) {
        perror("malloc");
        free(buf2);
        free(buf1);
        return -1;
    }
#ifdef MEMSET
    /* touch every page so it is mapped before the timed loop */
    memset(buf1, '\0', msgsize);
    memset(buf2, '\0', msgsize);
#endif
    // measure
    osa_now_timespec(&t1);
    for (i = 0; i < num; i++) {
        memcpy(buf1, buf2, msgsize);
    }
    // measure
    osa_now_timespec(&t2);
    osa_timediff(&t1, &t2, &t3);

    free(buf2);
    free(buf1);

    nstime = osa_to_ns(&t3) / (unsigned long long) i;
    /* bytes per nsec * 1000 = bytes per usec = MB/sec;
       print 0 if the per-copy time rounds down to 0 nsec */
    printf("%lu memcpy. Time per memcpy: %llu [nsec] (%llu MB/sec)\n",
           i, nstime,
           nstime ? (1000ULL * (unsigned long long) msgsize) / nstime : 0ULL);
    fflush(stdout);

    printf(" finished.\n");
    fflush(stdout);
    return 0;
}


int main()
{
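#ifdef MLOCK
    /* without this check a failed mlockall() would go unnoticed */
    if (mlockall(MCL_FUTURE|MCL_CURRENT) != 0) {
        perror("mlockall");
        return 1;
    }
#endif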
    testMemcpy(10000,      1*1024);
    testMemcpy(10000,      2*1024);
    testMemcpy(10000,      4*1024);
    testMemcpy(10000,      8*1024);
    testMemcpy(10000,     16*1024);
    testMemcpy(1000 ,     32*1024);
    testMemcpy(1000 ,     50*1024);
    testMemcpy(1000 ,    100*1024);
    testMemcpy( 100 ,   1024*1024);
    testMemcpy(  10 ,10*1024*1024);
    testMemcpy(   5 ,50*1024*1024);

    return 0;
}
_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
