Hi,
 
 
I am testing the memcpy() performance of Xenomai on my board in
comparison to the memcpy() performance of native Linux, and I get
significant differences.

Attached is a program which builds on native Linux simply by linking
with -lrt.
It gives me the following output:

=======
bash-2.05b# ./memcpy_perf
Test (10000) memcpy of sizes (1024) ....
10000 memcpy. Time per memcpy: 1567 [nsec] (653 MB/sec)
 finished.
Test (10000) memcpy of sizes (2048) ....
10000 memcpy. Time per memcpy: 2939 [nsec] (696 MB/sec)
 finished.
Test (10000) memcpy of sizes (4096) ....
10000 memcpy. Time per memcpy: 5706 [nsec] (717 MB/sec)
 finished.
Test (10000) memcpy of sizes (8192) ....
10000 memcpy. Time per memcpy: 17077 [nsec] (479 MB/sec)
 finished.
Test (10000) memcpy of sizes (16384) ....
10000 memcpy. Time per memcpy: 133314 [nsec] (122 MB/sec)
 finished.
Test (1000) memcpy of sizes (32768) ....
1000 memcpy. Time per memcpy: 243417 [nsec] (134 MB/sec)
 finished.
Test (1000) memcpy of sizes (51200) ....
1000 memcpy. Time per memcpy: 403455 [nsec] (126 MB/sec)
 finished.
Test (1000) memcpy of sizes (102400) ....
1000 memcpy. Time per memcpy: 713316 [nsec] (143 MB/sec)
 finished.
Test (100) memcpy of sizes (1048576) ....
100 memcpy. Time per memcpy: 7210570 [nsec] (145 MB/sec)
 finished.
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 78162400 [nsec] (134 MB/sec)
 finished.
Test (5) memcpy of sizes (52428800) ....
5 memcpy. Time per memcpy: 425281800 [nsec] (123 MB/sec)
 finished.

======

Spawning the function testMemcpy() as a POSIX thread inside another
program yields the following results:

bash-2.05b# bin/testspecs
Test (10000) memcpy of sizes (1024) ....
10000 memcpy. Time per memcpy: 1566 [nsec] (653 MB/sec)
 finished.
Test (10000) memcpy of sizes (2048) ....
10000 memcpy. Time per memcpy: 2943 [nsec] (695 MB/sec)
 finished.
Test (10000) memcpy of sizes (4096) ....
10000 memcpy. Time per memcpy: 5696 [nsec] (719 MB/sec)
 finished.
Test (10000) memcpy of sizes (8192) ....
10000 memcpy. Time per memcpy: 17325 [nsec] (472 MB/sec)
 finished.
Test (10000) memcpy of sizes (16384) ....
10000 memcpy. Time per memcpy: 200892 [nsec] (81 MB/sec)
 finished.
Test (1000) memcpy of sizes (32768) ....
1000 memcpy. Time per memcpy: 400213 [nsec] (81 MB/sec)
 finished.
Test (1000) memcpy of sizes (51200) ....
1000 memcpy. Time per memcpy: 555240 [nsec] (92 MB/sec)
 finished.
Test (1000) memcpy of sizes (102400) ....
1000 memcpy. Time per memcpy: 1253123 [nsec] (81 MB/sec)
 finished.
Test (100) memcpy of sizes (1048576) ....
100 memcpy. Time per memcpy: 12413170 [nsec] (84 MB/sec)
 finished.
Test (10) memcpy of sizes (10485760) ....
10 memcpy. Time per memcpy: 124039572 [nsec] (84 MB/sec)
 finished.
Test (5) memcpy of sizes (52428800) ....
5 memcpy. Time per memcpy: 596899212 [nsec] (87 MB/sec)
 finished.

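As long as the copy fits in the cache, the results are identical. As
soon as real DDR memory is involved (16384 bytes and up), throughput
drops from 122-145 MB/sec to 81-92 MB/sec, i.e. to roughly two thirds
of the native figure (about 66% in the 16384-byte case)!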
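
For anyone comparing the two listings: the MB/sec column in both
outputs is consistent with size * 1000 / nanoseconds, truncated to an
integer, with 1 MB taken as 10^6 bytes. The sketch below reproduces
that arithmetic. The function names mb_per_sec() and
throughput_drop_percent() are made up for illustration and are not
taken from memcpy_perf.c.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MB/sec (1 MB = 10^6 bytes), truncated to an integer.
 * bytes / (ns * 1e-9 s) / 1e6 simplifies to bytes * 1000 / ns. */
uint64_t mb_per_sec(uint64_t bytes, uint64_t ns_per_copy)
{
    assert(ns_per_copy > 0);
    return bytes * 1000u / ns_per_copy;
}

/* Percentage by which throughput falls when the time per copy grows
 * from native_ns to other_ns for the same block size. */
double throughput_drop_percent(uint64_t native_ns, uint64_t other_ns)
{
    assert(other_ns > 0);
    return 100.0 * (1.0 - (double)native_ns / (double)other_ns);
}
```

With the 16384-byte rows, mb_per_sec(16384, 133314) gives 122 and
mb_per_sec(16384, 200892) gives 81, matching the listings, and
throughput_drop_percent(133314, 200892) comes out at about 33.6.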

I assume that, because different time functions (clock_gettime()) are
linked in, I am somehow measuring differently. But at the moment I have
no clue whether performance is really being lost and, if so, where.
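
To make the measurement question concrete, here is a minimal sketch of
a timed memcpy() loop. It is not the attached memcpy_perf.c: the
function names timespec_diff_ns() and time_memcpy_ns() and the choice
of CLOCK_MONOTONIC are assumptions for illustration, and the clock id
the original program passes to clock_gettime() is not stated here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Elapsed nanoseconds between two timestamps (end must not precede start). */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Average time per memcpy() call in nanoseconds, or -1 on failure. */
int64_t time_memcpy_ns(size_t size, unsigned iterations)
{
    /* Calling through a volatile pointer keeps the compiler from
     * merging or dropping the repeated copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    struct timespec t0, t1;
    unsigned char *src, *dst;
    int64_t total = -1;
    unsigned i;

    if (size == 0 || iterations == 0)
        return -1;

    src = malloc(size);
    dst = malloc(size);
    if (src != NULL && dst != NULL) {
        /* Touch both buffers first so page faults are not timed. */
        memset(src, 0xA5, size);
        memset(dst, 0, size);

        if (clock_gettime(CLOCK_MONOTONIC, &t0) == 0) {
            for (i = 0; i < iterations; i++)
                copy(dst, src, size);
            if (clock_gettime(CLOCK_MONOTONIC, &t1) == 0)
                total = timespec_diff_ns(&t0, &t1) / (int64_t)iterations;
        }
    }
    free(src);
    free(dst);
    return total;
}
```

Calling time_memcpy_ns(16384, 10000) from both the native build and
the thread inside the other program, with the same clock id in both,
would show whether the gap comes from the time source or from the copy
itself. On older C libraries this needs -lrt at link time.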

Could anybody please try to reproduce this behaviour on their board?


Best regards,

Daniel Schnell.

Attachment: memcpy_perf.c
Description: memcpy_perf.c

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
