I measured the impact of yv12_to_yuy2_il_mmx2() by running
OProfile for 30 seconds while playing a certain recording.
My usleep() patch was applied. To disable yv12_to_yuy2_il_mmx2(),
I replaced the condition

    if (pixelformat == DSPF_I420 || pixelformat == DSPF_YV12)

in cDFBVideoOut::YUV() with

    if (1)

so that fast_memcpy() would be used instead. Here are the highlights,
first with yv12_to_yuy2_il_mmx2() and then with fast_memcpy():

CPU: PIII, speed 906.765 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit
mask of 0x00 (No unit mask) count 90000
samples  %        symbol name
53030    32.7959  yv12_to_yuy2_il_mmx2_line(...)
13263     8.2024  mpeg_decode_mb
10962     6.7793  ff_mpa_synth_filter
 9032     5.5858  put_pixels16_xy2_mmx
 8593     5.3143  put_pixels8_mmx
 8041     4.9729  ff_simple_idct_add_mmx
 7233     4.4732  put_pixels16_mmx
 3945     2.4397  MPV_decode_mb
 3606     2.2301  MPV_motion

CPU: PIII, speed 906.765 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit
mask of 0x00 (No unit mask) count 90000
samples  %        symbol name
28314    20.3541  fast_memcpy(void*, void const*, unsigned int)
14103    10.1382  mpeg_decode_mb
11570     8.3173  ff_mpa_synth_filter
 9119     6.5554  put_pixels16_xy2_mmx
 8567     6.1586  put_pixels8_mmx
 7992     5.7452  ff_simple_idct_add_mmx
 7796     5.6043  put_pixels16_mmx
 4008     2.8812  MPV_decode_mb
 3903     2.8058  MPV_motion

Above, you can see that fast_memcpy() accounts for 28314 samples
while yv12_to_yuy2_il_mmx2_line() accounts for 53030 samples,
almost twice the execution time of fast_memcpy(). The sample
counts of the other top functions are comparable: they differ
by less than 10% between the two runs.

I consider this a good result. Furthermore, given that the CPU
is 50% idle according to "top", reducing the CPU usage of
softdevice by 10% would reduce the total CPU usage by only 5%.

Marko