Dear all,

Following is a set of patches I used to improve the performance of urjtag on my ppc system using gpio. The patches are against git head.
The first two were posted before and are repeated here. The patches are fairly standalone, but I have not tested applying them out of order.

On my ppc system with urjtag over gpio I initially needed about 1 min 58 sec. Using file descriptors instead of file pointers brought that down to about 1 min 12 sec.

Then I started profiling with oprofile. That told me strlen was the bad guy, eating up 94% of the CPU cycles, so I went looking at the usages of strlen. That resulted in some smaller improvements and some cleanup. However, the big CPU user seemed to be strcat (I haven't checked the implementation, but I guess it does an internal strlen to find the end of the string). Rewriting this to a strcpy, exploiting that we already know the length, gave a big gain (on ppc from 1 min 12 sec to 42 sec).

On my Intel PC with usbblaster the strcat change resulted in a gain of about 2 seconds:

before: real 5.35  user 1.89  sys 0.05
after:  real 3.62  user 0.16  sys 0.07

For the ppc, using pread removed another half a second, resulting in these results:

real 0m 41.57s
user 0m 4.11s
sys  0m 33.98s

and on ppc with usbblaster:

real 0m 4.44s
user 0m 0.82s
sys  0m 0.04s

So it seems the gpio user space part takes about 3.3 s (4.11 s user time with gpio versus 0.82 s with usbblaster).

This is all programming an Altera EP3C40F484. The tested .svf file is about 2.4 MB and resides in /tmp (ramfs), so there is no disk/flash I/O to read it.

The last thing I did was run oprofile again. This of course also told me that most of my time was spent in the kernel.

BTW: top 5 of oprofile on ppc with jtag:

samples  %        image name            app name              symbol name
414013   85.4259  no-vmlinux            no-vmlinux            /no-vmlinux
36959     7.6260  libpthread-2.11.2.so  libpthread-2.11.2.so  /lib/libpthread-2.11.2.so
21134     4.3607  liburjtag.so.0.0.0    liburjtag.so.0.0.0    gpio_clock
3091      0.6378  libc-2.11.2.so        libc-2.11.2.so        fgetc
1453      0.2998  liburjtag.so.0.0.0    liburjtag.so.0.0.0    urj_svf_lex

As we can see, the kernel is indeed soaking up most of the time.
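
For readers who want to see the strcat point concretely, here is a minimal sketch of the idea, not the actual patch. The struct hexbuf type and the hexbuf_append function are hypothetical names made up for this illustration and do not come from the urjtag sources. The sketch assumes the caller already knows the fragment length, as a lexer normally does, and keeps track of the accumulated length so each append can strcpy straight to the end of the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, for illustration only (not from urjtag). */
struct hexbuf {
    char  *data;  /* NUL-terminated accumulated text, or NULL when empty */
    size_t len;   /* current length, excluding the terminating NUL */
    size_t cap;   /* allocated size in bytes */
};

/*
 * Append the NUL-terminated string frag, whose length frag_len the
 * caller already knows. Returns 0 on success, -1 if allocation fails.
 */
static int hexbuf_append(struct hexbuf *b, const char *frag, size_t frag_len)
{
    size_t needed = b->len + frag_len + 1;

    if (needed > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;
        char *p;

        while (new_cap < needed)
            new_cap *= 2;
        p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }

    /*
     * strcat(b->data, frag) would first walk b->data from the start to
     * find its end, so the cost grows with everything appended so far.
     * Since the end is known, copy directly to it.
     */
    strcpy(b->data + b->len, frag);
    b->len += frag_len;
    return 0;
}
```

Starting from a zeroed struct hexbuf, appending "DEAD" and then "BEEF" (each with length 4) leaves data holding "DEADBEEF" and len at 8. With strcat, every call rescans the whole accumulated string, so building a long string out of many fragments costs time quadratic in its final length. That would fit strlen-like work dominating the profile.
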
Somehow I didn't manage to get oprofile to split out more detailed kernel info, but -d gave the addresses, which I could relate to the symbol table. Nothing unexpected there.

If someone sees room to improve things, be my guest.

Frans.

Frans Meulenbroeks (7):
  gpio.c: added fseek
  gpio.c: used file descriptors instead of file pointers
  svf_flex.l: improved performance
  svn_bison.y: avoid evaluating the length of a hex fragment twice
  svf_bison.y: improve performance when processing hex fragments.
  svf_flex.l: pass length argument to fix_yylloc
  gpio.c: replace lseek/read by pread

 urjtag/src/svf/svf_bison.y  |    7 +++--
 urjtag/src/svf/svf_flex.l   |   53 +++++++++++++++++++-----------------
 urjtag/src/tap/cable/gpio.c |   61 +++++++++++++++++++------------------------
 3 files changed, 59 insertions(+), 62 deletions(-)
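
As a rough illustration of the last patch in the series (replace lseek/read by pread), here is a sketch of the two ways to re-read a value from a descriptor that stays open. It is not the code from gpio.c. The function names gpio_read_lseek and gpio_read_pread are hypothetical, and the sketch assumes the value file holds a single character, '0' or '1', at offset 0.

```c
#define _XOPEN_SOURCE 700  /* for pread() under strict compiler modes */
#include <sys/types.h>
#include <unistd.h>

/*
 * Two system calls per sample: rewind, then read.
 * Returns 0 or 1 for the value read, or -1 on error.
 */
static int gpio_read_lseek(int fd)
{
    char c;

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, &c, 1) != 1)
        return -1;
    return c == '1';
}

/*
 * One system call per sample: pread() takes the offset as an argument
 * and does not move the file position, so no rewind is needed.
 * Returns 0 or 1 for the value read, or -1 on error.
 */
static int gpio_read_pread(int fd)
{
    char c;

    if (pread(fd, &c, 1, 0) != 1)
        return -1;
    return c == '1';
}
```

Both functions return the same result for the same descriptor. The pread variant halves the number of kernel entries per sample, which matters when most of the run time is spent in the kernel.
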
