Looking around in the code, I notice that the implementation of 
ddi_swap32() is fairly inefficient.  In particular, it uses macros to 
swap the bytes around using shifting operations.
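
For reference, a shift-and-mask swap generally looks something like the
sketch below.  This is only an illustration of the technique, not the
actual ddi_swap32() source; the name swap32_shift is made up for this
example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the real ddi_swap32() definition):
 * reverse the four bytes of x using only masks and shifts.
 */
static inline uint32_t
swap32_shift(uint32_t x)
{
	return (((x & 0x000000ffu) << 24) |	/* byte 0 -> byte 3 */
	    ((x & 0x0000ff00u) << 8) |		/* byte 1 -> byte 2 */
	    ((x & 0x00ff0000u) >> 8) |		/* byte 2 -> byte 1 */
	    ((x & 0xff000000u) >> 24));		/* byte 3 -> byte 0 */
}
```

Unless the compiler recognizes the pattern, this costs several mask,
shift and OR operations per swap.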

By contrast, htonl() on i386 systems is implemented using the native x86 
bswap instruction.  (This is true for both the kernel and libc.)

A little bit of testing shows *on my system* that the bswap 
implementation is considerably faster.  A short loop of tests runs 
almost twice as fast using htonl() compared to ddi_swap32().

On UltraSPARC systems, there are also some UltraSPARC-specific 
extensions which add little-endian direct access, potentially saving a 
lot of time.  (I'm thinking of code that does PIOs into little endian 
PCI devices, for example.  Endian swap of data like audio data is 
another good example.)  It appears that the little endian ASIs are not 
available on generic V9, but only the UltraSPARC variants.  (Not sure 
whether Niagara family CPUs have them or not.)

What do folks think about replacing ddi_swap16/32 with 
processor-specific variants that could use a much more efficient CPU 
instruction?
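
As a rough sketch of what I mean, assuming a GCC-compatible compiler
for the inline assembly: use bswap on x86 and fall back to shifts
everywhere else.  The names my_swap32 and my_swap32_generic are
placeholders for this example, not existing DDI interfaces.

```c
#include <stdint.h>

/* Portable fallback: shift-and-mask byte reversal. */
static inline uint32_t
my_swap32_generic(uint32_t x)
{
	return (((x & 0x000000ffu) << 24) |
	    ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) |
	    ((x & 0xff000000u) >> 24));
}

/*
 * Hypothetical processor-specific wrapper.  On x86 with a
 * GCC-compatible compiler, emit a single bswap instruction;
 * on every other platform, use the generic version above.
 */
static inline uint32_t
my_swap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	/* bswap reverses the byte order of a 32-bit register in place. */
	__asm__("bswap %0" : "+r" (x));
	return (x);
#else
	return (my_swap32_generic(x));
#endif
}
```

A SPARC branch could be added to the same #if chain, so callers would
not need to change.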

    - Garrett