Looking around in the code, I notice that the implementation of ddi_swap32() is fairly inefficient. In particular, it uses macros to swap the bytes around using shifting operations.
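For illustration, a shift-and-mask swap of the kind described above typically looks something like the sketch below. The names SWAP32_SHIFT and swap32_builtin are hypothetical, and this is not the actual ddi_swap32() source. The second function assumes a GCC- or Clang-compatible compiler, whose __builtin_bswap32() is usually lowered to a single byte-swap instruction on x86.

```c
#include <stdint.h>

/*
 * Hypothetical shift-and-mask byte swap, for illustration only.
 * This is NOT the real ddi_swap32() implementation.
 */
#define SWAP32_SHIFT(x) \
    ((((uint32_t)(x) & 0x000000ffU) << 24) | \
     (((uint32_t)(x) & 0x0000ff00U) << 8)  | \
     (((uint32_t)(x) & 0x00ff0000U) >> 8)  | \
     (((uint32_t)(x) & 0xff000000U) >> 24))

/*
 * Same result via a compiler builtin (assumes GCC or Clang).
 * The compiler is free to emit one bswap instruction here.
 */
static inline uint32_t
swap32_builtin(uint32_t x)
{
    return (__builtin_bswap32(x));
}
```

Both forms return the same value for any 32-bit input; the only difference is how much work the compiler has to do to recognize the macro as a plain byte swap.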
By contrast, htonl() on i386 systems is implemented using the native x86 bswap instruction. (This is true for both the kernel and libc.) A little bit of testing shows *on my system* that the bswap implementation is considerably faster: a short loop of tests runs almost twice as fast using htonl() compared to ddi_swap32().

On UltraSPARC systems, there are also some UltraSPARC-specific extensions which add little-endian direct access, potentially saving a lot of time. (I'm thinking of code that does PIOs into little-endian PCI devices, for example. Endian swapping of data like audio data is another good example.) It appears that the little-endian ASIs are not available on generic V9, but only on the UltraSPARC variants. (Not sure whether Niagara family CPUs have them or not.)

What do folks think about replacing ddi_swap16/32 with processor-specific variants that could use a much more efficient CPU instruction?

- Garrett