On Oct 13, 2017, at 2:11 PM, Guy Harris <g...@alum.mit.edu> wrote:

> On Oct 13, 2017, at 1:50 PM, Gerald Combs <ger...@wireshark.org> wrote:
> 
>> Before we migrated away from NMake, epan/Makefile.nmake built the assembly 
>> versions of various routines for x86 (but not x64) defined in 
>> epan/asm_utils_win32_x86.asm. Should we resurrect it in epan/CMakeLists.txt 
>> or get rid of it along with the NASM download in tools/win-setup.ps1?
> 
> Are there any platforms on which the assembler versions are significantly 
> faster than the non-assembler versions?
> 
> If not, I'd say get rid of it.

OK, this all dates back to:

        https://www.wireshark.org/lists/wireshark-dev/200711/msg00303.html

which was about speeding up Wireshark startup.

asm_utils_win32_x86.asm contains:

        wrs_strcmp() (with an apparently-unused alias wrs_strcmp_with_data()), 
which is a 4-bytes-at-a-time unrolled version of strcmp()

        wrs_str_equal(), which is a 4-bytes-at-a-time routine to compare 
strings for equality, without caring whether, if unequal, string A is greater 
than or less than string B (thus a bit simpler than strcmp())

        wrs_check_charset(), which is a 4-bytes-at-a-time routine to check 
whether all characters in a 1-byte-character string are in a given character 
set, with the set represented as a table of 256 bytes with 1 meaning "in the 
set" and 0 meaning "not in the set"

        wrs_str_hash(), which is a 4-bytes-at-a-time string hashing function.
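
To make the 256-byte table representation concrete, here is a minimal C sketch of the byte-at-a-time equivalent of the character-set check. The names charset_table_init() and check_charset_bytewise() are hypothetical, as are the signatures; this is not the code in asm_utils_win32_x86.asm, just the same idea written the plain way:

```c
#include <string.h>

/* Hypothetical helper: build the 256-entry membership table from a
 * NUL-terminated list of allowed characters (1 = in the set, 0 = not). */
void charset_table_init(unsigned char table[256], const char *allowed)
{
    memset(table, 0, 256);
    for (; *allowed != '\0'; allowed++)
        table[(unsigned char)*allowed] = 1;
}

/* Hypothetical byte-at-a-time check: returns 1 if every character of the
 * NUL-terminated string s is in the set, 0 otherwise. */
int check_charset_bytewise(const unsigned char table[256], const char *s)
{
    for (; *s != '\0'; s++)
        if (!table[(unsigned char)*s])
            return 0;
    return 1;
}
```

Any faster version would have to beat this loop by enough to matter at startup.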

It's in Intel assembler syntax; I don't know how many UN*X assemblers support 
Intel syntax rather than AT&T syntax, so for use on UN*X this might require two 
versions.

For wrs_strcmp(), that seems useful only if Microsoft's own strcmp() isn't fast 
enough.

For wrs_str_equal(), the bulk of the loop is the same as wrs_strcmp(), so, if 
Microsoft's own strcmp() is fast enough, the only advantage of wrs_str_equal() 
would be that you'd spend a little less time per string pair computing a 3-way 
less/greater/equal result and then turning it into 1 for equal and 0 for less 
or greater.
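
In other words, if the library strcmp() is fast enough, the portable replacement is roughly a one-liner. A sketch, with a hypothetical name and an assumed plain const char * signature (the real routine's prototype may differ):

```c
#include <string.h>

/* Hypothetical portable stand-in for wrs_str_equal(): let strcmp() compute
 * its 3-way result, then collapse it to 1 for equal and 0 for unequal. */
int str_equal_portable(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```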

For the others, they're interesting optimizations, but if they were rewritten 
in C, and used on all platforms where you can do unaligned loads and stores (at 
this point, that might mean "anything that's not SPARC"), they might be as fast 
(assuming the compiler generated similar code for extracting the 4 bytes from 
the word) and would be usable on other platforms.  For extra credit, do it 8 
bytes at a time on ILP64/LLP64 platforms.
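
As a rough illustration of what a C rewrite could look like, here is a sketch of the character-set check doing one 32-bit load per four characters. The name check_charset4() is hypothetical, and taking an explicit length is my assumption (it keeps the loop from loading past the end of the string); the memcpy() is the usual way to express a possibly-unaligned load in portable C, and compilers typically turn it into a single load on platforms that allow one:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-bytes-at-a-time check: returns 1 if all len bytes of s
 * are in the set described by table (nonzero = in the set), 0 otherwise. */
int check_charset4(const unsigned char table[256], const char *s, size_t len)
{
    size_t i = 0;

    /* Main loop: one 32-bit load, then four table lookups on the bytes
     * extracted by shifting.  Byte order doesn't matter for a pure
     * membership test, so this works on either endianness. */
    for (; i + 4 <= len; i += 4) {
        uint32_t w;

        memcpy(&w, s + i, sizeof w);
        if (!(table[w & 0xff] && table[(w >> 8) & 0xff] &&
              table[(w >> 16) & 0xff] && table[w >> 24]))
            return 0;
    }

    /* Tail: the remaining 0 to 3 bytes, one at a time. */
    for (; i < len; i++)
        if (!table[(unsigned char)s[i]])
            return 0;
    return 1;
}
```

The 8-bytes-at-a-time variant would use a uint64_t and eight lookups in the main loop; whether either beats the byte-at-a-time loop is exactly what would need measuring.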

So some questions are:

        1) How much do they speed up Wireshark startup on 32-bit x86 on Windows?

        2) How much do they speed up Wireshark startup on 32-bit x86 on various 
UN*Xes (which may mean "translate them to AT&T assembler")?  The answers may 
differ on different platforms.

        3) What about x86-64?

        4) For wrs_check_charset() and wrs_str_hash(), how much of a difference 
do they make on non-x86 platforms not from Oracle :-) if done in C?
