On 12/06/2013 11:33, Bram Moolenaar wrote:
Mike Williams wrote:
Never leave a developer in possession of a profiler ...
So with my setup, nerdtree takes a while to start in a large directory of
files, such as VIM's src directory. On Windows the profiler shows a
large amount of time is being spent in the CRT's isalnum(), isalpha()
and islower() internals, coping with locale handling.
Working backwards shows the calls are actually from the ASCII_IS...()
macros in macros.h and mainly coming from get_name_len(),
has_loop_cmd(), find_command(), etc - so from VIML processing. There is
also a call in modifier_len() that AFAICT only has to handle ASCII as well.
Since these macros enforce the ASCII character range there is no need to
hit locale handling. The simplest solution is to use a lookup table with a
bit mask for the classifications needed. I have thrown in handling for
VIM_ISDIGIT as well.
After the patch, starting nerdtree in VIM's src is ~35% faster (1.7s vs
2.6s) and the CRT's locale handling functions no longer appear in the
profile (and I warmed the file cache to remove disk reading from the
times). In general this should help speed up longer running VIML scripts.
Yeah, we've had problems with library functions before.
I think using a table is often slower than simple compares, because a
table lookup has to access memory, while a compare is local inside the
CPU. Especially for ASCII_ISLOWER and ASCII_ISUPPER, it's just two
compares. Changing VIM_ISDIGIT() this way is likely to make it slower.
ASCII_ISALNUM() requires six compares, perhaps a table is faster then?
With compares you have potential pipeline stalls and are relying on CPU
branch prediction to keep up instruction throughput. Depending on the
length of any loops - especially under compiler optimisation, which may
inline a lot more code than what looks like a short function in the
source - you can end up being slower using compares than branchless code
using memory.
Memory access should be relatively cheap: caches are relatively large
these days, loops of a reasonable size should benefit from locality of
reference so that frequently accessed memory sits in the L1 cache, and we
are talking about 128 bytes, or two cache lines (for Intel) - less if,
say, the string being checked is all one case. Plus the out-of-order
execution of modern CPUs and the level of optimisation done these days
should ensure memory has been read in time to keep the processor in full
flow. And the code is simpler, so it is easier to see that it is
right ;-)
If you are desperate to reduce memory, you could try packing the array
into nibbles, which increases the complexity of the indexing (something
like (ascii_table[(c)>>1] >> (((c)&1)<<2)) & 0x0F) and of the table
itself (harder to check), and could slow things down again due to
register spills from having to do the extra calculations.
As I said, after the change these functions have disappeared from the
profile altogether. The memory cost is minimal, especially given that the
recent patches changing allocated structure packing save far more.
My 2p's worth anyway.
Mike
--
Yoghurt the Great, Yoghurt the All Powerful? No just plain yoghurt.
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php