On 12/06/2013 11:33, Bram Moolenaar wrote:
Mike Williams wrote:
Never leave a developer in possession of a profiler ...
So with my setup, nerdtree takes a while to start in a large directory of
files, such as VIM's src directory. On Windows the profiler shows a
large amount of time is being spent in the CRT's isalnum(), isalpha()
and islower() internals, coping with locale handling.
Working backwards shows the calls are actually from the ASCII_IS...()
macros in macros.h and mainly coming from get_name_len(),
has_loop_cmd(), find_command(), etc - so from VIML processing. There is
also a call in modifier_len() that AFAICT only has to handle ASCII as well.
Since these macros enforce the ASCII character range there is no need to
hit locale handling. The simplest solution is to use a lookup table with a
bit mask for the classifications needed. I have thrown in handling for
VIM_ISDIGIT as well.
After the patch, starting nerdtree in VIM's src is ~35% faster (1.7s vs
2.6s) and the CRT's locale handling functions no longer appear in the
profile (and I warmed the file cache to remove disk reading from the
times). In general this should help speed up longer running VIML scripts.
Yeah, we've had problems with library functions before.
I think using a table is often slower than simple compares, because a
table lookup has to access memory, while a compare is local inside the
CPU. Especially for ASCII_ISLOWER and ASCII_ISUPPER, it's just two
compares. Changing VIM_ISDIGIT() this way is likely to make it slower.
ASCII_ISALNUM() requires six compares, perhaps a table is faster then?
With compares you have potential pipeline stalls and are relying on CPU
branch prediction to keep up instruction throughput. Depending on the
length of any loops - especially under compiler optimisation, which may
inline a lot more code than what looks like a short function in the
source - you can end up being slower using compares than branchless code
using memory.
Memory access should be relatively cheap: caches are relatively large
these days, loops of a reasonable size should benefit from locality of
reference so that frequently accessed memory sits in the L1 cache, and we
are talking about 128 bytes, or two cache lines (for Intel) - less if,
say, the string being checked is all one case. Plus the out-of-order
execution of modern CPUs and the level of optimisation done these days
should ensure memory has been read in time to keep the processor in full
flow. And the code is simpler, so it is easier to see that it is
right ;-)
If you are desperate to reduce memory, you could try packing the array
into nibbles, which increases the complexity of the indexing (something
like (ascii_table[(c)>>1] >> (((c)&1)<<2)) & 0x0F) and of the table
itself (harder to check), and could slow things down again due to
register spills from having to do the extra calculations.
As I said, after the change these functions have disappeared from the
profile altogether. The memory cost is minimal, especially given that the
recent patches changing allocated structure packing save far more.
My 2p's worth anyway.
Mike
--
Yoghurt the Great, Yoghurt the All Powerful? No just plain yoghurt.
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php