On 06/06/2013 13:19, Bram Moolenaar wrote:
Mike Williams wrote:
[...]
Clearly the new engine is much faster for long lines (>5000 chars). The
old engine comes to a grinding halt there.
I'm planning to add profiling to syntax patterns, so that we can see
which pattern is taking most time. This will help both optimizing the
regexp engine and syntax file authors.
I have had a quick look using Intel VTune on Windows with the XML file
and the vast majority of the time appears to be spent in calls to
memset() from lalloc_clear() at the start of nfa_regmatch() and lalloc()
through nfa_regcomp_start().
These are quick and dirty results from an optimised build with functions
inlined; it is always fun mapping reported lines back to the original
source in that situation.
Thanks, that is very useful to know.
One problem with the Vim implementation of the NFA engine is that it
needs to remember submatches in every state. Other NFA engines can just
have a list with state IDs, we need much more space allocated and copied
around.
I'll have to think about a way to do this more efficiently.
Some of the memset()s are probably unnecessary.
Still working with grammar.xml.
Another speed hog, at least on Windows, is mch_avail_mem(), which is
called on every call to lalloc(). Some experimentation shows a single
call takes ~0.85µs, which is an age. I imagine it does a context switch
to interact with the kernel to get the memory information. Context
switches are always slow.
My profile showed ~1.5s spent in lalloc() on calls to mch_avail_mem(),
so ~1.7 million allocations are being done. Stubbing out
mch_avail_mem() removes lalloc() from the profile results altogether.
Given that current machines tend to have GiBs of RAM, and that the
numbers could have changed significantly by the time the function
returns, is this check still necessary? Should the available-memory
check be made optional via a setting, for those for whom it is an issue?
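
One possible middle ground, as a rough sketch only and not Vim's actual
code: keep the check but only probe every so many allocations, or when a
request is large. Here probe_avail_kb() is a hypothetical stand-in for
mch_avail_mem(), throttled_alloc() is a hypothetical stand-in for
lalloc(), and the interval and thresholds are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mch_avail_mem(); counts how often it runs. */
static unsigned long probe_calls = 0;

static unsigned long probe_avail_kb(void)
{
    ++probe_calls;
    return 1024UL * 1024UL;             /* pretend 1 GiB is always free */
}

/* All three values are assumptions chosen for illustration. */
#define PROBE_INTERVAL  4096u           /* re-probe after this many allocations */
#define LARGE_ALLOC     (1024u * 1024u) /* always probe for requests this big */
#define MIN_FREE_KB     1024UL          /* refuse when less than this is free */

static unsigned      allocs_since_probe = PROBE_INTERVAL; /* probe on first use */
static unsigned long cached_avail_kb = 0;

/* Allocate "size" bytes, probing available memory only occasionally. */
static void *throttled_alloc(size_t size)
{
    assert(size > 0);
    if (size >= LARGE_ALLOC || allocs_since_probe >= PROBE_INTERVAL) {
        cached_avail_kb = probe_avail_kb();
        allocs_since_probe = 0;
    }
    ++allocs_since_probe;
    if (cached_avail_kb < MIN_FREE_KB + size / 1024)
        return NULL;                    /* treat as out of memory */
    return malloc(size);
}
```

With these numbers, 10000 small allocations trigger three probes instead
of 10000. The cached figure can be stale, but as noted above the fresh
figure can be stale by the time it is returned too.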
This leaves addstate() and nfa_regmatch() as the top functions in the
profile, but with no obvious hotspots within them. However, after them
malloc() and free() appear, taking ~0.4s combined (less than a third of
the time for the check on memory availability!). Perhaps a new allocator
is needed to cope better with the allocation patterns now happening.
Not for 7.4 though.
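
To illustrate the kind of allocator I have in mind, here is a minimal
bump (arena) allocator sketch. It is an assumption about what might suit
many short-lived allocations during one match, not a proposal for the
actual implementation. The names arena_T, arena_init(), arena_alloc(),
arena_reset() and arena_free() are hypothetical, and the arena is fixed
size with no growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 16u     /* assumed alignment, a power of two */

typedef struct {
    unsigned char *base;    /* start of the backing block */
    size_t         cap;     /* total bytes in the block */
    size_t         used;    /* bytes handed out so far */
} arena_T;

/* Grab one block up front; returns 1 on success, 0 on failure. */
static int arena_init(arena_T *a, size_t cap)
{
    assert(cap > 0);
    a->base = malloc(cap);
    a->cap = (a->base != NULL) ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hand out "size" bytes by bumping an offset; NULL when the arena is full. */
static void *arena_alloc(arena_T *a, size_t size)
{
    /* Round the current offset up to the next multiple of ARENA_ALIGN. */
    size_t start = (a->used + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1);

    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Release everything at once, e.g. when a match attempt finishes. */
static void arena_reset(arena_T *a)
{
    a->used = 0;
}

/* Return the backing block to the system. */
static void arena_free(arena_T *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

The point is that each allocation becomes a few arithmetic operations and
all the individual free() calls collapse into one arena_reset().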
HTH - TTFN
Mike
--
This is certainly more fun than being hit with a hammer.