Re: Vim 7 performance notes

2007-02-10 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> Hi Bram Moolenaar, you wrote:
> 
> > 
> > It sounds like keeping only 1024 bytes would already work for most
> > situations.  That would be an acceptable amount to keep allocated at
> > all times.  So why don't we use this as the initial size, and when it
> > grows larger we free it when finished.  The growth size can be doubled
> > each time perhaps.
> 
> I chose the 8192/16384 pair because it's the closest to the original 10000
> bytes.  10000 itself would also be fine but I like round numbers...
> 
> The patch with the changes which are, I think, close to what you describe
> above is attached.  Could you please take a look at it?

It's starting to look better: fewer disadvantages, and it should still give a
fair speedup.

10000 bytes is OK when it's going to be freed again soon.  If you want to
keep the memory allocated, something less would be more appropriate.  Thus
you need to start low, but you could increase it in larger steps (since it's
going to be freed on return anyway).

The growarray is used in lots of places.  Adding another field to it
will cause more memory to be used.  Isn't it easier to make another
version of ga_grow() that increases the growsize when allocating another
block?

Instead of doubling each time, which is going to give big chunks quickly,
another way would be to first allocate one block at a start size of about
1000 bytes, then set the growsize to 10000.  So we grow at the same speed as
before.  Then no extra field or function is needed; it's a local change in
the regexp code.

Something like:
    if (regstack.ga_data == NULL)
    {
	ga_init2(&regstack, 1, 1000);
	ga_grow(&regstack, 1);
    }
    regstack.ga_growsize = 10000;

I do wonder if the memory needs to be cleared when re-using the
allocated memory.  Can you check that?

-- 
A meeting is an event at which the minutes are kept and the hours are lost.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-10 Thread Alexei Alexandrov
Hi Bram Moolenaar, you wrote:

> 
> It sounds like keeping only 1024 bytes would already work for most
> situations.  That would be an acceptable amount to keep allocated at
> all times.  So why don't we use this as the initial size, and when it
> grows larger we free it when finished.  The growth size can be doubled
> each time perhaps.
> 

I chose the 8192/16384 pair because it's the closest to the original 10000 bytes.
10000 itself would also be fine but I like round numbers...

The patch with the changes which are, I think, close to what you describe above is
attached. Could you please take a look at it?

> 
> Right, this may happen and the stack size will greatly depend on the line
> length.
> 
...
> 
> That's very useful, thanks for diving into this.
> 

My pleasure.

-- 
Alexei Alexandrov


regexp-malloc.patch
Description: Binary data


Re: Vim 7 performance notes

2007-02-10 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> > This sounds like a bug in getc().  It should know that only one thread
> > is reading the file and skip the locking.  I don't think we should fix
> > library bugs in Vim, unless it's crashing or another significant problem.
> > 
> It can't be a bug. I might be missing what you mean, but I can't see
> how it can know that only one thread is reading a file. It doesn't
> have a clue whether you gave this FILE * to other threads or not. It
> tries to be lightweight - as I described in a separate mail it uses
> InterlockedIncrement/Decrement but they are not that lightweight -
> they don't require switching to kernel mode but still take about 200
> cycles of CPU each.
> 
> The only optimization that I see could be avoiding blocking (and even
> trying to block) in case if there is only one thread in current
> process and if there is guarantee that this particular call is
> guaranteed not to create any threads. But 1) it still may be expensive
> and 2) Vim has some background threads anyway, probably.

Hmm, getc() is apparently preparing for the worst and isn't optimized for the
98% of situations where only one thread is reading from the fd, with the
result that the very often used getc() is slow.

Besides using getc_unlocked() we could use read() and do our own buffering.
That can't be very complicated.  And then we don't need to figure out whether
the getc_unlocked() function is available.
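Roughly something like this (only a sketch of the idea; the struct and the
names are made up, not existing Vim code):

#include <unistd.h>

typedef struct
{
    int     fd;         /* file descriptor from open() */
    char    buf[4096];  /* read-ahead buffer; the size is arbitrary */
    int     len;        /* number of valid bytes in buf */
    int     pos;        /* next byte to hand out */
} bufreader_T;

/* Return the next byte, or -1 at end of file or error, without any locking. */
static int
bufreader_getc(bufreader_T *br)
{
    if (br->pos >= br->len)
    {
        br->len = (int)read(br->fd, br->buf, sizeof(br->buf));
        br->pos = 0;
        if (br->len <= 0)
            return -1;
    }
    return (unsigned char)br->buf[br->pos++];
}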


-- 
hundred-and-one symptoms of being an internet addict:
102. When filling out your driver's license application, you give
 your IP address.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-10 Thread Alexei Alexandrov
Hi Bram Moolenaar, you wrote:

>> It's Windows, but on Linux it should be similar.
> 
> I would not assume it's similar until there is proof.
> 
Of course. I'm going to investigate it there.

> 
> This sounds like a bug in getc().  It should know that only one thread
> is reading the file and skip the locking.  I don't think we should fix
> library bugs in Vim, unless it's crashing or another significant problem.
> 
It can't be a bug. I might be missing what you mean, but I can't see how it can
know that only one thread is reading a file. It doesn't have a clue whether you
gave this FILE * to other threads or not. It tries to be lightweight - as I
described in a separate mail it uses InterlockedIncrement/Decrement, but they
are not that lightweight - they don't require switching to kernel mode but
still take about 200 CPU cycles each.

The only optimization that I can see would be avoiding blocking (and even trying
to block) when there is only one thread in the current process and this
particular call is guaranteed not to create any threads. But 1) it may still be
expensive and 2) Vim probably has some background threads anyway.

> Perhaps you can already get a big increase by not compiling for
> debugging?  With MSVC this usually has a big impact.  Also largely
> defeats profiling with debugging enabled.
> 
I do _all_ performance measurements using an optimized build of the binary with
symbols. This is just a must for performance tuning.

-- 
Alexei Alexandrov


Re: Vim 7 performance notes

2007-02-08 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> I've also noticed that Vim spends somewhat significant time on startup
> loading spell files (I have 2 languages in my .vimrc: set
> spelllang=en,ru). The time is mostly spent in
> EnterCriticalSection/LeaveCriticalSection with getc() upper the stack.
> The reason for this is CRT blocking since the runtime is
> multithreaded. It's Windows, but on Linux it should be similar.

I would not assume it's similar until there is proof.

> As far as I understand, Vim doesn't access the spell file from
> multiple threads. Thus, the situation can be improved a lot: on Linux
> by using getc_unlocked. On Windows, starting VS 2005 there is a
> function _getc_nolock. Before VS 2005 this function can be emulated by
> macro:
> 
> #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
>   0xff & *(_stream)->_ptr++ : _filbuf(_stream))
> 
> By switching to non-blocking getc() in spell.c I was able to reduce
> Vim startup time from about 0.9 seconds to about 0.7 seconds.

This sounds like a bug in getc().  It should know that only one thread
is reading the file and skip the locking.  I don't think we should fix
library bugs in Vim, unless it's crashing or another significant problem.

Perhaps you can already get a big increase by not compiling for
debugging?  With MSVC this usually has a big impact.  Compiling for
debugging also largely defeats profiling.

-- 
>From "know your smileys":
 :-EHas major dental problems

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-08 Thread Nikolai Weibull

On 2/8/07, Alexei Alexandrov <[EMAIL PROTECTED]> wrote:


> > When testing it with VST it gave a 3.4% speed improvement (the same
> > methodology - 3 tests before and after, choosing the best result).
>
> Well, it's not that much but it's still a positive result. :-)


Considering how intense these computations are and how often they are
performed (probably the most time-consuming part of Vim), any
improvement will help a lot.

It's nice to see people alongside Bram working on improving Vim.  Your
work is highly appreciated (as well ;-).

 nikolai


Re: Vim 7 performance notes

2007-02-08 Thread Alexei Alexandrov
Hi Mikolaj Machowski, you wrote:

> 
> When testing it with VST it gave a 3.4% speed improvement (the same
> methodology - 3 tests before and after, choosing the best result).
> 

Well, it's not that much but it's still a positive result. :-)

-- 
Alexei Alexandrov


Re: Vim 7 performance notes

2007-02-07 Thread Mikolaj Machowski
On Wednesday, 07 February 2007, Alexei Alexandrov wrote:
> Hi Mikolaj Machowski, you wrote:
> > Nice work. Could you send or place somewhere patches? I'd like to test
> > them on more complex regexps.
>
> Here it is. Note that the biggest speed-up is observed when a regexp is
> matched a lot of times. The regexp mechanism itself is not affected at
> all here - so if you have one regexp which runs very long you probably
> won't notice any major difference.

When testing it with VST it gave a 3.4% speed improvement (the same
methodology - 3 tests before and after, choosing the best result).

m.



Re: Vim 7 performance notes

2007-02-07 Thread Alexei Alexandrov
Hi Alexei Alexandrov, you wrote:

>
> The program which confirms these numbers is attached.
>
Forgot the attachment.

-- 
Alexei Alexandrov


test_cs.c
Description: Binary data


Re: Vim 7 performance notes

2007-02-07 Thread Alexei Alexandrov
Hi George V. Reilly, you wrote:

> 
> How did you measure the time in EnterCriticalSection and 
> LeaveCriticalSection?

I discovered the problem by using a performance tuning tool which uses a sampling
approach to get statistical profile data. The intrusiveness of this class of
tools is very low. I verified that the problem exists by compiling Vim with an
unlocked getc() and measuring the difference (without any tool, just by $ time
...).

> If there's no lock contention, these routines are little more than 
> InterlockedIncrement and InterlockedDecrement, without a kernel transition
> or blocking.

You're absolutely right, it doesn't need to switch the context when the CS is
free. But InterlockedIncrement/Decrement is not that cheap. On a uniprocessor
machine it takes about 200 CPU cycles per atomic operation. Thus, an
EnterCS/LeaveCS pair will take about 400 CPU cycles. The program which confirms
these numbers is attached. So it means that to read 1 Mbyte of data with a
locking getc (this is roughly the size of the Russian + English spl files) we
need to pay 400e6 cycles for these useless attempts to synchronize. Given that
the frequency of my machine is 2.4 GHz = 2400e6 cycles per second, 400e6 cycles
is 1/6 of a second, which is about 0.17 seconds - exactly the speedup I
observed.

And remember that you need to pay this price on every Vim startup.
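
Roughly, the measurement loop is like this (just a sketch of the idea, not the
attached test_cs.c itself):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    CRITICAL_SECTION cs;
    LARGE_INTEGER freq, t0, t1;
    long i, iters = 10000000;

    InitializeCriticalSection(&cs);
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    for (i = 0; i < iters; ++i)
    {
        /* never contended: this is the only thread touching the CS */
        EnterCriticalSection(&cs);
        LeaveCriticalSection(&cs);
    }
    QueryPerformanceCounter(&t1);
    printf("%.1f ns per Enter/Leave pair\n",
            (double)(t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / iters);
    DeleteCriticalSection(&cs);
    return 0;
}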

> 
> In other words, if you're seeing significant time in Enter/LeaveCS, I 
> can think of two causes. Either your measurement tool has perturbed the 
> results, or there really is some multithreaded lock contention. The 
> former seems more likely, as Vim is single-threaded, but who knows what 
> some DLLs in the Vim process might be doing.

No, there isn't any contention. The critical section in the Microsoft
multi-threaded CRT is per-FILE*, so it's impossible for anyone to compete with
you unless you give them the FILE *. As far as I can tell, the descriptor of the
opened spell file is absolutely private inside spell.c.

Also, the numbers above show that the overhead is exactly this without any
contention. If there were contention, the overhead would be much bigger.

> 
> I would be very wary of using the _getc_nolock macro until we understand 
> why you are seeing those results.
> 

-- 
Alexei Alexandrov

test_cs.c_
Description: Binary data


Re: Vim 7 performance notes

2007-02-06 Thread George V. Reilly

Alexei Alexandrov wrote:

I've also noticed that Vim spends somewhat significant time on startup loading 
spell files (I have 2 languages in my .vimrc: set spelllang=en,ru). The time is 
mostly spent in EnterCriticalSection/LeaveCriticalSection with getc() upper the 
stack. The reason for this is CRT blocking since the runtime is multithreaded. 
It's Windows, but on Linux it should be similar.

As far as I understand, Vim doesn't access the spell file from multiple 
threads. Thus, the situation can be improved a lot: on Linux by using 
getc_unlocked. On Windows, starting VS 2005 there is a function _getc_nolock. 
Before VS 2005 this function can be emulated by macro:

#define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
  0xff & *(_stream)->_ptr++ : _filbuf(_stream))

By switching to non-blocking getc() in spell.c I was able to reduce Vim startup 
time from about 0.9 seconds to about 0.7 seconds.
  


How did you measure the time in EnterCriticalSection and 
LeaveCriticalSection? If there's no lock contention, these routines are 
little more than InterlockedIncrement and InterlockedDecrement, without 
a kernel transition or blocking. If the lock is already held, then by 
definition, EnterCriticalSection has to block until the lock is 
available. Similarly, if LeaveCriticalSection detects that there are 
other callers waiting, it will signal one of the waiters.


In other words, if you're seeing significant time in Enter/LeaveCS, I 
can think of two causes. Either your measurement tool has perturbed the 
results, or there really is some multithreaded lock contention. The 
former seems more likely, as Vim is single-threaded, but who knows what 
some DLLs in the Vim process might be doing.


I would be very wary of using the _getc_nolock macro until we understand 
why you are seeing those results.


--
/George V. Reilly  [EMAIL PROTECTED]
http://www.georgevreilly.com/blog
The biggest mistake is not learning from all your other mistakes.



Re: Vim 7 performance notes

2007-02-06 Thread Alexei Alexandrov
Hi Alexei Alexandrov, you wrote:

> Hi Bram et al.,
> 
> I'm doing some performance investigations of Vim code trying to understand
> whether there are any possibilities to improve it.

I've also noticed that Vim spends somewhat significant time on startup loading
spell files (I have 2 languages in my .vimrc: set spelllang=en,ru). The time is
mostly spent in EnterCriticalSection/LeaveCriticalSection with getc() higher up
the stack. The reason for this is CRT locking since the runtime is multithreaded.
It's Windows, but on Linux it should be similar.

As far as I understand, Vim doesn't access the spell file from multiple
threads. Thus, the situation can be improved a lot: on Linux by using
getc_unlocked. On Windows, starting with VS 2005 there is a function
_getc_nolock. Before VS 2005 this function can be emulated by a macro:

#define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
  0xff & *(_stream)->_ptr++ : _filbuf(_stream))

By switching to non-blocking getc() in spell.c I was able to reduce Vim startup 
time from about 0.9 seconds to about 0.7 seconds.
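
Just to illustrate the idea (HAVE_GETC_UNLOCKED and GETC_FAST are made-up names
for this sketch, not part of any patch):

#if defined(HAVE_GETC_UNLOCKED)                 /* POSIX systems */
# define GETC_FAST(fp)  getc_unlocked(fp)
#elif defined(_MSC_VER) && _MSC_VER >= 1400     /* VS 2005 and later */
# define GETC_FAST(fp)  _getc_nolock(fp)
#else
# define GETC_FAST(fp)  getc(fp)
#endif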

-- 
Alexei Alexandrov


Re: Vim 7 performance notes

2007-02-06 Thread Alexei Alexandrov
Hi Mikolaj Machowski, you wrote:

> Nice work. Could you send or place somewhere patches? I'd like to test
> them on more complex regexps.

Here it is. Note that the biggest speed-up is observed when a regexp is matched a
lot of times. The regexp mechanism itself is not affected at all here - so if
you have one regexp which runs very long you probably won't notice any major
difference.

-- 
Alexei Alexandrov


regexp.c.diff
Description: Binary data


Re: Vim 7 performance notes

2007-02-06 Thread Mikolaj Machowski
Hello,

Nice work. Could you send or place somewhere patches? I'd like to test
them on more complex regexps.

TIA

m.



Re: Vim 7 performance notes

2007-02-05 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> OK. Platform used for investigations: x86, Windows XP SP 2. Pentium 4
> Northwood, 2.4 GHz, 512M RAM.
> I did 2 things: understanding stack usage and performance measurement.
> To understand the stack usage I added some simple logging to regexp.c:
> printing ga_maxlen before regstack and backpos clearing and forced the
> arrays to have the grow size 1 (so that ga_maxlen will be high
> watermark in bytes). Of course, for performance investigations I used
> version with normal grow size and without logging.
> 
> The version with logging was used to perform the following:
> 
> 1. With syntax highlighting on, the following files were viewed from
> gg to G (with PgDn) and the following high watermark of stack size was
> observed: spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes),
> big text file with a lot of syntax errors (252 bytes)

It sounds like keeping only 1024 bytes would already work for most
situations.  That would be an acceptable amount to keep allocated at
all times.  So why don't we use this as the initial size, and when it
grows larger we free it when finished.  The growth size can be doubled
each time perhaps.
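
Something like this at the end of vim_regexec_both() then (just a sketch of the
idea, assuming the regstack is made static so that it survives the call):

    /* Keep a small regstack allocated for the next call, free a big one. */
    if (regstack.ga_maxlen > 1024)
	ga_clear(&regstack);
    else
	regstack.ga_len = 0;	/* keep the memory, just empty the stack */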

> 2. Command
> 
> gvim -c "vimgrep /a\(.\)*z/ *.c | q"
> 
> was executed in VIM 7 source directory. Stack watermark - 31008 bytes.
> This is example of non-optimal regexp which tends to take a lot of
> stack space.

Right, this may happen and the stack size will greatly depend on the line
length.

> Similar other test cases were tried leading to the following
> conclusions: 1) there is a lot of vim_regexec_both calls during syntax
> highlighting which work in very shallow stacks (<1K); 2) when user
> searches for something with regexp there are cases when regular
> expression can require big amount of memory (>10K).
> 
> The performance measurements were done against original version
> (7.0.188) and modified regexp.c (initial: 8192, keep limit: 16384).
> Each measurement was performed 3 times, minimal time was picked up.
> 
> First, I test the syntax highlighting speed:
> Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) | 
> redraw! | endfor | q"
> Original version: 10.6 seconds
> Modified version: 8.5 seconds
> The difference is about 25%.
> 
> Second, I did some grepping through Vim sources again:
> Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
> Original version: 6.6 seconds
> Modified version: 5.6 seconds
> The difference is about 15%.

That's very useful, thanks for diving into this.

-- 
>From "know your smileys":
 8<}}   Glasses, big nose, beard

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-05 Thread Alexei Alexandrov
Hi Bram Moolenaar, you wrote:

> 
> Speed should be OK this way, but it does keep up to 32 Kbyte allocated.
> That may not seem much, but if we do this in many places it quickly adds
> up.

Any "keep limit" greater than initial size (e.g. 16384 bytes) will give the 
same effect in many cases. By "many cases" here I mean cases when the stack 
just doesn't grow more than 512 bytes (very usual for syntax highlighting). Is 
16384 still too big? Regexp matching and syntax highlighting is one of the most 
important things in VIM so maybe it's tolerable? Also, as far as I understand, 
previosly (version 6) VIM used program stack as regstack, right? Compared with 
program stack solution current solution with the change proposed is much better 
because program stack never shrinks - once it grows to 1M, for example, the 
memory won't be ever given back. At least, this is so on Windows, Linux and I 
guess most Unixes.
 
> Can you show the benchmarks you used to see the performance and the
> stack space that is being used?  Otherwise we keep guessing the best
> numbers.

OK. Platform used for the investigations: x86, Windows XP SP2, Pentium 4
Northwood, 2.4 GHz, 512M RAM.
I did 2 things: understanding the stack usage and performance measurement. To
understand the stack usage I added some simple logging to regexp.c: printing
ga_maxlen before clearing regstack and backpos, and forcing the arrays to have
grow size 1 (so that ga_maxlen will be the high watermark in bytes). Of course,
for the performance investigations I used a version with the normal grow size
and without logging.
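
Roughly, the logging looked like this (a sketch of the idea, not the exact
lines I used), right before the arrays are cleared at the end of
vim_regexec_both():

    fprintf(stderr, "regstack watermark: %d bytes\n", regstack.ga_maxlen);
    fprintf(stderr, "backpos watermark: %d items\n", backpos.ga_maxlen);
    ga_clear(&regstack);
    ga_clear(&backpos);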

The version with logging was used to perform the following:

1. With syntax highlighting on, the following files were viewed from gg to G 
(with PgDn) and the following high watermark of stack size was observed: 
spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes), big text file with a 
lot of syntax errors (252 bytes)

2. Command

gvim -c "vimgrep /a\(.\)*z/ *.c | q"

was executed in the Vim 7 source directory. Stack watermark - 31008 bytes. This
is an example of a non-optimal regexp which tends to take a lot of stack space.

Other similar test cases were tried, leading to the following conclusions: 1)
there are a lot of vim_regexec_both calls during syntax highlighting which work
with very shallow stacks (<1K); 2) when the user searches for something with a
regexp there are cases when the regular expression can require a big amount of
memory (>10K).

The performance measurements were done against the original version (7.0.188)
and the modified regexp.c (initial: 8192, keep limit: 16384). Each measurement
was performed 3 times and the minimal time was picked.

First, I tested the syntax highlighting speed:
Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) | redraw! 
| endfor | q"
Original version: 10.6 seconds
Modified version: 8.5 seconds
The difference is about 25%.

Second, I did some grepping through Vim sources again:
Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
Original version: 6.6 seconds
Modified version: 5.6 seconds
The difference is about 15%.

> Coding detail: please don't use "if (!number)", use "if (number == 0)",
> that is so much easier to read.  Checking if ga_data is NULL would be
> simpler.

Got it - no problem.

-- 
Alexei Alexandrov


Re: Vim 7 performance notes

2007-02-04 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> >
> > The idea to gradually increase the chunk size makes sense, but not
> > everywhere.  For the syntax stack it's probably better to start with a
> > stack that is mostly needed, then growing quite quickly (say double the
> > chunk size every time).  That's because when recursive things are
> > involved we need much more space.  And copying the stack to another
> > place is more expensive than clearing with zeroes.
> >
> > Perhaps you can do some investigation about what the size mostly ends up
> > to be.  Then use that with a special version of ga_grow() that increases
> > the chunk size every time.
> 
> I've also tried the approach with persisting the regstack and backpos
> allocation across calls to vim_regexec_both. It seems to work even
> better than increasing the grow size in my particular test cases. And
> I can't think up any situation when it can slow things down. Could you
> please take a look at the patch attached and provide your opinion?

Speed should be OK this way, but it does keep up to 32 Kbyte allocated.
That may not seem much, but if we do this in many places it quickly adds
up.

Can you show the benchmarks you used to see the performance and the
stack space that is being used?  Otherwise we keep guessing the best
numbers.

Coding detail: please don't use "if (!number)", use "if (number == 0)",
that is so much easier to read.  Checking if ga_data is NULL would be
simpler.

-- 
>From "know your smileys":
 :q vi user saying, "How do I get out of this damn emacs editor?"

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-04 Thread Nikolai Weibull

On 2/4/07, Yakov Lerner <[EMAIL PROTECTED]> wrote:


> Gnu malloc (glibc) is exceptionally fast, iirc.


Yes.  It's based on Doug Lea's malloc:

http://gee.cs.oswego.edu/dl/html/malloc.html

 nikolai


Re: Vim 7 performance notes

2007-02-04 Thread Yakov Lerner

On 2/4/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:


Alexei Alexandrov wrote:

> I'm doing some performance investigations of Vim code trying to
> understand whether there are any possibilities to improve it.
> Currently I've made the following observations (all investigations are
> done on Windows):
>
> Redundant work is done during regexp operations in syntax
> highlighting. On some files it is very noticeable. The stack of the
> hotspot is ... > syn_current_attr > syn_regexec > vim_regexec_multi >
> vim_regexec_both > regtry > regmatch > ga_grow > alloc_clear > memset.
> So alloc_clear spends quite a few clockticks in lalloc() and memset().
> The reason for this is the pessimistically big grow size for the regstack
> growing array:
>
> ga_init2(&regstack, 1, 10000);
>
> This is not very good: many regexp operations don't go deep -
> non-match is detected very quickly. But even one element on the stack
> will lead to allocating at least 10000 bytes (which should be fast
> with a good CRT memory allocator) and (worse) initializing these 10000
> bytes with zeros (won't be that fast).
>
> One possible solution would be to keep regstack alive across calls to
> vim_regexec_both, but I'm not sure if it can be done safely. What I
> did was replace the grow size with a smaller number and make the
> grow size for growing arrays dynamic with an increase of 25%:
>
> --- regexp.c	(revision 136)
> +++ regexp.c	(working copy)
> @@ -3350,7 +3350,7 @@
>      /* Init the regstack empty.  Use an item size of 1 byte, since we push
>       * different things onto it.  Use a large grow size to avoid reallocating
>       * it too often. */
> -    ga_init2(&regstack, 1, 10000);
> +    ga_init2(&regstack, 1, 64);
>  
>      /* Init the backpos table empty. */
>      ga_init2(&backpos, sizeof(backpos_T), 10);
>
> --- misc2.c	(revision 136)
> +++ misc2.c	(working copy)
> @@ -1905,6 +1905,7 @@
>  {
>  	if (n < gap->ga_growsize)
>  	    n = gap->ga_growsize;
> +	gap->ga_growsize += (gap->ga_growsize >> 2);
>  	len = gap->ga_itemsize * (gap->ga_len + n);
>  	pp = alloc_clear((unsigned)len);
>  	if (pp == NULL)
>
> With this change I can see serious performance improvements, but I'm
> not sure if they are safe.
> Bram, does this look like it makes any sense?

This is a tradeoff between allocating too much and allocating too often.
What works best completely depends on the implementation of malloc() and
free().  I know that there are many implementations where they are quite
slow.  Then allocating larger chunks less often is better.  On some
systems they are fast and the chunks can be smaller.


Gnu malloc (glibc) is exceptionally fast, iirc. It is possible
to benchmark the malloc speed at ./configure time
and auto-select the initial size depending on the results.

procmail uses a similar technique in configure: it automatically
benchmarks its own builtin strstr() against the system's strstr() and selects
the one which is faster.

Yakov


Re: Vim 7 performance notes

2007-02-04 Thread Bram Moolenaar

Alexei Alexandrov wrote:

> I'm doing some performance investigations of Vim code trying to
> understand whether there are any possibilities to improve it.
> Currently I've made the following observations (all investigations are
> done on Windows):
> 
> Redundant work is done during regexp operations in syntax
> highlighting. On some files it is very noticeable. The stack of the
> hotspot is ... > syn_current_attr > syn_regexec > vim_regexec_multi >
> vim_regexec_both > regtry > regmatch > ga_grow > alloc_clear > memset.
> So alloc_clear spends quite a few clockticks in lalloc() and memset().
> The reason for this is the pessimistically big grow size for the regstack
> growing array:
> 
> ga_init2(&regstack, 1, 10000);
> 
> This is not very good: many regexp operations don't go deep -
> non-match is detected very quickly. But even one element on the stack
> will lead to allocating at least 10000 bytes (which should be fast
> with a good CRT memory allocator) and (worse) initializing these 10000
> bytes with zeros (won't be that fast).
> 
> One possible solution would be to keep regstack alive across calls to
> vim_regexec_both, but I'm not sure if it can be done safely. What I
> did was replace the grow size with a smaller number and make the
> grow size for growing arrays dynamic with an increase of 25%:
> 
> --- regexp.c	(revision 136)
> +++ regexp.c	(working copy)
> @@ -3350,7 +3350,7 @@
>      /* Init the regstack empty.  Use an item size of 1 byte, since we push
>       * different things onto it.  Use a large grow size to avoid reallocating
>       * it too often. */
> -    ga_init2(&regstack, 1, 10000);
> +    ga_init2(&regstack, 1, 64);
>  
>      /* Init the backpos table empty. */
>      ga_init2(&backpos, sizeof(backpos_T), 10);
> 
> --- misc2.c	(revision 136)
> +++ misc2.c	(working copy)
> @@ -1905,6 +1905,7 @@
>  {
>  	if (n < gap->ga_growsize)
>  	    n = gap->ga_growsize;
> +	gap->ga_growsize += (gap->ga_growsize >> 2);
>  	len = gap->ga_itemsize * (gap->ga_len + n);
>  	pp = alloc_clear((unsigned)len);
>  	if (pp == NULL)
> 
> With this change I can see serious performance improvements, but I'm
> not sure if they are safe.
> Bram, does this look like it makes any sense?

This is a tradeoff between allocating too much and allocating too often.
What works best completely depends on the implementation of malloc() and
free().  I know that there are many implementations where they are quite
slow.  Then allocating larger chunks less often is better.  On some
systems they are fast and the chunks can be smaller.

The idea to gradually increase the chunk size makes sense, but not
everywhere.  For the syntax stack it's probably better to start with a
stack that is mostly needed, then growing quite quickly (say double the
chunk size every time).  That's because when recursive things are
involved we need much more space.  And copying the stack to another
place is more expensive than clearing with zeroes.

Perhaps you can do some investigation about what the size mostly ends up
to be.  Then use that with a special version of ga_grow() that increases
the chunk size every time.
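
Something like this (just a sketch, ga_grow_faster() is a made-up name, not an
existing function):

/* Like ga_grow(), but double the grow size each time a reallocation is
 * actually going to happen, so the chunks get bigger and bigger. */
    int
ga_grow_faster(garray_T *gap, int n)
{
    if (gap->ga_maxlen - gap->ga_len < n)   /* a new block will be allocated */
	gap->ga_growsize *= 2;
    return ga_grow(gap, n);
}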

-- 
hundred-and-one symptoms of being an internet addict:
87. Everyone you know asks why your phone line is always busy ...and
you tell them to send an e-mail.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: Vim 7 performance notes

2007-02-01 Thread Martin Stubenschrott
On Thu, Feb 01, 2007 at 11:20:16PM +0300, Alexei Alexandrov wrote:
> Hi Bram et al.,
> 
> I'm doing some performance investigations of Vim code trying to understand 
> whether there are any possibilities to improve it.
> Currently I've made the following observations (all investigations are done 
> on Windows):
> 
> Redundant work is done during regexp operations in syntax highlighting. On
> some files it is very noticeable. The stack of the hotspot is
> ... > syn_current_attr > syn_regexec > vim_regexec_multi > vim_regexec_both >
> regtry > regmatch > ga_grow > alloc_clear > memset. So alloc_clear spends
> quite a few clockticks in lalloc() and memset().
> The reason for this is the pessimistically big grow size for the regstack
> growing array:
> 
> ga_init2(&regstack, 1, 10000);
> 
> This is not very good: many regexp operations don't go deep - non-match is
> detected very quickly. But even one element on the stack will lead to
> allocating at least 10000 bytes (which should be fast with a good CRT memory
> allocator) and (worse) initializing these 10000 bytes with zeros (won't be
> that fast).

I am not sure if this is really relevant to vim, because I don't have a
clue which regexp matching algorithm it is using, but you might want to
look into this article, when it comes to regexp speed:

http://swtch.com/~rsc/regexp/regexp1.html

regards,

Martin


Vim 7 performance notes

2007-02-01 Thread Alexei Alexandrov
Hi Bram et al.,

I'm doing some performance investigations of Vim code trying to understand 
whether there are any possibilities to improve it.
Currently I've made the following observations (all investigations are done on 
Windows):

Redundant work is done during regexp operations in syntax highlighting. On some
files it is very noticeable. The stack of the hotspot is
... > syn_current_attr > syn_regexec > vim_regexec_multi > vim_regexec_both >
regtry > regmatch > ga_grow > alloc_clear > memset. So alloc_clear spends quite
a few clockticks in lalloc() and memset().
The reason for this is the pessimistically big grow size for the regstack
growing array:

ga_init2(&regstack, 1, 10000);

This is not very good: many regexp operations don't go deep - non-match is
detected very quickly. But even one element on the stack will lead to
allocating at least 10000 bytes (which should be fast with a good CRT memory
allocator) and (worse) initializing these 10000 bytes with zeros (won't be that
fast).

One possible solution would be to keep regstack alive across calls to
vim_regexec_both, but I'm not sure if it can be done safely. What I did was
replace the grow size with a smaller number and make the grow size for
growing arrays dynamic with an increase of 25%:

--- regexp.c	(revision 136)
+++ regexp.c	(working copy)
@@ -3350,7 +3350,7 @@
     /* Init the regstack empty.  Use an item size of 1 byte, since we push
      * different things onto it.  Use a large grow size to avoid reallocating
      * it too often. */
-    ga_init2(&regstack, 1, 10000);
+    ga_init2(&regstack, 1, 64);
 
     /* Init the backpos table empty. */
     ga_init2(&backpos, sizeof(backpos_T), 10);

--- misc2.c	(revision 136)
+++ misc2.c	(working copy)
@@ -1905,6 +1905,7 @@
 {
 	if (n < gap->ga_growsize)
 	    n = gap->ga_growsize;
+	gap->ga_growsize += (gap->ga_growsize >> 2);
 	len = gap->ga_itemsize * (gap->ga_len + n);
 	pp = alloc_clear((unsigned)len);
 	if (pp == NULL)

With this change I can see serious performance improvements, but I'm not sure 
if they are safe.
Bram, does this look like it makes any sense?

-- 
Alexei Alexandrov