Re: Vim 7 performance notes
Alexei Alexandrov wrote:

> Hi Bram Moolenaar, you wrote:
>
> > It sounds like keeping only 1024 bytes would already work for most
> > situations.  That would be an acceptable amount to keep allocated at
> > all times.  So why don't we use this as the initial size, and when it
> > grows larger we free it when finished.  The growth size can be doubled
> > each time perhaps.
>
> I chose the 8192/16384 pair because it's the closest to the original
> 10000 bytes.  10000 itself would also be fine but I like round numbers...
>
> The patch with changes which are, I think, close to what you describe
> above is attached.  Could you please take a look at it?

It's starting to look better, fewer disadvantages, and it should still
give a fair speedup.

10000 bytes is OK for when it's going to be freed again soon.  If you
want to keep the memory allocated, something less would be more
appropriate.  And thus you need to start low, but could increase it in
larger steps (since it's going to be freed on return anyway).

The growarray is used in lots of places.  Adding another field to it
will cause more memory to be used.  Isn't it easier to make another
version of ga_grow() that increases the growsize when allocating another
block?

Instead of doubling each time, which is going to be big chunks quickly,
another way would be to first allocate one block at the start size of
about 1000 bytes, then set the growsize to 10000.  So we grow at the
same speed as before.  Then no extra field or function is needed, it's a
local change in the regexp code.  Something like:

    if (regstack.ga_data == NULL)
    {
        ga_init2(&regstack, 1, 1000);
        ga_grow(&regstack, 1);
    }
    regstack.ga_growsize = 10000;

I do wonder if the memory needs to be cleared when re-using the
allocated memory.  Can you check that?

--
A meeting is an event at which the minutes are kept and the hours are lost.
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org///
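The pattern Bram sketches can be modeled in a self-contained way.  The
struct and functions below are simplified stand-ins for Vim's garray_T
and ga_grow() in src/misc2.c, not the real code; `regstack_prepare` is a
made-up name for the usage pattern in the snippet above:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Vim's garray_T; field names mirror the real
 * struct but this is an illustrative sketch, not the Vim source. */
typedef struct {
    int   ga_len;       /* bytes in use */
    int   ga_maxlen;    /* bytes allocated */
    int   ga_itemsize;
    int   ga_growsize;
    char *ga_data;
} garray_T;

static void ga_init2(garray_T *gap, int itemsize, int growsize)
{
    memset(gap, 0, sizeof(*gap));
    gap->ga_itemsize = itemsize;
    gap->ga_growsize = growsize;
}

/* Make room for at least n more items; new space is zero-filled. */
static int ga_grow(garray_T *gap, int n)
{
    if (gap->ga_maxlen - gap->ga_len < n * gap->ga_itemsize) {
        if (n < gap->ga_growsize)
            n = gap->ga_growsize;
        int len = gap->ga_itemsize * (gap->ga_len + n);
        char *pp = calloc(1, (size_t)len);
        if (pp == NULL)
            return 0;
        if (gap->ga_data != NULL)
            memcpy(pp, gap->ga_data, (size_t)gap->ga_len);
        free(gap->ga_data);
        gap->ga_data = pp;
        gap->ga_maxlen = len;
    }
    return 1;
}

static garray_T regstack;

/* Bram's idea: allocate a small ~1000-byte first block once, then raise
 * the growsize so any later growth happens in big 10000-byte steps. */
static void regstack_prepare(void)
{
    if (regstack.ga_data == NULL) {
        ga_init2(&regstack, 1, 1000);
        ga_grow(&regstack, 1);
    }
    regstack.ga_growsize = 10000;
}
```

With this shape no extra field is needed: the first allocation is cheap
for the common shallow-stack case, while pathological patterns still
grow in large chunks.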
Re: Vim 7 performance notes
Hi Bram Moolenaar, you wrote:

> It sounds like keeping only 1024 bytes would already work for most
> situations.  That would be an acceptable amount to keep allocated at
> all times.  So why don't we use this as the initial size, and when it
> grows larger we free it when finished.  The growth size can be doubled
> each time perhaps.

I chose the 8192/16384 pair because it's the closest to the original
10000 bytes.  10000 itself would also be fine but I like round numbers...

The patch with changes which are, I think, close to what you describe
above is attached.  Could you please take a look at it?

> Right, this may happen and stack size will greatly depend on the line
> length.
> ...
> That's very useful, thanks for diving into this.

My pleasure.

--
Alexei Alexandrov

regexp-malloc.patch
Description: Binary data
Re: Vim 7 performance notes
Alexei Alexandrov wrote:

> > This sounds like a bug in getc().  It should know that only one thread
> > is reading the file and skip the locking.  I don't think we should fix
> > library bugs in Vim, unless it's crashing or another significant problem.
>
> It can't be a bug.  I might be missing what you mean, but I can't see
> how it can know that only one thread is reading a file.  It doesn't
> have a clue whether you gave this FILE * to other threads or not.  It
> tries to be lightweight - as I described in a separate mail it uses
> InterlockedIncrement/Decrement but they are not that lightweight -
> they don't require switching to kernel mode but still take about 200
> CPU cycles each.
>
> The only optimization that I see could be avoiding blocking (and even
> trying to block) in case there is only one thread in the current
> process and this particular call is guaranteed not to create any
> threads.  But 1) it still may be expensive and 2) Vim has some
> background threads anyway, probably.

Hmm, getc() is apparently preparing for the worst and isn't optimized
for the 98% of the situations where there is only one thread reading
from the fd.  With the result that the, very often used, getc() is slow.

Besides using getc_unlocked() we could use read() and do our own
buffering.  That can't be very complicated.  And we don't need to
figure out if the getc_unlocked() function is available.

--
hundred-and-one symptoms of being an internet addict:
102. When filling out your driver's license application, you give your
     IP address.

/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///
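Bram's "use read() and do our own buffering" suggestion can be sketched
portably like this.  The type and function names (`rbuf_T`,
`rbuf_getc`) and the buffer size are illustrative assumptions, not
anything from Vim's source:

```c
#include <unistd.h>

#define RBUF_SIZE 4096

/* Tiny unlocked buffered reader: one read() system call per block,
 * no per-byte CRT stream locking at all. */
typedef struct {
    int           fd;
    int           len;   /* valid bytes in buf */
    int           pos;   /* next byte to hand out */
    unsigned char buf[RBUF_SIZE];
} rbuf_T;

/* Return the next byte, or -1 on EOF/error; refill the buffer as needed. */
static int rbuf_getc(rbuf_T *rb)
{
    if (rb->pos >= rb->len) {
        ssize_t n = read(rb->fd, rb->buf, RBUF_SIZE);
        if (n <= 0)
            return -1;
        rb->len = (int)n;
        rb->pos = 0;
    }
    return rb->buf[rb->pos++];
}
```

Since this sits on a plain file descriptor, it sidesteps the question of
whether getc_unlocked() exists on a given platform.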
Re: Vim 7 performance notes
Hi Bram Moolenaar, you wrote: >> It's Windows, but on Linux it should be similar. > > I would not assume it's similar until there is proof. > Of course. I'm going to investigate it there. > > This sounds like a bug in getc(). It should know that only one thread > is reading the file and skip the locking. I don't think we should fix > library bugs in Vim, unless it's crashing or another significant problem. > It can't be a bug. I might be missing what you mean, but I can't see how it can know that only one thread is reading a file. It doesn't have a clue whether you gave this FILE * to other threads or not. It tries to be lightweight - as I described in a separate mail it uses InterlockedIncrement/Decrement but they are not that lightweight - they don't require switching to kernel mode but still take about 200 cycles of CPU each. The only optimization that I see could be avoiding blocking (and even trying to block) in case if there is only one thread in current process and if there is guarantee that this particular call is guaranteed not to create any threads. But 1) it still may be expensive and 2) Vim has some background threads anyway, probably. > Perhaps you can already get a big increase by not compiling for > debugging? With MSVC this usually has a big impact. Also largely > defeats profiling with debugging enabled. > I do _all_ performance measurements using optimized version of binary with symbols. This is just a must for performance tuning. -- Alexei Alexandrov
Re: Vim 7 performance notes
Alexei Alexandrov wrote: > I've also noticed that Vim spends somewhat significant time on startup > loading spell files (I have 2 languages in my .vimrc: set > spelllang=en,ru). The time is mostly spent in > EnterCriticalSection/LeaveCriticalSection with getc() upper the stack. > The reason for this is CRT blocking since the runtime is > multithreaded. It's Windows, but on Linux it should be similar. I would not assume it's similar until there is proof. > As far as I understand, Vim doesn't access the spell file from > multiple threads. Thus, the situation can be improved a lot: on Linux > by using getc_unlocked. On Windows, starting VS 2005 there is a > function _getc_nolock. Before VS 2005 this function can be emulated by > macro: > > #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \ > 0xff & *(_stream)->_ptr++ : _filbuf(_stream)) > > By switching to non-blocking getc() in spell.c I was able to reduce > Vim startup time from about 0.9 seconds to about 0.7 seconds. This sounds like a bug in getc(). It should know that only one thread is reading the file and skip the locking. I don't think we should fix library bugs in Vim, unless it's crashing or another significant problem. Perhaps you can already get a big increase by not compiling for debugging? With MSVC this usually has a big impact. Also largely defeats profiling with debugging enabled. -- >From "know your smileys": :-EHas major dental problems /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org///
Re: Vim 7 performance notes
On 2/8/07, Alexei Alexandrov <[EMAIL PROTECTED]> wrote: > When testing it with VST it gave 3.4% speed improvements (the same > metodology - 3 tests before and after, choose the best results). Well, it's not that much but it's still positive result. :-) Considering how intense these computations are and how often they are performed (probably the most time-consuming part of Vim), any improvement will help a lot. It's nice to see people alongside Bram working on improving Vim. Your work is highly appreciated (as well ;-). nikolai
Re: Vim 7 performance notes
Hi Mikolaj Machowski, you wrote: > > When testing it with VST it gave 3.4% speed improvements (the same > metodology - 3 tests before and after, choose the best results). > Well, it's not that much but it's still positive result. :-) -- Alexei Alexandrov
Re: Vim 7 performance notes
On Wednesday, 7 February 2007, Alexei Alexandrov wrote:

> Hi Mikolaj Machowski, you wrote:
> > Nice work.  Could you send or place somewhere patches?  I'd like to
> > test them on more complex regexps.
>
> Here it is.  Note that the biggest speed-up is observed when a regexp
> is matched a lot of times.  The regexp mechanism itself is not affected
> at all here - so if you have one regexp which runs very long you won't
> probably notice any major difference.

When testing it with VST it gave a 3.4% speed improvement (the same
methodology - 3 tests before and after, choose the best result).  Well,
it's not that much but it's still a positive result. :-)

m.
Re: Vim 7 performance notes
Hi Alexei Alexandrov, you wrote: > > The program which confirms these numbers is attached. > Forgot the attachment. -- Alexei Alexandrov test_cs.c Description: Binary data
Re: Vim 7 performance notes
Hi George V. Reilly, you wrote:

> How did you measure the time in EnterCriticalSection and
> LeaveCriticalSection?

I discovered the problem by using a performance tuning tool which uses a
sampling approach to get statistical profile data.  The intrusiveness of
this class of tools is very low.  I verified that the problem exists by
compiling Vim with unlocked getc() and measuring the difference (without
any tool, just by $ time ...).

> If there's no lock contention, these routines are little more than
> InterlockedIncrement and InterlockedDecrement, without a kernel
> transition or blocking.

You're absolutely right, it doesn't need to switch the context when the
CS is free.  But InterlockedIncrement/Decrement is not that cheap.  On a
uniprocessor machine it takes about 200 CPU cycles per atomic operation.
Thus, an EnterCS/LeaveCS pair will take about 400 CPU cycles.  The
program which confirms these numbers is attached.

So it means that to read 1 Mbyte of data with a locking getc (this is
roughly the size of the Russian + English spl files) we need to pay
400e6 cycles for these useless attempts to synchronize.  Given the
frequency of my machine, 2.4 GHz = 2400e6, we get that 400e6 cycles
means 1/6 of a second, which is 0.17 seconds - exactly the speedup I
observed.  And remember that you need to pay this price on every Vim
startup.

> In other words, if you're seeing significant time in Enter/LeaveCS, I
> can think of two causes.  Either your measurement tool has perturbed
> the results, or there really is some multithreaded lock contention.
> The former seems more likely, as Vim is single-threaded, but who knows
> what some DLLs in the Vim process might be doing.

No, there isn't any contention.  The critical section in the Microsoft
multi-threaded CRT is per-FILE*, so it's impossible for anyone to
compete with you unless you give them the FILE *.  As far as I can say,
the descriptor of the opened spell file is absolutely private inside
spell.c.  Also, the numbers above show that the overhead is exactly this
without any contention.  If there were contention, the overhead would be
much bigger.

> I would be very wary of using the _getc_nolock macro until we
> understand why you are seeing those results.

--
Alexei Alexandrov

test_cs.c
Description: Binary data
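The back-of-the-envelope arithmetic above can be written down directly.
This is only a restatement of the mail's numbers (~400 locking cycles
per getc() over ~1 MB of spell data at 2.4 GHz), not a measurement:

```c
/* Estimated wall-clock cost of per-byte stream locking:
 * bytes read * cycles of lock overhead per byte / CPU frequency. */
static double lock_overhead_seconds(double bytes, double cycles_per_byte,
                                    double hz)
{
    return bytes * cycles_per_byte / hz;
}
```

Plugging in 1e6 bytes, 400 cycles and 2.4e9 Hz gives roughly a sixth of
a second, which matches the ~0.2 s startup difference reported earlier
in the thread.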
Re: Vim 7 performance notes
Alexei Alexandrov wrote:

> I've also noticed that Vim spends somewhat significant time on startup
> loading spell files (I have 2 languages in my .vimrc: set
> spelllang=en,ru).  The time is mostly spent in
> EnterCriticalSection/LeaveCriticalSection with getc() higher up the
> stack.  The reason for this is CRT locking since the runtime is
> multithreaded.  It's Windows, but on Linux it should be similar.
>
> As far as I understand, Vim doesn't access the spell file from
> multiple threads.  Thus, the situation can be improved a lot: on Linux
> by using getc_unlocked.  On Windows, starting with VS 2005 there is a
> function _getc_nolock.  Before VS 2005 this function can be emulated
> by a macro:
>
>     #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
>         0xff & *(_stream)->_ptr++ : _filbuf(_stream))
>
> By switching to non-blocking getc() in spell.c I was able to reduce
> Vim startup time from about 0.9 seconds to about 0.7 seconds.

How did you measure the time in EnterCriticalSection and
LeaveCriticalSection?

If there's no lock contention, these routines are little more than
InterlockedIncrement and InterlockedDecrement, without a kernel
transition or blocking.  If the lock is already held, then by
definition, EnterCriticalSection has to block until the lock is
available.  Similarly, if LeaveCriticalSection detects that there are
other callers waiting, it will signal one of the waiters.

In other words, if you're seeing significant time in Enter/LeaveCS, I
can think of two causes.  Either your measurement tool has perturbed the
results, or there really is some multithreaded lock contention.  The
former seems more likely, as Vim is single-threaded, but who knows what
some DLLs in the Vim process might be doing.

I would be very wary of using the _getc_nolock macro until we understand
why you are seeing those results.

--
/George V. Reilly  [EMAIL PROTECTED]  http://www.georgevreilly.com/blog
The biggest mistake is not learning from all your other mistakes.
Re: Vim 7 performance notes
Hi Alexei Alexandrov, you wrote:

> Hi Bram et al.,
>
> I'm doing some performance investigations of Vim code trying to
> understand whether there are any possibilities to improve it.

I've also noticed that Vim spends somewhat significant time on startup
loading spell files (I have 2 languages in my .vimrc: set
spelllang=en,ru).  The time is mostly spent in
EnterCriticalSection/LeaveCriticalSection with getc() higher up the
stack.  The reason for this is CRT locking since the runtime is
multithreaded.  It's Windows, but on Linux it should be similar.

As far as I understand, Vim doesn't access the spell file from multiple
threads.  Thus, the situation can be improved a lot: on Linux by using
getc_unlocked.  On Windows, starting with VS 2005 there is a function
_getc_nolock.  Before VS 2005 this function can be emulated by a macro:

    #define _getc_nolock(_stream) (--(_stream)->_cnt >= 0 ? \
        0xff & *(_stream)->_ptr++ : _filbuf(_stream))

By switching to non-blocking getc() in spell.c I was able to reduce Vim
startup time from about 0.9 seconds to about 0.7 seconds.

--
Alexei Alexandrov
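On the POSIX side, the getc_unlocked() route mentioned above looks
roughly like this: take the stream lock once around the whole loop with
flockfile() and read bytes with getc_unlocked() inside it.  This is a
sketch, not the actual spell.c change, and `count_bytes_unlocked` is a
made-up helper name:

```c
#include <stdio.h>

/* Pay the stream lock once per file instead of once per byte.
 * (On Windows, _getc_nolock / the macro above plays the same role.) */
static long count_bytes_unlocked(FILE *fp)
{
    long n = 0;
    int c;

    flockfile(fp);                      /* one lock for the whole loop */
    while ((c = getc_unlocked(fp)) != EOF)
        ++n;
    funlockfile(fp);
    return n;
}
```

Holding the lock for the whole loop keeps the code correct even if some
other thread does touch the same FILE *, while removing the per-byte
EnterCS/LeaveCS cost that showed up in the profile.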
Re: Vim 7 performance notes
Hi Mikolaj Machowski, you wrote: > Nice work. Could you send or place somewhere patches? I'd like to test > them on more complex regexps. Here it is. Note that the biggest speed-up is observed when regexp is matched a lot of times. The regexp mechanism itself is not affected at all here - so if you have one regexp which runs very long you won't probably notice any major difference. -- Alexei Alexandrov regexp.c.diff Description: Binary data
Re: Vim 7 performance notes
Hello, Nice work. Could you send or place somewhere patches? I'd like to test them on more complex regexps. TIA m.
Re: Vim 7 performance notes
Alexei Alexandrov wrote:

> OK.  Platform used for investigations: x86, Windows XP SP 2.  Pentium 4
> Northwood, 2.4 GHz, 512M RAM.
>
> I did 2 things: understanding stack usage and performance measurement.
> To understand the stack usage I added some simple logging to regexp.c:
> printing ga_maxlen before regstack and backpos clearing and forced the
> arrays to have the grow size 1 (so that ga_maxlen will be the high
> watermark in bytes).  Of course, for performance investigations I used
> the version with normal grow size and without logging.
>
> The version with logging was used to perform the following:
>
> 1. With syntax highlighting on, the following files were viewed from
> gg to G (with PgDn) and the following high watermark of stack size was
> observed: spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes),
> big text file with a lot of syntax errors (252 bytes)

It sounds like keeping only 1024 bytes would already work for most
situations.  That would be an acceptable amount to keep allocated at
all times.  So why don't we use this as the initial size, and when it
grows larger we free it when finished.  The growth size can be doubled
each time perhaps.

> 2. Command
>
>     gvim -c "vimgrep /a\(.\)*z/ *.c | q"
>
> was executed in the VIM 7 source directory.  Stack watermark - 31008
> bytes.  This is an example of a non-optimal regexp which tends to take
> a lot of stack space.

Right, this may happen and stack size will greatly depend on the line
length.

> Similar other test cases were tried leading to the following
> conclusions: 1) there are a lot of vim_regexec_both calls during syntax
> highlighting which work with very shallow stacks (<1K); 2) when the
> user searches for something with a regexp there are cases when the
> regular expression can require a big amount of memory (>10K).
>
> The performance measurements were done against the original version
> (7.0.188) and modified regexp.c (initial: 8192, keep limit: 16384).
> Each measurement was performed 3 times, minimal time was picked.
>
> First, I tested the syntax highlighting speed:
> Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) |
> redraw! | endfor | q"
> Original version: 10.6 seconds
> Modified version: 8.5 seconds
> The difference is about 25%.
>
> Second, I did some grepping through the Vim sources again:
> Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
> Original version: 6.6 seconds
> Modified version: 5.6 seconds
> The difference is about 15%.

That's very useful, thanks for diving into this.

--
From "know your smileys":
  8<}}  Glasses, big nose, beard

/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///
Re: Vim 7 performance notes
Hi Bram Moolenaar, you wrote:

> Speed should be OK this way, but it does keep up to 32 Kbyte allocated.
> That may not seem much, but if we do this in many places it quickly
> adds up.

Any "keep limit" greater than the initial size (e.g. 16384 bytes) will
give the same effect in many cases.  By "many cases" here I mean cases
when the stack just doesn't grow to more than 512 bytes (very usual for
syntax highlighting).  Is 16384 still too big?  Regexp matching and
syntax highlighting is one of the most important things in VIM so maybe
it's tolerable?

Also, as far as I understand, previously (version 6) VIM used the
program stack as regstack, right?  Compared with the program stack
solution, the current solution with the proposed change is much better
because the program stack never shrinks - once it grows to 1M, for
example, the memory won't ever be given back.  At least, this is so on
Windows, Linux and I guess most Unixes.

> Can you show the benchmarks you used to see the performance and the
> stack space that is being used?  Otherwise we keep guessing the best
> numbers.

OK.  Platform used for investigations: x86, Windows XP SP 2.  Pentium 4
Northwood, 2.4 GHz, 512M RAM.

I did 2 things: understanding stack usage and performance measurement.
To understand the stack usage I added some simple logging to regexp.c:
printing ga_maxlen before regstack and backpos clearing and forced the
arrays to have the grow size 1 (so that ga_maxlen will be the high
watermark in bytes).  Of course, for performance investigations I used
the version with normal grow size and without logging.

The version with logging was used to perform the following:

1. With syntax highlighting on, the following files were viewed from gg
to G (with PgDn) and the following high watermark of stack size was
observed: spell.c (444 bytes), $VIMRUNTIME/filetype.vim (820 bytes),
big text file with a lot of syntax errors (252 bytes)

2. Command

    gvim -c "vimgrep /a\(.\)*z/ *.c | q"

was executed in the VIM 7 source directory.  Stack watermark - 31008
bytes.  This is an example of a non-optimal regexp which tends to take a
lot of stack space.

Similar other test cases were tried leading to the following
conclusions: 1) there are a lot of vim_regexec_both calls during syntax
highlighting which work with very shallow stacks (<1K); 2) when the user
searches for something with a regexp there are cases when the regular
expression can require a big amount of memory (>10K).

The performance measurements were done against the original version
(7.0.188) and modified regexp.c (initial: 8192, keep limit: 16384).
Each measurement was performed 3 times, minimal time was picked.

First, I tested the syntax highlighting speed:
Command: gvim.exe -f $VIMRUNTIME/filetype.vim -c "for i in range(199) |
redraw! | endfor | q"
Original version: 10.6 seconds
Modified version: 8.5 seconds
The difference is about 25%.

Second, I did some grepping through the Vim sources again:
Command: gvim.exe -c "vimgrep /a.*z/ *.c | q"
Original version: 6.6 seconds
Modified version: 5.6 seconds
The difference is about 15%.

> Coding detail: please don't use "if (!number)", use "if (number == 0)",
> that is so much easier to read.  Checking if ga_data is NULL would be
> simpler.

Got it - no problem.

--
Alexei Alexandrov
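The initial-size/keep-limit scheme being benchmarked here (8192/16384)
can be sketched as follows.  The function names are illustrative, not
those of the attached patch, and the doubling growth is an assumption
about how the buffer would be enlarged:

```c
#include <stdlib.h>

#define REGSTACK_INITIAL  8192   /* the patch's initial size */
#define REGSTACK_KEEP    16384   /* the patch's keep limit */

static char  *regstack_data = NULL;
static size_t regstack_size = 0;

/* Start of a match: reuse the buffer kept from the previous call. */
static char *regstack_acquire(void)
{
    if (regstack_data == NULL) {
        regstack_size = REGSTACK_INITIAL;
        regstack_data = malloc(regstack_size);
    }
    return regstack_data;
}

/* During a match: double the buffer until the needed size fits.
 * Assumes regstack_acquire() was called first (size is non-zero). */
static int regstack_grow(size_t needed)
{
    if (needed > regstack_size) {
        size_t newsize = regstack_size;
        while (newsize < needed)
            newsize *= 2;
        char *p = realloc(regstack_data, newsize);
        if (p == NULL)
            return 0;
        regstack_data = p;
        regstack_size = newsize;
    }
    return 1;
}

/* End of a match: keep the common small stack, but give back the
 * memory when a pathological pattern grew it past the keep limit. */
static void regstack_release(void)
{
    if (regstack_size > REGSTACK_KEEP) {
        free(regstack_data);
        regstack_data = NULL;
        regstack_size = 0;
    }
}
```

This captures why the shallow syntax-highlighting stacks (<1K) never pay
an allocation after the first call, while a 31008-byte vimgrep stack is
still returned to the allocator afterwards.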
Re: Vim 7 performance notes
Alexei Alexandrov wrote: > > > > The idea to gradually increase the chunk size makes sense, but not > > everywhere. For the syntax stack it's probably better to start with a > > stack that is mostly needed, then growing quite quickly (say double the > > chunk size every time). That's because when recursive things are > > involved we need much more space. And copying the stack to another > > place is more expensive than clearing with zeroes. > > > > Perhaps you can do some investigation about what the size mostly ends up > > to be. Then use that with a special version of ga_grow() that increases > > the chunk size every time. > > I've also tried the approach with persisting the regstack and backpos > allocation across calls to vim_regexec_both. It seems to work even > better than increasing the grow size in my particular test cases. And > I can't think up any situation when it can slow things down. Could you > please take a look at the patch attached and provide your opinion? Speed should be OK this way, but it does keep up to 32 Kbyte allocated. That may not seem much, but if we do this in many places it quickly adds up. Can you show the benchmarks you used to see the performance and the stack space that is being used? Otherwise we keep guessing the best numbers. Coding detail: please don't use "if (!number)", use "if (number == 0)", that is so much easier to read. Checking if ga_data is NULL would be simpler. -- >From "know your smileys": :q vi user saying, "How do I get out of this damn emacs editor?" /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org///
Re: Vim 7 performance notes
On 2/4/07, Yakov Lerner <[EMAIL PROTECTED]> wrote: Gnu malloc (glibc) is exceptionally fast, iirc. Yes. It's based on Doug Lea's malloc: http://gee.cs.oswego.edu/dl/html/malloc.html nikolai
Re: Vim 7 performance notes
On 2/4/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:

> Alexei Alexandrov wrote:
>
> > I'm doing some performance investigations of Vim code trying to
> > understand whether there are any possibilities to improve it.
> > Currently I've made the following observations (all investigations
> > are done on Windows):
> >
> > Redundant work is done during regexp operations in syntax
> > highlighting.  On some files it is very noticeable.  The stack of the
> > hotspot is ... syn_current_attr > syn_regexec > vim_regexec_multi >
> > vim_regexec_both > regtry > regmatch > ga_grow > alloc_clear >
> > memset.  So alloc_clear spends quite a few clockticks in lalloc() and
> > memset().  The reason for this is the pessimistically big grow size
> > for the regstack growing array:
> >
> >     ga_init2(&regstack, 1, 10000);
> >
> > This is not very good: many regexp operations don't go deep -
> > non-match is detected very quickly.  But even one element on the
> > stack will lead to allocating at least 10000 bytes (which should be
> > fast with a good CRT memory allocator) and (worse) initializing these
> > 10000 bytes with zeros (won't be that fast).
> >
> > One possible solution would be to keep regstack alive across calls to
> > vim_regexec_both, but I'm not sure if it can be done safely.  What I
> > did was replacing the grow size with a smaller number and making the
> > grow size for growing arrays dynamic with an increase of 25%:
> >
> > --- regexp.c (revision 136)
> > +++ regexp.c (working copy)
> > @@ -3350,7 +3350,7 @@
> >  /* Init the regstack empty.  Use an item size of 1 byte, since we push
> >   * different things onto it.  Use a large grow size to avoid
> >   * reallocating it too often. */
> > -ga_init2(&regstack, 1, 10000);
> > +ga_init2(&regstack, 1, 64);
> >
> >  /* Init the backpos table empty. */
> >  ga_init2(&backpos, sizeof(backpos_T), 10);
> >
> > --- misc2.c (revision 136)
> > +++ misc2.c (working copy)
> > @@ -1905,6 +1905,7 @@
> >  {
> >  if (n < gap->ga_growsize)
> >  n = gap->ga_growsize;
> > +gap->ga_growsize += (gap->ga_growsize >> 2);
> >  len = gap->ga_itemsize * (gap->ga_len + n);
> >  pp = alloc_clear((unsigned)len);
> >  if (pp == NULL)
> >
> > With this change I can see serious performance improvements, but I'm
> > not sure if they are safe.  Bram, does it make any sense?
>
> This is a tradeoff between allocating too much and allocating too
> often.  What works best completely depends on the implementation of
> malloc() and free().  I know that there are many implementations where
> they are quite slow.  Then allocating larger chunks less often is
> better.  On some systems they are fast and the chunks can be smaller.

Gnu malloc (glibc) is exceptionally fast, iirc.

It is possible to benchmark the malloc speed at ./configure time and
auto-select the initial size depending on the results.  procmail uses a
similar technique in configure: it automatically benchmarks its own
builtin strstr() vs the system's strstr() and selects the one which is
faster.

Yakov
Re: Vim 7 performance notes
Alexei Alexandrov wrote:

> I'm doing some performance investigations of Vim code trying to
> understand whether there are any possibilities to improve it.
> Currently I've made the following observations (all investigations are
> done on Windows):
>
> Redundant work is done during regexp operations in syntax highlighting.
> On some files it is very noticeable.  The stack of the hotspot is ...
> syn_current_attr > syn_regexec > vim_regexec_multi > vim_regexec_both >
> regtry > regmatch > ga_grow > alloc_clear > memset.  So alloc_clear
> spends quite a few clockticks in lalloc() and memset().  The reason for
> this is the pessimistically big grow size for the regstack growing
> array:
>
>     ga_init2(&regstack, 1, 10000);
>
> This is not very good: many regexp operations don't go deep - non-match
> is detected very quickly.  But even one element on the stack will lead
> to allocating at least 10000 bytes (which should be fast with a good
> CRT memory allocator) and (worse) initializing these 10000 bytes with
> zeros (won't be that fast).
>
> One possible solution would be to keep regstack alive across calls to
> vim_regexec_both, but I'm not sure if it can be done safely.  What I
> did was replacing the grow size with a smaller number and making the
> grow size for growing arrays dynamic with an increase of 25%:
>
> --- regexp.c (revision 136)
> +++ regexp.c (working copy)
> @@ -3350,7 +3350,7 @@
>  /* Init the regstack empty.  Use an item size of 1 byte, since we push
>   * different things onto it.  Use a large grow size to avoid
>   * reallocating it too often. */
> -ga_init2(&regstack, 1, 10000);
> +ga_init2(&regstack, 1, 64);
>
>  /* Init the backpos table empty. */
>  ga_init2(&backpos, sizeof(backpos_T), 10);
>
> --- misc2.c (revision 136)
> +++ misc2.c (working copy)
> @@ -1905,6 +1905,7 @@
>  {
>  if (n < gap->ga_growsize)
>  n = gap->ga_growsize;
> +gap->ga_growsize += (gap->ga_growsize >> 2);
>  len = gap->ga_itemsize * (gap->ga_len + n);
>  pp = alloc_clear((unsigned)len);
>  if (pp == NULL)
>
> With this change I can see serious performance improvements, but I'm
> not sure if they are safe.  Bram, does it make any sense?

This is a tradeoff between allocating too much and allocating too often.
What works best completely depends on the implementation of malloc() and
free().  I know that there are many implementations where they are quite
slow.  Then allocating larger chunks less often is better.  On some
systems they are fast and the chunks can be smaller.

The idea to gradually increase the chunk size makes sense, but not
everywhere.  For the syntax stack it's probably better to start with a
stack that is mostly sufficient, then grow quite quickly (say double the
chunk size every time).  That's because when recursive things are
involved we need much more space.  And copying the stack to another
place is more expensive than clearing with zeroes.

Perhaps you can do some investigation about what the size mostly ends up
to be.  Then use that with a special version of ga_grow() that increases
the chunk size every time.

--
hundred-and-one symptoms of being an internet addict:
87. Everyone you know asks why your phone line is always busy ...and you
    tell them to send an e-mail.

/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///
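The tradeoff Bram describes can be made concrete with a small model
comparing the fixed 10000-byte growsize against the patch's scheme of
starting at 64 bytes and growing the step by 25% per allocation.  This
is pure illustrative arithmetic, not Vim code:

```c
/* Allocations needed to reach `target` bytes with a fixed step. */
static int allocs_fixed(int target, int step)
{
    int size = 0, n = 0;
    while (size < target) {
        size += step;
        ++n;
    }
    return n;
}

/* Same, but the step grows by 25% after each allocation, as in the
 * patch's `gap->ga_growsize += (gap->ga_growsize >> 2);` line. */
static int allocs_growing(int target, int step)
{
    int size = 0, n = 0;
    while (size < target) {
        size += step;
        step += step >> 2;   /* 25% increase */
        ++n;
    }
    return n;
}
```

The model shows both sides of the tradeoff: the growing scheme touches
far less memory for the shallow (<1K) syntax-highlighting stacks, but
needs many more reallocations (and stack copies) to reach a deep
31008-byte watermark than the fixed 10000-byte step does.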
Re: Vim 7 performance notes
On Thu, Feb 01, 2007 at 11:20:16PM +0300, Alexei Alexandrov wrote:

> Hi Bram et al.,
>
> I'm doing some performance investigations of Vim code trying to
> understand whether there are any possibilities to improve it.
> Currently I've made the following observations (all investigations are
> done on Windows):
>
> Redundant work is done during regexp operations in syntax highlighting.
> On some files it is very noticeable.  The stack of the hotspot is ...
> syn_current_attr > syn_regexec > vim_regexec_multi > vim_regexec_both >
> regtry > regmatch > ga_grow > alloc_clear > memset.  So alloc_clear
> spends quite a few clockticks in lalloc() and memset().
> The reason for this is the pessimistically big grow size for the
> regstack growing array:
>
>     ga_init2(&regstack, 1, 10000);
>
> This is not very good: many regexp operations don't go deep - non-match
> is detected very quickly.  But even one element on the stack will lead
> to allocating at least 10000 bytes (which should be fast with a good
> CRT memory allocator) and (worse) initializing these 10000 bytes with
> zeros (won't be that fast).

I am not sure if this is really relevant to vim, because I don't have a
clue which regexp matching algorithm it is using, but you might want to
look into this article when it comes to regexp speed:

http://swtch.com/~rsc/regexp/regexp1.html

regards,
Martin
Vim 7 performance notes
Hi Bram et al.,

I'm doing some performance investigations of Vim code trying to
understand whether there are any possibilities to improve it.
Currently I've made the following observations (all investigations are
done on Windows):

Redundant work is done during regexp operations in syntax highlighting.
On some files it is very noticeable.  The stack of the hotspot is:

    syn_current_attr
    > syn_regexec
    > vim_regexec_multi
    > vim_regexec_both
    > regtry
    > regmatch
    > ga_grow
    > alloc_clear
    > memset

So alloc_clear spends quite a few clockticks in lalloc() and memset().
The reason for this is the pessimistically big grow size for the
regstack growing array:

    ga_init2(&regstack, 1, 10000);

This is not very good: many regexp operations don't go deep - non-match
is detected very quickly.  But even one element on the stack will lead
to allocating at least 10000 bytes (which should be fast with a good CRT
memory allocator) and (worse) initializing these 10000 bytes with zeros
(won't be that fast).

One possible solution would be to keep regstack alive across calls to
vim_regexec_both, but I'm not sure if it can be done safely.  What I did
was replacing the grow size with a smaller number and making the grow
size for growing arrays dynamic with an increase of 25%:

--- regexp.c (revision 136)
+++ regexp.c (working copy)
@@ -3350,7 +3350,7 @@
 /* Init the regstack empty.  Use an item size of 1 byte, since we push
  * different things onto it.  Use a large grow size to avoid reallocating
  * it too often. */
-ga_init2(&regstack, 1, 10000);
+ga_init2(&regstack, 1, 64);

 /* Init the backpos table empty. */
 ga_init2(&backpos, sizeof(backpos_T), 10);

--- misc2.c (revision 136)
+++ misc2.c (working copy)
@@ -1905,6 +1905,7 @@
 {
 if (n < gap->ga_growsize)
 n = gap->ga_growsize;
+gap->ga_growsize += (gap->ga_growsize >> 2);
 len = gap->ga_itemsize * (gap->ga_len + n);
 pp = alloc_clear((unsigned)len);
 if (pp == NULL)

With this change I can see serious performance improvements, but I'm not
sure if they are safe.  Bram, does it make any sense?

--
Alexei Alexandrov