Hi George V. Reilly, you wrote:
>
> How did you measure the time in EnterCriticalSection and
> LeaveCriticalSection?
>
I discovered the problem with a performance tuning tool that uses a
sampling approach to collect statistical profile data. The intrusiveness
of this class of tools is very low. I verified that the problem exists by
compiling Vim with unlocked getc() and measuring the difference (without
any tool, just with $ time ...).

>
> If there's no lock contention, these routines are little more than
> InterlockedIncrement and InterlockedDecrement, without a kernel transition
> or blocking.
>

You're absolutely right, no context switch is needed when the CS is free.
But InterlockedIncrement/Decrement is not that cheap. On a uniprocessor
machine it takes about 200 CPU cycles per atomic operation, so an
EnterCS/LeaveCS pair takes about 400 cycles. The program which confirms
these numbers is attached.

This means that to read 1 Mbyte of data with locking getc (roughly the
size of the Russian + English spl files), which is about 1e6 getc calls,
we pay 400e6 cycles for these useless attempts to synchronize. Given my
machine's frequency of 2.4 GHz = 2400e6 cycles per second, 400e6 cycles
is 1/6 of a second, or about 0.17 seconds: exactly the speedup I observed.
And remember that you pay this price on every Vim startup.

>
> In other words, if you're seeing significant time in Enter/LeaveCS, I
> can think of two causes. Either your measurement tool has perturbed the
> results, or there really is some multithreaded lock contention. The
> former seems more likely, as Vim is single-threaded, but who knows what
> some DLLs in the Vim process might be doing.
>

No, there isn't any contention. The critical section in the Microsoft
multi-threaded CRT is per-FILE*, so nobody can compete with you unless
you give them the FILE *. As far as I can tell, the descriptor of the
opened spell file is absolutely private to spell.c. Also, the numbers
above show that the overhead is exactly this without any contention. If
there were contention, the overhead would be much bigger.
>
> I would be very wary of using the _getc_nolock macro until we understand
> why you are seeing those results.
>

--
Alexei Alexandrov