Re: [dev] x86 osl/interlck.h performance

2006-05-30 Thread Jens-Heiner Rechtien
Hi, between SRC680 m164 and SRC680 m170 some important performance improvements have been integrated; most notably, the empty string is no longer reference counted. This has significantly reduced the number of reference counter calls. I redid the measurement to see if there is still a

Re: [dev] x86 osl/interlck.h performance

2006-05-30 Thread Niklas Nebel
Jens-Heiner Rechtien wrote: BTW column 1 and 2 are directly comparable to the columns below, a 23% improvement from m164 to m170, wow! A large part of that might be due to issue 64109, which was introduced in m162 and fixed in m167. Niklas

Re: [dev] x86 osl/interlck.h performance

2006-05-15 Thread Kay Ramme - Sun Germany - Hamburg
Hi Ross, Ross Johnson wrote: Jens-Heiner Rechtien wrote: Heiner is on vacation this week, so, I jump in ... ;-) Great result for older machines, which is, I assume, where any improvement is needed most. I'm curious as to why the call overhead is Yep, this is obviously more important for

Re: [dev] x86 osl/interlck.h performance

2006-05-12 Thread Jens-Heiner Rechtien
Hi, I've done some additional, very simple-minded measurements to estimate the effects of inlining the reference counters and the potential overhead for checking if we are on an SMP system. I got the following numbers: I: inlining NOI: no-inlining SMPC: SMP check NOSMPC: no SMP check
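A minimal sketch of what an inlined reference counter with an SMP check might look like, assuming GCC or Clang. The names is_smp, add_count, increment_count and decrement_count are hypothetical and are not the actual osl functions, and std::thread::hardware_concurrency() is only a stand-in for whatever SMP check was measured. On x86 the lock prefix is used only when more than one hardware thread is reported; on other targets the sketch falls back to the compiler's atomic builtin.

```cpp
#include <cassert>
#include <thread>

// Hypothetical SMP check, evaluated once. hardware_concurrency() returns 0
// when the count is unknown; that case is treated as SMP to stay safe.
inline bool is_smp() {
    static const bool smp = std::thread::hardware_concurrency() != 1;
    return smp;
}

// Adds delta to *count and returns the new value.
inline int add_count(int* count, int delta) {
    assert(count != nullptr);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int old = delta;  // xadd leaves the previous value of *count here
    if (is_smp()) {
        // Several CPUs may touch the counter: the lock prefix is required.
        __asm__ __volatile__("lock; xaddl %0, %1"
                             : "+r"(old), "+m"(*count)
                             :
                             : "memory", "cc");
    } else {
        // Single CPU: one instruction cannot be split by a context switch,
        // so the lock prefix is omitted.
        __asm__ __volatile__("xaddl %0, %1"
                             : "+r"(old), "+m"(*count)
                             :
                             : "memory", "cc");
    }
    return old + delta;
#else
    // Non-x86 fallback (assumption): always use the atomic builtin.
    return __atomic_add_fetch(count, delta, __ATOMIC_SEQ_CST);
#endif
}

inline int increment_count(int* count) { return add_count(count, 1); }
inline int decrement_count(int* count) { return add_count(count, -1); }
```

Because the functions are declared inline in a header-style snippet, the compiler can fold the SMP branch into the caller, which corresponds to the "I" plus "SMPC" combination in the legend above.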

Re: [dev] x86 osl/interlck.h performance

2006-05-12 Thread Ross Johnson
Jens-Heiner Rechtien wrote: Hi, I've done some additional, very simple-minded measurements to estimate the effects of inlining the reference counters and the potential overhead for checking if we are on an SMP system. I got the following numbers: I: inlining NOI: no-inlining SMPC:

Re: [dev] x86 osl/interlck.h performance

2006-04-27 Thread Cor Nouws
Kay Ramme - Sun Germany - Hamburg wrote: 8% would really be a great improvement, Indeed! The tests by Jens-Heiner recorded total time: starting SO - loading ods - closing SO. So when SO is running, and the user only loads the document :-) Good news - thanks! Cor -- Cor Nouws

Re: [dev] x86 osl/interlck.h performance

2006-04-26 Thread Jens-Heiner Rechtien
Thorsten Behrens wrote: Jens-Heiner Rechtien [EMAIL PROTECTED] writes: BTW, on newer processors (P4, Xeon etc) the lock prefix shouldn't be that expensive, because if the target memory of the instruction is cacheable the CPU will not assert the Lock# signal (which locks the bus) but only lock

Re: [dev] x86 osl/interlck.h performance

2006-04-26 Thread Jens-Heiner Rechtien
Hi Ross, thanks for your numbers. So it looks like the lock prefix inside the reference counters will have on older processors - exactly where it's not needed at all - an impact which dwarfs even the costs for not inlining the reference counter. I'll have a look at it. Heiner Ross Johnson

Re: [dev] x86 osl/interlck.h performance

2006-04-26 Thread Kay Ramme - Sun Germany - Hamburg
Thorsten, Thorsten Behrens wrote: Apart from that, with the integration of the UNO threading framework, there's opportunity to bin thread-safe implementations at tons of places, isn't there? Yes and no, with the introduction of the UNO threading framework (see

Re: [dev] x86 osl/interlck.h performance

2006-04-26 Thread Kay Ramme - Sun Germany - Hamburg
Kay Ramme - Sun Germany - Hamburg wrote: Unfortunately, many of these in-/decrements are generated on behalf of the rtl strings, which are not yet planned to be replaced with a thread unsafe version (despite the fact that this is certainly possible). Regarding the (rtl, tools) strings, I think

Re: [dev] x86 osl/interlck.h performance

2006-04-26 Thread Jens-Heiner Rechtien
Hi, I did some measurements with a copy of SRC680 m164 and one of the more pathological calc documents, and found that the lock prefix indeed imposes a significant overhead of about 8% on a non-HT 1.8 GHz Pentium IV. (The tests included starting StarOffice, loading the document and closing

Re: [dev] x86 osl/interlck.h performance

2006-04-25 Thread Thorsten Behrens
Jens-Heiner Rechtien [EMAIL PROTECTED] writes: BTW, on newer processors (P4, Xeon etc) the lock prefix shouldn't be that expensive, because if the target memory of the instruction is cacheable the CPU will not assert the Lock# signal (which locks the bus) but only lock the affected cache

Re: [dev] x86 osl/interlck.h performance

2006-04-24 Thread Jens-Heiner Rechtien
Ross Johnson wrote: On Fri, 2006-04-21 at 18:32 +0200, Jens-Heiner Rechtien wrote: Ross Johnson wrote: On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote: Hi all, Someone recently mentioned that osl_increment/decrementInterlockedCount would show up as top scorers with certain

Re: [dev] x86 osl/interlck.h performance

2006-04-24 Thread Jens-Heiner Rechtien
Ross Johnson wrote: On Fri, 2006-04-21 at 18:22 +0200, Jens-Heiner Rechtien wrote: I can't see what we could do about the costs of the lock instruction on x86. I mean, if we need an atomic increment/decrement for our reference counter we can't work with non-atomic instructions here,

Re: [dev] x86 osl/interlck.h performance

2006-04-21 Thread Thorsten Behrens
Stephan Bergmann [EMAIL PROTECTED] writes: So, depending on CPU type, the version with LOCK is 2--8 times slower than the version without LOCK. Would be interesting to see whether this has any actual impact on overall OOo performance. Hm. First off, I'd try to inline those asm snippets
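A rough way to reproduce this kind of comparison is sketched below. This is not the benchmark used in the thread: the Timing struct and time_increments are hypothetical names, and std::atomic<int>::fetch_add is used here as a portable stand-in for the LOCK-prefixed instruction (on x86 it typically compiles to one). The plain counter is volatile only so that the compiler does not remove the loop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical result record for one measurement run.
struct Timing {
    long long plain_ns;   // time spent on plain increments
    long long locked_ns;  // time spent on atomic increments
    int plain_total;      // final value of the plain counter
    int locked_total;     // final value of the atomic counter
};

inline Timing time_increments(int iterations) {
    assert(iterations >= 0);
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    // Plain increments: no atomicity guarantee across CPUs.
    volatile int plain = 0;
    const auto t0 = clock::now();
    for (int i = 0; i < iterations; ++i) {
        plain = plain + 1;
    }
    const auto t1 = clock::now();

    // Atomic increments: safe when several CPUs share the counter.
    std::atomic<int> locked{0};
    for (int i = 0; i < iterations; ++i) {
        locked.fetch_add(1, std::memory_order_seq_cst);
    }
    const auto t2 = clock::now();

    return Timing{
        duration_cast<nanoseconds>(t1 - t0).count(),
        duration_cast<nanoseconds>(t2 - t1).count(),
        plain,
        locked.load()
    };
}
```

Calling time_increments with a few million iterations and dividing locked_ns by plain_ns gives a ratio for the machine at hand. The result depends heavily on CPU model and compiler flags, so it says little about overall application performance.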

Re: [dev] x86 osl/interlck.h performance

2006-04-21 Thread Ross Johnson
On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote: Hi all, Someone recently mentioned that osl_increment/decrementInterlockedCount would show up as top scorers with certain profiling tools (vtune?). That got me thinking. On both Linux x86 and Windows x86, those functions are

Re: [dev] x86 osl/interlck.h performance

2006-04-21 Thread Jens-Heiner Rechtien
Thorsten Behrens wrote: Stephan Bergmann [EMAIL PROTECTED] writes: So, depending on CPU type, the version with LOCK is 2--8 times slower than the version without LOCK. Would be interesting to see whether this has any actual impact on overall OOo performance. Hm. First off, I'd try to inline

Re: [dev] x86 osl/interlck.h performance

2006-04-21 Thread Daniel Boelzle
I also found from timing tests using hand-optimised assembler that calls to the Win32 API Interlocked routines appeared to be optimised when the code is compiled by MSVC, but not GCC (say). It was as though MSVC was emitting optimised assembler on the fly instead of calling the routines in

Re: [dev] x86 osl/interlck.h performance

2006-04-21 Thread Ross Johnson
On Fri, 2006-04-21 at 18:32 +0200, Jens-Heiner Rechtien wrote: Ross Johnson wrote: On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote: Hi all, Someone recently mentioned that osl_increment/decrementInterlockedCount would show up as top scorers with certain profiling tools