Hi,
between SRC680 m164 and SRC680 m170 some important performance
improvements have been integrated; most notably, the empty string is no
longer reference counted. This has significantly reduced the number of
reference counter calls. I redid the measurement to see if there is
still a ...
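The idea behind not reference counting the empty string can be sketched roughly as follows (hypothetical names; the actual rtl_uString implementation differs in detail): one static instance is handed out for every empty string, and acquire/release skip it, so the hottest string in the office never touches an atomic counter.

```c
#include <stdlib.h>

/* Hypothetical string struct, loosely modelled on rtl_uString;
   not the actual sal implementation. */
typedef struct {
    int refCount;   /* negative marks a static, never-freed instance */
    int length;
    char buffer[1];
} MyString;

/* One process-global empty string: handing it out costs neither an
   allocation nor any reference counter traffic. */
static MyString g_empty = { -1, 0, { '\0' } };

MyString *mystr_new_empty(void) {
    return &g_empty;
}

void mystr_acquire(MyString *s) {
    if (s->refCount >= 0)  /* skip the (atomic) increment for statics */
        ++s->refCount;     /* atomic in real code */
}

void mystr_release(MyString *s) {
    if (s->refCount >= 0 && --s->refCount == 0)
        free(s);
}
```

Every default-constructed string then aliases the same instance, which is why the number of interlocked calls drops so sharply.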
Jens-Heiner Rechtien wrote:
BTW, columns 1 and 2 are directly comparable to the columns below - a 23%
improvement from m164 to m170, wow!
A large part of that might be due to issue 64109, which was introduced
in m162 and fixed in m167.
Niklas
Hi Ross,
Ross Johnson wrote:
Jens-Heiner Rechtien wrote:
Heiner is on vacation this week, so I'll jump in ... ;-)
Great result for older machines, which is, I assume, where any
improvement is needed most. I'm curious as to why the call overhead is ...
Yep, this is obviously more important for ...
Hi,
I've done some additional, very simple-minded measurements to estimate
the effects of inlining the reference counters and the potential overhead
of checking whether we are on an SMP system. I got the following numbers:
I: inlining
NOI: no inlining
SMPC: SMP check
NOSMPC: no SMP check
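The SMP check being measured could look roughly like this (a sketch with assumed names, not the actual osl code; x86 and GCC inline asm assumed): detect the CPU count once, then skip the lock prefix on uniprocessor machines, where a single xadd instruction already cannot be torn by a context switch.

```c
#include <unistd.h>

static int g_nMultiCpu = -1;  /* -1 = not yet detected */

static int is_smp(void) {
    if (g_nMultiCpu < 0)  /* detect once, e.g. via sysconf on Linux */
        g_nMultiCpu = sysconf(_SC_NPROCESSORS_ONLN) > 1;
    return g_nMultiCpu;
}

/* Increment *p atomically and return the new value. */
int checked_increment(int *p) {
    int nOld = 1;
    if (is_smp())
        __asm__ __volatile__("lock; xaddl %0, %1"
                             : "+r"(nOld), "+m"(*p) : : "memory");
    else  /* uniprocessor: a lone xadd is atomic w.r.t. preemption */
        __asm__ __volatile__("xaddl %0, %1"
                             : "+r"(nOld), "+m"(*p) : : "memory");
    return nOld + 1;  /* xadd leaves the old value in nOld */
}
```

The trade-off being measured is exactly the branch above: one predictable test-and-jump per call versus paying for the lock prefix unconditionally.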
Jens-Heiner Rechtien wrote:
Hi,
I've done some additional, very simple-minded measurements to estimate
the effects of inlining the reference counters and the potential
overhead of checking whether we are on an SMP system. I got the
following numbers:
I: inlining
NOI: no inlining
SMPC: SMP check ...
Kay Ramme - Sun Germany - Hamburg wrote:
8% would really be a great improvement,
Indeed! The tests by Jens-Heiner recorded total time:
starting SO - loading the ods - closing SO.
So when SO is already running and the user only loads the document, the
gain should be even more noticeable :-)
Good news - thanks!
Cor
--
Cor Nouws
Thorsten Behrens wrote:
Jens-Heiner Rechtien [EMAIL PROTECTED] writes:
BTW, on newer processors (P4, Xeon etc.) the lock prefix shouldn't be
that expensive, because if the target memory of the instruction is
cacheable the CPU will not assert the Lock# signal (which locks the
bus) but only lock the affected cache line.
Hi Ross,
thanks for your numbers. So it looks like the lock prefix inside the
reference counters has, on older processors - exactly where it's not
needed at all - an impact which dwarfs even the cost of not inlining
the reference counter. I'll have a look at it.
Heiner
Ross Johnson
Thorsten,
Thorsten Behrens wrote:
Apart from that, with the integration of the UNO threading framework,
there's opportunity to bin thread-safe implementations at tons of
places, isn't there?
Yes and no; with the introduction of the UNO threading framework (see ...
Kay Ramme - Sun Germany - Hamburg wrote:
Unfortunately, many of these in-/decrements are generated on behalf of
the rtl strings, which are not yet planned to be replaced with a
thread-unsafe version (despite the fact that this is certainly possible).
Regarding the (rtl, tools) strings, I think ...
Hi,
I did some measurements with a copy of SRC680 m164 and one of the more
pathological calc documents, and found that the lock prefix indeed
imposes a significant overhead of about 8% on a non-HT 1.8 GHz Pentium IV.
(The tests included starting StarOffice, loading the document and
closing it.)
Jens-Heiner Rechtien [EMAIL PROTECTED] writes:
BTW, on newer processors (P4, Xeon etc.) the lock prefix shouldn't be
that expensive, because if the target memory of the instruction is
cacheable the CPU will not assert the Lock# signal (which locks the
bus) but only lock the affected cache line.
Ross Johnson wrote:
On Fri, 2006-04-21 at 18:32 +0200, Jens-Heiner Rechtien wrote:
Ross Johnson wrote:
On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote:
Hi all,
Someone recently mentioned that osl_increment/decrementInterlockedCount
would show up as top scorers with certain profiling tools ...
Ross Johnson wrote:
On Fri, 2006-04-21 at 18:22 +0200, Jens-Heiner Rechtien wrote:
I can't see what we could do about the costs of the lock instruction
on x86. I mean, if we need an atomic increment/decrement for our
reference counter we can't work with non-atomic instructions here, ...
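Why the atomic instruction is unavoidable for shared reference counters is easy to demonstrate: a plain ++ is a read-modify-write that two threads can interleave, losing updates. A small sketch (illustrative, not OOo code; GCC builtins and pthreads assumed):

```c
#include <pthread.h>

static long g_plain  = 0;
static long g_atomic = 0;

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; ++i) {
        ++g_plain;                           /* racy read-modify-write */
        __sync_add_and_fetch(&g_atomic, 1);  /* one locked instruction */
    }
    return NULL;
}

/* Run two incrementing threads. Afterwards g_atomic is exactly 2*n,
   while g_plain frequently comes up short due to lost updates. */
void run_race(long n) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, &n);
    pthread_create(&b, NULL, worker, &n);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
}
```

A reference counter that comes up short means a premature free, which is why the locked variant cannot simply be dropped on SMP machines.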
Hi all,
Someone recently mentioned that osl_increment/decrementInterlockedCount
would show up as top scorers with certain profiling tools (vtune?).
That got me thinking. On both Linux x86 and Windows x86, those
functions are implemented in assembler, effectively consisting of a ...
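Presumably that assembler amounts to little more than a locked xadd wrapped in a callable routine, roughly like this (a sketch, not sal's actual source; x86 and GCC inline asm assumed):

```c
typedef int oslInterlockedCount;  /* simplified; sal's typedef differs */

/* Out-of-line routine: the whole body is a single locked instruction,
   so the call/return overhead is comparable to the work being done. */
oslInterlockedCount my_incrementInterlockedCount(oslInterlockedCount *p) {
    oslInterlockedCount nInc = 1;
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(nInc), "+m"(*p) : : "memory");
    return nInc + 1;  /* xadd leaves the old value in nInc */
}
```

That a one-instruction body still pays for a full function call is exactly what makes these routines show up so prominently in profiles.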
Stephan Bergmann [EMAIL PROTECTED] writes:
So, depending on CPU type, the version with LOCK is 2--8 times slower
than the version without LOCK. Would be interesting to see whether
this has any actual impact on overall OOo performance.
Hm. First off, I'd try to inline those asm snippets ...
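Inlining those snippets need not even mean hand-written asm in a header; on GCC the compiler can emit the locked instruction directly at each call site via a builtin (sketch with hypothetical wrapper names; __sync builtins are available from GCC 4.1):

```c
/* GCC/Clang atomic builtins: the compiler inlines a single locked
   instruction at every call site, so no out-of-line call remains. */
static inline int inc_ref(int *p) { return __sync_add_and_fetch(p, 1); }
static inline int dec_ref(int *p) { return __sync_sub_and_fetch(p, 1); }
```

This removes the call overhead while keeping the atomicity, leaving only the cost of the lock prefix itself.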
On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote:
Hi all,
Someone recently mentioned that osl_increment/decrementInterlockedCount
would show up as top scorers with certain profiling tools (vtune?).
That got me thinking. On both Linux x86 and Windows x86, those
functions are ...
Thorsten Behrens wrote:
Stephan Bergmann [EMAIL PROTECTED] writes:
So, depending on CPU type, the version with LOCK is 2--8 times slower
than the version without LOCK. Would be interesting to see whether
this has any actual impact on overall OOo performance.
Hm. First off, I'd try to inline ...
I also found from timing tests using hand-optimised assembler that calls
to the Win32 API Interlocked routines appeared to be optimised when the
code is compiled by MSVC, but not GCC (say). It was as though MSVC was
emitting optimised assembler on the fly instead of calling the routines
in ...
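What Ross describes sounds like MSVC substituting its intrinsic for the Interlocked API call. Requested explicitly, that looks roughly like this (the GCC branch is only a stand-in so the sketch compiles elsewhere; wrapper name is hypothetical):

```c
#if defined(_MSC_VER)
#  include <intrin.h>
#  pragma intrinsic(_InterlockedIncrement)  /* inline, no call emitted */
static __inline long inc_count(long *p) {
    return _InterlockedIncrement(p);
}
#else
/* Stand-in so the sketch also compiles with GCC/Clang. */
static inline long inc_count(long *p) {
    return __sync_add_and_fetch(p, 1);
}
#endif
```

With the intrinsic enabled, MSVC emits the locked instruction in place of a call into the Win32 import, which would explain why only the MSVC build looked "optimised on the fly".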
On Fri, 2006-04-21 at 18:32 +0200, Jens-Heiner Rechtien wrote:
Ross Johnson wrote:
On Fri, 2006-04-21 at 15:09 +0200, Stephan Bergmann wrote:
Hi all,
Someone recently mentioned that osl_increment/decrementInterlockedCount
would show up as top scorers with certain profiling tools ...