Bruce,
On Wed, Jun 26, 2013 at 11:42:39AM +1000, Bruce Evans wrote:
B Anyway, as Gleb said, there is no point in
B optimizing the i386 kernel.
B
B I said that there is every point in optimizing the i386 kernel. This
B applies even more to other 32-bit arches. Some CPUs are much slower
B
On Wed, 26 Jun 2013, Gleb Smirnoff wrote:
On Wed, Jun 26, 2013 at 11:42:39AM +1000, Bruce Evans wrote:
B Anyway, as Gleb said, there is no point in
B optimizing the i386 kernel.
B
B I said that there is every point in optimizing the i386 kernel. This
B applies even more to other 32-bit
On Tue, 25 Jun 2013, Konstantin Belousov wrote:
Updates to the counter cannot be done from the interrupt context.
This is fragile, however. It prevents using counters for things like
counting interrupts. Most interrupt counting is now done directly
and doesn't use PCPU_INC().
On Wed, 26 Jun 2013, Gleb Smirnoff wrote:
On Wed, Jun 26, 2013 at 11:42:39AM +1000, Bruce Evans wrote:
B Anyway, as Gleb said, there is no point in
B optimizing the i386 kernel.
B
B I said that there is every point in optimizing the i386 kernel. This
B applies even more to other 32-bit
On Tue, Jun 25, 2013 at 12:45:36PM +1000, Bruce Evans wrote:
On Mon, 24 Jun 2013, Konstantin Belousov wrote:
On Sun, Jun 23, 2013 at 07:57:57PM +1000, Bruce Evans wrote:
The case that can't be fixed by rereading the counters is when fetching
code runs in between the stores. If the stores
On Tue, 25 Jun 2013, Konstantin Belousov wrote:
On Tue, Jun 25, 2013 at 12:45:36PM +1000, Bruce Evans wrote:
On Mon, 24 Jun 2013, Konstantin Belousov wrote:
...
The following is the prototype for the x86. The other 64bit
architectures are handled exactly like amd64. For 32bit !x86 arches,
On Mon, Jun 24, 2013 at 11:16:33PM +1000, Bruce Evans wrote:
B K This is quite interesting idea, but I still did not decided if it
B K acceptable. The issue is that we could add the carry to the other
B K processor counter, if the preemption kicks in at right time between
B K two
On Tue, 25 Jun 2013, Gleb Smirnoff wrote:
On Mon, Jun 24, 2013 at 11:16:33PM +1000, Bruce Evans wrote:
B K This is quite interesting idea, but I still did not decided if it
B K acceptable. The issue is that we could add the carry to the other
B K processor counter, if the preemption kicks
On Tue, Jun 25, 2013 at 08:14:41PM +1000, Bruce Evans wrote:
On Tue, 25 Jun 2013, Konstantin Belousov wrote:
Updates to the counter cannot be done from the interrupt context.
This is fragile, however. It prevents using counters for things like
counting interrupts. Most interrupt
On Tue, 25 Jun 2013, Konstantin Belousov wrote:
On Tue, Jun 25, 2013 at 08:14:41PM +1000, Bruce Evans wrote:
On Tue, 25 Jun 2013, Konstantin Belousov wrote:
Updates to the counter cannot be done from the interrupt context.
This is fragile, however. It prevents using counters for things
On Sun, Jun 23, 2013 at 10:33:43AM +0300, Konstantin Belousov wrote:
K On Sat, Jun 22, 2013 at 06:58:15PM +1000, Bruce Evans wrote:
K So the i386 version be simply addl; adcl to memory. Each store in
K this is atomic at the per-CPU level. If there is no carry, then the
K separate stores are
Bruce,
did you run your benchmarks in userland or in kernel? How many
parallel threads were updating the same counter?
Can you please share your benchmarks?
--
Totus tuus, Glebius.
___
svn-src-head@freebsd.org mailing list
On Mon, 24 Jun 2013, Gleb Smirnoff wrote:
did you run your benchmarks in userland or in kernel? How many
parallel threads were updating the same counter?
Can you please share your benchmarks?
Only userland, with 1 thread.
I don't have any more benchmarks than the test program in previous
On Mon, 24 Jun 2013, Gleb Smirnoff wrote:
On Sun, Jun 23, 2013 at 10:33:43AM +0300, Konstantin Belousov wrote:
K On Sat, Jun 22, 2013 at 06:58:15PM +1000, Bruce Evans wrote:
K So the i386 version be simply addl; adcl to memory. Each store in
K this is atomic at the per-CPU level. If there
[snipping everything about counter64, atomic ops, cycles, etc.]
I wonder if the idea explained in this paper:
http://static.usenix.org/event/usenix03/tech/freenix03/full_papers/mcgarry/mcgarry_html/
Which seems to be used in FreeBSD for some ARM atomics:
On Sun, Jun 23, 2013 at 07:57:57PM +1000, Bruce Evans wrote:
The case that can't be fixed by rereading the counters is when fetching
code runs in between the stores. If the stores are on another CPU
that is currently executing them, then we can keep checking that the
counters don't change
On Mon, 24 Jun 2013, Konstantin Belousov wrote:
On Sun, Jun 23, 2013 at 07:57:57PM +1000, Bruce Evans wrote:
The case that can't be fixed by rereading the counters is when fetching
code runs in between the stores. If the stores are on another CPU
that is currently executing them, then we
On Tue, 25 Jun 2013, I wrote:
My current best design:
- use ordinary mutexes to protect counter fetches in non-per-CPU contexts.
- use native-sized or always 32-bit counters. Counter updates are done
by a single addl on i386. Fix pcpu.h on arches other than amd64 and
i386 and use the same
On 25/06/2013, at 12:54, Bruce Evans b...@optusnet.com.au wrote:
- run a daemon every few minutes to fetch all the counters, so that
the native-sized counters are in no danger of overflowing on systems
that don't run statistics programs often enough to fetch the counters
to actually use.
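
A minimal userland sketch of the harvesting idea described above, not the kernel implementation: the update path is a single 32-bit add, and a periodic pass folds each counter's change since the previous pass into a 64-bit total, so a wrap of the 32-bit counter loses nothing as long as a pass runs at least once per 2^32 increments. All names here (NCPU, struct small_counter, counter_add, counter_harvest) are hypothetical, and the per-CPU storage and the mutex that would protect fetches are left out.

```c
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count, for illustration only */

struct small_counter {
	uint32_t pcpu[NCPU];	/* fast path: one 32-bit add per update */
	uint32_t last[NCPU];	/* value seen at the previous harvest */
	uint64_t total;		/* 64-bit sum maintained by the harvester */
};

/* Update path: a plain 32-bit add that is allowed to wrap. */
static void
counter_add(struct small_counter *c, int cpu, uint32_t inc)
{
	c->pcpu[cpu] += inc;
}

/*
 * Harvest path, which the daemon (or a statistics fetch) would run.
 * The delta is computed modulo 2^32, so it stays correct across one
 * wrap of the small counter between two harvests.
 */
static uint64_t
counter_harvest(struct small_counter *c)
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++) {
		uint32_t now = c->pcpu[cpu];

		c->total += (uint32_t)(now - c->last[cpu]);
		c->last[cpu] = now;
	}
	return (c->total);
}
```

The only requirement on the harvester is that it runs before any one counter advances by 2^32, which is what running it every few minutes is meant to guarantee.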
On Sat, Jun 22, 2013 at 01:37:58PM +1000, Bruce Evans wrote:
On Sat, 22 Jun 2013, I wrote:
...
Here are considerably expanded tests, with noninline tests dropped.
Summary of times on Athlon64:
simple increment: 4-7 cycles (1)
simple increment preceded
On Sat, Jun 22, 2013 at 06:58:15PM +1000, Bruce Evans wrote:
So the i386 version be simply addl; adcl to memory. Each store in
this is atomic at the per-CPU level. If there is no carry, then the
separate stores are equivalent to adding separate nonnegative values and
the counter value is
On Sun, 23 Jun 2013, Konstantin Belousov wrote:
On Sat, Jun 22, 2013 at 01:37:58PM +1000, Bruce Evans wrote:
On Sat, 22 Jun 2013, I wrote:
...
Here are considerably expanded tests, with noninline tests dropped.
Summary of times on Athlon64:
simple increment:
On Sun, 23 Jun 2013, Konstantin Belousov wrote:
On Sat, Jun 22, 2013 at 06:58:15PM +1000, Bruce Evans wrote:
So the i386 version be simply addl; adcl to memory. Each store in
this is atomic at the per-CPU level. If there is no carry, then the
separate stores are equivalent to adding separate
On Sun, 23 Jun 2013, I wrote:
I thought of lots of variations, but couldn't find one that works perfectly.
One idea (that goes with the sign check on the low 32 bits) is to use a
misaligned add to memory to copy the 31st bit as a carry bit to the
high word. The value of the counter is
On Sat, 22 Jun 2013, I wrote:
On Sat, 22 Jun 2013, I wrote:
...
Here are considerably expanded tests, with noninline tests dropped.
Summary of times on Athlon64:
simple increment: 4-7 cycles (1)
simple increment preceded by feature test: 5-8 cycles (1)
On Fri, Jun 21, 2013 at 12:15:24PM +1000, Lawrence Stewart wrote:
Hi Kostik,
On 06/21/13 00:30, Konstantin Belousov wrote:
Author: kib
Date: Thu Jun 20 14:30:04 2013
New Revision: 252032
URL: http://svnweb.freebsd.org/changeset/base/252032
Log:
Allow immediate operand.
Bruce,
On Fri, Jun 21, 2013 at 09:04:34AM +1000, Bruce Evans wrote:
B The i386 version of the counter asm doesn't support the immediate
B constraint for technical reasons. 64 bit counters are too large and
B slow to use on i386, especially when they are implemented as they are
B without
On Fri, 21 Jun 2013, Gleb Smirnoff wrote:
On Fri, Jun 21, 2013 at 09:04:34AM +1000, Bruce Evans wrote:
B The i386 version of the counter asm doesn't support the immediate
B constraint for technical reasons. 64 bit counters are too large and
B slow to use on i386, especially when they are
Bruce,
On Fri, Jun 21, 2013 at 09:02:36PM +1000, Bruce Evans wrote:
B Not if it is a 32-bit increment on 32-bit systems, as it should be.
B
B I said to use a daemon to convert small (16 or 32 bit) counters into
B larger (32 or 64 bit) ones. It is almost as efficient to call the
B accumulation
On Fri, 21 Jun 2013, Gleb Smirnoff wrote:
On Fri, Jun 21, 2013 at 09:02:36PM +1000, Bruce Evans wrote:
B Not if it is a 32-bit increment on 32-bit systems, as it should be.
B
B I said to use a daemon to convert small (16 or 32 bit) counters into
B larger (32 or 64 bit) ones. It is almost as
On Sat, 22 Jun 2013, I wrote:
...
Here are considerably expanded tests, with noninline tests dropped.
Summary of times on Athlon64:
simple increment: 4-7 cycles (1)
simple increment preceded by feature test: 5-8 cycles (1)
simple 32-bit increment:
Author: kib
Date: Thu Jun 20 14:30:04 2013
New Revision: 252032
URL: http://svnweb.freebsd.org/changeset/base/252032
Log:
Allow immediate operand.
Sponsored by: The FreeBSD Foundation
Modified:
head/sys/amd64/include/counter.h
Modified: head/sys/amd64/include/counter.h
On Thu, 20 Jun 2013, Konstantin Belousov wrote:
Log:
Allow immediate operand.
..
Modified: head/sys/amd64/include/counter.h
==
--- head/sys/amd64/include/counter.h	Thu Jun 20 14:20:03 2013	(r252031)
+++
On Fri, 21 Jun 2013, I wrote:
On Thu, 20 Jun 2013, Konstantin Belousov wrote:
...
@@ -44,7 +44,7 @@ counter_u64_add(counter_u64_t c, int64_t
...
The i386 version of the counter asm doesn't support the immediate
constraint for technical reasons. 64 bit counters are too large and
slow to use
On Fri, 21 Jun 2013, Bruce Evans wrote:
On Fri, 21 Jun 2013, I wrote:
On Thu, 20 Jun 2013, Konstantin Belousov wrote:
...
@@ -44,7 +44,7 @@ counter_u64_add(counter_u64_t c, int64_t
...
The i386 version of the counter asm doesn't support the immediate
constraint for technical reasons. 64
Hi Kostik,
On 06/21/13 00:30, Konstantin Belousov wrote:
Author: kib
Date: Thu Jun 20 14:30:04 2013
New Revision: 252032
URL: http://svnweb.freebsd.org/changeset/base/252032
Log:
Allow immediate operand.
Sponsored by: The FreeBSD Foundation
Modified: