On 29.06.2013 05:35, Konstantin Belousov wrote:
On Sat, Jun 29, 2013 at 01:15:19AM +0300, Alexander Motin wrote:
On 28.06.2013 09:57, Konstantin Belousov wrote:
On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
While doing some profiles of GEOM/CAM IOPS scalability, on some
On Sat, Jun 29, 2013 at 10:06:11AM +0300, Alexander Motin wrote:
I understand that a lock attempt will steal the cache line from the lock owner.
What I don't quite understand is why avoiding it helps performance in
this case. Indeed, having the mutex on its own cache line will not let other
cores steal
On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
Hi.
While doing some profiles of GEOM/CAM IOPS scalability, on some test
patterns I've noticed serious congestion with spinning on global
pbuf_mtx mutex inside getpbuf() and relpbuf(). Since that code is
already very
On 28.06.2013 09:57, Konstantin Belousov wrote:
On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
While doing some profiles of GEOM/CAM IOPS scalability, on some test
patterns I've noticed serious congestion with spinning on global
pbuf_mtx mutex inside getpbuf() and relpbuf().
.. i'd rather you narrow down _why_ it's performing better before committing it.
Otherwise it may just creep up again after someone does another change
in an unrelated part of the kernel.
You're using instructions-retired; how about using l1/l2 cache loads,
stores, etc? There's a lot more CPU
On 28.06.2013 18:14, Adrian Chadd wrote:
.. i'd rather you narrow down _why_ it's performing better before committing it.
If you have good guesses, they are welcome. All those functions are so
small that it is hard to imagine how congestion could happen there at
all. I have a strong feeling
On 28 June 2013 08:37, Alexander Motin m...@freebsd.org wrote:
Otherwise it may just creep up again after someone does another change
in an unrelated part of the kernel.
Big win or small, TAILQ is still heavier than STAILQ, while it is not needed
there at all.
You can't make that assumption.
On Fri, Jun 28, 2013 at 08:14:42AM -0700, Adrian Chadd wrote:
.. i'd rather you narrow down _why_ it's performing better before committing
it.
Otherwise it may just creep up again after someone does another change
in an unrelated part of the kernel.
Or penalize some other set of machines
On Fri, Jun 28, 2013 at 8:56 AM, Adrian Chadd adr...@freebsd.org wrote:
On 28 June 2013 08:37, Alexander Motin m...@freebsd.org wrote:
Otherwise it may just creep up again after someone does another change
in an unrelated part of the kernel.
Big win or small, TAILQ is still heavier than
On 28.06.2013 18:56, Adrian Chadd wrote:
On 28 June 2013 08:37, Alexander Motin m...@freebsd.org wrote:
Otherwise it may just creep up again after someone does another change
in an unrelated part of the kernel.
Big win or small, TAILQ is still heavier than STAILQ, while it is not needed
there
On 28 June 2013 09:18, m...@freebsd.org wrote:
You can't make that assumption. I bet that if both pointers are in the
_same_ cache line, the overhead of maintaining a doubly linked list is
trivial.
No, it's not. A singly-linked SLIST only needs to modify the head of the
list and the
On 28.06.2013 09:57, Konstantin Belousov wrote:
On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
While doing some profiles of GEOM/CAM IOPS scalability, on some test
patterns I've noticed serious congestion with spinning on global
pbuf_mtx mutex inside getpbuf() and relpbuf().
On 28 June 2013 15:15, Alexander Motin m...@freebsd.org wrote:
I think it indeed may be cache thrashing. I've done some profiling of
getpbuf()/relpbuf() and found interesting results. With the patched kernel using
SLIST, profiling shows mostly one point of RESOURCE_STALLS.ANY in relpbuf()
On Sat, Jun 29, 2013 at 01:15:19AM +0300, Alexander Motin wrote:
On 28.06.2013 09:57, Konstantin Belousov wrote:
On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
While doing some profiles of GEOM/CAM IOPS scalability, on some test
patterns I've noticed serious congestion
Hi.
While doing some profiles of GEOM/CAM IOPS scalability, on some test
patterns I've noticed serious congestion with spinning on global
pbuf_mtx mutex inside getpbuf() and relpbuf(). Since that code is
already very simple, I've tried to optimize probably the only thing
possible there: