Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Gleb Smirnoff
On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
A  Doesn't this padding to cache line size only help x86 processors in an
A  SMP kernel?  I was expecting to see some #ifdef SMP so that we don't pay
A  a big price for no gain in small-memory ARM systems and such.  But maybe
A  I'm misunderstanding the reason for the padding.
A 
A I didn't want to do this because this would be meaning that SMP option
A may become a completely killer for modules/kernel ABI compatibility.

Do we support loading non-SMP modules on SMP kernel and vice versa?

-- 
Totus tuus, Glebius.


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
 On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
 A  Doesn't this padding to cache line size only help x86 processors in an
 A  SMP kernel?  I was expecting to see some #ifdef SMP so that we don't
 pay
 A  a big price for no gain in small-memory ARM systems and such.  But
 maybe
 A  I'm misunderstanding the reason for the padding.
 A
 A I didn't want to do this because this would be meaning that SMP option
 A may become a completely killer for modules/kernel ABI compatibility.

 Do we support loading non-SMP modules on SMP kernel and vice versa?

Actually that's my point, we do.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On 10/31/12, Andre Oppermann an...@freebsd.org wrote:
 On 31.10.2012 19:10, Attilio Rao wrote:
 On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:
 Author: attilio
 Date: Wed Oct 31 18:07:18 2012
 New Revision: 242402
 URL: http://svn.freebsd.org/changeset/base/242402

 Log:
Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.

 Interested developers can now dig and look for other mutexes to
 convert and just do it.
 Please, however, try to enclose a description about the benchmark
 which lead you believe the necessity to pad the mutex and possibly
 some numbers, in particular when the lock belongs to structures or the
 ABI itself.

 Next steps involve porting the same mtx(9) changes to rwlock(9) and
 port pvh global pmap lock to rwlock_padalign.

 I'd say for an rwlock you can make it unconditional.  The very purpose
 of it is to be aquired by multiple CPU's causing cache line dirtying
 for every concurrent reader.  Rwlocks are only ever used because multiple
 concurrent readers are expected.

I thought about it, but I think the same arguments as for mutexes remain.
The real problem is that making rwlocks pad-aligned by default would be a
showstopper for their use in size-sensitive structures. For example, I
have plans to use them in vm_object at some point to replace
VM_OBJECT_LOCK, and I do want to avoid the extra bloat in such
structures.

Also, please keep in mind that there is no direct relation between
read acquisition and high contention, and the latter is the
real reason for having pad-aligned locks.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
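
To put a number on the bloat concern, here is a minimal userland sketch (toy types and an assumed 64-byte cache line; not the real vm_object or rwlock layouts) of how a cache-line-aligned lock inflates a structure that embeds it:

#include <stdint.h>
#include <stdio.h>

#define TOY_CACHE_LINE_SIZE 64   /* assumed for the example */

struct toy_lock {                 /* stand-in for an unpadded rwlock */
        volatile uintptr_t lk_word;
};

struct toy_lock_padalign {        /* same lock, but owning a whole line */
        volatile uintptr_t lk_word;
} __attribute__((aligned(TOY_CACHE_LINE_SIZE)));

struct toy_object_plain {         /* vm_object-like container, plain lock */
        struct toy_lock  lock;
        void            *shadow;
        uint64_t         size;
};

struct toy_object_padded {        /* same container with the padded lock */
        struct toy_lock_padalign lock;
        void            *shadow;
        uint64_t         size;
};

int
main(void)
{
        /* On LP64 this typically prints 24 vs. 128 bytes per object. */
        printf("plain:  %zu\n", sizeof(struct toy_object_plain));
        printf("padded: %zu\n", sizeof(struct toy_object_padded));
        return (0);
}

Multiplied across the number of vm_object instances on a busy machine, that per-object difference is the extra bloat being avoided.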


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Ian Lepore
On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
 On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
  On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
  A  Doesn't this padding to cache line size only help x86 processors in an
  A  SMP kernel?  I was expecting to see some #ifdef SMP so that we don't
  pay
  A  a big price for no gain in small-memory ARM systems and such.  But
  maybe
  A  I'm misunderstanding the reason for the padding.
  A
  A I didn't want to do this because this would be meaning that SMP option
  A may become a completely killer for modules/kernel ABI compatibility.
 
  Do we support loading non-SMP modules on SMP kernel and vice versa?
 
 Actually that's my point, we do.
 
 Attilio
 
 

Well we've got other similar problems lurking then.  What about a module
compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
it gets run on a different arm system whose kernel is compiled with
#define CACHE_LINE_SIZE 64?

-- Ian




Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On Thu, Nov 1, 2012 at 2:01 PM, Ian Lepore
free...@damnhippie.dyndns.org wrote:
 On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
 On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
  On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
  A  Doesn't this padding to cache line size only help x86 processors in an
  A  SMP kernel?  I was expecting to see some #ifdef SMP so that we don't
  pay
  A  a big price for no gain in small-memory ARM systems and such.  But
  maybe
  A  I'm misunderstanding the reason for the padding.
  A
  A I didn't want to do this because this would be meaning that SMP option
  A may become a completely killer for modules/kernel ABI compatibility.
 
  Do we support loading non-SMP modules on SMP kernel and vice versa?

 Actually that's my point, we do.

 Attilio



 Well we've got other similar problems lurking then.  What about a module
 compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
 it gets run on a different arm system whose kernel is compiled with
 #define CACHE_LINE_SIZE 64?

That should not happen. Is there a real case where you build a module
for one ARM family and want to run it against a kernel compiled for
another?

CACHE_LINE_SIZE must not change during a STABLE release's lifetime, of
course, for the same arch.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On Thu, Nov 1, 2012 at 2:05 PM, Attilio Rao atti...@freebsd.org wrote:
 On Thu, Nov 1, 2012 at 2:01 PM, Ian Lepore
 free...@damnhippie.dyndns.org wrote:
 On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
 On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
  On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
  A  Doesn't this padding to cache line size only help x86 processors in 
  an
  A  SMP kernel?  I was expecting to see some #ifdef SMP so that we don't
  pay
  A  a big price for no gain in small-memory ARM systems and such.  But
  maybe
  A  I'm misunderstanding the reason for the padding.
  A
  A I didn't want to do this because this would be meaning that SMP option
  A may become a completely killer for modules/kernel ABI compatibility.
 
  Do we support loading non-SMP modules on SMP kernel and vice versa?

 Actually that's my point, we do.

 Attilio



 Well we've got other similar problems lurking then.  What about a module
 compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
 it gets run on a different arm system whose kernel is compiled with
 #define CACHE_LINE_SIZE 64?

 That should not happen. Is that a real case where you build a module
 for an ARM family and want to run against a kernel compiled for
 another?

Besides that, the ARM CACHE_LINE_SIZE is defined in the shared headers
so there is no way this can be a problem.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Andre Oppermann

On 01.11.2012 12:53, Attilio Rao wrote:

On 10/31/12, Andre Oppermann an...@freebsd.org wrote:

On 31.10.2012 19:10, Attilio Rao wrote:

On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:

Author: attilio
Date: Wed Oct 31 18:07:18 2012
New Revision: 242402
URL: http://svn.freebsd.org/changeset/base/242402

Log:
Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.


Interested developers can now dig and look for other mutexes to
convert and just do it.
Please, however, try to enclose a description about the benchmark
which lead you believe the necessity to pad the mutex and possibly
some numbers, in particular when the lock belongs to structures or the
ABI itself.

Next steps involve porting the same mtx(9) changes to rwlock(9) and
port pvh global pmap lock to rwlock_padalign.


I'd say for an rwlock you can make it unconditional.  The very purpose
of it is to be aquired by multiple CPU's causing cache line dirtying
for every concurrent reader.  Rwlocks are only ever used because multiple
concurrent readers are expected.


I thought about it, but I think the same arguments as for mutexes remains.
The real problem is that having default rwlocks pad-aligned will put
showstoppers for their usage in sensitive structures. For example, I
have plans to use them in vm_object at some point to replace
VM_OBJECT_LOCK and I do want to avoid the extra-bloat for such
structures.

Also, please keep in mind that there is no direct relation between
read acquisition and high contention with the latter being the
real reason for having pad-aligned locks.


I do not agree.  If there is no contention then there is no need for
a rwlock; a normal mutex would be sufficient.  A rwlock is used when
multiple concurrent readers are expected.  Each read lock and unlock
dirties the cache line for all other CPUs.

Please note that I don't want to prevent you from doing the work all
over again for rwlocks.  It's just that the use case for a non-padded rwlock
is very narrow.

--
Andre
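
A toy sketch of the cache-line argument (C11 atomics; not the kernel rwlock implementation): even a read-only acquisition has to write the lock word, and without padding that write also invalidates whatever else shares the line:

#include <stdatomic.h>

struct toy_rwlock {
        atomic_int rw_readers;        /* bumped by every reader */
};

static inline void
toy_rlock(struct toy_rwlock *rw)
{
        /* The increment is a store: it invalidates this cache line in
         * every other CPU's cache, even though the caller only reads. */
        atomic_fetch_add_explicit(&rw->rw_readers, 1, memory_order_acquire);
}

static inline void
toy_runlock(struct toy_rwlock *rw)
{
        atomic_fetch_sub_explicit(&rw->rw_readers, 1, memory_order_release);
}

/*
 * Without padding, unrelated hot data that happens to share the line
 * (here 'hits') is invalidated on every read lock/unlock as well.
 */
struct toy_stats {
        struct toy_rwlock lock;
        unsigned long     hits;
};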



Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Ian Lepore
On Thu, 2012-11-01 at 14:07 +, Attilio Rao wrote:
 On Thu, Nov 1, 2012 at 2:05 PM, Attilio Rao atti...@freebsd.org wrote:
  On Thu, Nov 1, 2012 at 2:01 PM, Ian Lepore
  free...@damnhippie.dyndns.org wrote:
  On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
  On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
   On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
   A  Doesn't this padding to cache line size only help x86 processors 
   in an
   A  SMP kernel?  I was expecting to see some #ifdef SMP so that we 
   don't
   pay
   A  a big price for no gain in small-memory ARM systems and such.  But
   maybe
   A  I'm misunderstanding the reason for the padding.
   A
   A I didn't want to do this because this would be meaning that SMP 
   option
   A may become a completely killer for modules/kernel ABI compatibility.
  
   Do we support loading non-SMP modules on SMP kernel and vice versa?
 
  Actually that's my point, we do.
 
  Attilio
 
 
 
  Well we've got other similar problems lurking then.  What about a module
  compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
  it gets run on a different arm system whose kernel is compiled with
  #define CACHE_LINE_SIZE 64?
 
  That should not happen. Is that a real case where you build a module
  for an ARM family and want to run against a kernel compiled for
  another?
 
 Besides that, the ARM CACHE_LINE_SIZE is defined in the shared headers
 so there is no way this can be a problem.

I've been under the impression that in the ARM and MIPS worlds, the
cache line size can change from one family/series of chips to another,
just as support for SMP can change from one family to another.  If I'm
not mistaken in that assumption, then there can't be something like a
generic arm module that will run on any arm kernel regardless of how the
kernel was built, not if compile-time constants get cooked into the
binaries in a way that affects the ABI/KBI.

Back from some quick googling... yep, arm cortex-a8 processors have a
64-byte cache line size.  Maybe we don't support those yet, which is why
the value appears to be constant in arm param.h right now.

-- Ian
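
A hypothetical example (not an actual kernel structure) of how a compile-time cache line size becomes a KBI issue: the structure's size and field offsets change with the constant, so a module and a kernel built with different values disagree about the layout.

#include <stdint.h>

#ifndef TOY_CACHE_LINE_SIZE
#define TOY_CACHE_LINE_SIZE 32        /* e.g. an older ARM build */
#endif

struct toy_softc {
        uintptr_t sc_lock;            /* stand-in for an embedded lock */
        char      sc_pad[TOY_CACHE_LINE_SIZE - sizeof(uintptr_t)];
        uint32_t  sc_flags;           /* offset 32 here, offset 64 if the
                                         kernel used TOY_CACHE_LINE_SIZE 64 */
};

A kld built against the 32-byte layout would read sc_flags from the wrong offset on a kernel built with the 64-byte value, which is exactly the generic-module problem described here.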




Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On 11/1/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
 On Thu, 2012-11-01 at 14:07 +, Attilio Rao wrote:
 On Thu, Nov 1, 2012 at 2:05 PM, Attilio Rao atti...@freebsd.org wrote:
  On Thu, Nov 1, 2012 at 2:01 PM, Ian Lepore
  free...@damnhippie.dyndns.org wrote:
  On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
  On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
   On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
   A  Doesn't this padding to cache line size only help x86
   processors in an
   A  SMP kernel?  I was expecting to see some #ifdef SMP so that we
   don't
   pay
   A  a big price for no gain in small-memory ARM systems and such.
   But
   maybe
   A  I'm misunderstanding the reason for the padding.
   A
   A I didn't want to do this because this would be meaning that SMP
   option
   A may become a completely killer for modules/kernel ABI
   compatibility.
  
   Do we support loading non-SMP modules on SMP kernel and vice versa?
 
  Actually that's my point, we do.
 
  Attilio
 
 
 
  Well we've got other similar problems lurking then.  What about a
  module
  compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
  it gets run on a different arm system whose kernel is compiled with
  #define CACHE_LINE_SIZE 64?
 
  That should not happen. Is that a real case where you build a module
  for an ARM family and want to run against a kernel compiled for
  another?

 Besides that, the ARM CACHE_LINE_SIZE is defined in the shared headers
 so there is no way this can be a problem.

 I've been under the impression that in the ARM and MIPS worlds, the
 cache line size can change from one family/series of chips to another,
 just as support for SMP can change from one family to another.  If I'm
 not mistaken in that assumption, then there can't be something like a
 generic arm module that will run on any arm kernel regardless of how the
 kernel was built, not if compile-time constants get cooked into the
 binaries in a way that affects the ABI/KBI.

I'm far from being an ARM expert, so I trust what you say.
This only means you cannot build a module for one family and expect to
retain ABI compatibility across all the ARM families. If the cache line
sizes are different, I don't think there is much we can do, and that has
nothing to do with pad-aligned locking.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Ian Lepore
On Thu, 2012-11-01 at 14:43 +, Attilio Rao wrote:
 On 11/1/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
  On Thu, 2012-11-01 at 14:07 +, Attilio Rao wrote:
  On Thu, Nov 1, 2012 at 2:05 PM, Attilio Rao atti...@freebsd.org wrote:
   On Thu, Nov 1, 2012 at 2:01 PM, Ian Lepore
   free...@damnhippie.dyndns.org wrote:
   On Thu, 2012-11-01 at 10:42 +, Attilio Rao wrote:
   On 11/1/12, Gleb Smirnoff gleb...@freebsd.org wrote:
On Wed, Oct 31, 2012 at 06:33:51PM +, Attilio Rao wrote:
A  Doesn't this padding to cache line size only help x86
processors in an
A  SMP kernel?  I was expecting to see some #ifdef SMP so that we
don't
pay
A  a big price for no gain in small-memory ARM systems and such.
But
maybe
A  I'm misunderstanding the reason for the padding.
A
A I didn't want to do this because this would be meaning that SMP
option
A may become a completely killer for modules/kernel ABI
compatibility.
   
Do we support loading non-SMP modules on SMP kernel and vice versa?
  
   Actually that's my point, we do.
  
   Attilio
  
  
  
   Well we've got other similar problems lurking then.  What about a
   module
   compiled on an arm system that had #define CACHE_LINE_SIZE 32 and then
   it gets run on a different arm system whose kernel is compiled with
   #define CACHE_LINE_SIZE 64?
  
   That should not happen. Is that a real case where you build a module
   for an ARM family and want to run against a kernel compiled for
   another?
 
  Besides that, the ARM CACHE_LINE_SIZE is defined in the shared headers
  so there is no way this can be a problem.
 
  I've been under the impression that in the ARM and MIPS worlds, the
  cache line size can change from one family/series of chips to another,
  just as support for SMP can change from one family to another.  If I'm
  not mistaken in that assumption, then there can't be something like a
  generic arm module that will run on any arm kernel regardless of how the
  kernel was built, not if compile-time constants get cooked into the
  binaries in a way that affects the ABI/KBI.
 
 I'm far from being an ARM expert so I trust what you say.
 This only means you cannot build a module for a family and expect to
 retain ABI compatibility among all the ARM families. If cache-lines
 are different I don't think there is much we can do, which has nothing
 to do with pad-align locking.
 

I do a lot of work with armv4 and recently v5 chips, but nothing with
the v6/v7 stuff yet, so I'm not really an expert on these issues either.
I've heard some talk from the folks working on arm v6/v7 support about
things like unified kernels and an arm GENERIC kernel config, but I'm
pretty hazy myself on how that vision is shaping up.

-- Ian




Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Attilio Rao
On 11/1/12, Andre Oppermann an...@freebsd.org wrote:
 On 01.11.2012 12:53, Attilio Rao wrote:
 On 10/31/12, Andre Oppermann an...@freebsd.org wrote:
 On 31.10.2012 19:10, Attilio Rao wrote:
 On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org
 wrote:
 Author: attilio
 Date: Wed Oct 31 18:07:18 2012
 New Revision: 242402
 URL: http://svn.freebsd.org/changeset/base/242402

 Log:
 Rework the known mutexes to benefit about staying on their own
 cache line in order to avoid manual frobbing but using
 struct mtx_padalign.

 Interested developers can now dig and look for other mutexes to
 convert and just do it.
 Please, however, try to enclose a description about the benchmark
 which lead you believe the necessity to pad the mutex and possibly
 some numbers, in particular when the lock belongs to structures or the
 ABI itself.

 Next steps involve porting the same mtx(9) changes to rwlock(9) and
 port pvh global pmap lock to rwlock_padalign.

 I'd say for an rwlock you can make it unconditional.  The very purpose
 of it is to be aquired by multiple CPU's causing cache line dirtying
 for every concurrent reader.  Rwlocks are only ever used because
 multiple
 concurrent readers are expected.

 I thought about it, but I think the same arguments as for mutexes
 remains.
 The real problem is that having default rwlocks pad-aligned will put
 showstoppers for their usage in sensitive structures. For example, I
 have plans to use them in vm_object at some point to replace
 VM_OBJECT_LOCK and I do want to avoid the extra-bloat for such
 structures.

 Also, please keep in mind that there is no direct relation between
 read acquisition and high contention with the latter being the
 real reason for having pad-aligned locks.

 I do not agree.  If there is no contention then there is no need for
 a rwlock, a normal mutex would be sufficient.  A rwlock is used when
 multiple concurrent readers are expected.  Each read lock and unlock
 dirties the cache line for all other CPU's.

 Please note that I don't want to prevent you from doing the work all
 over for rwlocks.  It's just that the use case for a non-padded rwlock
 is very narrow.

So here is the patch adding the decoupling infrastructure to
rwlock(9) and adding the padalign type:
http://www.freebsd.org/~attilio/rwlock_decoupled_padalign.patch

I've tested it by converting some rwlocks in the system, and everything
looks good to me.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
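
The patch itself is not quoted here, so the following is only a guess at the general shape of such a decoupled type, by analogy with mtx_padalign: a second structure with the same members as the plain lock plus cache-line alignment, so existing users are untouched and padding is opted into per lock.

#define TOY_CACHE_LINE_SIZE 64                /* per-arch in the real tree */

struct toy_rwlock {                           /* existing, unpadded type */
        void          *lock_object;
        volatile long  rw_lock;
};

struct toy_rwlock_padalign {                  /* opt-in padded variant */
        void          *lock_object;
        volatile long  rw_lock;
} __attribute__((aligned(TOY_CACHE_LINE_SIZE)));

/*
 * The entry points keep taking the plain type; the padded variant can be
 * passed through a cast because its leading members are identical.
 */
void toy_rw_rlock(struct toy_rwlock *rw);
#define toy_rw_rlock_padded(rwp) toy_rw_rlock((struct toy_rwlock *)(rwp))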


Re: svn commit: r242402 - in head/sys: kern vm

2012-11-01 Thread Tim Kientzle
On Nov 1, 2012, at 7:37 AM, Ian Lepore wrote:
 
 Back from some quick googling... yep, arm cortex-a8 processors have a
 64-byte cache line size.  Maybe we don't support those yet, which is why
 the value appears to be constant in arm param.h right now.

Beaglebone runs a Cortex-A8.  There are a lot of folks playing with those.

Tim



Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Attilio Rao
On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:
 Author: attilio
 Date: Wed Oct 31 18:07:18 2012
 New Revision: 242402
 URL: http://svn.freebsd.org/changeset/base/242402

 Log:
   Rework the known mutexes to benefit about staying on their own
   cache line in order to avoid manual frobbing but using
   struct mtx_padalign.

Interested developers can now dig in, look for other mutexes to
convert, and just do it.
Please, however, try to include a description of the benchmark
that led you to believe the mutex needs padding, and possibly
some numbers, in particular when the lock belongs to a structure or to the
ABI itself.

Next steps involve porting the same mtx(9) changes to rwlock(9) and
porting the pvh global pmap lock to rwlock_padalign.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
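
The "manual frobbing" mentioned in the log is the hand-rolled padding previously placed around such locks; the padalign type expresses the same thing in one declaration. A rough sketch with toy types (not the committed mutex definitions):

#include <stdint.h>

#define TOY_CACHE_LINE_SIZE 64          /* assumed; per-arch in <machine/param.h> */

struct toy_mtx {                        /* stand-in for struct mtx */
        volatile uintptr_t mtx_lock;
};

/* Before: each user hand-padded and hand-aligned the lock. */
struct toy_mtx_handpadded {
        struct toy_mtx mtx;
        char           pad[TOY_CACHE_LINE_SIZE - sizeof(struct toy_mtx)];
} __attribute__((aligned(TOY_CACHE_LINE_SIZE)));

/* After: one self-describing type, padded and aligned by definition. */
struct toy_mtx_padalign {
        volatile uintptr_t mtx_lock;
} __attribute__((aligned(TOY_CACHE_LINE_SIZE)));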


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Ian Lepore
On Wed, 2012-10-31 at 18:10 +, Attilio Rao wrote:
 On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:
  Author: attilio
  Date: Wed Oct 31 18:07:18 2012
  New Revision: 242402
  URL: http://svn.freebsd.org/changeset/base/242402
 
  Log:
Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.
 
 Interested developers can now dig and look for other mutexes to
 convert and just do it.
 Please, however, try to enclose a description about the benchmark
 which lead you believe the necessity to pad the mutex and possibly
 some numbers, in particular when the lock belongs to structures or the
 ABI itself.
 
 Next steps involve porting the same mtx(9) changes to rwlock(9) and
 port pvh global pmap lock to rwlock_padalign.
 
 Thanks,
 Attilio
 
 

Doesn't this padding to cache line size only help x86 processors in an
SMP kernel?  I was expecting to see some #ifdef SMP so that we don't pay
a big price for no gain in small-memory ARM systems and such.  But maybe
I'm misunderstanding the reason for the padding.

-- Ian
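
The kind of conditional padding being asked about would look roughly like this (illustrative only, not code from the tree); the replies explain why tying the lock size to the SMP option is the sticking point:

#include <stdint.h>

#define TOY_CACHE_LINE_SIZE 64         /* assumed for the example */

struct toy_padded_mtx {
        volatile uintptr_t mtx_lock;
#ifdef SMP
        char pad[TOY_CACHE_LINE_SIZE - sizeof(uintptr_t)];
#endif
};

/*
 * sizeof(struct toy_padded_mtx) is 8 without SMP and 64 with it (on LP64),
 * so a module and a kernel built with different SMP settings would no
 * longer agree on the layout of anything embedding such a lock.
 */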




Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Attilio Rao
On Wed, Oct 31, 2012 at 6:26 PM, Ian Lepore
free...@damnhippie.dyndns.org wrote:
 On Wed, 2012-10-31 at 18:10 +, Attilio Rao wrote:
 On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:
  Author: attilio
  Date: Wed Oct 31 18:07:18 2012
  New Revision: 242402
  URL: http://svn.freebsd.org/changeset/base/242402
 
  Log:
Rework the known mutexes to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct mtx_padalign.

 Interested developers can now dig and look for other mutexes to
 convert and just do it.
 Please, however, try to enclose a description about the benchmark
 which lead you believe the necessity to pad the mutex and possibly
 some numbers, in particular when the lock belongs to structures or the
 ABI itself.

 Next steps involve porting the same mtx(9) changes to rwlock(9) and
 port pvh global pmap lock to rwlock_padalign.

 Thanks,
 Attilio



 Doesn't this padding to cache line size only help x86 processors in an
 SMP kernel?  I was expecting to see some #ifdef SMP so that we don't pay
 a big price for no gain in small-memory ARM systems and such.  But maybe
 I'm misunderstanding the reason for the padding.

I didn't want to do this because it would mean that the SMP option
may become a complete killer for module/kernel ABI compatibility.

Also, if you look at the list of modified locks, I don't think there are
too many of them; I hardly believe ARM UP is going to hurt that much
from losing some padding in the tdq structures or callout.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Adrian Chadd
On 31 October 2012 11:33, Attilio Rao atti...@freebsd.org wrote:

 Doesn't this padding to cache line size only help x86 processors in an
 SMP kernel?  I was expecting to see some #ifdef SMP so that we don't pay
 a big price for no gain in small-memory ARM systems and such.  But maybe
 I'm misunderstanding the reason for the padding.

 I didn't want to do this because this would be meaning that SMP option
 may become a completely killer for modules/kernel ABI compatibility.

Right, but you didn't make it configurable for us embedded peeps who
still care about memory usage.

 Also, if you look at the modified list of locks I don't think they
 should be too much, I hardly believe ARM UP is going to hurt that much
 from loosing some padding in tdq structures or callout.

There are a few million more embedded MIPS boards out there with
16MB/32MB of RAM than there are target PCs for FreeBSD.

Would you mind making the padding configurable, defaulting to padded?
That way, for the Atheros MIPS builds, I can turn it off and save on the
memory overhead.

Thanks,


Adrian


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Attilio Rao
On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:
 On 31 October 2012 11:33, Attilio Rao atti...@freebsd.org wrote:

 Doesn't this padding to cache line size only help x86 processors in an
 SMP kernel?  I was expecting to see some #ifdef SMP so that we don't pay
 a big price for no gain in small-memory ARM systems and such.  But maybe
 I'm misunderstanding the reason for the padding.

 I didn't want to do this because this would be meaning that SMP option
 may become a completely killer for modules/kernel ABI compatibility.

 Right, but you didn't make it configurable for us embedded peeps who
 still care about memory usage.

How is this possible without breaking the module/kernel ABI?

 Also, if you look at the modified list of locks I don't think they
 should be too much, I hardly believe ARM UP is going to hurt that much
 from loosing some padding in tdq structures or callout.

 There's a few million more embedded MIPS boards out there with
 16mb/32mb of RAM than target PCs for FreeBSD.

 Would you mind making the padding part configurable and just default
 it to do the padding ?
 That way for the atheros MIPS builds I can turn it off and save on the
 memory overhead.

This would very likely mean writing off compatibility between modules
and the kernel. Locks whose size changes based on kernel options are a
very bad idea, and locks whose size changes based on the availability
of SMP or a similarly central option (i.e., unlike debugging options)
are an even worse idea.

Besides, mtx_padalign mutexes should not be used often in
general; that's why I ask people to justify their introduction every
time.
And as a last point, please consider that the locks converted right now
are the same ones that were already padded before, so there is no loss
for MIPS or similar.

All that is assuming you can actually prove a real performance loss even
in the new cases.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Peter Jeremy
On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:
On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:
 Right, but you didn't make it configurable for us embedded peeps who
 still care about memory usage.

How is this possible without breaking the module/kernel ABI?

Memory usage may override ABI compatibility in an embedded environment.

All that assuming you can actually prove a real performance loss even
in the new cases.

The issue with padding on embedded systems is memory utilisation rather
than performance.

-- 
Peter Jeremy




Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Ian Lepore
On Thu, 2012-11-01 at 06:30 +1100, Peter Jeremy wrote:
 On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:
 On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:
  Right, but you didn't make it configurable for us embedded peeps who
  still care about memory usage.
 
 How is this possible without breaking the module/kernel ABI?
 
 Memory usage may override ABI compatibility in an embedded environment.
 
 All that assuming you can actually prove a real performance loss even
 in the new cases.
 
 The issue with padding on embedded systems is memory utilisation rather
 than performance.
 

There are potential performance hits too, in that embedded systems tend
to have tiny caches (16K L1 with no L2, that sort of thing), so
purposely padding things so that large parts of a cache line aren't used
for anything wastes a scarce resource.

That said, I think a point Attilio was trying to make is that we won't
see a large hit because this doesn't affect a large number of mutex
instances.  I'm willing to accept his expert advice on that, not in
small part because I'm not sure how I'd go about disputing it. :)

I'm really busy with $work right now, but things should calm down in a
couple weeks, and I'd be willing to do some measurements on arm systems
then, if I can get some help on how to generate useful data.

-- Ian




Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Jim Harris
On Wed, Oct 31, 2012 at 12:30 PM, Peter Jeremy pe...@rulingia.com wrote:

 On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:
 On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:
  Right, but you didn't make it configurable for us embedded peeps who
  still care about memory usage.
 
 How is this possible without breaking the module/kernel ABI?

 Memory usage may override ABI compatibility in an embedded environment.

 All that assuming you can actually prove a real performance loss even
 in the new cases.

 The issue with padding on embedded systems is memory utilisation rather
 than performance.


Agreed that for embedded systems we need to be careful about proliferating
this throughout the entire kernel.

But for the usages thus far, Attilio is right that they should not affect
UP.  The ULE and callout changes made very recently are on per-CPU data
structures, so for UP, that's padding just one mutex each.

For the vpglock -> mtx_padalign conversion, this is functionally a nop:
vpglock was already doing this padding.

-Jim
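
A small sketch of the per-CPU point (toy types, not the actual ULE or callout structures): the padded lock sits inside a per-CPU structure, so the total padding cost scales with the number of CPUs, and on a UP kernel there is exactly one instance.

#include <stdint.h>

#define TOY_CACHE_LINE_SIZE 64                  /* assumed */
#ifndef TOY_MAXCPU
#define TOY_MAXCPU 1                            /* a UP build */
#endif

struct toy_mtx_padalign {
        volatile uintptr_t mtx_lock;
} __attribute__((aligned(TOY_CACHE_LINE_SIZE)));

struct toy_tdq {                                /* stand-in for a per-CPU run queue */
        struct toy_mtx_padalign tdq_lock;
        int                     tdq_load;
};

static struct toy_tdq toy_tdq_cpu[TOY_MAXCPU];  /* one instance per CPU */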


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Andre Oppermann

On 31.10.2012 20:40, Ian Lepore wrote:

On Thu, 2012-11-01 at 06:30 +1100, Peter Jeremy wrote:

On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:

On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:

Right, but you didn't make it configurable for us embedded peeps who
still care about memory usage.


How is this possible without breaking the module/kernel ABI?


Memory usage may override ABI compatibility in an embedded environment.


All that assuming you can actually prove a real performance loss even
in the new cases.


The issue with padding on embedded systems is memory utilisation rather
than performance.



There are potential performance hits too, in that embedded systems tend
to have tiny caches (16K L1 with no L2, that sort of thing), so
purposely padding things so that large parts of a cache line aren't used
for anything wastes a scarce resource.


You can define CACHE_LINE_SIZE to 0 on those platforms.
Or to make it even more granular there could be a CACHE_LINE_SIZE_LOCKS
that is used for lock padding.

--
Andre


That said, I think a point Attilio was trying to make is that we won't
see a large hit because this doesn't affect a large number of mutex
instances.  I'm willing to accept his expert advice on that, not in
small part because I'm not sure how I'd go about disputing it. :)

I'm really busy with $work right now, but things should calm down in a
couple weeks, and I'd be willing to do some measurements on arm systems
then, if I can get some help on how to generate useful data.

-- Ian








Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Adrian Chadd
We definitely can't define CACHE_LINE_SIZE=0 on those platforms, as
there are other reasons you need to know the cache line size. :)



Adrian


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Andre Oppermann

On 31.10.2012 19:10, Attilio Rao wrote:

On Wed, Oct 31, 2012 at 6:07 PM, Attilio Rao atti...@freebsd.org wrote:

Author: attilio
Date: Wed Oct 31 18:07:18 2012
New Revision: 242402
URL: http://svn.freebsd.org/changeset/base/242402

Log:
   Rework the known mutexes to benefit about staying on their own
   cache line in order to avoid manual frobbing but using
   struct mtx_padalign.


Interested developers can now dig and look for other mutexes to
convert and just do it.
Please, however, try to enclose a description about the benchmark
which lead you believe the necessity to pad the mutex and possibly
some numbers, in particular when the lock belongs to structures or the
ABI itself.

Next steps involve porting the same mtx(9) changes to rwlock(9) and
port pvh global pmap lock to rwlock_padalign.


I'd say for an rwlock you can make it unconditional.  The very purpose
of it is to be acquired by multiple CPUs, causing cache line dirtying
for every concurrent reader.  Rwlocks are only ever used because multiple
concurrent readers are expected.

--
Andre



Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Attilio Rao
On Wed, Oct 31, 2012 at 8:31 PM, Andre Oppermann an...@freebsd.org wrote:
 On 31.10.2012 20:40, Ian Lepore wrote:

 On Thu, 2012-11-01 at 06:30 +1100, Peter Jeremy wrote:

 On 2012-Oct-31 18:57:37 +, Attilio Rao atti...@freebsd.org wrote:

 On 10/31/12, Adrian Chadd adr...@freebsd.org wrote:

 Right, but you didn't make it configurable for us embedded peeps who
 still care about memory usage.


 How is this possible without breaking the module/kernel ABI?


 Memory usage may override ABI compatibility in an embedded environment.

 All that assuming you can actually prove a real performance loss even
 in the new cases.


 The issue with padding on embedded systems is memory utilisation rather
 than performance.


 There are potential performance hits too, in that embedded systems tend
 to have tiny caches (16K L1 with no L2, that sort of thing), so
 purposely padding things so that large parts of a cache line aren't used
 for anything wastes a scarce resource.


 You can define CACHE_LINE_SIZE to 0 on those platforms.
 Or to make it even more granular there could be a CACHE_LINE_SIZE_LOCKS
 that is used for lock padding.

I think this is a bright idea, albeit under the condition that,
just like CACHE_LINE_SIZE, it won't change during a STABLE branch's
lifetime and that it must not depend on the SMP option.

What do you think about this patch?:
http://www.freebsd.org/~attilio/cache_line_size_locks.patch

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
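
The patch is not quoted in the thread; the sketch below, with hypothetical names, only illustrates the idea under discussion: a separate per-arch alignment constant used just for locks, which an architecture that sees no benefit can set to 1 (no padding) without touching CACHE_LINE_SIZE and without any dependency on the SMP option.

/* <machine/param.h> on an amd64-like arch: locks get a full line. */
#define TOY_CACHE_LINE_SIZE        64
#define TOY_CACHE_LINE_SIZE_LOCKS  TOY_CACHE_LINE_SIZE

/* On a memory-constrained MIPS/ARM arch it could instead be:
 *   #define TOY_CACHE_LINE_SIZE_LOCKS  1
 * which makes the "padded" type degenerate to the plain one. */

struct toy_mtx_padalign {
        volatile unsigned long mtx_lock;
} __attribute__((aligned(TOY_CACHE_LINE_SIZE_LOCKS)));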


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Jim Harris
On Wed, Oct 31, 2012 at 3:55 PM, Attilio Rao atti...@freebsd.org wrote:

 On Wed, Oct 31, 2012 at 8:31 PM, Andre Oppermann an...@freebsd.org
 wrote:
  You can define CACHE_LINE_SIZE to 0 on those platforms.
  Or to make it even more granular there could be a CACHE_LINE_SIZE_LOCKS
  that is used for lock padding.

 I think that this is a bright idea, albeit under the condition that
 just like CACHE_LINE_SIZE it won't change during STABLE branches
 timeframe and that it must not be dependent by SMP option.

 What do you think about this patch?:
 http://www.freebsd.org/~attilio/cache_line_size_locks.patch


Should CACHE_LINE_SIZE_LOCKS still be defined as CACHE_LINE_SIZE on arm,
mips, etc. if SMP is enabled?  This would ensure the padding that used to
be there in vpglock doesn't go away.

I'm also wondering if this should be named something different, perhaps
LOCK_ALIGNMENT.

-Jim


Re: svn commit: r242402 - in head/sys: kern vm

2012-10-31 Thread Attilio Rao
On Wed, Oct 31, 2012 at 11:25 PM, Jim Harris jim.har...@gmail.com wrote:


 On Wed, Oct 31, 2012 at 3:55 PM, Attilio Rao atti...@freebsd.org wrote:

 On Wed, Oct 31, 2012 at 8:31 PM, Andre Oppermann an...@freebsd.org
 wrote:
  You can define CACHE_LINE_SIZE to 0 on those platforms.
  Or to make it even more granular there could be a CACHE_LINE_SIZE_LOCKS
  that is used for lock padding.

 I think that this is a bright idea, albeit under the condition that
 just like CACHE_LINE_SIZE it won't change during STABLE branches
 timeframe and that it must not be dependent by SMP option.

 What do you think about this patch?:
 http://www.freebsd.org/~attilio/cache_line_size_locks.patch


 Should CACHE_LINE_SIZE_LOCKS still be defined as CACHE_LINE_SIZE on arm,
 mips, etc. if SMP is enabled?  This would ensure the padding that used to be
 there in vpglock doesn't go away.

First of all, I'm strongly against having SMP-dependent lock sizes;
as I said, I won't be happy with any patch that changes lock
sizes based on whether the SMP option is present or not.
That said, I'm not really sure that pad-aligned locks have the same
performance weight on !Intel architectures. I suspect not.

If that is not the case (i.e. pad-aligning is important on some of the
architectures where I used the value 1), I'd just say to go and use
CACHE_LINE_SIZE for them.

I think the whole point of this patch was to let !Intel (or most
of them, namely MIPS and ARM) architectures avoid the pad-align
effect at all; otherwise this patch looks moot to me.

That said, this completely changes the meaning of pad-aligned locks. If
this patch goes in, it switches from mutexes padded and aligned to the
cache line size to mutexes which may be padded and aligned if the
architecture gets a real benefit from doing so.

 I'm also wondering if this should be named something different, perhaps
 LOCK_ALIGNMENT.

I don't really mind; whatever name you are happier with, just pick one.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein