[PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-20 Thread Nathan Fontenot
This is a re-send of the remaining patches that did not make it
into the last kernel release for de-coupling sysfs memory
directories from memory sections.  The first three patches of the
previous set went in, and these are the remaining patches that
need to be applied.

The patches remove the assumption that a single memory section corresponds
to a single directory in /sys/devices/system/memory/.  On systems
with large amounts of memory (1+ TB) there are performance issues
related to creating the large number of sysfs directories.  For
a powerpc machine with 1 TB of memory we are creating 63,000+
directories.  This is resulting in boot times of around 45-50
minutes for systems with 1 TB of memory and 8+ hours for systems
with 2 TB of memory.  With this patch set applied I am now seeing
boot times of 5 minutes or less.
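
As a rough check of the directory counts involved, here is a minimal
sketch of the arithmetic.  It assumes 16 MB memory sections (which, as
far as I know, was the powerpc section size at the time); the helper
name nr_memory_dirs() and the 256 MB block size used below are
hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 16 MB memory sections. */
#define SECTION_SIZE_BYTES (16ULL << 20)

/*
 * Number of memoryXXX directories needed to cover total_bytes of
 * memory when each directory spans block_bytes (rounded up).
 * block_bytes must be a non-zero multiple of the section size.
 */
static uint64_t nr_memory_dirs(uint64_t total_bytes, uint64_t block_bytes)
{
	assert(block_bytes != 0 && block_bytes % SECTION_SIZE_BYTES == 0);
	return (total_bytes + block_bytes - 1) / block_bytes;
}
```

Under these assumptions, nr_memory_dirs(1ULL << 40, 16ULL << 20) is
65,536 for a full 1 TB, the same order as the 63,000+ figure above,
while a hypothetical 256 MB block size would need only 4,096
directories.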

The root of this issue is in sysfs directory creation. Every time
a directory is created a string compare is done against sibling
directories (see sysfs_find_dirent()) to ensure we do not create
duplicates.  The list of directory nodes in sysfs is kept as an
unsorted list, so each check takes longer as more directories exist
and the total creation time grows roughly quadratically with the
number of directories.
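
To illustrate how the cost of the duplicate check adds up, here is a
toy userspace model.  It is not the sysfs code: struct toy_dirent,
toy_find_dirent() and toy_create_dirs() are made-up names, and the only
thing modelled is a string compare against every existing sibling in an
unsorted list before each add.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a directory entry; not the kernel's structure. */
struct toy_dirent {
	char name[32];
	struct toy_dirent *next;
};

/* Linear scan of the unsorted sibling list, counting every strcmp(). */
static struct toy_dirent *toy_find_dirent(struct toy_dirent *head,
					  const char *name,
					  unsigned long *compares)
{
	struct toy_dirent *pos;

	for (pos = head; pos; pos = pos->next) {
		(*compares)++;
		if (strcmp(pos->name, name) == 0)
			return pos;
	}
	return NULL;
}

/*
 * Create memory0..memory(n-1), checking for a duplicate before each
 * add.  Returns the total number of string compares performed.
 */
static unsigned long toy_create_dirs(unsigned long n)
{
	struct toy_dirent *head = NULL, *node;
	unsigned long i, compares = 0;
	char name[32];

	for (i = 0; i < n; i++) {
		snprintf(name, sizeof(name), "memory%lu", i);
		if (toy_find_dirent(head, name, &compares))
			continue;	/* duplicate name: do not add it */
		node = malloc(sizeof(*node));
		if (!node)
			break;
		strcpy(node->name, name);	/* both buffers are 32 bytes */
		node->next = head;
		head = node;
	}
	while (head) {			/* free the toy list */
		node = head->next;
		free(head);
		head = node;
	}
	return compares;
}
```

In this model, creating n unique directories costs n(n-1)/2 compares,
so doubling the number of directories roughly quadruples the work.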

The solution implemented by this patch set is to allow a single
directory in sysfs to span multiple memory sections.  This is
controlled by an optional architecture-defined function,
memory_block_size_bytes().  The default definition of this
routine returns a memory block size equal to the memory section
size.  This keeps the layout of the sysfs memory directories, as
seen from userspace, the same as it is today.
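
A minimal userspace sketch of the override mechanism, assuming the
default is provided as a weak symbol that an architecture can replace.
The SECTION_SIZE_BITS value (16 MB sections), the MIN_MEMORY_BLOCK_SIZE
name and the sections_per_block() helper are assumptions for
illustration; see the patches themselves for the real definitions.

```c
/* Assumption for illustration: 16 MB memory sections. */
#define SECTION_SIZE_BITS	24
#define MIN_MEMORY_BLOCK_SIZE	(1UL << SECTION_SIZE_BITS)

/*
 * Default definition: one memory block per memory section.  Being
 * weak, it is replaced at link time if another object file provides
 * a non-weak memory_block_size_bytes().
 */
__attribute__((weak)) unsigned long memory_block_size_bytes(void)
{
	return MIN_MEMORY_BLOCK_SIZE;
}

/* Number of memory sections covered by one memoryXXX directory. */
static unsigned long sections_per_block(void)
{
	return memory_block_size_bytes() / MIN_MEMORY_BLOCK_SIZE;
}
```

With only the default in place, sections_per_block() is 1 and the
directory layout is unchanged.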

For architectures that define their own version of this routine,
as is done for powerpc and x86 in this patchset, the view in userspace
would change such that each memoryXXX directory would span
multiple memory sections.  The number of sections spanned would
depend on the value reported by memory_block_size_bytes().
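
For illustration, the mapping from memory sections to directories can
be sketched as below.  The helper names are hypothetical, and numbering
the directories by section number divided by sections per block is an
assumption about the layout, not something stated above.

```c
#include <assert.h>

/*
 * Index of the memoryXXX directory that covers a given section,
 * assuming directories are numbered by block.
 */
static unsigned long block_id_for_section(unsigned long section_nr,
					  unsigned long sections_per_block)
{
	assert(sections_per_block != 0);
	return section_nr / sections_per_block;
}

/* First section covered by a given block. */
static unsigned long first_section_of_block(unsigned long block_id,
					    unsigned long sections_per_block)
{
	return block_id * sections_per_block;
}
```

With 16 sections per block, for example, sections 0 through 15 all map
to block 0 and section 16 is the first section of block 1.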

-Nathan Fontenot
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-20 Thread Greg KH
On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against sibling
 directories ( see sysfs_find_dirent() ) to ensure we do not create 
 duplicates.  The list of directory nodes in sysfs is kept as an
 unsorted list which results in this being an exponentially longer
 operation as the number of directories are created.

Again, are you sure about this?  I thought we resolved this issue in the
past, but you were going to check it.  Did you?

thanks,

greg k-h


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-20 Thread Nathan Fontenot
On 01/20/2011 10:45 AM, Greg KH wrote:
 On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against sibling
 directories ( see sysfs_find_dirent() ) to ensure we do not create 
 duplicates.  The list of directory nodes in sysfs is kept as an
 unsorted list which results in this being an exponentially longer
 operation as the number of directories are created.
 
 Again, are you sure about this?  I thought we resolved this issue in the
 past, but you were going to check it.  Did you?
 

Yes, the string compare is still present in the sysfs code.  There was
discussion around this sometime last year when I sent a patch out that
stored the directory entries in something other than a linked list.
That patch was rejected but it was agreed that something should be done.

-Nathan


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-20 Thread Dave Hansen
On Thu, 2011-01-20 at 08:45 -0800, Greg KH wrote:
 On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
  The root of this issue is in sysfs directory creation. Every time
  a directory is created a string compare is done against sibling
  directories ( see sysfs_find_dirent() ) to ensure we do not create 
  duplicates.  The list of directory nodes in sysfs is kept as an
  unsorted list which results in this being an exponentially longer
  operation as the number of directories are created.
 
 Again, are you sure about this?  I thought we resolved this issue in the
 past, but you were going to check it.  Did you?

Just to be clear, simply reducing the number of kobjects can make these
patches worthwhile on their own.  I originally figured that the
SECTION_SIZE would go up over time as systems got larger, and _that_
would keep the number of sections and number of sysfs objects down.
Well, that turned out to be wrong, and we're eating up a ton of memory
now.  We can't fix the SECTION_SIZE easily, but we can reduce the number
of kobjects that we need to track the sections.  *That* is the main
benefit I see from these patches.

I think there's a problem worth fixing, even ignoring the directory
creation issue (if it still exists).

-- Dave



Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-20 Thread Greg KH
On Thu, Jan 20, 2011 at 10:51:44AM -0600, Nathan Fontenot wrote:
 On 01/20/2011 10:45 AM, Greg KH wrote:
  On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
  The root of this issue is in sysfs directory creation. Every time
  a directory is created a string compare is done against sibling
  directories ( see sysfs_find_dirent() ) to ensure we do not create 
  duplicates.  The list of directory nodes in sysfs is kept as an
  unsorted list which results in this being an exponentially longer
  operation as the number of directories are created.
  
  Again, are you sure about this?  I thought we resolved this issue in the
  past, but you were going to check it.  Did you?
  
 
 Yes, the string compare is still present in the sysfs code.  There was
 discussion around this sometime last year when I sent a patch out that
 stored the directory entries in something other than a linked list.
 That patch was rejected but it was agreed that something should be done.

Ah, ok, thanks for verifying.

greg k-h


[PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-10 Thread Nathan Fontenot
This is a re-send of the remaining patches that did not make it
into the last kernel release for de-coupling sysfs memory
directories from memory sections.  The first three patches of the
previous set went in, and these are the remaining patches that
need to be applied.

The patches remove the assumption that a single memory
section corresponds to a single directory in
/sys/devices/system/memory/.  On systems
with large amounts of memory (1+ TB) there are performance issues
related to creating the large number of sysfs directories.  For
a powerpc machine with 1 TB of memory we are creating 63,000+
directories.  This is resulting in boot times of around 45-50
minutes for systems with 1 TB of memory and 8 hours for systems
with 2 TB of memory.  With this patch set applied I am now seeing
boot times of 5 minutes or less.

The root of this issue is in sysfs directory creation. Every time
a directory is created a string compare is done against all sibling
directories to ensure we do not create duplicates.  The list of
directory nodes in sysfs is kept as an unsorted list, so each check
takes longer as more directories exist and the total creation time
grows roughly quadratically with the number of directories.

The solution implemented by this patch set is to allow a single
directory in sysfs to span multiple memory sections.  This is
controlled by an optional architecture-defined function,
memory_block_size_bytes().  The default definition of this
routine returns a memory block size equal to the memory section
size.  This keeps the layout of the sysfs memory directories, as
seen from userspace, the same as it is today.

For architectures that define their own version of this routine,
as is done for powerpc and x86 in this patchset, the view in userspace
would change such that each memoryXXX directory would span
multiple memory sections.  The number of sections spanned would
depend on the value reported by memory_block_size_bytes().

-Nathan Fontenot


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-10 Thread Greg KH
On Mon, Jan 10, 2011 at 12:08:56PM -0600, Nathan Fontenot wrote:
 This is a re-send of the remaining patches that did not make it
 into the last kernel release for de-coupling sysfs memory
 directories from memory sections.  The first three patches of the
 previous set went in, and this is the remaining patches that
 need to be applied.

Well, it's a bit late right now, as we are merging stuff that is already
in our trees, and we are busy with that, so this is likely to be ignored
until after .38-rc1 is out.

So, care to resend this after .38-rc1 is out so people can pay attention
to it?


 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against all sibling
 directories to ensure we do not create duplicates.  The list of
 directory nodes in sysfs is kept as an unsorted list which results
 in this being an exponentially longer operation as the number of
 directories are created.

Are you sure this is still an issue?  I thought we solved this last
kernel or so with a simple patch?

thanks,

greg k-h


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-10 Thread Nathan Fontenot
On 01/10/2011 12:44 PM, Greg KH wrote:
 On Mon, Jan 10, 2011 at 12:08:56PM -0600, Nathan Fontenot wrote:
 This is a re-send of the remaining patches that did not make it
 into the last kernel release for de-coupling sysfs memory
 directories from memory sections.  The first three patches of the
 previous set went in, and this is the remaining patches that
 need to be applied.
 
 Well, it's a bit late right now, as we are merging stuff that is already
 in our trees, and we are busy with that, so this is likely to be ignored
 until after .38-rc1 is out.
 
 So, care to resend this after .38-rc1 is out so people can pay attention
 to it?

I was afraid of this. I didn't get a chance to get it out sooner but thought
I would send it out anyway.

 
 
 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against all sibling
 directories to ensure we do not create duplicates.  The list of
 directory nodes in sysfs is kept as an unsorted list which results
 in this being an exponentially longer operation as the number of
 directories are created.
 
 Are you sure this is still an issue?  I thought we solved this last
 kernel or so with a simple patch?

I'll go back and look at this again.

thanks,
-Nathan


Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections

2011-01-10 Thread Robin Holt
  The root of this issue is in sysfs directory creation. Every time
  a directory is created a string compare is done against all sibling
  directories to ensure we do not create duplicates.  The list of
  directory nodes in sysfs is kept as an unsorted list which results
  in this being an exponentially longer operation as the number of
  directories are created.
  
  Are you sure this is still an issue?  I thought we solved this last
  kernel or so with a simple patch?
 
 I'll go back and look at this again.

What I recall fixing is the symbolic linking from the node* to the
memory section.  In that case, we cached the most recent mem section
and since they always were added sequentially, the cache saved a rescan.

Of course, I could be remembering something completely unrelated.
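
A minimal sketch of that kind of caching, assuming a singly linked list
of sections kept in ascending order.  This is a toy model, not the
kernel change being recalled: struct mem_section_entry,
find_section_cached() and sequential_lookup_steps() are hypothetical
names.

```c
#include <stddef.h>

#define MAX_SECTIONS 1024

struct mem_section_entry {
	unsigned long nr;
	struct mem_section_entry *next;
};

/*
 * Find section nr in a list sorted by ascending nr.  If *hint is set
 * and does not lie past nr, start there instead of at the head.  On
 * success the hint is updated to the entry found.  Every node visited
 * is counted in *steps.
 */
static struct mem_section_entry *
find_section_cached(struct mem_section_entry *head,
		    struct mem_section_entry **hint,
		    unsigned long nr, unsigned long *steps)
{
	struct mem_section_entry *pos = head;

	if (*hint && (*hint)->nr <= nr)
		pos = *hint;
	for (; pos; pos = pos->next) {
		(*steps)++;
		if (pos->nr == nr) {
			*hint = pos;
			return pos;
		}
	}
	return NULL;
}

/* Look up sections 0..n-1 in order; return the total nodes visited. */
static unsigned long sequential_lookup_steps(unsigned long n, int use_cache)
{
	static struct mem_section_entry entries[MAX_SECTIONS];
	struct mem_section_entry *hint = NULL;
	unsigned long i, steps = 0;

	if (n == 0 || n > MAX_SECTIONS)
		return 0;
	for (i = 0; i < n; i++) {
		entries[i].nr = i;
		entries[i].next = (i + 1 < n) ? &entries[i + 1] : NULL;
	}
	for (i = 0; i < n; i++) {
		if (!use_cache)
			hint = NULL;	/* forget the previous hit */
		find_section_cached(&entries[0], &hint, i, &steps);
	}
	return steps;
}
```

For sequential lookups the hinted version visits about two nodes per
lookup, while rescanning from the head visits n(n+1)/2 nodes in total.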

Robin