[PATCH 0/4] De-couple sysfs memory directories from memory sections
This is a re-send of the remaining patches that did not make it into the last kernel release for de-coupling sysfs memory directories from memory sections. The first three patches of the previous set went in; these are the remaining patches that need to be applied.

The patches decouple the concept that a single memory section corresponds to a single directory in /sys/devices/system/memory/. On systems with large amounts of memory (1+ TB) there are performance issues related to creating the large number of sysfs directories. For a powerpc machine with 1 TB of memory we are creating 63,000+ directories. This results in boot times of around 45-50 minutes for systems with 1 TB of memory and 8+ hours for systems with 2 TB of memory. With this patch set applied I am now seeing boot times of 5 minutes or less.

The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against sibling directories (see sysfs_find_dirent()) to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.

The solution provided by this patch set is to allow a single directory in sysfs to span multiple memory sections. This is controlled by an optional, architecture-defined function, memory_block_size_bytes(). The default definition of this routine returns a memory block size equal to the memory section size, which keeps the layout of the sysfs memory directories, as seen from userspace, the same as it is today. For architectures that define their own version of this routine, as is done for powerpc and x86 in this patch set, the view in userspace would change so that each memoryXXX directory spans multiple memory sections. The number of sections spanned would depend on the value reported by memory_block_size_bytes().
-Nathan Fontenot ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
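To make the block-size arithmetic concrete, here is a small userspace C sketch (not kernel code). It assumes a 16 MB section size and a hypothetical 256 MB block size for an architecture that overrides memory_block_size_bytes(); the sketch_* names and both sizes are made up for illustration. With 16 MB sections, 1 TB of memory needs 65,536 directories, the same order as the 63,000+ reported above; with 256 MB blocks it needs 4,096, each spanning 16 sections.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical section size: 2^24 bytes (16 MB). The real value comes from
 * the architecture's SECTION_SIZE_BITS and may differ. */
#define SKETCH_SECTION_BITS 24
#define SKETCH_SECTION_SIZE ((uint64_t)1 << SKETCH_SECTION_BITS)

/* Stand-in for the default behaviour: one section per memoryXXX
 * directory, so the userspace layout is unchanged. */
uint64_t sketch_default_block_size(void)
{
	return SKETCH_SECTION_SIZE;
}

/* Stand-in for a hypothetical architecture override: 256 MB blocks. */
uint64_t sketch_arch_block_size(void)
{
	return 16 * SKETCH_SECTION_SIZE;
}

/* How many memory sections one memoryXXX directory spans. */
uint64_t sections_per_block(uint64_t block_size)
{
	/* A block must be a whole, non-zero number of sections. */
	assert(block_size >= SKETCH_SECTION_SIZE);
	assert(block_size % SKETCH_SECTION_SIZE == 0);
	return block_size / SKETCH_SECTION_SIZE;
}

/* How many memoryXXX directories are needed for total_bytes of memory,
 * counting a trailing partial block as a full directory. */
uint64_t memory_dir_count(uint64_t total_bytes, uint64_t block_size)
{
	return (total_bytes + block_size - 1) / block_size;
}
```

The number of per-directory kobjects scales the same way as memory_dir_count(), so a larger block size shrinks both the directory count and the bookkeeping.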
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against sibling directories (see sysfs_find_dirent()) to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.

Again, are you sure about this? I thought we resolved this issue in the past, but you were going to check it. Did you?

thanks,

greg k-h
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On 01/20/2011 10:45 AM, Greg KH wrote:
> On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
>> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against sibling directories (see sysfs_find_dirent()) to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.
>
> Again, are you sure about this? I thought we resolved this issue in the past, but you were going to check it. Did you?

Yes, the string compare is still present in the sysfs code. There was some discussion around this last year when I sent out a patch that stored the directory entries in something other than a linked list. That patch was rejected, but it was agreed that something should be done.

-Nathan
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On Thu, 2011-01-20 at 08:45 -0800, Greg KH wrote:
> On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
>> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against sibling directories (see sysfs_find_dirent()) to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.
>
> Again, are you sure about this? I thought we resolved this issue in the past, but you were going to check it. Did you?

Just to be clear, simply reducing the number of kobjects can make these patches worthwhile on their own. I originally figured that SECTION_SIZE would go up over time as systems got larger, and _that_ would keep the number of sections and the number of sysfs objects down. Well, that turned out to be wrong, and we're eating up a ton of memory now. We can't easily change SECTION_SIZE, but we can reduce the number of kobjects that we need to track the sections. *That* is the main benefit I see from these patches. I think there's a problem worth fixing here, even ignoring the directory creation issue (if it still exists).

--
Dave
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On Thu, Jan 20, 2011 at 10:51:44AM -0600, Nathan Fontenot wrote:
> On 01/20/2011 10:45 AM, Greg KH wrote:
>> On Thu, Jan 20, 2011 at 10:36:40AM -0600, Nathan Fontenot wrote:
>>> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against sibling directories (see sysfs_find_dirent()) to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.
>>
>> Again, are you sure about this? I thought we resolved this issue in the past, but you were going to check it. Did you?
>
> Yes, the string compare is still present in the sysfs code. There was some discussion around this last year when I sent out a patch that stored the directory entries in something other than a linked list. That patch was rejected, but it was agreed that something should be done.

Ah, ok, thanks for verifying.

greg k-h
[PATCH 0/4] De-couple sysfs memory directories from memory sections
This is a re-send of the remaining patches that did not make it into the last kernel release for de-coupling sysfs memory directories from memory sections. The first three patches of the previous set went in; these are the remaining patches that need to be applied.

The patches decouple the concept that a single memory section corresponds to a single directory in /sys/devices/system/memory/. On systems with large amounts of memory (1+ TB) there are performance issues related to creating the large number of sysfs directories. For a powerpc machine with 1 TB of memory we are creating 63,000+ directories. This results in boot times of around 45-50 minutes for systems with 1 TB of memory and 8 hours for systems with 2 TB of memory. With this patch set applied I am now seeing boot times of 5 minutes or less.

The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.

The solution provided by this patch set is to allow a single directory in sysfs to span multiple memory sections. This is controlled by an optional, architecture-defined function, memory_block_size_bytes(). The default definition of this routine returns a memory block size equal to the memory section size, which keeps the layout of the sysfs memory directories, as seen from userspace, the same as it is today. For architectures that define their own version of this routine, as is done for powerpc and x86 in this patch set, the view in userspace would change so that each memoryXXX directory spans multiple memory sections. The number of sections spanned would depend on the value reported by memory_block_size_bytes().
-Nathan Fontenot
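The cost of checking every sibling can be seen in a simplified userspace model: each new directory is compared by name against every existing entry in an unsorted list before it is added. The struct and function names are hypothetical, and this only models the behaviour described for sysfs_find_dirent(); it is not the sysfs code. Creating n distinct directories costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons, so doubling the number of directories roughly quadruples the work.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of one directory's children: an unsorted
 * singly linked list. This is not the real sysfs data structure. */
struct sketch_dirent {
	char name[32];
	struct sketch_dirent *next;
};

/* Linear scan for a sibling with the same name; counts strcmp() calls. */
static struct sketch_dirent *find_sibling(struct sketch_dirent *head,
					  const char *name,
					  unsigned long long *cmps)
{
	for (struct sketch_dirent *d = head; d; d = d->next) {
		(*cmps)++;
		if (strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

/* Add a child unless the name already exists.
 * Returns 0 on success, -1 on duplicate or allocation failure. */
int add_child(struct sketch_dirent **head, const char *name,
	      unsigned long long *cmps)
{
	assert(head && name && cmps);
	if (find_sibling(*head, name, cmps))
		return -1;
	struct sketch_dirent *d = malloc(sizeof(*d));
	if (!d)
		return -1;
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->next = *head;	/* prepend: the list is unsorted anyway */
	*head = d;
	return 0;
}

/* Create n uniquely named children ("memory0", "memory1", ...), free them,
 * and return the total number of name comparisons performed. */
unsigned long long create_dirs(unsigned int n)
{
	struct sketch_dirent *head = NULL;
	unsigned long long cmps = 0;
	char buf[32];

	for (unsigned int i = 0; i < n; i++) {
		snprintf(buf, sizeof(buf), "memory%u", i);
		add_child(&head, buf, &cmps);
	}
	while (head) {
		struct sketch_dirent *next = head->next;
		free(head);
		head = next;
	}
	return cmps;
}
```

For example, create_dirs(1000) performs 499,500 comparisons. Assuming 16 MB sections, the 65,536 directories of a 1 TB machine would need 65,536 × 65,535 / 2 = 2,147,450,880 comparisons under this model.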
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On Mon, Jan 10, 2011 at 12:08:56PM -0600, Nathan Fontenot wrote:
> This is a re-send of the remaining patches that did not make it into the last kernel release for de-coupling sysfs memory directories from memory sections. The first three patches of the previous set went in; these are the remaining patches that need to be applied.

Well, it's a bit late right now, as we are merging stuff that is already in our trees, and we are busy with that, so this is likely to be ignored until after .38-rc1 is out. So, care to resend this after .38-rc1 is out so people can pay attention to it?

> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.

Are you sure this is still an issue? I thought we solved this last kernel or so with a simple patch?

thanks,

greg k-h
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
On 01/10/2011 12:44 PM, Greg KH wrote:
> On Mon, Jan 10, 2011 at 12:08:56PM -0600, Nathan Fontenot wrote:
>> This is a re-send of the remaining patches that did not make it into the last kernel release for de-coupling sysfs memory directories from memory sections. The first three patches of the previous set went in; these are the remaining patches that need to be applied.
>
> Well, it's a bit late right now, as we are merging stuff that is already in our trees, and we are busy with that, so this is likely to be ignored until after .38-rc1 is out. So, care to resend this after .38-rc1 is out so people can pay attention to it?

I was afraid of this. I didn't get a chance to get it out sooner but thought I would send it out anyway.

>> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.
>
> Are you sure this is still an issue? I thought we solved this last kernel or so with a simple patch?

I'll go back and look at this again.

thanks,
-Nathan
Re: [PATCH 0/4] De-couple sysfs memory directories from memory sections
>>> The root of this issue is in sysfs directory creation. Every time a directory is created, a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list, so each creation takes longer as more directories exist and the total cost grows quadratically with the number of directories.
>>
>> Are you sure this is still an issue? I thought we solved this last kernel or so with a simple patch?
>
> I'll go back and look at this again.

What I recall fixing is the symbolic linking from the node* directories to the memory sections. In that case, we cached the most recent memory section, and since sections were always added sequentially, the cache saved a rescan. Of course, I could be remembering something completely unrelated.

Robin
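Robin's recollection describes a common pattern: when items are processed in order, remembering the last lookup avoids rescanning from the start every time. The sketch below is a hypothetical userspace model of that idea with made-up names; it is not the kernel change he refers to. It checks the cached block, then the block right after it, and only falls back to a full scan when neither matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: memory blocks stored in creation order, each covering
 * section numbers [start, start + nr). Not the kernel's data structures. */
struct sketch_block {
	unsigned long start;
	unsigned long nr;
};

struct block_lookup {
	const struct sketch_block *blocks;
	size_t count;
	size_t last;		/* index of the most recently found block */
	unsigned long scans;	/* number of full rescans performed */
};

static int block_contains(const struct sketch_block *b, unsigned long sec)
{
	return sec >= b->start && sec < b->start + b->nr;
}

/* Find the block holding section 'sec'. Try the cached block, then the
 * next one (the sequential case), and only then rescan from the start. */
const struct sketch_block *lookup_block(struct block_lookup *l,
					unsigned long sec)
{
	assert(l != NULL);
	if (l->last < l->count && block_contains(&l->blocks[l->last], sec))
		return &l->blocks[l->last];
	if (l->last + 1 < l->count &&
	    block_contains(&l->blocks[l->last + 1], sec))
		return &l->blocks[++l->last];

	l->scans++;
	for (size_t i = 0; i < l->count; i++) {
		if (block_contains(&l->blocks[i], sec)) {
			l->last = i;
			return &l->blocks[i];
		}
	}
	return NULL;
}
```

With sections looked up in increasing order, every lookup hits the cached block or its successor, so `scans` stays at zero; only an out-of-order or missing section triggers a full scan.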