Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-23 Thread Ingo Molnar

* Borislav Petkov  wrote:

> On Tue, Jul 16, 2013 at 05:55:02PM +0900, Joonsoo Kim wrote:
> > How about executing perf via usermodehelper and collecting output
> > in tmpfs? Using this approach, we can start perf after rootfs
> > initialization,
> 
> What for if we can start logging to buffers much earlier? *Reading*
> from those buffers can be done much later, at our own leisure with full
> userspace up.

Yeah, agreed, I think this needs to be more integrated into the kernel, so 
that people don't have to worry about "when does userspace start up the 
earliest" details.

Fundamentally all perf really needs here is some memory to initialize and 
buffer into.
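
Ingo's point — log into a preallocated buffer early, read it out later — can be sketched in userspace C. This is only an illustration of the buffering idea (a fixed ring of records written long before any reader exists); the real thing would be a per-CPU kernel buffer, and all names below are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* A fixed-size ring of log records, standing in for the early profiling
 * buffer: writers append records long before any reader exists. */
#define NRECS 4

static char ring[NRECS][32];
static size_t ring_head;                /* total records ever written */

static void ring_log(const char *msg)
{
	strncpy(ring[ring_head % NRECS], msg, sizeof(ring[0]) - 1);
	ring[ring_head % NRECS][sizeof(ring[0]) - 1] = '\0';
	ring_head++;
}

/* Read back record i of the most recent NRECS, oldest first. */
static const char *ring_read(size_t i)
{
	size_t avail = ring_head < NRECS ? ring_head : NRECS;
	size_t first = ring_head - avail;

	return i < avail ? ring[(first + i) % NRECS] : NULL;
}

/* Demo: after overflow, the oldest surviving record is r1. */
static int demo_ring(void)
{
	const char *msgs[] = { "r0", "r1", "r2", "r3", "r4" };

	for (int i = 0; i < 5; i++)
		ring_log(msgs[i]);
	return strcmp(ring_read(0), "r1") == 0 &&
	       strcmp(ring_read(3), "r4") == 0;
}
```

Old records are overwritten once the ring fills, which is exactly the trade-off a fixed early-boot buffer makes: bounded memory in exchange for possibly losing the oldest samples.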

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-22 Thread Robin Holt
On Fri, Jul 19, 2013 at 04:51:49PM -0700, Yinghai Lu wrote:
> On Wed, Jul 17, 2013 at 2:30 AM, Robin Holt  wrote:
> > On Wed, Jul 17, 2013 at 01:17:44PM +0800, Sam Ben wrote:
> >> >With this patch, we did boot a 16TiB machine.  Without the patches,
> >> >the v3.10 kernel with the same configuration took 407 seconds for
> >> >free_all_bootmem.  With the patches and operating on 2MiB pages instead
> >> >of 1GiB, it took 26 seconds so performance was improved.  I have no feel
> >> >for how the 1GiB chunk size will perform.
> >>
> >> How can we test how much time is spent in free_all_bootmem?
> >
> > We put a pr_emerg at the beginning and end of free_all_bootmem and
> > then used a modified version of a script which records the time in
> > microseconds at the beginning of each line of output.
> 
> Using the two attached patches, I found a 3TiB system takes about 100s
> before slub is ready.
>
> It breaks down into three portions:
> 1. sparse vmemmap buf allocation: it goes through the bootmem wrapper,
> so clearing those struct page areas takes about 30s.
> 2. memmap_init_zone: takes about 25s.
> 3. mem_init/free_all_bootmem: about 30s.
>
> So I still wonder why 16TiB would need hours.

I don't know where you got the figure of hours for memory initialization.
That is likely for a 32TiB boot and includes the entire boot, not just
getting the memory allocator initialized.

For a 16TiB boot, the same three portions took (in seconds):
1) 344
2) 1151
3) 407

I hope that illustrates why we chose to address memmap_init_zone first,
which had the nice side effect of also reducing the free_all_bootmem
slowdown.

With these patches, those numbers are currently:
1) 344
2) 49
3) 26

> Also, your patches look like they only address 2 and 3.

Right, but I thought that was the normal way to do things.  Address
one thing at a time and work toward a better kernel.  I don't see a
relationship between the work we are doing here and the sparse vmemmap
buffer allocation.  Have I missed something?

Did you happen to time a boot with these patches applied to see how
long it took and how much impact they had on a smaller config?

Robin


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-19 Thread Yinghai Lu
On Wed, Jul 17, 2013 at 2:30 AM, Robin Holt  wrote:
> On Wed, Jul 17, 2013 at 01:17:44PM +0800, Sam Ben wrote:
>> >With this patch, we did boot a 16TiB machine.  Without the patches,
>> >the v3.10 kernel with the same configuration took 407 seconds for
>> >free_all_bootmem.  With the patches and operating on 2MiB pages instead
>> >of 1GiB, it took 26 seconds so performance was improved.  I have no feel
>> >for how the 1GiB chunk size will perform.
>>
>> How can we test how much time is spent in free_all_bootmem?
>
> We put a pr_emerg at the beginning and end of free_all_bootmem and
> then used a modified version of a script which records the time in
> microseconds at the beginning of each line of output.

Using the two attached patches, I found a 3TiB system takes about 100s
before slub is ready.

It breaks down into three portions:
1. sparse vmemmap buf allocation: it goes through the bootmem wrapper,
so clearing those struct page areas takes about 30s.
2. memmap_init_zone: takes about 25s.
3. mem_init/free_all_bootmem: about 30s.

So I still wonder why 16TiB would need hours.

Also, your patches look like they only address 2 and 3.

Yinghai


printk_time_tsc_0.patch
Description: Binary data


printk_time_tsc_1.patch
Description: Binary data


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-17 Thread Robin Holt
On Wed, Jul 17, 2013 at 01:17:44PM +0800, Sam Ben wrote:
> On 07/12/2013 10:03 AM, Robin Holt wrote:
> >We have been working on this since we returned from shutdown and have
> >something to discuss now.  We restricted ourselves to 2MiB initialization
> >to keep the patch set a little smaller and more clear.
> >
> >First, I think I want to propose getting rid of the page flag.  If I knew
> >of a concrete way to determine that the page has not been initialized,
> >this patch series would look different.  If there is no definitive
> >way to determine that the struct page has been initialized aside from
> >checking the entire page struct is zero, then I think I would suggest
> >we change the page flag to indicate the page has been initialized.
> >
> >The heart of the problem as I see it comes from expand().  We nearly
> >always see a first reference to a struct page which is in the middle
> >of the 2MiB region.  Due to that access, the unlikely() check that was
> >originally proposed really ends up referencing a different page entirely.
> >We actually did not introduce an unlikely and refactor the patches to
> >make that unlikely inside a static inline function.  Also, given the
> >strong warning at the head of expand(), we did not feel experienced
> >enough to refactor it to make things always reference the 2MiB page
> >first.
> >
> >With this patch, we did boot a 16TiB machine.  Without the patches,
> >the v3.10 kernel with the same configuration took 407 seconds for
> >free_all_bootmem.  With the patches and operating on 2MiB pages instead
> >of 1GiB, it took 26 seconds so performance was improved.  I have no feel
> >for how the 1GiB chunk size will perform.
> 
> How can we test how much time is spent in free_all_bootmem?

We put a pr_emerg at the beginning and end of free_all_bootmem and
then used a modified version of a script which records the time in
microseconds at the beginning of each line of output.

Robin

> 
> >
> >I am on vacation for the next three days so I am sorry in advance for
> >my infrequent or non-existent responses.
> >
> >
> >Signed-off-by: Robin Holt 
> >Signed-off-by: Nate Zimmer 
> >To: "H. Peter Anvin" 
> >To: Ingo Molnar 
> >Cc: Linux Kernel 
> >Cc: Linux MM 
> >Cc: Rob Landley 
> >Cc: Mike Travis 
> >Cc: Daniel J Blueman 
> >Cc: Andrew Morton 
> >Cc: Greg KH 
> >Cc: Yinghai Lu 
> >Cc: Mel Gorman 


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-16 Thread Sam Ben

On 07/12/2013 10:03 AM, Robin Holt wrote:

We have been working on this since we returned from shutdown and have
something to discuss now.  We restricted ourselves to 2MiB initialization
to keep the patch set a little smaller and more clear.

First, I think I want to propose getting rid of the page flag.  If I knew
of a concrete way to determine that the page has not been initialized,
this patch series would look different.  If there is no definitive
way to determine that the struct page has been initialized aside from
checking the entire page struct is zero, then I think I would suggest
we change the page flag to indicate the page has been initialized.

The heart of the problem as I see it comes from expand().  We nearly
always see a first reference to a struct page which is in the middle
of the 2MiB region.  Due to that access, the unlikely() check that was
originally proposed really ends up referencing a different page entirely.
We actually did not introduce an unlikely and refactor the patches to
make that unlikely inside a static inline function.  Also, given the
strong warning at the head of expand(), we did not feel experienced
enough to refactor it to make things always reference the 2MiB page
first.

With this patch, we did boot a 16TiB machine.  Without the patches,
the v3.10 kernel with the same configuration took 407 seconds for
free_all_bootmem.  With the patches and operating on 2MiB pages instead
of 1GiB, it took 26 seconds so performance was improved.  I have no feel
for how the 1GiB chunk size will perform.


How can we test how much time is spent in free_all_bootmem?



I am on vacation for the next three days so I am sorry in advance for
my infrequent or non-existent responses.


Signed-off-by: Robin Holt 
Signed-off-by: Nate Zimmer 
To: "H. Peter Anvin" 
To: Ingo Molnar 
Cc: Linux Kernel 
Cc: Linux MM 
Cc: Rob Landley 
Cc: Mike Travis 
Cc: Daniel J Blueman 
Cc: Andrew Morton 
Cc: Greg KH 
Cc: Yinghai Lu 
Cc: Mel Gorman 


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-16 Thread Borislav Petkov
On Tue, Jul 16, 2013 at 05:55:02PM +0900, Joonsoo Kim wrote:
> How about executing perf via usermodehelper and collecting output
> in tmpfs? Using this approach, we can start perf after rootfs
> initialization,

What for if we can start logging to buffers much earlier? *Reading*
from those buffers can be done much later, at our own leisure with full
userspace up.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-16 Thread Joonsoo Kim
On Fri, Jul 12, 2013 at 10:27:56AM +0200, Ingo Molnar wrote:
> 
> * Robin Holt  wrote:
> 
> > [...]
> > 
> > With this patch, we did boot a 16TiB machine.  Without the patches, the 
> > v3.10 kernel with the same configuration took 407 seconds for 
> > free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> > of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> > for how the 1GiB chunk size will perform.
> 
> That's pretty impressive.
> 
> It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
> something else being the current bottleneck, besides page init 
> granularity.
> 
> Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
> memory, and then hot-add that memory? That would allow easy profiling of 
> remaining overhead.
> 
> Side note:
> 
> Robert Richter and Boris Petkov are working on 'persistent events' support 
> for perf, which will eventually allow boot time profiling - I'm not sure 
> if the patches and the tooling support is ready enough yet for your 
> purposes.
> 
> Robert, Boris, the following workflow would be pretty intuitive:
> 
>  - kernel developer sets boot flag: perf=boot,freq=1khz,size=16MB
> 
>  - we'd get a single (cycles?) event running once the perf subsystem is up
>and running, with a sampling frequency of 1 KHz, sending profiling
>trace events to a sufficiently sized profiling buffer of 16 MB per
>CPU.
> 
>  - once the system reaches SYSTEM_RUNNING, profiling is stopped either
>automatically - or the user stops it via a new tooling command.
> 
>  - the profiling buffer is extracted into a regular perf.data via a
>special 'perf record' call or some other, new perf tooling 
>solution/variant.
> 
>[ Alternatively the kernel could attempt to construct a 'virtual'
>  perf.data from the persistent buffer, available via /sys/debug or
>  elsewhere in /sys - just like the kernel constructs a 'virtual' 
>  /proc/kcore, etc. That file could be copied or used directly. ]

Hello, Robert, Boris, Ingo.

How about executing perf via usermodehelper and collecting output in
tmpfs? Using this approach, we can start perf after rootfs
initialization, because we need a perf binary at least. But we can use
almost all of perf's functionality. If anyone has interest in this
approach, I will send patches implementing this idea.
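
A userspace analogue of the proposed flow can be sketched as follows. The kernel side would use call_usermodehelper(); here fork/exec stands in so the capture-to-file flow can be shown, and the helper path and output file below are hypothetical:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `helper`, redirecting its stdout into `outpath` (the tmpfs file). */
static int run_helper_to_file(const char *helper, char *const argv[],
			      const char *outpath)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int fd = open(outpath, O_WRONLY | O_CREAT | O_TRUNC, 0600);

		if (fd < 0)
			_exit(127);
		dup2(fd, STDOUT_FILENO);	/* capture output in the file */
		close(fd);
		execv(helper, argv);
		_exit(127);			/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Demo: run /bin/echo and check its output landed in the file. */
static int demo_collect(void)
{
	char *args[] = { "/bin/echo", "hello", NULL };
	const char *path = "/tmp/helper_out.txt";
	char buf[16] = { 0 };
	FILE *f;

	if (run_helper_to_file("/bin/echo", args, path) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	fgets(buf, sizeof(buf), f);
	fclose(f);
	return strncmp(buf, "hello", 5) == 0 ? 0 : -1;
}
```

The constraint Joonsoo notes falls out of this shape: the helper binary must exist on a mounted rootfs before anything can be launched, which is why this approach cannot start earlier than rootfs initialization.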

Thanks.


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-15 Thread Robin Holt
On Fri, Jul 12, 2013 at 10:27:56AM +0200, Ingo Molnar wrote:
> 
> * Robin Holt  wrote:
> 
> > [...]
> > 
> > With this patch, we did boot a 16TiB machine.  Without the patches, the 
> > v3.10 kernel with the same configuration took 407 seconds for 
> > free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> > of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> > for how the 1GiB chunk size will perform.
> 
> That's pretty impressive.

And WRONG!

That is a 15x speedup in the freeing of memory at the free_all_bootmem
point.  That is _NOT_ the speedup from memmap_init_zone.  I forgot to
take that into account, as Nate pointed out this morning in a hallway
discussion.  Before, on the 16TiB machine, memmap_init_zone took 1152
seconds.  After, it took 50.  If it were a straight 1/512th, we would
have expected that 1152 to drop to something more on the order of 2-3
seconds, so there is still significant room for improvement.

Sorry for the confusion.

> It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
> something else being the current bottleneck, besides page init 
> granularity.
> 
> Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
> memory, and then hot-add that memory? That would allow easy profiling of 
> remaining overhead.

Nate and I will be working on other things for the next few hours,
hoping there is a better answer to the first question we asked: whether
there is a way to test that a page has been initialized other than
comparing it against all zeroes.

Thanks,
Robin


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-15 Thread Robin Holt
On Thu, Jul 11, 2013 at 09:03:51PM -0500, Robin Holt wrote:
> We have been working on this since we returned from shutdown and have
> something to discuss now.  We restricted ourselves to 2MiB initialization
> to keep the patch set a little smaller and more clear.
> 
> First, I think I want to propose getting rid of the page flag.  If I knew
> of a concrete way to determine that the page has not been initialized,
> this patch series would look different.  If there is no definitive
> way to determine that the struct page has been initialized aside from
> checking the entire page struct is zero, then I think I would suggest
> we change the page flag to indicate the page has been initialized.

Ingo or HPA,

Did I implement this wrong or is there a way to get rid of the page flag
which is not going to impact normal operation?  I don't want to put too
much more effort into this until I know we are stuck going this direction.
Currently, the expand() function has a relatively expensive check
against the 2MiB-aligned pfn's struct page.  I do not know of a way to
eliminate that check against the other page, since the first reference
we see for a page is in the middle of that 2MiB-aligned range.
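
The all-zeroes test under discussion can be sketched in userspace C. A toy struct stands in for struct page (in the kernel, memchr_inv() performs this kind of scan); all names below are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the object is still entirely zero, i.e. "uninitialized". */
static int range_is_zero(const void *obj, size_t len)
{
	const unsigned char *p = obj;

	for (size_t i = 0; i < len; i++)
		if (p[i])
			return 0;
	return 1;
}

/* Toy stand-in for struct page. */
struct fake_page {
	unsigned long flags;
	void *mapping;
};

/* Demo: a zeroed object reads as uninitialized; touching any field
 * flips the test. */
static int demo_zero_check(void)
{
	struct fake_page pg;
	int before, after;

	memset(&pg, 0, sizeof(pg));
	before = range_is_zero(&pg, sizeof(pg));	/* all zero */
	pg.flags = 1;					/* "initialize" it */
	after = range_is_zero(&pg, sizeof(pg));		/* no longer zero */
	return before == 1 && after == 0;
}
```

The cost Robin worries about is visible here: the check scans the whole object on every call, which is why a single dedicated page flag is so much cheaper in the hot path.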

To identify this as an area of concern, we had booted with a simulator,
setting watch points on the struct page array region once the
Uninitialized flag was set and maintaining that until it was cleared.

Thanks,
Robin


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-15 Thread Robin Holt
On Fri, Jul 12, 2013 at 10:27:56AM +0200, Ingo Molnar wrote:
> 
> * Robin Holt h...@sgi.com wrote:
> 
> > [...]
> > 
> > With this patch, we did boot a 16TiB machine.  Without the patches, the 
> > v3.10 kernel with the same configuration took 407 seconds for 
> > free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> > of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> > for how the 1GiB chunk size will perform.
> 
> That's pretty impressive.

And WRONG!

That is a 15x speedup in the freeing of memory at the free_all_bootmem
point.  That is _NOT_ the speedup from memmap_init_zone.  I forgot to
take that into account, as Nate pointed out this morning in a hallway
discussion.  Before, on the 16TiB machine, memmap_init_zone took 1152
seconds.  After, it took 50.  If it were a straight 1/512th, we would
have expected that 1152 to become something more on the order of 2-3
seconds, so there is still significant room for improvement.

Sorry for the confusion.

> It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
> something else being the current bottleneck, besides page init 
> granularity.
> 
> Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
> memory, and then hot-add that memory? That would allow easy profiling of 
> remaining overhead.

Nate and I will be working on other things for the next few hours hoping
there is a better answer to the first question we asked about there
being a way to test a page other than comparing against all zeroes to
see if it has been initialized.

Thanks,
Robin


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-12 Thread Robert Richter
On 12.07.13 10:27:56, Ingo Molnar wrote:
> 
> * Robin Holt  wrote:
> 
> > [...]
> > 
> > With this patch, we did boot a 16TiB machine.  Without the patches, the 
> > v3.10 kernel with the same configuration took 407 seconds for 
> > free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> > of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> > for how the 1GiB chunk size will perform.
> 
> That's pretty impressive.
> 
> It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
> something else being the current bottleneck, besides page init 
> granularity.
> 
> Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
> memory, and then hot-add that memory? That would allow easy profiling of 
> remaining overhead.
> 
> Side note:
> 
> Robert Richter and Boris Petkov are working on 'persistent events' support 
> for perf, which will eventually allow boot time profiling - I'm not sure 
> if the patches and the tooling support is ready enough yet for your 
> purposes.

The latest patch set is still this:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2

It requires the perf subsystem to be initialized first, which might be
too late; see perf_event_init() in start_kernel(). The patch set is
currently also limited to tracepoints only.

If this is sufficient for you, you might register persistent events
with the function perf_add_persistent_event_by_id(); see
mcheck_init_tp() for how to do this. Later you can fetch all samples with:

 # perf record -e persistent/tracepoint/ sleep 1

> Robert, Boris, the following workflow would be pretty intuitive:
> 
>  - kernel developer sets boot flag: perf=boot,freq=1khz,size=16MB
> 
>  - we'd get a single (cycles?) event running once the perf subsystem is up
>and running, with a sampling frequency of 1 KHz, sending profiling
>trace events to a sufficiently sized profiling buffer of 16 MB per
>CPU.

I am not sure about the event you want to set up here; if it is a
tracepoint, the sample_period should always be 1. The buffer size
parameter looks interesting; for now it is 512kB per cpu by default
(as the perf tools set up the buffer).

> 
>  - once the system reaches SYSTEM_RUNNING, profiling is stopped either
>automatically - or the user stops it via a new tooling command.
> 
>  - the profiling buffer is extracted into a regular perf.data via a
>special 'perf record' call or some other, new perf tooling 
>solution/variant.

See the perf-record command above...

> 
>[ Alternatively the kernel could attempt to construct a 'virtual'
>  perf.data from the persistent buffer, available via /sys/debug or
>  elsewhere in /sys - just like the kernel constructs a 'virtual' 
>  /proc/kcore, etc. That file could be copied or used directly. ]
> 
>  - from that point on this workflow joins the regular profiling workflow: 
>perf report, perf script et al can be used to analyze the resulting
>boot profile.

Ingo, thanks for outlining this workflow. We will look at how this could
fit into the new version of persistent events we are currently working on.

Thanks,

-Robert


Re: [RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-12 Thread Ingo Molnar

* Robin Holt  wrote:

> [...]
> 
> With this patch, we did boot a 16TiB machine.  Without the patches, the 
> v3.10 kernel with the same configuration took 407 seconds for 
> free_all_bootmem.  With the patches and operating on 2MiB pages instead 
> of 1GiB, it took 26 seconds so performance was improved.  I have no feel 
> for how the 1GiB chunk size will perform.

That's pretty impressive.

It's still a 15x speedup instead of a 512x speedup, so I'd say there's 
something else being the current bottleneck, besides page init 
granularity.

Can you boot with just a few gigs of RAM and stuff the rest into hotplug 
memory, and then hot-add that memory? That would allow easy profiling of 
remaining overhead.

Side note:

Robert Richter and Boris Petkov are working on 'persistent events' support 
for perf, which will eventually allow boot time profiling - I'm not sure 
if the patches and the tooling support is ready enough yet for your 
purposes.

Robert, Boris, the following workflow would be pretty intuitive:

 - kernel developer sets boot flag: perf=boot,freq=1khz,size=16MB

 - we'd get a single (cycles?) event running once the perf subsystem is up
   and running, with a sampling frequency of 1 KHz, sending profiling
   trace events to a sufficiently sized profiling buffer of 16 MB per
   CPU.

 - once the system reaches SYSTEM_RUNNING, profiling is stopped either
   automatically - or the user stops it via a new tooling command.

 - the profiling buffer is extracted into a regular perf.data via a
   special 'perf record' call or some other, new perf tooling 
   solution/variant.

   [ Alternatively the kernel could attempt to construct a 'virtual'
 perf.data from the persistent buffer, available via /sys/debug or
 elsewhere in /sys - just like the kernel constructs a 'virtual' 
 /proc/kcore, etc. That file could be copied or used directly. ]

 - from that point on this workflow joins the regular profiling workflow: 
   perf report, perf script et al can be used to analyze the resulting
   boot profile.

Thanks,

Ingo


[RFC 0/4] Transparent on-demand struct page initialization embedded in the buddy allocator

2013-07-11 Thread Robin Holt
We have been working on this since we returned from shutdown and have
something to discuss now.  We restricted ourselves to 2MiB initialization
to keep the patch set a little smaller and more clear.

First, I think I want to propose getting rid of the page flag.  If I knew
of a concrete way to determine that the page has not been initialized,
this patch series would look different.  If there is no definitive
way to determine that the struct page has been initialized aside from
checking the entire page struct is zero, then I think I would suggest
we change the page flag to indicate the page has been initialized.

The heart of the problem as I see it comes from expand().  We nearly
always see a first reference to a struct page which is in the middle
of the 2MiB region.  Due to that access, the unlikely() check that was
originally proposed really ends up referencing a different page entirely.
We did not actually introduce an unlikely() or refactor the patches to
put that unlikely inside a static inline function.  Also, given the
strong warning at the head of expand(), we did not feel experienced
enough to refactor it to make things always reference the 2MiB page
first.

With this patch, we did boot a 16TiB machine.  Without the patches,
the v3.10 kernel with the same configuration took 407 seconds for
free_all_bootmem.  With the patches and operating on 2MiB pages instead
of 1GiB, it took 26 seconds so performance was improved.  I have no feel
for how the 1GiB chunk size will perform.

I am on vacation for the next three days, so I am sorry in advance for
my infrequent or non-existent responses.


Signed-off-by: Robin Holt 
Signed-off-by: Nate Zimmer 
To: "H. Peter Anvin" 
To: Ingo Molnar 
Cc: Linux Kernel 
Cc: Linux MM 
Cc: Rob Landley 
Cc: Mike Travis 
Cc: Daniel J Blueman 
Cc: Andrew Morton 
Cc: Greg KH 
Cc: Yinghai Lu 
Cc: Mel Gorman 

