Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-23 Thread Linus Torvalds
On Wed, Sep 23, 2020 at 10:16 AM Linus Torvalds wrote:
>
> But these two patches are very intentionally meant to be just "this
> clearly changes NO semantics at all".

The more I look at these, the more I go "this is a cleanup
regardless", so I'll just keep thes in my tree as-is.

 Linus


Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-23 Thread Linus Torvalds
On Mon, Sep 21, 2020 at 2:18 PM Peter Xu  wrote:
>
> There's one special path for copy_one_pte() with swap entries, in which
> add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return
> the swp_entry_t so that the caller will release the locks and redo the same
> thing with GFP_KERNEL.
>
> It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
> ptes are non-swap entries).  More importantly, we face another requirement to
> extend this "we need to do something else, but without the locks held" case.
>
> Rework the return value into something easier to understand, as defined in
> enum copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced
> union copy_mm_data parameter.

Ok, I'm reading this series, and I do hate this.

And I think it's unnecessary.

There's a very simple way to avoid this all: split out the
"!pte_present(pte)" case from the function entirely.

That actually makes the code much more legible: that non-present case
is very different, and it's also unlikely() and causes deeper
indentation etc.

Because it's unlikely, it probably also shouldn't be inline.

That unlikely case is also why we then have that special
"out_set_pte" label, which should just go away and be copied into the
(now uninlined) function.

Once that re-organization has been done, the second step is to then
just move the "pte_present()" check into the caller, and suddenly all
the ugly return value games will go entirely away.
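
A rough sketch of the loop shape that implies (illustrative only, not the
attached patches; copy_present_pte() is just a placeholder name here):

	do {
		if (pte_none(*src_pte)) {
			progress++;
			continue;
		}
		if (unlikely(!pte_present(*src_pte))) {
			/* Rare path: swap/migration entries; may still ask the
			 * caller to drop the locks and retry. */
			entry.val = copy_nonpresent_pte(dst_mm, src_mm, dst_pte,
							src_pte, vma, addr, rss);
			if (entry.val)
				break;
		} else {
			/* Common path: a present pte, no special return value. */
			copy_present_pte(dst_mm, src_mm, dst_pte, src_pte,
					 vma, addr, rss);
		}
		progress += 8;
	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);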

I'm attaching the two patches that do this here, but I do want to note
how that first patch is much more legible with "--ignore-all-space",
and then you really see that the diff is a _pure_ code movement thing.
Otherwise it looks like it's doing a big change.

Comments?

NOTE! The intent here is that now we can easily add a new argument (a
pre-allocated page or NULL) and a return value to
"copy_present_page()": it can return "I needed a temporary page but
you hadn't allocated one yet" or "I used up the temporary page you
gave me" or "all good, keep the temporary page around for the future".

But these two patches are very intentionally meant to be just "this
clearly changes NO semantics at all".

   Linus
From df3a57d1f6072d07978bafa7dbd9904cdf8f3e13 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Wed, 23 Sep 2020 09:56:59 -0700
Subject: [PATCH 1/2] mm: split out the non-present case from copy_one_pte()

This is a purely mechanical split of the copy_one_pte() function.  It's
not immediately obvious when looking at the diff because of the
indentation change, but the way to see what is going on in this commit
is to use the "-w" flag to not show pure whitespace changes, and you see
how the first part of copy_one_pte() is simply lifted out into a
separate function.

And since the non-present case is marked unlikely, don't make the new
function be inlined.  Not that gcc really seems to care, since it looks
like it will inline it anyway due to the whole "single callsite for
static function" logic.  In fact, code generation with the function
split is almost identical to before.  But not marking it inline is the
right thing to do.

This is pure prep-work and cleanup for subsequent changes.

Signed-off-by: Linus Torvalds 
---
 mm/memory.c | 152 
 1 file changed, 82 insertions(+), 70 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 469af373ae76..31a3ab7d9aa3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -695,6 +695,84 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
  * covered by this vma.
  */
 
+static unsigned long
+copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
+		unsigned long addr, int *rss)
+{
+	unsigned long vm_flags = vma->vm_flags;
+	pte_t pte = *src_pte;
+	struct page *page;
+	swp_entry_t entry = pte_to_swp_entry(pte);
+
+	if (likely(!non_swap_entry(entry))) {
+		if (swap_duplicate(entry) < 0)
+			return entry.val;
+
+		/* make sure dst_mm is on swapoff's mmlist. */
+		if (unlikely(list_empty(&dst_mm->mmlist))) {
+			spin_lock(&mmlist_lock);
+			if (list_empty(&dst_mm->mmlist))
+				list_add(&dst_mm->mmlist,
+						&src_mm->mmlist);
+			spin_unlock(&mmlist_lock);
+		}
+		rss[MM_SWAPENTS]++;
+	} else if (is_migration_entry(entry)) {
+		page = migration_entry_to_page(entry);
+
+		rss[mm_counter(page)]++;
+
+		if (is_write_migration_entry(entry) &&
+				is_cow_mapping(vm_flags)) {
+			/*
+			 * COW mappings require pages in both
+			 * parent and child to be set to read.
+			 */
+			make_migration_entry_read(&entry);
+			pte = swp_entry_to_pte(entry);
+			if (pte_swp_soft_dirty(*src_pte))
+				pte = pte_swp_mksoft_dirty(pte);
+			if (pte_swp_uffd_wp(*src_pte))
+				pte = pte_swp_mkuffd_wp(pte);

Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-23 Thread Oleg Nesterov
On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 08:23:18PM +0200, Oleg Nesterov wrote:
> >
> > But I still think that !pte_none() -> pte_none() transition is not possible
> > under mmap_write_lock()...
> >
> > OK, let me repeat I don't understand these code paths enough, let me reword:
> > I don't see how this transition is possible.
>
> Though I guess I'll keep my wording, because I still think it's accurate to
> me. :)
>
> Can we e.g. punch a page hole without changing vmas?

punch a hole? I don't know what that means...

However, I think you are right anyway. I forgot that (at least) truncate can
clear this pte without mmap_sem after pte_unmap_unlock().
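
For the "punch a page hole" case Peter mentions, a small userspace illustration
(assuming a filesystem that supports FALLOC_FL_PUNCH_HOLE; the file name is made
up) of how the pages, and with them the ptes of every process mapping that range,
go away while no vma changes at all:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0600);
	char *map;

	if (fd < 0 || ftruncate(fd, 4096) < 0)
		return 1;

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	memset(map, 'x', 4096);	/* fault the page in: the pte is populated */

	/* Drop the page from the file; the pte goes away, the vma does not. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096) < 0)
		perror("fallocate");

	printf("first byte after punch: %d\n", map[0]);	/* reads back 0 */
	return 0;
}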

So I think you are right, the current code is wrong too.

Thanks!

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Peter Xu
On Tue, Sep 22, 2020 at 08:23:18PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> > > On 09/22, Peter Xu wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > > > However since I didn't change this logic in this patch, it probably 
> > > > > > means this
> > > > > > bug is also in the original code before this series...  I'm 
> > > > > > thinking maybe I
> > > > > > should prepare a standalone patch to clear the swp_entry_t and cc 
> > > > > > stable.
> > > > >
> > > > > Well, if copy_one_pte(src_pte) hits a swap entry and returns 
> > > > > entry.val != 0, then
> > > > > pte_none(*src_pte) is not possible after restart? This means that 
> > > > > copy_one_pte()
> > > > > will be called at least once.
> > > >
> > > > Note that we've released the page table locks, so afaict the old swp 
> > > > entry can
> > > > be gone under us when we go back to the "do" loop... :)
> > >
> > > But how?
> > >
> > > I am just curious, I don't understand this code enough.
> >
> > Me neither.
> >
> > The point is I think we can't assume *src_pte will read the same if we have
> > released the src_ptl in copy_pte_range(),
> 
> This is clear.
> 
> But I still think that !pte_none() -> pte_none() transition is not possible
> under mmap_write_lock()...
> 
> OK, let me repeat I don't understand these code paths enough, let me reword:
> I don't see how this transition is possible.

Though I guess I'll keep my wording, because I still think it's accurate to
me. :)

Can we e.g. punch a page hole without changing vmas?

-- 
Peter Xu



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Oleg Nesterov
On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> > On 09/22, Peter Xu wrote:
> > >
> > > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > > However since I didn't change this logic in this patch, it probably 
> > > > > means this
> > > > > bug is also in the original code before this series...  I'm thinking 
> > > > > maybe I
> > > > > should prepare a standalone patch to clear the swp_entry_t and cc 
> > > > > stable.
> > > >
> > > > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val 
> > > > != 0, then
> > > > pte_none(*src_pte) is not possible after restart? This means that 
> > > > copy_one_pte()
> > > > will be called at least once.
> > >
> > > Note that we've released the page table locks, so afaict the old swp 
> > > entry can
> > > be gone under us when we go back to the "do" loop... :)
> >
> > But how?
> >
> > I am just curious, I don't understand this code enough.
>
> Me neither.
>
> The point is I think we can't assume *src_pte will read the same if we have
> released the src_ptl in copy_pte_range(),

This is clear.

But I still think that !pte_none() -> pte_none() transition is not possible
under mmap_write_lock()...

OK, let me repeat I don't understand these code paths enough, let me reword:
I don't see how this transition is possible.

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Peter Xu
On Tue, Sep 22, 2020 at 06:53:55PM +0200, Oleg Nesterov wrote:
> On 09/22, Peter Xu wrote:
> >
> > On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > > However since I didn't change this logic in this patch, it probably 
> > > > means this
> > > > bug is also in the original code before this series...  I'm thinking 
> > > > maybe I
> > > > should prepare a standalone patch to clear the swp_entry_t and cc 
> > > > stable.
> > >
> > > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 
> > > 0, then
> > > pte_none(*src_pte) is not possible after restart? This means that 
> > > copy_one_pte()
> > > will be called at least once.
> >
> > Note that we've released the page table locks, so afaict the old swp entry 
> > can
> > be gone under us when we go back to the "do" loop... :)
> 
> But how?
> 
> I am just curious, I don't understand this code enough.

Me neither.

The point is I think we can't assume *src_pte will read the same if we have
released the src_ptl in copy_pte_range(), because imho the src_ptl is the only
thing to protect it.  Or to be more explicit, we need pte_alloc_map_lock() to
read a stable pmd/pte or before update (since src_ptl itself could change too).
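
Put differently, any retry path has to map and lock the pte again and re-read it
before trusting the value; a minimal fragment of that pattern (sketch only, not
a complete function):

	src_pte = pte_offset_map_lock(src_mm, src_pmd, addr, &src_ptl);
	pte = *src_pte;		/* only stable while src_ptl is held */
	/* ... act on pte under the lock ... */
	pte_unmap_unlock(src_pte, src_ptl);
	/* from here on the pte may already have been changed or cleared */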

Thanks,

-- 
Peter Xu



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Oleg Nesterov
On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > > However since I didn't change this logic in this patch, it probably means 
> > > this
> > > bug is also in the original code before this series...  I'm thinking 
> > > maybe I
> > > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> >
> > Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 
> > 0, then
> > pte_none(*src_pte) is not possible after restart? This means that 
> > copy_one_pte()
> > will be called at least once.
>
> Note that we've released the page table locks, so afaict the old swp entry can
> be gone under us when we go back to the "do" loop... :)

But how?

I am just curious, I don't understand this code enough.

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Peter Xu
On Tue, Sep 22, 2020 at 05:48:46PM +0200, Oleg Nesterov wrote:
> > However since I didn't change this logic in this patch, it probably means 
> > this
> > bug is also in the original code before this series...  I'm thinking maybe I
> > should prepare a standalone patch to clear the swp_entry_t and cc stable.
> 
> Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, 
> then
> pte_none(*src_pte) is not possible after restart? This means that 
> copy_one_pte()
> will be called at least once.

Note that we've released the page table locks, so afaict the old swp entry can
be gone under us when we go back to the "do" loop... :) It's an extremely rare
corner case, but maybe still worth fixing; extra clarity comes as a (good) side
effect.

-- 
Peter Xu



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Oleg Nesterov
On 09/22, Peter Xu wrote:
>
> On Tue, Sep 22, 2020 at 12:18:16PM +0200, Oleg Nesterov wrote:
> > On 09/22, Oleg Nesterov wrote:
> > >
> > > On 09/21, Peter Xu wrote:
> > > >
> > > > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct 
> > > > *dst_mm, struct mm_struct *src_mm,
> > > > pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > > > cond_resched();
> > > >
> > > > -   if (entry.val) {
> > > > -   if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > > > +   switch (copy_ret) {
> > > > +   case COPY_MM_SWAP_CONT:
> > > > +   if (add_swap_count_continuation(data.entry, GFP_KERNEL) 
> > > > < 0)
> > > > return -ENOMEM;
> > > > -   progress = 0;
> > > > +   break;
> > >
> > > Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
> > >
> > > > +   default:
> > > > +   break;
> > > > }
> > > > +
> > > > if (addr != end)
> > > > goto again;
> > >
> > > After that the main loop can stop again because of need_resched(), and
> > > in this case add_swap_count_continuation(data.entry) will be called again?
> >
> > No, this is not possible, copy_one_pte() should be called at least once,
> > progress = 0 before restart. Sorry for noise.
>
> Oh wait, I think you're right... when we get a COPY_MM_SWAP_CONT, goto "again",
> then if there're 32 pte_none() ptes _plus_ a need_resched(), then we might
> reach the same add_swap_count_continuation() again with the same swp entry.

Yes, please see my reply to 4/5 ;)

> However since I didn't change this logic in this patch, it probably means this
> bug is also in the original code before this series...  I'm thinking maybe I
> should prepare a standalone patch to clear the swp_entry_t and cc stable.

Well, if copy_one_pte(src_pte) hits a swap entry and returns entry.val != 0, 
then
pte_none(*src_pte) is not possible after restart? This means that copy_one_pte()
will be called at least once.

So I _think_ that the current code is fine, but I can easily be wrong and I agree
this doesn't look clean.

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Peter Xu
On Tue, Sep 22, 2020 at 12:18:16PM +0200, Oleg Nesterov wrote:
> On 09/22, Oleg Nesterov wrote:
> >
> > On 09/21, Peter Xu wrote:
> > >
> > > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
> > > struct mm_struct *src_mm,
> > >   pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > >   cond_resched();
> > >
> > > - if (entry.val) {
> > > - if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > > + switch (copy_ret) {
> > > + case COPY_MM_SWAP_CONT:
> > > + if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > >   return -ENOMEM;
> > > - progress = 0;
> > > + break;
> >
> > Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
> >
> > > + default:
> > > + break;
> > >   }
> > > +
> > >   if (addr != end)
> > >   goto again;
> >
> > After that the main loop can stop again because of need_resched(), and
> > in this case add_swap_count_continuation(data.entry) will be called again?
> 
> No, this is not possible, copy_one_pte() should be called at least once,
> progress = 0 before restart. Sorry for noise.

Oh wait, I think you're right... when we get a COPY_MM_SWAP_CONT, goto "again",
then if there're 32 pte_none() ptes _plus_ a need_resched(), then we might
reach the same add_swap_count_continuation() again with the same swp entry.

However since I didn't change this logic in this patch, it probably means this
bug is also in the original code before this series...  I'm thinking maybe I
should prepare a standalone patch to clear the swp_entry_t and cc stable.
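
Against the pre-series code, such a standalone fix would be roughly the
one-liner below (a sketch of the idea only, not a submitted patch):

	if (entry.val) {
		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
			return -ENOMEM;
		progress = 0;
		entry.val = 0;	/* don't redo the continuation on a later restart */
	}
	if (addr != end)
		goto again;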

Thanks,

-- 
Peter Xu



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Peter Xu
On Tue, Sep 22, 2020 at 12:11:29AM -0700, John Hubbard wrote:
> On 9/21/20 2:17 PM, Peter Xu wrote:
> > There's one special path for copy_one_pte() with swap entries, in which
> > add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll 
> > return
> 
> I might be looking at the wrong place, but the existing code seems to call
> add_swap_count_continuation(GFP_KERNEL), not with GFP_ATOMIC?

Ah, I wanted to reference the one in swap_duplicate().

> 
> > the swp_entry_t so that the caller will release the locks and redo the same
> > thing with GFP_KERNEL.
> > 
> > It's confusing when copy_one_pte() must return a swp_entry_t (even if all 
> > the
> > ptes are non-swap entries).  More importantly, we face another requirement to
> > extend this "we need to do something else, but without the locks held" case.
> > 
> > Rework the return value into something easier to understand, as defined in 
> > enum
> > copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced 
> > union
> 
> I like the documentation here, but it doesn't match what you did in the patch.
> Actually, the documentation had the right idea (enum, rather than #define, for
> COPY_MM_* items). Below...

Yeah, actually my very initial version had it as an enum, then I changed it to
macros because I started to want it to return negatives as errors.  However,
funnily, in the current version copy_one_pte() won't return an error anymore...
So probably, yes, it would be a good idea to bring the enum back.

We should also be able to drop the negative handling with copy_ret, though that
belongs in the next patch.

> 
> > copy_mm_data parameter.
> > 
> > Another trivial change is to move the reset of the "progress" counter into 
> > the
> > retry path, so that we'll reset it for other reasons too.
> > 
> > This should prepare us with adding new return codes, very soon.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >   mm/memory.c | 42 +-
> >   1 file changed, 29 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 7525147908c4..1530bb1070f4 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct 
> > *vma, unsigned long addr,
> >   }
> >   #endif
> > +#define  COPY_MM_DONE   0
> > +#define  COPY_MM_SWAP_CONT  1
> 
> Those should be enums, so as to get a little type safety and other goodness 
> from
> using non-macro items.
> 
> ...
> > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
> > struct mm_struct *src_mm,
> > pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > cond_resched();
> > -   if (entry.val) {
> > -   if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > +   switch (copy_ret) {
> > +   case COPY_MM_SWAP_CONT:
> > +   if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > return -ENOMEM;
> > -   progress = 0;
> 
> Yes. Definitely a little cleaner to reset this above, instead of here.
> 
> > +   break;
> > +   default:
> > +   break;
> 
> I assume this no-op noise is to placate the compiler and/or static checkers. 
> :)

This is (so far) for COPY_MM_DONE.  I normally cover all cases in a
"switch()", and here "default" stands in for it.  Even when I've covered all the
possibilities, I still tend to keep a "default" with a WARN_ON_ONCE(1) to make
sure I haven't missed anything.  Not sure whether that's the ideal way, though.
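
Putting the two suggestions together (an enum instead of the #defines, plus a
"default" that screams), the handling could look roughly like this; a sketch,
not a respin:

	enum copy_mm_ret {
		COPY_MM_DONE = 0,
		COPY_MM_SWAP_CONT,
	};

	switch (copy_ret) {
	case COPY_MM_SWAP_CONT:
		if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
			return -ENOMEM;
		break;
	case COPY_MM_DONE:
		break;
	default:
		WARN_ON_ONCE(1);	/* a return value we forgot to handle */
		break;
	}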

Thanks,

-- 
Peter Xu



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Oleg Nesterov
On 09/22, Oleg Nesterov wrote:
>
> On 09/21, Peter Xu wrote:
> >
> > @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
> > struct mm_struct *src_mm,
> > pte_unmap_unlock(orig_dst_pte, dst_ptl);
> > cond_resched();
> >
> > -   if (entry.val) {
> > -   if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> > +   switch (copy_ret) {
> > +   case COPY_MM_SWAP_CONT:
> > +   if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
> > return -ENOMEM;
> > -   progress = 0;
> > +   break;
>
> Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,
>
> > +   default:
> > +   break;
> > }
> > +
> > if (addr != end)
> > goto again;
>
> After that the main loop can stop again because of need_resched(), and
> in this case add_swap_count_continuation(data.entry) will be called again?

No, this is not possible, copy_one_pte() should be called at least once,
progress = 0 before restart. Sorry for noise.

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread Oleg Nesterov
On 09/21, Peter Xu wrote:
>
> @@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
> struct mm_struct *src_mm,
>   pte_unmap_unlock(orig_dst_pte, dst_ptl);
>   cond_resched();
>  
> - if (entry.val) {
> - if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
> + switch (copy_ret) {
> + case COPY_MM_SWAP_CONT:
> + if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
>   return -ENOMEM;
> - progress = 0;
> + break;

Note that you didn't clear copy_ret, it is still COPY_MM_SWAP_CONT,

> + default:
> + break;
>   }
> +
>   if (addr != end)
>   goto again;

After that the main loop can stop again because of need_resched(), and
in this case add_swap_count_continuation(data.entry) will be called again?

Oleg.



Re: [PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-22 Thread John Hubbard

On 9/21/20 2:17 PM, Peter Xu wrote:

There's one special path for copy_one_pte() with swap entries, in which
add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return


I might be looking at the wrong place, but the existing code seems to call
add_swap_count_continuation(GFP_KERNEL), not with GFP_ATOMIC?


the swp_entry_t so that the caller will release the locks and redo the same
thing with GFP_KERNEL.

It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
ptes are non-swap entries).  More importantly, we face another requirement to
extend this "we need to do something else, but without the locks held" case.

Rework the return value into something easier to understand, as defined in enum
copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced union


I like the documentation here, but it doesn't match what you did in the patch.
Actually, the documentation had the right idea (enum, rather than #define, for
COPY_MM_* items). Below...


copy_mm_data parameter.

Another trivial change is to move the reset of the "progress" counter into the
retry path, so that we'll reset it for other reasons too.

This should prepare us with adding new return codes, very soon.

Signed-off-by: Peter Xu 
---
  mm/memory.c | 42 +-
  1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7525147908c4..1530bb1070f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct 
*vma, unsigned long addr,
  }
  #endif
  
+#define  COPY_MM_DONE   0

+#define  COPY_MM_SWAP_CONT  1


Those should be enums, so as to get a little type safety and other goodness from
using non-macro items.

...

@@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
struct mm_struct *src_mm,
pte_unmap_unlock(orig_dst_pte, dst_ptl);
cond_resched();
  
-	if (entry.val) {

-   if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
+   switch (copy_ret) {
+   case COPY_MM_SWAP_CONT:
+   if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
return -ENOMEM;
-   progress = 0;


Yes. Definitely a little cleaner to reset this above, instead of here.


+   break;
+   default:
+   break;


I assume this no-op noise is to placate the compiler and/or static checkers. :)

I'm unable to find any actual problems with the diffs, aside from the nit about
using an enum.

thanks,
--
John Hubbard
NVIDIA


[PATCH 3/5] mm: Rework return value for copy_one_pte()

2020-09-21 Thread Peter Xu
There's one special path for copy_one_pte() with swap entries, in which
add_swap_count_continuation(GFP_ATOMIC) might fail.  In that case we'll return
the swp_entry_t so that the caller will release the locks and redo the same
thing with GFP_KERNEL.

It's confusing when copy_one_pte() must return a swp_entry_t (even if all the
ptes are non-swap entries).  More importantly, we face another requirement to
extend this "we need to do something else, but without the locks held" case.

Rework the return value into something easier to understand, as defined in enum
copy_mm_ret.  We'll pass the swp_entry_t back using the newly introduced union
copy_mm_data parameter.

Another trivial change is to move the reset of the "progress" counter into the
retry path, so that we'll reset it for other reasons too.

This should prepare us with adding new return codes, very soon.

Signed-off-by: Peter Xu 
---
 mm/memory.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7525147908c4..1530bb1070f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -689,16 +689,24 @@ struct page *vm_normal_page_pmd(struct vm_area_struct 
*vma, unsigned long addr,
 }
 #endif
 
+#define  COPY_MM_DONE   0
+#define  COPY_MM_SWAP_CONT  1
+
+struct copy_mm_data {
+   /* COPY_MM_SWAP_CONT */
+   swp_entry_t entry;
+};
+
 /*
  * copy one vm_area from one task to the other. Assumes the page tables
  * already present in the new task to be cleared in the whole range
  * covered by this vma.
  */
 
-static inline unsigned long
+static inline int
 copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-   unsigned long addr, int *rss)
+   unsigned long addr, int *rss, struct copy_mm_data *data)
 {
unsigned long vm_flags = vma->vm_flags;
pte_t pte = *src_pte;
@@ -709,8 +717,10 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
swp_entry_t entry = pte_to_swp_entry(pte);
 
if (likely(!non_swap_entry(entry))) {
-   if (swap_duplicate(entry) < 0)
-   return entry.val;
+   if (swap_duplicate(entry) < 0) {
+   data->entry = entry;
+   return COPY_MM_SWAP_CONT;
+   }
 
/* make sure dst_mm is on swapoff's mmlist. */
if (unlikely(list_empty(&dst_mm->mmlist))) {
@@ -809,7 +819,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct 
*src_mm,
 
 out_set_pte:
set_pte_at(dst_mm, addr, dst_pte, pte);
-   return 0;
+   return COPY_MM_DONE;
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -820,9 +830,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
pte_t *orig_src_pte, *orig_dst_pte;
pte_t *src_pte, *dst_pte;
spinlock_t *src_ptl, *dst_ptl;
-   int progress = 0;
+   int progress, copy_ret = COPY_MM_DONE;
int rss[NR_MM_COUNTERS];
-   swp_entry_t entry = (swp_entry_t){0};
+   struct copy_mm_data data;
 
 again:
init_rss_vec(rss);
@@ -837,6 +847,7 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
orig_dst_pte = dst_pte;
arch_enter_lazy_mmu_mode();
 
+   progress = 0;
do {
/*
 * We are holding two locks at this point - either of them
@@ -852,9 +863,9 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
progress++;
continue;
    }
-   entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
-   vma, addr, rss);
-   if (entry.val)
+   copy_ret = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
vma, addr, rss, &data);
+   if (copy_ret != COPY_MM_DONE)
break;
progress += 8;
} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
@@ -866,13 +877,18 @@ static int copy_pte_range(struct mm_struct *dst_mm, 
struct mm_struct *src_mm,
pte_unmap_unlock(orig_dst_pte, dst_ptl);
cond_resched();
 
-   if (entry.val) {
-   if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
+   switch (copy_ret) {
+   case COPY_MM_SWAP_CONT:
+   if (add_swap_count_continuation(data.entry, GFP_KERNEL) < 0)
return -ENOMEM;
-   progress = 0;
+   break;
+   default:
+   break;
}
+
if (addr != end)
goto again;
+
return 0;
 }
 
-- 
2.26.2



Re: copy_one_pte()

2007-05-16 Thread Daniel J Blueman

On 03/04/07, Zoltan Menyhart <[EMAIL PROTECTED]> wrote:

Daniel J Blueman wrote:
> On 13 Mar, 20:20, Zoltan Menyhart <[EMAIL PROTECTED]> wrote:
>
>> I had a look at copy_one_pte().
>> I cannot see any ioproc_update_page() call, not even for the COW pages.
>> Is it intentional?
>
> There could be an ioproc_update_range() call in
> memory.c:copy_pte_range(), after the pte_unmap_unlock(), and this
> would be an optimisation, since explicitly updating the mappings in
> the Quadrics RDMA NIC avoids the need for the NIC to trap and update
> its MMU later.

I can agree that this optimization is a good idea.

Can you please confirm that cp->update_range() understands the concept of
the COW? (I.e. the card should not write anything into the COW pages.)

> The code which implements [1] this takes the pagetable locks with
> pte_offset_map_lock(), and uses one of the kmap_atomic slots, so has
> to be after the pagetables are unlocked. The
> ioproc_invalidate_page/range() calls are different and can't live with
> this race, so have to be used with the pagetable locks held.

I can see two issues here:

- Performance problem: page_table_lock is more expensive than a split lock,
  and taking page_table_lock after the split lock released is even worse.


I have introduced an ioproc_update_page_locked() call for this and
related paths, and updated the kernel patches; this avoids a race
condition I was seeing in multi-threaded processes taking concurrent
page faults under the split locks. The split lock is now acquired
once, and the changes allowed me to do some other optimisations on
this path too.


- Synchronization problem: do_wp_page() is protected by the appropriate
  split lock, i.e. the COW page can be broken up while you are inside
  cp->update_range() under the protection of page_table_lock.


The split locks were used previously via pte_offset_map_lock(), so
this race wasn't possible before.


I think the PTE modification and the cp->update_xxx() should be protected
by the very same lock, without releasing it between the two operations.
I think the PTE modification and the cp->update_xxx() should be an
atomic operation with respect to the VMM activity.


This does make the most sense and is now the case, although atomicity
wasn't a problem, since it's fine to use ioproc_update_page/range()
speculatively.

Dan


Thanks,

Zoltan

--
Daniel J Blueman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: copy_one_pte()

2007-04-02 Thread Daniel J Blueman

On 13 Mar, 20:20, Zoltan Menyhart <[EMAIL PROTECTED]> wrote:

I had a look at copy_one_pte().
I cannot see any ioproc_update_page() call, not even for the COW pages.
Is it intentional?


There could be an ioproc_update_range() call in
memory.c:copy_pte_range(), after the pte_unmap_unlock(), and this
would be an optimisation, since explicitly updating the mappings in
the Quadrics RDMA NIC avoids the need for the NIC to trap and update
its MMU later.

The code which implements [1] this takes the pagetable locks with
pte_offset_map_lock(), and uses one of the kmap_atomic slots, so has
to be after the pagetables are unlocked. The
ioproc_invalidate_page/range() calls are different and can't live with
this race, so have to be used with the pagetable locks held.

Daniel

--- [1]

http://lwn.net/Articles/133627/
http://www.quadrics.com/linux


We can live with a COW page for a considerably long time.
How could the IO-PROC. know that a process-ID / user virt. addr. pair
refers to the same page?

The comment above ioproc_update_page() says that every time a PTE
is created / modified...

Thanks,

Zoltan Menyhart

--
Daniel J Blueman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: copy_one_pte()

2007-03-15 Thread Matt Keenan

Andrew Morton wrote:

(cc restored.  Please always do reply-to-all).

  

On Wed, 14 Mar 2007 08:35:07 + Matt Keenan <[EMAIL PROTECTED]> wrote:
Christoph Hellwig wrote:


On Tue, Mar 13, 2007 at 08:15:25PM +0100, Zoltan Menyhart wrote:
  
  

I had a look at copy_one_pte().
I cannot see any ioproc_update_page() call, not even for the COW pages.
Is it intentional?



There is no such thing as ioproc_update_page in any mainline tree.
You must be looking at some vendor tree with braindead patches applied.

  
  
It looks like this function exists as a part of patches to support 
Quadrics NICs / RDMA (HPC platforms).  The patches are there so the 
driver doesn't need to pin pages, it can be informed of page updates 
directly.  A patch was submitted to l-k sometime in 2005.



Oh Dear.

Which vendor's kernel are we talking about here?
  
I don't know of any vendor's kernels that support this (but then I run 
vanilla kernels on Debian).  I just grep'ed for the patch because it 
sounded interesting.  There was a posting for it to l-k on 26th April 
2005 from David Addison of Quadrics Ltd xref 
http://lkml.org/lkml/2005/4/26/198  According to David, you (Andrew) and 
Andrea Arcangeli asked for it to be posted for some feedback, the main 
feedback was on whitespace issues and COWs w.r.t. fork().  Brice Goglin 
made an interesting comment about using a similar method but tracking 
VMAs rather than address spaces.  By the looks of things it never went 
into the mainline kernel.  I lurk a bit (I sometimes miss things) on l-k 
but I hadn't noticed any other methods for dynamic DMA direct to user 
space (other than pinning pages), is there anything planned?


Matt


p.s. I Cc'ed Brice Goglin and David Addison
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: copy_one_pte()

2007-03-15 Thread Andrew Morton

(cc restored.  Please always do reply-to-all).

> On Wed, 14 Mar 2007 08:35:07 + Matt Keenan <[EMAIL PROTECTED]> wrote:
> Christoph Hellwig wrote:
> > On Tue, Mar 13, 2007 at 08:15:25PM +0100, Zoltan Menyhart wrote:
> >   
> >> I had a look at copy_one_pte().
> >> I cannot see any ioproc_update_page() call, not even for the COW pages.
> >> Is it intentional?
> >> 
> >
> > There is no such thing as ioproc_update_page in any mainline tree.
> > You must be looking at some vendor tree with braindead patches applied.
> >
> >   
> It looks like this function exists as a part of patches to support 
> Quadrics NICs / RDMA (HPC platforms).  The patches are there so the 
> driver doesn't need to pin pages, it can be informed of page updates 
> directly.  A patch was submitted to l-k sometime in 2005.

Oh Dear.

Which vendor's kernel are we talking about here?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: copy_one_pte()

2007-03-14 Thread Matt Keenan

Christoph Hellwig wrote:

On Tue, Mar 13, 2007 at 08:15:25PM +0100, Zoltan Menyhart wrote:
  

I had a look at copy_one_pte().
I cannot see any ioproc_update_page() call, not even for the COW pages.
Is it intentional?



There is no such thing as ioproc_update_page in any mainline tree.
You must be looking at some vendor tree with braindead patches applied.

  
It looks like this function exists as a part of patches to support 
Quadrics NICs / RDMA (HPC platforms).  The patches are there so the 
driver doesn't need to pin pages, it can be informed of page updates 
directly.  A patch was submitted to l-k sometime in 2005.


Matt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: copy_one_pte()

2007-03-13 Thread Christoph Hellwig
On Tue, Mar 13, 2007 at 08:15:25PM +0100, Zoltan Menyhart wrote:
> I had a look at copy_one_pte().
> I cannot see any ioproc_update_page() call, not even for the COW pages.
> Is it intentional?

There is no such thing as ioproc_update_page in any mainline tree.
You must be looking at some vendor tree with braindead patches applied.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


copy_one_pte()

2007-03-13 Thread Zoltan Menyhart

I had a look at copy_one_pte().
I cannot see any ioproc_update_page() call, not even for the COW pages.
Is it intentional?

We can live with a COW page for a considerably long time.
How could the IO-PROC. know that a process-ID / user virt. addr. pair
refers to the same page?

The comment above ioproc_update_page() says that every time a PTE
is created / modified...

Thanks,

Zoltan Menyhart

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

