Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-10 Thread Eric Biggers
On Fri, Nov 10, 2017 at 10:37:30PM +1100, Herbert Xu wrote:
> On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> > From: Eric Biggers 
> > 
> > On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> > largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> > doing modular exponentiation in mpi_powm() without rescheduling.  If all
> > threads do it, it locks up the system.  Moreover, it can cause
> > rcu_sched-stall warnings.
> > 
> > Notwithstanding the insanity of doing this calculation in kernel mode
> > rather than in userspace, fix it by calling cond_resched() as each bit
> > from the exponent is processed.  It's still noninterruptible, but at
> > least it's preemptible now.
> > 
> > Cc: sta...@vger.kernel.org # v4.12+
> > Signed-off-by: Eric Biggers 
> 
> Patch applied.  Thanks.
> -- 

If it's not too late can you fix the stable line to be just

Cc: sta...@vger.kernel.org

As Mat pointed out KEYCTL_DH_COMPUTE was actually introduced in v4.7.  Also I
think the code is also reachable through RSA by adding an x509 certificate using
the "asymmetric" key type, although that appears to be limited to 4096-bit
inputs rather than 16384 bits.

Eric


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-10 Thread Eric Biggers
On Fri, Nov 10, 2017 at 10:37:30PM +1100, Herbert Xu wrote:
> On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> > From: Eric Biggers 
> > 
> > On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> > largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> > doing modular exponentiation in mpi_powm() without rescheduling.  If all
> > threads do it, it locks up the system.  Moreover, it can cause
> > rcu_sched-stall warnings.
> > 
> > Notwithstanding the insanity of doing this calculation in kernel mode
> > rather than in userspace, fix it by calling cond_resched() as each bit
> > from the exponent is processed.  It's still noninterruptible, but at
> > least it's preemptible now.
> > 
> > Cc: sta...@vger.kernel.org # v4.12+
> > Signed-off-by: Eric Biggers 
> 
> Patch applied.  Thanks.
> -- 

If it's not too late can you fix the stable line to be just

Cc: sta...@vger.kernel.org

As Mat pointed out KEYCTL_DH_COMPUTE was actually introduced in v4.7.  Also I
think the code is also reachable through RSA by adding an x509 certificate using
the "asymmetric" key type, although that appears to be limited to 4096-bit
inputs rather than 16384 bits.

Eric


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-10 Thread Herbert Xu
On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> doing modular exponentiation in mpi_powm() without rescheduling.  If all
> threads do it, it locks up the system.  Moreover, it can cause
> rcu_sched-stall warnings.
> 
> Notwithstanding the insanity of doing this calculation in kernel mode
> rather than in userspace, fix it by calling cond_resched() as each bit
> from the exponent is processed.  It's still noninterruptible, but at
> least it's preemptible now.
> 
> Cc: sta...@vger.kernel.org # v4.12+
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-10 Thread Herbert Xu
On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> doing modular exponentiation in mpi_powm() without rescheduling.  If all
> threads do it, it locks up the system.  Moreover, it can cause
> rcu_sched-stall warnings.
> 
> Notwithstanding the insanity of doing this calculation in kernel mode
> rather than in userspace, fix it by calling cond_resched() as each bit
> from the exponent is processed.  It's still noninterruptible, but at
> least it's preemptible now.
> 
> Cc: sta...@vger.kernel.org # v4.12+
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-07 Thread Eric Biggers
On Tue, Nov 07, 2017 at 10:38:30AM -0800, Mat Martineau wrote:
> 
> Eric,
> 
> On Mon, 6 Nov 2017, Eric Biggers wrote:
> 
> >From: Eric Biggers 
> >
> >On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> >largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> >doing modular exponentiation in mpi_powm() without rescheduling.  If all
> >threads do it, it locks up the system.  Moreover, it can cause
> >rcu_sched-stall warnings.
> >
> >Notwithstanding the insanity of doing this calculation in kernel mode
> >rather than in userspace, fix it by calling cond_resched() as each bit
> >from the exponent is processed.  It's still noninterruptible, but at
> >least it's preemptible now.
> 
> cond_resched() is in the outer loop and gets called every
> BITS_PER_LONG bits. That seems to be often enough for the system
> that was taking 10+ seconds, and might be ok for slower processors.
> 
> Was your intent to call cond_resched() for every bit as you
> described in the commit message?
> 

You're right, the cond_resched() is actually once per "limb", not once per bit.

With the largest permitted inputs (16384 bits), each limb of the exponent takes
about 38 milliseconds on an x86_64 CPU.  Therefore on some other CPUs it will
probably take 100+ milliseconds, which is much too long.  So I guess it should
do cond_resched() for each bit.  I'll send a revised patch...

Eric


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-07 Thread Eric Biggers
On Tue, Nov 07, 2017 at 10:38:30AM -0800, Mat Martineau wrote:
> 
> Eric,
> 
> On Mon, 6 Nov 2017, Eric Biggers wrote:
> 
> >From: Eric Biggers 
> >
> >On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> >largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> >doing modular exponentiation in mpi_powm() without rescheduling.  If all
> >threads do it, it locks up the system.  Moreover, it can cause
> >rcu_sched-stall warnings.
> >
> >Notwithstanding the insanity of doing this calculation in kernel mode
> >rather than in userspace, fix it by calling cond_resched() as each bit
> >from the exponent is processed.  It's still noninterruptible, but at
> >least it's preemptible now.
> 
> cond_resched() is in the outer loop and gets called every
> BITS_PER_LONG bits. That seems to be often enough for the system
> that was taking 10+ seconds, and might be ok for slower processors.
> 
> Was your intent to call cond_resched() for every bit as you
> described in the commit message?
> 

You're right, the cond_resched() is actually once per "limb", not once per bit.

With the largest permitted inputs (16384 bits), each limb of the exponent takes
about 38 milliseconds on an x86_64 CPU.  Therefore on some other CPUs it will
probably take 100+ milliseconds, which is much too long.  So I guess it should
do cond_resched() for each bit.  I'll send a revised patch...

Eric


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-07 Thread Mat Martineau


Eric,

On Mon, 6 Nov 2017, Eric Biggers wrote:


From: Eric Biggers 

On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling.  If all
threads do it, it locks up the system.  Moreover, it can cause
rcu_sched-stall warnings.

Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed.  It's still noninterruptible, but at
least it's preemptible now.


cond_resched() is in the outer loop and gets called every BITS_PER_LONG 
bits. That seems to be often enough for the system that was taking 10+ 
seconds, and might be ok for slower processors.


Was your intent to call cond_resched() for every bit as you described in 
the commit message?


Thanks for the fix.


Mat




Cc: sta...@vger.kernel.org # v4.12+
Signed-off-by: Eric Biggers 
---
lib/mpi/mpi-pow.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index e24388a863a7..f089a52dbbdb 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
 *   however I decided to publish this code under the plain GPL.
 */

+#include 
#include 
#include "mpi-internal.h"
#include "longlong.h"
@@ -263,6 +264,8 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
break;
e = ep[i];
c = BITS_PER_MPI_LIMB;
+
+   cond_resched();
}

/* We shifted MOD, the modulo reduction argument, left 
MOD_SHIFT_CNT
--
2.15.0




--
Mat Martineau
Intel OTC


Re: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-07 Thread Mat Martineau


Eric,

On Mon, 6 Nov 2017, Eric Biggers wrote:


From: Eric Biggers 

On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling.  If all
threads do it, it locks up the system.  Moreover, it can cause
rcu_sched-stall warnings.

Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed.  It's still noninterruptible, but at
least it's preemptible now.


cond_resched() is in the outer loop and gets called every BITS_PER_LONG 
bits. That seems to be often enough for the system that was taking 10+ 
seconds, and might be ok for slower processors.


Was your intent to call cond_resched() for every bit as you described in 
the commit message?


Thanks for the fix.


Mat




Cc: sta...@vger.kernel.org # v4.12+
Signed-off-by: Eric Biggers 
---
lib/mpi/mpi-pow.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index e24388a863a7..f089a52dbbdb 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
 *   however I decided to publish this code under the plain GPL.
 */

+#include 
#include 
#include "mpi-internal.h"
#include "longlong.h"
@@ -263,6 +264,8 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
break;
e = ep[i];
c = BITS_PER_MPI_LIMB;
+
+   cond_resched();
}

/* We shifted MOD, the modulo reduction argument, left 
MOD_SHIFT_CNT
--
2.15.0




--
Mat Martineau
Intel OTC


[PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-06 Thread Eric Biggers
From: Eric Biggers 

On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling.  If all
threads do it, it locks up the system.  Moreover, it can cause
rcu_sched-stall warnings.

Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed.  It's still noninterruptible, but at
least it's preemptible now.

Cc: sta...@vger.kernel.org # v4.12+
Signed-off-by: Eric Biggers 
---
 lib/mpi/mpi-pow.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index e24388a863a7..f089a52dbbdb 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
  *  however I decided to publish this code under the plain GPL.
  */
 
+#include 
 #include 
 #include "mpi-internal.h"
 #include "longlong.h"
@@ -263,6 +264,8 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
break;
e = ep[i];
c = BITS_PER_MPI_LIMB;
+
+   cond_resched();
}
 
/* We shifted MOD, the modulo reduction argument, left 
MOD_SHIFT_CNT
-- 
2.15.0



[PATCH] lib/mpi: call cond_resched() from mpi_powm() loop

2017-11-06 Thread Eric Biggers
From: Eric Biggers 

On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling.  If all
threads do it, it locks up the system.  Moreover, it can cause
rcu_sched-stall warnings.

Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed.  It's still noninterruptible, but at
least it's preemptible now.

Cc: sta...@vger.kernel.org # v4.12+
Signed-off-by: Eric Biggers 
---
 lib/mpi/mpi-pow.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index e24388a863a7..f089a52dbbdb 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@
  *  however I decided to publish this code under the plain GPL.
  */
 
+#include 
 #include 
 #include "mpi-internal.h"
 #include "longlong.h"
@@ -263,6 +264,8 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
break;
e = ep[i];
c = BITS_PER_MPI_LIMB;
+
+   cond_resched();
}
 
/* We shifted MOD, the modulo reduction argument, left 
MOD_SHIFT_CNT
-- 
2.15.0