Re: Stack allocation

2014-06-09 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  t...@gmplib.org (Torbjörn Granlund) writes:
  
   I decided to lower the TMP_SALLOC limit to a bit under 2^15 from the
   previous 2^16.
  
  What's the relative cost of allocation vs simple operations like
  mpn_add_n? For 2^15 limit, that's 512 limbs (on 64-bit). I guess
  overhead of a malloc call might be comparable to an mpn_add_n with n =
  512, but it ought to be a lot faster than, e.g., an n = 256 mpn_mul_n.
  
It might be the case that malloc's performance varies a lot between
implementations.  I wouldn't be surprised if BSD and GNU malloc are
several times faster than malloc from the various non-free Unices.

I don't expect the free mallocs to need even near time(mpn_add_n(512)).
But perhaps they need 10% of that, which is still too much for GMP.

Fortunately, I don't think we make dynamic allocations for O(n)
operations.

  Would it make sense to lower the limit further to, say, 128 limbs?

Who knows.  I played with that, but it does not decrease stack usage as
much as one might expect (only 20% as measured by the test suite).
Lowering the limit adds gradually more overhead but gives rapidly
diminishing returns in stack use.

  Nice! That seems very reasonable on current desktop and server machines,
  but it might still be a bit large if people use gmp on embedded systems.
  
Perhaps alloca is not useful there?

There is currently one oddity in that the limit is more limbs on 32-bit
machines than on 64-bit machines.
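
The oddity follows from the limit being a byte count rather than a limb
count.  A minimal sketch of the arithmetic, where the helper name is
hypothetical and not part of GMP:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GMP: number of whole limbs that fit
   under a byte limit for a given limb size in bytes.  */
static size_t
sketch_limit_in_limbs (size_t limit_bytes, size_t bytes_per_limb)
{
  assert (bytes_per_limb > 0);
  return limit_bytes / bytes_per_limb;
}
```

For any fixed byte limit, a 4-byte limb gives twice as many limbs as an
8-byte limb, so expressing the limit in limbs would remove the asymmetry.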

Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Stack allocation

2014-06-09 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   Would it make sense to lower the limit further to, say, 128 limbs?

 Who knows.  I played with that, but it does not decrease stack usage as
 much as one might expect (only 20% as measured by the test suite).

I see.

   Nice! That seems very reasonable on current desktop and server machines,
   but it might still be a bit large if people use gmp on embedded systems.
   
 Perhaps alloca is not useful there?

What stack usage do you get if you disable use of stack allocation?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


Re: Stack allocation

2014-06-09 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  What stack usage do you get if you disable use of stack allocation?
  
A good question.  My measurements are blunt, using 'ulimit -s'.  I don't
know how to measure it accurately without instrumenting the code.

I assume that GMP will use around 1 KiB, since its recursion isn't very
deep.
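
For reference, the effect of 'ulimit -s' can also be had from C through
the POSIX getrlimit and setrlimit calls.  This is only a sketch of that
blunt approach, not the setup used for the measurements above; the
function names are hypothetical.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft stack limit of this process to
   KIB KiB.  Returns 0 on success, -1 on failure.  The limit is
   normally set before the program under test is started.  */
static int
sketch_set_stack_limit_kib (unsigned long kib)
{
  struct rlimit rl;
  rlim_t want = (rlim_t) kib * 1024;

  assert (kib > 0);
  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    return -1;
  /* The soft limit may not exceed the hard limit.  */
  if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
    return -1;
  rl.rlim_cur = want;
  return setrlimit (RLIMIT_STACK, &rl);
}

/* Hypothetical helper: current soft stack limit in bytes, which may be
   RLIM_INFINITY.  Returns 0 if the limit cannot be read.  */
static rlim_t
sketch_get_stack_limit_bytes (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    return 0;
  return rl.rlim_cur;
}
```

Like 'ulimit -s', this only tells whether a run fits under a given
limit; it does not report the peak stack use.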


Torbjörn
Please encrypt, key id 0xC8601622


Re: Stack allocation

2014-06-08 Thread Torbjörn Granlund
I made the automated GMP nightbuilds use at most 512 KiB.

Now I realise that the testsuite's needs might both overestimate and
underestimate the actual requirements.  The overestimate will come from
tests/mpn where we call functions outside their normal operand size
envelope.  Underestimation might happen because we don't use large
enough operands.

I tried lowering the TMP_SALLOC limit from 2^16 to 2^15 and 2^14, and
checked the resulting stack usage.

For the current limit 2^16, the use is about 512 KiB, depending a little
on the various THRESHOLDs.

For 2^15 the maximum use dropped to about 256 KiB, i.e., linear as
expected.

For 2^14 the maximum use didn't drop much at all, since here some
direct TMP_SALLOC allocations hurt.


Torbjörn
Please encrypt, key id 0xC8601622


Re: Stack allocation

2014-06-08 Thread Torbjörn Granlund
I decided to lower the TMP_SALLOC limit to a bit under 2^15 from the
previous 2^16.  With that change and a couple of other allocation
changes, GMP is now using less than 300 KiB of stack.

The nightly builds attempt to enforce this limit.
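
The general pattern behind such a limit can be sketched as follows:
requests up to a byte threshold go to the stack, larger ones to the
heap.  This is an illustration only, not the gmp-impl.h code; the names
and the threshold value (chosen here as "a bit under 2^15") are
assumptions, and alloca is assumed to be declared in <alloca.h>.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold for this sketch, a bit under 2^15 bytes.  */
#define SKETCH_SALLOC_LIMIT ((size_t) 0x7f00)

/* Nonzero if a request of N bytes should go to the stack.  */
static int
sketch_fits_on_stack (size_t n)
{
  return n <= SKETCH_SALLOC_LIMIT;
}

/* Sum 0 .. n_items-1 through a scratch buffer that lives on the stack
   for small requests and on the heap for large ones.  */
static unsigned long
sketch_sum_with_scratch (size_t n_items)
{
  unsigned long *tmp;
  unsigned long sum = 0;
  size_t bytes, i;
  int on_heap;

  assert (n_items > 0);
  bytes = n_items * sizeof (unsigned long);
  on_heap = !sketch_fits_on_stack (bytes);

  if (on_heap)
    {
      tmp = malloc (bytes);
      if (tmp == NULL)
        abort ();
    }
  else
    tmp = alloca (bytes);   /* released when this function returns */

  for (i = 0; i < n_items; i++)
    tmp[i] = i;
  for (i = 0; i < n_items; i++)
    sum += tmp[i];

  if (on_heap)
    free (tmp);
  return sum;
}
```

The threshold bounds what a single call frame can take from the stack,
at the price of a malloc/free pair for every larger request.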


Torbjörn
Please encrypt, key id 0xC8601622


Stack allocation

2014-06-06 Thread Torbjörn Granlund
This started as a thread in gmp-discuss about crashes due to stack
overflow.

I modified the TMP_SALLOC macro in gmp-impl.h to print its allocation
argument.  I did this as I suspected that we sometimes invoke the SALLOC
form inappropriately for huge allocations.

Below is a sample output.  We clearly have some bad allocation code,
since TMP_SALLOC should only be used for small allocations.

ALLOC:721952
ALLOC:696992
PASS: t-mul
--
ALLOC:664480
ALLOC:664352
ALLOC:688288
ALLOC:688288
ALLOC:619296
ALLOC:619296
PASS: t-tdiv
--
ALLOC:642208
ALLOC:643744
ALLOC:642208
ALLOC:643744
ALLOC:642208
PASS: t-gcd
--
ALLOC:667424
ALLOC:667424
ALLOC:667424
ALLOC:661664
ALLOC:661664
ALLOC:661664
ALLOC:661664
ALLOC:661664
ALLOC:661664
ALLOC:667424
PASS: reuse
--
ALLOC:672544
ALLOC:652448
PASS: t-remove



Torbjörn
Please encrypt, key id 0xC8601622


Re: Stack allocation

2014-06-06 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes:

  I modified the TMP_SALLOC macro in gmp-impl.h to print its allocation
  argument.  I did this as I suspected that we sometimes invoke the SALLOC
  form inappropriately for huge allocations.
  
After adding printing of __FILE__ and __LINE__ to the diagnostics code,
I identified two bad TMP_SALLOC_LIMBS invocations in mpn/generic/mul.c.
These are now patched.

The code can still be improved in many ways, including trimming of
allocation.
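
For readers who want to try the same kind of diagnostics, here is a
minimal sketch of an instrumented stack allocation macro.  It is not the
actual patch to gmp-impl.h; the names, the reporting threshold and the
output format are assumptions, and alloca is assumed to be declared in
<alloca.h>.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reporting threshold for this sketch only.  */
#define SKETCH_SALLOC_WARN_BYTES ((size_t) 0x8000)

/* Report a suspiciously large request on stderr, with its call site.
   Returns 1 if the request was reported, 0 otherwise.  */
static int
sketch_report_alloc (size_t n, const char *file, int line)
{
  assert (file != NULL);
  if (n <= SKETCH_SALLOC_WARN_BYTES)
    return 0;
  fprintf (stderr, "ALLOC:%lu %s:%d\n", (unsigned long) n, file, line);
  return 1;
}

/* Stand-in for an instrumented TMP_SALLOC: report, then allocate.
   This has to be a macro so that alloca runs in the caller's frame
   and __FILE__/__LINE__ name the call site.  N is evaluated twice.  */
#define SKETCH_SALLOC(n) \
  (sketch_report_alloc ((n), __FILE__, __LINE__), alloca (n))
```

With the file and line in the output, each oversized request points
directly at the invocation that made it.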


Torbjörn
Please encrypt, key id 0xC8601622