Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Vincent Lefevre
On 2024-02-12 08:18:09 -0500, Stefan Koch wrote:
> This was mostly my idea.  I thought that if _mp_alloc was small enough (in
> most cases 1), then  instead of _mp_d being the pointer to the first lib it
> was just the first limb.

Note that the size of a limb may be larger than the size of a pointer.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Stefan Koch
On Mon, Feb 12, 2024 at 7:01 AM Richard Biener  wrote:
>
> On Mon, 12 Feb 2024, Richard Biener wrote:
>
> > On Mon, 12 Feb 2024, Torbj?rn Granlund wrote:
> >
> > > marco.bodr...@tutanota.com writes:
> > >
> > >   But implementing it with the current mpz type is "impossible".
> > >   I mean, one should break the current interface.
> > >   Currently, _mp_d is always a pointer AND it always point to a readable 
> > > limb. Even if _mp_alloc is zero.
> > >
> > > If we set alloc = 0 and size >= 2^30, then that means the the pointer
> > > field is actually a numeric value, and perhaps the low 30 bits of the
> > > size field is more bits for the numeric value.  :-)
> >
> > Since both _mp_alloc is signed _mp_alloc < 0 could indicate an inline
> > limb, you can then declare _mp_size irrelevant, fixed to one limb
> > plus 2 * sizeof (int) * 8 - 1 bits.  Though that missing bit is
> > likely going to be awkward (also the position of the sign-bit given
> > endianesses).
>
> Oh, and _mp_alloc < 0 could also simply mean the allocation is
> "inline", aka
>
> typedef struct
> {
>   int _mp_alloc;/* Number of *limbs* allocated and pointed
>to by the _mp_d field.  */
>   int _mp_size; /* abs(_mp_size) is the number of limbs
> the
>last field points to.  If _mp_size is
>negative this is a negative number.  */
>   union {
>  mp_limb_t *_mp_d; /* Pointer to the limbs.  */
>  mp_limb_t _mp_inline_limbs[]; /* Inline limbs if _mp_alloc < 0.  */
>   };
> } __mpz_struct;
>
> you could then no longer pass such __mpz_struct by value though but
> it would allow on-stack allocations of (small) temporary __mpz_struct
> using alloca.

This was mostly my idea.  I thought that if _mp_alloc was small enough (in
most cases 1), then  instead of _mp_d being the pointer to the first lib it
was just the first limb.

This could be relatively easy to implement.  (That is assuming that the _mp_d
is not part of the public interface.)  In that case the x._mp_d could be
replaced with a macro _mpz_mp_d(x).  Where

#define _mpz_mp_d(x) ((x)._mp_alloc == 1) ? x._mp_inline_limbs : x._mp_d)

Stefan
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Vincent Lefevre
On 2024-02-12 13:01:45 +0100, Richard Biener wrote:
> Oh, and _mp_alloc < 0 could also simply mean the allocation is
> "inline", aka
> 
> typedef struct
> {
>   int _mp_alloc;/* Number of *limbs* allocated and pointed
>to by the _mp_d field.  */
>   int _mp_size; /* abs(_mp_size) is the number of limbs 
> the
>last field points to.  If _mp_size is
>negative this is a negative number.  */
>   union {
>  mp_limb_t *_mp_d; /* Pointer to the limbs.  */
>  mp_limb_t _mp_inline_limbs[]; /* Inline limbs if _mp_alloc < 0.  */
>   };
> } __mpz_struct;

It is not possible to use a flexible array member in a union.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Richard Biener
On Mon, 12 Feb 2024, Richard Biener wrote:

> On Mon, 12 Feb 2024, Torbj?rn Granlund wrote:
> 
> > marco.bodr...@tutanota.com writes:
> > 
> >   But implementing it with the current mpz type is "impossible".
> >   I mean, one should break the current interface.
> >   Currently, _mp_d is always a pointer AND it always point to a readable 
> > limb. Even if _mp_alloc is zero.
> > 
> > If we set alloc = 0 and size >= 2^30, then that means the the pointer
> > field is actually a numeric value, and perhaps the low 30 bits of the
> > size field is more bits for the numeric value.  :-)
> 
> Since both _mp_alloc is signed _mp_alloc < 0 could indicate an inline
> limb, you can then declare _mp_size irrelevant, fixed to one limb
> plus 2 * sizeof (int) * 8 - 1 bits.  Though that missing bit is
> likely going to be awkward (also the position of the sign-bit given
> endianesses).

Oh, and _mp_alloc < 0 could also simply mean the allocation is
"inline", aka

typedef struct
{
  int _mp_alloc;/* Number of *limbs* allocated and pointed
   to by the _mp_d field.  */
  int _mp_size; /* abs(_mp_size) is the number of limbs 
the
   last field points to.  If _mp_size is
   negative this is a negative number.  */
  union {
 mp_limb_t *_mp_d; /* Pointer to the limbs.  */
 mp_limb_t _mp_inline_limbs[]; /* Inline limbs if _mp_alloc < 0.  */
  };
} __mpz_struct;

you could then no longer pass such __mpz_struct by value though but
it would allow on-stack allocations of (small) temporary __mpz_struct
using alloca.

Richard.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Richard Biener
On Mon, 12 Feb 2024, Torbj?rn Granlund wrote:

> marco.bodr...@tutanota.com writes:
> 
>   But implementing it with the current mpz type is "impossible".
>   I mean, one should break the current interface.
>   Currently, _mp_d is always a pointer AND it always point to a readable 
> limb. Even if _mp_alloc is zero.
> 
> If we set alloc = 0 and size >= 2^30, then that means the the pointer
> field is actually a numeric value, and perhaps the low 30 bits of the
> size field is more bits for the numeric value.  :-)

Since both _mp_alloc is signed _mp_alloc < 0 could indicate an inline
limb, you can then declare _mp_size irrelevant, fixed to one limb
plus 2 * sizeof (int) * 8 - 1 bits.  Though that missing bit is
likely going to be awkward (also the position of the sign-bit given
endianesses).

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Torbjörn Granlund
marco.bodr...@tutanota.com writes:

  But implementing it with the current mpz type is "impossible".
  I mean, one should break the current interface.
  Currently, _mp_d is always a pointer AND it always point to a readable limb. 
Even if _mp_alloc is zero.

If we set alloc = 0 and size >= 2^30, then that means the the pointer
field is actually a numeric value, and perhaps the low 30 bits of the
size field is more bits for the numeric value.  :-)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread marco . bodrato
Ciao,

12 feb 2024, 11:54 da paul.zimmerm...@inria.fr:

> as Niels said, I fear the cost of the tests might make the benefit vanish.
> But to be sure, the only way is to try to implement this idea inside GMP,
> which involves quite some work.
>
But implementing it with the current mpz type is "impossible".
I mean, one should break the current interface.
Currently, _mp_d is always a pointer AND it always point to a readable limb. 
Even if _mp_alloc is zero.

To implement the idea inside GMP one should choose:
 - break current interface, or
 - fully implement a new layer (not mpn, mpz, mpf, but... mpZ?).

Ĝis,
mb
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-12 Thread Paul Zimmermann
as Niels said, I fear the cost of the tests might make the benefit vanish.
But to be sure, the only way is to try to implement this idea inside GMP,
which involves quite some work.

Paul Zimmermann
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-11 Thread Niels Möller
Stefan Koch  writes:

> Anyway, I know you guys take performance very seriously, so I was wondering
> if you had experience with this?
> 1. Is there real potential for performance improvements in cas systems if
> gmp has a small integer optimization?

For usecases where almost all integers are small, I would expect a
small-integer optimization could pay off. For historic examples, look at
lisp systems (or any other languages where there's a single integer type
that can grow as large as needed, internally represented as either a
"fixnum" or "bignum" depending on size). Sparc processors (for 32-bit)
even had some special instructions that could in theory make the
implementation more efficient.

> 2. Have you looked at an approach like this in the past?  and if so what
> have you found?

I'm not aware of any substantial work along these lines for GMP.

> 3. If you think this is something worth pursuing, do you have any advice?

I doubt it is useful for the main mpz_t type in GMP. For numbers that
are large enough, say 100 limbs or more, the runtime overhead of an
extra branch on each operation to make the fixnum/bignum distinction is
likely not that costly. But performance for numbers of just a few limbs
(e.g., sizes relevant to public key cryptography) is also quite
important, and for those sizes I suspect the cost of the extra check
will be significant. In the context of GMP development, "small number
optimizations" usually means optimizations for for numbers of size up to
5-10 limbs or so.

So if you want to pursue a fixnum/bignum small-number optimization, I'd
suggest doing it as a separate library interface, defining a type
intended to represent numbers that are usually small, but uses gmp when
numbers grow too large for the fixnum representation. One classic way to
implement it would be to use one or two of the low bits of a computer
word as tag bits. If tag is zero, the rest of the bits represents a
signed fixnum. If tag is non-zero, you get a pointer to a heap-allocated
bignum by clearing the tag bits (which is fine, because pointers usually
require some alignment anyway).

To reason about potential gains, you'd need to keep in mind the pretty
high cost of a mis-predicted branch, compared to the cost of a memory
access, for relevant assumptions on cachedness.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Has there been historical work done to investigate small integer optimization?

2024-02-11 Thread Marc Glisse

On Sun, 11 Feb 2024, Stefan Koch wrote:


There is a fair amount of recent activity in the c++ world to optimize for
speed by minimizing processor cache misses.  One example is for in the
std::string class there is a concept of "short string optimization".  What
this does is that for shorter strings the data is stored in the string
class itself without need for dynamic memory allocation.   This has two
advantages: it does not need to make a call to dynamic memory allocation,
and the string data is in the same location as the meta data, and so only
in one cache fetch.  Both of which are potentially slow.

When I was looking at the implementation the _mp_d member of __mpz_struct
is 8 bytes, which is also the size of a single limb entry.  (I suspect that
this is common on recent systems, but it is probably not be universal.)  My
thought was that if _mp_alloc is one, then we have one limb needed.  In
that case, the space of _mp_d could serve as the limb itself instead of the
pointer to the limb.  That would eliminate dynamic memory allocation for
all integers under 2^64.

I was coming at this from a computer algebra system standpoint.  I think
(but I really don't know), that much of the arithmetic that is done is done
with integers under 2^64, and so those systems may be much faster if gmp
had a small integer optimization.

This approach will add a check when accessing the limb data for all
accesses but will save accessing potentially non cpu cache data for small
integers.  It may turn out that the small delay is not so noticeable for
large numbers, and the optimization for small numbers makes a large
difference for small numbers.

A second thought is to do the arithmetic if the sources are and result will
fit in a single limb with inline functions or macros. That would eliminate
the function call entirely for small numbers.  It would make the code
longer, and add additional overhead for when the integers are not short,
but is it not clear how much of an impact that is for lager numbers.

Anyway, I know you guys take performance very seriously, so I was wondering
if you had experience with this?
1. Is there real potential for performance improvements in cas systems if
gmp has a small integer optimization?
2. Have you looked at an approach like this in the past?  and if so what
have you found?
3. If you think this is something worth pursuing, do you have any advice?


Several software already wrap GMP using this kind of strategy: a "number" is 
either a 63-bit integer or a pointer to a GMP big integer (this is 
determined by one bit), operations on 63-bit integers are inline.


What is efficient depends on your use case. If you want to use the same 
integer type everywhere, you should definitely optimize for the small 
case. If you use separate types for small vs big, and big integers are 
always big, then you should optimize for the big case.


There is also an intermediate size where numbers occupy just a few limbs, 
where allocation can still dominate computation, and a small-string-like 
optimization can help significantly (see Boost.Multiprecision's cpp_int). 
Again, a one-size-fits-all seems optimistic.


--
Marc Glisse
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel