#13352: Running time improvement of the bitset_len method
---------------------------------+------------------------------------------
Reporter: dcoudert | Owner: jason
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-5.3
Component: misc | Resolution:
Keywords: | Work issues:
Report Upstream: N/A | Reviewers:
Authors: David Coudert | Merged in:
Dependencies: | Stopgaps:
---------------------------------+------------------------------------------
Old description:
> This patch improves the running time of the {{{bitset_len}}} method by
> using fast methods for counting bits in 32 and 64 bits integers
> (popcount). It
> - adds file {{{bitcount.pxi}}} to {{{sage/misc/}}}. This file contains
> the functions for counting bits in 32 or 64 bits integers
> - adds corresponding tests to file {{{misc_c.pyx}}}
> - modifies the {{{bitset_len}}} function accordingly
>
> Before:
> {{{
> sage: x = FrozenBitset('10'*1003002); len(x)
> 1003002
> sage: %timeit len(x)
> 125 loops, best of 3: 5.27 ms per loop
> sage: x = FrozenBitset('10'*10705); len(x)
> 10705
> sage: %timeit len(x)
> 625 loops, best of 3: 56.3 µs per loop
> }}}
>
> After:
> {{{
> sage: x = FrozenBitset('10'*1003002); len(x)
> 1003002
> sage: %timeit len(x)
> 625 loops, best of 3: 865 µs per loop
> sage: x = FrozenBitset('10'*10705); len(x)
> 10705
> sage: %timeit len(x)
> 625 loops, best of 3: 9.39 µs per loop
> }}}
>
> The ``popcount_32`` method is the same than the function used in patch
> #12371.
>
> The alternative would be to use functions {{{__builtin_popcount()}}} and
> {{{__builtin_popcountll()}}}. They are extremely fast but they required
> to add flag ``-mpopcnt`` or ``-msse4.2`` to the gcc compiler. Can we do
> that? how?
New description:
This patch improves the running time of the {{{bitset_len}}} method by
using fast methods for counting bits in 32 and 64 bits integers
(popcount). It
- adds file {{{bitcount.pxi}}} to {{{sage/misc/}}}. This file contains the
functions for counting bits in 32 or 64 bits integers
- adds corresponding tests to file {{{misc_c.pyx}}}
- modifies the {{{bitset_len}}} function accordingly
Before:
{{{
sage: x = FrozenBitset('10'*1003002); len(x)
1003002
sage: %timeit len(x)
125 loops, best of 3: 5.27 ms per loop
sage: x = FrozenBitset('10'*10705); len(x)
10705
sage: %timeit len(x)
625 loops, best of 3: 56.3 µs per loop
}}}
After:
{{{
sage: x = FrozenBitset('10'*1003002); len(x)
1003002
sage: %timeit len(x)
625 loops, best of 3: 865 µs per loop
sage: x = FrozenBitset('10'*10705); len(x)
10705
sage: %timeit len(x)
625 loops, best of 3: 9.39 µs per loop
}}}
The ``popcount_32`` method is the same than the function used in patch
#12371.
The alternative would be to use functions {{{__builtin_popcount()}}} and
{{{__builtin_popcountll()}}}. They are extremely fast but they required to
add flag ``-mpopcnt`` or ``-msse4.2`` to the gcc compiler. Can we do that?
how?
APPLY:
* trac_13352_bitset_len.patch
--
Comment (by dcoudert):
Replying to [comment:2 leif]:
> I'd rather "implement" the functions in C (where you can use the C
preprocessor and `__builtin_popcount()` if available) and write a thin
Cython wrapper.
>
> GCC e.g. chooses efficient machine implementations (probably a single
instruction), depending on the (configured or specified) target and
perhaps also further compiler flags.
As reported in https://groups.google.com/forum/#!topic/sage-
devel/GnudTUwbGG4, I tried to implement the function in C with
{{{__builtin_popcount()}}}. I was able to compile successfully, but then
sage was not working anymore.
See attached patch
http://trac.sagemath.org/sage_trac/attachment/ticket/13352/trac_13352
-bitcount-in-c.patch.
> Note that GMP has functions like `mpz_popcount()` for `mpz_t`s as
well.[[BR]]
It could be another solution, but I have not been able yet to find how to
access the functions used for int behind?
> Not sure what the `parallel` refers to ... ;-)
In fact, the 4 lines {{{ n = __COUNT32__(n, 0) }}} can be executed in
parallel and gcc is sometimes able to optimize the sequence of
instructions. For instance, all methods have similar running times on my
macbook air, but parallel methods are slower on my old PC (system still in
32-bits...).
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13352#comment:3>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.