#13352: Running time improvement of the bitset_len method
---------------------------------+------------------------------------------
       Reporter:  dcoudert       |         Owner:  jason       
           Type:  enhancement    |        Status:  needs_review
       Priority:  major          |     Milestone:  sage-5.3    
      Component:  misc           |    Resolution:              
       Keywords:                 |   Work issues:              
Report Upstream:  N/A            |     Reviewers:              
        Authors:  David Coudert  |     Merged in:              
   Dependencies:                 |      Stopgaps:              
---------------------------------+------------------------------------------

Old description:

> This patch improves the running time of the {{{bitset_len}}} method by
> using fast methods for counting bits in 32 and 64 bits integers
> (popcount). It
> - adds file {{{bitcount.pxi}}} to {{{sage/misc/}}}. This file contains
> the functions for counting bits in 32 or 64 bits integers
> - adds corresponding tests to file {{{misc_c.pyx}}}
> - modifies the {{{bitset_len}}} function accordingly
>
> Before:
> {{{
> sage: x = FrozenBitset('10'*1003002); len(x)
> 1003002
> sage: %timeit len(x)
> 125 loops, best of 3: 5.27 ms per loop
> sage: x = FrozenBitset('10'*10705); len(x)
> 10705
> sage: %timeit len(x)
> 625 loops, best of 3: 56.3 µs per loop
> }}}
>
> After:
> {{{
> sage: x = FrozenBitset('10'*1003002); len(x)
> 1003002
> sage: %timeit len(x)
> 625 loops, best of 3: 865 µs per loop
> sage: x = FrozenBitset('10'*10705); len(x)
> 10705
> sage: %timeit len(x)
> 625 loops, best of 3: 9.39 µs per loop
> }}}
>
> The ``popcount_32`` method is the same than the function used in patch
> #12371.
>
> The alternative would be to use functions {{{__builtin_popcount()}}} and
> {{{__builtin_popcountll()}}}. They are extremely fast but they required
> to add flag ``-mpopcnt`` or ``-msse4.2`` to the gcc compiler. Can we do
> that? how?

New description:

 This patch improves the running time of the {{{bitset_len}}} method by
 using fast methods for counting bits in 32 and 64 bits integers
 (popcount). It
 - adds file {{{bitcount.pxi}}} to {{{sage/misc/}}}. This file contains the
 functions for counting bits in 32 or 64 bits integers
 - adds corresponding tests to file {{{misc_c.pyx}}}
 - modifies the {{{bitset_len}}} function accordingly

 Before:
 {{{
 sage: x = FrozenBitset('10'*1003002); len(x)
 1003002
 sage: %timeit len(x)
 125 loops, best of 3: 5.27 ms per loop
 sage: x = FrozenBitset('10'*10705); len(x)
 10705
 sage: %timeit len(x)
 625 loops, best of 3: 56.3 µs per loop
 }}}

 After:
 {{{
 sage: x = FrozenBitset('10'*1003002); len(x)
 1003002
 sage: %timeit len(x)
 625 loops, best of 3: 865 µs per loop
 sage: x = FrozenBitset('10'*10705); len(x)
 10705
 sage: %timeit len(x)
 625 loops, best of 3: 9.39 µs per loop
 }}}

 The ``popcount_32`` method is the same than the function used in patch
 #12371.

 The alternative would be to use functions {{{__builtin_popcount()}}} and
 {{{__builtin_popcountll()}}}. They are extremely fast but they required to
 add flag ``-mpopcnt`` or ``-msse4.2`` to the gcc compiler. Can we do that?
 how?


 APPLY:

 * trac_13352_bitset_len.patch

--

Comment (by dcoudert):

 Replying to [comment:2 leif]:
 > I'd rather "implement" the functions in C (where you can use the C
 preprocessor and `__builtin_popcount()` if available) and write a thin
 Cython wrapper.
 >
 > GCC e.g. chooses efficient machine implementations (probably a single
 instruction), depending on the (configured or specified) target and
 perhaps also further compiler flags.

 As reported in https://groups.google.com/forum/#!topic/sage-
 devel/GnudTUwbGG4, I tried to implement the function in C with
 {{{__builtin_popcount()}}}. I was able to compile successfully, but then
 sage was not working anymore.
 See attached patch
 http://trac.sagemath.org/sage_trac/attachment/ticket/13352/trac_13352
 -bitcount-in-c.patch.


 > Note that GMP has functions like `mpz_popcount()` for `mpz_t`s as
 well.[[BR]]
 It could be another solution, but I have not been able yet to find how to
 access the functions used for int behind?


 > Not sure what the `parallel` refers to ... ;-)
 In fact, the 4 lines {{{ n = __COUNT32__(n, 0) }}} can be executed in
 parallel and gcc is sometimes able to optimize the sequence of
 instructions. For instance, all methods have similar running times on my
 macbook air, but parallel methods are slower on my old PC (system still in
 32-bits...).

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13352#comment:3>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to