#15820: Improve efficiency of the path semigroup
--------------------------------------------------+------------------------
       Reporter:  SimonKing                       |        Owner:
           Type:  enhancement                     |       Status:  new
       Priority:  major                           |    Milestone:  sage-6.2
      Component:  algebra                         |   Resolution:
       Keywords:  quiver path algebra efficiency  |    Merged in:
        Authors:                                  |    Reviewers:
Report Upstream:  N/A                             |  Work issues:
         Branch:                                  |       Commit:
   Dependencies:  #12630                          |     Stopgaps:
--------------------------------------------------+------------------------

Comment (by SimonKing):

 Replying to [comment:13 ncohen]:
 > Hmmmmmmm... Well, looks like the last one is the best, even though it is
 a bit slow for hashing.

 What do you mean by "the last one"? Using GMP's `mpz_t`directly, which is
 `Path_v5` in the attachment? Or the improved "semicompressed" version of
 `uint64_t*` (storing chunks of size `2^n` rather then in the smallest
 possible chunk size), which is `Path_v4`?

 > If the sizeof a block is a power of two it should be quite good, don't
 you think ?

 Should be. My own summary of the timings above would be this. We have 6
 benchmarks, and in the following list I am giving for each benchmark a
 small, a medium and a large example. In the end, `+` or `-` indicate
 whether the respective implementation did well/poorly in the three
 examples.

 1. Concatenation
 {{{
                      5x5   50x50  1000x1000
 Uncompressed long : 739 ns  885 ns 3.06 us  - -
 Uncompressed short: 700 ns  808 ns 2.13 us
 Uncompressed char : 679 ns  772 ns 1.31 us    +
 Compressed   long : 676 ns  759 ns 1.61 us
 Compressed   short: 718 ns  722 ns 1.98 us   +
 Semicompr.   long : 684 ns  827 ns 1.42 us
 Integer           : 645 ns 1.82 us 1.78 us  +-
 GMP          mpz_t: 723 ns  799 ns 1.41 us
 }}}

 2. Hash
 {{{
                      5      100     1000
 Uncompressed long : 135 ns 333 ns  2.09 us  +
 Uncompressed short: 138 ns 383 ns  2.61 us  +--
 Uncompressed char : 133 ns 374 ns  2.53 us  +
 Compressed   long : 132 ns 155 ns   466 ns  +++
 Compressed   short: 132 ns 206 ns   951 ns  +
 Semicompr.   long : 148 ns 194 ns   594 ns
 Integer           : 164 ns 274 ns  1.94 us  -
 GMP          mpz_t: 159 ns 266 ns  1.93 us
 }}}

 3. == for equal paths
 {{{
                      5       100     1000
 Uncompressed long : 1.63 us 23   us  212   us ---
 Uncompressed short: 1.08 us 14.9 us  137   us
 Uncompressed char : 1.08 us 12.5 us  121   us
 Compressed   long :  710 ns  4.4 us   38   us
 Compressed   short:  723 ns  5.3 us   47.5 us
 Semicompr.   long :  498 ns  712 ns    2.4 us
 Integer           :  667 ns  658 ns    817 ns
 GMP          mpz_t:  466 ns  485 ns    693 ns +++
 }}}

 4. == for "easily" unequal paths
 {{{
                      5       100     1000
 Uncompressed long : 634 ns   686 ns   698 ns
 Uncompressed short: 613 ns   603 ns   635 ns
 Uncompressed char : 581 ns   599 ns   636 ns
 Compressed   long : 692 ns   729 ns   752 ns  --
 Compressed   short: 577 ns   588 ns   607 ns
 Semicompr.   long : 499 ns   526 ns   545 ns
 Integer           : 746 ns   731 ns   733 ns --
 GMP          mpz_t: 451 ns   468 ns   482 ns +++
 }}}

 5. == for "difficult" unequal paths
 {{{
                      5       100      1000
 Uncompressed long : 1.65 us   22 us    213 us   ---
 Uncompressed short: 1.15 us   12.8 us  140 us
 Uncompressed char : 1.06 us   12.7 us  121 us
 Compressed   long :  706 ns    4.4 us   37.6 us
 Compressed   short:  754 ns    5.3 us   47.3 us
 Semicompr.   long :  506 ns    704 ns   2.5 us
 Integer           :  710 ns    719 ns   755 ns
 GMP          mpz_t:  448 ns    457 ns   470 ns  +++
 }}}

 6. startswith (here we only have one example size, one where the answer is
 True, and one where the answer is False)
 {{{
                      yes      no
 Uncompressed long :  325 ns  311 ns
 Uncompressed short:  336 ns  325 ns
 Uncompressed char :  324 ns  315 ns
 Compressed   long :  272 ns  257 ns  +
 Compressed   short:  272 ns  256 ns  +
 Semicompr.   long :  263 ns  232 ns  ++
 Integer           : 2.46 us 2.45 us  --
 GMP          mpz_t:  673 ns  657 ns
 }}}

 So, I don't think there is a clear winner in the competition.

 > This being said, perhaps  it should be implemented like the bitset
 stuff, in a algebraic-free file, to store sequences of integers as you
 said.

 Yes, it would make sense.

 > How do you define your hashing function and your comparison function, by
 the way ?

 Depends on the implementation. Of course, when I use `Integer` or `mpz_t`,
 I am using the hash function provided by GMP. When I implement a `block*`,
 where `block` is either `char`, `uint32_t` or `uint64_t`, I am using some
 hash function for lists that I found in the internet a while ago (I don't
 recall where). It is as follows:
 {{{
 #!python
     def __hash__(self):
         cdef block h
         cdef block* ptr
         h = <block>(self.start^(self.end<<16)^(self._len))
         ptr = <block *>self.data
         cdef size_t i
         h ^= FNV_offset
         for i from 0<=i<self._len:
             h ^= ptr[i]
             h *= FNV_prime
         if h==-1:
             return -2
         return h
 }}}
 where `FNV_prime` and `FNV_offset` depend on the architecture. I am
 surprised that this is faster than GMP's hash function.

--
Ticket URL: <http://trac.sagemath.org/ticket/15820#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to