#15820: Improve efficiency of the path semigroup
--------------------------------------------------+------------------------
Reporter: SimonKing | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: sage-6.2
Component: algebra | Resolution:
Keywords: quiver path algebra efficiency | Merged in:
Authors: | Reviewers:
Report Upstream: N/A | Work issues:
Branch: | Commit:
Dependencies: #12630 | Stopgaps:
--------------------------------------------------+------------------------
Comment (by SimonKing):
Replying to [comment:13 ncohen]:
> Hmmmmmmm... Well, looks like the last one is the best, even though it is
> a bit slow for hashing.
What do you mean by "the last one"? Using GMP's `mpz_t` directly, which is
`Path_v5` in the attachment? Or the improved "semicompressed" version of
`uint64_t*` (storing chunks of size `2^n` rather than in the smallest
possible chunk size), which is `Path_v4`?
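To be concrete about what I mean by "semicompressed", here is a rough
plain-Python illustration (not the actual `Path_v4` code, which works on a
`uint64_t*` buffer, and with hypothetical helper names): the number of bits
per arrow is rounded up to the next power of two, so every arrow can be
extracted with plain shifts and masks.
{{{
#!python
# Illustration only: pack arrow indices with a bits-per-arrow count
# rounded up to a power of two ("semicompressed" storage).
def round_up_to_power_of_two(n):
    p = 1
    while p < n:
        p <<= 1
    return p

def pack_path(arrows, n_arrows):
    # bits needed for the largest possible arrow index, rounded up to 2^k
    nbits = round_up_to_power_of_two(max(1, (n_arrows - 1).bit_length()))
    data = 0
    for i, a in enumerate(arrows):
        data |= a << (i * nbits)
    return data, nbits

def unpack_path(data, nbits, length):
    mask = (1 << nbits) - 1
    return [(data >> (i * nbits)) & mask for i in range(length)]

# A quiver with 50 arrows needs 6 bits per arrow; the "semicompressed"
# variant rounds this up to 8, so no arrow ever straddles a byte boundary.
data, nbits = pack_path([3, 17, 42, 5], 50)
assert unpack_path(data, nbits, 4) == [3, 17, 42, 5]
}}}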
> If the sizeof a block is a power of two it should be quite good, don't
> you think ?
Should be. Here is my own summary of the timings above. We have six
benchmarks, and for each of them I give a small, a medium and a large
example. The trailing `+` or `-` marks indicate whether the respective
implementation did notably well or poorly in those examples.
1. Concatenation
{{{
                    5x5       50x50     1000x1000
Uncompressed long : 739 ns    885 ns    3.06 us   - -
Uncompressed short: 700 ns    808 ns    2.13 us
Uncompressed char : 679 ns    772 ns    1.31 us   +
Compressed long   : 676 ns    759 ns    1.61 us
Compressed short  : 718 ns    722 ns    1.98 us   +
Semicompr. long   : 684 ns    827 ns    1.42 us
Integer           : 645 ns    1.82 us   1.78 us   +-
GMP mpz_t         : 723 ns    799 ns    1.41 us
}}}
2. Hash
{{{
                    5         100       1000
Uncompressed long : 135 ns    333 ns    2.09 us   +
Uncompressed short: 138 ns    383 ns    2.61 us   +--
Uncompressed char : 133 ns    374 ns    2.53 us   +
Compressed long   : 132 ns    155 ns    466 ns    +++
Compressed short  : 132 ns    206 ns    951 ns    +
Semicompr. long   : 148 ns    194 ns    594 ns
Integer           : 164 ns    274 ns    1.94 us   -
GMP mpz_t         : 159 ns    266 ns    1.93 us
}}}
3. == for equal paths
{{{
                    5         100       1000
Uncompressed long : 1.63 us   23 us     212 us    ---
Uncompressed short: 1.08 us   14.9 us   137 us
Uncompressed char : 1.08 us   12.5 us   121 us
Compressed long   : 710 ns    4.4 us    38 us
Compressed short  : 723 ns    5.3 us    47.5 us
Semicompr. long   : 498 ns    712 ns    2.4 us
Integer           : 667 ns    658 ns    817 ns
GMP mpz_t         : 466 ns    485 ns    693 ns    +++
}}}
4. == for "easily" unequal paths
{{{
                    5         100       1000
Uncompressed long : 634 ns    686 ns    698 ns
Uncompressed short: 613 ns    603 ns    635 ns
Uncompressed char : 581 ns    599 ns    636 ns
Compressed long   : 692 ns    729 ns    752 ns    --
Compressed short  : 577 ns    588 ns    607 ns
Semicompr. long   : 499 ns    526 ns    545 ns
Integer           : 746 ns    731 ns    733 ns    --
GMP mpz_t         : 451 ns    468 ns    482 ns    +++
}}}
5. == for "difficult" unequal paths
{{{
                    5         100       1000
Uncompressed long : 1.65 us   22 us     213 us    ---
Uncompressed short: 1.15 us   12.8 us   140 us
Uncompressed char : 1.06 us   12.7 us   121 us
Compressed long   : 706 ns    4.4 us    37.6 us
Compressed short  : 754 ns    5.3 us    47.3 us
Semicompr. long   : 506 ns    704 ns    2.5 us
Integer           : 710 ns    719 ns    755 ns
GMP mpz_t         : 448 ns    457 ns    470 ns    +++
}}}
6. startswith (here there is only one example size, with one case where the
answer is True and one where it is False)
{{{
                    yes       no
Uncompressed long : 325 ns    311 ns
Uncompressed short: 336 ns    325 ns
Uncompressed char : 324 ns    315 ns
Compressed long   : 272 ns    257 ns    +
Compressed short  : 272 ns    256 ns    +
Semicompr. long   : 263 ns    232 ns    ++
Integer           : 2.46 us   2.45 us   --
GMP mpz_t         : 673 ns    657 ns
}}}
So, I don't think there is a clear winner in the competition.
> This being said, perhaps it should be implemented like the bitset
> stuff, in an algebra-free file, to store sequences of integers as you
> said.
Yes, it would make sense.
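Such a container could live next to the bitset code and provide exactly the
operations benchmarked above (concatenation, hash, comparison, startswith).
The following plain-Python `BoundedIntSequence` is only a sketch of what the
interface could look like, not a design decision; the real thing would of
course work on a C-level buffer.
{{{
#!python
class BoundedIntSequence(object):
    """Sequence of integers < bound, packed into one big integer."""
    def __init__(self, items, bound):
        self.bound = bound
        self.nbits = max(1, (bound - 1).bit_length())
        self.length = len(items)
        self.data = 0
        for i, a in enumerate(items):
            self.data |= a << (i * self.nbits)

    def __hash__(self):
        return hash((self.length, self.data))

    def __eq__(self, other):
        return (self.nbits == other.nbits and self.length == other.length
                and self.data == other.data)

    def startswith(self, other):
        # ``other`` is a prefix of ``self`` iff the low bits agree.
        if other.nbits != self.nbits or other.length > self.length:
            return False
        mask = (1 << (other.length * other.nbits)) - 1
        return self.data & mask == other.data

    def concat(self, other):
        # Shift the second sequence past the first one and OR them together.
        result = BoundedIntSequence([], self.bound)
        result.length = self.length + other.length
        result.data = self.data | (other.data << (self.length * self.nbits))
        return result
}}}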
> How do you define your hashing function and your comparison function, by
> the way ?
Depends on the implementation. Of course, when I use `Integer` or `mpz_t`,
I am using the hash function provided by GMP. When I implement a `block*`,
where `block` is either `char`, `uint32_t` or `uint64_t`, I am using a hash
function for lists that I found on the internet a while ago (I don't recall
where). It is as follows:
{{{
#!python
def __hash__(self):
    cdef block h
    cdef block* ptr
    # Seed the hash with the start vertex, the end vertex and the length.
    h = <block>(self.start^(self.end<<16)^(self._len))
    ptr = <block *>self.data
    cdef size_t i
    h ^= FNV_offset
    # FNV-1a-style loop: xor in each stored block, then multiply by the prime.
    for i from 0<=i<self._len:
        h ^= ptr[i]
        h *= FNV_prime
    # CPython reserves -1 as an error value for hash functions.
    if h==-1:
        return -2
    return h
}}}
where `FNV_prime` and `FNV_offset` depend on the architecture. I am
surprised that this is faster than GMP's hash function.
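In plain Python, with the standard 64-bit FNV constants, the loop over the
stored blocks would look roughly like this (leaving out the start/end/length
seeding from the code above):
{{{
#!python
# Standard FNV constants; the 32-bit pair would be 0x811c9dc5 / 0x01000193.
FNV_offset = 0xcbf29ce484222325   # 64-bit offset basis
FNV_prime  = 0x100000001b3        # 64-bit prime

def fnv1a(blocks):
    h = FNV_offset
    for b in blocks:
        h ^= b
        h = (h * FNV_prime) & 0xFFFFFFFFFFFFFFFF  # emulate uint64_t wraparound
    return h

fnv1a([3, 17, 42, 5])
}}}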
--
Ticket URL: <http://trac.sagemath.org/ticket/15820#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB