#15820: Implement sequences of bounded integers
-------------------------------------+-------------------------------------
Reporter: SimonKing | Owner:
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-6.3
Component: algebra | Resolution:
Keywords: sequence bounded | Merged in:
integer | Reviewers:
Authors: Simon King | Work issues:
Report Upstream: N/A | Commit:
Branch: | 5dc78c5f5f647bf95b98da916d7086fae539ffb8
u/SimonKing/ticket/15820 | Stopgaps:
Dependencies: |
-------------------------------------+-------------------------------------
Changes (by {'newvalue': u'Simon King', 'oldvalue': ''}):
* status: new => needs_review
* commit: => 5dc78c5f5f647bf95b98da916d7086fae539ffb8
* author: => Simon King
Comment:
I have pushed a branch with an initial implementation of bounded integer
sequences. I took care of error handling: I think you will find that those
GMP commands that may result in a memory error are wrapped in
sig_on()/sig_off(), and the cdef boilerplate functions do propagate
errors. In fact, in some of my tests, I was hitting Ctrl-C, and it worked.
The rest of this post is about benchmarks. I compare against Python tuples
---which is of course ambitious, and you will find that not in all
situations tuples are slower than bounded integer sequences. However, one
should see it positively: The bounded integer sequences implemented here
have features very similar to tuples, and there ''are'' operations for
which they are a lot faster than tuples. It all depends on the length and
the bound of the sequence. That's why I think that it is a reasonable
contribution. And to make it faster (without Python classes), one can
still use the cdef boilerplate functions. Generally, it seems that long
bounded integer sequences are faster than long tuples, but short tuples
are faster than short sequences. For hash, bounded integer sequences are
pretty good, but they suck in accessing items, or in slicing with step
different from 1.
In contrast to tuples, bounded integer sequences also provide fairly quick
tests for whether a sequences starts with a given sequence, or whether a
sequence contains a certain sub-sequence. This is a feature I took from
strings.
Here are the tests, with different sizes and bounds. Variable names
starting with "T" hold tuples, those starting with "S" hold bounded
integer sequences.
{{{
sage: from sage.structure.bounded_integer_sequences import
BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
}}}
__Iteration__
{{{
sage: timeit("x=[y for y in T0]", number=1000000)
1000000 loops, best of 3: 783 ns per loop
sage: timeit("x=[y for y in T1]", number=1000000)
1000000 loops, best of 3: 1.48 µs per loop
sage: timeit("x=[y for y in T2]", number=100000)
100000 loops, best of 3: 4.3 µs per loop
sage: timeit("x=[y for y in T3]", number=1000)
1000 loops, best of 3: 245 µs per loop
sage: timeit("x=[y for y in S0]", number=1000000)
1000000 loops, best of 3: 2.38 µs per loop
sage: timeit("x=[y for y in S1]", number=100000)
100000 loops, best of 3: 4.1 µs per loop
sage: timeit("x=[y for y in S2]", number=100000)
100000 loops, best of 3: 10.1 µs per loop
sage: timeit("x=[y for y in S3]", number=1000)
1000 loops, best of 3: 1.79 ms per loop
}}}
__Slicing__
Bounded integer sequences are immutable and hence copied by identity. But
let us do slicing, dropping the last item:
{{{
sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.5 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 3.48 µs per loop
}}}
Slicing with `step!=1` is much slower, though:
{{{
sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=1000)
1000 loops, best of 3: 2.23 ms per loop
}}}
Perhaps I should mark it "TODO"...
__Accessing single items__
Short sequences
{{{
sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 361 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 373 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 959 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 960 ns per loop
}}}
Large sequences (time also depends on the bound for the integer sequence!)
{{{
sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 359 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 382 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 1.97 µs per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 1.49 µs per loop
}}}
__Comparison__
Note that comparison of bounded integer sequences works different from
comparison of tuples or lists, as detailed in the documentation.
We compare sequences that are equal but non-identical, that differ in
early items, or that differ in late items.
{{{
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)] # verified that they
differ from L0,L1,L2,L3
sage: T0x = tuple(L0x); T1x = tuple(L1x); T2x = tuple(L2x); T3x =
tuple(L3x)
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: T0y = tuple(L0); T1y = tuple(L1); T2y = tuple(L2); T3y = tuple(L3)
sage: S1y = BoundedIntegerSequence(8, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
}}}
Early differences:
{{{
sage: timeit("T0==T0x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T1==T1x", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("T2==T2x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T3==T3x", number=1000000)
1000000 loops, best of 3: 161 ns per loop
sage: timeit("S0==S0x", number=1000000)
1000000 loops, best of 3: 538 ns per loop
sage: timeit("S1==S1x", number=1000000)
1000000 loops, best of 3: 550 ns per loop
sage: timeit("S2==S2x", number=1000000)
1000000 loops, best of 3: 490 ns per loop
sage: timeit("S3==S3x", number=1000000)
1000000 loops, best of 3: 559 ns per loop
}}}
Equal sequences:
{{{
sage: timeit("T0==T0y", number=1000000)
1000000 loops, best of 3: 169 ns per loop
sage: timeit("T1==T1y", number=1000000)
1000000 loops, best of 3: 255 ns per loop
sage: timeit("T2==T2y", number=1000000)
1000000 loops, best of 3: 597 ns per loop
sage: timeit("T3==T3y", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: timeit("S0==S0y", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1==S1y", number=1000000)
1000000 loops, best of 3: 493 ns per loop
sage: timeit("S2==S2y", number=1000000)
1000000 loops, best of 3: 583 ns per loop
sage: timeit("S3==S3y", number=1000000)
1000000 loops, best of 3: 1.41 µs per loop
}}}
Late differences:
{{{
sage: T0z1 = T0+T0
sage: T0z2 = T0+T0x
sage: T1z1 = T1+T1
sage: T1z2 = T1+T1x
sage: T2z1 = T2+T2
sage: T2z2 = T2+T2x
sage: T3z1 = T3+T3
sage: T3z2 = T3+T3x
sage: timeit("T0z1==T0z2", number=100000)
100000 loops, best of 3: 206 ns per loop
sage: timeit("T1z1==T1z2", number=100000)
100000 loops, best of 3: 308 ns per loop
sage: timeit("T2z1==T2z2", number=100000)
100000 loops, best of 3: 640 ns per loop
sage: timeit("T3z1==T3z2", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z1==S0z2", number=100000)
100000 loops, best of 3: 585 ns per loop
sage: timeit("S1z1==S1z2", number=100000)
100000 loops, best of 3: 494 ns per loop
sage: timeit("S2z1==S2z2", number=100000)
100000 loops, best of 3: 555 ns per loop
sage: timeit("S3z1==S3z2", number=100000)
100000 loops, best of 3: 578 ns per loop
}}}
__Hash__
This is the only operation for which bounded integer sequences seem to be
consistently faster than tuples:
{{{
sage: timeit("hash(T0)", number=100000)
100000 loops, best of 3: 179 ns per loop
sage: timeit("hash(T0)", number=1000000)
1000000 loops, best of 3: 177 ns per loop
sage: timeit("hash(T1)", number=1000000)
1000000 loops, best of 3: 267 ns per loop
sage: timeit("hash(T2)", number=1000000)
1000000 loops, best of 3: 584 ns per loop
sage: timeit("hash(T3)", number=100000)
100000 loops, best of 3: 45.3 µs per loop
sage: timeit("hash(S0)", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("hash(S1)", number=1000000)
1000000 loops, best of 3: 153 ns per loop
sage: timeit("hash(S2)", number=1000000)
1000000 loops, best of 3: 194 ns per loop
sage: timeit("hash(S3)", number=1000000)
1000000 loops, best of 3: 8.97 µs per loop
}}}
__Concatenation__
{{{
sage: timeit("T0+T0", number=1000000)
1000000 loops, best of 3: 200 ns per loop
sage: timeit("T1+T1", number=1000000)
1000000 loops, best of 3: 311 ns per loop
sage: timeit("T2+T2", number=1000000)
1000000 loops, best of 3: 742 ns per loop
sage: timeit("T3+T3", number=10000)
10000 loops, best of 3: 40.3 µs per loop
sage: timeit("S0+S0", number=1000000)
1000000 loops, best of 3: 576 ns per loop
sage: timeit("S1+S1", number=1000000)
1000000 loops, best of 3: 590 ns per loop
sage: timeit("S2+S2", number=1000000)
1000000 loops, best of 3: 618 ns per loop
sage: timeit("S3+S3", number=1000000)
1000000 loops, best of 3: 2.17 µs per loop
}}}
__Subsequences__
Recall the definition of `S0z1` etc. We find:
{{{
sage: S0z2.startswith(S0)
True
sage: S0z2.startswith(S0x)
False
sage: S0x in S0z2
True
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 239 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 241 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 694 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 227 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 1.08 µs per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 247 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 3.21 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 989 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 218 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 3.57 ms per loop
}}}
I wonder if the latter could be improved. Another "TODO"...
----
New commits:
||[http://git.sagemath.org/sage.git/commit/?id=5dc78c5f5f647bf95b98da916d7086fae539ffb8
5dc78c5]||{{{Implement sequences of bounded integers}}}||
--
Ticket URL: <http://trac.sagemath.org/ticket/15820#comment:50>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.