#8187: improve equality tests for words
-----------------------------+----------------------------------------------
Reporter: slabbe | Owner: sage-combinat
Type: enhancement | Status: new
Priority: major | Milestone: sage-4.3.2
Component: combinatorics | Keywords:
Author: | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
-----------------------------+----------------------------------------------
Description changed by slabbe:
Old description:
> More often, when we compare two words, we test their equality and not
> that one is less than the other. So why not implement the equatlity
> test!? This ticket does this for datatypes and for math objects. It also
> improve the comparison function (by removing it!).
>
> 1. This patch adds equality test for `WordDatatype_str`,
> `WordDatatype_list` and `WordDatatype_tuple` (via `__richcmp__`) usefull
> when comparing two words represented by the same kind of data. It is now
> as fast as the python object :
>
> {{{
> BEFORE:
>
> sage: w = Word(range(10000))
> sage: z = Word(range(10000))
> sage: %timeit w == z
> 125 loops, best of 3: 4.08 ms per loop
>
> AFTER:
>
> sage: w = Word(range(10000))
> sage: z = Word(range(10000))
> sage: %timeit w == z
> 625 loops, best of 3: 422 µs per loop
>
> PYTHON OBJECT:
>
> sage: w = range(10000)
> sage: z = range(10000)
> sage: %timeit w == z
> 625 loops, best of 3: 442 µs per loop
> }}}
>
> {{{
> BEFORE:
>
> sage: w = Word(tuple(range(10000)))
> sage: z = Word(tuple(range(10000)))
> sage: %timeit w == z
> 125 loops, best of 3: 3.97 ms per loop
>
> AFTER:
>
> sage: w = Word(tuple(range(10000)))
> sage: z = Word(tuple(range(10000)))
> sage: %timeit w == z
> 625 loops, best of 3: 419 µs per loop
>
> PYTHON OBJECT:
>
> sage: w = tuple(range(10000))
> sage: z = tuple(range(10000))
> sage: %timeit w == z
> 625 loops, best of 3: 420 µs per loop
> }}}
>
> {{{
> BEFORE:
>
> sage: w = Word('a'*10000)
> sage: z = Word('a'*10000)
> sage: %timeit w == z
> 125 loops, best of 3: 3.9 ms per loop
>
> AFTER:
>
> sage: w = Word('a'*10000)
> sage: z = Word('a'*10000)
> sage: %timeit w == z
> 625 loops, best of 3: 2.36 µs per loop
>
> PYTHON OBJECT:
>
> sage: w = 'a'*10000
> sage: z = 'a'*10000
> sage: %timeit w == z
> 625 loops, best of 3: 2.03 µs per loop
>
> }}}
>
> 2. Remove the `__cmp__` from `FiniteWord_class` since the same function
> in `Word_class` does the job anyway and in a cleaner way : it doesn't use
> the (useless?) coerce function. Surprinsingly, removing it makes it
> faster :
>
> {{{
> BEFORE:
>
> sage: w = Word([0]*10000)
> sage: z = Word([0]*10000, alphabet=[0,1])
> sage: type(w)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: type(z)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: %timeit w.__cmp__(w)
> 125 loops, best of 3: 3.79 ms per loop
> sage: %timeit w.__cmp__(z)
> 25 loops, best of 3: 13.3 ms per loop
> sage: %timeit z.__cmp__(w)
> 5 loops, best of 3: 50.1 ms per loop
> sage: %timeit z.__cmp__(z)
> 25 loops, best of 3: 35.7 ms per loop
>
> AFTER:
>
> sage: w = Word([0]*10000)
> sage: z = Word([0]*10000, alphabet=[0,1])
> sage: type(w)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: type(z)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: %timeit w.__cmp__(w)
> 125 loops, best of 3: 3.89 ms per loop
> sage: %timeit w.__cmp__(z)
> 125 loops, best of 3: 5.4 ms per loop
> sage: %timeit z.__cmp__(w)
> 25 loops, best of 3: 35.9 ms per loop
> sage: %timeit z.__cmp__(z)
> 25 loops, best of 3: 35.7 ms per loop
> }}}
>
> NOTE : The difference between w and z above is that the parent of w is
> the alphabet of all python objects which uses the cmp of python to
> compare the letters whereas z compares its letters relatively to the
> order of the letters defined by the parent (here 0 < 1 but one could also
> say 1 < 0) which is slower.
>
> 3. Add the `__eq__` and `__ne__` for `Word_class` because it is faster
> than using `__cmp__` (especially when parent alphabet are defined). These
> functions are used to test the equality of two words represented by
> different python objects (datatypes).
>
> no parents :
>
> {{{
> BEFORE:
>
> sage: L = range(10000)
> sage: t = tuple(L)
> sage: w = Word(L)
> sage: z = Word(t)
> sage: type(w)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: type(z)
> <class 'sage.combinat.words.word.FiniteWord_tuple'>
> sage: %timeit w == z
> 125 loops, best of 3: 3.69 ms per loop
>
> AFTER:
>
> sage: L = range(10000)
> sage: t = tuple(L)
> sage: w = Word(L)
> sage: z = Word(t)
> sage: type(w)
> <class 'sage.combinat.words.word.FiniteWord_list'>
> sage: type(z)
> <class 'sage.combinat.words.word.FiniteWord_tuple'>
> sage: %timeit w == z
> 625 loops, best of 3: 1.44 ms per loop
> }}}
>
> with parents (!!):
>
> {{{
> BEFORE:
>
> sage: W = Words([0,1,2])
> sage: w = W([0, 1, 1, 2]*4000)
> sage: z = W([0, 1, 1, 2]*4000)
> sage: %timeit w == z
> 5 loops, best of 3: 63 ms per loop
>
> AFTER:
>
> sage: W = Words([0,1,2])
> sage: w = W([0, 1, 1, 2]*4000)
> sage: z = W([0, 1, 1, 2]*4000)
> sage: %timeit w == z
> 125 loops, best of 3: 2.57 ms per loop
> }}}
New description:
More often, when we compare two words, we test their equality and not that
one is less than the other. So why not implement the equatlity test!? This
ticket does this for datatypes and for math objects. It also improve the
comparison function (by removing it!).
1. This patch adds equality test for `WordDatatype_str`,
`WordDatatype_list` and `WordDatatype_tuple` (via `__richcmp__`) usefull
when comparing two words represented by the same kind of data. It is now
as fast as the python object :
{{{
BEFORE:
sage: w = Word(range(10000))
sage: z = Word(range(10000))
sage: %timeit w == z
125 loops, best of 3: 4.08 ms per loop
AFTER:
sage: w = Word(range(10000))
sage: z = Word(range(10000))
sage: %timeit w == z
625 loops, best of 3: 422 µs per loop
PYTHON OBJECT:
sage: w = range(10000)
sage: z = range(10000)
sage: %timeit w == z
625 loops, best of 3: 442 µs per loop
}}}
{{{
BEFORE:
sage: w = Word(tuple(range(10000)))
sage: z = Word(tuple(range(10000)))
sage: %timeit w == z
125 loops, best of 3: 3.97 ms per loop
AFTER:
sage: w = Word(tuple(range(10000)))
sage: z = Word(tuple(range(10000)))
sage: %timeit w == z
625 loops, best of 3: 419 µs per loop
PYTHON OBJECT:
sage: w = tuple(range(10000))
sage: z = tuple(range(10000))
sage: %timeit w == z
625 loops, best of 3: 420 µs per loop
}}}
{{{
BEFORE:
sage: w = Word('a'*10000)
sage: z = Word('a'*10000)
sage: %timeit w == z
125 loops, best of 3: 3.9 ms per loop
AFTER:
sage: w = Word('a'*10000)
sage: z = Word('a'*10000)
sage: %timeit w == z
625 loops, best of 3: 2.36 µs per loop
PYTHON OBJECT:
sage: w = 'a'*10000
sage: z = 'a'*10000
sage: %timeit w == z
625 loops, best of 3: 2.03 µs per loop
}}}
2. Remove the `__cmp__` from `FiniteWord_class` since the same function in
`Word_class` does the job anyway and in a cleaner way : it doesn't use the
(useless?) coerce function. Surprinsingly, removing it makes it faster :
{{{
BEFORE:
sage: w = Word([0]*10000)
sage: z = Word([0]*10000, alphabet=[0,1])
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: type(z)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: %timeit w.__cmp__(w)
125 loops, best of 3: 3.79 ms per loop
sage: %timeit w.__cmp__(z)
25 loops, best of 3: 13.3 ms per loop
sage: %timeit z.__cmp__(w)
5 loops, best of 3: 50.1 ms per loop
sage: %timeit z.__cmp__(z)
25 loops, best of 3: 35.7 ms per loop
AFTER:
sage: w = Word([0]*10000)
sage: z = Word([0]*10000, alphabet=[0,1])
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: type(z)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: %timeit w.__cmp__(w)
125 loops, best of 3: 3.89 ms per loop
sage: %timeit w.__cmp__(z)
125 loops, best of 3: 5.4 ms per loop
sage: %timeit z.__cmp__(w)
25 loops, best of 3: 35.9 ms per loop
sage: %timeit z.__cmp__(z)
25 loops, best of 3: 35.7 ms per loop
}}}
NOTE : The difference between w and z above is that the parent of w is the
alphabet of all python objects which uses the cmp of python to compare the
letters whereas z compares its letters relatively to the order of the
letters defined by its parent (here 0 < 1 but one could also say 1 < 0)
which is slower.
3. Add the `__eq__` and `__ne__` for `Word_class` because it is faster
than using `__cmp__` (especially when parent alphabet are defined). These
functions are used to test the equality of two words represented by
different python objects (datatypes).
no parents :
{{{
BEFORE:
sage: L = range(10000)
sage: t = tuple(L)
sage: w = Word(L)
sage: z = Word(t)
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: type(z)
<class 'sage.combinat.words.word.FiniteWord_tuple'>
sage: %timeit w == z
125 loops, best of 3: 3.69 ms per loop
AFTER:
sage: L = range(10000)
sage: t = tuple(L)
sage: w = Word(L)
sage: z = Word(t)
sage: type(w)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: type(z)
<class 'sage.combinat.words.word.FiniteWord_tuple'>
sage: %timeit w == z
625 loops, best of 3: 1.44 ms per loop
}}}
with parents (!!):
{{{
BEFORE:
sage: W = Words([0,1,2])
sage: w = W([0, 1, 1, 2]*4000)
sage: z = W([0, 1, 1, 2]*4000)
sage: %timeit w == z
5 loops, best of 3: 63 ms per loop
AFTER:
sage: W = Words([0,1,2])
sage: w = W([0, 1, 1, 2]*4000)
sage: z = W([0, 1, 1, 2]*4000)
sage: %timeit w == z
125 loops, best of 3: 2.57 ms per loop
}}}
--
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/8187#comment:2>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.