[sage-trac] Re: [Sage] #8187: improve equality tests for words

Sage Thu, 04 Feb 2010 19:35:22 -0800

#8187: improve equality tests for words
-----------------------------+----------------------------------------------
   Reporter:  slabbe         |       Owner:  sage-combinat
       Type:  enhancement    |      Status:  new          
   Priority:  major          |   Milestone:  sage-4.3.2   
  Component:  combinatorics  |    Keywords:               
     Author:                 |    Upstream:  N/A          
   Reviewer:                 |      Merged:               
Work_issues:                 |  
-----------------------------+----------------------------------------------
Description changed by slabbe:


Old description:

> More often, when we compare two words, we test their equality and not
> that one is less than the other. So why not implement the equatlity
> test!? This ticket does this for datatypes and for math objects. It also
> improve the comparison function (by removing it!).
>
> 1. This patch adds equality test for `WordDatatype_str`,
> `WordDatatype_list` and `WordDatatype_tuple`  (via `__richcmp__`) usefull
> when comparing two words represented by the same kind of data. It is now
> as fast as the python object :
>
> {{{
> BEFORE:
>
>     sage: w = Word(range(10000))
>     sage: z = Word(range(10000))
>     sage: %timeit w == z
>     125 loops, best of 3: 4.08 ms per loop
>
> AFTER:
>
>     sage: w = Word(range(10000))
>     sage: z = Word(range(10000))
>     sage: %timeit w == z
>     625 loops, best of 3: 422 µs per loop
>
> PYTHON OBJECT:
>
>     sage: w = range(10000)
>     sage: z = range(10000)
>     sage: %timeit w == z
>     625 loops, best of 3: 442 µs per loop
> }}}
>
> {{{
> BEFORE:
>
>     sage: w = Word(tuple(range(10000)))
>     sage: z = Word(tuple(range(10000)))
>     sage: %timeit w == z
>     125 loops, best of 3: 3.97 ms per loop
>
> AFTER:
>
>     sage: w = Word(tuple(range(10000)))
>     sage: z = Word(tuple(range(10000)))
>     sage: %timeit w == z
>     625 loops, best of 3: 419 µs per loop
>
> PYTHON OBJECT:
>
>     sage: w = tuple(range(10000))
>     sage: z = tuple(range(10000))
>     sage: %timeit w == z
>     625 loops, best of 3: 420 µs per loop
> }}}
>
> {{{
> BEFORE:
>
>     sage: w = Word('a'*10000)
>     sage: z = Word('a'*10000)
>     sage: %timeit w == z
>     125 loops, best of 3: 3.9 ms per loop
>
> AFTER:
>
>     sage: w = Word('a'*10000)
>     sage: z = Word('a'*10000)
>     sage: %timeit w == z
>     625 loops, best of 3: 2.36 µs per loop
>
> PYTHON OBJECT:
>
>     sage: w = 'a'*10000
>     sage: z = 'a'*10000
>     sage: %timeit w == z
>     625 loops, best of 3: 2.03 µs per loop
>
> }}}
>
> 2. Remove the `__cmp__` from `FiniteWord_class` since the same function
> in `Word_class` does the job anyway and in a cleaner way : it doesn't use
> the (useless?) coerce function. Surprinsingly, removing it makes it
> faster :
>
> {{{
> BEFORE:
>
>     sage: w = Word([0]*10000)
>     sage: z = Word([0]*10000, alphabet=[0,1])
>     sage: type(w)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: type(z)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: %timeit w.__cmp__(w)
>     125 loops, best of 3: 3.79 ms per loop
>     sage: %timeit w.__cmp__(z)
>     25 loops, best of 3: 13.3 ms per loop
>     sage: %timeit z.__cmp__(w)
>     5 loops, best of 3: 50.1 ms per loop
>     sage: %timeit z.__cmp__(z)
>     25 loops, best of 3: 35.7 ms per loop
>

> AFTER:
>
>     sage: w = Word([0]*10000)
>     sage: z = Word([0]*10000, alphabet=[0,1])
>     sage: type(w)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: type(z)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: %timeit w.__cmp__(w)
>     125 loops, best of 3: 3.89 ms per loop
>     sage: %timeit w.__cmp__(z)
>     125 loops, best of 3: 5.4 ms per loop
>     sage: %timeit z.__cmp__(w)
>     25 loops, best of 3: 35.9 ms per loop
>     sage: %timeit z.__cmp__(z)
>     25 loops, best of 3: 35.7 ms per loop
> }}}
>
> NOTE : The difference between w and z above is that the parent of w is
> the alphabet of all python objects which uses the cmp of python to
> compare the letters whereas z compares its letters relatively to the
> order of the letters defined by the parent (here 0 < 1 but one could also
> say 1 < 0) which is slower.
>

> 3. Add the `__eq__` and `__ne__` for `Word_class` because it is faster
> than using `__cmp__` (especially when parent alphabet are defined). These
> functions are used to test the equality of two words represented by
> different python objects (datatypes).
>
> no parents :
>
> {{{
> BEFORE:
>
>     sage: L = range(10000)
>     sage: t = tuple(L)
>     sage: w = Word(L)
>     sage: z = Word(t)
>     sage: type(w)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: type(z)
>     <class 'sage.combinat.words.word.FiniteWord_tuple'>
>     sage: %timeit w == z
>     125 loops, best of 3: 3.69 ms per loop
>

> AFTER:
>
>     sage: L = range(10000)
>     sage: t = tuple(L)
>     sage: w = Word(L)
>     sage: z = Word(t)
>     sage: type(w)
>     <class 'sage.combinat.words.word.FiniteWord_list'>
>     sage: type(z)
>     <class 'sage.combinat.words.word.FiniteWord_tuple'>
>     sage: %timeit w == z
>     625 loops, best of 3: 1.44 ms per loop
> }}}
>
> with parents (!!):
>
> {{{
> BEFORE:
>
>     sage: W = Words([0,1,2])
>     sage: w = W([0, 1, 1, 2]*4000)
>     sage: z = W([0, 1, 1, 2]*4000)
>     sage: %timeit w == z
>     5 loops, best of 3: 63 ms per loop
>
> AFTER:
>
>     sage: W = Words([0,1,2])
>     sage: w = W([0, 1, 1, 2]*4000)
>     sage: z = W([0, 1, 1, 2]*4000)
>     sage: %timeit w == z
>     125 loops, best of 3: 2.57 ms per loop
> }}}

New description:

 More often, when we compare two words, we test their equality and not that
 one is less than the other. So why not implement the equatlity test!? This
 ticket does this for datatypes and for math objects. It also improve the
 comparison function (by removing it!).

 1. This patch adds equality test for `WordDatatype_str`,
 `WordDatatype_list` and `WordDatatype_tuple`  (via `__richcmp__`) usefull
 when comparing two words represented by the same kind of data. It is now
 as fast as the python object :

 {{{
 BEFORE:

     sage: w = Word(range(10000))
     sage: z = Word(range(10000))
     sage: %timeit w == z
     125 loops, best of 3: 4.08 ms per loop

 AFTER:

     sage: w = Word(range(10000))
     sage: z = Word(range(10000))
     sage: %timeit w == z
     625 loops, best of 3: 422 µs per loop

 PYTHON OBJECT:

     sage: w = range(10000)
     sage: z = range(10000)
     sage: %timeit w == z
     625 loops, best of 3: 442 µs per loop
 }}}

 {{{
 BEFORE:

     sage: w = Word(tuple(range(10000)))
     sage: z = Word(tuple(range(10000)))
     sage: %timeit w == z
     125 loops, best of 3: 3.97 ms per loop

 AFTER:

     sage: w = Word(tuple(range(10000)))
     sage: z = Word(tuple(range(10000)))
     sage: %timeit w == z
     625 loops, best of 3: 419 µs per loop

 PYTHON OBJECT:

     sage: w = tuple(range(10000))
     sage: z = tuple(range(10000))
     sage: %timeit w == z
     625 loops, best of 3: 420 µs per loop
 }}}

 {{{
 BEFORE:

     sage: w = Word('a'*10000)
     sage: z = Word('a'*10000)
     sage: %timeit w == z
     125 loops, best of 3: 3.9 ms per loop

 AFTER:

     sage: w = Word('a'*10000)
     sage: z = Word('a'*10000)
     sage: %timeit w == z
     625 loops, best of 3: 2.36 µs per loop

 PYTHON OBJECT:

     sage: w = 'a'*10000
     sage: z = 'a'*10000
     sage: %timeit w == z
     625 loops, best of 3: 2.03 µs per loop

 }}}

 2. Remove the `__cmp__` from `FiniteWord_class` since the same function in
 `Word_class` does the job anyway and in a cleaner way : it doesn't use the
 (useless?) coerce function. Surprinsingly, removing it makes it faster :

 {{{
 BEFORE:

     sage: w = Word([0]*10000)
     sage: z = Word([0]*10000, alphabet=[0,1])
     sage: type(w)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: type(z)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: %timeit w.__cmp__(w)
     125 loops, best of 3: 3.79 ms per loop
     sage: %timeit w.__cmp__(z)
     25 loops, best of 3: 13.3 ms per loop
     sage: %timeit z.__cmp__(w)
     5 loops, best of 3: 50.1 ms per loop
     sage: %timeit z.__cmp__(z)
     25 loops, best of 3: 35.7 ms per loop


 AFTER:

     sage: w = Word([0]*10000)
     sage: z = Word([0]*10000, alphabet=[0,1])
     sage: type(w)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: type(z)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: %timeit w.__cmp__(w)
     125 loops, best of 3: 3.89 ms per loop
     sage: %timeit w.__cmp__(z)
     125 loops, best of 3: 5.4 ms per loop
     sage: %timeit z.__cmp__(w)
     25 loops, best of 3: 35.9 ms per loop
     sage: %timeit z.__cmp__(z)
     25 loops, best of 3: 35.7 ms per loop
 }}}

 NOTE : The difference between w and z above is that the parent of w is the
 alphabet of all python objects which uses the cmp of python to compare the
 letters whereas z compares its letters relatively to the order of the
 letters defined by its parent (here 0 < 1 but one could also say 1 < 0)
 which is slower.


 3. Add the `__eq__` and `__ne__` for `Word_class` because it is faster
 than using `__cmp__` (especially when parent alphabet are defined). These
 functions are used to test the equality of two words represented by
 different python objects (datatypes).

 no parents :

 {{{
 BEFORE:

     sage: L = range(10000)
     sage: t = tuple(L)
     sage: w = Word(L)
     sage: z = Word(t)
     sage: type(w)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: type(z)
     <class 'sage.combinat.words.word.FiniteWord_tuple'>
     sage: %timeit w == z
     125 loops, best of 3: 3.69 ms per loop


 AFTER:

     sage: L = range(10000)
     sage: t = tuple(L)
     sage: w = Word(L)
     sage: z = Word(t)
     sage: type(w)
     <class 'sage.combinat.words.word.FiniteWord_list'>
     sage: type(z)
     <class 'sage.combinat.words.word.FiniteWord_tuple'>
     sage: %timeit w == z
     625 loops, best of 3: 1.44 ms per loop
 }}}

 with parents (!!):

 {{{
 BEFORE:

     sage: W = Words([0,1,2])
     sage: w = W([0, 1, 1, 2]*4000)
     sage: z = W([0, 1, 1, 2]*4000)
     sage: %timeit w == z
     5 loops, best of 3: 63 ms per loop

 AFTER:

     sage: W = Words([0,1,2])
     sage: w = W([0, 1, 1, 2]*4000)
     sage: z = W([0, 1, 1, 2]*4000)
     sage: %timeit w == z
     125 loops, best of 3: 2.57 ms per loop
 }}}

--

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/8187#comment:2>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

[sage-trac] Re: [Sage] #8187: improve equality tests for words

Reply via email to