#19494: Make finite word method nb_subword_occurences_in much faster
-------------------------+-------------------------------------------------
       Reporter:         |        Owner:
  slabbe                 |       Status:  needs_info
           Type:         |    Milestone:  sage-6.10
  enhancement            |   Resolution:
       Priority:  major  |    Merged in:
      Component:         |    Reviewers:  Vincent Delecroix
  combinatorics          |  Work issues:
       Keywords:         |       Commit:
        Authors:         |  664fd8e0650d7e9c95fab0d484ae5f800641b2a6
  Sébastien Labbé        |     Stopgaps:
Report Upstream:  N/A    |
         Branch:         |
  u/slabbe/19494         |
   Dependencies:         |
-------------------------+-------------------------------------------------

Comment (by slabbe):

 That binomial was introduced by Jacques Sakarovitch as the theme of a
 chapter in the (first?) Lothaire book. It generalizes the binomial since
 `a^n.binomial(a^k)` is equal to `n.binomial(k)`. Ok, I agree that too many
 methods is too many. I removed it in the last commit.

 I added a new commit allowing easier reviewing and comparison of
 implementations. I still believe we will need one more commit to clean
 this up if finally we think that one implementation is useless or if one
 argument is useless. We will see.

 About using sparse matrices. A first remark is that matrices computed are
 upper triangular. So using sparse will lead eventually to saving half the
 space at least.

 If the size of the alphabet is low, using sparse matrices is about 10
 times slower:

 {{{
 sage: u = words.ThueMorseWord()[:1000]
 sage: L = list(u)
 sage:
 sage: %time u[:20].nb_subword_occurrences_in(u, sparse=False)
 CPU times: user 130 ms, sys: 1.37 ms, total: 131 ms
 Wall time: 131 ms
 347151814258896164694010210266438495
 sage: %time u[:20].nb_subword_occurrences_in(u, sparse=True)
 CPU times: user 979 ms, sys: 5.88 ms, total: 984 ms
 Wall time: 986 ms
 347151814258896164694010210266438495
 }}}

 If the size of alphabet is bigger (computation done with 201x201
 matrices): then sparse or not seems about the same too:
 {{{
 sage: u = Word(range(200))
 sage: %time u.nb_subword_occurrences_in(u*u*u, sparse=True)
 CPU times: user 16.1 s, sys: 27.1 ms, total: 16.1 s
 Wall time: 16.2 s
 20301
 sage: %time u.nb_subword_occurrences_in(u*u*u, sparse=False)
 CPU times: user 16.6 s, sys: 169 ms, total: 16.7 s
 Wall time: 16.7 s
 20301
 }}}

 In comparison, here is the longest word (following the previous example
 construction) the original algorithm can handle in less then 16s:
 {{{
 sage: u = Word(range(60))
 sage: %time u.nb_subword_occurrences_in(u*u*u, algorithm='original')
 CPU times: user 12.1 s, sys: 13 ms, total: 12.1 s
 Wall time: 12.1 s
 1891
 }}}
 I never found an example where the original implementation is better. So,
 that's why I would suggest to just delete it. Or we like the idea of
 having two distinct implementations.

--
Ticket URL: <http://trac.sagemath.org/ticket/19494#comment:5>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to