#19494: Make finite word method nb_subword_occurences_in much faster
-------------------------------------+-------------------------------------
       Reporter:  slabbe             |        Owner:
           Type:  enhancement        |       Status:  needs_work
       Priority:  major              |    Milestone:  sage-6.10
      Component:  combinatorics      |   Resolution:
       Keywords:                     |    Merged in:
        Authors:  Sébastien Labbé    |    Reviewers:  Vincent Delecroix
Report Upstream:  N/A                |  Work issues:
         Branch:                     |       Commit:
  u/slabbe/19494-folded              |  580eb72f9868895e6cb7e773324bcf22995ac15e
   Dependencies:                     |     Stopgaps:
-------------------------------------+-------------------------------------
Changes (by slabbe):

 * status:  needs_review => needs_work
 * commit:  89f0878a41bbe4b7a68c11dd5d00efa6f7ad7950 =>
     580eb72f9868895e6cb7e773324bcf22995ac15e
 * branch:  u/vdelecroix/19494 => u/slabbe/19494-folded


Comment:

 For reviewing purposes, I updated my branch with your code as a fourth
 algorithm.

 I found some cases where the matrices algorithm is faster:

 {{{
 sage: u = words.ThueMorseWord()[:10000]
 sage: v = Word([0,1,0,1])
 sage: %time v.nb_subword_occurrences_in(u, 'vincent')
 CPU times: user 214 ms, sys: 38.9 ms, total: 253 ms
 Wall time: 233 ms
 26041662500000
 sage: %time v.nb_subword_occurrences_in(u, 'matrices')
 CPU times: user 182 ms, sys: 7.84 ms, total: 190 ms
 Wall time: 192 ms
 26041662500000
 }}}

 {{{
 sage: u = words.ThueMorseWord()[:100000]
 sage: %time v.nb_subword_occurrences_in(u, 'vincent')
 CPU times: user 2.08 s, sys: 64.7 ms, total: 2.15 s
 Wall time: 2.13 s
 260416666250000000
 sage: %time v.nb_subword_occurrences_in(u, 'matrices')
 CPU times: user 1.82 s, sys: 86 ms, total: 1.9 s
 Wall time: 1.86 s
 260416666250000000
 }}}

 Why? When is your algorithm faster than the matrix algorithm?

 I also found a case where the two algorithms do no return the same answer:

 {{{
 sage: v = Word([0,1,0,1,1,0])
 sage: %time v.nb_subword_occurrences_in(u, 'matrices')
 CPU times: user 2.23 s, sys: 72.7 ms, total: 2.31 s
 Wall time: 2.27 s
 21700086762157985968744680
 sage: %time v.nb_subword_occurrences_in(u, 'vincent')
 CPU times: user 719 ms, sys: 49 ms, total: 768 ms
 Wall time: 742 ms
 21702690928814235968754680L
 }}}

 You might want to use defaultdict instead which simplifies both loops:

 {{{
 #!python
 # record the position of letters in self
 from collections import defaultdict
 pos = defaultdict(list)
 for i,a in enumerate(self):
     pos[a].append(i)

 # compute the occurrences of all prefixes of self as subwords in other
 occ = [0] * (len(self)+1)
 occ[0] = 1
 for a in other:
     for i in pos[a]:
        occ[i+1] += occ[i]

 # return only the number of occurrences of self
 return occ[-1]
 }}}
 ----
 New commits:
 
||[http://git.sagemath.org/sage.git/commit/?id=93a28c5474f550a223c22e00864909f1c4809ec6
 93a28c5]||{{{Trac #19494: making nb_subword_occurences_in much faster}}}||
 
||[http://git.sagemath.org/sage.git/commit/?id=580eb72f9868895e6cb7e773324bcf22995ac15e
 580eb72]||{{{Trac #19494: adding vincent algorithm}}}||

--
Ticket URL: <http://trac.sagemath.org/ticket/19494#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to