#19494: Make finite word method nb_subword_occurences_in much faster
-------------------------------------+-------------------------------------
Reporter: slabbe | Owner:
Type: enhancement | Status: needs_work
Priority: major | Milestone: sage-6.10
Component: combinatorics | Resolution:
Keywords: | Merged in:
Authors: Sébastien Labbé | Reviewers: Vincent Delecroix
Report Upstream: N/A | Work issues:
Branch: | Commit:
u/slabbe/19494-folded | 580eb72f9868895e6cb7e773324bcf22995ac15e
Dependencies: | Stopgaps:
-------------------------------------+-------------------------------------
Changes (by slabbe):
* status: needs_review => needs_work
* commit: 89f0878a41bbe4b7a68c11dd5d00efa6f7ad7950 =>
580eb72f9868895e6cb7e773324bcf22995ac15e
* branch: u/vdelecroix/19494 => u/slabbe/19494-folded
Comment:
For reviewing purposes, I updated my branch with your code as a fourth
algorithm.
I found some cases where the matrices algorithm is faster:
{{{
sage: u = words.ThueMorseWord()[:10000]
sage: v = Word([0,1,0,1])
sage: %time v.nb_subword_occurrences_in(u, 'vincent')
CPU times: user 214 ms, sys: 38.9 ms, total: 253 ms
Wall time: 233 ms
26041662500000
sage: %time v.nb_subword_occurrences_in(u, 'matrices')
CPU times: user 182 ms, sys: 7.84 ms, total: 190 ms
Wall time: 192 ms
26041662500000
}}}
{{{
sage: u = words.ThueMorseWord()[:100000]
sage: %time v.nb_subword_occurrences_in(u, 'vincent')
CPU times: user 2.08 s, sys: 64.7 ms, total: 2.15 s
Wall time: 2.13 s
260416666250000000
sage: %time v.nb_subword_occurrences_in(u, 'matrices')
CPU times: user 1.82 s, sys: 86 ms, total: 1.9 s
Wall time: 1.86 s
260416666250000000
}}}
Why? When is your algorithm faster than the matrix algorithm?
I also found a case where the two algorithms do no return the same answer:
{{{
sage: v = Word([0,1,0,1,1,0])
sage: %time v.nb_subword_occurrences_in(u, 'matrices')
CPU times: user 2.23 s, sys: 72.7 ms, total: 2.31 s
Wall time: 2.27 s
21700086762157985968744680
sage: %time v.nb_subword_occurrences_in(u, 'vincent')
CPU times: user 719 ms, sys: 49 ms, total: 768 ms
Wall time: 742 ms
21702690928814235968754680L
}}}
You might want to use defaultdict instead which simplifies both loops:
{{{
#!python
# record the position of letters in self
from collections import defaultdict
pos = defaultdict(list)
for i,a in enumerate(self):
pos[a].append(i)
# compute the occurrences of all prefixes of self as subwords in other
occ = [0] * (len(self)+1)
occ[0] = 1
for a in other:
for i in pos[a]:
occ[i+1] += occ[i]
# return only the number of occurrences of self
return occ[-1]
}}}
----
New commits:
||[http://git.sagemath.org/sage.git/commit/?id=93a28c5474f550a223c22e00864909f1c4809ec6
93a28c5]||{{{Trac #19494: making nb_subword_occurences_in much faster}}}||
||[http://git.sagemath.org/sage.git/commit/?id=580eb72f9868895e6cb7e773324bcf22995ac15e
580eb72]||{{{Trac #19494: adding vincent algorithm}}}||
--
Ticket URL: <http://trac.sagemath.org/ticket/19494#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.