#8547: implement hidden markov models in Cython from scratch (so can remove the
GHMM standard package from Sage)
---------------------------+------------------------------------------------
   Reporter:  was          |       Owner:  amhou       
       Type:  enhancement  |      Status:  needs_review
   Priority:  major        |   Milestone:  sage-4.4    
  Component:  statistics   |    Keywords:              
     Author:               |    Upstream:  N/A         
   Reviewer:               |      Merged:              
Work_issues:               |  
---------------------------+------------------------------------------------

Comment(by was):

 * I've attached a part 2 patch. I made sure all cdef methods have
 docstrings and also that the doctests are properly Sphinx-formatted (some
 weren't, since they were copied from the old code).

 * Jason said above that IntList.sum doesn't have a doctest for the
 overflow case... but it does, so I don't know what he meant.

 * I read the source code for {{{sage.misc.misc_c.normalize_index}}} and
 cannot bring myself to use it in this situation. That function
 actually returns a Python *list* of Python ints, one for every single index
 into the list that is being sliced! That would easily lead to slowdowns by
 a factor of 50-100 on realistic operations:
 {{{
 sage: timeit('z = sage.misc.misc_c.normalize_index(slice(1,10^5),10^5)')
 # slow because it constructs a big Python list
 125 loops, best of 3: 2.17 ms per loop
 sage: a = stats.IntList([1..10^5])
 sage: timeit('a[1:10^5]')                       # slice is just a memcpy
 625 loops, best of 3: 48.4 µs per loop
 sage: 2.17/0.0484
 44.8347107438017
 }}}
 Here's an example with a step:
 {{{
 sage: a = stats.IntList([1..10^5])
 sage: timeit('a[1:10^5:2]')
 625 loops, best of 3: 92.2 µs per loop
 sage: timeit('z = sage.misc.misc_c.normalize_index(slice(1,10^5,2),10^5)')
 625 loops, best of 3: 772 µs per loop
 }}}
 and that 772 microseconds is *before* we do the actual iteration through
 the returned list of Python ints, convert them to C ints, copy stuff
 around in memory, etc.

 This stats code I'm writing is really meant to be industrial strength --
 the sort of code maybe somebody would use in "realtime" processing of
 large datastreams. I don't want slow functions anywhere in there.

  -- William

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/8547#comment:13>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
