Call accounting itself also costs time; for very small functions it can be a significant fraction of the total. Also, Cython's profiling hooks into CPython's profiling machinery, so I'm pretty sure accounting is only done when control returns to Python. It's unlikely to give you accurate numbers for cdef inline stuff.
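The accounting cost is easy to see even in plain CPython. Here is a hypothetical micro-benchmark (not related to the Sage code) that times a loop of calls to a trivial function with and without cProfile's per-call accounting:

```python
import cProfile
import time

def tiny():
    # trivial function: the call itself is cheaper than the
    # profiler's per-call bookkeeping
    pass

def work(n=200000):
    for _ in range(n):
        tiny()

# plain run, no profiler
t0 = time.perf_counter()
work()
plain = time.perf_counter() - t0

# same run under cProfile
profiler = cProfile.Profile()
t0 = time.perf_counter()
profiler.runcall(work)
profiled = time.perf_counter() - t0

# With the profiler enabled, every call to tiny() pays an accounting
# cost, so the profiled run is noticeably slower than the plain run.
print(profiled > plain)
```

On typical machines the profiled loop runs several times slower, which is exactly the effect that inflates the reported times of very small functions.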
You could do a debug build and use the standard gcc gprof; this is more likely to be accurate, but has other flaws (debug build, no inlining).

%crun / gperftools is a sampling profiler, which is a completely different mechanism. Its output has much smaller systematic error (the overhead does not depend on function length etc.), but of course it has statistical noise and no guarantee that all function invocations are covered.

From http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html:

    Analyzing Text Output

    Text mode has lines of output that look like this:

        14   2.1%  17.2%       58   8.7% std::_Rb_tree::find

    Here is how to interpret the columns:

    1. Number of profiling samples in this function
    2. Percentage of profiling samples in this function
    3. Percentage of profiling samples in the functions printed so far
    4. Number of profiling samples in this function and its callees
    5. Percentage of profiling samples in this function and its callees
    6. Function name

On Sunday, August 23, 2015 at 1:51:58 PM UTC-4, Simon King wrote:
>
> Hi!
>
> I try to speed up some little functions in
> sage.data_structures.bounded_integer_sequences, which will hopefully soon
> be frequently used by Cython code for path algebra elements. But I have
> problems understanding the results of profiling.
>
> I use cProfile.run, putting "# cython: profile=True" at the top of each
> .pyx file that I want to profile. Would %crun provide more detailed
> information? If so, how can one interpret the output of %crun? Column
> captions are missing.
>
> cProfile.run tells me that each of the following functions is called
> 286101 times, needing the indicated *cumulative* (hopefully that's the
> one that is interesting here) times:
>
>     biseq_init          1.434s
>     bitset_or           0.250s
>     bitset_lshift       0.463s
>     bitset_init         0.301s
>     limb_lower_bits_up  0.297s (for 335005 calls actually)
>     biseq_init_concat   3.348s
>
> The times do not seem to match, because the code is
>
>     @cython.overflowcheck
>     cdef inline bint biseq_init(biseq_t R, mp_size_t l, mp_bitcnt_t itemsize) except -1:
>         cdef mp_bitcnt_t totalbitsize
>         if l:
>             totalbitsize = l * itemsize
>         else:
>             totalbitsize = 1
>         bitset_init(R.data, totalbitsize)
>         R.length = l
>         R.itembitsize = itemsize
>         R.mask_item = limb_lower_bits_up(itemsize)
>
> and
>
>     cdef inline bint biseq_init_concat(biseq_t R, biseq_t S1, biseq_t S2) except -1:
>         biseq_init(R, S1.length + S2.length, S1.itembitsize)
>         sig_on()
>         bitset_lshift(R.data, S2.data, S1.length * S1.itembitsize)
>         bitset_or(R.data, R.data, S1.data)
>         sig_off()
>
> So, the (cumulative) time for biseq_init_concat should be the time for
> biseq_init plus the time for bitset_lshift plus the time for bitset_or.
> But
>
>     1.434 + 0.463 + 0.250 - 3.348 == -1.201
>
> So, how can one explain the missing 1.201s?
>
> Similarly, the time for biseq_init should only be a little more than
> the time for bitset_init plus the time for limb_lower_bits_up. But
>
>     0.301 + 0.297 - 1.434 == -0.836
>
> which means that I have no clue where more than 50% of the time
> is coming from.
>
> How can it be explained?
>
> Best regards,
> Simon
>

--
You received this message because you are subscribed to the Google Groups "sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.
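For what it's worth, the "missing" time in the quoted numbers is consistent with a fixed per-call accounting cost: spread over 286101 calls it comes to a few microseconds each, which is a plausible per-call profiling overhead. A back-of-envelope check (just the arithmetic from the question; the "overhead" interpretation is my assumption):

```python
# Numbers quoted from Simon's cProfile output.
calls = 286101

concat_cum = 3.348                # biseq_init_concat, cumulative time
children = 1.434 + 0.463 + 0.250  # biseq_init + bitset_lshift + bitset_or

missing = concat_cum - children
print(round(missing, 3))          # 1.201 s unexplained

# If that time were pure per-call accounting overhead, it would amount
# to roughly this many microseconds per biseq_init_concat call:
print(round(missing / calls * 1e6, 1))  # 4.2
```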
