On 23/11/2014 18:07, Volker Braun wrote:
Did you profile your code on the C-level? e.g. using gprof? As a rule of
thumb, guesses about where the bottleneck is are wrong :-)  It's entirely
conceivable that branch prediction and speculative execution solve this
already for you.


Is gprof powerful enough on modern architectures for such programs? From my point of view, no. There are non-free, commercial tools like VTune that do a fantastic measurement job. VTune shows, for example, that a call to std::copy is not as fast as a for loop that the compiler turns into a memcpy (std::copy probably is not turned into one!). I do not think we can see this with gprof.
But OK, you are not supposed to buy VTune...

What about likwid (https://code.google.com/p/likwid)? It is free. Has anybody used it to measure the performance of Cython code?

Likwid and VTune both rely on the hardware performance counters of Intel and AMD processors (I am not sure about AMD support in VTune...).

How large is the data you are sorting? If it is small enough to fit in the caches, ideally in the L1 cache, your modification may improve something; otherwise the code is certainly memory bound and you cannot do much... You have to measure the memory bandwidth your program achieves. VTune does this, and possibly likwid too.
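
For a first rough idea without VTune or likwid, one can time a plain buffer copy and convert it into GB/s. This is only a sketch: the helper names effective_bandwidth_gbs and measure_copy_bandwidth_gbs are made up for illustration, the read-plus-write byte count is a simplifying assumption, and a wall-clock timer is far cruder than hardware performance counters.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes).
double effective_bandwidth_gbs(double bytes_moved, double seconds) {
    if (seconds <= 0.0) return 0.0;  // timer too coarse to say anything
    return bytes_moved / seconds / 1e9;
}

// Hypothetical helper: copy n_bytes from one buffer to another `repeats`
// times and report the rate, counting one read plus one write per byte.
double measure_copy_bandwidth_gbs(std::size_t n_bytes, int repeats) {
    if (n_bytes == 0 || repeats <= 0) return 0.0;
    std::vector<unsigned char> src(n_bytes, 1), dst(n_bytes, 0);

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        std::memcpy(dst.data(), src.data(), n_bytes);
        // Feed a result back into the source to discourage the compiler
        // from collapsing the repeated copies into one.
        src[0] = static_cast<unsigned char>(dst[n_bytes - 1] + 1);
    }
    auto stop = std::chrono::steady_clock::now();

    // Read one byte through a volatile so the copies are not dead code.
    volatile unsigned char sink = dst[n_bytes - 1];
    (void)sink;

    double seconds = std::chrono::duration<double>(stop - start).count();
    return effective_bandwidth_gbs(2.0 * static_cast<double>(n_bytes) * repeats,
                                   seconds);
}
```

Calling measure_copy_bandwidth_gbs with a buffer much larger than the last-level cache gives a number to compare against what the sorting code achieves; with a buffer of a few kilobytes the figure mostly reflects cache speed instead.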

t.d.
C++ std::sort will be able to inline the comparator.

Link-time optimization (e.g. gcc -flto) can in principle also inline on
the level of object code, even when the compiler did not inline (because
of different compilation units, say). However, the libc qsort lives in a
shared library, so link-time optimization cannot reach it.
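
To make the difference concrete, here is a minimal sketch of both calls side by side, assuming plain int keys (the actual data in #17309 may look different). The names compare_ints, sort_with_qsort and sort_with_std_sort are invented for the example. Whether the comparator really gets inlined depends on the compiler and the optimization flags, so this is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator for qsort: it is passed as a function pointer, so a qsort
// that was compiled separately in libc has to call it indirectly.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow of "x - y"
}

void sort_with_qsort(std::vector<int>& v) {
    if (v.empty()) return;  // do not hand qsort a possibly null pointer
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// std::sort is a template: it is instantiated in this translation unit
// together with the lambda, which gives the compiler the chance to inline
// the comparison.
void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int x, int y) { return x < y; });
}
```

Both functions sort in ascending order, so timing them on the same input is a cheap way to see how much the indirect call actually costs.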


On Sunday, November 23, 2014 4:33:46 PM UTC, Nathann Cohen wrote:

    Hello everybody,

    I wrote a bruteforce Cython code recently (#17309) which spends most
    of its time on calls to qsort.

    This is normal, sorting is sort of the most expensive thing I do,
    but to call qsort you need to provide a comparison function. Now, as
    qsort is compiled in a library, the comparison function cannot be
    inlined inside of qsort and I suspect that it has a nontrivial cost
    (given how simple the comparison is).

    Thus, if I copy/paste the original code of qsort into my Cython file
    the code should be faster, but that is ugly.

    Sooooooo... Do you know if there is a way to re-compile my code
    along with the code of qsort without having to copy/paste it?

    Thanks,

    Nathann

--
You received this message because you are subscribed to the Google
Groups "sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sage-devel+unsubscr...@googlegroups.com
<mailto:sage-devel+unsubscr...@googlegroups.com>.
To post to this group, send email to sage-devel@googlegroups.com
<mailto:sage-devel@googlegroups.com>.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.


