Re: [matplotlib-devel] performance (speed) of logarithmic plots
On Thu, Mar 18, 2010 at 6:21 PM, Andrew Hawryluk hawr...@novachem.com wrote:

> I've observed a significant difference in the time required by different
> plotting functions. With a plot of 5000 random data points (all positive,
> non-zero), plt.semilogx takes 3.5 times as long as plt.plot. (Data for the
> case of saving to PDF, ratio changes to about 3.1 for PNG on my machine.)
>
> I used cProfile (script attached) and found several significant differences
> between the profiles of each plotting command. On my first analysis, it
> appears that most of the difference is due to increased use of mathtext in
> semilogx:
>
> ===================================================================
>                                          Plotting command
> cumtime (s)                        plot  semilogx  semilogy  loglog
> ===================================================================
> total running time                0.618     2.192     0.953   1.362
> axis.py:181(draw)                 0.118     1.500     0.412   0.569
> text.py:504(draw)                 0.056     1.353     0.290   0.287
> mathtext.py:2765(__init__)        0.000     1.018     0.104   0.103
> mathtext.py:2772(parse)             ---     1.294     0.143   0.254
> pyparsing.py:1018(parseString)      ---     0.215     0.216   0.221
> pyparsing.py:3129(oneOf)            ---     0.991       ---     ---
> pyparsing.py:3147(<lambda>)         ---     0.358       ---     ---
> lines.py:918(_draw_solid)         0.243     0.358     0.234   0.352
> ===================================================================
>
> It seems that semilogx could be made as fast as semilogy since they have to
> do the same amount of work, but I'm not sure where the differences lie. Can
> anyone suggest where I should look first?
>
> Much thanks,
> Andrew Hawryluk
>
> matplotlib.__version__ = '0.99.1'
> Windows XP Professional Version 2002, Service Pack 3
> Intel Pentium 4 CPU 3.00 GHz, 2.99 GHz, 0.99 GB of RAM

Hello,

How did you get the cumtime listing? The output of the run doesn't produce a cumulative sum table as you showed here.

Platform   : Linux-2.6.31.9-174.fc12.i686.PAE-i686-with-fedora-12-Constantine
Python     : ('CPython', 'tags/r262', '71600')
NumPy      : 1.5.0.dev8038
Matplotlib : 1.0.svn

--
Gökhan

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
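For reference, one way a cumulative-time listing like the one quoted above could be produced with only the standard library is to sort the cProfile results by cumulative time through pstats. This is a minimal sketch and not necessarily how the original table was made; the helper names `cumtime_listing` and `busy_work` are hypothetical, and `busy_work` is only a stand-in for a plotting call.

```python
import cProfile
import io
import pstats


def cumtime_listing(func, *args, limit=10, **kwargs):
    """Profile one call of func and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)

    # Send the report to a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" sorts on the cumtime column; limit caps the number of rows.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def busy_work(n=20000):
    """Hypothetical stand-in for the function being profiled."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(cumtime_listing(busy_work))
```

To profile a plotting command instead, a small wrapper function that calls, say, plt.semilogx followed by plt.savefig could be passed in place of `busy_work`. The per-function rows of each report would then have to be copied side by side by hand to get a comparison table.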
Re: [matplotlib-devel] performance (speed) of logarithmic plots
This is indeed a very interesting result and I am able to reproduce similar ratios for total running time. However, I think the semilogx result is somewhat of a red herring. If you change the order of the tests in your script, you'll notice that the first *log* plot always takes the longest run time. If you run each test in a separate process, all of the *log* run times are approximately equal (with loglog being slightly slower).

The reason for this is the caching of mathtext expressions. I agree that mathtext is the bottleneck -- but mathtext expressions are only parsed and rendered the first time they are encountered, and simply pulled from a cache after that.

It's sort of a known issue that mathtext is slow-ish. It's a very function-call-heavy and object-oriented bit of code, and most attempts at optimization seem to lead to too much uglification. The algorithms themselves are from TeX, so I don't know if there's much room for improvement, but there is something about the translation from Pascal/C to Python that creates a very different performance profile.

An interesting experiment may be to disable the mathtext rendering for log plots (by setting the axis formatters to something static) and compare those numbers. That would give a better sense of the overhead of merely log-transforming the points and of the transformation system itself. I don't think a factor of 2 is too problematic, given all of the extra work that has to be done to maintain two copies of the data, the extra care needed to calculate xlim and ylim, etc.

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
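The two experiments suggested above (timing each case in its own process so that no mathtext cache is shared, and replacing the log tick formatter with a static one) could be sketched roughly as follows. This is an illustration under assumptions, not a tested benchmark: `time_in_fresh_process`, `MPL_SETUP`, `MATHTEXT_BODY`, and `STATIC_BODY` are hypothetical names, the Agg backend and PNG output are arbitrary choices, and only the major tick formatter is replaced.

```python
import subprocess
import sys
import textwrap

# Child script: run the setup untimed, then time only the body.
TEMPLATE = """\
import time
{setup}
_t0 = time.perf_counter()
{body}
print(time.perf_counter() - _t0)
"""


def time_in_fresh_process(setup, body):
    """Run body in a new interpreter and return its elapsed time in seconds."""
    script = TEMPLATE.format(setup=textwrap.dedent(setup),
                             body=textwrap.dedent(body))
    result = subprocess.run([sys.executable, "-c", script],
                            capture_output=True, text=True, check=True)
    # The elapsed time is the last line the child prints.
    return float(result.stdout.strip().splitlines()[-1])


# Assumed setup: 5000 positive random points, non-interactive backend.
MPL_SETUP = """
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
x = np.random.rand(5000) + 1e-3
y = np.random.rand(5000) + 1e-3
"""

# Default log formatter, which renders tick labels through mathtext.
MATHTEXT_BODY = """
plt.semilogx(x, y)
plt.savefig(io.BytesIO(), format="png")
"""

# Static plain-text labels on the major ticks, so mathtext is not needed there.
STATIC_BODY = """
plt.semilogx(x, y)
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda v, pos: "%g" % v))
plt.savefig(io.BytesIO(), format="png")
"""
```

With matplotlib and NumPy installed, calling `time_in_fresh_process(MPL_SETUP, MATHTEXT_BODY)` and `time_in_fresh_process(MPL_SETUP, STATIC_BODY)` would give two numbers whose difference approximates the cost of the mathtext tick labels. Because each call starts a new interpreter, the order of the calls should not matter.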
[matplotlib-devel] performance (speed) of logarithmic plots
I've observed a significant difference in the time required by different plotting functions. With a plot of 5000 random data points (all positive, non-zero), plt.semilogx takes 3.5 times as long as plt.plot. (Data for the case of saving to PDF, ratio changes to about 3.1 for PNG on my machine.)

I used cProfile (script attached) and found several significant differences between the profiles of each plotting command. On my first analysis, it appears that most of the difference is due to increased use of mathtext in semilogx:

===================================================================
                                         Plotting command
cumtime (s)                        plot  semilogx  semilogy  loglog
===================================================================
total running time                0.618     2.192     0.953   1.362
axis.py:181(draw)                 0.118     1.500     0.412   0.569
text.py:504(draw)                 0.056     1.353     0.290   0.287
mathtext.py:2765(__init__)        0.000     1.018     0.104   0.103
mathtext.py:2772(parse)             ---     1.294     0.143   0.254
pyparsing.py:1018(parseString)      ---     0.215     0.216   0.221
pyparsing.py:3129(oneOf)            ---     0.991       ---     ---
pyparsing.py:3147(<lambda>)         ---     0.358       ---     ---
lines.py:918(_draw_solid)         0.243     0.358     0.234   0.352
===================================================================

It seems that semilogx could be made as fast as semilogy since they have to do the same amount of work, but I'm not sure where the differences lie. Can anyone suggest where I should look first?

Much thanks,
Andrew Hawryluk

matplotlib.__version__ = '0.99.1'
Windows XP Professional Version 2002, Service Pack 3
Intel Pentium 4 CPU 3.00 GHz, 2.99 GHz, 0.99 GB of RAM

Attachment: semilogPerformance.py
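As a quick check of the figures in the table, the slowdown of each command relative to plt.plot and the share of time spent in text drawing can be recomputed from the reported cumulative times. The numbers below are copied from the "total running time" and "text.py:504(draw)" rows; the function names are hypothetical and the script only does arithmetic on the reported values.

```python
# Cumulative times in seconds, copied from the table above.
totals = {"plot": 0.618, "semilogx": 2.192, "semilogy": 0.953, "loglog": 1.362}
text_draw = {"plot": 0.056, "semilogx": 1.353, "semilogy": 0.290, "loglog": 0.287}


def ratios_to_plot(totals):
    """Total running time of each command divided by that of plt.plot."""
    base = totals["plot"]
    return {name: t / base for name, t in totals.items()}


def text_share(totals, text_draw):
    """Fraction of each command's total time spent under text.py draw."""
    return {name: text_draw[name] / totals[name] for name in totals}


ratios = ratios_to_plot(totals)
shares = text_share(totals, text_draw)
for name in totals:
    print(f"{name:9s} {ratios[name]:5.2f}x plot, "
          f"text drawing {shares[name]:4.0%} of total")
```

The semilogx ratio reproduces the stated factor of about 3.5, and text drawing accounts for well over half of the semilogx total in this run, which is consistent with mathtext being the dominant cost.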