Re: [matplotlib-devel] Axes.add_line() is oddly slow?

2008-10-07 Thread Michael Droettboom
According to lsprofcalltree, the slowness appears to be entirely in the 
units code by a wide margin -- which is unfortunately code I understand 
very little about.  The difference in timing before and after adding the 
line to the axes appears to be because the unit conversion is not 
invalidated until the line has been added to an axes.


In units.get_converter(), it iterates through every *value* in the data 
to see if any of them require unit conversion, and returns the first one 
it finds.  It seems like if we're passing in a numpy array of numbers 
(i.e. not array of objects), then we're pretty much guaranteed from the 
get-go not to find a single value that requires unit conversion so we 
might as well not look.  Am I making the wrong assumption?


However, for lists, it also seems that, since the code returns the first 
converter it finds, maybe it could just look at the first element of the 
sequence, rather than the entire sequence.  If the first element is not 
in the same unit as everything else, then the result will be broken 
anyway.  For example, if I hack evans_test.py to contain a single int 
amongst the list of "Foo" objects in the data, I get an exception 
anyway, even as the code stands now.


I have attached a patch against units.py to speed up the first case 
(passing numpy arrays).  I think I need more feedback from the units 
experts on whether my suggestion for lists (to look only at the first 
element) is reasonable.


Feel free to commit the patch if it seems reasonable to those who know 
more about units than I do.


Mike

Eric Firing wrote:
I am getting very inconsistent timings when looking into plotting a line 
with a very large number of points.  Axes.add_line() is very slow, and 
the time is taken by Axes._update_line_limits().  But when I simply run 
the latter, on a Line2D of the same dimensions, it can be fast.


import matplotlib
matplotlib.use('template')
import numpy as np
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
from time import time
t = time(); ax.add_line(LL); time()-t
###16.621543884277344
LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax.add_line(LL); time()-t
###16.579419136047363
## We added two identical lines, each took 16 seconds.

LL = mlines.Line2D(np.arange(1.5e6), np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.1733548641204834
## But when we made another identical line, updating the limits was
## fast.

# Below are similar experiments:
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.18362092971801758

## with a fresh axes:
plt.clf()
ax = plt.gca()
LL = mlines.Line2D(np.arange(1.5e6), 2*np.sin(np.arange(1.5e6)))
t = time(); ax._update_line_limits(LL); time()-t
###0.22244811058044434

t = time(); ax.add_line(LL); time()-t
###16.724560976028442

What is going on?  I used print statements inside add_line() to verify 
that all the time is in _update_line_limits(), which runs one or two 
orders of magnitude slower when run inside of add_line than when run 
outside--even if I run the preceding parts of add_line first.


Eric

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Matplotlib-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
  


--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

Index: lib/matplotlib/units.py
===================================================================
--- lib/matplotlib/units.py (revision 6142)
+++ lib/matplotlib/units.py (working copy)
@@ -44,6 +44,7 @@
 
 """
 from matplotlib.cbook import iterable, is_numlike
+import numpy as np
 
 class AxisInfo:
     'information to support default axis labeling and tick labeling'
@@ -127,6 +128,9 @@
         converter = self.get(classx)
 
         if converter is None and iterable(x):
+            if isinstance(x, np.ndarray) and x.dtype != np.object:
+                return None
+
             for thisx in x:
                 converter = self.get_converter( thisx )
                 if converter: break

[matplotlib-devel] Installation bug on OS X

2008-10-07 Thread Keaton Mowery
Hey all,

I hope this is the right list for this sort of thing, but here goes.
My installation of matplotlib (via macports) bombed out with this
error:

Traceback (most recent call last):
  File "setup.py", line 125, in <module>
    if check_for_tk() or (options['build_tkagg'] is True):
  File 
"/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py25-matplotlib/work/matplotlib-0.98.3/setupext.py",
line 841, in check_for_tk
explanation = add_tk_flags(module)
  File 
"/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py25-matplotlib/work/matplotlib-0.98.3/setupext.py",
line 1055, in add_tk_flags
module.libraries.extend(['tk' + tk_ver, 'tcl' + tk_ver])
UnboundLocalError: local variable 'tk_ver' referenced before assignment

I fixed it by adding
tcl_lib_dir = ""
tk_lib_dir = ""
tk_ver = ""
at line 1033 in setupext.py.  That way, if we do get an exception in
the ensuing try block, the variables are still defined.  This seemed
to clear things up nicely.  Hope that's clear... feel free to ask for
any further debugging info.  Thanks!
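The pattern behind this fix (binding the names before the try block so a
failure inside it cannot leave them undefined) can be shown generically;
this is a sketch, not the actual setupext.py code, and `probe` is a
hypothetical stand-in for the Tk version detection logic:

```python
def find_tk_version(probe):
    # Pre-initialize, as in the reported fix: if `probe` raises,
    # `tk_ver` still exists and no UnboundLocalError can occur later.
    tk_ver = ""
    try:
        tk_ver = probe()
    except Exception:
        pass  # tk_ver remains the empty-string default
    return tk_ver
```

With the pre-initialization removed, the failing-probe case would raise
UnboundLocalError exactly as in the traceback above.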

Keaton Mowery



Re: [matplotlib-devel] Axes.add_line() is oddly slow?

2008-10-07 Thread John Hunter
On Tue, Oct 7, 2008 at 9:18 AM, Michael Droettboom <[EMAIL PROTECTED]> wrote:
> According to lsprofcalltree, the slowness appears to be entirely in the
> units code by a wide margin -- which is unfortunately code I understand very
> little about.  The difference in timing before and after adding the line to
> the axes appears to be because the unit conversion is not invalidated until
> the line has been added to an axes.
>
> In units.get_converter(), it iterates through every *value* in the data to
> see if any of them require unit conversion, and returns the first one it
> finds.  It seems like if we're passing in a numpy array of numbers (i.e. not
> array of objects), then we're pretty much guaranteed from the get-go not to
> find a single value that requires unit conversion so we might as well not
> look.  Am I making the wrong assumption?
>
> However, for lists, it also seems that, since the code returns the first
> converter it finds, maybe it could just look at the first element of the
> sequence, rather than the entire sequence.  If the first is not in the same
> unit as everything else, then the result will be broken anyway.

I made this change -- return the converter from the first element --
and added Michael's non-object numpy array optimization too.  The
units code needs some attention, I just haven't been able to get to
it...

This helps performance considerably -- on backend driver:

Before:
  Backend agg took 1.32 minutes to complete
  Backend ps took 1.37 minutes to complete
  Backend pdf took 1.78 minutes to complete
  Backend template took 0.83 minutes to complete
  Backend svg took 1.53 minutes to complete

After:
  Backend agg took 1.08 minutes to complete
  Backend ps took 1.15 minutes to complete
  Backend pdf took 1.57 minutes to complete
  Backend template took 0.61 minutes to complete
  Backend svg took 1.31 minutes to complete

Obviously, the results for tests focused on lines with lots of data
would be more dramatic.


Thanks for these suggestions.
JDH



Re: [matplotlib-devel] SF.net SVN: matplotlib:[6166] trunk/matplotlib/lib/matplotlib/units.py

2008-10-07 Thread Michael Droettboom
This isn't quite what I was suggesting (and seems to be equivalent to 
the code as before).  In the common case where there are no units in the 
data, this will still traverse the entire list.

I think replacing the whole loop with:

  converter = self.get_converter(iter(x).next())

would be even better.  (Since lists of data should not be heterogeneous 
anyway...)

Mike

[EMAIL PROTECTED] wrote:
> Revision: 6166
>   http://matplotlib.svn.sourceforge.net/matplotlib/?rev=6166&view=rev
> Author:   jdh2358
> Date: 2008-10-07 15:13:53 + (Tue, 07 Oct 2008)
>
> Log Message:
> ---
> added michaels unit detection optimization for arrays
>
> Modified Paths:
> --
> trunk/matplotlib/lib/matplotlib/units.py
>
> Modified: trunk/matplotlib/lib/matplotlib/units.py
> ===================================================================
> --- trunk/matplotlib/lib/matplotlib/units.py  2008-10-07 15:13:13 UTC (rev 
> 6165)
> +++ trunk/matplotlib/lib/matplotlib/units.py  2008-10-07 15:13:53 UTC (rev 
> 6166)
> @@ -135,7 +135,7 @@
>  
>          for thisx in x:
>              converter = self.get_converter( thisx )
> -            if converter: break
> +            return converter
>  
>          #DISABLED self._cached[idx] = converter
>          return converter

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] path simplification with nan (or move_to)

2008-10-07 Thread Michael Droettboom
Eric Firing wrote:
> Mike, John,
>
> Because path simplification does not work with anything but a 
> continuous line, it is turned off if there are any nans in the path.  
> The result is that if one does this:
>
> import numpy as np
> xx = np.arange(20)
> yy = np.random.rand(20)
> #plot(xx, yy)
> yy[1000] = np.nan
> plot(xx, yy)
>
> the plot fails with an incomplete rendering and general 
> unresponsiveness; apparently some mysterious agg limit is quietly 
> exceeded.
The limit in question is "cell_block_limit" in 
agg_rasterizer_cells_aa.h.  The relationship between the number of 
vertices and the number of rasterization cells, I suspect, depends on 
the nature of the values.

However, if we want to increase the limit, each "cell_block" is 4096 
cells, each with 16 bytes, and currently it maxes out at 1024 cell 
blocks, for a total of 67,108,864 bytes.  So, the question is, how much 
memory should be devoted to rasterization, when the data set is large 
like this?  I think we could safely quadruple this number for a lot of 
modern machines, and this maximum won't affect people plotting smaller 
data sets, since the memory is dynamically allocated anyway.  It works 
for me, but I have 4GB RAM here at work.
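The memory figure above can be checked with quick arithmetic (constants
taken from the description of agg_rasterizer_cells_aa.h above; byte
sizes are approximate and platform-dependent):

```python
# Agg rasterizer cell-memory budget, as described above.
CELLS_PER_BLOCK = 4096
BYTES_PER_CELL = 16
MAX_BLOCKS = 1024

max_bytes = CELLS_PER_BLOCK * BYTES_PER_CELL * MAX_BLOCKS
print(max_bytes)                 # 67108864 bytes, i.e. 64 MiB
print(max_bytes * 4 // 2**20)    # quadrupled limit: 256 MiB
```

So quadrupling the block limit means at most 256 MiB of dynamically
allocated cell storage, and only for plots that actually need it.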
> With or without the nan, this test case also shows the bizarre 
> slowness of add_line that I asked about in a message yesterday, and 
> that has me completely baffled.
lsprofcalltree is my friend!
>
> Both of these are major problems for real-world use.
>
> Do you have any thoughts on timing and strategy for solving this 
> problem?  A few weeks ago, when the problem with nans and path 
> simplification turned up, I tried to figure out what was going on and 
> how to fix it, but I did not get very far.  I could try again, but as 
> you know I don't get along well with C++.
That simplification code is pretty hairy, particularly because it tries 
to avoid a copy by doing everything in an iterator/generator way.  I 
think even just supporting MOVETOs there would be tricky, but probably 
the easiest first thing.
>
> I am also wondering whether more than straightforward path 
> simplification with nan/moveto might be needed.  Suppose there is a 
> nightmarish time series with every third point being bad, so it is 
> essentially a sequence of 2-point line segments.  The simplest form of 
> path simplification fix might be to reset the calculation whenever a 
> moveto is encountered, but this would yield no simplification in this 
> case.  I assume Agg would still choke. Is there a need for some sort 
> of automatic chunking of the rendering operation in addition to path 
> simplification?
>
Chunking is probably something worth looking into (for lines, at least), 
as it might also reduce memory usage vs. the "increase the 
cell_block_limit" scenario.

I also think for the special case of high-resolution time series data, 
where x is uniform, there is an opportunity to do something completely 
different that should be far faster.  Audio editors (such as Audacity) 
draw each column of pixels based on the min/max and/or mean and/or RMS 
of the values within that column.  This makes the rendering extremely 
fast and simple.  See:

http://audacity.sourceforge.net/about/images/audacity-macosx.png

Of course, that would mean writing a bunch of new code, but it shouldn't 
be incredibly tricky new code.  It could convert the time series data to 
an image and plot that, or to a filled polygon whose vertices are 
downsampled from the original data.  The latter may be nicer for Ps/Pdf 
output.
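As a rough illustration of the Audacity-style idea (a sketch only: the
function name is mine, and for simplicity it assumes the sample count
divides evenly into the number of pixel columns):

```python
import numpy as np

def minmax_envelope(y, ncols):
    # Downsample a uniformly-sampled series to one (min, max) pair per
    # pixel column, as audio editors do.  Assumes len(y) is a multiple
    # of ncols; real code would pad or handle the remainder.
    cols = y.reshape(ncols, -1)           # one row per pixel column
    return cols.min(axis=1), cols.max(axis=1)

# The (lo, hi) pairs can then be drawn as a single filled polygon with
# 2 * ncols vertices instead of a polyline through every sample.
y = np.sin(np.linspace(0, 60, 6000))
lo, hi = minmax_envelope(y, 600)
```

This reduces a million-point line to a few hundred vertices before the
renderer ever sees it, which is why those editors stay responsive.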

Cheers,
Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] SF.net SVN: matplotlib:[6166] trunk/matplotlib/lib/matplotlib/units.py

2008-10-07 Thread John Hunter
On Tue, Oct 7, 2008 at 11:26 AM, Michael Droettboom <[EMAIL PROTECTED]> wrote:
> This isn't quite what I was suggesting (and seems to be equivalent to
> the code as before).  In the common case where there are no units in the
> data, this will still traverse the entire list.
>
> I think replacing the whole loop with:
>
>  converter = self.get_converter(iter(x).next())
>
> would be even better.  (Since lists of data should not be heterogeneous
> anyway...)

Hmm, I don't see how it would traverse the entire list

for thisx in x:
    converter = self.get_converter( thisx )
    return converter

since it will return after the first element in the loop.  I have no
problem with the iter approach, but am not seeing what the problem is
with this usage.

JDH



Re: [matplotlib-devel] SF.net SVN: matplotlib:[6166] trunk/matplotlib/lib/matplotlib/units.py

2008-10-07 Thread Michael Droettboom
Sorry.  I didn't read carefully enough.  That's right -- the "if 
converter: break" was replaced with "return converter".

You're right.  This is fine.

Mike

John Hunter wrote:
> On Tue, Oct 7, 2008 at 11:26 AM, Michael Droettboom <[EMAIL PROTECTED]> wrote:
>   
>> This isn't quite what I was suggesting (and seems to be equivalent to
>> the code as before).  In the common case where there are no units in the
>> data, this will still traverse the entire list.
>>
>> I think replacing the whole loop with:
>>
>>  converter = self.get_converter(iter(x).next())
>>
>> would be even better.  (Since lists of data should not be heterogeneous
>> anyway...)
>> 
>
> Hmm, I don't see how it would traverse the entire list
>
> for thisx in x:
>     converter = self.get_converter( thisx )
>     return converter
>
> since it will return after the first element in the loop.  I have no
> problem with the iter approach, but am not seeing what the problem is
> with this usage.
>
> JDH
>   

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA




Re: [matplotlib-devel] path simplification with nan (or move_to)

2008-10-07 Thread Eric Firing

Michael Droettboom wrote:

Eric Firing wrote:

Mike, John,

Because path simplification does not work with anything but a 
continuous line, it is turned off if there are any nans in the path.  
The result is that if one does this:


import numpy as np
xx = np.arange(20)
yy = np.random.rand(20)
#plot(xx, yy)
yy[1000] = np.nan
plot(xx, yy)

the plot fails with an incomplete rendering and general 
unresponsiveness; apparently some mysterious agg limit is quietly 
exceeded.
The limit in question is "cell_block_limit" in 
agg_rasterizer_cells_aa.h.  The relationship between the number of 
vertices and the number of rasterization cells, I suspect, depends on 
the nature of the values.
However, if we want to increase the limit, each "cell_block" is 4096 
cells, each with 16 bytes, and currently it maxes out at 1024 cell 
blocks, for a total of 67,108,864 bytes.  So, the question is, how much 
memory should be devoted to rasterization, when the data set is large 
like this?  I think we could safely quadruple this number for a lot of 
modern machines, and this maximum won't affect people plotting smaller 
data sets, since the memory is dynamically allocated anyway.  It works 
for me, but I have 4GB RAM here at work.


It sounds like we have little to lose by increasing the limit as you 
suggest here.  In addition, it would be nice if hitting that limit 
triggered an informative exception instead of a puzzling and quiet 
failure, but maybe that would be hard to arrange.  I have no idea how to 
approach it.


With or without the nan, this test case also shows the bizarre 
slowness of add_line that I asked about in a message yesterday, and 
that has me completely baffled.

lsprofcalltree is my friend!


Thank you very much for finding that!



Both of these are major problems for real-world use.

Do you have any thoughts on timing and strategy for solving this 
problem?  A few weeks ago, when the problem with nans and path 
simplification turned up, I tried to figure out what was going on and 
how to fix it, but I did not get very far.  I could try again, but as 
you know I don't get along well with C++.
That simplification code is pretty hairy, particularly because it tries 
to avoid a copy by doing everything in an iterator/generator way.  I 
think even just supporting MOVETOs there would be tricky, but probably 
the easiest first thing.


The attached patch seems to work, based on cursory testing.  I can make 
an array of 1M points, salt it with nans, and plot it, complete with 
gaps, and all in a reasonably snappy fashion, thanks to your units fix.


I will hold off on committing it until I hear from you or John; or if 
either of you want to polish and commit it (or an alternative), that's 
even better.
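For reference, the stress case under discussion can be reproduced along
these lines (a scaled-down sketch: the sizes and the sine data are
arbitrary stand-ins, and the non-interactive 'template' backend is used
as in the timings earlier in the thread):

```python
import numpy as np
import matplotlib
matplotlib.use("template")   # non-interactive backend, as in Eric's timings
import matplotlib.pyplot as plt

# A long line salted with nans: every third point bad, so the path
# degenerates into many short segments (Eric used ~1e6 points).
n = 100000
xx = np.arange(n, dtype=float)
yy = np.sin(xx * 0.01)
yy[::3] = np.nan

fig = plt.figure()
ax = fig.gca()
ax.plot(xx, yy)
fig.canvas.draw()            # force the draw path exercised above
```

Before the patch, simplification was disabled for such a path and Agg
could hit its cell-block limit; with gap-aware simplification the same
plot renders snappily.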


Eric



I am also wondering whether more than straightforward path 
simplification with nan/moveto might be needed.  Suppose there is a 
nightmarish time series with every third point being bad, so it is 
essentially a sequence of 2-point line segments.  The simplest form of 
path simplification fix might be to reset the calculation whenever a 
moveto is encountered, but this would yield no simplification in this 
case.  I assume Agg would still choke. Is there a need for some sort 
of automatic chunking of the rendering operation in addition to path 
simplification?


Chunking is probably something worth looking into (for lines, at least), 
as it might also reduce memory usage vs. the "increase the 
cell_block_limit" scenario.


I also think for the special case of high-resolution time series data, 
where x is uniform, there is an opportunity to do something completely 
different that should be far faster.  Audio editors (such as Audacity) 
draw each column of pixels based on the min/max and/or mean and/or RMS 
of the values within that column.  This makes the rendering extremely 
fast and simple.  See:


http://audacity.sourceforge.net/about/images/audacity-macosx.png

Of course, that would mean writing a bunch of new code, but it shouldn't 
be incredibly tricky new code.  It could convert the time series data to 
an image and plot that, or to a filled polygon whose vertices are 
downsampled from the original data.  The latter may be nicer for Ps/Pdf 
output.


Cheers,
Mike



Index: src/agg_py_path_iterator.h
===================================================================
--- src/agg_py_path_iterator.h	(revision 6166)
+++ src/agg_py_path_iterator.h	(working copy)
@@ -137,7 +137,8 @@
  double width = 0.0, double height = 0.0) :
 m_source(&source), m_quantize(quantize), m_simplify(simplify),
 m_width(width + 1.0), m_height(height + 1.0), m_queue_read(0), m_queue_write(0),
-m_moveto(true), m_lastx(0.0), m_lasty(0.0), m_clipped(false),
+m_moveto(true), m_after_moveto(false),
+m_lastx(0.0), m_lasty(0.0), m_clipped(false),
 m_do_clipping(width > 0.0 && height > 0.0),
 m_origdx(0.0), m_origdy(0.0),