Re: [Matplotlib-users] Millions of data points saved to pdf

2014-05-02 Thread Daniele Nicolodi
On 01/05/2014 19:50, nertskull wrote:
> Is there any way to have reasonable pdf sizes as well as this improved
> performance while keeping them in vector format?

As others tried to explain to you, plotting that many points in a plot
does not make any sense. The only thing that makes sense is to
down-sample your data to a manageable size. Depending on which features
of your data you are interested in, there are different methods for
doing that.
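
For example, a minimal min/max decimation sketch (the function name and
bin count here are just illustrative, not from this thread):

import numpy as np

def minmax_decimate(x, y, n_bins=2000):
    # Keep the min and max of y within each bin so that narrow peaks
    # survive the reduction.  Assumes x is ordered and len(y) >= n_bins.
    n = (len(y) // n_bins) * n_bins      # drop the ragged tail
    yb = y[:n].reshape(n_bins, -1)       # one row per bin
    width = yb.shape[1]
    rows = np.arange(n_bins)
    idx = np.sort(np.concatenate([rows * width + yb.argmin(axis=1),
                                  rows * width + yb.argmax(axis=1)]))
    return x[:n][idx], y[:n][idx]

# usage: xs, ys = minmax_decimate(x, y); plot(xs, ys, '-')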

PS: which viewer are you using to render the PDF? I believe different
renderers may have substantially different performance in rendering such
PDFs...

Cheers,
Daniele




Re: [Matplotlib-users] Millions of data points saved to pdf

2014-05-02 Thread Jouni K. Seppänen
nertskull <nertsk...@gmail.com> writes:

> If I change that line to "if True:" then I get MUCH better results,
> but I also get enormous file sizes.

That's interesting! It means that your pdf viewing program (which one,
by the way? Adobe Reader or some alternative?) is slow at compositing a
large number of prerendered markers, or perhaps it just renders each of
them again and again instead of prerendering, and does so more slowly
than if they were part of the same path.

> I've taken a subset of 10 of my 750 graphs.

> Those 10, before changing the backend, would make file sizes of about
> 290KiB.  After changing the backend, if I use plot(x, y, '-') I still
> get a file size of about 290KiB.

> But after changing the backend, if I use plot(x, y, '.') for my markers,
> my file size is now 21+ MB, just for 10 of my graphs.  I'm afraid making
> all 750 in the same pdf may be impossible at that size.

Does using ',' (comma) instead of '.' (full stop) as the marker help?  I
think the '.' marker is a circle, just at a small size, while the ','
marker is just two very short lines in the pdf backend. If the ','
marker produces an acceptable file size but its shape is not good
enough, we could experiment with creating a marker of intermediate
complexity.
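
For example, a quick way to compare the two markers' effect on file size
(the point count and file names here are arbitrary):

import os
import numpy as np
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt

x = np.random.rand(1000000)
y = np.random.rand(1000000)

for marker, name in [('.', 'dots.pdf'), (',', 'pixels.pdf')]:
    fig, ax = plt.subplots()
    ax.plot(x, y, marker)
    fig.savefig(name)
    plt.close(fig)
    print('%s: %d bytes' % (name, os.path.getsize(name)))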

One thing that I never thought about much is the precision in the
numbers the pdf backend outputs in the file. It seems that they are
being output with a fixed precision of ten digits after the decimal
point, which is probably overkill. There is currently no way to change
this except by editing the source code - the critical line is

r = ("%.10f" % obj).encode('ascii')

where 10 is the number of digits used. The same precision is used for
all floating-point numbers, including various transformation matrices,
so I can't offer a simple rule for how large deviations you will cause
by reducing the precision - you could experiment by making one figure
with the existing code and another with '%.3f', and see if the latter
looks good enough at the kind of zoom levels you are going to use (and
if it really reduces the file size much - there's a compression layer on
top of the ASCII representation).
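
If you would rather not edit the installed source, the same experiment
can be run by monkey-patching the function that contains that line; a
rough sketch, assuming it lives in backend_pdf.pdfRepr (the wrapper name
and the digit count are only for the experiment):

import numpy as np
import matplotlib
matplotlib.use('pdf')
from matplotlib.backends import backend_pdf

_orig_pdfRepr = backend_pdf.pdfRepr

def lowprec_pdfRepr(obj):
    # Write finite floats with 3 digits instead of 10; defer
    # everything else to the original implementation.
    if isinstance(obj, (float, np.floating)) and np.isfinite(obj):
        return ('%.3f' % obj).encode('ascii')
    return _orig_pdfRepr(obj)

backend_pdf.pdfRepr = lowprec_pdfRepr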

That reminds me: one thing that could have an effect is the
pdf.compression setting, which defaults to 6 but you can set it to 9 
to make the compressed size a little bit smaller, at the expense of
spending more time when writing the file. That's not going to be a major
difference, though.
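
For example (the figure here is just a stand-in):

import matplotlib
matplotlib.rcParams['pdf.compression'] = 9  # default is 6, range 0-9
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(10))
fig.savefig('compressed.pdf')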

> Is there any way to have reasonable pdf sizes as well as this improved
> performance while keeping them in vector format?

As others have said, rendering huge clouds of individual points is a
problematic task. I think it's an entirely valid thing to ask for, but
it's not likely that there will be a perfect solution, and some other
way of visualizing the data may be needed. Bokeh (suggested by Benjamin
Root) looks like something that could fit your needs better than a pdf
file in a viewer.

-- 
Jouni K. Seppänen
http://www.iki.fi/jks




Re: [Matplotlib-users] Millions of data points saved to pdf

2014-05-02 Thread claudef
Dear colleagues,

I had a similar issue with a large plot and several thousand elements,
printed under Linux with the Qt4Agg back-end.  When rendering the PDF I
got some vector overlay and distortion of markers in the drawing, so I
changed the plotting output into a two-step process: first generating a
high-resolution .png file, then using the Python Imaging Library (PIL)
to compress it into a much smaller .jpeg image, which produces a
browser-friendly file or an input source for PDF editors such as
OpenOffice.

Source:

import matplotlib.pyplot as plt
import Image  # Python Imaging Library (PIL)

# fig is the previously created figure to be exported.
# figure size in inches; the final JPEG is resized to 16000 x 12000 pixels
w = 80
h = 60
dpi_resolution = 400
fig.set_size_inches(w, h)
DPI = fig.get_dpi()
print "DPI:", DPI
Size = fig.get_size_inches()
print "Size in Inches:", Size
myformats = plt.gcf().canvas.get_supported_filetypes()
print "Supported formats are: " + str(myformats)
mybackend = plt.get_backend()
print "Backend used is: " + str(mybackend)
# save a high-resolution screen copy
fig.savefig('myplot.png', format='png', dpi=dpi_resolution)
# JPEG compression with quality of 10
myimage = Image.open('myplot.png')
myimage = myimage.resize((16000, 12000), Image.ANTIALIAS)
# quality = 10% .. very high compression with few blurs
quality_val = 10
myimage.save('myplot.jpg', 'JPEG', quality=quality_val)

The visual result looks acceptable, with no distortion. This process
gives some control over compression and quality.
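
A related option within matplotlib itself is the per-artist rasterized
flag, which renders just the point cloud as a bitmap inside the PDF
while the axes, ticks and text stay vector; a minimal sketch (point
count and file name are arbitrary):

import numpy as np
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt

x = np.random.rand(1000000)
y = np.random.rand(1000000)

fig, ax = plt.subplots()
# Only this artist is rasterized; everything else stays vector.
ax.plot(x, y, ',', rasterized=True)
fig.savefig('mixed.pdf', dpi=300)  # dpi controls the raster resolution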

Hope this is useful. 

Regards, 
Claude

Claude Falbriard 
Certified IT Specialist L2 - Middleware
AMS Hortolândia / SP - Brazil
phone:+55 13 9 9760 0453
cell: +55 13 9 8117 3316
e-mail:clau...@br.ibm.com



