On Mon, Jan 02, 2012 at 06:44:54PM +0100, Francesc Alted wrote:
> Perhaps you may get a bit more performance if you use the
> `[read,write]_vl_blosc2_hdf` functions that I have sent in my earlier
> post, but that adds the python-blosc dependency (available at
> http://pypi.python.org/pypi/blosc/1.0.3), so yeah, that might be a bit
> 'exotic'.

Yes, it is exotic, but it makes sense that it would be faster. Your
code definitely looks well suited to what I want to do. I can see from
the benchmarks that you did that it manages to gain 10 to 30% over the
CArrays using blosc that I tried. I didn't bench this strategy, because I
didn't want to add a dependency on a package that does not already have a
large installation base. Maybe I'll add the benchmarks to the blog post,
if I find time (I am already way overtime on this), but anyhow, I'll
point to your email on this mailing list.

I don't think that you expose the blosc bindings in pytables. Thus to use
this strategy, I would have to have blosc installed on all my computers.
For my own interest, and hopefully later inclusion of such a strategy in
my codebase, is there a way of getting closer to such performance without
requiring Python access to blosc?

Sadly, as the Linux distributions are lagging behind with the version of
pytables they ship, this would not be enough to enable me to rely on this
strategy for computers that do not have EPD installed.

> Also, as your datasets are pretty small, you may want to add a warning
> about the fact that these benchmarks are mainly doing I/O against
> memory, not disk.

Right, I should point out that I am benching only datasets that fit in
memory, whereas pytables is especially interesting for datasets that do
not fit in memory, isn't it? But I am curious about your statement that I
am mostly benching memory, not disk. I have indeed observed that using a
USB disk didn't affect performance too much, and I was quite surprised.
Do you have an explanation of why the disk bandwidth does not seem too
relevant?
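
A minimal sketch of how one could check this, assuming the usual behaviour
of an OS page cache (a file that was just written is typically still held
in memory, so reading it back does not touch the disk). The helper names
`write_test_file` and `timed_read` are made up for this illustration and do
not come from pytables or from my benchmark code:

```python
import os
import tempfile
import time


def timed_read(path, block_size=1 << 20):
    """Read the whole file in blocks; return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    return time.perf_counter() - start, total


def write_test_file(size_bytes):
    """Write size_bytes of random data to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
        f.flush()
        # fsync pushes the data to the device, but the pages usually
        # remain in the page cache afterwards.
        os.fsync(f.fileno())
    return path


if __name__ == "__main__":
    path = write_test_file(16 * 1024 * 1024)
    try:
        for label in ("first read", "second read"):
            elapsed, nbytes = timed_read(path)
            print("%s: %.1f MB/s" % (label, nbytes / elapsed / 1e6))
    finally:
        os.remove(path)
```

If both reads report a throughput far above what the device can deliver,
the timings are most likely measuring memory rather than disk.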

> BTW, another cleaner, faster way to empty the OS filesystem cache is this:

> sudo echo 3 | sudo tee /proc/sys/vm/drop_caches

> [ from http://ubuntuforums.org/showthread.php?t=589975 ]

That's useful!
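
For a benchmark script, the same thing could probably be done from Python.
This is only a sketch, assuming Linux, root privileges and Python 3.3 or
later (for `os.sync`); the function name `drop_caches` and its `path`
parameter are my own choices for illustration, not part of any library:

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches", level=3):
    """Flush dirty pages, then ask the kernel to drop its clean caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real path needs root; `path` is a parameter only so
    that the function can be tried against an ordinary file.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # drop_caches only discards clean pages, so write dirty ones back first.
    os.sync()
    with open(path, "w") as f:
        f.write("%d\n" % level)
```

Calling `drop_caches()` between the write and the read of each benchmark
run should make the read timings reflect the disk rather than the cache.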

Thanks a lot for all your comments.

Gaël

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
