On Mon, Jan 02, 2012 at 06:44:54PM +0100, Francesc Alted wrote:
> Perhaps you may get a bit more performance if you use the
> `[read,write]_vl_blosc2_hdf` functions that I have sent in my earlier
> post, but that adds the python-blosc dependency (available at
> http://pypi.python.org/pypi/blosc/1.0.3), so yeah, that might be a bit
> 'exotic'.

Yes, it is exotic, but it makes sense that it should be faster. Your code
definitely looks very well suited to what I want to do. I can see from the
benchmarks that you did that it manages to squeeze 10 to 30% from the
CArrays using blosc that I tried.

I didn't bench this strategy, because I didn't want to add a dependency on
a package that does not already have a large installation base. Maybe I'll
add the benchmarks to the blog post, if I find time (I am already way
overtime on this), but anyhow, I'll point to your email on this mailing
list.

I don't think that you expose the blosc bindings in pytables. Thus, to use
this strategy, I would have to have blosc installed on all my computers.
For my own interest, and hopefully later inclusion of such a strategy in
my codebase, is there a way of getting closer to such performance without
requiring Python access to blosc? Sadly, as the Linux distributions are
lagging behind with the version of pytables they ship, this would not be
enough to enable me to rely on this strategy for computers that do not
have EPD installed.

> Also, as your datasets are pretty small, you may want to add a warning
> about the fact that these benchmarks are mainly doing I/O against
> memory, not disk.

Right, I should point out that I am benching only datasets that fit in
memory, whereas pytables is especially interesting for datasets that do
not fit in memory, isn't it?

But I am curious about your statement that I am mostly benching memory,
not disk. I have indeed observed that using a USB disk didn't affect
performance too much, and I was quite surprised. Do you have an
explanation of why the disk bandwidth does not seem too relevant?

> BTW, another cleaner, faster way to empty the OS filesystem cache is this:
> sudo echo 3 | sudo tee /proc/sys/vm/drop_caches
> [ from http://ubuntuforums.org/showthread.php?t=589975 ]

That's useful!

Thanks a lot for all your comments.

Gaël
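
Editorial sketch, not part of the original message: the point about
benchmarking memory rather than disk can be illustrated with the standard
library alone. The example below writes a small file and reads it back
twice. Assuming an operating system with a filesystem cache (as on Linux),
both reads of a freshly written small file are normally served from
memory, so the speed of the underlying disk barely shows up in the
timings. The names `timed_read` and `warm_cache_demo` are hypothetical
helpers invented for this illustration, and the default size is arbitrary.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file and return (data, seconds elapsed)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - start


def warm_cache_demo(n_bytes=8 * 1024 * 1024):
    """Write n_bytes to a temporary file, then read it back twice.

    Returns (round_trip_ok, first_read_seconds, second_read_seconds).
    The file was just written, so the OS usually still holds it in its
    filesystem cache and neither read has to wait for the disk.
    """
    payload = os.urandom(n_bytes)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        first, t_first = timed_read(path)
        second, t_second = timed_read(path)
    finally:
        os.remove(path)
    return first == payload and second == payload, t_first, t_second


if __name__ == "__main__":
    ok, t_first, t_second = warm_cache_demo()
    print("round trip ok:", ok)
    print("first read:  %.4f s" % t_first)
    print("second read: %.4f s" % t_second)
```

To measure the disk itself, the cache would have to be emptied between the
write and the read, for instance with the `drop_caches` command quoted
above, and the dataset would ideally be larger than the available RAM.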