Re: [Numpy-discussion] fast numpy i/o
Hi,

I have been using h5py a lot (both on Windows and Mac OS X) and can only recommend it - I haven't tried the other options, though.

Cheers,
Simon

On Tue, Jun 21, 2011 at 8:24 PM, Derek Homeier <de...@astro.physik.uni-goettingen.de> wrote:

> On 21.06.2011, at 7:58PM, Neal Becker wrote:
>
>> I think, in addition, that hdf5 is the only one that easily interoperates with matlab?
>>
>> Speaking of hdf5, I see:
>>
>> pyhdf5io 0.7 - Python module containing high-level hdf5 load and save functions.
>> h5py 2.0.0 - Read and write HDF5 files from Python.
>>
>> Any thoughts on the relative merits of these?
>
> In my experience, HDF5 access usually approaches disk access speed, and random access to sub-datasets should be significantly faster than reading in the entire file, though I have not been able to test this. I have not heard about pyhdf5io (how does it work together with numpy?) - as an alternative to h5py I'd rather recommend pytables, though I prefer the former for its cleaner/simpler interface (but that probably depends on your programming habits).
>
> HTH,
> Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
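Derek's point about random access to sub-datasets can be illustrated with a minimal h5py sketch (the file and dataset names are arbitrary, not from the thread):

```python
import numpy as np
import h5py  # pip install h5py

# Write a dataset to an HDF5 file.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("measurements", data=np.arange(1_000_000, dtype=np.float64))

# Random access: h5py reads only the requested slice from disk,
# instead of loading the whole file into memory first.
with h5py.File("demo.h5", "r") as f:
    chunk = f["measurements"][500_000:500_010]

print(chunk)
```

Slicing an `h5py.Dataset` returns an ordinary in-memory numpy array, so only the slices you actually index are ever read.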
Re: [Numpy-discussion] What Requires C and what is just python
Hi Ben,

It's very easy to package numpy (and most other modules) with py2exe, which, as Dan mentioned above, will include all necessary (also non-Python) libraries in a dist folder. The folder to distribute can of course get quite large if you include a lot of libraries - but I think that with only the standard libraries and numpy it will stay below 5 MB.

Cheers,
Simon

On Mon, Mar 21, 2011 at 9:30 AM, Paul Anton Letnes <paul.anton.let...@gmail.com> wrote:

> On 20 March 2011, at 16:08, Ben Smith wrote:
>
>> So, in addition to my computer science work, I'm a PhD student in econ. Right now, the class is using GAUSS for almost everything. This sort of pisses me off, because it means people are building libraries of code that become valueless when they graduate (right now we get GAUSS licenses for free, but it is absurdly expensive later) - particularly when this is the only language they know. So I had the idea of building some command-line tools to do the same things using the most basic pieces of NumPy (arrays, dot products, transpose and inverse - that's it). And it is going great.
>>
>> My problem, however, is that I'd like to be able to share these tools, but I know I'm opening up a big can of worms where I have to go around building numpy on 75 people's computers. What I'd like to do is limit myself to just the functions that are implemented in Python, package it with py2exe and hand that to anyone who needs it. So, my question, if anyone knows: what's implemented in Python and what depends on the C libraries? Is this even possible?
>
> I can testify that on most Windows computers python(x,y) will give you everything you need - numpy, scipy, matplotlib, PyQt for GUI design, and much more. The only problem I ever saw was that some people had problems with $PATH not being set properly on Windows. But this was on machines that seemed to be full of other problems. Oh, and in my experience, it is easier to run Python scripts from the generic Windows command line than in the ipython shell.
>
> Good luck,
> Paul
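One way to answer Ben's question for any given module is to check how it is implemented. A small sketch (the helper name is my own, not from the thread) - note it only classifies the top-level module, so a package like numpy reports "pure python" even though its core is a C extension:

```python
import importlib

def implementation_of(module_name):
    """Classify a module as 'builtin', 'pure python', or 'c extension'."""
    mod = importlib.import_module(module_name)
    fname = getattr(mod, "__file__", None)
    if fname is None:
        return "builtin"              # compiled into the interpreter itself
    if fname.endswith((".py", ".pyc")):
        return "pure python"
    return "c extension"              # a .so / .pyd shared library

print(implementation_of("json"))  # pure python
print(implementation_of("sys"))   # builtin
```

Anything in the "c extension" or "builtin" category has to be shipped as a compiled binary, which is exactly what py2exe collects into the dist folder.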
Re: [Numpy-discussion] How to limit the numpy.memmap's RAM usage?
Hi List,

I had similar problems on Windows. I tried to use memmaps to buffer a large amount of data and process it in chunks, but I found that whenever I tried this, I always ended up filling RAM completely, which led to crashes of my Python script with a MemoryError. This led me to consider - actually on advice from this list - the module h5py, which has a nice numpy interface to the hdf5 file format. With the h5py module it seemed clearer to me what was being buffered on disk and what was stored in RAM.

Cheers,
Simon

On Sun, Oct 24, 2010 at 2:15 AM, David Cournapeau <courn...@gmail.com> wrote:

> On Sun, Oct 24, 2010 at 12:44 AM, braingateway <braingate...@gmail.com> wrote:
>
>> I agree with you about the point of using memmap. That is why the behavior is so strange to me.
>
> I think it is expected. What kind of behavior were you expecting? To be clear, if I have a lot of available RAM, I expect memmap arrays to take almost all of it (virtual memory ~ resident memory). Now, if at the same time another process starts taking a lot of memory, I expect the OS to automatically lower the resident memory of the process using memmap.
>
> I did a small experiment on Mac OS X, creating a giant mmap'd array in numpy while at the same time running a small C program using mlock (to lock pages into physical memory). As soon as I lock a big area (where big means most of my physical RAM), the Python process dealing with the mmap'd area sees its resident memory decrease. As soon as I kill the C program locking the memory, the resident memory starts increasing again.
>
> cheers,
> David
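The chunked processing Simon describes can be kept memory-bounded with plain numpy.memmap by copying one explicit block at a time; a minimal sketch (file name and chunk size are arbitrary):

```python
import numpy as np

n = 1_000_000

# Create a file-backed array and fill it.
buf = np.memmap("buf.dat", dtype=np.float64, mode="w+", shape=(n,))
buf[:] = 1.0
buf.flush()
del buf  # drop the write mapping

# Re-open read-only and reduce it chunk by chunk: the explicit np.array()
# copy means only one chunk's worth of pages needs to stay resident.
data = np.memmap("buf.dat", dtype=np.float64, mode="r", shape=(n,))
total = 0.0
step = 100_000
for start in range(0, n, step):
    block = np.array(data[start:start + step])
    total += block.sum()

print(total)
```

The key is reducing each block to a scalar (or small array) before moving on - calling an operation like `data.sum()` on the whole memmap forces every page through memory at once, which matches the MemoryError behavior reported above.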
[Numpy-discussion] Accessing data in a large file
Hi list,

I am new to this list, so forgive me if this is a trivial problem, but I would appreciate any help. I am using numpy to work with large amounts of data - sometimes too much to fit into memory. Therefore I want to be able to store data in binary files and use numpy to read chunks of the file into memory.

I've tried to use numpy.memmap, as well as numpy.load and numpy.save with mmap_mode='r'. However, whenever I try to perform any nontrivial operation on a slice of the memmap, I always end up reading the entire file into memory - which then leads to memory errors.

Is there a way to get numpy to do what I want, using an internal platform-independent numpy format like .npy, or do I have to wrap a custom file reader with something like ctypes? Of course numpy.fromfile is a possibility, but it seems a rather inflexible alternative, as it doesn't really support slices and might have problems with platform dependency (byte order).

Hope that someone can help,
cheers,
Simon
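For the .npy route specifically, mmap_mode='r' does behave lazily as long as operations are restricted to explicit slices; a minimal sketch (file name is arbitrary):

```python
import numpy as np

# .npy is platform-independent: the header records dtype and byte order.
arr = np.arange(1_000_000, dtype=np.float64)
np.save("big.npy", arr)

# Open memory-mapped: nothing is read yet.
m = np.load("big.npy", mmap_mode="r")

# Copy a slice into RAM before doing heavy work on it; only the
# touched pages of the file are ever read from disk.
chunk = np.array(m[10:20])
print(chunk.sum())
```

Operations applied to the whole memmap object (rather than a slice copy) will still fault in the entire file, which is consistent with the behavior described above.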
Re: [Numpy-discussion] Accessing data in a large file
Thanks for the references to these libraries - they seem to fix my problem!

Cheers,
Simon

On Thu, Jun 17, 2010 at 2:58 PM, Davide Lasagna <dav...@gmail.com> wrote:

> You may have a look at the nice python-h5py module, which gives an OO interface to the underlying hdf5 file format. I'm using it for storing large amounts (~10 GB) of experimental data. Very fast, very convenient.
>
> Ciao
> Davide
>
> On Thu, 2010-06-17 at 08:33 -0400, greg whittier wrote:
>
>> On Thu, Jun 17, 2010 at 4:21 AM, Simon Lyngby Kokkendorff <sil...@gmail.com> wrote:
>>
>>> memory errors. Is there a way to get numpy to do what I want, using an internal platform independent numpy-format like .npy, or do I have to wrap a custom file reader with something like ctypes?
>>
>> You might give http://www.pytables.org/ a try.