Re: [Numpy-discussion] [ANN] IPython 0.11 is officially out
On Sun, Jul 31, 2011 at 10:19 AM, Fernando Perez <fperez@gmail.com> wrote:
> Please see our release notes for the full details on everything about this release: https://github.com/ipython/ipython/zipball/rel-0.11

And embarrassingly, that URL was for a zip download instead (copy/paste error). The detailed release notes are here: http://ipython.org/ipython-doc/rel-0.11/whatsnew/version0.11.html

Sorry about the mistake...

Cheers,
f

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice
On 8/2/11 1:40 PM, Jeremy Conlin wrote:
> Thanks for that information. It helps greatly in understanding what is happening. Next time I'll put my data into tuples.

I don't remember where they all are, but there are a few places in numpy where tuples and lists are interpreted differently (fancy indexing?). It kind of breaks Python duck typing (a sequence is a sequence), but it's useful, too. So when a list fails to do what you want, try a tuple.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

chris.bar...@noaa.gov
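[Editor's note: a minimal sketch of the tuple-vs-list indexing difference Chris mentions, with an arbitrary small array; this is standard numpy behavior, not code from the thread.]

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A tuple is treated as a multi-dimensional index:
# element at row 1, column 2 (same as a[1, 2]).
print(a[(1, 2)])    # -> 6

# A list triggers fancy indexing: select whole rows 1 and 2.
print(a[[1, 2]])    # -> [[ 4  5  6  7]
                    #     [ 8  9 10 11]]
```

So the "same" sequence gives a scalar in one case and a (2, 4) array in the other, which is exactly why swapping a list for a tuple (or vice versa) can change behavior.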
[Numpy-discussion] Reading a big netcdf file
Hi.

I'm trying to read a big netcdf file (445 MB) using netcdf4-python. The data are described as:

"The GEBCO gridded data set is stored in NetCDF as a one dimensional array of 2-byte signed integers that represent integer elevations in metres. The complete data set gives global coverage. It consists of 21601 x 10801 data values, one for each one minute of latitude and longitude for 233312401 points. The data start at position 90°N, 180°W and are arranged in bands of 360 degrees x 60 points/degree + 1 = 21601 values. The data range eastward from 180°W longitude to 180°E longitude, i.e. the 180° value is repeated."

The problem is that it is very slow (or I am quite a newbie). Does anyone have a suggestion for getting these data into a numpy array in a faster way?

Thanks in advance.
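[Editor's note: a quick sanity check, not from the thread, that the numbers in the description are self-consistent -- the ~445 MB file size matches a 2-byte grid of the stated shape.]

```python
# GEBCO one-minute grid dimensions from the description above.
nlon, nlat = 21601, 10801

npoints = nlon * nlat
print(npoints)               # -> 233312401, as stated

# 2-byte signed integers -> total raw size in MiB.
size_mib = npoints * 2 / 2**20
print(round(size_mib, 1))    # -> 445.0
```

So nearly all of the 445 MB is the elevation array itself; any read that is much slower than a raw 466-million-byte disk read points at overhead in the access pattern rather than the data volume.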
Re: [Numpy-discussion] Reading a big netcdf file
Here are my values for your comparison. My test.nc file is about 715 MB. The details are below:

In [21]: netCDF4.__version__
Out[21]: '0.9.4'

In [22]: np.__version__
Out[22]: '2.0.0.dev-b233716'

In [23]: from netCDF4 import Dataset

In [24]: f = Dataset('test.nc')

In [25]: f.variables['reflectivity'].shape
Out[25]: (6, 18909, 506)

In [26]: f.variables['reflectivity'].size
Out[26]: 57407724

In [27]: f.variables['reflectivity'][:].dtype
Out[27]: dtype('float32')

In [28]: timeit z = f.variables['reflectivity'][:]
1 loops, best of 3: 731 ms per loop

How long does it take on your side to read that big an array?

On Wed, Aug 3, 2011 at 10:30 AM, Kiko <kikocorre...@gmail.com> wrote:
> I'm trying to read a big netcdf file (445 MB) using netcdf4-python. [...]

--
Gökhan
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 9:30 AM, Kiko wrote:
> I'm trying to read a big netcdf file (445 MB) using netcdf4-python.

I've never noticed that netCDF4 was particularly slow for reading (writing can be pretty slow sometimes). How slow is slow?

> The data are described as:

Please post the results of:

  ncdump -h the_file_name.nc

so we can see if there is anything odd in the structure (though I don't know what it might be). Post your code (in the simplest form you can), and post your timings and machine type. Is the file netcdf4 or 3 format? (The python lib will read either.)

As a reference, reading that much data in from a raw file into a numpy array takes 2.57 on my machine (a rather old Mac, but disks haven't gotten much faster). You can test that like this:

  a = np.zeros((21601, 10801), dtype=np.uint16)
  a.tofile('temp.npa')
  del a
  timeit a = np.fromfile('temp.npa', dtype=np.uint16)

(using ipython's timeit)

-Chris
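[Editor's note: Chris's raw-file baseline written out as a self-contained script. The temp-file path is arbitrary, and the grid is scaled down here from 21601 x 10801 so the sketch runs quickly; timings will vary with machine and disk cache.]

```python
import os
import tempfile

import numpy as np

# Stand-in grid, scaled down from the full 21601 x 10801 GEBCO shape.
shape = (2160, 1080)
path = os.path.join(tempfile.gettempdir(), 'temp.npa')

# Write the array as raw bytes, drop it, then read it back from disk.
a = np.zeros(shape, dtype=np.uint16)
a.tofile(path)
del a

b = np.fromfile(path, dtype=np.uint16)
print(b.size)    # -> 2332800 values read back (flat; reshape(shape) to restore)

os.remove(path)
```

In IPython, wrapping the `np.fromfile` line in `%timeit` reproduces Chris's benchmark; the round trip through a raw file gives a lower bound on how fast any reader can get the same bytes into a numpy array.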
Re: [Numpy-discussion] Reading a big netcdf file
Just a few extra tests on my side, pushing the limits of my system memory:

In [34]: k = np.zeros((21601, 10801, 3), dtype='int16')
k: ndarray 21601x10801x3: 699937203 elems, type `int16`, 1399874406 bytes (1335 Mb)

And for the first time my memory explodes with a hard kernel crash:

In [36]: k = np.zeros((21601, 10801, 13), dtype='int16')

Message from syslogd@ccn at Aug 3 10:51:43 ...
kernel:[48715.531155] [ cut here ]
kernel:[48715.531163] invalid opcode: [#1] SMP
kernel:[48715.531166] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
kernel:[48715.531253] Stack:
kernel:[48715.531265] Call Trace:
kernel:[48715.531332] Code: be 33 01 00 00 48 89 fb 48 c7 c7 67 31 7a 81 e8 b0 2d f1 ff e8 90 f2 33 00 48 89 df e8 86 db 00 00 48 83 bb 60 01 00 00 00 74 02 0f 0b 48 8b 83 10 02 00 00 a8 20 75 02 0f 0b a8 40 74 02 0f 0b

On Wed, Aug 3, 2011 at 10:46 AM, Gökhan Sever <gokhanse...@gmail.com> wrote:
> Here are my values for your comparison. My test.nc file is about 715 MB. [...]

--
Gökhan
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 12:50 PM, Christopher Barker wrote:
> As a reference, reading that much data in from a raw file into a numpy array takes 2.57 on my machine (a rather old Mac, but disks haven't gotten much faster).

2.57 seconds? or minutes? If seconds, does it actually read the whole thing into memory in that time, or is there some kind of delayed read going on?

Ian
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 11:09 AM, Ian Stokes-Rees wrote:
> On 8/3/11 12:50 PM, Christopher Barker wrote:
>> As a reference, reading that much data in from a raw file into a numpy array takes 2.57 on my machine (a rather old Mac, but disks haven't gotten much faster).
>
> 2.57 seconds? or minutes?

Sorry -- seconds.

> If seconds, does it actually read the whole thing into memory in that time, or is there some kind of delayed read going on?

I think it reads it all in. However, now that you bring it up, I think timeit does it a few times, and after the first time, there may well be a disk cache that speeds things up. In fact, as I recently wrote the file, there may be disk cache issues even on the first read. I'm no timing expert, but there must be ways to get a clean time.

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 9:46 AM, Gökhan Sever wrote:
> In [23]: from netCDF4 import Dataset
> In [24]: f = Dataset('test.nc')
> In [25]: f.variables['reflectivity'].shape
> Out[25]: (6, 18909, 506)
> In [26]: f.variables['reflectivity'].size
> Out[26]: 57407724
> In [27]: f.variables['reflectivity'][:].dtype
> Out[27]: dtype('float32')
> In [28]: timeit z = f.variables['reflectivity'][:]
> 1 loops, best of 3: 731 ms per loop

That seems pretty fast, actually -- are you sure that [:] forces the full data read? It probably does, but I'm not totally sure. Is z a numpy array object at that point?

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
This is what I get here:

In [1]: a = np.zeros((21601, 10801), dtype=np.uint16)
In [2]: a.tofile('temp.npa')
In [3]: del a
In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 251 ms per loop

On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> I've never noticed that netCDF4 was particularly slow for reading (writing can be pretty slow sometimes). How slow is slow? [...]

--
Gökhan
Re: [Numpy-discussion] Reading a big netcdf file
I think these answer your questions:

In [3]: type(f.variables['reflectivity'])
Out[3]: <type 'netCDF4.Variable'>

In [4]: type(f.variables['reflectivity'][:])
Out[4]: <type 'numpy.ndarray'>

In [5]: z = f.variables['reflectivity'][:]

In [6]: type(z)
Out[6]: <type 'numpy.ndarray'>

In [10]: id(f.variables['reflectivity'][:])
Out[10]: 37895488

In [11]: id(z)
Out[11]: 37901440

On Wed, Aug 3, 2011 at 12:40 PM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> That seems pretty fast, actually -- are you sure that [:] forces the full data read? It probably does, but I'm not totally sure. Is z a numpy array object at that point? [...]

--
Gökhan
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 1:57 PM, Gökhan Sever wrote:
> This is what I get here:
>
> In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
> 1 loops, best of 3: 251 ms per loop

So that's about 10 times faster than my machine. I didn't think disks had gotten much faster -- they are still generally 7200 rpm (or slower in laptops). So I've either got a really slow disk, or you have a really fast one (or both), or maybe you're getting a cache effect, as you wrote the file just before reading it.

Repeating, doing just what you did:

In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.53 s per loop

Then I wrote a bunch of other files to disk, and tried again:

In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.45 s per loop

So it seems I'm not seeing cache effects, but maybe you are.

Anyway, we haven't heard from the OP -- I'm not sure what s/he thought was slow.

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 08/03/2011 11:24 AM, Gökhan Sever wrote:
> In [1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
> 1 loops, best of 3: 263 ms per loop

You need to clear your cache and then run timeit with options -n1 -r1.

Eric
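[Editor's note: a sketch of Eric's point using the timeit module directly rather than the IPython magic. The temp-file path and size are arbitrary stand-ins for the real 445 MB grid; actually clearing the OS page cache is platform-specific (e.g. dropping caches on Linux needs root), so only the single-run measurement is shown.]

```python
import os
import tempfile
import timeit

import numpy as np

# Create a small stand-in file to time reading.
path = os.path.join(tempfile.gettempdir(), 'temp.npa')
np.zeros(1_000_000, dtype=np.uint16).tofile(path)

# number=1, repeat=1 is the programmatic equivalent of %timeit -n1 -r1:
# a single run, with no best-of-three that rewards a warm page cache.
t = min(timeit.repeat(
    stmt="np.fromfile(path, dtype=np.uint16)",
    globals={'np': np, 'path': path},
    number=1, repeat=1))
print(f"single read: {t:.6f} s")

os.remove(path)
```

Even then, the very first read after writing the file may still be served from cache; for a genuinely cold measurement you would reboot or explicitly drop the cache between the write and the timed read.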