Re: [Numpy-discussion] [ANN] IPython 0.11 is officially out

2011-08-03 Thread Fernando Perez
On Sun, Jul 31, 2011 at 10:19 AM, Fernando Perez <fperez@gmail.com> wrote:

 Please see our release notes for the full details on everything about
 this release: https://github.com/ipython/ipython/zipball/rel-0.11

And embarrassingly, that URL was for a zip download instead
(copy/paste error); the detailed release notes are here:

http://ipython.org/ipython-doc/rel-0.11/whatsnew/version0.11.html

Sorry about the mistake...

Cheers,

f
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice

2011-08-03 Thread Christopher Barker
On 8/2/11 1:40 PM, Jeremy Conlin wrote:

 Thanks for that information. It helps greatly in understanding what is
 happening. Next time I'll put my data into tuples.

I don't remember where they all are, but there are a few places in numpy 
where tuples and lists are interpreted differently (fancy indexing?). It 
kind of breaks python duck typing (a sequence is a sequence), but it's 
useful, too.

So when a list fails to do what you want, try a tuple.
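A quick illustration of the difference (a sketch; `a` here is just a small 2-D array):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A tuple is a multi-dimensional index: it picks one element, same as a[1, 2].
print(a[(1, 2)])        # 6

# A list triggers fancy indexing: it selects whole rows 1 and 2.
print(a[[1, 2]].shape)  # (2, 4)
```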

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Kiko
Hi.

I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

The data are described as:
The GEBCO gridded data set is stored in NetCDF as a one-dimensional array
of 2-byte signed integers that represent integer elevations in metres.
The complete data set gives global coverage. It consists of 21601 x 10801
data values, one for each one minute of latitude and longitude for 233312401
points.
The data start at position 90°N, 180°W and are arranged in bands of 360
degrees x 60 points/degree + 1 = 21601 values. The data range eastward from
180°W longitude to 180°E longitude, i.e. the 180° value is repeated.
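As a quick sanity check, the stated grid dimensions are consistent with both the quoted point count and the ~445 Mb file size:

```python
# 21601 x 10801 grid of 2-byte signed integers (int16)
npoints = 21601 * 10801
nbytes = npoints * 2

print(npoints)            # 233312401, matching the description
print(nbytes / 1024**2)   # ~445 (MiB)
```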

The problem is that it is very slow (or I am quite a newbie).

Does anyone have a suggestion for getting these data into a numpy array faster?

Thanks in advance.


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Gökhan Sever
Here are my values for your comparison:

test.nc file is about 715 MB. The details are below:

In [21]: netCDF4.__version__
Out[21]: '0.9.4'

In [22]: np.__version__
Out[22]: '2.0.0.dev-b233716'

In [23]: from netCDF4 import Dataset

In [24]: f = Dataset('test.nc')

In [25]: f.variables['reflectivity'].shape
Out[25]: (6, 18909, 506)

In [26]: f.variables['reflectivity'].size
Out[26]: 57407724

In [27]: f.variables['reflectivity'][:].dtype
Out[27]: dtype('float32')

In [28]: timeit z = f.variables['reflectivity'][:]
1 loops, best of 3: 731 ms per loop

How long does it take on your side to read that big array?

On Wed, Aug 3, 2011 at 10:30 AM, Kiko kikocorre...@gmail.com wrote:

 Hi.

 I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

 The data are described as:
 The GEBCO gridded data set is stored in NetCDF as a one-dimensional array
 of 2-byte signed integers that represent integer elevations in metres.
 The complete data set gives global coverage. It consists of 21601 x 10801
 data values, one for each one minute of latitude and longitude for 233312401
 points.
 The data start at position 90°N, 180°W and are arranged in bands of 360
 degrees x 60 points/degree + 1 = 21601 values. The data range eastward from
 180°W longitude to 180°E longitude, i.e. the 180° value is repeated.

 The problem is that it is very slow (or I am quite a newbie).

 Does anyone have a suggestion for getting these data into a numpy array
 faster?

 Thanks in advance.





-- 
Gökhan


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Christopher Barker
On 8/3/11 9:30 AM, Kiko wrote:
 I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

I've never noticed that netCDF4 was particularly slow for reading 
(writing can be pretty slow sometimes). How slow is slow?

 The data are described as:

please post the results of:

ncdump -h the_file_name.nc

So we can see if there is anything odd in the structure (though I don't 
know what it might be)

Post your code (in the simplest form you can).

and post your timings and machine type

Is the file netcdf4 or 3 format? (the python lib will read either)

As a reference, reading that much data in from a raw file into a numpy
array takes 2.57 on my machine (a rather old Mac, but disks haven't
gotten much faster). You can test that like this:

a = np.zeros((21601, 10801), dtype=np.uint16)

a.tofile('temp.npa')

del a

timeit a = np.fromfile('temp.npa', dtype=np.uint16)

(using ipython's timeit)
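The same benchmark as a plain script outside IPython (a sketch using `time.perf_counter`; scaled down here so it runs quickly -- the real grid is 21601 x 10801):

```python
import os
import time
import numpy as np

# Write a scaled-down raw file (the real grid is 21601 x 10801).
a = np.zeros((2160, 1080), dtype=np.uint16)
a.tofile('temp.npa')
del a

# Time a single read back into a flat uint16 array.
t0 = time.perf_counter()
a = np.fromfile('temp.npa', dtype=np.uint16)
elapsed = time.perf_counter() - t0

print(f"read {a.nbytes // 1024} KiB in {elapsed:.4f} s")
os.remove('temp.npa')
```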

-Chris





Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Gökhan Sever
Just a few extra tests on my side pushing the limits of my system memory:

In [34]: k = np.zeros((21601, 10801, 3), dtype='int16')
k  ndarray 21601x10801x3: 699937203 elems, type `int16`,
1399874406 bytes (1335 Mb)

And for the first time my memory explodes with a hard kernel crash:

In [36]: k = np.zeros((21601, 10801, 13), dtype='int16')

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531155] [ cut here ]

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531163] invalid opcode:  [#1] SMP

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531166] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531253] Stack:

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531265] Call Trace:

Message from syslogd@ccn at Aug  3 10:51:43 ...
 kernel:[48715.531332] Code: be 33 01 00 00 48 89 fb 48 c7 c7 67 31 7a 81 e8
b0 2d f1 ff e8 90 f2 33 00 48 89 df e8 86 db 00 00 48 83 bb 60 01 00 00 00
74 02 0f 0b 48 8b 83 10 02 00 00 a8 20 75 02 0f 0b a8 40 74 02 0f 0b
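The crash is not surprising given the requested allocation; a back-of-the-envelope check (plain arithmetic, nothing netCDF-specific):

```python
# The array that fit: 21601 x 10801 x 3 int16 values (2 bytes each).
ok = 21601 * 10801 * 3 * 2
print(ok, ok / 1024**3)    # 1399874406 bytes, ~1.30 GiB

# The array that crashed: 13 slabs instead of 3.
bad = 21601 * 10801 * 13 * 2
print(bad, bad / 1024**3)  # 6066122426 bytes, ~5.65 GiB
```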


On Wed, Aug 3, 2011 at 10:46 AM, Gökhan Sever gokhanse...@gmail.com wrote:

 Here are my values for your comparison:

 test.nc file is about 715 MB. The details are below:

 In [21]: netCDF4.__version__
 Out[21]: '0.9.4'

 In [22]: np.__version__
 Out[22]: '2.0.0.dev-b233716'

 In [23]: from netCDF4 import Dataset

 In [24]: f = Dataset('test.nc')

 In [25]: f.variables['reflectivity'].shape
 Out[25]: (6, 18909, 506)

 In [26]: f.variables['reflectivity'].size
 Out[26]: 57407724

 In [27]: f.variables['reflectivity'][:].dtype
 Out[27]: dtype('float32')

 In [28]: timeit z = f.variables['reflectivity'][:]
 1 loops, best of 3: 731 ms per loop

 How long it takes in your side to read that big array?

 On Wed, Aug 3, 2011 at 10:30 AM, Kiko kikocorre...@gmail.com wrote:

 Hi.

 I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

 The data are described as:
 The GEBCO gridded data set is stored in NetCDF as a one-dimensional
 array of 2-byte signed integers that represent integer elevations in metres.

 The complete data set gives global coverage. It consists of 21601 x 10801
 data values, one for each one minute of latitude and longitude for 233312401
 points.
 The data start at position 90°N, 180°W and are arranged in bands of 360
 degrees x 60 points/degree + 1 = 21601 values. The data range eastward from
 180°W longitude to 180°E longitude, i.e. the 180° value is repeated.

 The problem is that it is very slow (or I am quite a newbie).

 Does anyone have a suggestion for getting these data into a numpy array
 faster?

 Thanks in advance.





 --
 Gökhan




-- 
Gökhan


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Ian Stokes-Rees
On 8/3/11 12:50 PM, Christopher Barker wrote:
 As a reference, reading that much data in from a raw file into a numpy
 array takes 2.57 on my machine (a rather old Mac, but disks haven't
 gotten much faster).

2.57 seconds?  or minutes?  If seconds, does it actually read the whole
thing into memory in that time, or is there some kind of delayed read
going on?

Ian


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Christopher Barker
On 8/3/11 11:09 AM, Ian Stokes-Rees wrote:
 On 8/3/11 12:50 PM, Christopher Barker wrote:
 As a reference, reading that much data in from a raw file into a numpy
 array takes 2.57 on my machine (a rather old Mac, but disks haven't
 gotten much faster).

 2.57 seconds? or minutes?

sorry -- seconds.

If seconds, does it actually read the whole
 thing into memory in that time, or is there some kind of delayed read
 going on?

I think it reads it all in. However, now that you bring it up, I think 
timeit does it a few times, and after the first time, there may well 
be disk cache that speeds things up.

In fact, as I recently wrote the file, there may be disk cache issues 
even on the first read.

I'm no timing expert, but there must be ways to get a clean time.

-Chris





Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Christopher Barker
On 8/3/11 9:46 AM, Gökhan Sever wrote:
 In [23]: from netCDF4 import Dataset

 In [24]: f = Dataset('test.nc')

 In [25]: f.variables['reflectivity'].shape
 Out[25]: (6, 18909, 506)

 In [26]: f.variables['reflectivity'].size
 Out[26]: 57407724

 In [27]: f.variables['reflectivity'][:].dtype
 Out[27]: dtype('float32')

 In [28]: timeit z = f.variables['reflectivity'][:]
 1 loops, best of 3: 731 ms per loop

that seems pretty fast, actually -- are you sure that [:] forces the 
full data read? It probably does, but I'm not totally sure.

Is z a numpy array object at that point?

-Chris




Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Gökhan Sever
This is what I get here:

In [1]: a = np.zeros((21601, 10801), dtype=np.uint16)

In [2]: a.tofile('temp.npa')

In [3]: del a

In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 251 ms per loop


On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker
chris.bar...@noaa.gov wrote:

 On 8/3/11 9:30 AM, Kiko wrote:
  I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

 I've never noticed that netCDF4 was particularly slow for reading
 (writing can be pretty slow sometimes). How slow is slow?

  The data are described as:

 please post the results of:

 ncdump -h the_file_name.nc

 So we can see if there is anything odd in the structure (though I don't
 know what it might be)

 Post your code (in the simplest form you can).

 and post your timings and machine type

 Is the file netcdf4 or 3 format? (the python lib will read either)

 As a reference, reading that much data in from a raw file into a numpy
 array takes 2.57 on my machine (a rather old Mac, but disks haven't
 gotten much faster). You can test that like this:

 a = np.zeros((21601, 10801), dtype=np.uint16)

 a.tofile('temp.npa')

 del a

 timeit a = np.fromfile('temp.npa', dtype=np.uint16)

 (using ipython's timeit)

 -Chris







-- 
Gökhan


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Gökhan Sever
I think these answer your questions.

In [3]: type f.variables['reflectivity']
-------> type(f.variables['reflectivity'])
Out[3]: <type 'netCDF4.Variable'>

In [4]: type f.variables['reflectivity'][:]
-------> type(f.variables['reflectivity'][:])
Out[4]: <type 'numpy.ndarray'>

In [5]: z = f.variables['reflectivity'][:]

In [6]: type z
-------> type(z)
Out[6]: <type 'numpy.ndarray'>

In [10]: id f.variables['reflectivity'][:]
-------> id(f.variables['reflectivity'][:])
Out[10]: 37895488

In [11]: id z
-------> id(z)
Out[11]: 37901440


On Wed, Aug 3, 2011 at 12:40 PM, Christopher Barker
chris.bar...@noaa.gov wrote:

 On 8/3/11 9:46 AM, Gökhan Sever wrote:
  In [23]: from netCDF4 import Dataset
 
  In [24]: f = Dataset('test.nc')
 
  In [25]: f.variables['reflectivity'].shape
  Out[25]: (6, 18909, 506)
 
  In [26]: f.variables['reflectivity'].size
  Out[26]: 57407724
 
  In [27]: f.variables['reflectivity'][:].dtype
  Out[27]: dtype('float32')
 
  In [28]: timeit z = f.variables['reflectivity'][:]
  1 loops, best of 3: 731 ms per loop

 that seems pretty fast, actually -- are you sure that [:] forces the
 full data read? It probably does, but I'm not totally sure.

 is z a numpy array object at that point?

 -Chris






-- 
Gökhan


Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Christopher Barker
On 8/3/11 1:57 PM, Gökhan Sever wrote:
 This is what I get here:

 In [1]: a = np.zeros((21601, 10801), dtype=np.uint16)

 In [2]: a.tofile('temp.npa')

 In [3]: del a

 In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
 1 loops, best of 3: 251 ms per loop

so that's about 10 times faster than my machine. I didn't think disks 
had gotten much faster -- they are still generally 7200 rpm (or slower 
in laptops).

So I've either got a really slow disk, or you have a really fast one (or 
both), or maybe you're getting cache effect, as you wrote the file just 
before reading it.

repeating, doing just what you did:

In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.53 s per loop

then I wrote a bunch of others to disk, and tried again:

In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.45 s per loop

so it seems I'm not seeing cache effects, but maybe you are.

Anyway, we haven't heard from the OP -- I'm not sure what s/he thought 
was slow.

-Chris




 On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker
 chris.bar...@noaa.gov wrote:

 On 8/3/11 9:30 AM, Kiko wrote:
   I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

 I've never noticed that netCDF4 was particularly slow for reading
 (writing can be pretty slow sometimes). How slow is slow?

   The data are described as:

 please post the results of:

 ncdump -h the_file_name.nc

 So we can see if there is anything odd in the structure (though I don't
 know what it might be)

 Post your code (in the simplest form you can).

 and post your timings and machine type

 Is the file netcdf4 or 3 format? (the python lib will read either)

 As a reference, reading that much data in from a raw file into a numpy
 array takes 2.57 on my machine (a rather old Mac, but disks haven't
 gotten much faster). You can test that like this:

 a = np.zeros((21601, 10801), dtype=np.uint16)

 a.tofile('temp.npa')

 del a

 timeit a = np.fromfile('temp.npa', dtype=np.uint16)

 (using ipython's timeit)

 -Chris







 --
 Gökhan



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] Reading a big netcdf file

2011-08-03 Thread Eric Firing
On 08/03/2011 11:24 AM, Gökhan Sever wrote:

 I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
 1 loops, best of 3: 263 ms per loop

You need to clear your cache and then run timeit with options -n1 -r1.
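In IPython that is `%timeit -n1 -r1 a = np.fromfile(...)`. A plain-Python sketch of the same single-shot timing (the cache-dropping step is Linux-specific and needs root, so it only appears as a comment):

```python
import os
import timeit
import numpy as np

# For a truly cold read on Linux, first drop the page cache (as root):
#   sync; echo 3 > /proc/sys/vm/drop_caches
np.zeros(1_000_000, dtype=np.uint16).tofile('temp.npa')

# number=1 with a single call: one read, the equivalent of `-n1 -r1`,
# rather than timeit's default best-of-3 over many warm-cache loops.
t = timeit.timeit("np.fromfile('temp.npa', dtype=np.uint16)",
                  globals=globals(), number=1)
print(f"single read: {t:.4f} s")
os.remove('temp.npa')
```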

Eric