Re: [Numpy-discussion] 1.7.x release of NumPy
On 9/14/11 8:32 PM, Travis Oliphant wrote:
> but my mind is racing around NumPy 3.0 at this point.

Travis, could you clarify this point? Do you mean:

  There are so many ideas my mind is racing around too much to even
  think about 3.0?

or

  My mind is racing with ideas so much that I want to dive in and get
  started working on 3.0?

just wondering,

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959  voice
7600 Sand Point Way NE  (206) 526-6329  fax
Seattle, WA  98115      (206) 526-6317  main reception

chris.bar...@noaa.gov
Re: [Numpy-discussion] load from text files Pull Request Review
On 9/14/11 1:01 PM, Christopher Barker wrote:
> numpy.ndarray.resize is a different method, and I'm pretty sure it
> should be as fast or faster than np.empty + np.append.

My profile:

In [25]: %timeit f1  # numpy.resize()
1000 loops, best of 3: 163 ns per loop

In [26]: %timeit f2  # numpy.ndarray.resize()
1000 loops, best of 3: 136 ns per loop

In [27]: %timeit f3  # numpy.empty() + append()
1000 loops, best of 3: 136 ns per loop

those last two couldn't be more identical! (though this is an exercise
in unrequired optimization!)

My test code:

#!/usr/bin/env python

"""
test_resize

A test of various numpy re-sizing options
"""

import numpy

def f1():
    """numpy.resize"""
    l = 100
    a = numpy.zeros((l,))
    for i in xrange(1000):
        l += l
        a = numpy.resize(a, (l,))
    return None

def f2():
    """numpy.ndarray.resize"""
    l = 100
    a = numpy.zeros((l,))
    for i in xrange(1000):
        l += l
        a.resize(a, (l,))
    return None

def f3():
    """numpy.empty + append"""
    l = 100
    a = numpy.zeros((l,))
    for i in xrange(1000):
        b = np.empty((l,))
        a.append(b)
    return None

-Chris
Re: [Numpy-discussion] load from text files Pull Request Review
On 9/14/11 2:41 PM, Benjamin Root wrote:
> Are you sure the f2 code works? a.resize() takes only a shape tuple.
> As coded, you should get an exception.

wow, what an idiot!

I think I just timed how long it takes to raise that exception...

And when I fix that, I get a memory error. When I fix that, I find
that f3() wasn't doing the right thing. What an astonishing lack of
attention on my part!

Here it is again, working, I hope!

In [107]: %timeit f1()
10 loops, best of 3: 50.7 ms per loop

In [108]: %timeit f2()
1000 loops, best of 3: 719 us per loop

In [109]: %timeit f3()
100 loops, best of 3: 19 ms per loop

So:

numpy.resize() is the slowest
numpy.empty() + numpy.append() is faster
numpy.ndarray.resize() is the fastest

Which matches my expectations, for once!

-Chris

The code:

#!/usr/bin/env python

"""
test_resize

A test of various numpy re-sizing options
"""

import numpy

def f1():
    """numpy.resize"""
    extra = 100
    l = extra
    a = numpy.zeros((l,))
    for i in xrange(100):
        l += extra
        a = numpy.resize(a, (l,))
    return a

def f2():
    """numpy.ndarray.resize"""
    extra = 100
    l = extra
    a = numpy.zeros((l,))
    for i in xrange(100):
        l += extra
        a.resize((l,))
    return a

def f3():
    """numpy.empty + append"""
    extra = 100
    l = extra
    a = numpy.zeros((l,))
    for i in xrange(100):
        b = numpy.empty((extra,))
        a = numpy.append(a, b)
    return a

a1 = f1()
a2 = f2()
a3 = f3()

if a1.shape == a2.shape == a3.shape:
    print "they are all returning the same size array"
else:
    print "Something is wrong!"
Re: [Numpy-discussion] [ANN] Cython 0.15
On 8/8/11 1:21 AM, Sebastian Haase wrote:
> b) What is the status of supporting multi-type Cython functions --
> ala C++ templates ?

You might want to take a look at what Keith Goodman has done with the
Bottleneck project -- I think he used a generic template tool to
generate Cython code for a variety of types from a single definition.

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 3:56 PM, Gökhan Sever wrote:
> Back to the reality. After clearing the cache using Warren's
> suggestion:
>
> In [1]: timeit -n1 -r1 a = np.fromfile('temp.npa', dtype=np.uint16)
> 1 loops, best of 1: 7.23 s per loop

yup -- that cache sure can be handy!

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 8/4/11 10:02 AM, Christopher Barker wrote:
> On 8/4/11 8:53 AM, Jeff Whitaker wrote:
>> Kiko: I think the difference may be that when you read the data with
>> netcdf4-python, it tries to unpack the short integers to a float32
>> array.
>
> Jeff, why is that? is it a netcdf4 convention? I always thought that
> the netcdf data model matched numpy's quite well, including the clear
> choice and specification of data type.
>
> I guess I've mostly used float data anyway, so hadn't noticed this,
> but it comes as a surprise to me!

>> gebco4.set_automaskandscale(False)

OK -- looked at this a bit more, and see in the OP's ncdump:

variables:
        short z(xysize) ;
                z:scale_factor = 1. ;
                z:add_offset = 0. ;
                z:node_offset = 0 ;

so I presume netCDF4 is seeing the scale_factor and offsets, and thus
converting to float. In this case, the scale factor is 1.0 and the
offsets are 0.0, so there isn't any need to convert, but that may be
too smart!

-Chris
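For reference, a minimal sketch of turning that conversion off before
reading (the file name here is made up, and check your netcdf4-python
version for the exact method spelling -- current releases call it
set_auto_maskandscale):

from netCDF4 import Dataset

nc = Dataset('gebco_08.nc')        # hypothetical file name
z = nc.variables['z']
z.set_auto_maskandscale(False)     # keep the raw short integers
raw = z[:]                         # stays int16, no float32 unpacking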
Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice
On 8/2/11 1:40 PM, Jeremy Conlin wrote:
> Thanks for that information. It helps greatly in understanding what
> is happening. Next time I'll put my data into tuples.

I don't remember where they all are, but there are a few places in
numpy where tuples and lists are interpreted differently (fancy
indexing?). It kind of breaks python duck typing (a sequence is a
sequence), but it's useful, too.

So when a list fails to do what you want, try a tuple.

-Chris
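Indexing is the clearest example of the tuple/list distinction I know
of -- a quick sketch:

import numpy as np

a = np.arange(12).reshape(3, 4)

print a[(1, 2)]   # tuple: a multi-dimensional index, same as a[1, 2] -> 6
print a[[1, 2]]   # list: fancy indexing, selects rows 1 and 2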
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 9:30 AM, Kiko wrote:
> I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.

I've never noticed that netCDF4 was particularly slow for reading
(writing can be pretty slow some times). How slow is slow?

> The data are described as:

please post the results of:

ncdump -h the_file_name.nc

So we can see if there is anything odd in the structure (though I
don't know what it might be)

Post your code (in the simplest form you can), and post your timings
and machine type.

Is the file netcdf4 or 3 format? (the python lib will read either)

As a reference, reading that much data in from a raw file into a numpy
array takes 2.57 on my machine (a rather old Mac, but disks haven't
gotten much faster). You can test that like this:

a = np.zeros((21601, 10801), dtype=np.uint16)
a.tofile('temp.npa')
del a
timeit a = np.fromfile('temp.npa', dtype=np.uint16)

(using ipython's timeit)

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 11:09 AM, Ian Stokes-Rees wrote:
> On 8/3/11 12:50 PM, Christopher Barker wrote:
>> As a reference, reading that much data in from a raw file into a
>> numpy array takes 2.57 on my machine (a rather old Mac, but disks
>> haven't gotten much faster).
>
> 2.57 seconds? or minutes?

sorry -- seconds.

> If seconds, does it actually read the whole thing into memory in that
> time, or is there some kind of delayed read going on?

I think it reads it all in. However, now that you bring it up, I think
timeit does it a few times, and after the first time, there may well
be disk cache that speeds things up. In fact, as I recently wrote the
file, there may be disk cache issues even on the first read.

I'm no timing expert, but there must be ways to get a clean time.

-Chris
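One way to get a cleaner cold-cache number is to flush the OS disk
cache and then time a single read yourself, rather than letting timeit
loop -- a sketch (the cache-flush commands are OS-specific and need
admin rights):

import time
import numpy as np

# first, flush the disk cache from a shell:
#   Linux: sync; echo 3 > /proc/sys/vm/drop_caches
#   OS X:  purge

start = time.time()
a = np.fromfile('temp.npa', dtype=np.uint16)
print "cold read took", time.time() - start, "seconds"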
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 9:46 AM, Gökhan Sever wrote:
> In [23]: from netCDF4 import Dataset
>
> In [24]: f = Dataset("test.nc")
>
> In [25]: f.variables['reflectivity'].shape
> Out[25]: (6, 18909, 506)
>
> In [26]: f.variables['reflectivity'].size
> Out[26]: 57407724
>
> In [27]: f.variables['reflectivity'][:].dtype
> Out[27]: dtype('float32')
>
> In [28]: timeit z = f.variables['reflectivity'][:]
> 1 loops, best of 3: 731 ms per loop

that seems pretty fast, actually -- are you sure that [:] forces the
full data read? It probably does, but I'm not totally sure.

is z a numpy array object at that point?

-Chris
Re: [Numpy-discussion] Reading a big netcdf file
On 8/3/11 1:57 PM, Gökhan Sever wrote:
> This is what I get here:
>
> In [1]: a = np.zeros((21601, 10801), dtype=np.uint16)
>
> In [2]: a.tofile('temp.npa')
>
> In [3]: del a
>
> In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
> 1 loops, best of 3: 251 ms per loop

so that's about 10 times faster than my machine. I didn't think disks
had gotten much faster -- they are still generally 7200 rpm (or slower
in laptops). So I've either got a really slow disk, or you have a
really fast one (or both), or maybe you're getting cache effects, as
you wrote the file just before reading it.

repeating, doing just what you did:

In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.53 s per loop

then I wrote a bunch of others to disk, and tried again:

In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.45 s per loop

so it seems I'm not seeing cache effects, but maybe you are.

Anyway, we haven't heard from the OP -- I'm not sure what s/he thought
was slow.

-Chris
Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice
On 8/2/11 8:38 AM, Jeremy Conlin wrote:
> Thanks, Brett. Using StringIO and numpy.loadtxt worked great. I'm
> still curious why what I was doing didn't work. Everything I can see
> indicates it should work.

> In [11]: tfc_dtype
> Out[11]: dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])
>
> In [15]: n = numpy.fromstring(l, dtype=tfc_dtype, sep=' ')
> ---------------------------------------------------------------------------
> ValueError                               Traceback (most recent call last)
> /Users/cbarker/<ipython console> in <module>()
> ValueError: don't know how to read character strings with that array type

means just what it says. In theory, numpy.fromstring() (and
fromfile()) provides a way to quickly and efficiently generate arrays
from text, but in practice, the code is quite limited (and has a bug
or two). I don't think anyone has gotten around to writing the code to
use structured dtypes with it -- so it can't do what you want
(rational though that expectation is).

> In [21]: words
> Out[21]: ['32000', '7.89131E-01', '8.05999E-03', '3.88222E+03']
>
> In [22]: p = numpy.array(words, dtype=tfc_dtype)
>
> In [23]: p
> Out[23]:
> array([(3689064028291727360L, 0.0, 0.0, 0.0),
>        (3976177339304456517L, 4.967820413490985e-91, 0.0, 0.0),
>        (4048226120204106053L, 4.970217431784588e-91, 0.0, 0.0),
>        (3687946958874489413L, 1.1572189237420885e-100, 0.0, 0.0)],
>       dtype=[('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])

similarly here -- converting from text to structured dtypes is not
fully supported.

> In [29]: a
> Out[29]: [32000, 0.789131, 0.00805999, 3882.22]
>
> In [30]: r = numpy.array(a)
>
> In [31]: r
> Out[31]:
> array([ 3.20000000e+04,  7.89131000e-01,  8.05999000e-03,
>         3.88222000e+03])

sure -- numpy's default behavior is to find a dtype that will hold all
of the input array -- this pre-dates structured dtypes, and is
probably what you would want by default anyway.

> In [32]: s = numpy.array(a, dtype=tfc_dtype)
> ---------------------------------------------------------------------------
> TypeError                                Traceback (most recent call last)
> /Users/cbarker/<ipython console> in <module>()
> TypeError: expected a readable buffer object

OK -- I can see why you'd expect that to work. However, the trick with
structured dtypes is that the dimensionality of the inputs can be less
than obvious -- you are passing in a 1-d list of 4 numbers -- do you
want a 1-d array? or? -- in this case, it's pretty obvious (as a
human) what you would want -- you have a dtype with four fields, and
you're passing in four numbers, but there are so many possible
combinations that numpy doesn't try to be smart about it. So as a
rule, you need to be quite specific when working with structured
dtypes.

However, the default is for numpy to map tuples to dtypes, so if you
pass in a tuple instead, it works:

In [34]: t = tuple(a)

In [35]: s = numpy.array(t, dtype=tfc_dtype)

In [36]: s
Out[36]:
array((32000L, 0.789131, 0.00805999, 3882.22),
      dtype=[('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])

you were THIS close!

-Chris
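To pull it together, a sketch of the two constructions that do work
for structured dtypes (the second record is made up for illustration):

import numpy

tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'),
                         ('e', 'f8'), ('fom', 'f8')])

# one record: a tuple
s = numpy.array((32000, 0.789131, 0.00805999, 3882.22), dtype=tfc_dtype)

# several records: a list OF tuples
recs = numpy.array([(32000, 0.789131, 0.00805999, 3882.22),
                    (64000, 0.812345, 0.00912340, 4123.45)],
                   dtype=tfc_dtype)

print recs['nps']   # field access -> [32000 64000]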
Re: [Numpy-discussion] Segmentation Fault in Numpy.test()
On 8/2/11 10:14 AM, Thomas Markovich wrote:
> Just to recap, the issue appears to stem from using the scipy
> superpack with python 2.7 from python.org. This was solved by using
> the apple python along with the scipy superpack.

This sure sounds like a bug in the scipy superpack installer -- if it
was built for the system python2.6, it should not get installed into
2.7. Unless you did something to force that.

-Chris
Re: [Numpy-discussion] recommendation for saving data
On 7/31/11 5:48 AM, Brian Blais wrote:
> I was wondering if there are any recommendations for formats for
> saving scientific data.

every field has its own standards -- I'd try to find one that is
likely to be used by folks that may care about your results.

For Oceanographic and Atmospheric modeling data, netcdf is a good
option. I like the netCDF4 python lib:

http://code.google.com/p/netcdf4-python/

(there are others)

For broader use, and a bit more flexibility, HDF is a good option.
There are at least two ways to use it with numpy:

PyTables: http://www.pytables.org
(nice higher-level interface)

h5py: http://alfven.org/wp/hdf5-for-python/
(a more raw HDF5 wrapper)

There is also the npz format, built in to numpy, if you are happy with
requiring python to read the data.

-Chris

> I am running a simulation, which has many somewhat-independent parts
> which have their own internal state and parameters. I've been using
> pickle (gzipped) to save the entire object (which contains
> subobjects, etc...), but it is getting too unwieldy and I think it is
> time to look for a more robust solution. Ideally I'd like to have
> something where I can call a save method on the simulation object,
> and it will call the save methods on all the children, on down the
> line, all saving into one file. It'd also be nice if it were
> cross-platform, and I could depend on the files being readable into
> the future for a while.
>
> Are there any good standards for this? What do you use for saving
> scientific data?
>
> thank you,
>
> Brian Blais
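The npz option is a one-liner each way -- a minimal sketch (file and
array names made up):

import numpy as np

state = np.random.rand(100, 3)
params = np.array([1.5, 0.1, 42.0])

np.savez('simulation.npz', state=state, params=params)

data = np.load('simulation.npz')
print data['state'].shape, data['params']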
Re: [Numpy-discussion] Problems with numpy binary for Python2.7 + OS X 10.6
On 7/25/11 1:00 PM, Ian Stokes-Rees wrote:
> As best I can tell, I have Python 2.7.2 for my system Python:
>
> [ijstokes@moose ~]$ python -V
> Python 2.7.2
>
> [ijstokes@moose ~]$ which python
> /Library/Frameworks/Python.framework/Versions/2.7/bin/python

yup -- that is probably the python.org python. However, there are now
two different builds of 2.7 for OS-X: the 10.3 one, which is 32 bit,
PPC+Intel, for 10.3.9 and above, and the 10.6 build, which is
32bit+64bit, Intel only, and only runs on 10.6 and above.

> however when I attempt to install the recent numpy binary
> python-2.7.2-macosx10.6.dmg I get stopped at the first stage of the
> install procedure with the error:
>
> numpy 1.6.1 can't be installed on this disk. numpy requires System
> Python 2.7 to install.

You need a numpy build that matches the python build -- I suspect you
have a mis-match. Check carefully which ones you downloaded and make
sure they match.

> Is it looking for /usr/bin/python2.7? For that, I only have up to 2.6
> available. (and 2.5)

No. The "System Python" term is a misnomer that somehow has never
gotten fixed in the installer -- what it is really looking for is the
python.org build.

HTH,
-Chris

> Cheers,
> Ian
Re: [Numpy-discussion] Array vectorization in numpy
Carlos Becker wrote:
> Besides the matlab/numpy comparison, I think that there is an
> inherent problem with how expressions are handled, in terms of
> efficiency. For instance, k = (m - 0.5)*0.3 takes 52 msec average
> here (2000x2000 array), while k = (m - 0.5)*0.3*0.2 takes 0.079, and
> k = (m - 0.5)*0.3*0.2*0.1 takes 101 msec. Placing parentheses around
> the scalar multipliers shows that it seems to have to do with how
> expressions are handled. Is there something that can be done about
> this so that numpy can deal with expressions rather than single
> operations chained by python itself?

well, it is Python, and Python itself does not know anything about
array math -- so you need to be careful to do that correctly yourself.

Python aside, understanding how parentheses affect computation, even
for algebraically equal expressions, is a good thing to understand.

Aside from issues with scalars, a Python expression like:

a = a * b * c

does:

  multiply a and b and put it in a temporary
  multiply that temporary by c and put that in a temporary
  assign the final temporary to a

so it does, in fact, create two unnecessary temporaries for this
simple expression.

If you are concerned about performance, there are ways to control
that. In-place operators are one:

a *= b
a *= c

will not create any temporaries, and will probably be faster. It still
does two loops through the data, though. If your arrays are too big to
fit in cache, that could affect performance. To get around that you
need to get fancy.

One option is numexpr:

http://code.google.com/p/numexpr/

numexpr takes an entire expression as input, and can thus optimize
some of this at the expression level, make good use of cache, etc. --
it's pretty cool.

There are a few other options, including weave, of course.

-Chris
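As a sketch of the three levels (numexpr is a separate install, and
the timings will vary by machine):

import numpy as np
import numexpr as ne

m = np.random.rand(2000, 2000)

# plain expression: the (m - 0.5) result is a throw-away temporary
k1 = (m - 0.5) * 0.3

# in-place: no temporaries, but still two passes over the data
k2 = m.copy()
k2 -= 0.5
k2 *= 0.3

# numexpr: compiles the whole expression, one cache-friendly pass
k3 = ne.evaluate("(m - 0.5) * 0.3")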
Re: [Numpy-discussion] NA/Missing Data Conference Call Summary
Christopher Jordan-Squire wrote:
> Here's a short-ish summary of the topics discussed in the conference
> call this afternoon.

Thanks, this is great! And thanks to all who participated in the call.

> 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE],
> [IGNORE, 3, 4] ] should behave exactly the same as [ [1, 2], [3, 4] ].

whoooa! I actually have been looking for, and thinking about, jagged
arrays a fair bit lately, so this is kind of exciting, but this looks
like a bad idea to me.

The above indicates that:

a = np.array( [ [1, 2, np.IGNORE], [np.IGNORE, 3, 4] ] )

a[:,1] would yield:

array([2, 4])

which seems really wrong -- you've tossed out the location information
altogether. (I think it should be: array([2, 3]))

I could see a jagged array being represented by IGNOREs all at the END
of each row, but putting items in the middle, and shifting things to
the left, strikes me as a plain old bad idea (and a pain to implement).

-Chris
Re: [Numpy-discussion] NA/Missing Data Conference Call Summary
Dag Sverre Seljebotn wrote:
> Here's an HPC perspective...:
>
> At least I feel that the transparency of NumPy is a huge part of its
> current success. Many more than me spend half their time in C/Fortran
> and half their time in Python.

Absolutely -- and this point has been raised a couple times in the
discussion, so I hope it is not forgotten.

> I tend to look at NumPy this way: Assuming you have some data in
> memory (possibly loaded by a C or Fortran library). (Almost) no
> matter how it is allocated, ordered, packed, aligned -- there's a way
> to find strides and dtypes to put a nice NumPy wrapper around it and
> use the memory from Python.

and vice-versa -- assuming you have some data in numpy arrays, there's
a way to process it with a C or Fortran library without copying the
data.

And this is where I am skeptical of the bit-pattern idea -- while one
can expect C and Fortran and GPU, and ??? to understand NaNs for
floating point data, is there any support in compilers or hardware for
special bit patterns for NA values in integers? I've never seen it, in
my (very limited) experience.

Maybe having the mask option, too, will make that irrelevant, but I
want to be clear about that kind of use case.

-Chris
Re: [Numpy-discussion] using the same vocabulary for missing value ideas
Mark Wiebe wrote:
> 1) NA vs IGNORE and bitpattern vs mask are completely independent.
> Any combination of NA as bitpattern, NA as mask, IGNORE as
> bitpattern, and IGNORE as mask are reasonable.

Is this really true? if you use a bitpattern for IGNORE, haven't you
just lost the ability to get the original value back if you want to
stop ignoring it? Maybe that's not inherent to what an IGNORE means,
but it seems pretty key to me.

-Chris
Re: [Numpy-discussion] NA/Missing Data Conference Call Summary
Christopher Jordan-Squire wrote:
> If we follow those rules for IGNORE for all computations, we
> sometimes get some weird output. For example:
>
> [ [1, 2], [3, 4] ] * [ IGNORE, 7 ] = [ 15, 31 ]
>
> (Where * is matrix multiply and not * with broadcasting.) Or should
> that sort of operation throw an error?

That should throw an error -- matrix computation is heavily influenced
by the shape and size of matrices, so I think IGNOREs really don't
make sense there.

Nathaniel Smith wrote:
> It's exactly this transparency that worries Matthew and me -- we feel
> that the alterNEP preserves it, and the NEP attempts to erase it. In
> the NEP, there are two totally different underlying data structures,
> but this difference is blurred at the Python level. The idea is that
> you shouldn't have to think about which you have, but if you work
> with C/Fortran, then of course you do have to be constantly aware of
> the underlying implementation anyway.

I don't think this bothers me -- I think it's analogous to things in
numpy like Fortran order and non-contiguous arrays -- you can ignore
all that when working in pure python, when performance isn't critical,
but you need a deeper understanding if you want to work with the data
in C or Fortran or to tune performance in python.

So as long as there is an API to query and control how things work, I
like that it's hidden from simple python code.

-Chris
Re: [Numpy-discussion] Missing/accumulating data
Mark Wiebe wrote:
>> Speaking of which, would we make the NA value be false?
>
> For booleans, it works out like this:
>
> http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic

That's pretty cool!

> In R, trying to test the truth value of NA (if (NA) ...) raises an
> exception. Adopting this behavior seems reasonable to me.

I'm not so sure. The other precedent is Python, where None is
interpreted as False. In general, in non-numpy code, I use None to
mean "not set yet", or "I'm not sure", or whatever. It's pretty useful
to have it be false.

However, I also do:

if x is not None:

rather than:

if x:

so as to be unambiguous about what I'm testing for (and because if
x == 0, I don't want the test to fail), so I guess:

if arr[i] is np.NA:

would be perfectly analogous.

-Chris
Re: [Numpy-discussion] suggestions on optimising a special matrix reshape
qu...@gmx.at wrote:
> i have to reshape a matrix beta of the form (4**N, 4**N, 4**N, 4**N)
> into betam like (16**N, 16**N) following:
>
> betam = np.zeros((16**N, 16**N), dtype=complex)
> for k in xrange(16**N):
>     ind1 = np.mod(k, 4**N)
>     ind2 = k / 4**N
>     for l in xrange(16**N):
>         betam[k, l] = beta[np.mod(l, 4**N), l / 4**N, ind1, ind2]
>
> is there a smarter/faster way of getting the above done?

no time to check if this is what you want, but is this it?

a = np.arange(4**(4*N)).reshape(4**N, 4**N, 4**N, 4**N)
b = a.reshape((16**N, 16**N))

If that doesn't do it right, you may be able to mess with the strides,
etc. Do some googling, and check out:

numpy.lib.stride_tricks

-Chris

> for N=2, that already takes 0.5 seconds, but i intend to use it for
> N=3 and N=4 ...
>
> thanks for your input,
> q
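For what it's worth, a plain reshape does not match the loop above:
the loop puts the last two indices of beta into the row index (in
reversed order), so a transpose is needed first. A sketch, assuming
I've read the index mapping right -- check it against the loop for
small N before trusting it:

import numpy as np

def reshape_beta(beta, N):
    # want betam[k, l] = beta[l % M, l // M, k % M, k // M], M = 4**N;
    # reversing the axes turns that into a plain C-order reshape
    M = 4**N
    return beta.transpose(3, 2, 1, 0).reshape(M * M, M * M)

# verify against the explicit double loop for N = 1
N = 1
M = 4**N
beta = np.arange(float(M**4)).reshape(M, M, M, M)
betam = np.zeros((M * M, M * M))
for k in xrange(M * M):
    for l in xrange(M * M):
        betam[k, l] = beta[l % M, l // M, k % M, k // M]

assert (reshape_beta(beta, N) == betam).all()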
Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2
Matthew Brett wrote:
> should raise an error. On the other hand, if I make a normal array:
>
> arr = np.array([1.0, 2.0, 7.0])
>
> and then do this:
>
> arr.visible[2] = False
>
> then either I should raise an error (it's not a masked array), or,
> more magically, construct a mask on the fly.

maybe it's too much Magic, but it seems reasonable to me that for an
array without a mask, arr.visible[i] is simply True for all values of
i -- no need to create a mask to determine that.

does:

arr[i] = np.IGNORE

auto-create a mask if there is not one there already? I think it
should.

-Chris
Re: [Numpy-discussion] Missing/accumulating data
Joe Harrington wrote:
> All that has to happen is to allow the sense of the mask to be FALSE
> = the data are bad, TRUE = the data are good, and allow (not require)
> the mask to be of any numerical type, or at least of integer type as
> well as boolean.

quick note on this: I like the FALSE == good way, because instead of
"good" and "bad" we think "masked" and "unmasked"; then we have:

False = unmasked = regular old data
True  = masked   = something special about the data

The default for "something special" is "bad" (or "missing", or
"ignore"), but the cool thing is that if you use an int:

0 = unmasked
1 = masked because of one thing
2 = masked because of another

etc., etc. This could be pretty powerful.

-Chris
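You can get a taste of that today by keeping an integer flag array
next to the data, and collapsing it to a boolean for numpy.ma (the
flag codes here are made up):

import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])
flags = np.array([0, 1, 0, 2])  # 0 = good, 1 = sensor dropout, 2 = out of range

masked = ma.masked_array(data, mask=(flags != 0))
print masked.mean()        # computed from the good values only
print data[flags == 2]     # and the *reason* is still queryable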
Re: [Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Nathaniel Smith wrote:
> The 'dtype factory' idea builds on the way I've structured datetime
> as a parameterized type, ...

Another disadvantage is that we get further from Gael Varoquaux's
point:

> Right now, the numpy array can be seen as an extension of the C
> array, basically a pointer, a data type, and a shape (and strides).
> This enables easy sharing with libraries that have not been written
> with numpy in mind.

and also PEP 3118 support.

It is very useful that a numpy array has a pointer to a regular old C
array -- if we introduce this special dtype, that will break (well,
not really, but the C array would be of this particular struct).
Granted, any other C code would properly have to do something with the
mask anyway, but I still think it'd be better to keep that raw data
array standard.

This applies to switching between masked and not-masked numpy arrays
also -- I don't think I'd want the performance hit of that requiring a
data copy. Also, the idea was posted here that you could use views to
have the same data set with different masks -- that would break as
well.

Nathaniel Smith wrote:
> If we think that the memory overhead for floating point types is too
> high, it would be easy to add a special case where maybe(float) used
> a distinguished NaN instead of a separate boolean.

That would be pretty cool, though in the past folks have made a good
argument that even for floats, masks have significant advantages over
just using NaN. One might be that you can mask and unmask a value for
different operations, without losing the value.

-Chris
Re: [Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Robert Kern wrote:
> It's worth noting that this is not a replacement for masked arrays,
> nor is it intended to be the be-all, end-all solution to missing data
> problems. It's mostly just intended to be a focused tool to fill in
> the gaps where masked arrays are less convenient for whatever reason;

I think we have enough problems with ndarray vs numpy.ma -- this is
introducing a third option? IMHO, and from the discussion, it seems
this proposal should be a uniter, not a divider. While masked arrays
may still exist, it would be nice if they were an extension of the new
built-in thingie, not yet another implementation.

> Not every dtype would have an NA-aware counterpart.

One of the great things about numpy is the full range of data types. I
think it would be surprising and frustrating not to have masked
versions of them all.

By the way, what might be the performance hit of a new dtype --
wouldn't we lose all sorts of opportunities for the compiler and
hardware to optimize? I can only imagine an if statement with every
single computation. But maybe that isn't any more of a hit than a
separate mask.

-Chris
Re: [Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Nathaniel Smith wrote:
> IME it's important for the default behavior to be to fail loudly when
> things are wonky, not to silently patch them up, possibly
> incorrectly!

+1 for the default behavior being to fail, raise errors, or return
"unknown" values in these situations. Though that behavior should be
over-ridable with a flag or additional functions (like nansum).

Robert Kern wrote:
> How your proposal improves on the status quo, i.e. numpy.ma.

I've thought for years that masked arrays should be better integrated
with numpy -- it seems there is a lot of code duplication and upkeep
required to keep masked arrays working as much like regular arrays as
possible. Particularly having third-party code work with them.

And I'm not sure I can explain why, but I find myself hardly ever
using masked arrays -- it just feels like added complication to use
them -- I tend to work with floats and use NaN instead, even though
it's often not as good an option.

Gael Varoquaux wrote:
> Right now, the numpy array can be seen as an extension of the C
> array, basically a pointer, a data type, and a shape (and strides).
> This enables easy sharing with libraries that have not been written
> with numpy in mind.

Isn't that what the PEP 3118 extended buffer protocol is supposed to
be for?

Anyway, good stuff!

-Chris
Re: [Numpy-discussion] fast numpy i/o
Neal Becker wrote:
> I'm wondering what are good choices for fast numpy array
> serialization?
>
> mmap: fast, but I guess not self-describing?
> hdf5: ?
> pickle: self-describing, but maybe not fast?
> others?

hdf5: should be pretty fast, and self-describing -- with the advantage
of being a standard. The disadvantage is that it requires an hdf5
library, which can be a pain to install on some systems.

There is .tofile() and .fromfile() -- should be about as fast as you
can get, but not self-describing.

Then there is .save(), savez() and .load() (.npy/.npz format). It
should be pretty fast, and self-describing (but not a standard outside
of numpy).

I doubt pickle will ever be your best bet.

-Chris
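A quick sketch of the self-describing difference (file names made up):

import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

np.save('a.npy', a)    # dtype and shape stored in the header
b = np.load('a.npy')   # comes back as float32, shape (3, 4)

a.tofile('a.raw')      # just the raw bytes
c = np.fromfile('a.raw', dtype=np.float32).reshape(3, 4)
                       # caller must already know the dtype and shape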
Re: [Numpy-discussion] fast numpy i/o
Neal Becker wrote:
> I'm wondering what are good choices for fast numpy array
> serialization?
>
> mmap: fast, but I guess not self-describing?
> hdf5: ?
> pickle: self-describing, but maybe not fast?
> others?
>
> I think, in addition, that hdf5 is the only one that easily
> interoperates with matlab?

netcdf is another option if you want an open standard supported by
Matlab and the like. I like the netCDF4 package:

http://code.google.com/p/netcdf4-python/

Though it can be kind of slow to write (I have no idea why!)

plain old tofile() should be readable by other tools as well, as long
as you have a way to specify what is in the file.

> speaking of hdf5, I see:
>
> pyhdf5io 0.7 - Python module containing high-level hdf5 load and save
> functions.
> h5py 2.0.0 - Read and write HDF5 files from Python

There is also pytables, which uses HDF5 under the hood.

-Chris
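For reference, a minimal netCDF4-python round trip looks something
like this (file, dimension, and variable names made up):

import numpy as np
from netCDF4 import Dataset

a = np.random.rand(100, 50)

nc = Dataset('data.nc', 'w')
nc.createDimension('x', a.shape[0])
nc.createDimension('y', a.shape[1])
v = nc.createVariable('a', 'f8', ('x', 'y'))
v[:] = a
nc.close()

b = Dataset('data.nc').variables['a'][:]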
Re: [Numpy-discussion] fast numpy i/o
Robert Kern wrote:
> https://raw.github.com/numpy/numpy/master/doc/neps/npy-format.txt

Just a note. From that doc:

> HDF5 is a complicated format that more or less implements a
> hierarchical filesystem-in-a-file. This fact makes satisfying some of
> the Requirements difficult. To the author's knowledge, as of this
> writing, there is no application or library that reads or writes even
> a subset of HDF5 files that does not use the canonical libhdf5
> implementation.

I'm pretty sure that the NetcdfJava libs, developed by Unidata, use
their own home-grown code. netcdf4 is built on HDF5, so that qualifies
as a library that reads or writes a subset of HDF5 files. Perhaps
there are lessons to be learned there. (too bad it's Java)

> Furthermore, by providing the first non-libhdf5 implementation of
> HDF5, we would be able to encourage more adoption of simple HDF5 in
> applications where it was previously infeasible because of the size
> of the library.

I suppose this point is still true -- a C lib that supported a subset
of hdf would be nice.

That being said, I like the simplicity of the .npy format, and I don't
know that anyone wants to take any of this on anyway.

-Chris
Re: [Numpy-discussion] fast grayscale conversion
Alex Flint wrote:
> Thanks, that's helpful. I'm now getting comparable times on a
> different machine. It must be something else slowing down my machine
> more generally, not just numpy.

you also might want to get a bit fancier than simply scaling linearly
-- R, G, and B don't necessarily all contribute equally to our sense
of "whiteness". For instance, PIL uses:

> When translating a colour image to black and white, the library uses
> the ITU-R 601-2 luma transform:
>
> L = R * 299/1000 + G * 587/1000 + B * 114/1000

which would be easy enough to do with numpy.

-Chris

On Mon, Jun 20, 2011 at 5:11 PM, Eric Firing <efir...@hawaii.edu> wrote:
> On 06/20/2011 10:41 AM, Zachary Pincus wrote:
>> You could try:
>>
>> src_mono = src_rgb.astype(float).sum(axis=-1) / 3.
>>
>> But that speed does seem slow. Here are the relevant timings on my
>> machine (a recent MacBook Pro) for a 3.1-megapixel-size array:
>>
>> In [16]: a = numpy.empty((2048, 1536, 3), dtype=numpy.uint8)
>>
>> In [17]: timeit numpy.dot(a.astype(float), numpy.ones(3)/3.)
>> 10 loops, best of 3: 116 ms per loop
>>
>> In [18]: timeit a.astype(float).sum(axis=-1)/3.
>> 10 loops, best of 3: 85.3 ms per loop
>>
>> In [19]: timeit a.astype(float)
>> 10 loops, best of 3: 23.3 ms per loop
>
> On my slower machine (older laptop, core2 duo), you can speed it up
> more:
>
> In [3]: timeit a.astype(float).sum(axis=-1)/3.0
> 1 loops, best of 3: 235 ms per loop
>
> In [5]: timeit b = a.astype(float).sum(axis=-1); b /= 3.0
> 1 loops, best of 3: 181 ms per loop
>
> In [7]: timeit b = a.astype(np.float32).sum(axis=-1); b /= 3.0
> 10 loops, best of 3: 148 ms per loop
>
> If you really want float64, it is still faster to do the first
> operation with single precision:
>
> In [8]: timeit b = a.astype(np.float32).sum(axis=-1).astype(np.float64); b /= 3.0
> 10 loops, best of 3: 163 ms per loop
>
> Eric
>
>> On Jun 20, 2011, at 4:15 PM, Alex Flint wrote:
>>> At the moment I'm using numpy.dot to convert a WxHx3 RGB image to a
>>> grayscale image:
>>>
>>> src_mono = np.dot(src_rgb.astype(np.float), np.ones(3)/3.)
>>>
>>> This seems quite slow though (several seconds for a 3 megapixel
>>> image) - is there a more specialized routine better suited to this?
>>>
>>> Cheers,
>>> Alex
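In numpy, that weighted version is the same dot product with different
weights -- a sketch:

import numpy as np

rgb = np.empty((2048, 1536, 3), dtype=np.uint8)   # stand-in image

# ITU-R 601-2 luma weights instead of the flat 1/3, 1/3, 1/3
weights = np.array([0.299, 0.587, 0.114])
gray = np.dot(rgb.astype(np.float32), weights)    # shape (2048, 1536)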
[Numpy-discussion] get a dtypes' range?
Hi all,

I'm wondering if there is a way to get the range of values a given
dtype can hold?

essentially, I'm looking for something like sys.maxint, for whatever
numpy dtype I have on hand (and eps for floating point types).

I was hoping there would be something like:

a_dtype.max_value
a_dtype.min_value

etc.

In this case, I have a uint16, so I can hard code it, but it would be
nice to be able to write the code in a more generic fashion.

-Chris
Re: [Numpy-discussion] get a dtypes' range?
Fabrice Silva wrote:
>> I'm wondering if there is a way to get the range of values a given
>> dtype can hold?
>
> http://docs.scipy.org/doc/numpy/reference/routines.dtype.html#data-type-information

yup, that does it. I hadn't found that, as I was expecting a more OO
way, i.e. as a method of the dtype. Also, I need to know in advance if
it's an int or a float, but I suppose you can't really do much without
knowing that anyway.

Actually, I'm a bit confused about dtypes from an OO design
perspective anyway. I note that the dtypes seem to have all (most?) of
the methods of ndarrays (or placeholders, anyway), which I don't quite
get.

oh well, I've found what I need, thanks.

-Chris
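For the record, the functions behind that link are np.iinfo and
np.finfo:

import numpy as np

print np.iinfo(np.uint16).min, np.iinfo(np.uint16).max  # 0 65535
print np.finfo(np.float64).eps   # machine epsilon, ~2.22e-16
print np.finfo(np.float64).max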
Re: [Numpy-discussion] get a dtypes' range?
Robert Kern wrote:
> On Fri, Jun 17, 2011 at 13:27, Christopher Barker
> <chris.bar...@noaa.gov> wrote:
>> Actually, I'm a bit confused about dtypes from an OO design
>> perspective anyway. I note that the dtypes seem to have all (most?)
>> of the methods of ndarrays (or placeholders, anyway), which I don't
>> quite get.
>
> No, they don't.
>
> |33> d = np.dtype(float)
> |34> dir(d)
> ...
>
> The numpy *scalar* types do, ...

OK -- that is, indeed, what I observed, from:

dir(np.uint16)

> ... because they are actual Python types just like any other. They
> provide the *unbound* methods (i.e. you cannot call them) that will
> be bound when an instance of it gets created. And the scalar types
> have many of the same methods as ndarray to allow some ability to
> write generic functions that will work with either arrays or scalars.

That's handy. I guess I'm confused about the difference between:

np.dtype(uint16)

and

np.uint16

since I always pass in the latter when a dtype is called for.

-Chris
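The relationship, as a quick sketch:

import numpy as np

d = np.dtype(np.uint16)   # a dtype instance *describing* the type
t = np.uint16             # the scalar type itself

print d.type is t         # True -- the dtype carries the scalar type
print np.dtype(t) == d    # True -- scalar types are accepted anywhere
                          # a dtype is called for, converted on the way in
print t(70000)            # scalar types are callable: 70000 % 65536 = 4464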
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
NOTE: I'm only taking part in this discussion because it's interesting
and I hope to learn something. I do hope the OP chimes back in to
clarify his needs, but in the meantime...

Bruce Southey wrote:
> Remember that is what the OP wanted to do, not me.

Actually, I don't think that's what the OP wanted -- I think we have a
conflict between the need for concrete examples, and the desire to
find a generic solution, so I think this is what the OP wants:

How to best multiprocess a _generic_ operation that needs to be
performed on a lot of arrays. Something like:

output = []
for a in a_bunch_of_arrays:
    output.append(a_function(a))

More specifically, a_function() is an inner product, *defined by the
user*. So there is no way to optimize the inner product itself (that
will be up to the user), nor any way to generally convert the
bunch_of_arrays to a single array with a single higher-dimensional
operation.

In testing his approach, the OP used a numpy multiply, and a simple,
loop-through-the-elements multiply, and found that with his
multiprocessing calls, the simple loop was a fair bit faster with two
processors, but that the numpy one was slower with two processors. Of
course, the looping method was much, much slower than the numpy one in
any case.

So Sturla's comments are probably right on:

Sturla Molden wrote:
>> innerProductList = pool.map(myutil.numpy_inner_product, arrayList)
>
> 1. Here we potentially have a case of false sharing and/or mutex
> contention, as the work is too fine grained. pool.map does not do any
> load balancing. If pool.map is to scale nicely, each work item must
> take a substantial amount of time.

I suspect this is the main issue.

> 2. There is also the question of when the process pool is spawned.
> Though I haven't checked, I suspect it happens prior to calling
> pool.map. But if it does not, this is a factor as well, particularly
> on Windows (less so on Linux and Apple).

It didn't work well on my Mac, so it's either not an issue, or not
Windows-specific, anyway.

> 3. arrayList is serialised by pickling, which has a significant
> overhead. It's not shared memory either, as the OP's code implies,
> but the main thing is the slowness of cPickle.

I'll bet this is a big issue, and one I'm curious about how to
address. I have another problem where I need to multi-process, and I'd
love to know a way to pass data to the other process and back
*without* going through pickle. maybe memmapped files?

>> IPs = N.array(innerProductList)
>
> 4. numpy.array is a very slow function. The benchmark should
> preferably not include this overhead.

I re-ran, moving that out of the timing loop, and, indeed, it helped a
lot, but it still takes longer with the multi-processing. I suspect
that the overhead of pickling, etc. is overwhelming the operation
itself. That, and the load balancing issue that I don't understand!

To test this, I did a little experiment -- creating a "fake" operation,
one that simply returns an element from the input array -- so it
should take next to no time, and we can time the overhead of the
pickling, etc:

$ python shared_mem.py
Using 2 processes
No shared memory, numpy array multiplication took 0.124427080154 seconds
Shared memory, numpy array multiplication took 0.586215019226 seconds
No shared memory, fake array multiplication took 0.000391006469727 seconds
Shared memory, fake array multiplication took 0.54935503006 seconds
No shared memory, my array multiplication took 23.5055780411 seconds
Shared memory, my array multiplication took 13.0932741165 seconds

Bingo!

The overhead of the multi-processing takes about 0.54 seconds, which
explains the slowdown for the numpy method. Not so mysterious after
all.

Bruce Southey wrote:
> But if everything is *single-threaded* and thread-safe, then you just
> create a function and use Anne's very useful handythread.py
> (http://www.scipy.org/Cookbook/Multithreading).

This may be worth a try -- though the GIL could well get in the way.

> By the way, if the arrays are sufficiently small, there is a lot of
> overhead involved such that there is more time in communication than
> computation.

yup -- clearly the case here. I wonder if it's just array size, though
-- won't cPickle time scale with array size? So it may not be size
per se, but rather how much computation you need for a given size
array.

-Chris

[I've enclosed the OP's slightly altered code]

Attachments: myutil.py, shared_mem.py
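For anyone who wants to reproduce the shape of that experiment without
the attachments, a stripped-down sketch (the array count and sizes are
made up; the real inner-product functions live in myutil.py):

import time
import numpy as np
from multiprocessing import Pool

def fake_op(a):
    # stand-in for the user's inner product: nearly free,
    # so what's left is pickling/dispatch overhead
    return a[0, 0]

if __name__ == '__main__':
    array_list = [np.random.rand(10, 10) for i in xrange(10000)]

    pool = Pool(processes=2)

    start = time.time()
    results = pool.map(fake_op, array_list)
    print "pool.map of a near-no-op took", time.time() - start, "seconds"

    start = time.time()
    results = map(fake_op, array_list)
    print "plain map took", time.time() - start, "seconds"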
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden wrote:
> On Windows this sucks, because there is no fork system call.

Darn -- I do need to support Windows.

> Here we are stuck with multiprocessing and pickle, even if we use
> shared memory.

What do you need to pickle if you're using shared memory?

-Chris
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden wrote:
> Den 16.06.2011 19:55, skrev Sturla Molden:
>
> The meta-data for the ndarray (strides, shape, dtype, etc.) and the
> name of the shared memory segment. As there is no fork, we must tell
> the other process where to find the shared memory and what to do with
> it.

got it, thanks.

> Which by the way is what the shared memory arrays I and Gaël made
> will do, but we still have the annoying pickle overhead.

Do you have a pointer to that code? It sounds handy.

-Chris
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden wrote:
> IMHO, trying to beat Intel or AMD performance library developers with
> Python, NumPy and multiprocessing is just silly. Nothing we do with
> array operator * and np.sum is ever going to compare with BLAS
> functions from these libraries.

I think the issue got confused -- the OP was not looking to speed up a
matrix multiply, but rather to speed up a whole bunch of independent
matrix multiplies.

> Sometimes we need a little bit more coarse-grained parallelism. Then
> it's time to think about Python threads and releasing the GIL or use
> OpenMP with C or Fortran. multiprocessing is the last tool to think
> about. It is mostly appropriate for 'embarrassingly parallel'
> paradigms, and certainly not the tool for parallel matrix
> multiplication.

I think this was, in fact, an embarrassingly parallel example. But
when the OP put it on two processors, it was slower than one -- hence
his question. I got the same result on my machine as well.

I'm not sure he tried python threads; that may be worth a shot.

It would also be great if someone that actually understands this stuff
could look at his code and explain why the slowdown occurs (hint,
hint!)

-CHB
Re: [Numpy-discussion] np.histogram: upper range bin
Peter Butterworth wrote: Consistent bin width is important for my applications. With floating point numbers I usually shift my bins by a small offset to ensure values at bin edges always fall in the correct bin. With the current np.histogram behavior you _silently_ get a wrong count in the top bin if a value falls on the upper bin limit. Incidentally this happens by default with integers. ex: x=range(4); np.histogram(x) Again, the trick here is that histogram really is for floating point numbers, not integers. Likely I will test for the following condition when using np.histogram(x): max(x) == top bin limit from the docstring: range : (float, float), optional The lower and upper range of the bins. If not provided, range is simply ``(a.min(), a.max())``. So, in fact, if you don't specify the range, the top of the largest bin will ALWAYS be the max value in your data. You seem to be advocating that that value not be included in any bins -- which would surely not be a good option. With the current np.histogram behavior you _silently_ get a wrong count in the top bin if a value falls on the upper bin limit. It is not a wrong count -- it is absolutely correct. I think the issue here is that you are binning integers, which is really a categorical binning, not a value-based one. If you want categorical binning, there are other ways to do that. I suppose having np.histogram do something different if the input is integers might make sense, but that's probably not worth it. By the way, what would you have it do if you had integers, but a large number, over a large range of values: In [68]: x = numpy.random.randint(0, 1e8, size=(100,) ) In [70]: np.histogram(x) Out[70]: (array([10, 10, 12, 16, 12, 9, 11, 4, 9, 7]), array([ 712131., 10437707.4, 20163283.8, 29888860.2, 39614436.6, 49340013., 59065589.4, 68791165.8, 78516742.2, 88242318.6, 97967895.])) I guess it is better to always specify the correct range, but wouldn't it be preferable if the function provided a warning when this case occurs ?
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Brandt Belson wrote: Unfortunately I can't flatten the arrays. I'm writing a library where the user supplies an inner product function for two generic objects, and almost always the inner product function does large array multiplications at some point. The library doesn't get to know about the underlying arrays. Now I'm confused -- if the user is providing the inner product implementation, how can you optimize that? Or are you trying to provide said user with an optimized large array multiplication that he/she can use? If so, then I'd post your implementation here, and folks can suggest improvements. If it's regular old element-wise multiplication: a*b (where a and b are numpy arrays) then you are right, numpy isn't using any fancy multi-core aware optimized package, so you should be able to make a faster version. You might try numexpr also -- it's pretty cool, though may not help for a single operation. It might give you some ideas, though. http://www.scipy.org/SciPyPackages/NumExpr -Chris Thanks, Brandt Message: 2 Date: Fri, 10 Jun 2011 09:23:10 -0400 From: Olivier Delalleau sh...@keba.be Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication To: Discussion of Numerical Python numpy-discussion@scipy.org Message-ID: banlktikjppc90ye56t1mr+byaxxaw32...@mail.gmail.com Content-Type: text/plain; charset=iso-8859-1 It may not work for you depending on your specific problem constraints, but if you could flatten the arrays, then it would be a dot, and you could maybe compute multiple such dot products by storing those flattened arrays into a matrix. -=- Olivier 2011/6/10 Brandt Belson bbel...@princeton.edu Hi, Thanks for getting back to me. I'm doing element wise multiplication, basically innerProduct = numpy.sum(array1*array2) where array1 and array2 are, in general, multidimensional. I need to do many of these operations, and I'd like to split up the tasks between the different cores. I'm not using numpy.dot, if I'm not mistaken I don't think that would do what I need. Thanks again, Brandt Message: 1 Date: Thu, 09 Jun 2011 13:11:40 -0700 From: Christopher Barker chris.bar...@noaa.gov Subject: Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication To: Discussion of Numerical Python numpy-discussion@scipy.org Message-ID: 4df128fc.8000...@noaa.gov Content-Type: text/plain; charset=ISO-8859-1; format=flowed Not much time, here, but since you got no replies earlier: I'm parallelizing some code I've written using the built-in multiprocessing module. In my application, I need to multiply many large arrays together -- is that matrix multiplication, or element-wise? If matrix, then numpy should be using LAPACK, which, depending on how it's built, could be using all your cores already. This is heavily dependent on how your numpy (really the LAPACK it uses) is built. and sum the resulting product arrays (inner products). Are you using numpy.dot() for that? If so, then the above applies to that as well. I know I could look at your code to answer these questions, but I thought this might help. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman
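(Since numexpr came up: a sketch of what it would look like for exactly this inner product -- numexpr must be installed, and a speedup is not guaranteed for a single operation:)

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)

    ip_numpy = np.sum(a * b)           # allocates a full temporary for a*b
    ip_ne = ne.evaluate('sum(a * b)')  # blocked and multi-threaded, no big temporary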
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Not much time, here, but since you got no replies earlier: I'm parallelizing some code I've written using the built-in multiprocessing module. In my application, I need to multiply many large arrays together -- is that matrix multiplication, or element-wise? If matrix, then numpy should be using LAPACK, which, depending on how it's built, could be using all your cores already. This is heavily dependent on how your numpy (really the LAPACK it uses) is built. and sum the resulting product arrays (inner products). Are you using numpy.dot() for that? If so, then the above applies to that as well. I know I could look at your code to answer these questions, but I thought this might help. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] code review for datetime arange
Mark Wiebe wrote: Because of the nature of datetime and timedelta, arange has to be slightly different than with all the other types. In particular, for datetime the primary signature is np.arange(datetime, datetime, timedelta). I've implemented a simple extension which allows for another way to specify a date range, as np.arange(datetime, timedelta, timedelta). instead of, or in addition to, the above? it seems you can pass in the following types: strings np.datetime64 np.timedelta64 integers (floats ?) Are you essentially doing method overloading to determine what it all means? How do you know if: np.arange('2011', '2020', dtype='M8[Y]') means you want from the years 2011 to 2020 or from 2011 to 4031? np.arange('today', 10, 3, dtype='M8') array(['2011-06-09', '2011-06-12', '2011-06-15', '2011-06-18'], dtype='datetime64[D]') so dtype 'M8' defaults to increments of days? of course, I've lost track of the difference between 'M' and 'M8' (I've never liked the dtype code anyway -- I far prefer np.float64 to 'd', for instance) Will there be a linspace for datetimes? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
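(For reference, the first signature behaves like this in the implementation as it eventually shipped -- a sketch from a released NumPy, not from the pull request itself:)

    import numpy as np

    # np.arange(datetime, datetime, timedelta)
    np.arange(np.datetime64('2011-06-01'), np.datetime64('2011-07-01'),
              np.timedelta64(7, 'D'))
    # -> ['2011-06-01' '2011-06-08' '2011-06-15' '2011-06-22' '2011-06-29']

    # with an explicit unit, strings like '2011' are unambiguous:
    np.arange('2011', '2020', dtype='datetime64[Y]')
    # -> the years 2011 through 2019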
Re: [Numpy-discussion] numpy c api general array handling
Mark Wiebe wrote: I believe what you may want is PyArray_ContiguousFromAny instead of PyArray_ContiguousFromObject. I would also strongly encourage you to take a look at Cython: http://cython.org/ It has built-in support for numpy arrays, so it can take care of a lot of this bookkeeping for you. http://wiki.cython.org/tutorials/numpy -Chris Cheers, Mark On Wed, Jun 8, 2011 at 2:58 AM, Yoshi Rokuko yo...@rokuko.net wrote: hey, i'm writing my first python module in c using numpy/arrayobject.h and i have some problems with different platforms (both linux but really different setup) so i suspect my array handling is not correct. i'm used to c arrays and want to access large numpy arrays from within my c module with no strange array-iter-methods. So what are your remarks to the following: PyArg_ParseTuple(args, "OO|ii", &arg1, &arg2, &start, &stop); index = (PyArrayObject *) PyArray_ContiguousFromObject(arg1, PyArray_INT, 1, 1); ix = (int *)index->data; then using like: for(k = ix[i]; k < ix[i+1]; k++) l = ix[k]; this seems to work pretty well, but real problems come with: PyArg_ParseTuple(args, "O", &arg); adjc = (PyArrayObject *) PyArray_ContiguousFromObject(arg, PyArray_INT, 2, 2); a = (int *)adjc->data; and then using like: aik = a[i + n * k]; it seems like on one system it does not accept 2d numpy arrays just 1d ones or i must hand him a list of 1d numpy arrays like that: A = np.array([[1,0],[0,1]]) B = my.method(list(A)) i would prefer to avoid the list() call in python. what are your remarks on performance would it be faster to do: PyArg_ParseTuple(args, "OO|ii", &arg1, &arg2, &start, &stop); index = (PyArrayObject *) PyArray_ContiguousFromObject(arg1, PyArray_INT, 1, 1); ix = malloc(n * sizeof(int)); for(i = 0; i < n; i++) ix[i] = ((int *)index->data)[i]; and then use ix[] (i call a lot on ix). thank you and best regards, yoshi ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
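(The Python-side equivalent of the advice above, as a sketch -- as_c_int_array is a hypothetical helper name, not a numpy function: make the array C-contiguous with the right dtype before the C module ever sees it, so the C code can index the data directly:)

    import numpy as np

    def as_c_int_array(obj, ndim):
        # accept anything array-like and return a C-contiguous array of
        # C ints -- roughly what PyArray_ContiguousFromAny does in C
        a = np.ascontiguousarray(obj, dtype=np.intc)
        if a.ndim != ndim:
            raise ValueError('expected a %d-d array' % ndim)
        return a

    A = np.array([[1, 0], [0, 1]])
    B = as_c_int_array(A, 2)  # no list(A) workaround needed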
Re: [Numpy-discussion] fixing up datetime
Pierre GM wrote: Using the ISO as reference, you have a good definition of months. Yes, but only one. There are others. For instance, the climate modelers like to use a calendar that has 360 days a year: 12 30-day months. That way they get something with the same timescale as months and years, but have nice, linear, easy to use units (differentiable, and all that). Mark Wiebe wrote: Code / Interpreted as: Y -> 12M, 52W, 365D; M -> 4W, 30D, 720h This is even self-inconsistent: 1Y == 365D 1Y == 12M == 12 * 30D == 360D 1Y == 12M == 12 * 4W == 12 * 4 * 7D == 336D 1Y == 52W == 52 * 7D == 364D Is it not clear from this what a mess of mis-interpretation might result from all that? This part of the code is used for mapping metadata like [Y/4] -> [3M], or [Y/26] -> [2W]. I agree that this '/' operator in the unit metadata is weird, and wouldn't object to removing it. Weird, dangerous, and unnecessary. I can see how some data may be on, for example, quarters, but that should require a definition of quarters that's more defined. This goes to heck if the data is expressed in something like months since 1995-01-01 Because months are only defined on a Calendar. Here's what the current implementation can do with that one: np.datetime64('1995-01-01', 'M') + 13 numpy.datetime64('1996-02','M') I see -- I have a better idea of the intent here, and I can see that as long as you keep everything in the same unit (say, months, in this case), then this can be a clean and effective way to deal with this sort of data. As I said, the netcdf case is a different use case, but I think the issue there was that the creator of the data was thinking of it as being used like above: months since January, 1995, and the data was all integer values for months, it makes perfect sense, and is well defined. The problem in that case is that the standard does not have a specification that enforces that the units stay months, and that the intervals are integers -- so software looked at that, converted it to, for example, python datetime instances (using some pre-defined definition for the length of a month), and got something that mis-represented the data. The numpy use-case is different, but it's my concern that that same kind of thing could easily happen, because people want to write generic code that deals with arbitrary np.datetime64 instances. I suppose we could consider this analogous to issues with integer and floating point dtypes -- when you convert between those, it's user-beware, but I think that would be more clear if we had a set of dtypes: datetime_months datetime_hours datetime_seconds But that list would get big in a hurry! Also, with the Python datetime module, for instance, what I like about it is that I don't have to know or care how it's stored internally -- all I need to know is what range and precision it can deal with. numpy has performance issues that may not make that possible, but I still like it. Maybe two types: datetime_calendar: for Calendar-type units (months, business days, ...) datetime_continuous: for linear units (seconds, hours, ...) or something like that? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
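(In the datetime64 implementation that finally shipped, this concern was resolved by keeping the calendar units on their own grid: month-unit arithmetic works, but mixing a calendar unit with a linear one is refused rather than silently converted. A small demonstration against a released NumPy:)

    import numpy as np

    # well defined on the month grid:
    print(np.datetime64('1995-01', 'M') + np.timedelta64(13, 'M'))
    # -> 1996-02

    # but a day-resolution datetime plus a month-unit delta is refused:
    try:
        np.datetime64('2011-01-31') + np.timedelta64(1, 'M')
    except TypeError as e:
        print(e)  # Cannot get a common metadata divisor for
                  # NumPy datetime metadata [D] and [M]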
Re: [Numpy-discussion] fixing up datetime
Dave Hirschfeld wrote: Robert Kern robert.kern at gmail.com writes: Well, [D/100] doesn't represent [864s]. It represents something that happens 100 times a day, but not necessarily at precise regular intervals. For example, suppose that I am representing payments that happen twice a month, say on the 1st and 15th of every month, or the 5th and 20th. I would use [M/2] to represent that. It's not [2W], and it's not [15D]. It's twice a month. Got it -- very useful concept, of course. The default conversions may seem to imply that [D/100] is equivalent to [864s], but they are not intended to. They are just a starting point for one to write one's own, more specific conversions. Here is my issue -- I don't think there should be default conversions at all -- as you point out, twice a month is a well defined concept, but it is NOT the same as every 15 days (or any other set interval). I'm not sure it should be possible to convert to a regular interval at all. That would be one way of dealing with irregularly spaced data. I would argue that the example is somewhat back-to-front though. If something happens twice a month it's not occurring at a monthly frequency, but at a higher frequency. And that frequency is 2/month. In this case the lowest frequency which can capture this data is daily frequency so it should be stored at daily frequency. I don't think it should, because it isn't 1/15 days, or, indeed, a frequency that can be specified as days. Sure you can specify the 5th and 20th of each month in a given time span in terms of days since an epoch, but you've lost some information there. When you want to do math -- like add a certain number of 1/2 months -- when is the 100th payment due? It seems keeping it in M/2 is the most natural way to deal with that -- then you don't need special code to do that addition, only when converting to a string (or other format) datetime. Anyway, my key point is that converting to/from calendar-based units and linear time units is fraught with peril -- it needs to be really clear when that is happening, and the user needs to have a clear and ideally easy way to define how it should happen. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fixing up datetime
Mark Wiebe wrote: I'm wondering if removing the business-day unit from datetime64, and adding a business-day API would be a good approach to support all the things that are needed? That sounds like a good idea to me -- and perhaps it could be a general Calendar Functions API, to handle other issues as well. Looking again at the NEP, I see business day as a time unit, then below, that B is interpreted as 24h, 1440m, 86400s, i.e. a day. This seems like a really bad idea -- it implies that you can convert from business day to hours, and back again, but, of course, if you do: start_date + n business_days you should not get the same thing as: start_date + (n * 24) hrs which is why I think that a business day is not really a unit -- it is a specification for a Calendar operation. My thought is that a datetime should not be able to be expressed in units like business days. A timedelta should, but then there needs to be a Calendar (i.e. a set of rules) associated with it, and it should be clearly distinct from linear unit timedeltas. Business days aside, I also see that months and years are given 'Interpreted as' definitions: Y -> 12M, 52W, 365D; M -> 4W, 30D, 720h This is even self-inconsistent: 1Y == 365D 1Y == 12M == 12 * 30D == 360D 1Y == 12M == 12 * 4W == 12 * 4 * 7D == 336D 1Y == 52W == 52 * 7D == 364D Is it not clear from this what a mess of mis-interpretation might result from all that? In thinking more, numpy's needs are a little different than the netcdf standards -- netcdf is a way to transmit and store information -- some of the key problems that come up is that people develop tools to do things with the data that make assumptions. For instance, my tools that work with CF-compliant netcdf pretty much always convert the time axis (expressed as time_units since a_datetime) into python datetime objects, and then go from there. This goes to heck if the data is expressed in something like months since 1995-01-01 Because months are only defined on a Calendar. Anyway, this kind of thing may be less of an issue because we use numpy to write the tools, not to store and share the data, so hopefully, the tool author knows what she/he is working with. But I still think the distinction between linear, convertible time units, and time units that vary depending on where you are on which calendar, and what calendar you are using, should be kept clear. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
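(This is roughly what happened: the business-day unit was dropped as a conversion target and a separate business-day API was added. In released NumPy versions it looks like this:)

    import numpy as np

    np.is_busday('2011-06-11')                   # -> False (a Saturday)
    np.busday_offset('2011-06-10', 3)            # 3 business days after a Friday
    # -> numpy.datetime64('2011-06-15')
    np.busday_count('2011-06-01', '2011-07-01')  # -> 22 weekdays in June 2011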
Re: [Numpy-discussion] np.histogram: upper range bin
Peter Butterworth wrote: in np.histogram the top-most bin edge is inclusive of the upper range limit. As documented in the docstring (see below) this is actually the expected behavior, but this can lead to some weird enough results: In [72]: x=[1, 2, 3, 4]; np.histogram(x, bins=3) Out[72]: (array([1, 1, 2]), array([ 1., 2., 3., 4.])) Is there any way round this or an alternative implementation without this issue ? The way around it is what you've identified -- making sure your bins are right. But I think the current behavior is the way it should be. It keeps folks from inadvertently losing stuff off the end -- the lower end is inclusive, so the upper end should be too. In the middle bins, one has to make an arbitrary cut-off, and put the values on the line somewhere. One thing to keep in mind is that, in general, histogram is designed for floating point numbers, not just integers -- counting integers can be accomplished other ways, if that's what you really want (see np.bincount). But back to your example: In [72]: x=[1, 2, 3, 4]; np.histogram(x, bins=3) Why do you want only 3 bins here? Using 4 gives you what you want. If you want more control, then it seems you really want to know how many of each of the values 1,2,3,4 there are. so you want 4 bins, each *centered* on the integers, so you might do: In [8]: np.histogram(x, bins=4, range=(0.5, 4.5)) Out[8]: (array([1, 1, 1, 1]), array([ 0.5, 1.5, 2.5, 3.5, 4.5])) or, if you want to be more explicit: In [14]: np.histogram(x, bins=np.linspace(0.5, 4.5, 5)) Out[14]: (array([1, 1, 1, 1]), array([ 0.5, 1.5, 2.5, 3.5, 4.5])) HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fixing up datetime
Charles R Harris wrote: Good support for units and delta times is very useful, but parsing dates and times and handling timezones, daylight savings, leap seconds, business days, etc., is probably best served by addon packages specialized to an area of interest. Just my $.02 I agree here -- I think for numpy, what's key is to focus on the kind of things needed for computational use -- that is, the performance-critical stuff. I suppose business-day type calculations would be both key and performance-critical, but that sure seems like the kind of thing that should go in an add-on package, rather than in numpy. The stdlib datetime package is a little bit too small for my taste (couldn't I at least ask for a TimeDelta to be expressed in, say, seconds, without doing any math on my own?), but the idea is good -- create the core types, let add-on packages do the more specialized stuff. * The existing datetime-related API is probably not useful, and in fact those functions aren't used internally anymore. Is it reasonable to remove the functions, or do we just deprecate them? I say remove, but some polling to see if anyone is using it might be in order first. * Leap seconds probably deserve a rigorous treatment, but having an internal representation with leap-seconds overcomplicates otherwise very simple and fast operations. Could you explain more? I don't get the issues -- leap seconds would come in for calculations like: a_given_datetime + a_timedelta, correct? Given leap years, and all the other ugliness, do leap seconds really make it worse? * Default conversion to string - should it be in UTC or with the local timezone baked in? Most date_time handling should be time-zone neutral -- i.e. assume everything is in the same timezone (and daylight savings status). Libs that assume you want the locale setting do nothing but cause major pain if you have anything out of the ordinary to do (and sometimes ordinary stuff, too). If you MUST include time-zone, explicit is better than implicit -- have the user specify, or, at the very least, make it easy for the user to override any defaults. As UTC it may be confusing because 'now' will print as a different time than people would expect. I think now should be expressed (but also stored) in the local time, unless the user asks for UTC. This is consistent with the std lib datetime.now(), if nothing else. * Should the NaT (not-a-time) value behave like floating-point NaN? i.e. NaT == NaT return false, etc. Should operations generating NaT trigger an 'invalid' floating point exception in ufuncs? Makes sense to me -- at least many folks are used to NaN semantics. And after the removal of datetime from 1.4.1 and now this, I'd be in favor of putting a large experimental sticker over the whole thing until further notice. Do we have a good way to do that? Maybe an experimental warning, analogous to the deprecation warning. Good support for units and delta times is very useful, This part works fairly well now, except for some questions like what should datetime(2011-01-30, D) + timedelta(1, M) produce. Maybe 2011-02-28, or 2011-03-02? Neither -- month should not be a valid unit to express a timedelta in. Nor should year, or anything else that is not clearly defined (we can argue about day, which does change a bit as the earth slows down, yes?) Yes, it's nice to be able to easily have a way of expressing things like every month, or a month from now when you mean a calendar month, but it's a heck of a can of worms. 
We just had a big discussion about this in the netcdf CF metadata standards list: http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007807.html We more or less came to the conclusion (I did, anyway) that there were two distinct, but related concepts: 1) time as a strict unit of measurement, like length, mass, etc. In that case, don't use months as a unit. 2) Calendars -- these are what months, days of the week, etc. come from, and these get ugly. I also learned that there are even more calendars than I thought. Beyond the Julian, Gregorian, etc., there are special ones used for climate modeling and the like, that have nice properties like all months being 30 days long, etc. Plus, as discussed, various business calendars. So: I think that the calendar-related functions need a fairly self-contained library, with various classes for the various calendars one might want to use, and a well-specified way to define new ones. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
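(A small footnote on the stdlib complaint above: as of Python 2.7 a timedelta can be expressed in seconds directly, no math required:)

    from datetime import datetime

    td = datetime(2011, 6, 10) - datetime(2011, 6, 9, 12)
    td.total_seconds()  # -> 43200.0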
Re: [Numpy-discussion] fixing up datetime
Benjamin Root wrote: Just throwing it out there. Back in my Perl days (I do not often admit this), the one module that I thought was really well done was the datetime module. Now, I am not saying that we would need to support the calendar from Lord of the Rings or anything like that, but its floating timezone concept and other features were quite useful. Maybe there are lessons to be learned from there? Another one to look at is the classes in the Earth System Modeling Framework (ESMF). That will give you a good idea what the atmospheric/oceanographic/climate modelers need, and how at least one well-respected group has addressed these issues. http://www.earthsystemmodeling.org/esmf_releases/non_public/ESMF_5_1_0/ESMC_crefdoc/node6.html (If that long URL breaks, I found that by googling: ESMF calendar date time) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fixing up datetime
Pierre GM wrote: I also don't think that units like month, year, business day should be allowed -- it just adds confusion. It's not a killer if they are defined in the spec: -1 In scikits.timeseries, not only did we have years, months and business days, but we also had weeks, that proved quite useful sometimes, Actually, I have little issue with weeks -- it can be clearly defined as 7 * 24 hours. The problem with months is that they are not all the same length. One of the issues pointed out in the discussion on the CF list was that for the most part, scientists expect that a time series has nice properties, like differentiability, etc. That all goes to heck if you have months as a unit. Unless a month is clearly defined as a particular number of hours, and ONLY means that, but then some folks may not get what they expect. Anyhow, years and months are simple enough. no, they are not -- they are fundamentally different than hours, days, etc. In case of ambiguities like Mark's example, we can always raise an exception. sure -- but what I'm suggesting is that rather than have one thing (a datetime array) that behaves differently according to what units it happens to use, we clearly separate what I'm calling Calendar functionality from basic time functionality. What that means to me is that a TimeDelta is ALWAYS an unambiguous time unit, and a datetime is ALWAYS expressed as an unambiguous time unit since a well-defined epoch. If you want something like 6 months after a datetime, or 14 business days after a datetime, that should be handled by a calendar class of some sort. (hmm, maybe I'm being too pedantic here -- the Calendar class could be associated with the datetime and timedelta objects, I suppose) so a given timedelta knows how to do math with itself. But I think this is fraught with problems: one of the keys to usability and performance of the datetime arrays is that they are really arrays of integers, and you can do math with them just as though they are integers. That will break if you allow these non-linear time steps. So, thinking out loud here, maybe: datetime timedelta calendartimedelta to make it very clear what one is working with? As for calendardatetime, at first blush, I don't think there is a need. Do you need to express a datetime as n months from the epoch, rather than 12:00am on July 1, 1970 -- which could be represented in hours, minutes, seconds, etc? What I'm getting at is that the difference between calendars is in what timedeltas mean, not what a unit in time is. ISO8601 seems quite OK. All that does is specify a string representation, no? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
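(The third-party dateutil package is one existing example of exactly this timedelta/calendartimedelta split -- its relativedelta is a calendar-aware delta, distinct from the stdlib's linear timedelta. A sketch, assuming dateutil is installed:)

    from datetime import datetime, timedelta
    from dateutil.relativedelta import relativedelta

    d = datetime(2011, 1, 31)
    d + relativedelta(months=1)  # -> datetime(2011, 2, 28): a calendar rule
    d + timedelta(days=30)       # -> datetime(2011, 3, 2):  a linear unit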
Re: [Numpy-discussion] fixing up datetime
Mark Wiebe wrote: It is possible to implement the system so that if you don't use Y/M/B, things work out unambiguously, but if you do use them you get a behavior that's a little weird, but with rules to eliminate the calendar-created ambiguities. yes, but everyone wants different rules -- so it needs to be very clear which rules are in place, and there needs to be a way for a user to specify his/her own rules. For the business day unit, what I'm currently trying to do is get an assessment of whether my proposed design is the right abstraction to support all the use cases of people who want it. Good plan. I rather agree here, adding the 'origin' back in is definitely worth considering. How is the origin represented in the CF netcdf code? As an ISO 8601 string. I can't recall if you have the option of specifying a non-standard calendar. So using the calendar specified by ISO 8601 as the default for the calendar-based functions is undesirable? no -- that's fine -- but does ISO 8601 specify stuff like business day? I think supporting it to a small extent is reasonable, and support for any other calendars or more advanced calendar-based functions would go in support libraries. yup -- what I'm trying to press here is the distinction between linear time units and the weird concepts, like business day, month, etc. I think there are two related, but distinct issues: 1) representation/specification of a datetime. The idea here is this: imagine that there is a continuous property called time (which I suppose has a zero at the Big Bang). We need a way to define where (when) in that continuum a given event, or set of events, occurred. This is what the datetime dtype is about. I think the standard of some-time-unit since some-reference-datetime, in some-calendar is fine, but that the time-unit should be unambiguously and clearly defined, and not change with when it occurs, i.e. seconds, hours, days, but not months, years, or business days. 2) time spans, and math with time: i.e. timedeltas --- this falls into 2 categories: a) simple linear time units: seconds, hours, etc. This is quite straightforward, if working with other time deltas and datetimes all expressed in well-defined linear units. b) calendar manipulations: months since, business days since, once a month, the first Sunday of the month, next Monday. These require a well defined and complex Calendar, and there are many possible such Calendars. What I'm suggesting is that (a) and (b) should be kept quite distinct, and that it should be fairly easy to define and use custom Calendars defined for (b). (a) and (b) could be merged, with various defaults and exceptions raised for poorly defined operations, but I think that'll be less clear, harder to implement, and more prone to error. A little example, again from the CF mailing list (which spawned the discussion). In the CF standard the units available are defined as those supported by the udunits library: http://www.unidata.ucar.edu/software/udunits/ It turns out that udunits only supports time manipulation as I specified in (a), i.e. only clearly defined linear time units. However, they do define months and years, as specific values (something like 365.25 days/year and 12 months/year -- though they also have Julian-year, leap_year, etc) So folks would specify a time axis as: months since 2010-01 and expect that they were getting calendar months, like 1 would mean Feb, 2010, instead of January 31, 2010 (or whatever). 
Anyway, lots of room for confusion, so whatever we come up with needs to be clearly defined. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
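(To make the udunits pitfall concrete -- a sketch using the stdlib and dateutil, taking udunits' year as roughly 365.2422 days: the two readings of "1 month since 2010-01-01" land on different datetimes:)

    from datetime import datetime, timedelta
    from dateutil.relativedelta import relativedelta

    epoch = datetime(2010, 1, 1)

    epoch + relativedelta(months=1)             # calendar month -> 2010-02-01
    epoch + timedelta(days=365.242198781 / 12)  # fixed-length month -> 2010-01-31, mid-morning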
Re: [Numpy-discussion] New functions.
On 5/31/11 6:08 PM, Charles R Harris wrote: 2) Ufunc fadd (nanadd?) Treats nan as zero in addition. so: In [53]: a Out[53]: array([ 1., 2., nan]) In [54]: b Out[54]: array([0, 1, 2]) In [55]: a + b Out[55]: array([ 1., 3., nan]) and nanadd(a,b) would yield: array([ 1., 3., 2.]) I don't see how that is particularly useful, at least not any more useful than nanprod, nandiv, etc, etc... What am I missing? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
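(For what it's worth, the proposed semantics can be sketched in a few lines of plain numpy -- nanadd here is a hypothetical helper, not an existing ufunc:)

    import numpy as np

    def nanadd(a, b):
        # treat nan as zero in addition, per the proposal
        return np.where(np.isnan(a), 0, a) + np.where(np.isnan(b), 0, b)

    a = np.array([1., 2., np.nan])
    b = np.array([0, 1, 2])
    nanadd(a, b)  # -> array([1., 3., 2.])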
[Numpy-discussion] what does in do with numpy arrays?
Hi folks, I've re-titled this thread, as it's about a new question, now: What does: something in a_numpy_array mean? i.e. how has __contains__ been defined? A couple of us have played with it, and can't make sense of it: In [24]: a Out[24]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) In [25]: 3 in a Out[25]: True So the simple case works just like a list. But what if I look for an array in another array? In [26]: b Out[26]: array([3, 6, 4]) In [27]: b in a Out[27]: False OK, so the full b array is not in a, and it doesn't vectorize it, either. But: In [29]: a Out[29]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [30]: b in a Out[30]: True HUH? I'm not sure by what definition we would say that b is contained in a. But maybe... In [41]: b Out[41]: array([ 4, 2, 345]) In [42]: b in a Out[42]: False So is it 'are all of the elements of b in a somewhere'? But only for 2-d arrays? So what does it mean? The docstring is not helpful: In [58]: np.ndarray.__contains__? Type: wrapper_descriptor Base Class: <type 'wrapper_descriptor'> String Form: <slot wrapper '__contains__' of 'numpy.ndarray' objects> Namespace: Interactive Docstring: x.__contains__(y) <==> y in x On 5/29/11 2:50 PM, eat wrote: FWIW, a short prelude on the theme seems quite promising, indeed: In []: A Out[]: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In []: [0, 1, 2] in A Out[]: True In []: [0, 3, 6] in A Out[]: True In []: [0, 4, 8] in A Out[]: True In []: [8, 4, 0] in A Out[]: True In []: [2, 4, 6] in A Out[]: True In []: [6, 4, 2] in A Out[]: True In []: [3, 1, 5] in A Out[]: True In [1061]: [3, 1, 4] in A Out[1061]: True But In []: [1, 2, 3] in A Out[]: False In []: [3, 2, 1] in A Out[]: True So, obviously the logic behind __contains__ is not so very straightforward. Perhaps just a bug? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
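(A guess at the logic, reverse-engineered from the examples rather than from any documentation: every result above is consistent with 'b in a' meaning (a == b).any(), i.e. broadcast the comparison and collapse it with any():)

    import numpy as np

    A = np.arange(9).reshape(3, 3)

    (A == np.array([3, 2, 1])).any()  # -> True,  matching '[3, 2, 1] in A'
    (A == np.array([1, 2, 3])).any()  # -> False, matching '[1, 2, 3] in A'
    # b is broadcast against the rows, so each element of b is only
    # looked for within its own column -- hence the surprising results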
Re: [Numpy-discussion] Numpy question
On 5/27/11 11:06 AM, Talla wrote: I am using python24 and Numeric under windows XP and I am trying to create *.agr file from *.dbs file, I followed the instructions very carefully and every time I am getting the following message: Traceback (most recent call last): File "AbinitBandStructuremaker.py", line 1274, in ? data.fermienergy = max(maxlist) ValueError: max() arg is an empty sequence Nothing wrong in the .dbs file This is really a question for the author of AbinitBandStructuremaker.py. If there is no one familiar with that code that can help, you'll need to debug it yourself. That error means that in this case, maxlist is an empty list, so trying to find the maximum value in it is not possible. It also looks like that is the built-in python max(), not the Numeric one, so this isn't really a numpy-related question anyway. Without more context, we can't even begin to figure out what's wrong. Good luck, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] finding elements that match any in a set
On 5/27/11 9:48 AM, Michael Katz wrote: I have a numpy array, records, with named fields including a field named integer_field. I have an array (or list) of values of interest, and I want to get the indexes where integer_field has any of those values. Because I can do indexes = np.where( records.integer_field > 5 ) I thought I could do indexes = np.where( records.integer_field in values ) But that doesn't work. (As a side question I'm interested in why that doesn't work, when values is a python list.) That doesn't work because the python list in operator doesn't understand arrays -- so it is looking to see if the entire array is in the list. Actually, it doesn't even get that far: In [16]: a in l --- ValueError Traceback (most recent call last) /Users/chris.barker/<ipython console> in <module>() ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() The ValueError results because it was decided that a numpy array should not have a boolean value to avoid confusion -- i.e. is an array true whenever it is non-empty (like a list), or when all its elements are true, or ...? When I read this question, I thought -- hmmm, numpy needs something like in, as the usual way, np.any(), would require a loop in this case. Then I read Skipper's message: On 5/27/11 9:55 AM, Skipper Seabold wrote: Check out this recent thread. I think the proposed class does what you want. It's more efficient than in1d, if values is small compared to the length of records. So that class may be worthwhile, but I think np.in1d is exactly what you are looking for: indexes = np.in1d( records.integer_field, values ) Funny I'd never noticed that before. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
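(Spelled out with made-up sample data, since in1d returns a boolean mask and the question asked for indexes -- one more step gets those:)

    import numpy as np

    records = np.array([(1, 0.5), (3, 1.5), (5, 2.5), (10, 3.5)],
                       dtype=[('integer_field', int), ('x', float)])
    values = [1, 5, 10]

    mask = np.in1d(records['integer_field'], values)  # boolean mask
    indexes = np.where(mask)[0]                       # -> array([0, 2, 3])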
Re: [Numpy-discussion] Less dimensions than expected with record array
ndarray has 2 dimensions but record array has 1 dimension This makes seemingly reasonable things, like using apply_along_axis() over a table of data with named columns, impossible: each row (record) is treated as one array element, so the structured array is only 1d. If you have rows/records with content that is not homogeneous, then working along axis=1 (across elements of a record) doesn't make sense. for example I just struggle with 2 datetime columns and the rest are integers. If you want an array with homogeneous elements (all floats or all ints) with operations along axis, then larry (la) is, I think, still the best bet. Another option is to use views. There are times when I want the same array visible as both a structured array, and a regular old array, depending on what I'm doing with it. and you can do that: In [77]: data Out[77]: [(1, 2, 3), (4, 5, 6), (7, 8, 9)] In [80]: dt = [('a',int),('b',int),('c',int)] In [81]: record_array = np.array(data, dtype=dt) In [84]: array = record_array.view(dtype=np.int).reshape(-1,3) In [85]: array Out[85]: array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # array and record_array share the same data: In [88]: array[:,1] *= 2 In [89]: array Out[89]: array([[ 1, 4, 3], [ 4, 10, 6], [ 7, 16, 9]]) In [90]: record_array Out[90]: array([(1, 4, 3), (4, 10, 6), (7, 16, 9)], dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')]) views are one of the really cool things about numpy. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array of size 'n' with common difference 1
On 4/29/11 12:31 AM, pratik wrote: On Friday 29 April 2011 12:56 PM, dileep kunjaai wrote: Dear sir, I am trying to make an array that varies from -60 to 90 with difference 0.25. I tried the following command ... import numpy as N lat=N.array(xrange(-6000, 9000, 25), dtype=float) print lat/100 xrange() (or range(), or np.arange()) is almost never the right solution for floating point ranges, due to the intricacies of floating point precision. lat = numpy.mgrid[-60:90:.25] or np.linspace: np.linspace(-60,90,((60.+90.)*4. + 1)) ((60.+90.)*4. + 1) is the number of points you want -- the +1 because you want both end points. mgrid is usually used for 2-d (or higher) grids, though it looks like it makes sense for this use, too, though note that it doesn't give you both endpoints in this case. From the docs: If the step length is not a complex number, then the stop is not inclusive. and an example: In [15]: np.mgrid[-1:3:.25] Out[15]: array([-1. , -0.75, -0.5 , -0.25, 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. , 2.25, 2.5 , 2.75]) I think this is too bad, actually, because we're back to range()-type tricks to get the end point: In [20]: np.mgrid[-1:3.25:.25] Out[20]: array([-1. , -0.75, -0.5 , -0.25, 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. , 2.25, 2.5 , 2.75, 3. ]) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array of size 'n' with common difference 1
On 4/29/11 1:27 PM, Sebastian Haase wrote: Just for completeness, note this paragraph from the mgrid docs: However, if the step length is a *complex number* (e.g. 5j), then the integer part of its magnitude is interpreted as specifying the number of points to create between the start and stop values, where the stop value *is inclusive*. OK -- for a kluge, I figure you could do complex, then take the real part, but I can't seem to get a complex grid to work at all: In [37]: np.mgrid[(0+0j):(1+0j):(0.1+0j)] Out[37]: array([], dtype=complex128) What am I doing wrong? Got it -- the docs could use some clarification here -- actually, passing in a complex number is a FLAG, indicating that the index means something different, but it's still doing everything with real numbers (floats). Wow, klunky API! but here is what the OP would want: np.mgrid[-60:90:((60.j+90.j)*4. + 1j)] which is klunkier than linspace. I'd have used two different function names for the different mgrid functionality, rather than the complex kludge -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: Numpy 1.6.0 beta 2
Sorry to keep harping on this, but for history's sake, I was one of the folks that got 'U' introduced in the first place. I was dealing with a nightmare of unix, mac and dos text files, 'U' was a godsend. On 4/5/11 4:51 PM, Matthew Brett wrote: The difference between 'rt' and 'U' is (this is for my own benefit): For 'rt', a '\r' does not cause a line break - with 'U' - it does. Perhaps semantics, but what 'U' does is actually change any of the line breaks to '\n' -- any line breaking happens after the fact. In Py2, the difference between 'U' and 't' is that 't' assumes that any file read uses the native line endings -- a bad idea, IMHO. Back in the day, Guido argued that text file line ending conversion was the job of file transfer tools. The reality, however, is that users don't always use file transfer tools correctly, nor even understand the implications of line endings. All that being said, mac-style files are pretty rare these days. (though I bet I've got a few still kicking around) Right - my argument is that the behavior implied by 'U' and 't' is conceptually separable. 'U' is for how to do line-breaks, and line-termination translations, 't' is for whether to decode the text or not. In python 3. But 't' and 'U' are the same in python 3 -- there is no distinction. It seems you are arguing that there could/should be a way to translate line termination without decoding the text, but ... In python 3 a binary file is a file which is not decoded, and returns bytes. It still has a concept of a 'line', as defined by line terminators - you can iterate over one, or do .readlines(). I'll take your word for it that it does, but that's not really a binary file then, it's a file that you are assuming is encoded in an ascii-compatible way. While I know that practicality beats purity, we really should be opening the file as a text file (it is text, after all), and specifying utf-8 or latin-1 or something as the encoding. However, IIUC, then the issue here is later on down the line, numpy uses regular old C code, which expects ascii strings. In that case, we could encode the text as ascii, into a bytes object. That's a lot of overhead for line ending translation, so probably not worth it. But if nothing else, we should be clear in the docs that numpy text file reading code is expecting ascii-compatible data. (and it would be nice to get the line-ending translation) Right - so obviously if you open a utf-16 file as binary, terrible things may happen - this was what Pauli was pointing out before. His point was that utf-8 is the standard, but it's not the standard -- it's a common use, but not a standard -- ideally numpy wouldn't enforce any particular encoding (though it could default to one, and utf-8 would be a good choice for that) Once you really make the distinction between text and binary, the concept of a line terminator doesn't really make sense anyway. Well - I was arguing that, given we can iterate over lines in binary files, then there must be the concept of what a line is, in a binary file, and that means that we need the concept of a line terminator. maybe, but that concept is built on an assumption that your file is ascii-compatible (for \n anyway), and you know what they say about assumptions... I realize this is a discussion that would have to happen on the python-dev list... I'm not sure -- I was thinking that python missed something here, but I don't think so anymore. 
In the unicode world, there is no choice but to be explicit about encodings, and if you do that, then python's text or binary distinction makes sense. .readline() for a binary file doesn't, but so be it. Honestly, I've never been sure in this discussion what code actually needs fixing, so I'm done now -- we've talked enough that the issues MUST have been covered by now! -Chris ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: Numpy 1.6.0 beta 2
On 4/5/11 10:33 PM, Matthew Brett wrote: Did you mean to send this just to me? It seems like the whole is generally interesting and helpful, at least to me... I did mean to send to the list -- I've done that now. Well, the current code doesn't split on \r in py3k, admittedly that must be a rare case. I guess that's a key question here -- It really *should* split on \r, but maybe it's rare enough to be unimportant. The general point about whether to read binary or text is also in play. I agree with you, reading text sounds like a better idea to me, but I don't know the performance hit. Pauli had it as reading binary in the original conversion and was defending that in an earlier email... The thing is -- we're talking about parsing text here, so we really are reading text files, NOT binary files. So the question really is -- do we want py3's file reading code to handle encoding issues for us, or do we want to handle them ourselves. If we only want to support ansi encodings, then handling it ourselves may well be easier and faster performing. If we go that way we need to handle line-endings, too. The easy way is to only support line endings with a '\n' in them -- that works out of the box. But it's not that hard to support '\r' also, depending on how you want to do it. Before 'U' existed, I did that all the time, something like: some_text = file.read(some_size_buffer) some_text = some_text.replace('\r\n', '\n') some_text = some_text.replace('\r', '\n') lines = some_text.split('\n') (By the way, if you're going to support this, it's really nice to support mixed line-endings, like this approach does -- there are a lot of editors that can make a mess of line endings.) If you can read the entire file into memory at once, this is almost trivial; if you can't, there is a bit more bookkeeping code to be written. DARN -- I think I said my last note was the last on this topic! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
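(One more trick along the same lines: str.splitlines() already understands '\n', '\r\n' and '\r', so if the whole file fits in memory the bookkeeping disappears. A sketch -- the helper name and the ascii assumption are made up for the example:)

    def universal_lines(path):
        # read raw bytes, decode (assuming 1-byte, ascii-compatible text),
        # and let splitlines() handle any mix of \n, \r\n and \r
        with open(path, 'rb') as f:
            text = f.read().decode('ascii')
        return text.splitlines()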
Re: [Numpy-discussion] Rounding the decimal part of a real number
On 4/6/11 6:24 AM, Alan G Isaac wrote: http://docs.scipy.org/doc/numpy/reference/generated/numpy.round_.html simple enough, of course, but just to be clear: In [108]: np.round(1.23456789, 3) Out[108]: 1.2350000000000001 so the number is rounded to the requested number of decimal places, but then stored in a binary floating point format, which may not be able to exactly represent that rounded number -- hence the 1 at the end. This is simply how floating point works. and that slight difference _probably_ doesn't matter, but it's worth being aware of, because it does make a difference occasionally. python has a decimal type that can work with exact decimal numbers -- numpy doesn't support that, as there is no hardware support for it (at least on most platforms). If you want to display it differently, you can use the string formatters: In [110]: "%.3f" % np.round(1.23456789, 3) Out[110]: '1.235' HTH, Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: Numpy 1.6.0 beta 2
On 4/4/11 10:35 PM, Charles R Harris wrote: IIUC, Ub is undefined -- U means universal newlines, which makes no sense when used with b for binary. I looked at the code a ways back, and I can't remember the resolution order, but there isn't any checking for incompatible flags. I'd expect that genfromtxt, being txt, and line oriented, should use 'rU'. But if it wants the raw line endings (why would it?) then rb should be fine. U has been kept around for backwards compatibility, the python documentation recommends that it not be used for new code. That is for 3.* -- the 2.7.* docs say: In addition to the standard fopen() values mode may be 'U' or 'rU'. Python is usually built with universal newline support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program. If Python is built without universal newline support a mode with 'U' is the same as normal text mode. Note that file objects so opened also have an attribute called newlines which has a value of None (if no newlines have yet been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types seen. Python enforces that the mode, after stripping 'U', begins with 'r', 'w' or 'a'. which does, in fact, indicate that 'Ub' is NOT allowed. We should be using 'Ur', I think. Maybe that 'Python enforces' clause is where the error we saw came from -- it didn't used to enforce anything. On 4/5/11 7:12 AM, Charles R Harris wrote: The 'Ub' mode doesn't work for '\r' on python 3. This may be a bug in python, as it works just fine on python 2.7. Ub never made any sense anywhere -- U means universal newline text file. b means binary -- combining them makes no sense. On older pythons, the behaviour of 'Ub' was undefined -- now, it looks like it is supposed to raise an error. Does 'Ur' work with \r line endings on Python 3? According to my read of the docs, 'U' does nothing -- universal newline support is supposed to be the default: On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. It may indeed be desirable to read the files as text, but that would require more work on both loadtxt and genfromtxt. Why can't we just open the file with mode 'Ur'? Text is text, messing with line endings shouldn't hurt anything, and it might help. If we stick with binary, then it comes down to: - Will having an extra \r with Windows files hurt anything? -- probably not. - Are there many mac-style text files out there anymore? Not many. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] convert array to tuples?
Hi folks, What's the fastest way to convert a numpy array to tuples? Unfortunately, not all packages take arbitrary sequences when they are expecting a list of tuples. In this case, I'm using PIL's ImageDraw, and want to do: draw.line(points) (points is an Nx2 numpy array of ints) but if I pass in an array, I get, oddly enough, nothing. No error, just nothing drawn. If I do: draw.line(points.tolist()) SystemError: new style getargs format but argument is not a tuple So I seem to need tuples. This works: draw.line([tuple(p) for p in points.tolist()]) but that seems like it would be slower than it should be (to be honest, not profiled). Is there a faster way? Maybe numpy should have an ndarray.totuple() method. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] convert array to tuples?
On 4/5/11 10:52 AM, Sebastian Haase wrote: How about fixing PIL... I know that there is not a good track record of getting patches into PIL, but did you get to the bottom of it and find how draw.line is implemented? no. I haven't. And yes, adapting PIL to use the buffer interface for this kind of thing would be great. I also have wanted to do that for wxPython, where it would be welcome, but still haven't gotten around to it. BTW, is it drawing anti-aliased lines ? Not with the ImageDraw module, but there is a new AggDraw module that should be prettier. On 4/5/11 11:49 AM, Mark S Niemczyk wrote: Just a suggestion (I am very new to Numpy), but wouldn't draw.line(tuple(points.tolist())) accomplish what you want. nope, tolist() creates a list of lists -- tuple() only converts the outer list to a tuple, which is why I did the list comprehension. On 4/5/11 10:53 AM, josef.p...@gmail.com wrote: does a structured dtype work? Good idea -- I'll give it a try -- indeed it does: dt = np.dtype([('x', np.int32),('y',np.int32)]) draw.line(points.view(dtype=dt).reshape((-1,)).tolist(), fill="rgb(0,255,0)", width=4) but it's not any faster, alas. The fastest I've come up with is putting the elements in a single 1-d list. PIL can take either a sequence of tuples: [(x,y),(x,y),...] or the flattened version: [x,y,x,y,x,y,x,y,...] So this is the fastest: draw.line(points.flatten().tolist(), fill="rgb(0,255,0)", width=4) Thanks all, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
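[Editor's sketch collecting the three conversion options from the thread, self-contained; the sample points array is hypothetical:

import numpy as np

points = np.arange(2000, dtype=np.int32).reshape(-1, 2)  # an Nx2 array of ints

as_tuples = [tuple(p) for p in points.tolist()]       # list of (x, y) tuples
flat = points.flatten().tolist()                      # [x, y, x, y, ...] -- fastest per the thread
dt = np.dtype([('x', np.int32), ('y', np.int32)])
structured = points.view(dt).reshape(-1).tolist()     # also yields a list of tuples
]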
Re: [Numpy-discussion] ANN: Numpy 1.6.0 beta 2
On 4/5/11 3:36 PM, josef.p...@gmail.com wrote: I disagree that U makes no sense for binary file reading. I wasn't saying that it made no sense to have a U mode for binary file reading, what I meant is that by the python2 definition, it made no sense. In Python 2, the ONLY difference between binary and text mode is line-feed translation. As for Python 3: In python 3: 'b' means return byte objects 't' means return decoded strings 'U' means two things: 1) When iterating by line, split lines at any of '\r', '\r\n', '\n' 2) When returning lines split this way, convert '\r' and '\r\n' to '\n' a) 'U' is default -- it's essentially the same as 't' (in PY3), so 't' means return decoded and line-feed translated unicode objects b) I think the line-feed conversion is done regardless of whether you are iterating by lines, i.e. with a full-on .read(). At least that's how it works in py2 -- not running py3 here to test. If you support returning lines from a binary file (which python 3 does), then I think 'U' is a sensible thing to allow -- as in this case. But what is a binary file? I THINK what you are proposing is that we'd want to be able to have both linefeed translation and no decoding done. But I think that's impossible -- aren't the linefeeds themselves encoded differently with different encodings? U looks appropriate in this case, better than the workarounds. However, to me the python 3.2 docs seem to say that U only works for text mode Agreed -- but I don't see the problem -- your files are either encoded in something that might treat newlines differently (UCS32, maybe?), in which case you'd want it decoded, or you are working with ascii or ansi or utf-8, in which case you can specify the encoding anyway. I don't understand why we'd want a binary blob for text parsing -- the parsing code is going to have to know something about the encoding to work -- it might as well get passed in to the file open call, and work with unicode. I suppose if we still want to assume ascii for parsing, then we could use 't' and then re-encode to ascii to work with it. Which I agree does seem heavy handed just for fixing newlines. Also, one problem I've often had with encodings is what happens if I think I have ascii, but really have a couple characters above 127 -- then the default is to get an error in decoding. I'd like to be able to pass in a flag that either skips the un-decodable characters or replaces them with something, but it doesn't look like you can do that with the file open function in py3. The line terminator is always b'\n' for binary files; Once you really make the distinction between text and binary, the concept of a line terminator doesn't really make sense anyway. In the ansi world, everyone should always have used 'U' for text. It probably would have been the default if it had been there from the beginning. People got away without it because: 1) dos line feeds have a \n in them anyway 2) most of the time it doesn't matter that there is an extra whitespace character in there 3) darn few of us ever had to deal with the mac \r Now that we are in a unicode world (at least a little) there is simply no way around the fact that you can't reliably read a file without knowing how it is encoded. My thought at this point is to say that the numpy text file reading stuff only works on 1byte, ansi encoding (and maybe only ASCII), and be done with it.
utf-8 might be OK -- I don't know if there are any valid files in, say, latin-1, that utf-8 will barf on -- you may not get the non-ascii symbols right, but that's better than barfing. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
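[Editor's note: Python 3's open() -- and io.open() on Python 2.6+ -- does in fact take an errors argument that does what the post above asks for. A minimal sketch, file name hypothetical:

import io
# replace undecodable bytes with U+FFFD instead of raising:
text = io.open('data.txt', encoding='ascii', errors='replace').read()
# or silently drop them:
text = io.open('data.txt', encoding='ascii', errors='ignore').read()
]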
Re: [Numpy-discussion] Old tickets
On 3/31/11 8:41 AM, Benjamin Root wrote: The ticket is about the functions np.divide and np.power, not / and **. This currently does not work the same, unlike what's said above: Hmm, I didn't notice that distinction. So long as there is a consistent difference between the function and the operator, I am fine with that. whether we want np.power to behave differently from ** (if they are internally handled separately at all)... Please, no! The operator and functions should behave the same. I certainly always expect them to. For instance, I often write code with operators, then replace that with the functions so that I can add an out parameter for performance reasons -- I would be very surprised if the function were defined differently -- it would be a source of surprising bugs. So, in this case, the ** operator and np.power are identical, but diverge from the behavior of python's operator. I think consistency within numpy is more important than consistency with python -- I expect differences like this from python (long integers, different data types, etc) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
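[Editor's sketch of the operator-to-function refactor described above:

import numpy as np

a = np.arange(5.0)
b = a ** 2               # operator form
np.power(a, 2, out=a)    # same operation, writing into a to avoid a temporary
]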
Re: [Numpy-discussion] Willing to contribute to SciPy NumPy ...
On 3/31/11 12:04 PM, Anthony Scopatz wrote: A good place to get started is helping out with documentation (see http://docs.scipy.org/numpy/Front%20Page/). Actually, according to Jacob Kaplan-Moss's talk at PyCon this year (http://pycon.blip.tv/file/4881071/) Nice talk -- thanks for the link. (with which I am inclined to agree), documentation is probably not the best way to get started. I agree as well. However, I think newbies can be a great help in identifying holes in the documentation -- so when documentation is missing or unclear, ask here, and then you'll have something to contribute. -Chris Do check out the developer zone, maybe search the open tickets and see if there are any bugs you want to tackle. Maybe write some tests for untested code you find. Good ideas as well, of course. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array views
On 3/26/11 10:32 AM, srean wrote: I am also interested in this. In my application there is a large 2d array, lets call it 'b' to keep the notation consistent in the thread. b's columns need to be recomputed often. Ideally this re-computation happens in a function. Lets call that function updater(b, col_index): The simplest example is where updater(b, col_index) is a matrix vector multiply, where the matrix or the vector changes. Is there anyway apart from using ufuncs that I can make updater() write the result directly in b and not create a new temporary column that is then copied into b ? Say for the matrix vector multiply example. Probably not -- the trick is that when an array is a view of a slice of another array, it may not be laid out in memory in a way that other libs (like LAPACK, BLAS, etc) require, so the data needs to be copied to call those routines. To understand all this, you'll need to study up a bit on how numpy arrays lay out and access the memory that they use: they use a concept of strided memory. It's very powerful and flexible, but most other numeric libs can't use those same data structures. I'm not sure what a good doc is to read to learn about this -- I learned it from messing with the C API. Take a look at any docs that talk about strides, and maybe playing with the stride tricks tools will help. A simple example: In [3]: a = np.ones((3,4)) In [4]: a Out[4]: array([[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]]) In [5]: a.flags Out[5]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False So a is a (3,4) array, stored in C_contiguous fashion, just like a regular old C array. A lib expecting data in this fashion could use the data pointer just like regular C code. In [6]: a.strides Out[6]: (32, 8) this means it is 32 bytes from the start of one row to the next, and 8 bytes from the start of one element to the next -- which makes sense for a 64bit double. In [7]: b = a[:,1] In [10]: b Out[10]: array([ 1., 1., 1.]) so b is a 1-d array with three elements. In [8]: b.flags Out[8]: C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False but it is NOT C_Contiguous -- the data is laid out differently than a standard C array. In [9]: b.strides Out[9]: (32,) so this means that it is 32 bytes from one element to the next -- for an 8-byte data type. This is because the elements are each one element in a row of the a array -- they are not all next to each other. A regular C library generally won't be able to work with data laid out like this. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
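[Editor's sketch of the stride-tricks tools mentioned above; as_strided does no bounds checking, so treat this as a toy:

import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.ones((3, 4))
col = as_strided(a, shape=(3,), strides=(a.strides[0],))  # a view of column 0
print col.strides   # (32,) -- the same non-contiguous layout as the b above
]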
Re: [Numpy-discussion] Array views
On 3/26/11 10:12 AM, Pauli Virtanen wrote: On Sat, 26 Mar 2011 13:10:42 -0400, Hugo Gagnon wrote: [clip] a1 = b[:,0] a2 = b[:,1] ... and it works but that doesn't help me for my problem. Is there a way to reformulate the first code snippet above but with shallow copying? No. You need a 2-D array to own the data. The second way is the approach to use if you want to share the data. Exactly -- but to clarify, it's not just about ownership, it's about layout of the data in memory. The data in a numpy array needs to be laid out in memory as one block, with consistent strides from one element to the next, one row to the next, etc. When you create an array from scratch (like your 2-d array here), you get one big block of memory. If you create each row separately, they each have their own block of memory that is unrelated -- there is no way to put those together into one block with consistent strides. So you need to create that big block first (the 2-d array), then you can reference parts of it for each row. See my previous note for a bit more discussion. Oh, and maybe the little presentation and sample code I gave to the Seattle Python Interest group will help: http://www.seapig.org/November2010Notes -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
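[Editor's demo of the share-the-block approach, names from the thread:

import numpy as np

b = np.zeros((4, 3))
a1 = b[:, 0]     # a view into b's block -- no copy
a1[:] = 7.0
print b[0, 0]    # 7.0 -- writing through the view changed b
]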
[Numpy-discussion] Scientific Software developer wanted in Seattle.
Scientific Software Developer NOAA Emergency Response Division Help us develop our next-generation oil spill transport model. Background: The Emergency Response Division (ERD) of NOAA's Office of Response and Restoration (ORR) provides scientific expertise to support the response to oil and chemical spills in the coastal environment. We played a major role in the recent Deepwater Horizon oil spill in the Gulf of Mexico. In order to fulfill our mission, we develop many of the software tools and models required to support a response to hazardous material spills. In the wake of the Deepwater Horizon incident, we are embarking on a program to develop our next-generation oil spill transport model, taking into account lessons learned from years of response and this major incident. General Characteristics: The incumbent of this position will provide software development services to support the mission of the Emergency Response Division of NOAA's Office of Response and Restoration. As part of his/her efforts, independent evaluation and application of development techniques, algorithms, software architecture, and programming patterns will be required. The incumbent will work with the staff of ERD to provide analysis on user needs and software, GUI, and library design. He/she will be expected to work primarily on site at NOAA's facility in Seattle. Knowledge: The incumbent must be able to apply modern concepts of software engineering and design to the development of computational code, desktop applications, web applications, and libraries. The incumbent will need to be able to design, write, refactor, and implement code for a complex desktop and/or web application and computational library. The incumbent will work with a multi-disciplinary team including scientists, users, and other developers, utilizing software development practices such as usability design, version control, bug and issue tracking, and unit testing. Good communication skills and the knowledge of working as part of a team are required. Direction received: The incumbent will participate on various research and development teams. While endpoints will be identified through Division management and some direct supervision will be provided, the incumbent will be responsible for progressively being able to take input from team meetings and design objectives and propose strategies for reaching endpoints. Typical duties and responsibilities: The incumbent will work with the oil and chemical spill modeling team to improve and develop new tools and models used in fate and transport forecasting. Different components of the project will be written in C++, Python, and Javascript. Education requirement, minimum: Bachelor's degree in a technical discipline. Experience requirement, minimum: One to five years experience in development of complex software systems in one or more full-featured programming languages (C, C++, Java, Python, Ruby, Fortran, etc.) The team requires experience in the following languages/disciplines.
Each incumbent will need experience in some subset:
* Computational/Scientific programming
* Numerical Analysis/Methods
* Parallel processing
* Desktop GUI
* Web services
* Web clients: HTML/CSS/Javascript
* Python
* wxPython
* OpenGL
* C/C++
* Python--C/C++ integration
* Software development team leadership
While the incumbent will work on-site at NOAA, directly with the NOAA team, this is a contract position with General Dynamics Information Technology: http://www.gdit.com/default.aspx For more information and to apply, use the GDIT web site: https://secure.resumeware.net/gdns_rw/gdns_web/job_detail.cfm?key=59436&show_cart=0&referredId=20 If that long url doesn't work, try: http://www.resumeware.net/gdns_rw/gdns_web/job_search.cfm and search for job ID: 179178 NOTE: This is a position being hired by GDIT to work with NOAA, so any questions about salary, benefits, etc, etc should go to GDIT. However, feel free to send me questions about our organization, working conditions, more detail about the nature of the projects etc. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fast robus implementation of float()
On 3/22/11 1:54 PM, Christian K. wrote: I wonder if someone has a good solution for a fast conversion of gridded ascii data to ndarray. The fastest out-of-the-box way is with np.fromfile(input_file, sep=' ', dtype=np.float) It will only read multiple lines if the separator is whitespace, but it's pretty fast if it does. It should manage ',' as decimal point (on demand) I think that will work if the locale is set right, though I don't know how to do that on demand and special windows numbers as 1.#INF. I can't remember if it does that -- give it a try. It does use an ascii-to-float function written for numpy to handle things like that. Of course, this is easy to wrap in a small function but I expect it to be slow when the input size is in the Mb range. Never expect -- do a simple solution, then see if it's too slow for your needs. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] avoid a line...
On 3/17/11 12:46 AM, eat wrote: I am trying to read a file of the following format. I want to avoid the first line and read the remaining as 'float'. Please help me...

Rain Wind Temp Pr  Sal
0.1  1.1  0.02 0.2 0.2
0.5  0.   0.   0.4 0.8
0.5  5.5  1.5  0.5 1.5

It's not as robust, but if performance matters, fromfile() should be faster: f.readline() # to skip the first line arr = np.fromfile(f, sep=' ', dtype=np.float64).reshape((-1, 5)) (untested) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
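[Editor's note: if robustness matters more than speed, loadtxt can skip the header line directly; file name hypothetical:

import numpy as np
arr = np.loadtxt('data.txt', skiprows=1)   # shape (3, 5) for the sample above
]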
Re: [Numpy-discussion] hashing dtypes, new variation, old theme
On 3/17/11 2:57 PM, Mark Wiebe wrote: Dtypes being mutable looks like a serious bug to me, it's violating the definition of 'hashable' given here: I can imagine other problems it would cause, as well -- is there any reason that dtypes should be mutable? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
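[Editor's sketch of the mutability in question; whether the hash actually changes may depend on the numpy version:

import numpy as np

dt = np.dtype([('a', np.int32)])
h_before = hash(dt)
dt.names = ('b',)              # the dtype object is mutated in place
print dt                       # dtype([('b', '<i4')])
print hash(dt) == h_before     # if False, any dict/set holding dt is now broken
]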
Re: [Numpy-discussion] Nonzero behaving strangely?
On 3/17/11 3:17 PM, santhu kumar wrote: nid = (res[:,4]==2).nonzero() nid tuple turns out to be empty. But the very first row satisfies the criteria. It works for me: In [45]: arr Out[45]: array([[ 33.35053669, 49.4615004 , 44.27631299, 1., 2. ], [ 32.84263059, 50.24752036, 43.92291659, 1., 0. ], [ 33.68999668, 48.90554673, 43.51746687, 1., 0. ], [ 34.11564931, 49.77487763, 44.83843076, 1., 0. ], [ 32.4641859 , 48.65469145, 45.09300791, 1., 3. ], [ 32.15428526, 49.26922262, 45.92959026, 1., 0. ], [ 31.23860825, 48.21824628, 44.30816331, 1., 0. ], [ 30.71171138, 47.45600573, 44.9282456 , 1., 0. ], [ 30.53843426, 49.07713258, 44.20899822, 1., 0. ], [ 31.54722284, 47.61953925, 42.95235178, 1., 0. ], [ 32.44334635, 48.10500653, 42.51103537, 1., 0. ], [ 31.77269609, 46.53603145, 43.06468455, 1., 0. ], [ 30.1820843 , 47.80819604, 41.77667819, 1., 0. ], [ 30.78652668, 46.82907769, 40.38586451, 1., 0. ], [ 30.05963091, 46.84268609, 39.54583693, 1., 0. ], [ 31.75239177, 47.22768463, 40.00717713, 1., 0. ], [ 30.94617127, 45.76986265, 40.68226643, 1., 0. ], [ 33.20069679, 47.42127403, 45.66738249, 1., 0. ], [ 34.39608116, 47.25481126, 45.4438599 , 1., 0. ]]) In [46]: arr.shape Out[46]: (19, 5) In [47]: nid = (arr[:,4]==2).nonzero() In [48]: nid Out[48]: (array([0]),) Maybe you are having a floating point comparison problem -- i.e. that 2.0 is really 1.99 or something. Checking equality with floating point numbers is fraught with problems. Also, you may not need the nonzero: In [56]: arr[:,4]==2 Out[56]: array([ True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], dtype=bool) that's a boolean array that can be used for indexing, operating with where, etc. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
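[Editor's note: if the 2 really is 1.999..., a tolerance test is the safe spelling, e.g.:

nid = (np.abs(arr[:, 4] - 2.0) < 1e-9).nonzero()
]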
Re: [Numpy-discussion] *= operator not intuitive
On 3/16/11 9:22 AM, Paul Anton Letnes wrote: This comes up for discussion on a fairly regular basis. I tend towards the more warnings side myself, but you aren't going to get the current behavior changed unless you can convince a large bunch of people that it is the right thing to do, which won't be easy. Indeed, I don't think I'd want a warning for this -- though honestly, I'm not sure what you think should invoke a warning: a = np.ones((3,), dtype=np.float32) a += 2.0 Should that be a warning? You are adding a 64 bit float to a 32bit float array. a = np.ones((3,), dtype=np.float32) a += 2 now you are adding an integer -- should that be a warning? a = np.ones((3,), dtype=np.int32) a += 2.1 now a float to an integer array -- should that be a warning? As you can see -- there are a lot of options. As they are defined as in-place operators, I don't expect any casting to occur. I think this is the kind of thing that would be great to have a warning the first time you do it, but once you understand, the warnings would be really, really annoying! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
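[Editor's demo of the silent cast in the integer case -- behavior of the numpy of this era; later versions grew stricter casting rules:

import numpy as np

a = np.ones(3, dtype=np.int32)
a += 2.1
print a    # [3 3 3] -- the .1 is silently truncated, no warning
]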
Re: [Numpy-discussion] Fortran was dead ... [was Re: rewriting NumPy code in C or C++ or similar]
On 3/15/11 8:33 AM, Charles R Harris wrote: There really isn't a satisfactory array library for C++. The fact that every couple of years there is another project to produce one testifies to that fact. And I think not just the fact that there is not one, but that perhaps C++ the language, or maybe the culture, simply doesn't support that way of thinking. I've been slowly arriving at the conclusion that there is no place for C++ in programming. If you really need to twiddle bits, use C. If you need high performance numerics, use Fortran. If you need high level complex data structures, use Python. And the fact that using all these in the same program is pretty easy makes it all possible. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ragged array implimentation
Francesc Alted wrote: Just a C pointer to a malloc'ed area (it cannot be an ndarray, as the data area might be compressed). Got it. OK. Thanks. The code is really simple, and that's great. However, carray is quite a bit more sophisticated, as it supports not only enlarging arrays, but also shrinking, which is handy. And since 0.4 on, it also supports n-dimensional arrays (although you can only resize along the first dimension). I've been meaning to add that, too -- not much to it really, but haven't had a need yet, so haven't gotten around to it. I hope it was a bit helpful, and I'm looking forward to seeing what's next for carray! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ragged array implimentation
On 3/7/11 5:51 PM, Sturla Molden wrote: Den 07.03.2011 18:28, skrev Christopher Barker: 1, 2, 3, 4 5, 6 7, 8, 9, 10, 11, 12 13, 14, 15 ... A ragged array, as implemented in C++, Java or C# is just an array of arrays (or 'a pointer to an array of pointers'). Sure, but as a rule I don't find direct translation of C++ or Java code to Python the best approach ;-) Basically, that is an ndarray of ndarrays (or a list of ndarrays, whatever you prefer). ra = np.zeros(4, dtype=np.ndarray) ra[0] = np.array([1,2,3,4]) ra[1] = np.array([5,6]) ra[2] = np.array([7,8,9,10,11,12]) ra[3] = np.array([13,14,15]) ra array([[1 2 3 4], [5 6], [ 7 8 9 10 11 12], [13 14 15]], dtype=object) ra[1][1] 6 ra[2][:] array([ 7, 8, 9, 10, 11, 12]) yup -- or I could use a list to store the rows, which would add the ability to append rows. Slicing in two dimensions does not work as some might expect: ra[:2][:2] array([[1 2 3 4], [5 6]], dtype=object) yup -- might want to overload indexing to do something smarter about that, though in my use-case, slicing vertically isn't really useful anyway -- the nth element in one row doesn't necessarily have anything to do with the nth element in another row. However, aside from the slicing syntax issue, what I lose with the approach is the ability to get reasonable performance on operations on the entire array: ra *= 3.3 I'd like that to be numpy-efficient. What I need to grapple with is: 1) Is there a point to trying to build a general purpose ragged array? Or should I just build something that satisfies my use-case at hand? 2) What's the balance I need between performance and flexibility? Putting the rows in a list gives a lot more flexibility; putting it all in one 1-d numpy array could give better performance. NOTE: this looks like it could use a growable numpy array, much like one I've written before -- maybe it's time to revive that project and use it here, fixing some performance issues while I'm at it. Thanks for all your ideas, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ragged array implimentation
On 3/10/11 9:51 AM, Francesc Alted wrote: A Thursday 10 March 2011 18:05:11 Christopher Barker escrigué: NOTE: this looks like it could use a growable numpy array, much like one I've written before -- maybe it's time to revive that project and use it here, fixing some performance issues while I'm at it. A growable array would be indeed very useful in many situations. My carray project [1] is another attempt to create such an object. It grows (or shrinks) arrays by adding (removing) chunks, so they are fast beasts for performing these operations. I suppose this is more or less like your approach. I don't think so -- my approach is a lot simpler than I think carray is: it acts much like a python list: 1) it pre-allocates extra space when an array is created. 2) when you append, it uses that extra space 3) when the extra space is used up, it re-allocates the entire array, with some more extra room And to keep it really simple, I'm using a numpy array internally, and np.ndarray.resize to grow it. It can only be grown on the first axis (C-order). IIUC, carray is doing a lot more smarts about keeping the array in chunks that can be individually manipulated, which is very cool. Not to mention the compression! I also thought about implementing ragged arrays in carray, like those you are suggesting, but not sure when they will be ready for public consumption. That would be very cool, and I think you are right, the facilities in carray could make a really nice ragged array. Perhaps it is a good time to join efforts and create a nice 'growable' array package? Sure, but I don't know that I have much to offer, except maybe some test and timing code. By the way, it would be great to have a growable array that could be efficiently used in Cython code -- so you could accumulate a bunch of native C datatype data easily. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
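[Editor's sketch of the growth strategy described above; the class and its names are hypothetical, and real code would also want shrinking, slicing, etc.:

import numpy as np

class GrowableArray(object):
    """1-d array that appends like a list: pre-allocate, then double."""
    def __init__(self, dtype=np.float64):
        self._data = np.empty(16, dtype=dtype)   # extra space up front
        self._len = 0
    def append(self, value):
        if self._len == self._data.shape[0]:
            # out of room: re-allocate in place with headroom -- amortized O(1)
            self._data.resize(2 * self._data.shape[0], refcheck=False)
        self._data[self._len] = value
        self._len += 1
    def to_array(self):
        return self._data[:self._len].copy()

ga = GrowableArray()
for x in range(1000):
    ga.append(x)
print ga.to_array().shape    # (1000,)
]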
Re: [Numpy-discussion] ragged array implimentation
On 3/10/11 11:29 AM, Christopher Barker wrote: By the way, it would be great to have a growable array that could be efficiently used in Cython code -- so you could accumulate a bunch of native C datatype data easily. I just noticed that carray is written in Cython, so that part should be easy! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Numpy 1.6 schedule
On 3/6/11 5:54 AM, Charles R Harris wrote: I suppose this might cause a problem with lazy/quick c extensions that expected elements in a certain order, so some breakage could occur. Absolutely! (I've gotten a bit confused about this thread, but if this is about the question of whether structured dtypes are assumed to always keep field order, read on -- otherwise, ignore.) I have always assumed a structured dtype was much like a C struct: i.e. arrangement in memory is fixed, and the named fields are just for convenience. I was very surprised to learn that they could ever be used in a more dict-like manner. Also, I noted a recent post to this list suggesting that folks sometimes misuse structured arrays (particularly folks coming from MATLAB, etc), by using them when what they really want is a dict. In that post, the poster suggested that the primary use of structured arrays was for interacting with data files and C code that has arrays of structures, and indeed, that is how I have primarily used them. So I think for performance and consistency with C code, keeping dtype order is the best way to go. If that breaks backward compatibility too much, then it shouldn't be changed, but I think it better reflects what dtypes are all about. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
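[Editor's sketch of the C-struct correspondence: field offsets follow declaration order, with no padding by default:

import numpy as np

dt = np.dtype([('x', np.float64), ('y', np.float64), ('id', np.int32)])
print dt.itemsize                                  # 20 == 8 + 8 + 4, packed
print [dt.fields[name][1] for name in dt.names]    # offsets: [0, 8, 16]
]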
[Numpy-discussion] ragged array implimentation
Hi folks, I'm setting out to write some code to access and work with ragged arrays stored in netcdf files. It dawned on me that ragged arrays are not all that uncommon, so I'm wondering if any of you have any code you've developed that I could learn from, borrow from, etc. Note that when I say a ragged array, I mean a set of data where each row could be a different arbitrary length:

1, 2, 3, 4
5, 6
7, 8, 9, 10, 11, 12
13, 14, 15
...

In my case, these will only be 2-d, though I suppose one could have an n-d version where the last dimension was ragged (or any dimension, I suppose, though I'm having trouble wrapping my brain around what that would look like...) I'm not getting more specific about what I think the API should look like -- that is part of what I'm looking for: suggestions, previous implementations, etc. Is there any standard way to work with such data? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ragged array implimentation
On 3/7/11 9:33 AM, Francesc Alted wrote: A Monday 07 March 2011 18:28:11 Christopher Barker escrigué: I'm setting out to write some code to access and work with ragged arrays stored in netcdf files. It dawned on me that ragged arrays are not all that uncommon, so I'm wondering if any of you have any code you've developed that I could learn from, borrow from, etc. A list of numpy arrays would not be enough? Or you want something more specific? Maybe that would, but in mapping to the netcdf data model, I'm thinking more like a big 1-d numpy array, with a way to index into it. Also, that would allow you to do some math with the arrays, if the broadcasting made sense, anyway. But now that you've entered the conversation, does HDF and/or pytables have a standard way of dealing with this? On 3/7/11 9:37 AM, Jeff Whitaker wrote: Chris: The netcdf4-python module reads netcdf vlen arrays Are those a netcdf4 feature? So far, I'm still working with netcdf3 -- though this could be a compelling reason to move on! We've talked about this some on the CF list, and I don't think anyone brought that up. and returns numpy object arrays, where the elements of the object arrays are themselves 1d numpy arrays. I don't think there is any other way to do it. In your example, the 'ragged' array would be a 1d numpy array with dtype='O', and the individual elements would be 1d numpy arrays with dtype=int. Of course, these arrays are very awkward to deal with and operations will be slow. Depends on the operations, but yes. That still may be the best option, but I'm exploring others. Is a vlen array stored contiguously in netcdf? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
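[Editor's sketch of the "big 1-d array plus an index" idea, using the sample rows from the original post; names are hypothetical:

import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
                dtype=np.float64)
offsets = np.array([0, 4, 6, 12, 15])   # row i is data[offsets[i]:offsets[i+1]]

def row(i):
    return data[offsets[i]:offsets[i + 1]]   # a view, not a copy

print row(1)     # [ 5.  6.]
data *= 3.3      # whole-array math stays numpy-fast
]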
Re: [Numpy-discussion] ragged array implimentation
On 3/7/11 11:18 AM, Francesc Alted wrote: Well, I don't think there is such a 'standard' way for dealing with ragged arrays, but yes, pytables has support for them. Creating them is easy: # Create a VLArray: fileh = tables.openFile('vlarray1.h5', mode='w') vlarray = fileh.createVLArray(fileh.root, 'vlarray1', tables.Int32Atom(shape=()), "ragged array of ints", filters=tables.Filters(1)) # Append some (variable length) rows: vlarray.append(array([5, 6])) vlarray.append(array([5, 6, 7])) vlarray.append([5, 6, 9, 8]) Then, you can access the rows in a variety of ways, like iterators: print '--', vlarray.title for x in vlarray: print '%s[%d]-- %s' % (vlarray.name, vlarray.nrow, x) -- ragged array of ints vlarray1[0]-- [5 6] vlarray1[1]-- [5 6 7] vlarray1[2]-- [5 6 9 8] or via __getitem__, using general fancy indexing: a_row = vlarray[2] a_list = vlarray[::2] a_list2 = vlarray[[0,2]] # get list of coords a_list3 = vlarray[[0,-2]] # negative values accepted a_list4 = vlarray[numpy.array([True,...,False])] # array of bools but, instead of returning a numpy array of 'object' elements, plain python lists are returned instead. which gives you the append option -- I can see how that would be useful. Though I'd kind of like to have numpy ufunc/broadcasting performance, i.e. have vlarray * some_constant be fast, and work out of the box! More info on VLArray object in: http://www.pytables.org/docs/manual/ch04.html#VLArrayClassDescr is a vlen array stored contiguously in netcdf? Great, thanks! That gives me an example of one API I might want to use. I don't really know, but one limitation of variable length arrays in HDF5 (and hence NetCDF4) is that they cannot be compressed (but that should be addressed in the future). good to know. Thanks to both you and Jeff, This has given me some things to ponder. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] rewriting NumPy code in C or C++ or similar
On 3/7/11 3:36 PM, Dan Halbert wrote: We currently have some straightforward NumPy code that indirectly implements a C API defined by a third party. We built a Cython layer that directly provides the API in a .a library, and then calls Python. The layering looks like this: C main program -> API in Cython -> Python -> NumPy This is difficult to package for distribution, because of the Python and NumPy dependencies. I'd say learn py2exe, py2app and friends, and be done with it. Otherwise, I wonder if you could re-factor your Cython code enough to remove all python dependencies -- or at least have something you could compile and run without a python interpreter running. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to get the prices of Moving Averages Crosses?
) # dates_ma20_greater_than_ma50 = (dates[ma20_greater_than_ma50]) # prices_ma20_greater_than_ma50 = (prices[ma20_greater_than_ma50]) print dates_cross print prices_cross #print dates_ma20_greater_than_ma50 #print prices_ma20_greater_than_ma50 plot.plot(prices) plot.plot(ma20) plot.plot(ma50) plot.show() Can someone give me some clues? Best Regards, -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] using loadtxt() for given number of rows?
On 2/14/11 8:28 AM, Stéfan van der Walt wrote: You can always read chunks of the file into StringIO objects, and pass those into loadtxt. True, but there isn't any single method for loading n lines of a file into a StringIO object, either. I've always thought that file.readlines() should take a number of rows as an optional parameter. Not a big deal, now that we have list comprehensions, but still it would be nice, and it makes sense to put it into loadtxt() for sure. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
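[Editor's sketch of the chunk-reading idea, Python 2 spelling; itertools.islice pulls exactly n lines, and the file name is hypothetical:

import itertools
import numpy as np
from StringIO import StringIO

f = open('data.txt')
chunk = StringIO(''.join(itertools.islice(f, 100)))   # next 100 lines
arr = np.loadtxt(chunk)
]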
Re: [Numpy-discussion] conditional array indexing
On 2/14/11 2:39 PM, Bryan Woods wrote: Thanks for your reply. Unfortunately it is not working that simply. This tells me that only integer arrays with one element can be converted to an index. Examples, examples, examples! I think this is what you want: In [15]: land_cover Out[15]: array([[4, 2, 0, 4], [0, 2, 1, 1], [1, 1, 4, 2]]) In [16]: z0_legend Out[16]: array([ 3.4, 5.2, 1.3, 4.2, 6.4]) In [17]: roughness = z0_legend[land_cover] In [18]: roughness Out[18]: array([[ 6.4, 1.3, 3.4, 6.4], [ 3.4, 1.3, 5.2, 5.2], [ 5.2, 5.2, 6.4, 1.3]]) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] create a numpy array of images
It seems that using 64 bit python is the solution. It's certainly the easy way to access a lot of memory -- and memory is cheap these days. But the thing is I would compile my code and want to distribute it to the clients.. I don't think 64 bit gets in the way of that -- except that it will only run on 64 bit systems, which may be an issue. only reason why I want to work on a 32 bit system. Sturla, how can I make sure that some part of the data is kept on the disk and only the necessary one in the memory; as this seems to be a solution to my problem. You can roll your own and have a disk cache of some sort -- it would be pretty easy to store each image in a *.npz file and load them up as you need them. But it would probably be even easier to use one of the hdf-based libraries, such as pytables -- I think it will do it all for you. One other option, that I've never tried, is carray, which is an array compressed in memory. Depending on your images, perhaps they would compress a lot (or not): https://github.com/FrancescAlted/carray http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052378.html As I said I want a 3d visualization out of the numpy array. It works fine for the downsampled dataset. And to visualize, I have to convert the 16bit data into 8bit as PIL doesn't support 16 bit data. It's unclear to me what your native data really is: 16 bit greyscale? 8 bit greyscale? either one should fit OK into 32 bit memory, and if 8bit is accurate enough for your needs, then it should be pretty easy. stack = numpy.empty((120, 1024, 1024)) numpy defaults to double precision float, np.float64, i.e. 8 bytes per element -- you probably don't want that if you are concerned about memory use, and have 8 or 16 bit greyscale images. Try: stack = np.empty((120, 1024, 1024), dtype=np.uint8) # (or dtype=np.uint16) i = 0 os.chdir(dirr) for f in os.listdir(dirr): im = Image.open(f) im = im.convert('L') you might want to try mode 'I'. That should give you a 32 bit integer grey scale, which should hold all the 16 bit data without loss -- then you can convert to 16 bit when you bring it into numpy. a = numpy.asarray(im) print a.dtype what does this print? it should be np.uint8 stack[i] = a here, if a is a uint8, numpy will convert it to a float64 to fit into stack -- that's why you want to set the dtype of stack when you create it. one more thing, it really doesn't work for tiff files at all, I have to convert them into jpgs as a prior step to this. probably not the best choice either, jpeg is lossy -- it will smear things out a bit, which you may not want. It also only holds 24 bit RGB (I think), which is both a waste and will lose information from 16 bit greyscale. If you have to convert, try PNG, though I'm not sure if it handles 16 bit greyscale, either. I'd look at a lib that can read tiff properly -- some have been suggested here, and you can also use GDAL, which is meant for geo-referenced data, but you can ignore the geo information and just get an image if you want. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
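[Editor's sketch of the mode-'I' route above, ending with 16-bit data on the numpy side; the file name is hypothetical and 16-bit tiff support varies by PIL version:

import Image   # classic PIL spelling; newer Pillow uses: from PIL import Image
import numpy as np

im = Image.open('slice.tif').convert('I')   # 32-bit integer greyscale
a = np.asarray(im).astype(np.uint16)        # safe as long as values fit in 16 bits
]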
Re: [Numpy-discussion] create a numpy array of images
On 2/1/11 12:39 AM, Asmi Shah wrote: I have one more question: how to avoid the limitation of memoryerror in numpy. As I have like 200 images to stack in the numpy array of say 1024x1344 resolution.. have any idea apart from downsampling? If I'm doing my math right, that's 262 MB, shouldn't be a problem on modern systems. That's 8bit, but 786MB if 24 bit RGB. If you are careful about how many copies you're keeping around (including temporaries), you may be OK still. But if you really have big collections of images, you might try memory mapped arrays -- as Sturla pointed out they won't let you create monster arrays on a 32 bit python, but maybe they do help with not clogging up memory too much? I don't know -- I haven't used them -- presumably they have a purpose. Also, pytables is worth a look, as another way to get HDF5 on disk, but with, I think, more natural access. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
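[Editor's note: the memory-mapped option is one line; the file name is hypothetical, and the data lives on disk and pages in as touched:

import numpy as np
stack = np.memmap('stack.dat', dtype=np.uint8, mode='w+', shape=(200, 1024, 1344))
]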
Re: [Numpy-discussion] create a numpy array of images
On 2/1/11 8:31 AM, Friedrich Romstedt wrote: In case you *have* to downsample: I also ran into this, with the example about my 5 images ... im.resize((newx, newy), PIL.Image.ANTIALIAS) will be your friend. http://www.pythonware.com/library/pil/handbook/image.htm. If you want to downsample by an integer amount (i.e. a factor of 2) in each dimension, I have some Cython code that optimizes that. I'm happy to send it along. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
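[Editor's sketch: for integer factors a pure-numpy block mean also works (not the Cython code mentioned above, just the same idea); factor-of-2 case shown:

import numpy as np

a = np.arange(16.0).reshape(4, 4)
small = a.reshape(2, 2, 2, 2).mean(axis=3).mean(axis=1)  # (2, 2) block means
]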
Re: [Numpy-discussion] using loadtxt() for given number of rows?
On 1/31/11 4:39 AM, Robert Cimrman wrote: I work with text files which contain several arrays separated by a few lines of other information, for example: POINTS 4 float -5.00e-01 -5.00e-01 0.00e+00 5.00e-01 -5.00e-01 0.00e+00 5.00e-01 5.00e-01 0.00e+00 -5.00e-01 5.00e-01 0.00e+00 CELLS 2 8 3 0 1 2 3 2 3 0 I have used custom Python code with loops to read similar files, so the speed was not too good. Now I wonder if it would be possible to use the numpy.loadtxt() function for the array-like parts. It supports passing an open file object in, which is good, but it wants to read the entire file, which does not work in this case. It seems to me, that an additional parameter to loadtxt(), say nrows or numrows, would do the job, I agree that that would be a useful feature. However, I'm not sure it would help performance much -- I think loadtxt is written in python as well. One option in the meantime. If you know how many rows, you presumably know how many items on each row. In that case, you can use: np.fromfile(open_file, sep=' ', count=num_items_to_read) It'll only work for multi-line text if the separator is whitespace, which it was in your example. But if it does, it should be pretty fast. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] create a numpy array of images
On 1/28/11 7:01 AM, Asmi Shah wrote: I have been using python for a while now and I have a requirement of creating a numpy array of microscopic tiff images ( this data is 3d, meaning there are 100 z slices of 512 X 512 pixels.) How can I create an array of images? It's quite straightforward to create a 3-d array to hold this kind of data: image_block = np.empty((100, 512, 512), dtype=??) now you can load it up by using some lib (PIL, or ???) to load the tif images, and then: for i, im in enumerate(images): image_block[i,:,:] = im note that I put dtype to ??? up there. What dtype you want is dependent on what's in the tiff images -- tiff can hold just about anything. So if they are, say, 16 bit greyscale, you'd want: dtype=np.uint16 if they are 24 bit rgb, you might want a custom dtype (I don't think there is a 24 bit dtype built in): RGB_type = np.dtype([('r',np.uint8),('g',np.uint8),('b',np.uint8)]) for 32 bit rgba, you can use the same approach, or just a 32 bit integer. The cool thing is that you can make views of this array with different dtypes, depending on what's easiest for the given use case. You can even break out the rgb parts into a separate axis: image_block = np.empty((100, 512, 512), dtype=RGB_type) image_block_rgb=image_block.view(dtype=np.uint8).reshape((100,512,512,3)) The two arrays now share the same data block, but you can look at them differently. I think this a really cool feature of numpy. i then would like to use visvis for visualizing this in 3D. you'll have to see what visvis is expecting in terms of data types, etc. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.append numpy.where vs list.append and brute iterative for loop
On 1/27/11 1:53 PM, Sturla Molden wrote: But N appends are O(N) for lists and O(N*N) for arrays. hmmm - that doesn't seem quite right -- lists still have to re-allocate and copy, they just do it every n times (where n grows with the list), so I wouldn't expect exactly O(N). But you never know 'till you profile. See the enclosed code and figures. Interestingly both appear to be pretty linear, though the constant is much larger for numpy arrays. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov append_time.py Description: application/python attachment: append_timing.png___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Developer NumPy list versus User NumPy list
On 1/27/11 3:35 PM, Travis Oliphant wrote: What is the thought about having two separate NumPy lists (one for development discussions and one for user discussions)? Speaking as someone who hasn't contributed code to numpy itself, I still really like to follow the development discussion, so I'll subscribe to both lists anyway, and filter them to the same place in my email. So it makes little difference to me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.append numpy.where vs list.append and brute iterative for loop
On 1/27/11 3:54 PM, Sturla Molden wrote: Lists allocate empty slots at their back, proportional to their size. So as lists grow, re-allocations become rarer and rarer. Then on average the complexity per append becomes O(1), which is the amortised complexity. Appending N items to a list thus has the amortized complexity O(N). I think I get that now... NumPy arrays are designed to be fixed size, and not designed to amortize the complexity of appends. So if you want to use arrays as efficient re-sizeable containers, you must code this logic yourself. And I do get that. And yet, experimentally, appending numpy arrays (on that one simple example) appeared to be O(N). Granted, a much larger constant than for lists, but it sure looks linear to me. Should it be O(N^2)? Maybe I need to run it for larger N, but I got impatient as it is. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
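[Editor's sketch of a harness to push N higher; np.append copies the whole array on every call, so the quadratic term should eventually dominate:

import numpy as np
import timeit

def list_append(n):
    a = []
    for i in range(n):
        a.append(i)

def array_append(n):
    a = np.zeros(0)
    for i in range(n):
        a = np.append(a, i)   # allocates and copies all i elements, every time

for n in (1000, 4000, 16000):
    print n, timeit.timeit(lambda: list_append(n), number=5), \
        timeit.timeit(lambda: array_append(n), number=5)
]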
Re: [Numpy-discussion] Installing on CentOS 5 claims invalid Python installation
On 12/23/10 5:13 AM, Roger Martin wrote: NumPy looks like the way to get computation done in Python. yup -- welcome! Now I'm going through the learning curve of installing the module into different linux OS's and Python versions. hmm -- usually it's pretty straightforward on Linux (except maybe getting an optimized LAPACK, which you may or may not need). An extra need is to install google code's h5py http://code.google.com/p/h5py/ which depends on numpy. I'll leave that for the next step. In trying a number of Python versions the 2.x's are yielding the message invalid Python installation --- raise DistutilsPlatformError(my_msg) distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /home/roger/Python-2.6.6/dist/lib/python2.6/config/Makefile (No such file or directory) --- From reading on the web it appears a Python-2.x.x-devel version is needed. yup -- many of the Linux package systems split the stuff you need to run Python code from what you need to compile stuff against it -- common with other libs, packages as well. Yet no search combination comes back with where to get such a thing each distro has its own naming convention -- look for anything like python-devel, python-dev, etc. (note: I need user installs/builds for security reasons). Ahh -- a different story -- AFAIK (and I'm NOT an expert) the distro's packages will install python into system directories -- if you really need each user to have their own install, you may need to install from source. That should be pretty straightforward, too. Get the source tarball from python.org, and follow the build instructions. You'll need to specify a user install in that process somehow. The latest numpy should work with any recent python -- If you are free to choose, use 2.7.1 -- it's the latest production version of the 2.* series. 3.* is still a bit bleeding edge. You can grab the tarball here: http://www.python.org/download/releases/2.7.1/ Once you've got a python working, a simple "python setup.py install" should do for numpy. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
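[Editor's note: the from-source user install usually boils down to something like the following; the prefix path is just an example:

./configure --prefix=$HOME/local
make
make install
~/local/bin/python setup.py install    # run in the numpy source directory
]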
Re: [Numpy-discussion] Can I add rows and columns to recarray?
On 12/5/10 7:56 PM, Wai Yip Tung wrote: I'm fairly new to numpy and I'm trying to figure out the right way to do things. Continuing on my question about using recarray as a relation.

note that recarrays (or structured arrays -- AFAIK, the difference is attribute access only; I don't use recarrays) are far more static than a database table. So you may really want to use a database, or maybe pytables. Or maybe even just stick with lists. But if you are keeping things in memory, you should be able to do what you want:

In [339]: arr = np.array([ (1, 2.2, 0.0),
                           (3, 4.5, 0.0) ],
                         dtype=[ ('unit', int),
                                 ('price', float),
                                 ('amount', float),
                                 ] )

In [340]: data = arr.view(np.recarray)

One of the most common things I want to do is to append rows to data.

numpy arrays do not naturally support appending, as you have discovered.

I think concatenate() might be the method.

yes. But I get a problem:

In [342]: np.concatenate((data0, [1, 9.0, 9.0]))
---------------------------------------------------------------------------
TypeError                         Traceback (most recent call last)
<ipython console> in <module>()
TypeError: expected a readable buffer object

concatenate expects arrays to be joined. If you pass in something that can easily be turned into an array it will work, but a tuple can be converted to multiple types of arrays, so it doesn't know what to do. So you need to construct the second array yourself, with the matching dtype (dt here is the dtype used above):

a2 = np.array( [(3, 5.5, 3)], dtype=dt )
arr = np.concatenate( (arr, a2) )

In [343]: data.amount = data.unit * data.price

yup.

But sometimes it may require me to add a new column not already exist, e.g.:

In [344]: data.discount_price = data.price * 0.9

How can I add a new column?

you can't. What you need to do is create a new array with a new dtype that includes the new field. The trick is that numpy only supports homogeneous arrays -- every item is the same data type. So when you create a struct array like the one above, numpy does not define it as a 2-d table, but rather a 1-d array, each element of which is a structure. So you need to do something like:

# create a new array
data2 = np.zeros(len(data), dtype=dt2)
# fill the array:
for field_name in dt.fields.keys():
    data2[field_name] = data[field_name]
# now some calculations:
data2['discount_price'] = data2['price'] * 0.9

I don't know of a way to avoid that loop when filling the array. Better yet -- anticipate your needs and create the array with all the fields you need in the first place.

You can see that ndarrays are pretty static -- struct arrays can be useful data storage, but are not very suitable when things are changing much. You could write a class that wraps an ndarray and supports what you need better -- it could be a pretty useful general-purpose class, too. I've got one that handles the appending part, but nothing for adding new fields. Here's appending with my class:

data3 = accumulator.accumulator(dtype=dt2)
data3.append((1, 2.2, 0.0, 0.0))
data3.append((3, 4.5, 0.0, 0.0))
data3.append((2, 1.2, 0.0, 0.0))
data3.append((5, 4.2, 0.0, 0.0))
print repr(data3)
# convert to a regular array for calculations:
data3 = np.array(data3)
# now some calculations:
data3['discount_price'] = data3['price'] * 0.9

You wouldn't have to convert to a regular array, except that I haven't written the code to support field access yet -- I don't think it would be too hard, though. A consolidated, runnable sketch of the append-and-add-a-field steps follows this message.

I've enclosed some test code, and my accumulator class, in case you find it useful.

-Chris -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov [attachments: struct_test.py, accumulator.py] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
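The consolidated sketch mentioned above -- a minimal, self-contained version of the append and add-a-field steps (the field names match the example; dt2 is simply dt plus the new field):

import numpy as np

dt = np.dtype([('unit', int), ('price', float), ('amount', float)])
arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)], dtype=dt)

# appending a row: build a one-element array with the same dtype,
# then concatenate -- a bare tuple or list is ambiguous
a2 = np.array([(3, 5.5, 3.0)], dtype=dt)
arr = np.concatenate((arr, a2))

# adding a field: build a new dtype and copy field by field
dt2 = np.dtype(dt.descr + [('discount_price', float)])
data2 = np.zeros(len(arr), dtype=dt2)
for name in dt.names:
    data2[name] = arr[name]
data2['discount_price'] = data2['price'] * 0.9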
Re: [Numpy-discussion] Can I add rows and columns to recarray?
On 12/6/10 11:00 AM, Benjamin Root wrote: numpy.lib.recfunctions has a method for easily adding new columns. cool! There is a lot of other nifty-looking stuff in there too. The OP should really take a look. And maybe an appending function is in order, too. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
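If I'm reading the docs right, adding a column with recfunctions goes something like this (a sketch, not tested here):

import numpy as np
from numpy.lib import recfunctions

data = np.array([(1, 2.2), (3, 4.5)],
                dtype=[('unit', int), ('price', float)])

# append_fields builds a new array with the extra column;
# usemask=False returns a plain structured array rather than a masked one
data2 = recfunctions.append_fields(data, 'discount_price',
                                   data['price'] * 0.9,
                                   usemask=False)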