Re: [Numpy-discussion] striding through arbitrarily large files

2014-02-05 Thread Richard Hattersley
On 4 February 2014 15:01, RayS r...@blue-cove.com wrote:

  I was struggling with methods of reading large disk files into numpy
 efficiently (not FITS or .npy, just raw files of IEEE floats from
 numpy.tostring()). When loading arbitrarily large files it would be nice to
 not bother reading more than the plot can display before zooming in. There
 apparently are no built in methods that allow skipping/striding...


Since you mentioned the plural files, are your datasets entirely
contained within a single file? If not, you might be interested in Biggus (
https://pypi.python.org/pypi/Biggus). It's a small pure-Python module that
lets you glue together arrays (such as those from smmap) into a single
arbitrarily large virtual array. You can then step over the virtual array
and each access is mapped back to the underlying sources.
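
As an illustrative aside (this is just a sketch of mine, not Biggus code):
for the single-file case above, a plain memory map already gives skip/stride
reads, since slicing only touches the pages it needs. The filename, dtype
and stride here are made up:

    import numpy as np

    # Memory-map a raw file of IEEE floats (e.g. written via tostring());
    # float32 is assumed for this sketch.
    data = np.memmap('samples.dat', dtype=np.float32, mode='r')

    # Take every 1000th sample; only the touched pages are read from disk.
    decimated = np.asarray(data[::1000])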

Richard


Re: [Numpy-discussion] Question about typenum

2013-10-08 Thread Richard Hattersley
Hi Valentin,

On 8 October 2013 13:23, Valentin Haenel valen...@haenel.co wrote:

 Certain functions, like `PyArray_SimpleNewFromData` and
 `PyArray_SimpleNew`, take a typenum. Is there any way to go from a
 typenum to something that can be passed to the dtype constructor, like
 mapping 12 -> 'f8'?


If you just want the corresponding dtype instance (aka PyArray_Descr) then
`PyArray_DescrFromType` should be what you're after.

But if you really need the 'f8' string then I'd be tempted to get the
PyArray_Descr and then use the Python API (e.g. PyObject_GetAttrString) to
request the `str` attribute. Under the hood this attribute is implemented
by `arraydescr_protocol_typestr_get`, but that's not part of the public API.
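
For what it's worth, a Python-level sketch of the same mapping (mine, not
from the thread; it relies on np.sctypeDict, which exists in the NumPy
releases of this era):

    import numpy as np

    # Type number 12 is float64, matching the question's example.
    assert np.dtype('f8').num == 12

    # Build a typenum -> dtype lookup by probing the registered scalar
    # types; PyArray_DescrFromType does this directly at the C level.
    num_to_dtype = {}
    for scalar_type in set(np.sctypeDict.values()):
        dt = np.dtype(scalar_type)
        num_to_dtype.setdefault(dt.num, dt)

    print(num_to_dtype[12].str)  # '<f8' on a little-endian platform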

Regards,
Richard Hattersley


Re: [Numpy-discussion] Bug in numpy.correlate documentation

2013-10-08 Thread Richard Hattersley
Hi Bernard,

Looks like you're on to something - two other people have raised this
discrepancy before: https://github.com/numpy/numpy/issues/2588.
Unfortunately, when it comes to resolving the discrepancy one of the
previous comments takes the opposite view. Namely, that the docstring is
correct and the code is wrong.

Do different domains use different conventions here? Are there some
references to back up one stance or another?

But all else being equal, I'm guessing there'll be far more appetite for
updating the documentation than the code.
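
For anyone following along, the discrepancy is easy to reproduce (my quick
check, not part of the original thread):

>>> import numpy as np
>>> a = np.array([1., 2.])
>>> v = np.array([2., 1j])
>>> np.correlate(a, v, 'full')
array([ 0.-1.j,  2.-2.j,  4.+0.j])
>>> # Bernhard's formula z[k] = sum_n a[n+k] * conj(v[n]), evaluated
>>> # at k = -1, 0, +1, reproduces the code's output exactly:
>>> np.array([a[0]*np.conj(v[1]),
...           a[0]*np.conj(v[0]) + a[1]*np.conj(v[1]),
...           a[1]*np.conj(v[0])])
array([ 0.-1.j,  2.-2.j,  4.+0.j])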

Regards,
Richard Hattersley


On 7 October 2013 22:09, Bernhard Spinnler bernhard.spinn...@gmx.net wrote:

 The numpy.correlate documentation says:

 correlate(a, v) => z[k] = sum_n a[n] * conj(v[n+k])

 In [1]: a = [1, 2]

 In [2]: v = [2, 1j]

 In [3]: z = correlate(a, v, 'full')

 In [4]: z
 Out[4]: array([ 0.-1.j,  2.-2.j,  4.+0.j])

 However, according to the documentation, z should be

 z[-1] = a[1] * conj(v[0]) = 4.+0.j
 z[0]  = a[0] * conj(v[0]) + a[1] * conj(v[1]) = 2.-2.j
 z[1] = a[0] * conj(v[1]) = 0.-1.j

 which is the time reversed version of what correlate() calculates.

 IMHO, the correlate() code is correct. The correct formula in the docs
 (which is also the correlation formula in standard text books) should be

 z[k] = sum_n a[n+k] * conj(v[n])

 Cheers,
 Bernhard



Re: [Numpy-discussion] Question about typenum

2013-10-08 Thread Richard Hattersley
On 8 October 2013 19:56, Valentin Haenel valen...@haenel.co wrote:

 I ended up using: PyArray_TypeObjectFromType
 from cython so:

 np.dtype(cnp.PyArray_TypeObjectFromType(self.ndtype)).str

 Maybe I can avoid the np.dtype call when using PyArray_Descr?


In short: yes.

`PyArray_TypeObjectFromType` first uses `PyArray_DescrFromType` to figure
out the dtype from the type number, and then it returns the corresponding
array scalar type. Passing this array scalar type to `np.dtype` gets you
back to the dtype that had just been looked up inside TypeObjectFromType.

Regards,
Richard


Re: [Numpy-discussion] Indexing changes/deprecations

2013-09-27 Thread Richard Hattersley
On 27 September 2013 13:27, Sebastian Berg sebast...@sipsolutions.net wrote:

 And most importantly, is there any behaviour thing in the index
 machinery that is bugging you, which I may have forgotten until now?


Well, since you asked... I'd *love* to see the fancy indexing behaviour
moved to a separate method(s).

Yes, I know! I'm not realistically expecting that to be tackled right now.
And it sometimes seems like something of a sacred idol that one is not
supposed to question. But I've kept quiet on the issue for too long and
would love to know if anyone else thinks the same. It confuses people.
Actually, it confuses the hell out of people. I'm *still* finding out new
quirks of its behaviour and I've been using NumPy in a professional role
for years... although you should bear in mind I could just be a slow
learner. ;-)


Re: [Numpy-discussion] Removal of numarray and oldnumeric packages.

2013-09-24 Thread Richard Hattersley
On 23 September 2013 18:03, Charles R Harris charlesr.har...@gmail.com wrote:

 I have gotten no feedback on the removal of the numarray and oldnumeric
 packages. Consequently the removal will take place on 9/28. Scream now or
 never...


I know I always like to get feedback either way ... so +1 for removal.
Thanks.


Re: [Numpy-discussion] PEP8

2013-09-09 Thread Richard Hattersley
 Something we have done in matplotlib is that we have made PEP8 a part of
the tests.

In Iris and Cartopy we've also done this and it works well. While we
transition we have an exclusion list (which is gradually getting shorter).
We've had mixed experiences with automatic reformatting, so prefer to keep
the human in the loop.
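
For anyone wanting the same, the test is only a few lines with the pep8
package (a sketch of mine — the paths and exclusion list here are invented,
not Iris's actual configuration):

    import pep8
    import unittest

    class TestCodeFormat(unittest.TestCase):
        def test_pep8_conformance(self):
            # Files not yet migrated are skipped via the exclusion list.
            style = pep8.StyleGuide(exclude=['lib/legacy/*'])
            result = style.check_files(['lib'])
            self.assertEqual(result.total_errors, 0, 'PEP8 violations found')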

Richard


Re: [Numpy-discussion] Automatic custom dtype

2013-06-28 Thread Richard Hattersley
On 21 June 2013 19:57, Charles R Harris charlesr.har...@gmail.com wrote:
 You could check the numpy/core/src/umath/test_rational.c.src code to see if
 you are missing something.

In v1.7+ the difference in behaviour between my code and the rational
test case is because my scalar type doesn't subclass np.generic (aka.
PyGenericArrType_Type).

In v1.6 this requirement doesn't exist ... mostly ... In other words,
it works as long as the supplied scalars are contained within a
sequence.
So:
np.array([scalar]) => np.array([scalar], dtype=my_dtype)
But:
np.array(scalar) => np.array(scalar, dtype=object)

For one of my scalar/dtype combos I can easily workaround the 1.7+
issue by just adding the subclass relationship. But another of my
dtypes is wrapping a third-party type so I can't modify the subclass
relationship. :-(

So I guess I have three questions.

Firstly, is there some cunning workaround when defining a dtype for a
third-party type?

Secondly, is the subclass-generic requirement in v1.7+ desirable
and/or intended? Or just an accidental regression?

And thirdly, assuming it's desirable to remove the subclass-generic
requirement, would it also make sense to make it work for scalars
which are not within a sequence?

NB. If we decide there's some work which needs doing here, then I
should be able to put time on it.

Thanks,
Richard


Re: [Numpy-discussion] Automatic custom dtype

2013-06-28 Thread Richard Hattersley
On 28 June 2013 17:33, Charles R Harris charlesr.har...@gmail.com wrote:
 On Fri, Jun 28, 2013 at 5:27 AM, Richard Hattersley rhatters...@gmail.com
 wrote:
 So:
  np.array([scalar]) => np.array([scalar], dtype=my_dtype)
  But:
  np.array(scalar) => np.array(scalar, dtype=object)

 So the scalar case (0 dimensional array) doesn't work right. Hmm, what
 happens when you index the first array? Does subclassing the generic type
 work in 1.6?

Indexing into the first array works fine. So something like `a[0]`
calls my_dtype->f->getitem which creates a new scalar instance, and
something like `a[:1]` creates a new view with the correct dtype.


 My impression is that subclassing the generic type should be required, but I
 don't see where it is documented :(

Can you elaborate on why the generic type should be required? Do you
think it might cause problems elsewhere? (FYI I've also tested with a
patched version of v1.6.2 which fixes the typo which prevents the use
of user-defined dtypes with ufuncs, and that functionality seems to
work fine too.)


 Anyway, what is the problem with the
 third party code? Is there no chance that you can get hold of it to fix it?

Unfortunately it's out of my control.


Regards,
Richard


Re: [Numpy-discussion] Automatic custom dtype

2013-06-24 Thread Richard Hattersley
On 21 June 2013 19:57, Charles R Harris charlesr.har...@gmail.com wrote:
 You could check the numpy/core/src/umath/test_rational.c.src code to see if
 you are missing something.

My code is based in large part on exactly those examples (I don't
think I could have got this far using the documentation alone!), but
I've rechecked and there's nothing obvious missing.

That said, I think there may be something funny going on with error
handling within getitem and friends, so I'm still following up on that.

Richard


[Numpy-discussion] Automatic custom dtype

2013-06-21 Thread Richard Hattersley
Hi all,

In my continuing adventures in the Land of Custom Dtypes I've come
across some rather disappointing behaviour in 1.7 & 1.8.

I've defined my own class `Time360`, and a corresponding dtype
`time360` which references Time360 as its scalar type.

Now with 1.6.2 I can do:
>>> t = Time360(2013, 6, 29)
>>> np.array([t]).dtype
dtype('Time360')

And since all the instances supplied to the function were instances of
the scalar type for my dtype, numpy automatically created an array
using my dtype. Happy days!

But in 1.7 and 1.8 I get:
>>> np.array([t]).dtype
dtype('O')

So now I just get a plain old object array. Boo! Hiss!

Is this expected? Desirable? An unwanted regression?

Richard


Re: [Numpy-discussion] Automatic custom dtype

2013-06-21 Thread Richard Hattersley
On 21 June 2013 14:49, Charles R Harris charlesr.har...@gmail.com wrote:
 Bit short on detail here ;) How did you create/register the dtype?

The dtype is created/registered during module initialisation with:
dtype = PyObject_New(PyArray_Descr, &PyArrayDescr_Type);
dtype->typeobj = &Time360Type;
...
PyArray_RegisterDataType(dtype);

Where Time360Type is my new type definition:
static PyTypeObject Time360Type = { ... }
which is initialised prior to the dtype creation.

If the detail matters then should I assume this is unexpected
behaviour and maybe I can fix my code so it works?

Richard


Re: [Numpy-discussion] Parameterised dtypes

2013-05-29 Thread Richard Hattersley
Hi Nathaniel,

Thanks for the useful feedback - it'll definitely save me some time
chasing around the code base.

 dtype callbacks and ufuncs don't in general get access to the
 dtype object, so they can't access whatever parameters exist

Indeed - it is a little awkward. But I'm hoping I can use the `data`
argument to supply this.

 You don't even need 'metadata' or 'c_metadata' -- this is Python, we
 already have a totally standard way to add new fields, just subclass
 the dumb thing.

That would be nice ... but Py_TPFLAGS_BASETYPE is not set for
PyArrayDescr_Type so that class is final.

 1) No, you can't hook into the dtype string parser. Though, are you
 sure you really want to? Surely it's nicer to use Python syntax
 instead of inventing a new syntax and then having to write a parser
 for it from scratch?

Thank you - that's good to know. As you say, I'd *far* rather avoid
parsing, but I'd like my dtype to be a good citizen so I was asking
out of completeness. (Off at a tangent: The blaze project is a good
example of what happens if you do add more parsing. In my opinion it's
not the way to go.)

 2) I have some vague plans worked out to fix all this so dtypes are
 just ordinary python objects, but haven't written it down yet due to a
 combination of lack of time to do so, and lack of anyone with time to
 actually implement the plan even if it were written down. I mention
 this just in case someone wants to volunteer, which would move it up
 my stack.

Would you have the time to sketch out the intended benefits?

Richard


Re: [Numpy-discussion] Parameterised dtypes

2013-05-29 Thread Richard Hattersley
Hi Andrew,

 Maybe a stupid question, but do you know a reference I could look at
 for the metadata and c_metadata fields you described?

Sorry ... no. I've not found anything. :-(

If I remember correctly, I got wind of the metadata aspect from the
mailing list discussions of datetime64. So for my current work I've
just been scratching around in the datetime64 code looking for example
usage.

Regards,
Richard


Re: [Numpy-discussion] NumPy sprints at Scipy 2013, Austin: call for topics and hands to help

2013-05-26 Thread Richard Hattersley
Hi David,

On 25 May 2013 15:23, David Cournapeau courn...@gmail.com wrote:

 As some of you may know, Stéfan and me will present a tutorial on
 NumPy C code, so if we do our job correctly, we should have a few new
 people ready to help out during the sprints.


Is there any chance you'll be repeating this at EuroSciPy?


Things I'd like to work on myself is looking into splitting things
 from multiarray, think about a better internal API for dtype
 registration/hooks (with the goal to remove any date dtype hardcoding
 in both multiarray and ufunc machinery), but I am sure others have
 more interesting ideas :)


I'm not able to get to SciPy so I understand if my vote of support doesn't
count ;-), but I'm very interested in the work on the dtype API. And if it
was on the radar for EuroSciPy there's a good chance I'd be able to help
out. (The combination of a NumPy C tutorial and dtype API work would make a
pretty compelling case for my managers.)

Regards,
Richard


[Numpy-discussion] Parameterised dtypes

2013-05-24 Thread Richard Hattersley
Hi all,

I'm in the process of defining some new dtypes to handle non-physical
calendars (such as the 360-day calendar used in the climate modelling
world). This is all going fine[*] so far, but I'd like to know a little bit
more about how much is ultimately possible.

The PyArray_Descr members `metadata` and `c_metadata` allow run-time
parametrisation, but is it possible to hook into the dtype('...') parsing
mechanism to supply those parameters? Or is there some other dtype
mechanism for supplying parameters?

As an example, would it be possible to supply month lengths?
>>> a = np.zeros(n, dtype='my_date[34,33,31,30,30,29,29,30,31,32,34,35]')

Or is the intended use of parametrisation more like:
>>> weird = my_stuff.make_dtype([34,33,31,30,30,29,29,30,31,32,34,35])
>>> a = np.zeros(n, dtype=weird)

[*] The docs could do with updating, and the examples would benefit from
standardising (or at least explaining the significance of the differences).
I intend to post updates where possible.

Richard


Re: [Numpy-discussion] Parameterised dtypes

2013-05-24 Thread Richard Hattersley
On 24 May 2013 15:12, Richard Hattersley rhatters...@gmail.com wrote:

 Or is the intended use of parametrisation more like:
 >>> weird = my_stuff.make_dtype([34,33,31,30,30,29,29,30,31,32,34,35])
 >>> a = np.zeros(n, dtype=weird)


Or to put it another way: I have a working `make_dtype` function (which
could easily be extended to do dtype caching), but is that the right way to
go about things?
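
For concreteness, the caching I have in mind is nothing fancier than this
sketch, reusing the hypothetical `my_stuff.make_dtype` from above:

    _dtype_cache = {}

    def make_dtype_cached(month_lengths):
        # One dtype instance per distinct parameterisation.
        key = tuple(month_lengths)
        if key not in _dtype_cache:
            _dtype_cache[key] = my_stuff.make_dtype(list(key))
        return _dtype_cache[key]

    weird = make_dtype_cached([34,33,31,30,30,29,29,30,31,32,34,35])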

Richard


Re: [Numpy-discussion] bug in deepcopy() of rank-zero arrays?

2013-04-30 Thread Richard Hattersley
+1 for getting rid of this inconsistency

We've hit this with Iris (a met/ocean analysis package - see github), and
have had to add several workarounds.
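
The workarounds are variations on re-wrapping the result. A representative
sketch (not Iris's exact code):

    import copy
    import numpy as np

    def deepcopy_as_array(a):
        # copy.deepcopy turns a rank-zero array into a scalar (the bug
        # shown below); np.asarray re-wraps it as a rank-zero array.
        return np.asarray(copy.deepcopy(a))

    a2 = deepcopy_as_array(np.array(3))
    print(type(a2))  # <type 'numpy.ndarray'>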


On 19 April 2013 16:55, Chris Barker - NOAA Federal
chris.bar...@noaa.gov wrote:

 Hi folks,

 In [264]: np.__version__
 Out[264]: '1.7.0'

 I just noticed that deep copying a rank-zero array yields a scalar --
 probably not what we want.

 In [242]: a1 = np.array(3)

 In [243]: type(a1), a1
 Out[243]: (numpy.ndarray, array(3))

 In [244]: a2 = copy.deepcopy(a1)

 In [245]: type(a2), a2
 Out[245]: (numpy.int32, 3)

 regular copy.copy() seems to work fine:

 In [246]: a3 = copy.copy(a1)

 In [247]: type(a3), a3
 Out[247]: (numpy.ndarray, array(3))

 Higher-rank arrays seem to work fine:

 In [253]: a1 = np.array((3,4))

 In [254]: type(a1), a1
 Out[254]: (numpy.ndarray, array([3, 4]))

 In [255]: a2 = copy.deepcopy(a1)

 In [256]: type(a2), a2
 Out[256]: (numpy.ndarray, array([3, 4]))

 Array scalars seem to work fine as well:

 In [257]: s1 = np.float32(3)

 In [258]: s2 = copy.deepcopy(s1)

 In [261]: type(s1), s1
 Out[261]: (numpy.float32, 3.0)

 In [262]: type(s2), s2
 Out[262]: (numpy.float32, 3.0)

 There are other ways to copy arrays, but in this case, I had a dict
 with a bunch of arrays in it, and needed a deepcopy of the dict. I was
 surprised to find that my rank-0 array got turned into a scalar.

 -Chris

 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov



Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Richard Hattersley
 Since the files are huge, and would make me run out of memory, I need to
read data skipping some records

Is it possible to describe what you're doing with the data once you have
subsampled it? And if there were a way to work with the full resolution
data, would that be desirable?

I ask because I've been dabbling with a pure-Python library for handling
larger-than-memory datasets - https://github.com/SciTools/biggus - and it
uses similar chunking techniques as mentioned in the other replies to
process data at the full streaming I/O rate. It's still in the early stages
of development, so the design is still fluid; maybe it's worth seeing if
there's enough in common with your needs to warrant adding your use case.
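
In the meantime, something along these lines (a sketch with a made-up
filename and skip size) combines the structured-dtype and memmap ideas so
the skipped bytes are never copied into a new array:

    import numpy as np

    # One record = a uint32 payload followed by the bytes to skip.
    skip = 12
    record = np.dtype([('value', np.uint32), ('pad', np.uint8, (skip,))])

    # Memory-map the file with that layout and view just the payload.
    values = np.memmap('data.bin', dtype=record, mode='r')['value']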

Richard


On 13 March 2013 13:45, Andrea Cimatoribus andrea.cimatori...@nioz.nl wrote:

 Hi everybody, I hope this has not been discussed before, I couldn't find a
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this.
 Since the files are huge, and would make me run out of memory, I need to
 read data skipping some records (I am reading data recorded at high
 frequency, so basically I want to read subsampling).
 At the moment, I came up with the code below, which is then compiled using
 cython. Despite the significant performance increase from the pure python
 version, the function is still much slower than numpy.fromfile, and only
 reads one kind of data (in this case uint32), otherwise I do not know how
 to define the array type in advance. I have basically no experience with
 cython nor c, so I am a bit stuck. How can I try to make this more
 efficient and possibly more generic?
 Thanks

 import numpy as np
 #For cython!
 cimport numpy as np
 from libc.stdint cimport uint32_t

 def cffskip32(fid, int count=1, int skip=0):
     cdef int k = 0
     cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count,
                                                       dtype=np.uint32)
     if skip >= 0:
         while k < count:
             try:
                 # Read one value, then seek forward by `skip` bytes
                 # (whence=1: relative to the current position).
                 data[k] = np.fromfile(fid, count=1, dtype=np.uint32)
                 fid.seek(skip, 1)
                 k += 1
             except ValueError:
                 # End of file: keep only the values actually read.
                 data = data[:k]
                 break
     return data



Re: [Numpy-discussion] ANN: NumPy 1.7.0b2 release

2012-09-20 Thread Richard Hattersley
Hi,

[First of all - thanks to everyone involved in the 1.7 release. Especially
Ondřej - it takes a lot of time  energy to coordinate something like this.]

Is there an up-to-date release schedule anywhere? The trac milestone still
references June.

Regards,
Richard Hattersley

On 20 September 2012 07:24, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Hi,

 I'm pleased to announce the availability of the second beta release of
 NumPy 1.7.0b2.

 Sources and binary installers can be found at
 https://sourceforge.net/projects/numpy/files/NumPy/1.7.0b2/

 Please test this release and report any issues on the numpy-discussion
 mailing list. Since beta1, we've fixed most of the known (back then)
 issues, except:

 http://projects.scipy.org/numpy/ticket/2076
 http://projects.scipy.org/numpy/ticket/2101
 http://projects.scipy.org/numpy/ticket/2108
 http://projects.scipy.org/numpy/ticket/2150

 And many other issues that were reported since the beta1 release. The
 log of changes is attached. The full list of issues that we still need
 to work on is at:

 https://github.com/numpy/numpy/issues/396

 Any help is welcome, the best is to send a PR fixing any of the issues
 -- against master, and I'll then back-port it to the release branch
 (unless it is something release specific, in which case just send the
 PR against the release branch).

 Cheers,
 Ondrej


 * f217517 Release 1.7.0b2
 * 50f71cb MAINT: silence Cython warnings about changes dtype/ufunc size.
 * fcacdcc FIX: use py24-compatible version of virtualenv on Travis
 * d01354e FIX: loosen numerical tolerance in test_pareto()
 * 65ec87e TST: Add test for boolean insert
 * 9ee9984 TST: Add extra test for multidimensional inserts.
 * 8460514 BUG: Fix for issues #378 and #392 This should fix the
 problems with numpy.insert(), where the input values were not checked
 for all scalar types and where values did not get inserted properly,
 but got duplicated by default.
 * 07e02d0 BUG: fix npymath install location.
 * 6da087e BUG: fix custom post_check.
 * 095a3ab BUG: forgot to build _dotblas in bento build.
 * cb0de72 REF: remove unused imports in bscript.
 * 6e3e289 FIX: Regenerate mtrand.c with Cython 0.17
 * 3dc3b1b Retain backward compatibility. Enforce C order.
 * 5a471b5 Improve ndindex execution speed.
 * 2f28db6 FIX: Add a test for Ticket #2066
 * ca29849 BUG: Add a test for Ticket #2189
 * 1ee4a00 BUG: Add a test for Ticket #1588
 * 7b5dba0 BUG: Fix ticket #1588/gh issue #398, refcount error in clip
 * f65ff87 FIX: simplify the import statement
 * 124a608 Fix returned copy
 * 996a9fb FIX: bug in np.where and recarray swapping
 * 7583adc MAINT: silence DeprecationWarning in np.safe_eval().
 * 416af9a pavement.py: rename yop to atlas
 * 3930881 BUG: fix bento build.
 * fbad4a7 Remove test_recarray_from_long_formats
 * 5cb80f8 Add test for long number in shape specifier of dtype string
 * 24da7f6 Add test for long numbers in numpy.rec.array formats string
 * 77da3f8 Allow long numbers in numpy.rec.array formats string
 * 99c9397 Use PyUnicode_DecodeUTF32()
 * 31660d0 Follow the C guidelines
 * d5d6894 Fix memory leak in concatenate.
 * 8141e1e FIX: Make sure the tests produce valid unicode
 * d67785b FIX: Fixes the PyUnicodeObject problem in py-3.3
 * a022015 Re-enable unpickling optimization for large py3k bytes objects.
 * 470486b Copy bytes object when unpickling an array
 * d72280f Fix tests for empty shape, strides and suboffsets on Python 3.3
 * a1561c2 [FIX] Add missing header so separate compilation works again
 * ea23de8 TST: set raise-on-warning behavior of NoseTester to release mode.
 * 28ffac7 REL: set version number to 1.7.0rc1-dev.



Re: [Numpy-discussion] easy way to change part of only unmasked elements value?

2012-09-11 Thread Richard Hattersley
Hi Chao,

If you don't mind modifying masked values, then if you write to the
underlying ndarray it won't touch the mask:

>>> a = np.ma.masked_less(np.arange(10),5)
>>> a.base[3:6] = 1
>>> a
masked_array(data = [-- -- -- -- -- 1 6 7 8 9],
 mask = [ True  True  True  True  True False False False False
False],
   fill_value = 99)

Regards,
Richard Hattersley


On 10 September 2012 17:43, Chao YUE chaoyue...@gmail.com wrote:

 Dear all numpy users,

 what's the easy way if I just want to change part of the unmasked array
 elements into another new value? like an example below:
 in my real case, I would like to change a subgrid of a masked numpy array
 to another value, but this grid include both masked and unmasked data.
 If I do a simple array[index1:index2, index3:index4] = another_value,
 those data with original True mask will change into False. I am using numpy
 1.6.2.
 Thanks for any ideas.

 In [91]: a = np.ma.masked_less(np.arange(10),5)

 In [92]: or_mask = a.mask.copy()
 In [93]: a
 Out[93]:
 masked_array(data = [-- -- -- -- -- 5 6 7 8 9],
  mask = [ True  True  True  True  True False False False False
 False],
fill_value = 99)


 In [94]: a[3:6]=1

 In [95]: a
 Out[95]:
 masked_array(data = [-- -- -- 1 1 1 6 7 8 9],
  mask = [ True  True  True False False False False False False
 False],
fill_value = 99)


 In [96]: a = np.ma.masked_array(a,mask=or_mask)

 In [97]: a
 Out[97]:
 masked_array(data = [-- -- -- -- -- 1 6 7 8 9],
  mask = [ True  True  True  True  True False False False False
 False],
fill_value = 99)

 Chao

 --

 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16

 






Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Richard Hattersley
Hi,

re: valgrind - to get better results you might try the suggestions from:
http://svn.python.org/projects/python/trunk/Misc/README.valgrind

Richard

On 31 August 2012 09:03, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Hi,

 There is segfault reported here:

 http://projects.scipy.org/numpy/ticket/1588

 I've managed to isolate the problem and even provide a simple patch,
 that fixes it here:

 https://github.com/numpy/numpy/issues/398

 however the patch simply doesn't decrease the proper reference, so it
 might leak. I've used
 bisection (took the whole evening unfortunately...) but the good news
 is that I've isolated commits
 that actually broke it. See the github issue #398 for details, diffs etc.

 Unfortunately, it's 12 commits from Mark and the individual commits
 raise exception on the segfaulting code,
 so I can't pin point the problem further.

 In general, how can I debug this sort of problem? I tried to use
 valgrind, with a debugging build of numpy,
 but it provides tons of false (?) positives:
 https://gist.github.com/3549063

 Mark, by looking at the changes that broke it, as well as at my fix,
 do you see where the problem could be?

 I suspect it is something with the changes in PyArray_FromAny() or
 PyArray_FromArray() in ctors.c.
 But I don't see anything so far that could cause it.

 Thanks for any help. This is one of the issues blocking the 1.7.0 release.

 Ondrej



Re: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8

2012-06-28 Thread Richard Hattersley
The project/environment we work with already targets Python 2.7, so it'd be
fine for us and our collaborators. But it's hard to comment in a more
altruistic way without knowing the impact of the change. Is it possible to
summarise the benefits? (e.g. Simplifies NumPy codebase; allows better
support for XXX under 2.5+; ...)

On 28 June 2012 13:25, Travis Oliphant tra...@continuum.io wrote:

 Hey all,

 I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the
 1.7 release).  What does everyone think of that?

 -Travis




Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-14 Thread Richard Hattersley
For what it's worth, I'd prefer ndmasked.

As has been mentioned elsewhere, some algorithms can't really cope with
missing data. I'd very much rather they fail than silently give incorrect
results. Working in the climate prediction business (as with many other
domains I'm sure), even the *potential* for incorrect results can be
damaging.


On 11 May 2012 06:14, Travis Oliphant tra...@continuum.io wrote:


 On May 10, 2012, at 12:21 AM, Charles R Harris wrote:



 On Wed, May 9, 2012 at 11:05 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Wednesday, May 9, 2012, Nathaniel Smith wrote:



 My only objection to this proposal is that committing to this approach
 seems premature. The existing masked array objects act quite
 differently from numpy.ma, so why do you believe that they're a good
 foundation for numpy.ma, and why will users want to switch to their
 semantics over numpy.ma's semantics? These aren't rhetorical
 questions, it seems like they must have concrete answers, but I don't
 know what they are.


  Based on the design decisions made in the original NEP, a re-made
  numpy.ma would have to lose _some_ features, particularly the ability to
  share masks. Save for that and some very obscure behaviors that are
  undocumented, it is possible to remake numpy.ma as a compatibility layer.

  That being said, I think that there are some fundamental questions that
  remain of concern. If I recall, there were unresolved questions about
  behaviors surrounding assignments to elements of a view.

 I see the project as broken down like this:
 1.) internal architecture (largely abi issues)
 2.) external architecture (hooks throughout numpy to utilize the new
 features where possible such as where= argument)
 3.) getter/setter semantics
 4.) mathematical semantics

 At this moment, I think we have pieces of 2 and they are fairly
 non-controversial. It is 1 that I see as being the immediate hold-up here.
 3  4 are non-trivial, but because they are mostly about interfaces, I
 think we can be willing to accept some very basic, fundamental, barebones
 components here in order to lay the groundwork for a more complete API
 later.

 To talk of Travis's proposal, doing nothing is no-go. Not moving forward
 would dishearten the community. Making a ndmasked type is very intriguing.
  I see it as a step towards eventually deprecating ndarray? Also, how would
  it behave with np.asarray() and np.asanyarray()? My other concern is a
 possible violation of DRY. How difficult would it be to maintain two
 ndarrays in parallel?

 As for the flag approach, this still doesn't solve the problem of legacy
 code (or did I misunderstand?)


 My understanding of the flag is to allow the code to stay in and get
 reworked and experimented with while keeping it from contaminating
 conventional use.

 The whole point of putting the code in was to experiment and adjust. The
 rather bizarre idea that it needs to be perfect from the get go is
 disheartening, and is seldom how new things get developed. Sure, there is a
 plan up front, but there needs to be feedback and change. And in fact, I
 haven't seen much feedback about the actual code, I don't even know that
 the people complaining have tried using it to see where it hurts. I'd like
 that sort of feedback.


 I don't think anyone is saying it needs to be perfect from the get go.
  What I am saying is that this is fundamental enough to downstream users
 that this kind of thing is best done as a separate object.  The flag could
 still be used to make all Python-level array constructors build ndmasked
 objects.

 But, this doesn't address the C-level story where there is quite a bit of
 downstream use where people have used the NumPy array as just a pointer to
 memory without considering that there might be a mask attached that should
 be inspected as well.

 The NEP addresses this a little bit for those C or C++ consumers of the
 ndarray in C who always use PyArray_FromAny which can fail if the array has
 non-NULL mask contents.   However, it is *not* true that all downstream
 users use PyArray_FromAny.

 A large number of users just use something like PyArray_Check and then
 PyArray_DATA to get the pointer to the data buffer and then go from there
 thinking of their data as a strided memory chunk only (no extra mask).
  The NEP fundamentally changes this simple invariant that has been in NumPy
 and Numeric before it for a long, long time.

 I really don't see how we can do this in a 1.7 release. It has too many
 unknowns and I think unknowable downstream effects. But, I think we could
 introduce another arrayobject that is the masked_array with a Python-level
 flag that makes it the default array in Python.

 There are a few more subtleties,  PyArray_Check by default will pass
 sub-classes so if the new ndmask array were a sub-class then it would be
 passed (just like current numpy.ma arrays and matrices would pass that
 check today). However, there is a PyArray_CheckExact macro which could
 

Re: [Numpy-discussion] record arrays and vectorizing

2012-05-03 Thread Richard Hattersley
Sounds like it could be a good match for `scipy.spatial.cKDTree`.

It can handle single-element queries...

>>> element = numpy.arange(1, 8)
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> tree = scipy.spatial.cKDTree(targets)
>>> distance, index = tree.query(element)
>>> targets[index]
array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
6.46823904,  6.12957534])

Or even multi-element queries (shown here searching for 3 elements in one
call)...

>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> elements
array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
       [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
       [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
>>> distances, indices = tree.query(elements)
>>> targets[indices]
array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
         2.80559459,  4.43030939],
       [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
         4.06619636,  4.7183393 ],
       [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
         6.54570121,  7.59566404]])

Richard Hattersley


On 2 May 2012 19:06, Moroney, Catherine M (388D) 
catherine.m.moro...@jpl.nasa.gov wrote:

 Hello,

 Can somebody give me some hints as to how to code up this function
 in pure python, rather than dropping down to Fortran?

 I will want to compare a 7-element vector (called element) to a large
 list of similarly-dimensioned vectors (called target), and pick out the
 vector in target that is the closest to element (determined by minimizing
 the Euclidean distance).

 For instance, in (slow) brute force form it would look like:

 element = numpy.array([1, 2, 3, 4, 5, 6, 7])
 target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

 min_length = .0
 min_index  =
 for i in xrange(0, 7):
   distance = (element-target)**2
   distance = numpy.sqrt(distance.sum())
    if (distance < min_length):
  min_length = distance
  min_index  = i

 Now of course, the actual problem will be of a much larger scale.  I will
 have
 an array of elements, and a large number of potential targets.

 I was thinking of having element be an array where each element itself is
 a numpy.ndarray, and then vectorizing the code above so as an output I
 would
 have an array of the min_index and min_length values.

 I can get the following simple test to work so I may be on the right track:

 import numpy

 dtype = [('x', numpy.ndarray)]

 def single(data):
return data[0].min()

 multiple = numpy.vectorize(single)

 if __name__ == '__main__':

a = numpy.arange(0, 16).reshape(4,4)
b = numpy.recarray((4), dtype=dtype)
for i in xrange(0, b.shape[0]):
 b[i]['x'] = a[i,:]

print a
print b

x = multiple(b)
print x

 What is the best way of constructing b from a?  I tried b =
 numpy.recarray((4), dtype=dtype, buf=a)
 but I get a segmentation fault when I try to print b.

 Is there a way to perform this larger task efficiently with record arrays
 and vectorization, or
 am I off on the wrong track completely?  How can I do this efficiently
 without dropping
 down to Fortran?

 Thanks for any advice,

 Catherine



Re: [Numpy-discussion] A crazy masked-array thought

2012-04-28 Thread Richard Hattersley
On 27 April 2012 17:42, Travis Oliphant tra...@continuum.io wrote:


 1) There is a lot of code out there that does not know anything about
 masks and is not used to checking for masks.It enlarges the basic
 abstraction in a way that is not backwards compatible *conceptually*.
  This smells fishy to me and I could see a lot of downstream problems from
 libraries that rely on NumPy.


That's exactly why I'd love to see plain arrays remain functionally
unchanged.

It's just a small, random sample, but here's how a few routines from NumPy
and SciPy sanitise their inputs...

numpy.trapz (aka scipy.integrate.trapz) -> numpy.asanyarray
scipy.spatial.KDTree -> numpy.asarray
scipy.spatial.cKDTree -> numpy.ascontiguousarray
scipy.integrate.odeint -> PyArray_ContiguousFromObject
scipy.interpolate.interp1d -> numpy.array
scipy.interpolate.griddata -> numpy.asanyarray & numpy.ascontiguousarray

So, assuming numpy.ndarray became a strict subclass of some new masked
array, it looks plausible that adding just a few checks to numpy.ndarray to
exclude the masked superclass would prevent much downstream code from
accidentally operating on masked arrays.
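
To make that concrete, the kind of check I mean is roughly this (a sketch;
`MaskedArray` names the hypothetical new superclass, not numpy.ma's class):

    import numpy as np

    def _sanitise(obj):
        # Refuse instances of the masked superclass that are not plain
        # ndarrays; everything else is coerced as usual.
        if isinstance(obj, MaskedArray) and not isinstance(obj, np.ndarray):
            raise TypeError('masked arrays are not supported here')
        return np.asanyarray(obj)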



 2) We cannot agree on how masks should be handled and consequently don't
 have a real plan for migrating numpy.ma to use these masks.   So, we are
 just growing the API and introducing uncertainty for unclear benefit ---
 especially for the person that does not want to use masks.


I've not yet looked at how numpy.ma users could be migrated. But if we make
masked arrays a strict superclass and leave the numpy/ndarray interface and
behaviour unchanged, API growth shouldn't be an issue. End-users will be
able to completely ignore the existence of masked arrays (except for the
minority(?) for whom the ABI/re-compile issue would be relevant).


 3) Subclassing in C in Python requires that C-structures are *binary*
 compatible.This implies that all subclasses have *more* attributes than
 the superclass.   The way it is currently implemented, that means that POAs
 would have these extra pointers they don't need sitting there to satisfy
 that requirement.   From a C-struct perspective it therefore makes more
 sense for MAs to inherit from POAs.Ideally, that shouldn't drive the
 design, but it's part of the landscape in NumPy 1.X


I'd hate to see the logical class hierarchy inverted (or collapsed to a
single class) just to save a pointer or two from the struct. Now seems like
a golden opportunity to fix the relationship between masked and plain
arrays. I'm assuming (and implicitly checking that assumption with this
statement!) that there's far more code using the Python interface to NumPy,
than there is code using the C interface. So I'm urging that the logical
consistency of the Python interface (and even the C and Cython interfaces)
takes precedence over the C-struct memory saving.

I'm not sure I agree with extra pointers they don't need. If we make
plain arrays a subclass of masked arrays, aren't these pointers essential
to ensure masked array methods can continue to work on plain arrays without
requiring special code paths?


 I have some ideas about how to move forward, but I'm anxiously awaiting
 the write-up that Mark and Nathaniel are working on to inform and enhance
 those ideas.


+1

As an aside, the implication of preserving the behaviour of the
numpy/ndarray interface is that masked arrays will need a *new* interface.

For example:
>>> import mumpy  # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
>>> import numpy
>>> a = mumpy.array(...)  # makes a masked array
>>> b = numpy.array(...)  # makes a plain array
>>> isinstance(a, mumpy.ndarray)
True
>>> isinstance(b, mumpy.ndarray)
True
>>> isinstance(a, numpy.ndarray)
False
>>> isinstance(b, numpy.ndarray)
True

Richard Hattersley


Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Richard Hattersley
I know I used a somewhat jokey tone in my original posting, but fundamentally
it was a serious question concerning a live topic. So I'm curious about the
lack of response. Has this all been covered before?

Sorry if I'm being too impatient!


On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com wrote:

 The masked array discussions have brought up all sorts of interesting
 topics - too many to usefully list here - but there's one aspect I haven't
 spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
 too awkward to be helpful. But ...

 Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

 In the library I'm working on, the introduction of MAs (via numpy.ma)
 required us to sweep through the library and make a fair few changes.
 That's not the sort of thing one would normally expect from the
 introduction of a subclass.

 Putting aside the ABI issue, would it help downstream API compatibility if
 the POA was a subclass of the MA? Code that's expecting/casting-to a POA
 might continue to work and, where appropriate, could be upgraded in their
 own time to accept MAs.

 Richard Hattersley



Re: [Numpy-discussion] A crazy masked-array thought

2012-04-27 Thread Richard Hattersley
Hi all,

Thanks for all your responses and for your patience with a newcomer. Don't
worry - I'm not going to give up yet. It's all just part of my learning the
ropes.

On 27 April 2012 14:05, Benjamin Root ben.r...@ou.edu wrote:

 <snip>Your idea is interesting, but doesn't it require C++?  Or maybe you
 are thinking of creating a new C type object that would contain all the new
 features and hold a pointer and function interface to the original POA.
 Essentially, the new type would act as a wrapper around the original
 ndarray?</snip>

When talking about subclasses I'm just talking about the end-user
experience within Python. In other words, I'm starting from issubclass(POA,
MA) == True, and trying to figure out what the Python API implications
would be.


On 27 April 2012 14:55, Nathaniel Smith n...@pobox.com wrote:

 On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley
 rhatters...@gmail.com wrote:
  I know I used a somewhat jokey tone in my original posting, but
 fundamentally
  it was a serious question concerning a live topic. So I'm curious about
 the
  lack of response. Has this all been covered before?
 
  Sorry if I'm being too impatient!

 That's fine, I know I did read it, but I wasn't sure what to make of
 it to respond :-)

  On 25 April 2012 16:58, Richard Hattersley rhatters...@gmail.com
 wrote:
 
  The masked array discussions have brought up all sorts of interesting
  topics - too many to usefully list here - but there's one aspect I
 haven't
  spotted yet. Perhaps that's because it's flat out wrong, or crazy, or
 just
  too awkward to be helpful. But ...
 
  Shouldn't masked arrays (MA) be a superclass of the plain-old-array
 (POA)?
 
  In the library I'm working on, the introduction of MAs (via numpy.ma)
  required us to sweep through the library and make a fair few changes.
 That's
  not the sort of thing one would normally expect from the introduction
 of a
  subclass.
 
  Putting aside the ABI issue, would it help downstream API compatibility
 if
  the POA was a subclass of the MA? Code that's expecting/casting-to a POA
  might continue to work and, where appropriate, could be upgraded in
 their
  own time to accept MAs.

 This makes a certain amount of sense from a traditional OO modeling
 perspective, where classes are supposed to refer to sets of objects
 and subclasses are subsets and superclasses are supersets. This is the
 property that's needed to guarantee that if A is a subclass of B, then
 any code that expects a B can also handle an A, since all A's are B's,
 which is what you need if you're doing type-checking or type-based
 dispatch. And indeed, from this perspective, MAs are a superclass of
 POAs, because for every POA there's a equivalent MA (the one with the
 mask set to all-true), but not vice-versa.

 But, that model of OO doesn't have much connection to Python. In
 Python's semantics, classes are almost irrelevant; they're mostly just
 some convenience tools for putting together the objects you want, and
 what really matters is the behavior of each object (the famous duck
 typing). You can call isinstance() if you want, but it's just an
 ordinary function that looks at some attributes on an object; the only
 magic involved is that some of those attributes have underscores in
 their name. In Python, subclassing mostly does two things: (1) it's a
 quick way to set up a class that's similar to another class
 (though this is a worse idea than it looks -- you're basically doing
 'from other_class import *' with all the usual tight-coupling problems
 that 'import *' brings). (2) When writing Python objects at the C
 level, subclassing lets you achieve memory layout compatibility (which
 is important because C does *not* do duck typing), and it lets you add
 new fields to a C struct.

 So at this level, MAs are a subclass of POAs, because MAs have an
 extra field that POAs don't...

 So I don't know what to think about subclasses/superclasses here,
 because they're such confusing and contradictory concepts that it's
 hard to tell what the actual resulting API semantics would be.


It doesn't seem essential that MAs have an extra field that POAs don't. If
POA was a subclass of MA, instances of POA could have the extra field set
to an all-valid/nothing-is-masked value. Granted, you'd want that to be
a special value so you're not lugging around a load of redundant data (and
you can optimise your processing for that), but I'm guessing you'd probably
want that kind of capability within MA anyway.


On 27 April 2012 15:33, Charles R Harris charlesr.har...@gmail.com wrote:

 On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:

  On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
  rhatters...@gmail.com wrote:

  The masked array discussions have brought up all sorts of interesting
  topics - too many to usefully list here - but there's one aspect I haven't
  spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
  too

[Numpy-discussion] A crazy masked-array thought

2012-04-25 Thread Richard Hattersley
The masked array discussions have brought up all sorts of interesting
topics - too many to usefully list here - but there's one aspect I haven't
spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
too awkward to be helpful. But ...

Shouldn't masked arrays (MA) be a superclass of the plain-old-array (POA)?

In the library I'm working on, the introduction of MAs (via numpy.ma)
required us to sweep through the library and make a fair few changes.
That's not the sort of thing one would normally expect from the
introduction of a subclass.

Putting aside the ABI issue, would it help downstream API compatibility if
the POA was a subclass of the MA? Code that's expecting/casting-to a POA
might continue to work and, where appropriate, could be upgraded in their
own time to accept MAs.

Richard Hattersley


Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib

2012-03-31 Thread Richard Hattersley
 1) The use of string constants to identify NumPy processes. It would
 seem better to use library defined constants (ufuncs?) for better
 future-proofing, maintenance, etc.

 I don't see how this would help with future-proofing or maintenance --
 can you elaborate?

 If this were C, I'd agree; using an enum would have a number of benefits:
  -- easier to work with than strings (== and switch work, no memory
 management hassles)
  -- compiler will notice if you accidentally misspell the enum name
  -- since you always in effect 'import *', getting access to
 additional constants doesn't require any extra effort
 But in Python none of these advantages apply, so I find it more
 convenient to just use strings.

Using constants provides for tab-completion and associated help text.
The help text can be particularly useful if the choice of constant
affects which extra keyword arguments can be specified.

And on a minor note, and far more subjectively (time for another
bike-shedding reference!), there's the cleanliness of API. (e.g.
Strings don't feel a good match. There are an infinite number of
strings, but only a small number are valid. There's nothing
machine-readable you can interrogate to find valid values.) Under the
hood you'll have to use the string to do a lookup, but the constant
can *be* the result of the lookup. Why re-invent the wheel when the
language gives it to you for free?

 Note also that we couldn't use ufuncs here, because we're specifying a
 rather unusual sort of operation -- there is no ufunc for padding with
 a linear ramp etc. Using mean as the example is misleading in this
 respect -- it's not really the same as np.mean.

 2) Why does only pad use this style of interface? If it's a good
 idea for pad, perhaps it should be applied more generally?
 numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone?

 The mode=foo interface style is actually used in other places, e.g.,
 np.linalg.qr.

My mistake - I misinterpreted the API earlier, so we're talking at
cross-purposes. My comment/question isn't really about pad & mode, but
about numpy more generally. But it still stands - albeit somewhat
hypothetically, since it's hard to imagine such a change taking place.

Richard


Re: [Numpy-discussion] Style for pad implementation in 'pad' namespace or functions under np.lib

2012-03-30 Thread Richard Hattersley
I like where this is going.

Driven by a desire to avoid a million different methods on a single
class, we've done something similar in our library.
So instead of
   thing.mean()
   thing.max(...)
   etc.
we have:
   thing.scrunch(MEAN, ...)
   thing.scrunch(MAX, ...)
   etc.
Where the constants like MEAN and MAX encapsulate the process to be
performed - including a reference to a NumPy/user-defined aggregation
function, as well as some other transformation details.

We then found we could reuse the same constants in other operations:
   thing.scrunch(MEAN, ...)
   thing.squish(MEAN, ...)
   thing.rolling_squish(MEAN, ...)
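
For the avoidance of doubt, the constants are nothing magic. A sketch of
the idea (names hypothetical, simplified from our actual code):

    import numpy as np

    class Aggregation(object):
        # A process constant: bundles the aggregation function with any
        # extra transformation details an operation needs.
        def __init__(self, name, func):
            self.name = name
            self.func = func

    MEAN = Aggregation('mean', np.mean)
    MAX = Aggregation('max', np.max)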

So I have two minor concerns with the current proposal.
1) The use of string constants to identify NumPy processes. It would
seem better to use library defined constants (ufuncs?) for better
future-proofing, maintenance, etc.
2) Why does only pad use this style of interface? If it's a good
idea for pad, perhaps it should be applied more generally?
numpy.aggregate(MEAN, ...), numpy.group(MEAN, ...), etc. anyone?

Richard Hattersley


On 30 March 2012 02:55, Travis Oliphant tra...@continuum.io wrote:

 On Mar 29, 2012, at 12:53 PM, Tim Cera wrote:

 I was hoping pad would get finished some day.  Maybe 1.9?


 You have been a great sport about this process.   I think it will result in
 something quite nice.


 Alright - I do like the idea of passing a function to pad, with a bunch of
 pre-made functions in place.

 Maybe something like:

     a = np.arange(10)
     b = pad('mean', a, 2, stat_length=3)

 where if the first argument is a string, use one of the built in functions.

 If instead you passed in a function:

     def padwithzeros(vector, pad_width, iaxis, **kwargs):
         bvector = np.zeros(pad_width[0])
         avector = np.zeros(pad_width[1])
         return bvector, avector

     b = pad(padwithzeros, a, 2)

 Would that have some goodness?


 +1

 -Travis





Re: [Numpy-discussion] Numpy Memory Error with corrcoef

2012-03-27 Thread Richard Hattersley
 Both work on my computer, while your example indeed leads to a MemoryError
 (because shape 459375*459375 would be a decently big matrix...)

Nicely understated :)

For 32-bit values decently big = 786GB
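
The arithmetic behind that figure (a quick check of mine):

>>> 459375**2 * 4 / 2.**30   # float32 bytes, in GiB
786.1308...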


Re: [Numpy-discussion] label NA and datetime as experimental

2012-03-26 Thread Richard Hattersley
Hi,

My team are currently experimenting with extending datetime to allow
alternative, non-physical calendars (e.g. 360-day used by climate
modellers). Once we've got a handle on the options we'd like to
propose the extensions/changes back to NumPy. Obviously we'd like to
avoid wasted effort, so are there some aspects of datetime64 which are
more experimental than others? Is there a summary of unresolved issues
and/or plans for change?

Thanks,
Richard Hattersley

On 25 March 2012 13:57, Ralf Gommers ralf.gomm...@googlemail.com wrote:
 Hi,

 We decided to label both NA and datetime APIs as experimental for the 1.7.0
 release. I made a PR that does this, please review:
 https://github.com/numpy/numpy/pull/240

 Ralf





Re: [Numpy-discussion] label NA and datetime as experimental

2012-03-26 Thread Richard Hattersley
OK - that's useful feedback.

Thanks!

On 26 March 2012 21:03, Ralf Gommers ralf.gomm...@googlemail.com wrote:


 On Mon, Mar 26, 2012 at 5:42 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:



 On Mon, Mar 26, 2012 at 2:29 AM, Richard Hattersley
 rhatters...@gmail.com wrote:

 Hi,

 My team are currently experimenting with extending datetime to allow
 alternative, non-physical calendars (e.g. 360-day used by climate
 modellers). Once we've got a handle on the options we'd like to
 propose the extensions/changes back to NumPy. Obviously we'd like to
 avoid wasted effort, so are there some aspects of datetime64 which are
 more experimental than others? Is there a summary of unresolved issues
 and/or plans for change?


 I believe datetime is already used by Pandas, so I don't think there will
 be major changes there. I'm not aware of open issues, but I could be wrong.
 The calenders are a bit independent, so I think the best procedure is to go
 ahead with your work. We want to leave some wiggle room since new features
 often need a little time to mature. That's how it looks to me anyway.


 That's my understanding too. Perhaps Mark can comment on the current status.
 That status and changes need to still be described in the release notes by
 the way.

 The experimental tag is mostly due to the datetime history: it was
 introduced in 1.4.0, removed again in 1.4.1, reintroduced in 1.6.0, the API
 then labeled not useful
 (http://thread.gmane.org/gmane.comp.python.numeric.general/44162/focus=44385),
 then more changes for this release. I hope it's stable now, but seeing what
 came before and that it still doesn't work with MinGW it's hard to be sure.

 Ralf





Re: [Numpy-discussion] Using logical function on more than 2 arrays, availability of a between function ?

2012-03-19 Thread Richard Hattersley
What do you mean by efficient? Are you trying to get it to execute
faster? Or using less memory? Or have more concise source code?

Less memory:
 - numpy.vectorize would let you get to the end result without any
intermediate arrays but will be slow.
 - Using the out parameter of numpy.logical_and will let you avoid
one of the intermediate arrays.

More speed?:
Perhaps putting all three boolean temporary results into a single
boolean array (using the out parameter of numpy.greater, etc) and
using numpy.all might benefit from logical short-circuiting.

And watch out for divide-by-zero from aNirChannel/aBlueChannel.
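
Concretely, the out= suggestion might look like this (a sketch of mine,
reusing Matthieu's array names from the message quoted below):

    import numpy as np

    # Two reusable boolean buffers instead of four throwaway temporaries.
    result = np.greater(aBlueChannel, 1.0)
    tmp = np.greater(aNirChannel, aBlueChannel * 1.0)
    np.logical_and(result, tmp, out=result)
    np.less(aNirChannel, aBlueChannel * 1.8, out=tmp)
    np.logical_and(result, tmp, out=result)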

Regards,
Richard Hattersley

On 19 March 2012 11:04, Matthieu Rigal ri...@rapideye.net wrote:
 Dear Numpy fellows,

 I have actually a double question, which only aims to answer a single one :
 how to get the following line being processed more efficiently :

 array = numpy.logical_and(numpy.logical_and(aBlueChannel > 1.0, aNirChannel >
 (aBlueChannel * 1.0)), aNirChannel < (aBlueChannel * 1.8))

 One possibility would have been to have the logical_and being able to handle
 more than two arrays

 Another one would have been to be able to make a double comparison or a
 between, like following one :

 array = numpy.logical_and((aBlueChannel > 1.0), (1.0 <
 aNirChannel/aBlueChannel < 1.8))

 Is there any way to get the things work this way ? Would it else be a possible
 improvement for 1.7 or a later version ?

 Best Regards,
 Matthieu Rigal

 RapidEye AG
 Molkenmarkt 30
 14776 Brandenburg an der Havel
 Germany

 Follow us on Twitter! www.twitter.com/rapideye_ag

 Head Office/Sitz der Gesellschaft: Brandenburg an der Havel
 Management Board/Vorstand: Ryan Johnson
 Chairman of Supervisory Board/Vorsitzender des Aufsichtsrates:
 Robert Johnson
 Commercial Register/Handelsregister Potsdam HRB 24742 P
 Tax Number/Steuernummer: 048/100/00053
 VAT-Ident-Number/Ust.-ID: DE 199331235
 DIN EN ISO 9001 certified




Re: [Numpy-discussion] Proposed Roadmap Overview

2012-03-01 Thread Richard Hattersley
+1 on the NEP guideline

As part of a team building a scientific analysis library, I'm
attempting to understand the current state of NumPy development and
its likely future (with a view to contributing if appropriate). The
proposed NEP process would make that a whole lot easier. And if
nothing else, it would reduce the chance of me posting questions about
topics that had already been discussed/decided!

Without the process the NEPs become another potential source of
confusion and mixed messages.


On 1 March 2012 03:02, Travis Oliphant wrote:
 I would like to hear the opinions of others on that point, but yes, I
 think that is an appropriate procedure.

 Travis

 --
 Travis Oliphant
 (on a mobile)
 512-826-7480


 On Feb 29, 2012, at 10:54 AM, Matthew Brett matthew.br...@gmail.com wrote:

  Hi,

  On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant tra...@continuum.io wrote:
   We already use the NEP process for such decisions. This discussion
   came simply from the *idea* of writing such a NEP.

   Nothing has been decided. Only opinions have been shared that might
   influence the NEP. This is all pretty premature, though --- migration
   to C++ features on a trial branch is some months away were it to happen.

  Fernando can correct me if I'm wrong, but I think he was asking a
  governance question. That is: would you (as BDF$N) consider the
  following guideline:

  As a condition for accepting significant changes to Numpy, for each
  significant change, there will be a NEP. The NEP shall follow the
  same model as the Python PEPs - that is - there will be a summary of
  the changes, the issues arising, the for / against opinions and
  alternatives offered. There will usually be a draft implementation.
  The NEP will contain the resolution of the discussion as it relates to
  the code.

  For example, the masked array NEP, although very substantial, contains
  little discussion of the controversy arising, or the intended
  resolution of the controversy:

  https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst

  I mean, although it is useful, it is not in the form of a PEP, as
  Fernando has described it.

  Would you accept extending the guidelines to the NEP format?

  Best,

  Matthew






