Re: [Numpy-discussion] Sorting objects with ndarrays

2010-02-28 Thread Pauli Virtanen
Sun, 2010-02-28 at 10:25 +0100, Gael Varoquaux wrote:
[clip]
 The problem is that ndarrays cannot be compared. So I have tried to
 override the 'cmp' in the 'sorted' function, however I am comparing
 fairly complex objects, and I am having a hard time predicting which
 member of the object will contain the array.

I don't understand what "predicting which member of the object" means.
Do you mean that in the array, you have classes that contain ndarrays as
their attributes, and the classes have __cmp__ implemented?

If not, can you tell why

def xcmp(a, b):
    a_nd = isinstance(a, ndarray)
    b_nd = isinstance(b, ndarray)

    if a_nd and b_nd:
        pass  # compare ndarrays in some way
    elif a_nd:
        return 1   # sort ndarrays first
    elif b_nd:
        return -1  # sort ndarrays first
    else:
        return cmp(a, b)  # ordinary compare

does not work?

Cheers,
Pauli




Re: [Numpy-discussion] setting decimal accuracy in array operations (scikits.timeseries)

2010-03-03 Thread Pauli Virtanen
Wed, 2010-03-03 at 21:09 +0100, Marco Tuckner wrote:
 I am using scikits.timeseries to convert an hourly timeseries to a lower
 frequency using the appropriate function [1].
 
 When I compare the result to the values calculated with a Pivot table in
 Excel there is a difference in the values which reaches quite high
 values in the total sum of all monthly values.
 
 I found out that the difference arises from different decimal settings:
 
 In Python the numbers show:
 12.
 
 whereas in Excel I see:
 12.8

Typically, the internal precision used by Python and Numpy is
significantly higher than what is printed. Most likely, your problem has
a different cause. Are you sure Excel is using high enough accuracy?

If you want more help, it would be useful to post a self-contained code
that demonstrates the error.

-- 
Pauli Virtanen



Re: [Numpy-discussion] PSF GSoC 2010 (Py3K focus)

2010-03-09 Thread Pauli Virtanen
Mon, 08 Mar 2010 22:39:00 -0700, Charles R Harris wrote:
 On Mon, Mar 8, 2010 at 10:29 PM, Jarrod Millman
 mill...@berkeley.edu wrote:
  I added Titus' email regarding the PSF's focus on Py3K-related
  projects to our SoC ideas wiki page:
  http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
 
  Given Titus' email, this is the most likely list of projects we will
  get accepted this year:
 
  - finish porting NumPy to Py3K

 I think Numpy is pretty much done. It needs use testing to wring out any
 small oddities, but it doesn't look to me like a GSOC project at the
 moment. Maybe Pauli can weigh in here.

I think it's pretty much done. Even f2py should work. What's left is 
mostly testing it, and fixing any issues that crop up.

Note that the SVN Numpy should preferably still see more testing on 
Python 2 against real-world packages that use it, before release. There's 
been a significant amount of code churn.

 - port SciPy to Py3K
 
 This project might be more appropriate, although I'm not clear on what
 needs to be done.

I think porting Scipy proceeds in these steps:

1. Set up a similar 2to3-using build system for Python 3 as currently in
   Numpy.

2. Change the code so that it works both on Python 2 and Python 3.
   This can be done one submodule at a time.

   For C code, the changes required are mainly PyString usage. Some macros
   need to be defined to allow the same code to build on Py2 and Py3.
   Scipy is probably easier to port than Numpy here, since IIRC it doesn't
   mess much with strings, and its use of the Python C-API is much more
   limited. Also, parts written with Cython need essentially no porting.

   For Python code, the main issue is again bytes vs. unicode fixes,
   mainly inserting numpy.compat.asbytes() at correct locations to make
   e.g. bytes literals. All I/O code should be changed to work solely with
   bytes. Since 2to3 takes care of most of the other things, the remaining
   work is in fixing things it mishandles.
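For illustration, a typical asbytes() fix-up could look like this
(untested sketch; numpy.compat.asbytes exists in SVN Numpy, the file
handling here is just an example):

import numpy as np
from numpy.compat import asbytes

np.save('data.npy', np.arange(3))      # create a file to inspect
header = asbytes('\x93NUMPY')          # a bytes literal on both Py2 and Py3
f = open('data.npy', 'rb')             # I/O strictly in bytes
magic = f.read(6)
f.close()
print(magic == header)                 # True for .npy files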

I don't have a good estimate for how much work is in making Scipy work. 
I'd guess the work needed is probably slightly less than for Numpy.

-- 
Pauli Virtanen



Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-09 Thread Pauli Virtanen
Tue, 2010-03-09 at 21:14 +0100, Johann Cohen-Tanugi wrote:
 thinking about it, this morning there was a fedora update to python, so 
 I am using 2.6.2-4.fc12.
 Looks like the problem is in python itself, hence this piece of info.

That the problem would be in Python is not so clear to me. Can you try
running it with the previous Python shipped by Fedora? Do you see the
problem then? What's the previous version, btw?

Memory errors are somewhat difficult to debug. Can you try running only
a certain subset of the tests, first

nosetests numpy.core

Be sure to set PYTHONPATH so that you get the correct Numpy version. If
it segfaults, proceed to (under numpy/core/tests)

nosetests test_multiarray.py
nosetests test_multiarray.py:TestNewBufferProtocol

Since the crash occurs in cyclic garbage collection, I'm doubting a bit
the numpy/core/src/multiarray/numpymemoryview.c implementation, since
that's the only part in Numpy that supports that.

Alternatively, just replace numpymemoryview.c with the attached one
which has cyclic GC stripped, and see if you still get the crash.

Cheers,
Pauli

/*
 * Simple PyMemoryView'ish object for Python 2.6 compatibility.
 *
 * On Python >= 2.7, we can use the actual PyMemoryView objects.
 *
 * Some code copied from the CPython implementation.
 */

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "structmember.h"

#define _MULTIARRAYMODULE
#define NPY_NO_PREFIX
#include "numpy/arrayobject.h"
#include "numpy/arrayscalars.h"

#include "npy_config.h"
#include "npy_3kcompat.h"

#include "numpymemoryview.h"


#if (PY_VERSION_HEX >= 0x02060000) && (PY_VERSION_HEX < 0x02070000)

/*
 * Memory allocation
 */

static void
memorysimpleview_dealloc(PyMemorySimpleViewObject *self)
{
    Py_CLEAR(self->base);
    if (self->view.obj != NULL) {
        PyBuffer_Release(&self->view);
        self->view.obj = NULL;
    }
}

static PyObject *
memorysimpleview_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
    PyObject *obj;
    static char *kwlist[] = {"object", 0};
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "O:memorysimpleview", kwlist,
                                     &obj)) {
        return NULL;
    }
    return PyMemorySimpleView_FromObject(obj);
}


/*
 * Buffer interface
 */

static int
memorysimpleview_getbuffer(PyMemorySimpleViewObject *self,
                           Py_buffer *view, int flags)
{
    return PyObject_GetBuffer(self->base, view, flags);
}

static void
memorysimpleview_releasebuffer(PyMemorySimpleViewObject *self,
                               Py_buffer *view)
{
    PyBuffer_Release(view);
}

static PyBufferProcs memorysimpleview_as_buffer = {
    (readbufferproc)0,       /*bf_getreadbuffer*/
    (writebufferproc)0,      /*bf_getwritebuffer*/
    (segcountproc)0,         /*bf_getsegcount*/
    (charbufferproc)0,       /*bf_getcharbuffer*/
    (getbufferproc)memorysimpleview_getbuffer,         /* bf_getbuffer */
    (releasebufferproc)memorysimpleview_releasebuffer, /* bf_releasebuffer */
};


/*
 * Getters
 */

static PyObject *
_IntTupleFromSsizet(int len, Py_ssize_t *vals)
{
    int i;
    PyObject *o;
    PyObject *intTuple;

    if (vals == NULL) {
        Py_INCREF(Py_None);
        return Py_None;
    }
    intTuple = PyTuple_New(len);
    if (!intTuple) return NULL;
    for (i = 0; i < len; i++) {
        o = PyInt_FromSsize_t(vals[i]);
        if (!o) {
            Py_DECREF(intTuple);
            return NULL;
        }
        PyTuple_SET_ITEM(intTuple, i, o);
    }
    return intTuple;
}

static PyObject *
memorysimpleview_format_get(PyMemorySimpleViewObject *self)
{
    return PyUString_FromString(self->view.format);
}

static PyObject *
memorysimpleview_itemsize_get(PyMemorySimpleViewObject *self)
{
    return PyLong_FromSsize_t(self->view.itemsize);
}

static PyObject *
memorysimpleview_shape_get(PyMemorySimpleViewObject *self)
{
    return _IntTupleFromSsizet(self->view.ndim, self->view.shape);
}

static PyObject *
memorysimpleview_strides_get(PyMemorySimpleViewObject *self)
{
    return _IntTupleFromSsizet(self->view.ndim, self->view.strides);
}

static PyObject *
memorysimpleview_suboffsets_get(PyMemorySimpleViewObject *self)
{
    return _IntTupleFromSsizet(self->view.ndim, self->view.suboffsets);
}

static PyObject *
memorysimpleview_readonly_get(PyMemorySimpleViewObject *self)
{
    return PyBool_FromLong(self->view.readonly);
}

static PyObject *
memorysimpleview_ndim_get(PyMemorySimpleViewObject *self)
{
    return PyLong_FromLong(self->view.ndim);
}


static PyGetSetDef memorysimpleview_getsets[] =
{
    {"format", (getter)memorysimpleview_format_get, NULL, NULL, NULL},
    {"itemsize", (getter)memorysimpleview_itemsize_get, NULL, NULL, NULL},
    {"shape", (getter)memorysimpleview_shape_get, NULL, NULL, NULL},
    {"strides", (getter)memorysimpleview_strides_get, NULL, NULL, NULL},
    {"suboffsets", (getter)memorysimpleview_suboffsets_get, NULL, NULL, NULL},
    {"readonly", (getter)memorysimpleview_readonly_get, NULL, NULL, NULL},
    {"ndim",

Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-09 Thread Pauli Virtanen
 more fun :
 [co...@jarrett tests]$ pwd
 /home/cohen/sources/python/numpy/numpy/core/tests
 [co...@jarrett tests]$ python -c 'import test_ufunc'
 python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs
 != 0' failed.
 Aborted (core dumped)

What happens if you only import the umath_tests module (or something, it's a 
.so under numpy.core)?

Just importing test_ufunc.py probably doesn't run a lot of extension code.
Since you don't get crashes by just importing numpy, I really don't see
many alternatives any more...

Thanks,
Pauli



Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-10 Thread Pauli Virtanen
Wed, 10 Mar 2010 10:28:07 +0100, Johann Cohen-Tanugi wrote:

 On 03/10/2010 01:55 AM, Pauli Virtanen wrote:
 more fun :
 [co...@jarrett tests]$ pwd
 /home/cohen/sources/python/numpy/numpy/core/tests [co...@jarrett
 tests]$ python -c 'import test_ufunc' python: Modules/gcmodule.c:277:
 visit_decref: Assertion `gc->gc.gc_refs != 0' failed.
 Aborted (core dumped)
  
 What happens if you only import the umath_tests module (or something,
 it's a .so under numpy.core)?

 [co...@jarrett core]$ export
 PYTHONPATH=/home/cohen/.local/lib/python2.6/site-packages/numpy/core:
$PYTHONPATH
 [co...@jarrett core]$ python
 Python 2.6.2 (r262:71600, Jan 25 2010, 18:46:45) [GCC 4.4.2 20091222
 (Red Hat 4.4.2-20)] on linux2 Type "help", "copyright", "credits" or
 "license" for more information.
 >>> import umath_tests

 python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs
 != 0' failed.
 Aborted (core dumped)
 
 so this import also triggers the crash at exit...

Then it is clear that the umath_tests module does something that is not 
permitted. It's possible that there is some sort of refcount error 
somewhere in the generalized ufunc mechanisms -- that part of Numpy is 
not heavily used.

Bug spotting challenge: Start from umath_tests.c.src:initumath_tests, 
follow the execution, and spot the bug (if any).

Pauli

PS. it might be a good idea to file a bug ticket now



Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-10 Thread Pauli Virtanen
Wed, 10 Mar 2010 15:40:04 +0100, Johann Cohen-Tanugi wrote:
 Pauli, isn't it hopeless to follow the execution of the source code when
 the crash actually occurs when I exit, and not when I execute. I would
 have to understand enough of this umath_tests.c.src to spot a refcount
 error or things like that

Yeah, it's not easy, and requires knowing how to track this type of 
error. I didn't actually mean that you should try to do it, just posed it 
as a general challenge to all interested parties :)

On a more serious note, maybe there's a compilation flag or something in 
Python that warns when refcounts go negative.

Cheers,
Pauli



Re: [Numpy-discussion] subclassing ndarray in python3

2010-03-11 Thread Pauli Virtanen
Hi Darren,

Thu, 2010-03-11 at 11:11 -0500, Darren Dale wrote:
 Now that the trunk has some support for python3, I am working on
 making Quantities work with python3 as well. I'm running into some
 problems related to subclassing ndarray that can be illustrated with a
 simple script, reproduced below. It looks like there is a problem with
 the reflected operations, I see problems with __rmul__ and __radd__,
 but not with __mul__ and __add__:

Thanks for testing. I wish the test suite was more complete (hint!
hint! :)

Yes, Python 3 introduced some semantic changes in how subclasses of
builtin classes (= written in C) inherit the __r*__ operations.

Below I'll try to explain what is going on. We probably need to change
some things to make them work better on Py3, within the bounds of what we
are able to do.

Suggestions are welcome. The most obvious one could be to explicitly
implement __rmul__ etc. on Python 3.

[clip]
 class A(np.ndarray):
     def __new__(cls, *args, **kwargs):
         return np.ndarray.__new__(cls, *args, **kwargs)

 class B(A):
     def __mul__(self, other):
         return self.view(A).__mul__(other)
     def __rmul__(self, other):
         return self.view(A).__rmul__(other)
     def __add__(self, other):
         return self.view(A).__add__(other)
     def __radd__(self, other):
         return self.view(A).__radd__(other)
[clip]
 print('A __rmul__:')
 print(a.__rmul__(2))
 # yields NotImplemented
 print(a.view(np.ndarray).__rmul__(2))
 # yields NotImplemented

Correct. ndarray does not implement __rmul__, but relies on an automatic
wrapper generated by Python.

The automatic wrapper (wrap_binaryfunc_r) does the following:

1. Is `type(other)` a subclass of `type(self)`?
   If yes, call __mul__ with swapped arguments.
2. If not, bail out with NotImplemented.

So it bails out.
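In Python terms, the wrapper behaves roughly like this (an illustration
of the semantics only, not the actual CPython code):

def wrapped_rmul(self, other):            # stands in for ndarray.__rmul__
    if isinstance(other, type(self)):     # the subclass check, new on Py3
        return type(self).__mul__(other, self)   # __mul__, swapped args
    return NotImplemented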

Previously, the ndarray type had a flag that made Python skip the
subclass check. That flag does not exist any more on Python 3, and this
is the root of the issue.

 print(2*a)
 # ok !!??

Here, Python checks

1. Does nb_multiply from the left op succeed? Nope, since floats don't
   know how to multiply ndarrays.

2. Does nb_multiply from the right op succeed? Here the execution
   passes *directly* to array_multiply, completely skipping the __rmul__
   wrapper.

   Note also that in the C-level number protocol there is only a single
   multiplication function for both left and right multiplication.

[clip]
 print('B __rmul__:')
 print(b.__rmul__(2))
 # yields NotImplemented
 print(b.view(A).__rmul__(2))
 # yields NotImplemented
 print(b.view(np.ndarray).__rmul__(2))
 # yields NotImplemented
 print(2*b)
 # yields: TypeError: unsupported operand type(s) for *: 'int' and 'B'

But here, the subclass calls the wrapper ndarray.__rmul__, which wants
to be careful with types, and hence fails.

Yes, probably explicitly defining __rmul__ for ndarray could be the
right solution. Please file a bug report on this.

Cheers,
Pauli





Re: [Numpy-discussion] problem with applying patch

2010-03-13 Thread Pauli Virtanen
Sat, 13 Mar 2010 12:06:24 -0700, z99719 z99719 wrote:
[clip]
 The patch files are placed in the top level numpy sandbox directory and
 are in xml format rather than a diff file. I used "save link as" on the
 link to download the patch in the "Attachments" section of the linked
 page.

That gives you the HTML page. Click on the link, and choose
"Download in other formats: Original Format" at the bottom of the page.






Re: [Numpy-discussion] integer division rule?

2010-03-13 Thread Pauli Virtanen
Sat, 2010-03-13 at 15:32 -0500, Alan G Isaac wrote:
 Francesc Altet once provided an example that for integer
 division, numpy uses the C99 rule: round towards 0.
 This is different than Python's rule for integer division:
 round towards negative infinity.
 
 But I cannot reproduce his example. (NumPy 1.3.)
 Did this behavior change at some point?

It was changed in r5888. The rationale is not clear from the
commit message.
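For reference, the two rules in plain Python (which one numpy's integer
division follows depends on the version in question):

q_floor = (-7) // 2        # Python rule: round toward -infinity -> -4
q_trunc = int(-7 / 2.0)    # C99 rule: round toward zero         -> -3
print(q_floor, q_trunc)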

Pauli




Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

2010-03-20 Thread Pauli Virtanen
Anne Archibald wrote:
 I'm not knocking numpy; it does (almost) the best it can. (I'm not
 sure of the optimality of the order in which ufuncs are executed; I
 think some optimizations there are possible.)

Ufuncs and reductions are not performed in a cache-optimal fashion, IIRC 
dimensions are always traversed in order from left to right. Large speedups are 
possible in some cases, but in a quick try I didn't manage to come up with an 
algorithm that would always improve the speed (there was a thread about this 
last year or so, and there's a ticket). Things varied between computers, so 
this probably depends a lot on the actual cache arrangement.

But perhaps numexpr has such heuristics, and we could steal them?

-- 
Pauli Virtanen


Re: [Numpy-discussion] Array bug with character (regression from 1.4.0)

2010-03-21 Thread Pauli Virtanen
Ryan May wrote:
 The following code, which works with numpy 1.4.0, results in an error:

Python 2.6, I presume?

 In [1]: import numpy as np
 In [2]: v = 'm'
 In [3]: dt = np.dtype('c')
 In [4]: a = np.asarray(v, dt)

 On SVN trunk:
 ValueError: assignment to 0-d array

 In [5]: np.__version__
 Out[5]: '2.0.0.dev8297'

 Thoughts?

Nope, but it's likely my bad. Smells a bit like the dtype 'c' has size 0
that doesn't get automatically adjusted in the array constructor, so you
end up assigning a size-1 string to a size-0 array element... Which is
strange, since I don't think this particular code path has been changed
at all.

One would have to follow the C execution path with gdb to find out what goes 
wrong.

-- 
Pauli Virtanen


Re: [Numpy-discussion] Array bug with character (regression from 1.4.0)

2010-03-21 Thread Pauli Virtanen
Sun, 2010-03-21 at 16:13 -0600, Charles R Harris wrote:
 I was wondering if this was related to Michael's fixes for
 character arrays? A little bisection might help localize the problem.

It's a bug I introduced in r8144... I forgot one *can* assign strings to
0-d arrays, and strings are indeed one sequence type.

I'm going to back that changeset out, since it was only a cosmetic fix.
That particular part of code needs some cleanup (it's a bit too hairy if
things like this can slip), but I don't have the time at the moment to
come up with a more complete fix.

Cheers,
Pauli



Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

2010-03-22 Thread Pauli Virtanen
Sat, 2010-03-20 at 17:36 -0400, Anne Archibald wrote:
 I was in on that discussion. My recollection of the conclusion was
 that on the one hand they're useful, carefully applied, while on the
 other hand they're very difficult to reliably detect (since you don't
 want to forbid operations on non-overlapping slices of the same
 array).

I think one alternative brought up was

"copy if unsure whether the slices overlap"

which would make

A[whatever] = A[whatever2]

be always identical in functionality to

A[whatever] = A[whatever2].copy()

which is how things should work. This would permit optimizing simple
cases (at least in 1D), and avoids running into NP-completeness (for
numpy, the exponential growth is in any case limited by NPY_MAXDIMS,
which is 64, IIRC).

This would be a change in semantics, but in a very obscure corner that
hopefully nobody relies on.
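To make the corner concrete (illustrative):

import numpy as np

A = np.arange(6)
A[1:] = A[:-1]     # overlapping slices of the same array
print(A)           # with copy semantics: [0 0 1 2 3 4];
                   # otherwise the result may depend on traversal order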

-- 
Pauli Virtanen




Re: [Numpy-discussion] StringIO test failure with Python3.1.2

2010-03-24 Thread Pauli Virtanen
Wed, 2010-03-24 at 09:20 -0600, Charles R Harris wrote:
 What would be the best fix? Should we rename io to something like
 npyio?

That, or:

Disable import conversions in tools/py3tool.py for that particular file,
and fix any import errors manually so that the same code works both for
Python 2 and Python 3. Actually, I suspect there are no import errors,
since the top-level imports in that file are already absolute.

Anyway, it's a bug in 2to3. I suppose it first converts

from StringIO import StringIO

to

from io import StringIO

and then runs the import fixer, which does

from .io import StringIO

and that's a bug.

Pauli




Re: [Numpy-discussion] StringIO test failure with Python3.1.2

2010-03-24 Thread Pauli Virtanen
Wed, 2010-03-24 at 10:28 -0500, Robert Kern wrote:
 utils.py is the only file in there that imports StringIO. It should
 probably do a local import "from io import BytesIO", because io.py
 already contains some Python3-awareness:

 if sys.version_info[0] >= 3:
     import io
     BytesIO = io.BytesIO
 else:
     from cStringIO import StringIO as BytesIO

The lookfor stuff in utils.py deals with docstrings, so it probably has
to use StringIO instead of BytesIO for unicode-cleanliness.

Pauli




Re: [Numpy-discussion] StringIO test failure with Python3.1.2

2010-03-24 Thread Pauli Virtanen
Wed, 24 Mar 2010 13:35:51 -0500, Bruce Southey wrote:
[clip]
   elif isinstance(item, collections.Callable):
     File "/usr/local/lib/python3.1/abc.py", line 121, in
 __instancecheck__
       subclass = instance.__class__
 AttributeError: 'PyCapsule' object has no attribute '__class__'

Seems like another Python bug. All objects probably should have the 
__class__ attribute...

-- 
Pauli Virtanen



Re: [Numpy-discussion] should ndarray implement __round__ for py3k?

2010-03-25 Thread Pauli Virtanen
Darren Dale wrote:
 A simple test in python 3:

 >>> import numpy as np
 >>> round(np.arange(10))
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: type numpy.ndarray doesn't define __round__ method

I implemented this for array scalars already, but forgot about arrays. Anyway, 
it could be nice to have.
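As a sketch of what the hook would do (illustrative subclass, not the
eventual ndarray implementation):

import numpy as np

class Rounded(np.ndarray):
    def __round__(self, ndigits=None):
        return np.round(self, 0 if ndigits is None else ndigits)

a = np.arange(10, dtype=float).view(Rounded)
print(round(a))    # Python 3 now dispatches to __round__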

-- 
Pauli Virtanen


Re: [Numpy-discussion] Dealing with roundoff error

2010-03-28 Thread Pauli Virtanen
Mike Sarahan wrote:
 However, even linspace shows roundoff error:

 a=np.linspace(0.0,10.0,endpoint=False)
 b=np.linspace(0.1,10.1,endpoint=False)
 np.sum(a[1:]==b[:-1])  # Gives me 72, not 100

Are you sure equally spaced floating point numbers with this property
even exist? 0.1 does not have a terminating representation in base 2:

0.1_10 = 0.0001100110011001100110011.._2

-- 
Pauli Virtanen


Re: [Numpy-discussion] Py3k: making a py3k compat header available in installed numpy for scipy

2010-03-29 Thread Pauli Virtanen
Mon, 2010-03-29 at 19:13 +0900, David Cournapeau wrote:
 I have worked on porting scipy to py3k, and it is mostly working. One
 thing which would be useful is to install something similar to
 npy_3kcompat.h in numpy, so that every scipy extension could share the
 compat header. Is the current python 3 compatibility header usable in
 the wild, or will it still significantly change (thus requiring a
 different, more stable one)?

I believe it's reasonably stable, as it contains mostly simple stuff.
Something perhaps can be added later, but I don't think anything will
need to be removed.

At least, I don't see what I would like to change there. The only thing
I wouldn't perhaps like to have in the long run are the PyString and
possibly PyInt redefinition macros. But if the header is going to be
used elsewhere, we can as well freeze it now and promise not to remove
or change anything existing.

Pauli




Re: [Numpy-discussion] Py3k: making a py3k compat header available in installed numpy for scipy

2010-03-30 Thread Pauli Virtanen
2010/3/30 David Cournapeau <da...@silveregg.co.jp>
 Pauli Virtanen wrote:
[clip]
  At least, I don't see what I would like to change there. The only thing
  I wouldn't perhaps like to have in the long run are the PyString and
  possibly PyInt redefinition macros.

 I would also prefer a new name, instead of macro redefinition, but I can
 do it.

For strings, the new names are PyBytes, PyUnicode and PyUString
(unicode on Py3, bytes on Py2). One of these should be used instead of PyString
in all places, but redefining the macro reduced the work in making a
quick'n'dirty
port of numpy.

For integers, perhaps a separate PyInt on Py2, PyLong on Py3 type should
be defined. Not sure if this would reduce work in practice.

 The header would not be part of the public API proper anyway (I
 will put it somewhere else than numpy/include), it should never be
 pulled implicitly in when including one of the .h in numpy/include.

Agreed.

Pauli


Re: [Numpy-discussion] Making a 2to3 distutils command ?

2010-03-30 Thread Pauli Virtanen
2010/3/30 David Cournapeau <da...@silveregg.co.jp>:
 Currently, when building numpy with python 3, the 2to3 conversion
 happens before calling any distutils command. Was there a reason for
 doing it as it is done now ?

This allowed 2to3 to also port the various setup*.py files and
numpy.distutils, and implementing it this way required the minimum
amount of work and understanding of distutils -- you need to force it
to proceed with the build using the set of output files from 2to3.

 I would like to make a proper numpy.distutils command for it, so that it
 can be more finely controlled (in particular, using the -j option). It
 would also avoid duplication in scipy.

Are you sure you want to mix distutils in this? Wouldn't it only
obscure how things work?

If the aim is in making the 2to3 processing reusable, I'd rather
simply move tools/py3tool.py under numpy.distutils (+ perhaps do some
cleanups), and otherwise keep it completely separate from distutils.
It could be nice to have the 2to3 conversion parallelizable, but there
are probably simple ways to do it without mixing distutils in.

But if you think this is really worth doing, go ahead.

Pauli


Re: [Numpy-discussion] Making a 2to3 distutils command ?

2010-03-30 Thread Pauli Virtanen
Tue, 2010-03-30 at 07:18 -0600, Ryan May wrote:
 Out of curiosity, is there something wrong with the support for 2to3
 that already exists within distutils? (Other than it just being
 distutils)
 
 http://bruynooghe.blogspot.com/2010/03/using-lib2to3-in-setuppy.html

That AFAIK converts only those Python files that will be installed, and
I don't know how to tell it to disable some conversion on a per-file
basis. (Numpy also contains some files necessary for building that will
not be installed...)

But OK, to be honest, I didn't look closely if one could make it work
using the bundled 2to3 command. Things might be cleaner if it can be
made to work.

Pauli




[Numpy-discussion] Ufunc memory access optimizations (Was: ufuncs on funny strides ...)

2010-04-01 Thread Pauli Virtanen
Thu, 2010-04-01 at 11:30 -0700, M Trumpis wrote:
[clip]
 Actually I realized later that the main slow-down comes from the fact
 that my array was strided in fortran order (ascending strides). But
 from the point of view of a ufunc that is operating independently at
 each value, why would it need to respect striding?

Correct. There has been discussion about improving ufunc performance by
optimizing the memory access pattern.

The main issue in your case is that the output array is in C order, so
that it is *not* possible to access both the input and the output arrays
in the optimal order. Fixing this issue requires allowing ufuncs to
allocate output arrays that are in non-C order. This needs a design
decision that has not been made so far. I'd be for it; I don't think it
can break anything.
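The conflict is easy to demonstrate (illustrative):

import numpy as np

a = np.asfortranarray(np.random.rand(1000, 1000))
b = np.cos(a)
print(a.flags['F_CONTIGUOUS'])   # True
print(b.flags['C_CONTIGUOUS'])   # True: the input and the output cannot
                                 # both be traversed in their natural order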

The second issue is that there is no universal access pattern choice for
every case that is optimal on all processor cache layouts. This forces
to use heuristics to determine the access pattern, which is not so
simple to get right, and usually would require some information of the
processor's cache architecture.

(Even some code has been written along these lines, though mostly
addressing the reduction:
http://github.com/pv/numpy-work/tree/ticket/1143-speedup-reduce
http://projects.scipy.org/numpy/ticket/1143
Not production quality so far, and the non-C-output order would
definitely help also here.)

-- 
Pauli Virtanen




Re: [Numpy-discussion] Proposal for new ufunc functionality

2010-04-10 Thread Pauli Virtanen
Sat, 2010-04-10 at 12:23 -0500, Travis Oliphant wrote:
[clip]
 Here are my suggested additions to NumPy:
 ufunc methods:
[clip]
   * reducein (array, indices, axis=0)
similar to reduce-at, but the indices provide both the  
 start and end points (rather than being fence-posts like reduceat).

Is the `reducein` important to have, as compared to `reduceat`?

[clip]
 numpy functions (or methods):

I'd prefer functions here. ndarray already has a huge number of methods.

* segment(array)
 
  (produce an array of integers from an array producing the  
 different regions of an array:
 
   segment([10,20,10,20,30,30,10])  would produce ([0,1,0,1,2,2,0])

Sounds like `np.digitize(x, bins=np.unique(x))-1`. What would the
behavior be with structured arrays?
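A quick check of that equivalent:

import numpy as np

x = np.array([10, 20, 10, 20, 30, 30, 10])
print(np.digitize(x, np.unique(x)) - 1)   # -> [0 1 0 1 2 2 0]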

* edges(array, at=True)
   
  produce an index array providing the edges (with either fence-post-
 like syntax for reduce-at, or both boundaries like reducein).

This can probably easily be based on segment().

 Thoughts on the general idea?

One question is whether these methods should be stuffed to the main
namespace, or under numpy.rec.


Another addition to ufuncs that should be thought about is specifying the
Python-side interface to generalized ufuncs.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Is this a small bug in numpydoc?

2010-04-14 Thread Pauli Virtanen
Wed, 14 Apr 2010 01:17:37 -0700, Fernando Perez wrote:
[clip]
 Exception occurred:
   File "/home/fperez/ipython/repo/trunk-lp/docs/sphinxext/numpydoc.py",
     line 71, in mangle_signature
     'initializes x; see ' in pydoc.getdoc(obj.__init__)):
 AttributeError: class Bunch has no attribute '__init__'
 The full traceback has been saved in /tmp/sphinx-err-H1VlaY.log, if you
 want to report the issue to the developers.
 
 so it seems indeed that accessing __init__ without checking it's there
 isn't a very good idea.
 
 But I didn't write numpydoc and I'm tired, so I don't want to commit
 this without a second pair of eyes...

Yeah, it's a bug, I think.
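A minimal guard of the kind needed (sketch only; the actual numpydoc fix
may end up looking different):

import pydoc

def init_doc(obj):
    # Return __init__'s docstring, or '' when the class has no __init__
    init = getattr(obj, '__init__', None)
    return pydoc.getdoc(init) if init is not None else ''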

-- 
Pauli Virtanen



Re: [Numpy-discussion] My GSoC Proposal to Implement a Subset of NumPy for PyPy

2010-04-18 Thread Pauli Virtanen
Sun, 2010-04-18 at 19:05 +0900, David Cournapeau wrote:
[clip]
 First, an aside: with the recent announcement from pypy concerning the
 new way of interfacing with C, wouldn't it make more sense to go the
 other way around  - i.e. starting from full numpy, and replace some
 parts in rpython ? Of course, this assumes that numpy can work with
 pypy using the new C API emulation, but this may not be a lot of
 work. This would gives the advantage of having something useful from
 the start.
 
 I think this project is extremely interesting in nature, and the parts
 which are the most interesting to implement are (IMHO of course):
   - indexing, fancy indexing
   - broadcasting
   - basic ufunc support, although I am not sure how to plug this with
 the C math library for math functions (cos, log, etc...)

I agree here: if it's possible to supplement the existing Numpy code
base by interpreter-level (JIT-able) support for indexing, fancy
indexing, and broadcasting, this would reduce the roadblock in writing
effective algorithms without using vectorization.

Also, if you can incrementally replace parts of Numpy, the code could be
immediately usable in real world, being less of a proof-of-concept
subset.

-- 
Pauli Virtanen




Re: [Numpy-discussion] Disabling Extended Precision in NumPy (like -ffloat-store)

2010-04-21 Thread Pauli Virtanen
Wed, 2010-04-21 at 10:47 -0400, Adrien Guillon wrote:
[clip]
 My understanding, however, is that Intel processors may use extended
 precision for some operations anyways unless this is explicitly
 disabled, which is done with gcc via the -ffloat-store option.
 Since I am prototyping algorithms for a different processor
 architecture, where the extended precision registers simply do not
 exist, I would really like to force NumPy to limit itself to using
 single-precision operations throughout the calculation (no extended
 precision in registers).

Probably the only way to ensure this is to recompile Numpy (and
possibly, also Python) with the -ffloat-store option turned on.

When compiling Numpy, you should be able to insert the flag via

OPT=-ffloat-store python setup.py build

-- 
Pauli Virtanen




[Numpy-discussion] Adding an ndarray.dot method

2010-04-29 Thread Pauli Virtanen
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote:
[clip]
 Here is a related ticket that proposes a more explicit alternative:
 adding a ``dot`` method to ndarray.
 http://projects.scipy.org/numpy/ticket/1456

I kind of like this idea. Simple, obvious, and leads
to clear code:

a.dot(b).dot(c)

or in another multiplication order,

a.dot(b.dot(c))
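Both are just a different spelling of the nested function calls; once the
method exists, the equivalence is easy to check:

import numpy as np

a = b = c = np.eye(3)
assert np.allclose(a.dot(b).dot(c), np.dot(np.dot(a, b), c))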

And here's an implementation:


http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c113050b6

I think I'm going to apply this, unless someone complains, as I
don't see any downsides (except maybe adding one more to the
huge list of methods ndarray already has).

Cheers,
Pauli



Re: [Numpy-discussion] Datetime overflow error, attn Stefan.

2010-05-09 Thread Pauli Virtanen
Sat, 08 May 2010 18:44:59 -0600, Charles R Harris wrote:
[clip]
 Because longs on Windows are always 32-bit, I think this is a good
 indication that somewhere a long is being used instead of an intp.

The test itself used 32-bit integers. Fix'd.

Pauli



Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread Pauli Virtanen
Tue, 11 May 2010 00:23:52 -0700, T J wrote:
[clip]
 It seems reasonable that we might have to follow the deprecation route,
 but I'd be happier with a faster fix.
 
 1.5
   - Provide numpy.random.lomax.  Make numpy.random.pareto raise a
 DeprecationWarning and then call lomax.

 2.0 (if there is no 1.6)
   - Make numpy.random.pareto behave as Pareto distribution of 1st kind.

I think the next Numpy release will be 2.0.

The way this was handled for the histogram function:

1) Add a new=False keyword argument, and raise a DeprecationWarning
   if new==False. The user then must call it with pareto(..., new=True)
   to get the correct behaviour.

2) In the next release, change the default to new=True.
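A sketch of what that transition could look like here (the only
assumption is the standard Lomax/Pareto relation: adding 1 to a Lomax
variate gives a Pareto variate of the 1st kind with x_m = 1):

import warnings
import numpy as np

def pareto(a, size=None, new=False):
    if not new:
        warnings.warn("returning the old (Lomax) variates; pass new=True "
                      "for the Pareto distribution of the 1st kind",
                      DeprecationWarning)
        return np.random.pareto(a, size)      # current behaviour
    return np.random.pareto(a, size) + 1.0    # Pareto 1st kind, x_m = 1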

Another option would be to add a correct implementation with a different 
name, e.g. `pareto1` to signal it's the first kind, and deprecate the old 
function altogether.

A third option would be just to silently fix the bug. In any case the 
change should be mentioned noticeably in the release notes.

-- 
Pauli Virtanen



Re: [Numpy-discussion] savetxt not working with python3.1

2010-05-17 Thread Pauli Virtanen
Mon, 17 May 2010 13:22:18 -0400, Skipper Seabold wrote:
[clip]
 What version of Numpy?  Can you file a bug ticket with a failing
 example?  I couldn't replicated with r8417 on Python 3.

It was fixed in r8411.

-- 
Pauli Virtanen



Re: [Numpy-discussion] dtype and array creation

2010-05-20 Thread Pauli Virtanen
Thu, 20 May 2010 13:04:11 -0700, Keith Goodman wrote:
 Why do the follow expressions give different dtype?
 
 >>> np.array([1, 2, 3], dtype=str)
 array(['1', '2', '3'],
       dtype='|S1')
 >>> np.array(np.array([1, 2, 3]), dtype=str)
 array(['1', '2', '3'],
       dtype='|S8')

Scalars seem to be handled specially. Anyway, automatic determination of 
the string size is a bit dangerous to rely on with non-strings in the 
array:

>>> np.array([np.array(12345)], dtype=str)
array(['1234'],
      dtype='|S4')

When I looked at this the last time, it wasn't completely obvious how to
make this do something more sensible.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Build failure at rev8246

2010-05-21 Thread Pauli Virtanen
Fri, 21 May 2010 08:09:55 -0700, T J wrote:
 I tried upgrading today and had trouble building numpy (after rm -rf
 build).  My full build log is here:
 
 http://www.filedump.net/dumped/build1274454454.txt
 

Your SVN checkout might be corrupted, containing a mix of old and new 
files. Try building from a clean checkout.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Extending documentation to c code

2010-05-25 Thread Pauli Virtanen
Hi,

Mon, 2010-05-24 at 12:01 -0600, Charles R Harris wrote:
 I'm wondering if we could extend the current documentation format to
 the c source code. The string blocks would be implemented something
 like
[clip]

I'd perhaps stick closer to C conventions and use something like

/**
 * Spam foo out of the parrots.
 *
 * Parameters
 * ----------
 * a
 *     Amount of spam
 * b
 *     Number of parrots
 */
int foo(int a, int b)
{
}


Using some JavaDoc-type syntax might also be possible, although I
personally find it rather ugly. Also, parsing C might not be too
difficult, as projects aimed at doing that in Python exist, for instance
http://code.google.com/p/pycparser/

-- 
Pauli Virtanen



Re: [Numpy-discussion] Extending documentation to c code

2010-05-26 Thread Pauli Virtanen
Wed, 26 May 2010 06:57:27 +0900, David Cournapeau wrote:
[clip: doxygen]
 It is yet another format to use inside C sources (I don't think doxygen
 supports rest), and I would rather have something that is similar,
 ideally integrated into sphinx. It also generates rather ugly doc by
 default,

Anyway, we can probably still agree on a readable plain-text/rst
format, and then just use doxygen to generate the docs, as a band-aid.

http://github.com/pv/numpycdoc

-- 
Pauli Virtanen



Re: [Numpy-discussion] Introduction to Scott, Jason, and (possibly) others from Enthought

2010-05-26 Thread Pauli Virtanen
Wed, 26 May 2010 10:50:19 +0200, Sebastian Walter wrote:
 I'm a potential user of the C-API and therefore I'm very interested in
 the outcome.
 In the previous discussion
 (http://comments.gmane.org/gmane.comp.python.numeric.general/37409) many
 different views on what the new C-API should be were expressed.

I believe the aim of the refactor is to *not* change the C-API at all, 
but to separate it internally from the routines that do the heavy lifting. 
Externally, Numpy would still look the same, but be easier to maintain.

The routines that do the heavy lifting could then be good for reuse and 
easier to maintain, but I think how and where they would be exposed 
hasn't been discussed so far...

-- 
Pauli Virtanen



Re: [Numpy-discussion] Reading and writing binary data to/from file objects

2010-05-26 Thread Pauli Virtanen
Wed, 2010-05-26 at 14:07 +0200, Christoph Bersch wrote:
 f = open(fname2, 'w')
[clip]
 Am I doing something substantially wrong or is this a bug?

You are opening files in text mode. Use mode 'wb' instead.

-- 
Pauli Virtanen




Re: [Numpy-discussion] Extending documentation to c code

2010-05-26 Thread Pauli Virtanen
Wed, 26 May 2010 07:15:08 -0600, Charles R Harris wrote:
 On Wed, May 26, 2010 at 2:59 AM, Pauli Virtanen p...@iki.fi wrote:
 
 Wed, 26 May 2010 06:57:27 +0900, David Cournapeau wrote: [clip:
 doxygen]
  It is yet another format to use inside C sources (I don't think
  doxygen supports rest), and I would rather have something that is
  similar, ideally integrated into sphinx. It also generates rather
  ugly doc by default,

 Anyway, we can probably nevertheless just agree on a readable
 plain-text/ rst format, and then just use doxygen to generate the docs,
 as a band-aid.

 http://github.com/pv/numpycdoc

 Neat. I didn't quite see how you connected the rst documentation
 and doxygen.

I didn't :)

But I just did: it was actually a 10-minute job, since Doxygen accepts
HTML -- now it parses the comments as RST and renders them properly as
HTML in the Doxygen output. Of course, getting links etc. to work would
require more effort, but that's left as an exercise for someone else to
finish.

Pauli



Re: [Numpy-discussion] NotImplemented returns

2010-05-27 Thread Pauli Virtanen
Thu, 27 May 2010 10:21:14 -0600, Charles R Harris wrote:
[clip]
 Maybe an enhancement ticket. The NotImplemented return is appropriate
 for some functions, but for functions with a single argument we should
 probably raise an error.

A NotImplemented value leaking to the user is a bug. The function
should raise a ValueError instead.

NotImplemented is meant only for use as a placeholder singleton in 
implementation of rich comparison operations etc., and we shouldn't 
introduce any new meanings IMHO.
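For reference, the intended pattern (illustrative class):

class Meters:
    def __init__(self, n):
        self.n = n
    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented   # let the other operand have a try
        return self.n == other.n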

-- 
Pauli Virtanen



Re: [Numpy-discussion] Finding Star Images on a Photo (Video chip) Plate?

2010-05-29 Thread Pauli Virtanen
Sat, 29 May 2010 11:29:56 -0700, Wayne Watson wrote:
[clip]
 SciPy. Hmm, this looks reasonable. http://scipy.org/Getting_Started.
 Somehow I missed it and instead landed on
 http://www.scipy.org/wikis/topical_software/Tutorial, which looks
 fairly imposing. Those are tar and gzip files. 

There's also the PDF file that contains the main text.

 In any case, the whole page looks quite imposing. Where did you 
 find the simple one below?

The scipy.org wiki is organically grown and confusing at times (I moved
the page you landed on to a clearer place).

The correct places for this stuff are

http://docs.scipy.org/

http://scipy.org/Additional_Documentation

-- 
Pauli Virtanen



Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-01 Thread Pauli Virtanen
Tue, 01 Jun 2010 16:59:47 +0900, David Cournapeau wrote:
 I have looked back into the way to convert the existing numpy svn
 repository into git. It went quite smoothly using svn2git (developed by
 the KDE team for their own transition), but there are a few questions
 which need to be answered:
 
  - Shall we keep the old svn branches ? I think most of them are just
 cruft, and can be safely removed. We would then just keep the release
 branches

They mostly seem like leftovers, and most have been merged to trunk. At
least a prefix like svnbranch/** should be added to them, if we are going
to keep them at all.

Personally, I don't see problems in leaving them out.

 (in maintenance/***)

Why not release/** or releases/**? I'd suggest singular form here. What 
do other projects use?

Does having a prefix here imply something to clones?

  - Tag conversion: svn has no notion of tags, so translating them into
  git tags cannot be done automatically in a safely manner (and we do
 have some rewritten tags in the svn repo). I was thinking about creating
 a small script to create them manually afterwards for the releases, in
 the svntags/***.

Sounds OK.

  - Author conversion: according to git, there are around 50 committers
 in numpy. Several of them are double and should be be merged I think
 (kern vs rkern, Travis' accounts as well), but there is also the option
 to set up real emails. Since email are private, I don't want to just
 scrape them without asking permission first. I don't know how we
 should proceed here.

I don't think correcting the email addresses in the SVN history is very 
useful. Best probably just use some dummy form, maybe

forename.surn...@localhost

or something similar to keep things simple.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-05 Thread Pauli Virtanen
Fri, 04 Jun 2010 15:28:52 -0700, Matthew Brett wrote:
 I think it should be opt-in.  How would opt-out work?  Would someone
 create new accounts for all the contributors and then give them
 access?

 Just to be clear, this has nothing to do with accounts on github, or
 any registered thing. This is *only* about username/email as recognized
 by git itself (as recorded in the commit objects).
 
 Actually - isn't it better if people do give you their github username /
 email combo - assuming that's the easiest combo to work with later?

I think the Github user name is not really needed here, as what goes into 
the history is the Git ID: name + email address.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-05 Thread Pauli Virtanen
Fri, 04 Jun 2010 15:19:27 -0700, Dan Roberts wrote:
 Sorry to interrupt, but do you have a usable, up to date cloneable git
 repository somewhere? I noticed that you had a repository on github, but
 it's about a month out of date.  I understand if what you're working on
 isn't ready to be cloned from yet.  In the meantime I can use git-svn,
 but I figured if you had something better, I'd use that.

See also

http://projects.scipy.org/numpy/wiki/GitMirror



Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Pauli Virtanen
Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
 On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
  The savez docstring should probably be clarified to provide this
  information.

 I would prefer to actually offer compression to the user.  Unfortunately, 
 adding another argument to this function will never be 100% secure, since 
 currently all kwargs will be saved into the zip, so it could count as a 
 behaviour change.

Yep, that's the only question to be resolved. I suppose "compression" is
not so usual as a variable name, so it probably wouldn't break anyone's
code.
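The reason it is only *probably* safe (illustrative):

import numpy as np

# savez stores every keyword argument as an array in the zip file, so
# code like this would change meaning if `compression` became an option:
np.savez('out.npz', compression=np.arange(3))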

-- 
Pauli Virtanen



Re: [Numpy-discussion] Problems with get_info installing scipy

2010-06-08 Thread Pauli Virtanen
Tue, 08 Jun 2010 09:47:41 -0400, Jeff Hsu wrote:
 I tried to install scipy, but I get the error with not being able to
 find get_info() from numpy.distutils.misc_util.  I read that you need
 the SVN version of numpy to fix this.  I recompiled numpy and
 reinstalled from the SVN, which says is version 1.3.0 (was using 1.4.1
 version before) and that function is not found within either versions. 
 What version of numpy should I use? Or maybe I'm not removing numpy
 correctly.

It's included in 1.4.1 and in SVN (which is 1.5.x).

You almost certainly have an older version of numpy installed somewhere
that overrides the new one. Check "import numpy; print numpy.__file__" to
see which one is imported.

-- 
Pauli Virtanen



Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Pauli Virtanen
Thu, 10 Jun 2010 18:48:04 +0200, Sturla Molden wrote:
[clip]
 5. Allow OpenMP pragmas in the core. If arrays are above a certain size,
 it should switch to multi-threading.

Some places where OpenMP could probably help are the inner ufunc 
loops. However, improving the memory efficiency of the data access 
pattern is another low-hanging fruit for multidimensional arrays.

I'm not sure how OpenMP plays together with thread-safety, especially if 
used in outer constructs. Whether this is easy to do is not so clear, as 
Numpy uses e.g. iterator and loop objects, which probably cannot be 
shared between different OpenMP threads. So one would have to do some 
extra work to parallelize these parts manually.

But if multithreading is desired, OpenMP probably sounds like the way to 
go...

-- 
Pauli Virtanen



Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Thu, 10 Jun 2010 23:56:56 +0200, Sturla Molden wrote:
[clip]
 Also about array iterators in NumPy's C base (i.e. for doing something
 along an axis): we don't need those. There is a different way of coding
 which leads to faster code.
 
 1. Collect an array of pointers to each subarray (e.g. using
 std::vector<dtype*> or dtype**)
 2. Dispatch on the pointer array...

This is actually what the current ufunc code does.

The innermost dimension is handled via the ufunc loop, which is a simple 
for loop with constant-size step and is given a number of iterations. The 
array iterator objects are used only for stepping through the outer 
dimensions. That is, it essentially steps through your dtype** array, 
without explicitly constructing it.

An abstraction for iteration through an array is useful also for
building the dtype** array, so we should probably retain the iterators.

For multidimensional arrays, you run into the optimization problem of 
choosing the order of axes so that the memory access pattern makes 
maximally efficient use of the cache. Currently, this optimization 
heuristic is really simple, and makes the wrong choice even in the simple 
2D case.

We could in principle use OpenMP to parallelize the outer loop, as long 
as the inner ufunc part is implemented in C, and does not go back into 
Python. But if the problem is memory-bound, it is not clear that 
parallelization helps.

Another task for optimization would perhaps be to implement specialized 
loops for each operation, currently we do one function indirection per 
iteration which probably costs something. But again, it may be that the 
operations are already bounded by memory speed, and this would not 
improve performance.

-- 
Pauli Virtanen



Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote:
[clip]
 Ideally, algorithms would get wrapped in between two additional
 pre-/postprocessing steps:
 
 1) Preprocessing: After broadcasting, transpose the input arrays such
 that they become C order.  More specifically, sort the strides of one
 (e.g. the first/the largest) input array decreasingly, and apply the
 same transposition to the other arrays (in + out if given).
 
 2) Perform the actual algorithm, producing a C-order output array.
 
 3) Apply the reverse transposition to the output array.
[clip]

The post/preprocessing steps are fairly easy to implement in the ufunc 
machinery.

The problem here was that this would result to a subtle semantic change, 
as output arrays from ufuncs would no longer necessarily be in C order.

We need to explicitly *decide* to make this change before proceeding.
I'd say that we should simply do it, as nobody should depend on this
assumption.

I think there was some code ready to implement this shuffling. I'll try
to dig it out.
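The shuffling itself is straightforward (untested sketch of the three
steps quoted above):

import numpy as np

def shuffled_ufunc(ufunc, a):
    order = np.argsort(a.strides)[::-1]      # 1) sort strides decreasingly
    out = ufunc(a.transpose(order))          # 2) compute with C-order output
    return out.transpose(np.argsort(order))  # 3) undo the transposition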

-- 
Pauli Virtanen



Re: [Numpy-discussion] NumPy re-factoring project

2010-06-11 Thread Pauli Virtanen
Fri, 11 Jun 2010 15:31:45 +0200, Sturla Molden wrote:
[clip]
 The innermost dimension is handled via the ufunc loop, which is a
 simple for loop with constant-size step and is given a number of
 iterations. The array iterator objects are used only for stepping
 through the outer dimensions. That is, it essentially steps through
 your dtype** array, without explicitly constructing it.

 Yes, exactly my point. And because the iterator does not explicitly
 construct the array, it sucks for parallel programming (e.g. with
 OpenMP):
 
 - The iterator becomes a bottleneck to which access must be serialized
 with a mutex.
 - We cannot do proper work scheduling (load balancing)

I don't necessarily agree: you can do

for parallelized outer loop {
    critical section {
        p = get iterator pointer
        ++iterator
    }
    inner loop in region `p`
}

This does allow load balancing etc., as a free processor can immediately 
grab the next available slice. Also, it would be easier to implement with 
OpenMP pragmas in the current code base.

Of course, the assumption here is that the outer iterator overhead is 
small compared to the duration of the inner loop. This must then be 
compared to the memory access overhead involved in the dtype** array.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Tensor contraction

2010-06-12 Thread Pauli Virtanen
Sat, 12 Jun 2010 23:15:16 +0200, Friedrich Romstedt wrote:
[clip]
 But note that for:
 T[:, I, I]
 the shape is reversed with respect to that of: 
 T[I, :, I]  and T[I, I, :] .
 
 I think it should be written in the docs how the shape is derived.

It's explained there in detail (although maybe not in the simplest 
possible words):

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer

Let "index dimension" be a dimension for which you supply an array 
(rather than a :-slice) as an index.

Let "broadcast shape of indices" be the shape to which all the index 
arrays are broadcast (= same as (i1+i2+...+in).shape).

The rule is that

1) if all the index dimensions are next to each other, then the shape of 
the result array is

(dimensions preceding the index dimensions) 
+ (broadcast shape of indices) 
+ (dimensions after the index dimensions)

2) if the index dimensions are not all next to each other, then the shape 
of the result array is

(broadcast shape of indices) + (non-index dimensions)

Might be a bit surprising, but at this point it's not going to be changed.
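
A quick illustration (shapes made up for the example):

>>> T = np.zeros((4, 5, 6))
>>> I = np.arange(3)
>>> T[I, I, :].shape    # index dimensions adjacent: rule 1
(3, 6)
>>> T[:, I, I].shape    # index dimensions adjacent: rule 1
(4, 3)
>>> T[I, :, I].shape    # index dimensions separated: rule 2
(3, 5)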

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Tensor contraction

2010-06-12 Thread Pauli Virtanen
Sat, 12 Jun 2010 16:30:14 -0400, Alan Bromborsky wrote:
 If I have a single numpy array, for example with 3 indices T_{ijk} and I
 want to sum over two them in the sense of tensor contraction -
 
 T_{k} = \sum_{i=0}^{n-1} T_{iik}.  Is there an easy way to do this with
 numpy?

HTH, (not really tested, so you get what you pay for) 


import numpy as np

def tensor_contraction_single(tensor, dimensions):
    """Perform a single tensor contraction over the given dimensions"""
    swap = [x for x in range(tensor.ndim)
            if x not in dimensions] + list(dimensions)
    x = tensor.transpose(swap)
    for k in range(len(dimensions) - 1):
        x = np.diagonal(x, axis1=-2, axis2=-1)
    return x.sum(axis=-1)

def _preserve_indices(indices, removed):
    """Adjust values of indices after some dimensions are removed"""
    for r in reversed(sorted(removed)):
        indices = [j if j < r else j-1 for j in indices]
    return indices

def tensor_contraction(tensor, contractions):
    """Perform several tensor contractions"""
    while contractions:
        dimensions = contractions.pop(0)
        tensor = tensor_contraction_single(tensor, dimensions)
        contractions = [_preserve_indices(c, dimensions)
                        for c in contractions]
    return tensor

# b_{km} = sum_{ij} a_{i,j,k,j,i,m}

a = np.random.rand(3,2,4,2,3,5)
b = tensor_contraction(a, [(0,4), (1,3)])

b2 = np.zeros((4, 5))
for i in range(3):
for j in range(2):
b2 += a[i,j,:,j,i,:]

assert (abs(b - b2) < 1e-12).all()

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy re-factoring project

2010-06-13 Thread Pauli Virtanen
Sun, 13 Jun 2010 06:54:29 +0200, Sturla Molden wrote:
[clip: memory management only in the interface]

You forgot views: if memory management is done in the interface layer, it 
must also make sure that the memory pointed to by a view is never moved 
around, and not freed before all the views are freed.

This probably cannot be done with existing Python objects, since IIRC, 
they always own their own memory. So one should refactor the memory 
management out of ndarray on the interface side.

The core probably also needs to decide when to create views (indexing), 
so a callback to the interface is needed.

 Second: If we cannot figure out how much to allocate before starting to
 use C (very unlikely), 

The point is that while you in principle can figure it out, you don't 
*want to* do this work in the interface layer. Computing the output size 
is just code duplication.

 we can always use a callback to Python. Or we can

This essentially amounts to having a

NpyArray_resize

in the interface layer, which the core can call.

Here, however, one has the requirement that all arrays needed by core are 
always passed in to it, including temporary arrays. Sounds a bit like 
Fortran 77 -- it's not going to be bad, if the number of temporaries is 
not very large.

If the core needs to allocate arrays by itself, this will inevitably lead 
to memory management that cannot be avoided by having the allocation done 
in the interface layer, as the garbage collector must not mistake a 
temporary array referenced only by the core as unused.

An alternative here is to have an allocation routine

NpyArray_new_tmp
NpyArray_del_tmp

whose allocated memory is released automatically, if left dangling, after 
the call to the core is completed. This would be useful for temporary 
arrays, although then one would have to be careful never to return 
memory allocated this way back to the caller.

 have two C functions, the first determining the amount of allocation,
 the second doing the computation.

That sounds like a PITA, think about LAPACK.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PyArray_Scalar() and Unicode

2010-06-13 Thread Pauli Virtanen
Sat, 12 Jun 2010 17:33:13 -0700, Dan Roberts wrote:
[clip: refactoring PyArray_Scalar]
 There are a few problems with this.  The biggest problem for me is
 that it appears PyUCS2Buffer_FromUCS4() doesn't produce UCS2 at all, but
 rather UTF-16 since it produces surrogate pairs for code points above
 0x.  My first question is: is there any time when the data produced
 by PyUCS2Buffer_FromUCS4() wouldn't be parseable by a standards
 compliant UTF-16 decoder?  

Since UTF-16 = UCS-2 + surrogate pairs, as far as I know, the data 
produced should always be parseable by DecodeUTF16.

Conversion to real UCS-2 from UCS-4 would be a lossy procedure, since not 
all code points can be represented with 2 bytes.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Technicalities of the SVN - GIT transition

2010-06-15 Thread Pauli Virtanen
ti, 2010-06-15 kello 13:18 +0900, David kirjoitti:
 On 06/15/2010 12:15 PM, Vincent Davis wrote:
  Is this http://github.com/cournape/numpy going to become the git repo?
 
 No, this is just a mirror of the svn repo, and used by various 
 developers to do their own stuff.

That's a mirror of a mirror, the master SVN mirror is here:

http://projects.scipy.org/git/numpy/

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Ufunc memory access optimization

2010-06-15 Thread Pauli Virtanen
pe, 2010-06-11 kello 10:52 +0200, Hans Meine kirjoitti:
 On Friday 11 June 2010 10:38:28 Pauli Virtanen wrote:
[clip]
  I think there was some code ready to implement this shuffling. I'll try
  to dig it out and implement it.
 
 That would be great!
 
 Ullrich Köthe has implemented this for our VIGRA/numpy bindings:
   http://tinyurl.com/fast-ufunc
 At the bottom you can see that he basically wraps all numpy.ufuncs he
 can find in the numpy top-level namespace automatically.

Ok, here's the branch:


http://github.com/pv/numpy-work/compare/master...feature;ufunc-memory-access-speedup

Some samples: (the reported times in braces are times without the
optimization) 

x = np.zeros([100,100])
%timeit x + x
10000 loops, best of 3: 106 us (99.1 us) per loop
%timeit x.T + x.T
10000 loops, best of 3: 114 us (164 us) per loop
%timeit x.T + x
10000 loops, best of 3: 176 us (171 us) per loop

x = np.zeros([100,5,5])
%timeit x.T + x.T
10000 loops, best of 3: 47.7 us (61 us) per loop

x = np.zeros([100,5,100]).transpose(2,0,1)
%timeit np.cos(x)
100 loops, best of 3: 3.77 ms (9 ms) per loop

As expected, some improvement can be seen. There's also appears to be
an additional 5 us (~ 700 inner loop operations it seems) overhead
coming from somewhere; perhaps this can still be reduced.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Ufunc memory access optimization

2010-06-15 Thread Pauli Virtanen
ti, 2010-06-15 kello 10:10 -0400, Anne Archibald kirjoitti:
 Correct me if I'm wrong, but this code still doesn't seem to make the
 optimization of flattening arrays as much as possible. The array you
 get out of np.zeros((100,100)) can be iterated over as an array of
 shape (10000,), which should yield very substantial speedups. Since
 most arrays one operates on are like this, there's potentially a large
 speedup here. (On the other hand, if this optimization is being done,
 then these tests are somewhat deceptive.)

It does perform this optimization, and unravels the loop as much as
possible. If all arrays are wholly contiguous, iterators are not even
used in the ufunc loop. Check the part after

/* Determine how many of the trailing dimensions are contiguous
*/

However, in practice it seems that this typically is not a significant
win -- I don't get speedups over the unoptimized numpy code even for
shapes

(2,)*20

where you'd think that the iterator overhead could be important:

x = np.zeros((2,)*20)
%timeit np.cos(x)
10 loops, best of 3: 89.9 ms (90.2 ms) per loop

This is actually consistent with the result

x = np.zeros((2,)*20)
y = x.flatten()
%timeit x+x
10 loops, best of 3: 20.9 ms per loop
%timeit y+y
10 loops, best of 3: 21 ms per loop

that you can probably verify with any Numpy installation.

This result may actually be because the Numpy ufunc inner loops are not
especially optimized -- they don't use fixed stride sizes etc., and so
are not much more efficient than using array iterators.

 On the other hand, it seems to me there's still some question about
 how to optimize execution order when the ufunc is dealing with two or
 more arrays with different memory layouts. In such a case, which one
 do you reorder in favour of? 

The axis permutation is chosen so that the sum (over different arrays)
of strides (for each dimension) is decreasing towards the inner loops.

I'm not sure, however, that this is the optimal choice. Things may
depend quite a bit on the cache architecture here.
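
In Python terms, the heuristic is roughly the following (a sketch, not
the actual C implementation):

import numpy as np

def loop_axis_order(*arrays):
    # Order the axes so that the summed strides decrease towards the
    # inner loop, i.e. the axis with the smallest total stride is
    # iterated over innermost (last).
    ndim = arrays[0].ndim
    totals = [sum(abs(a.strides[k]) for a in arrays) for k in range(ndim)]
    return sorted(range(ndim), key=lambda k: -totals[k])

x = np.zeros((100, 5, 100)).transpose(2, 0, 1)
print(loop_axis_order(x, x))   # [1, 2, 0]: axis 0 (stride 8) goes innermost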

 Is it acceptable to return
 freshly-allocated arrays that are not C-contiguous?

Yes, it returns non-C-contiguous arrays. The contiguity is determined so
that the ufunc loop itself can access the memory with a single stride.

This may cause some speed loss in some expressions. Perhaps there should
be a subtle bias towards C-order? But I'm not sure this is worth the
bother.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Ufunc memory access optimization

2010-06-15 Thread Pauli Virtanen
The optimal ordering is probably not even well-defined; we only need one
that is good enough in practice.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] checking for array type in C extension

2010-06-18 Thread Pauli Virtanen
pe, 2010-06-18 kello 12:49 +0200, Berthold Hoellmann kirjoitti:
[clip]
 tst.inttestfunc(np.array((1,2),dtype=np.int))
 tst.inttestfunc(np.array((1,2),dtype=np.int8))
 tst.inttestfunc(np.array((1,2),dtype=np.int16))
 tst.inttestfunc(np.array((1,2),dtype=np.int32))
 tst.inttestfunc(np.array((1,2),dtype=np.int64))
 
 h...@pc090498 ~/pytest $ PYTHONPATH=build/lib.win32-2.5/ python xx.py
 1.4.1 ['C:\\Python25\\lib\\site-packages\\numpy']
 PyArray_TYPE(array): 7; NPY_INT: 5
 NPY_INT not found
 PyArray_TYPE(array): 1; NPY_INT: 5
 NPY_INT not found
 PyArray_TYPE(array): 3; NPY_INT: 5
 NPY_INT not found
 PyArray_TYPE(array): 7; NPY_INT: 5
 NPY_INT not found
 PyArray_TYPE(array): 9; NPY_INT: 5
 NPY_INT not found

  NPY_INT32 is 7, but shouldn't NPY_INT correspond to numpy.int? And what
  kind of int is NPY_INT in this case?

I think the explanation is the following:

- NPY_INT is a virtual type that is either int32 or int64, depending on
  the native platform size.

- It has its own type code, distinct from NPY_SHORT (= 3 = int16) and
  NPY_LONG (= 7 = int32 on this platform).

- But the type specifier is replaced by a concrete type of the same size
  (here NPY_LONG) on array creation, so no array is of this dtype.

Pauli


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] checking for array type in C extension

2010-06-18 Thread Pauli Virtanen
pe, 2010-06-18 kello 16:43 +0200, Berthold Hoellmann kirjoitti:
[clip]
 The documentation (I refer to NumPy User Guide, Release 1.5.0.dev8106)
 claims that numpy.int is the platform int and that NPY_INT is a C-based
 name, thus also referring to int. So if I want to use int platform-
 independently in my extension module, what do I do to check whether the
 correct data is provided? I want to avoid the casting routines,
 such as PyArray_FROM_OTF or PyArray_FromAny. Do I have to use
 PyArray_ITEMSIZE? But PyArray_ITEMSIZE indicates that numpy.int
 are not platform integers. It returns 8 on Linux64, where
 sizeof(int)==4.

 So is this a bug in 1.4.1, in the 1.5 beta documentation or a
 misunderstanding on my side?

I guess the documentation needs clarification, as numpy.int is not
actually the platform-size integer, but the Python-size integer.

The platform-size integer corresponding to C int is IIRC numpy.intc,
which should result in the same sizes as NPY_INT.

-- 
Pauli Virtanen


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multiplying numpy floats and python lists

2010-06-20 Thread Pauli Virtanen
su, 2010-06-20 kello 13:56 -0400, Tony S Yu kirjoitti:
 I came across some strange behavior when multiplying numpy floats and
 python lists: the list is returned unchanged:
 
  In [18]: np.float64(1.2) * [1, 2]
  
  Out[18]: [1, 2]

Probably a bug: it seems to truncate the float to an integer, and then
do the usual Python thing, sequence repetition (try for example
np.float64(2.2)).

I'd suggest filing a bug ticket.

Looking at CPython source code, the issue seems to be that np.float64
for some reason passes PyIndex_Check.

But (without whipping out gdb) I can't understand how this can be: the
float64 scalar type is not supposed to have a non-NULL in the nb_index
slot. Indeed, a np.float64.__index__ method does not exist, and the
following indeed does not work:

[1, 2, 3][np.float64(2.2)]

Strange.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible to use numexpr with user made ufuncs/scipy ufuncs?

2010-06-26 Thread Pauli Virtanen
Hi,

la, 2010-06-26 kello 14:24 +0200, Francesc Alted kirjoitti:
[clip]
 Yeah, you need to explicitly code the support for new functions in numexpr.  
 But another possibility, more doable, would be to code the scipy.special 
 functions by using numexpr as a computing back-end.

Would it be possible to add generic support for user-supplied ufuncs
into numexpr? This would maybe be more of a convenience feature than a
speed improvement, although perhaps some speed could be gained by
evaluating ufuncs per-block.

-- 
Pauli Virtanen 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible to use numexpr with user made ufuncs/scipy ufuncs?

2010-06-26 Thread Pauli Virtanen
Sat, 26 Jun 2010 17:51:52 +0100, Francesc Alted wrote:
[clip]
 Well, I'd say that this support can be faked in numexpr easily.  For
 example, if one wants to compute a certain ufunc called, say, sincos(x),
 defined as sin(cos(x)) (okay, that's very simple, but it will suffice
 for demonstration purposes), he can express that as sincos =
 "sin(cos(%s))", and then use it in a more complex expression like:
 
 "%s+1*cos(%s)" % (sincos % 'x', 'y')
 
 that will be expanded as:
 
 "sin(cos(x))+1*cos(y)"
 
 Of course, this is a bit crude, but I'd say that's more than enough for
 allowing the evaluation of moderately complex expressions.

But what if such an expression does not exist? For example, there is no 
finite closed form expression for the Bessel functions in terms of 
elementary functions. An accurate implementation is rather complicated, 
and usually piecewise defined.

For convenience reasons, it could be useful if one could do something like

numexpr.evaluate("cos(iv(0, x))", functions=dict(iv=scipy.special.iv))

and this would be translated to numexpr bytecode that would make a Python 
function call to obtain iv(0, x) for each block of data required, 
assuming iv is a vectorized function. It's of course possible to 
precompute the value of iv(0, x), but this is extra hassle and requires 
additional memory.
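
The precompute workaround would look something like this (a sketch;
numexpr picks up variables from the local scope):

import numpy as np
import numexpr
import scipy.special

x = np.linspace(0, 10, 1000000)
iv0 = scipy.special.iv(0, x)             # precompute: extra pass + extra memory
result = numexpr.evaluate("cos(iv0)")    # only the outer expression is compiled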

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.random.poisson docs missing Returns

2010-06-27 Thread Pauli Virtanen
Sat, 26 Jun 2010 17:37:22 -0700, David Goldsmith wrote:
 On Sat, Jun 26, 2010 at 3:22 PM, josef.p...@gmail.com wrote:
[clip]
 Is there a chance that some changes got lost?

 (Almost) anything's possible... :-(

There's practically no chance of edits getting lost. There's a chance of 
them being hidden if things are moved around in the source code, causing 
duplicate work, but that's not the case here.

 Well, here's what happened in the particular case of numpy's pareto:
 
 The promotion to Needs review took place - interestingly - 2008-06-26
 (yes, two years ago today), despite the lack of a Returns section; the
 initial check-in of HOWTO_DOCUMENT.txt - which does specify that a
 Returns section be included (when applicable) - was one week before,
 2008-06-19. So, it's not that surprising that this slipped through the
 cracks.

 Pauli (or anyone): is there a way to search the Wiki, e.g., using a
 SQL-like query, for docstrings that saw a change in status before a
 date, or between two dates?

No. The review status is not versioned, so the necessary information is 
not there. The only chance would be to search for docstrings that haven't 
been edited after a certain date.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] arr.copy(order='F') doesn't agree with docstring: what is intended behavior?

2010-06-27 Thread Pauli Virtanen
Sun, 27 Jun 2010 13:41:23 -0500, Kurt Smith wrote:
[clip]
 I misspoke (mistyped...).  So long as it has the 'F' behavior I'm happy,
 and I'll gladly file a bug report  patch to get the code to match the
 docs, assuming that's the intended behavior.

The problem is probably that the function is using PyArg_ParseTuple 
instead of PyArg_ParseTupleAndKeywords.

 Who do I contact about getting a trac account?

Go to http://projects.scipy.org/numpy/ and select Register.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible to use numexpr with user made ufuncs/scipy ufuncs?

2010-06-28 Thread Pauli Virtanen
ma, 2010-06-28 kello 09:48 +0200, Francesc Alted kirjoitti:
[clip]
 But again, the nice thing would be to implement such a special functions in 
 terms of numexpr expressions so that the evaluation itself can be faster.  
 Admittedly, that would take a bit more time.

Quite often, you need to evaluate a series or continued fraction
expansion to get the value of a special function at some point, limiting
the number of terms by stopping when a certain convergence criterion is
satisfied. Also, which series to sum typically depends on the point
where things are evaluated. These things don't vectorize very nicely. If
I understand correctly, numexpr bytecode interpreter does not support
conditionals and loops like this at the moment?
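
For instance, a minimal sketch of such a data-dependent loop (a power
series for the modified Bessel function I0; illustration only, not an
accurate implementation):

import numpy as np

def i0_series(x, tol=1e-16):
    x = np.asarray(x, dtype=float)
    term = np.ones_like(x)      # k = 0 term of sum_k (x/2)**(2k) / (k!)**2
    total = term.copy()
    k = 0
    while True:
        k += 1
        term = term * (x / (2.0 * k)) ** 2   # ratio of consecutive terms
        total = total + term
        # Data-dependent stopping criterion: this is the kind of control
        # flow that the numexpr bytecode cannot express per-block.
        if (term <= tol * np.abs(total)).all():
            return total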

The special function implementations are typically written in bare
C/Fortran. Do you think numexpr could give speedups there? As I see it,
speedups (at least with MKL) could come from using faster
implementations of sin/cos/exp etc. basic functions. Using SIMD to
maximum effect would then require an amount of cleverness in re-writing
the evaluation algorithms, which sounds like a major amount of work.

-- 
Pauli Virtanen  

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Pauli Virtanen
ke, 2010-06-23 kello 12:46 +0200, Ruben Salvador kirjoitti:
[clip]
 how can I detect the EOF to stop reading

r = f.read(1)
if not r:
    break  # EOF
else:
    f.seek(-1, 1)
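
In context, reading several arrays saved back-to-back into a single file
could look like this (a sketch; the file name is made up):

import numpy as np

arrays = []
with open('data.dat', 'rb') as f:
    while True:
        r = f.read(1)
        if not r:
            break                 # EOF
        f.seek(-1, 1)             # un-read the probe byte
        arrays.append(np.load(f))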

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.load raising IOError but EOFError expected

2010-06-28 Thread Pauli Virtanen
ma, 2010-06-28 kello 15:48 -0400, Anne Archibald kirjoitti:
[clip]
 That said, good exception hygiene argues that np.load should throw
 EOFErrors rather than the more generic IOErrors, but I don't know how
 difficult this would be to achieve.

np.load is in any case unhygienic, since it tries to unpickle, if it
doesn't see the .npy magic header.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] sum up to a specific value

2010-07-01 Thread Pauli Virtanen
Thu, 01 Jul 2010 06:17:50 -0300, Renato Fabbri wrote:
 i need to find which elements of an array sums up to an specific value
 
 any idea of how to do this?

Sounds like the knapsack problem

http://en.wikipedia.org/wiki/Knapsack_problem

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] sum up to a specific value

2010-07-01 Thread Pauli Virtanen
to, 2010-07-01 kello 11:46 -0300, Renato Fabbri kirjoitti:
  just a solution (not all of them)
  
  and the application happens to come up with something like 10k values
  in the array. I don't care about waiting, but...

As said, the problem is a well-known one, and it's not really Python or
Numpy-specific, so slightly off-topic for this list. Numpy and Scipy
don't ship pre-made algorithms for solving these.

But anyway, you'll probably find that the brute force algorithm (e.g.
the one from Vincent) takes exponential time (and exp(10000) is a big
number). So you need to do something more clever. First stop, Wikipedia,

http://en.wikipedia.org/wiki/Knapsack_problem
http://en.wikipedia.org/wiki/Subset_sum_problem

and if you are looking for pre-cooked solutions, second stop
stackoverflow,

http://stackoverflow.com/search?q=subset+sum+problem

Some search words you might want to try on Google:

http://www.google.com/search?q=subset%20sum%20dynamic%20programming

Generic advice only this time, sorry; I don't have pre-made code for
solving this at hand, but hopefully the above links give some pointers
for what to do.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] memory leak using numpy and cvxopt

2010-07-02 Thread Pauli Virtanen
Fri, 02 Jul 2010 14:56:47 +0200, Tillmann Falck wrote:
 I am hitting a memory leak with the combination of numpy and
 cvxopt.matrix. As I am not where it occurs, I am cross posting.

Probably a bug in cvxopt, as also the following leaks memory:


from cvxopt import matrix

N = 2000

X = [0]*N
Y = matrix(0.0, (N, N))

while True:
    Y[:N, :1] = X


-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.5 for Python 2.7 and 3.1

2010-07-03 Thread Pauli Virtanen
Fri, 25 Jun 2010 12:31:36 -0700, Christoph Gohlke wrote:
 at http://projects.scipy.org/numpy/ticket/1520 I have posted a patch
 against numpy svn trunk r8464 that:
 
 1) removes the datetime functionality
 2) restores ABI compatibility with numpy 1.4.1
 3) enables numpy to build and run on Python 2.7 and 3.1 (at least on
    Windows)
[clip]

This should make it easier to keep track of it:

http://github.com/pv/numpy-work/tree/1.5.x

I note that the patch contains some additional Py3K porting. We'll need 
to split the patch into two parts:

- Py3K or other miscellaneous fixes that should also go to the trunk
- removing datetime and restoring the ABI

I can probably take a look at this during EuroScipy. I'd also like to 
manually check the diff against svn/1.4.x.

So far, I think the following changes look like they should also go to 
the trunk:

- numpy/numarray/_capi.c
- numpy/lib/tests/test_io.py
- numpy/core/include/numpy/npy_math.h

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PATCH: reference leaks for 'numpy.core._internal' module object

2010-07-03 Thread Pauli Virtanen
Sat, 03 Jul 2010 01:11:20 -0300, Lisandro Dalcin wrote:
 The simple test below show the issue.
[clip: patch]

Thanks, applied in r8469.

[clip]
 PS: I think that such an implementation should at least handle the very
 simple one/two character formats (e.g., 'i', 'f', 'd', 'Zf', 'Zd', etc.)
 without calling Python code... Of course, complaints without patches
 should not be taken too seriously ;-)

We'll optimize once someone complains this makes their code slow ;)

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [OT] Re: debian benchmarks

2010-07-04 Thread Pauli Virtanen
Sun, 04 Jul 2010 04:32:14 +0200, Sturla Molden wrote:
 I was just looking at Debian's benchmark. LuaJIT is now (on median)
 beating Intel Fortran! Consider that Lua is a dynamic language very
 similar to Python. I know it's just a benchmark but this has to count
 as insanely impressive.

I guess you're talking about shootout.alioth.debian.org tests?

 Beating Intel Fortran with a dynamic scripting
 language... How is that even possible?

It's possible that in the cases where Lua wins, the Lua code is not 
completely equivalent to the Fortran code, or uses stuff such as strings 
for which Lua's default implementation may be efficient.

At least in the mandelbrot example some things differ. I wonder if Lua 
there takes advantage of SIMD instructions, because the author of the 
code has manually changed the innermost loop to process two elements at 
once?

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subtle behavior when subtracting sub-arrays

2010-07-05 Thread Pauli Virtanen
Mon, 05 Jul 2010 16:03:56 +0200, Steve Schmerler wrote:
[clip]
 To sum up, I find it a bit subtle that
 a = a - a[...,0][...,None]
 works as expected, while
 a -= a[...,0][...,None]
 does not.
 I guess the reason is that in the latter case (and the corresponding
 loop), a[...,0] itself is changed during the loop, while in the former
 case, numpy makes a copy of a[...,0] ?

Correct.

 Is this intended?

Not really. It's a feature we're planning to get rid of eventually, 
once a way to do it without sacrificing performance in safe cases is 
implemented.
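
A small session showing the difference (the in-place result depends on
the iteration order, which is exactly the unsafe part):

>>> a = np.arange(6.).reshape(2, 3)
>>> a - a[..., 0][..., None]        # copy: subtracts the original column
array([[ 0.,  1.,  2.],
       [ 0.,  1.,  2.]])
>>> a -= a[..., 0][..., None]       # in-place: reads already-updated data
>>> a
array([[ 0.,  1.,  2.],
       [ 0.,  4.,  5.]])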

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy, matplotlib and masked arrays

2010-07-10 Thread Pauli Virtanen
Sat, 10 Jul 2010 11:03:47 +1000, Peter Isaac wrote:
 I'm not sure if this is a problem with numpy, matplotlib, EPD or me but
 here goes.  The general problem is that either matplotlib has stopped
 plotting masked arrays correctly or numpy has changed the way masked
 arrays are presented.  Or there is something strange with the Enthought
 Python Distribution or me.  Many would agree with the latter ...
[clip]
 Since the matplotlib version is the same for the repository Python
 installation and the EPD-6.1-1 installation and the script works in the
 first but not the second, it suggests the problem is not just
 matplotlib.  That seems to leave the numpy version change (from 1.3 to
 1.4) or something in the EPD.  Or me.  
 Note that EPD-6.2-2 works fine with this script on WinXP.

It works fine for me with Numpy 1.4.0 and matplotlib 0.99.1.2, on Ubuntu. 
So it seems your problem is localised to the older EPD-6.1-1.

The question is: do you need to support this older version of EPD at all? 
The problem does not appear to be that matplotlib has stopped plotting 
masked arrays properly, or that something crucial has changed in Numpy.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Numpy runs on Python 3

2010-07-10 Thread Pauli Virtanen
Hi,

As many of you probably already know, Numpy works fully on Python 3 and 
Python 2, with a *single code base*, since March. This work is scheduled 
to be included in the next releases 1.5 and 2.0.

Porting Scipy to work on Python 3 has proved to be much less work, and 
will probably be finished soon. (Ongoing work is here: 
http://github.com/cournape/scipy3/commits/py3k and 
http://github.com/pv/scipy-work/commits/py3k )

For those who are interested in already starting to port their stuff to 
Python 3, you can use Numpy's SVN trunk version. Grab it:

svn co http://svn.scipy.org/svn/numpy/trunk/ numpy
cd numpy
python3 setup.py build

An important point is that supporting Python 3 and Python 2 in the same 
code base can be done, and it is not very difficult either. It is also 
much preferable from the maintenance POV to creating separate branches 
for Python 2 and 3. We attempted to log changes needed in Numpy at

http://projects.scipy.org/numpy/browser/trunk/doc/Py3K.txt

which may be useful (although not completely up-to-date) information for 
people wanting to make the same transition in their own code.

(Announcement as recommended by our PR department @ EuroScipy :)

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Difference between shape=() and shape=(1,)

2010-07-13 Thread Pauli Virtanen
ti, 2010-07-13 kello 10:06 -0700, Keith Goodman kirjoitti:
 No need to use where. You can just do a[np.isnan(a)] = 0. But you do
 have to watch out for 0d arrays, can't index into those.

You can, but the index must be appropriate:

>>> x = np.array(4)
>>> x[()] = 3
>>> x
array(3)


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] isinf raises in inf

2010-07-15 Thread Pauli Virtanen
Thu, 15 Jul 2010 09:54:12 -0500, John Hunter wrote:
[clip]
 In [4]: np.isinf(x)
 Warning: invalid value encountered in isinf
 Out[4]: True

As far as I know, isinf has always created NaNs -- since 2006 it has been 
defined on unsupported platforms as

(!isnan((x)) && isnan((x)-(x)))

I'll replace it by the obvious

((x) == NPY_INFINITY || (x) == -NPY_INFINITY)

which is true only for +-inf, and cannot raise any FPU exceptions.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] isinf raises in inf

2010-07-16 Thread Pauli Virtanen
Testing with arithmetic can raise overflows and underflows.

I think the correct isinf is to compare to NPY_INFINITY and -NPY_INFINITY. 
Patch is attached to #1500

- Original message -
 On Thu, Jul 15, 2010 at 6:42 PM, John Hunter jdh2...@gmail.com wrote:
 
  On Thu, Jul 15, 2010 at 7:27 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
   
   
   On Thu, Jul 15, 2010 at 6:11 PM, John Hunter jdh2...@gmail.com
   wrote:

On Thu, Jul 15, 2010 at 6:14 PM, Eric Firing efir...@hawaii.edu
  wrote:
 Is it certain that the Solaris compiler lacks isinf?   Is it
 possible that it has it, but it is not being detected?

Just to clarify, I'm not using the sun compiler, but gcc-3.4.3 on
  solaris
x86
   
   Might be related to this thread.   What version of numpy are you
   using?
  
  svn HEAD (2.0.0.dev8480)
  
  After reading the thread you suggested, I tried forcing the
  
  CFLAGS=-DNPY_HAVE_DECL_ISFINITE
  
  flag to be set, but this is apparently a bad idea for my platform...
  
  File
  /home/titan/johnh/dev/lib/python2.4/site-packages/numpy/core/__init__.py,
  line 5, in ?
  import multiarray
  ImportError: ld.so.1: python: fatal: relocation error: file
  /home/titan/johnh/dev/lib/python2.4/site-packages/numpy/core/multiarray.so:
  symbol isfinite: referenced symbol not found
  
  so while I think my bug is related to that thread, I don't see
  anything in that thread to help me fix my problem.   Or am I missing
  something?
  __
  
 
 Can you try out this patch without David's fixes?
 
  diff --git a/numpy/core/include/numpy/npy_math.h b/numpy/core/include/numpy/npy_math.h
  index d53900e..341fb58 100644
  --- a/numpy/core/include/numpy/npy_math.h
  +++ b/numpy/core/include/numpy/npy_math.h
  @@ -151,13 +151,13 @@ double npy_spacing(double x);
   #endif
  
   #ifndef NPY_HAVE_DECL_ISFINITE
  -       #define npy_isfinite(x) !npy_isnan((x) + (-x))
  +       #define npy_isfinite(x) (((x) + (x)) != (x) && (x) == (x))
   #else
           #define npy_isfinite(x) isfinite((x))
   #endif
  
   #ifndef NPY_HAVE_DECL_ISINF
  -       #define npy_isinf(x) (!npy_isfinite(x) && !npy_isnan(x))
  +       #define npy_isinf(x) (((x) + (x)) == (x) && (x) != 0)
   #else
           #define npy_isinf(x) isinf((x))
   #endif
 
 
 Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] isinf raises in inf

2010-07-16 Thread Pauli Virtanen
Thu, 15 Jul 2010 19:09:15 -0600, Charles R Harris wrote:
[clip]
 PS, of course we should fix the macro also. Since the bit values of +/-
 infinity are known we should be able to define them as constants using a
 couple of ifdefs and unions.

We already do that, NPY_INFINITY and -NPY_INFINITY.

[clip]
  int
  (isinf)(double x)
  {
      if ((-1.0 < x) && (x < 1.0))  /* x is small, and thus finite */
          return (0);
      else if ((x + x) == x)        /* only true if x == Infinity */
          return (1);
      else    /* must be finite (normal or subnormal), or NaN */
          return (0);
  }

This function can generate overflows, for example for

x = 0.6 * np.finfo(np.float64).max
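
since the addition then overflows even though x itself is finite. A
quick interactive check (the exact error message may vary):

>>> old = np.seterr(over='raise')   # make the silent overflow visible
>>> x = 0.6 * np.finfo(np.float64).max
>>> x + x                           # what the (x + x) == x test computes
Traceback (most recent call last):
  ...
FloatingPointError: overflow encountered in double_scalars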

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-User] Saving Complex Numbers

2010-07-16 Thread Pauli Virtanen
Thu, 15 Jul 2010 20:00:09 -0400, David Warde-Farley wrote:

 (CCing NumPy-discussion where this really belongs)
 
 On 2010-07-08, at 1:34 PM, cfra...@uci.edu wrote:
 
 Need Complex numbers in the saved file.
 
 Ack, this has come up several times according to list archives and no
 one's been able to provide a real answer.
 
  It seems that there is nearly no formatting support for complex numbers
  in Python. For a single value, "{0.real:.18e}{0.imag:+.18e}".format(val)
  will get the job done, but because of the way numpy.savetxt creates its
  format string this isn't a trivial fix.
 
 Anyone else have ideas on how complex number format strings can be
 elegantly incorporated in savetxt?

The bug here is that

>>> z = np.complex64(1+2j)
>>> '%.18e' % z
'1.000000000000000000e+00'
>>> float(np.complex64(1+2j))
1.0

This should raise an exception. I guess the culprit is at 
scalarmathmodule.c.src:1023.

It should at least be changed to raise a ComplexWarning, which we now 
anyway do in casting.
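
Until that is fixed, a hypothetical workaround helper for writing complex
arrays (names made up, not part of numpy):

import numpy as np

def savetxt_complex(fname, arr, fmt='%.18e%+.18ej'):
    # Format the real and imaginary parts explicitly, since the
    # default '%.18e' % z silently keeps only the real part.
    arr = np.atleast_2d(arr)
    with open(fname, 'w') as f:
        for row in arr:
            f.write(' '.join(fmt % (v.real, v.imag) for v in row) + '\n')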

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] __array__struct__: about using PyCapsule instead of PyCObject for Python 2.7

2010-07-17 Thread Pauli Virtanen
Wed, 30 Jun 2010 12:13:38 -0600, Charles R Harris wrote:
[clip]
   Grrr... I didn't see the point, myself, I'm tempted to deprecate 2.7
   just to get even. There are some routines in the numpy/core/src
   includes that you might want to copy, they will allow you to use a
   common interface for PyCObject and PyCapsule if you need to do that.
 
 
 I've already fixed my code for PyCapsule. What's not clear to me is how
 to build (i.e, use the old cobject or the new capsules)
 __array_struct__ across NumPy and Python versions combinations. Will
 NumPy 1.x series ever support Python 2.7? In such case, should I use
 cobjects or capsules?

 We do support 2.7, but with PyCapsule. You might want to take a look at
 f2py also, as it also uses PyCapsule for Python >= 2.7.

I think we need to change this decision. PyCObject is still available on 
Python 2.7, and will only raise a PendingDeprecationWarning, which does 
not show up by default. I believe the Python devs reversed the full 
deprecation before the final 2.7 release.

So I think we should just stick with PyCObject on 2.x, as we have done so 
far. I'll just bump the version checks so that PyCapsule is used only on 
3.x.

I'll commit this to Numpy SVN soon.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] 1.5.x branched

2010-07-17 Thread Pauli Virtanen
Dear all,

Based on patches contributed by Christoph Gohlke, I've created a branch 
for 1.5.x:

http://svn.scipy.org/svn/numpy/branches/1.5.x

It should be

  a) Binary compatible with Numpy 1.4 on Python 2.x.

 This meant rolling back the datetime and a couple of other changes.

  b) Support Python 3.1 and 2.7.

I expect it will be easy to track changes in the trunk, even if preparing 
the release will still take some time.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PyDocWeb the doc editor and github transition

2010-07-17 Thread Pauli Virtanen
Sat, 17 Jul 2010 13:11:46 -0600, Vincent Davis wrote:
 I have not seen anything about the move to github and how this affects
 pydocweb, or did I just miss it? I am willing to adapt pydocweb to work
 with github, but am not sure what the desired integration would be. This
 should not be too difficult, but should/needs to be part of the plan and
 documented.

You do not need to make any changes to the doc editor because of the 
github move -- only to its configuration. The documentation editor app 
itself does not know or care what the version control system is.

The required change is essentially replacing

svn checkout ...
svn update

by

git clone ...
git pull ...

in a shell script, and clearing up the checkout directories. I can take 
care of this on the server machine, once the move to git has been made.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy svn down

2010-07-17 Thread Pauli Virtanen
Sat, 17 Jul 2010 16:06:40 -0600, Charles R Harris wrote:
 At the moment... Chuck

Worksforme at the moment.

Pauli

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.5.x branched

2010-07-18 Thread Pauli Virtanen
Sun, 18 Jul 2010 21:13:43 +0800, Ralf Gommers wrote:
[clip]
 Builds fine on OS X 10.6 with both 2.7 and 3.1, and all tests pass. With
 one exception: in-place build for 3.1 is broken. Does anyone know is
 this is a distutils or numpy issue? The problem is that on import
 numpy.__config__ can not be found.

The inplace build goes to

build/py3k

I don't think we can easily support an in-place build in the main directory.

[clip]
 Don't think that's going to happen, but it would be good to discuss the
 release schedule. What changes need to go in before a first beta, and
 how much time do you all need for that?

I don't have anything urgent.

It might be good to glance through the tickets, however, to see if 
there's something serious pending.

Also, I'm not 100% sure what the Python 3 status on Windows is, so that 
should be checked too.

Pauli 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Caching 2to3 results

2010-07-18 Thread Pauli Virtanen
[shameless plug]

For those of you tired of waiting for 2to3 after rm -rf build:

http://github.com/pv/lib2to3cache

and

cd numpy/
USE_2TO3CACHE=1 python3 setup.py build

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy Trac emails

2010-07-18 Thread Pauli Virtanen
Sun, 18 Jul 2010 23:59:24 +0800, Ralf Gommers wrote:

 Scipy Trac seems to work very well now, I get notification emails for
 comments on tickets etc. For numpy Trac, nothing right now. Can this be
 fixed?

Works for me.

Did you check if your email address is correct there? (-> Preferences)

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Complex nan ordering

2010-07-18 Thread Pauli Virtanen
Hi,

The way Numpy currently handles the ordering of complex nans is not very 
well defined. We should attempt to clarify this for 1.5.0.

For example, what should these return:

r1 = np.maximum(complex(1, nan), complex(2, 0))

r2 = np.complex64(complex(1, nan))  np.complex64(complex(2, 0))

or, what should `r3` be after this:

r3 = np.array([complex(3, nan), complex(1, 0), complex(nan, 2)])
r3.sort()

Previously, we have defined a lexical ordering relation for complex 
numbers,

x < y  iff  x.real < y.real or (x.real == y.real and x.imag < y.imag)

but applying this to the above can cause some surprises:

amax([1, 2, 4, complex(3, nan)]) == 4

which breaks nan propagation, and moreover the result depends on the 
order of the items, and the precise way the algorithm is written.

***

Numpy IIRC has specified a lexical order between complex numbers for some 
time now, unlike Python in which complex numbers are unordered.

So we won't change how finite numbers are handled, only the nan handling 
needs to be specified.

***

I suggest the following, aping the way the real nan works:

- (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
  equivalent representations of cnan, as far as comparisons, sort
  order, etc are concerned. 

- The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
  means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
  cnan_2 if both are some cnans.

- Moreover, all comparisons <, >, ==, <=, >= where one or more operands
  is a cnan are false.

- Except that when sorting, cnans are to be placed last.

The advantages are now that nan propagation is now easier to implement, 
and we get faster code. Moreover, complex nans start to behave more 
similarly as their real counterparts in comparisons etc.; for instance in 
the above cases

r1 = (1, nan)
r2 = False
r3 = [complex(1, 0), complex(3, nan), complex(nan, 2)]

where in `r3` the order of the last two elements is unspecified.

This is in fact how the SVN trunk now works (final tweaks in r8508, r8509). 

Comments are welcome.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Complex nan ordering

2010-07-18 Thread Pauli Virtanen
Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
 On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen p...@iki.fi wrote:
[clip]
 I suggest the following, aping the way the real nan works:

 - (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
  equivalent representations of cnan, as far as comparisons, sort
  order, etc are concerned.

 - The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
  means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
  cnan_2 if both are some cnans.

 The sort and cmp order was defined in 1.4.0, see the release notes.
 (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are
 tests to enforce this. Sort and searchsorted need to work together.

Ok, now we're diving into an obscure corner that hopefully many people 
don't care about :)

There are several issues here:

1) We should not use lexical order in comparison operations,
   since this contradicts real-valued nan arithmetic.

   Currently (and in 1.4) we do some weird sort of mixture,
   which seems inconsistent.

2) maximum/minimum should propagate nans, fmax/fmin should not

3) sort/searchsorted, and amax/argmax need to play together

4) as long as 1)-3) are valid, I don't think anybody cares what
   exactly we mean by a complex nan, as long as

   np.isnan(complex nan) == True

   The fact that there happen to be several different representations
   of a complex nan should not be important.

***

1) 

Unless we want to define

(complex(nan, 0) > complex(0, 0)) == True

we cannot strictly follow the lexical order in comparisons. And if we 
define it like this, we contradict real-valued nan arithmetic, which IMHO 
is quite bad.

Here, it would make sense to me to lump all the different complex nans 
into a single cnan, as far as the arithmetic comparison operations are 
concerned. Then,

z OP cnan == False

for all comparison operations.

In 1.4.1 we have

>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> x = np.complex64(complex(np.nan, 1))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
False
>>> x < y
False
>>> x = np.complex64(complex(1, np.nan))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
True
>>> x < y
False

which seems an obscure mix of real-valued nan arithmetic and lexical 
ordering -- I don't think it's the correct choice...

Of course, the practical importance of this decision approaches zero, but 
it would be nice to be consistent.

***

2)

For maximum/amax, strict lexical order contradicts nan propagation:

maximum(1+nan*j, 2+0j) == 2+0j  ???

I don't see why we should follow the lexical order when both arguments 
are nans. The implementation will be faster if we don't.

Also, this way argmax (which should be nan-propagating) can stop looking 
once it finds the first nan -- and it does not need to care if later on 
in the array there would be a greater nan.

***

3)

For sort/searchsorted we have a technical reason to do something more, 
and there the strict lexical order seems the correct decision.

For `argmax` it was possible to be compatible with `amax` when lumping 
cnans in maximum -- just return the first cnan.

***

4)

As far as np.isnan is concerned,

>>> np.isnan(complex(0, nan))
True
>>> np.isnan(complex(nan, 0))
True
>>> np.isnan(complex(nan, nan))
True

So I think nobody should care which complex nan a function such as 
maximum or amax returns. 

We can of course give up some performance to look for the greatest nan 
in these cases, but I do not think that it would be very worthwhile.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Complex nan ordering

2010-07-19 Thread Pauli Virtanen
  However, nans have been propagated by maximum and minimum since 1.4.0.
 There was a question, discussed on the list, as to what 'nan' complex to
 return in the propagation, but it was still a nan complex in your
 definition of such objects. The final choice was driven by using the
 first of the already available complex nans as it was the easiest thing
 to do. However, (nan, nan) would be just as easy to do now that nans are
 available. 

Then we would be creating an inconsistency between amax/argmax. Of course if we 
say all cnans are equivalent, it doesn't matter.

  I'm not sure what your modifications to the macros buy us --
  what do you want to achieve?

1) fix bugs in nan propagation,

maximum(1, complex(0, nan))

used to return 1. The result also depended previously on the order of arguments.

2) make complex nan behave similarly to a real nan in comparisons (non-sorting).

I think both of these are worthwhile.

Pauli

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Complex128

2010-07-19 Thread Pauli Virtanen
Sun, 18 Jul 2010 21:15:15 -0500, Ross Harder wrote:

 mac os x leopard 10.5..
 EPD installed
 
 I just don't understand why I get one thing when I ask for another. I
 can get what I want, but only by not asking for it.

Do you get the same behavior also from

import numpy as np
np.array([0,0], dtype=np.complex256)

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread Pauli Virtanen
Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:

 I have an recarray -- the first column is date.
 
 I have the following function to compute the number of unique dates in
 my data set:
 
 
 def byName(): return(len(list(set(d['Date'])) ))

What this code does is:

1. d['Date']

   Extract an array slice containing the dates. This is fast.

2. set(d['Date'])

   Make copies of each array item, and box them into Python objects. 
   This is slow.

   Insert each of the objects in the set. Also this is somewhat slow.

3. list(set(d['Date']))

   Get each item in the set, and insert them to a new list.
   This is somewhat slow, and unnecessary if you only want to
   count.

4. len(list(set(d['Date'])))

   Take the length of the list. This is fast.

So the slowness arises because the code is copying data around, and 
boxing it into Python objects.

You should try using Numpy functions (these don't re-box the data) to do 
this. http://docs.scipy.org/doc/numpy/reference/routines.set.html
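
For example, counting the unique dates without boxing could look like
this (a sketch, assuming d['Date'] is an ordinary array column):

def byName():
    # np.unique sorts and deduplicates in C, without creating a
    # Python object per element
    return len(np.unique(d['Date']))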

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Pauli Virtanen
Thu, 22 Jul 2010 00:47:20 -0400, Robin Kraft wrote:
[clip]
 Let's say the image looks like this: np.random.randint(0,2,
 16).reshape(4,4)
 
 array([[0, 0, 0, 1],
[0, 0, 1, 1],
[1, 1, 0, 0],
[0, 0, 0, 0]])
 
 I want to use a square, non-overlapping moving window for resampling, so
 that I get a count of all the 1's in each 2x2 window.
 
  0, 0,   0, 1
  0, 0,   1, 1        0  3
                  =>
  1, 1,   0, 0        2  0
  0, 0,   0, 0
 
 In another situation with similar data I'll need the average, or the
 maximum value, etc..

Non-overlapping windows can be done by reshaping:

x = np.array([[0, 0, 0, 1, 1, 1],
  [0, 0, 1, 1, 0, 0],
  [1, 1, 0, 0, 1, 1],
  [0, 0, 0, 0, 1, 1],
  [1, 0, 1, 0, 1, 1],
  [0, 0, 1, 0, 0, 0]])

y = x.reshape(3,2,3,2)
y2 = y.sum(axis=3).sum(axis=1)

# -> array([[0, 3, 2],
#           [2, 0, 4],
#           [1, 2, 2]])

y2 = x.reshape(3,2,3,2).transpose(0,2,1,3).reshape(3,3,4).sum(axis=-1)

# -> array([[0, 3, 2],
#           [2, 0, 4],
#           [1, 2, 2]])
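
The other summaries mentioned (average, maximum) follow the same
pattern, e.g. (a sketch):

y = x.reshape(3, 2, 3, 2)
block_max = y.max(axis=3).max(axis=1)      # maximum of each 2x2 block
block_mean = y.mean(axis=3).mean(axis=1)   # average of each 2x2 block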


The above requires no copying of data, and should be relatively fast. If 
you need overlapping windows, those can be emulated with strides:

http://mentat.za.net/numpy/scipy2009/stefanv_numpy_advanced.pdf
http://conference.scipy.org/scipy2010/slides/tutorials/stefan_vd_walt_numpy_advanced.pdf

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Can't get ufunc to work for integers

2010-07-22 Thread Pauli Virtanen
Thu, 22 Jul 2010 08:49:09 -0700, John Salvatier wrote:
 I am trying to learn how to create ufuncs, and I got a ufunc to compile
 correctly with the signature int -> double, but I can't get it to accept
 any arguments. My function is testfunc and I used NPY_INT as the first
 signature and NPY_DOUBLE as the second signature. What should I look at
 to figure this out?

I think you need to specify NPY_LONG instead of NPY_INT -- the NPY_INT is 
IIRC sort of a virtual type that you can use when creating arrays, but 
no array is ever of type NPY_INT, since it maps to either NPY_LONG or 
whatever the C int type happens to be.

Maybe someone can correct me here, but AFAIK it goes like so.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subtract.reduce behavior

2010-07-22 Thread Pauli Virtanen
Thu, 22 Jul 2010 15:00:50 -0500, Johann Hibschman wrote:
[clip]
 Now, I'm on an older version (1.3.0), which might be the problem, but
 which is correct here, the code or the docs?

The documentation is incorrect.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Arrays of Python Values

2010-07-23 Thread Pauli Virtanen
Fri, 23 Jul 2010 01:16:41 -0700, Ian Mallett wrote:
[clip]
 Because I've never used arrays of Python objects (and Googling didn't
 turn up any examples), I'm stuck on how to sort the corresponding array
 in NumPy in the same way.

I doubt you will gain any speed by switching from Python lists to Numpy 
arrays containing Python objects.

 Of course, perhaps I'm just trying something that's absolutely
 impossible, or there's an obviously better way.  I get the feeling that
 having no Python objects in the NumPy array would speed things up even
 more, but I couldn't figure out how I'd handle the different attributes
 (or specifically, how to keep them together during a sort).
 
 What're my options?

One option could be to use structured arrays to store the data, instead 
of Python objects.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subtract.reduce behavior

2010-07-23 Thread Pauli Virtanen
Fri, 23 Jul 2010 10:29:47 -0400, Alan G Isaac wrote:
[clip]
  >>> np.subtract.reduce([])
  0.0
 
 Getting a right identity for an empty array is surprising. Matching
 Python's behavior (raising a TypeError) seems desirable. (?)

I don't think matching Python's behavior is a sufficient argument for a 
change. As far as I see, it'd mostly cause unnecessary breakage, with no 
significant gain.

Besides, it's rather common to define

sum_{z in Z} z = 0
prod_{z in Z} z = 1

if Z is an empty set -- this can then be extended to other reduction 
operations. Note that changing reduce behavior would require us to 
special-case the above two operations.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subtract.reduce behavior

2010-07-23 Thread Pauli Virtanen
Fri, 23 Jul 2010 11:17:56 -0400, Alan G Isaac wrote:
[clip]
 I also do not understand why there would have to be any special cases.

That's a technical issue: e.g. prod() is implemented via 
np.multiply.reduce, and it is not clear to me whether it is possible, in 
the ufunc machinery, to leave the identity undefined or whether it is 
needed in some code paths (as the right identity).

It's possible to define binary Ufuncs without an identity element (e.g. 
scipy.special.beta), so in principle the machinery to do the right thing 
is there.

 Returning a *right* identity for an operation that is otherwise a *left*
 fold is very odd, no matter how you slice it. That is what looks like
 special casing...

I think I see your point now.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

