Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-15 Thread Derek Homeier
On 15 Feb 2016, at 6:55 pm, Jeff Reback  wrote:
> 
> https://github.com/pydata/pandas/releases/tag/v0.18.0rc1

Ah, I think I forgot about the ‘releases’ pages.
Built on OS X 10.10 + 10.11 with python 2.7.11, 3.4.4 and 3.5.1.
17 errors in the test suite + 1 failure with python2.7 only; I can send
you details on the errors if desired, but the majority seem to stem from
a generic urllib problem with openssl on OS X anyway.

Thanks for the good work

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

2016-02-15 Thread Derek Homeier
On 14 Feb 2016, at 1:53 am, Jeff Reback  wrote:
> 
> I'm pleased to announce the availability of the first release candidate of 
> Pandas 0.18.0.
> Please try this RC and report any issues here: Pandas Issues
> We will be releasing officially in 1-2 weeks or so.
> 
Thanks, looking forward to giving this a try!
Do you have a download link to the source for non-Conda users and packagers?
Finding anything in the github source tarball repositories without having the
exact path seems hopeless.

Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.11.0b1 is out

2016-02-01 Thread Derek Homeier

> On 31 Jan 2016, at 9:48 am, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> 
> On Sa, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
>> On 27 Jan 2016, at 1:10 pm, Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>> 
>>> On Mi, 2016-01-27 at 11:19 +, Nadav Horesh wrote:
>>>> Why the dot function/method is slower than @ on python 3.5.1?
>>>> Tested
>>>> from the latest 1.11 maintenance branch.
>>>> 
>>> 
>>> The explanation I think is that you do not have a blas
>>> optimization. In
>>> which case the fallback mode is probably faster in the @ case
>>> (since it
>>> has SSE2 optimization by using einsum, while np.dot does not do
>>> that).
>> 
>> I am a bit confused now, as A @ c is just short for A.__matmul__(c)
>> or equivalent
>> to np.matmul(A,c), so why would these not use the optimised blas?
>> Also, I am getting almost identical results on my Mac, yet I thought
>> numpy would
>> by default build against the VecLib optimised BLAS. If I build
>> explicitly against
>> ATLAS, I am actually seeing slightly slower results.
>> But I also saw these kind of warnings on the first timeit runs:
>> 
>> %timeit A.dot(c)
>> The slowest run took 6.91 times longer than the fastest. This could
>> mean that an intermediate result is being cached
>> 
>> and when testing much larger arrays, the discrepancy between matmul
>> and dot rather
>> increases, so perhaps this is more an issue of a less memory
>> -efficient implementation
>> in np.dot?
> 
> Sorry, I missed the fact that one of the arrays was 3D. In that case I
> am not even sure which of the functions call into blas or what else
> they have to do, would have to check. Note that `np.dot` uses a
> different type of combining high dimensional arrays. @/matmul
> broadcasts extra axes, while np.dot will do the outer combination of
> them, so that the result is:
> 
> As = A.shape
> As.pop(-1)
> cs = c.shape
> cs.pop(-2)  # if possible
> result_shape = As + cs
> 
> which happens to be identical if only A.ndim > 2 and c.ndim <= 2.

Makes sense now; with A.ndim = 2 both operations take about the same time
(and are ~50% faster with VecLib than with ATLAS) and yield identical results,
while any additional dimension in A adds more overhead time to np.dot,
and the results are np.allclose, but not exactly identical.
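
To make the shape rules above concrete, here is a minimal sketch (the array
shapes are chosen purely for illustration) of where the two functions diverge:

import numpy as np

A = np.ones((4, 3, 5))
c = np.ones((4, 5, 2))

# @/matmul broadcasts the leading axes like an element-wise operation:
print((A @ c).shape)        # -> (4, 3, 2)
# np.dot combines the stacks as an outer product instead:
print(np.dot(A, c).shape)   # -> (4, 3, 4, 2)

# with c.ndim <= 2 both rules coincide and the shapes agree:
b = np.ones((5, 2))
print((A @ b).shape, np.dot(A, b).shape)   # -> (4, 3, 2) (4, 3, 2)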

Thanks,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.11.0b1 is out

2016-01-30 Thread Derek Homeier
On 27 Jan 2016, at 1:10 pm, Sebastian Berg  wrote:
> 
> On Mi, 2016-01-27 at 11:19 +, Nadav Horesh wrote:
>> Why the dot function/method is slower than @ on python 3.5.1? Tested
>> from the latest 1.11 maintenance branch.
>> 
> 
> The explanation I think is that you do not have a blas optimization. In
> which case the fallback mode is probably faster in the @ case (since it
> has SSE2 optimization by using einsum, while np.dot does not do that).

I am a bit confused now, as A @ c is just short for A.__matmul__(c) or
equivalent to np.matmul(A,c), so why would these not use the optimised blas?
Also, I am getting almost identical results on my Mac, yet I thought numpy would
by default build against the VecLib optimised BLAS. If I build explicitly
against ATLAS, I am actually seeing slightly slower results.
But I also saw these kind of warnings on the first timeit runs:

%timeit A.dot(c)
The slowest run took 6.91 times longer than the fastest. This could mean that 
an intermediate result is being cached

and when testing much larger arrays, the discrepancy between matmul and dot
rather increases, so perhaps this is more an issue of a less memory-efficient
implementation in np.dot?
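
For reference, a sketch of the kind of timing comparison quoted above (the
array shapes here are assumptions, not the ones from the original report):

import numpy as np
import timeit

A = np.random.rand(100, 100, 100)
c = np.random.rand(100, 100)

# in IPython:  %timeit A.dot(c)   and   %timeit A @ c
print(timeit.timeit(lambda: A.dot(c), number=10))
print(timeit.timeit(lambda: A @ c, number=10))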

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] Numpy 1.11.0b1 is out

2016-01-27 Thread Derek Homeier
On 27 Jan 2016, at 2:58 AM, Charles R Harris  wrote:
> 
> FWIW, the maintenance/1.11.x branch (there is no tag for the beta?) builds 
> and passes all tests with Python 2.7.11
> and 3.5.1 on  Mac OS X 10.10.
> 
> 
> You probably didn't fetch the tags, if they can't be reached from the branch 
> head they don't download automatically. Try `git fetch --tags upstream`

Thanks, that did it. Successfully tested v1.11.0b1 on 10.11 and with Python 
2.7.8 and 3.4.1 on openSUSE 13.2 as well.

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] Numpy 1.11.0b1 is out

2016-01-26 Thread Derek Homeier
Hi Chuck,

> I'm pleased to announce that Numpy 1.11.0b1 is now available on sourceforge. 
> This is a source release as the mingw32 toolchain is broken. Please test it 
> out and report any errors that you discover. Hopefully we can do better with 
> 1.11.0 than we did with 1.10.0 ;)

the tarball seems to be incomplete, hope that does not bode ill ;-)

  adding 'build/src.macosx-10.10-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources.
executing numpy/core/code_generators/generate_numpy_api.py
error: [Errno 2] No such file or directory: 'numpy/core/code_generators/../src/multiarray/arraytypes.c.src'

> tar tvf /sw/src/numpy-1.11.0b1.tar.gz | grep arraytypes
-rw-rw-r-- charris/charris   62563 2016-01-21 20:38 numpy-1.11.0b1/numpy/core/include/numpy/ndarraytypes.h
-rw-rw-r-- charris/charris     981 2016-01-21 20:38 numpy-1.11.0b1/numpy/core/src/multiarray/arraytypes.h

FWIW, the maintenance/1.11.x branch (there is no tag for the beta?) builds and
passes all tests with Python 2.7.11 and 3.5.1 on Mac OS X 10.10.

Cheers,
Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Derek Homeier
On 16 Dec 2015, at 8:22 PM, Matthew Brett  wrote:
> 
>>> In [4]: %time testx = np.linalg.solve(testA, testb)
>>> CPU times: user 1min, sys: 468 ms, total: 1min 1s
>>> Wall time: 15.3 s
>>> 
>>> 
>>> so, it looks like you will need to buy a MKL license separately (which
>>> makes sense for a commercial product).
> 
> If you're on a recent Mac, I would guess that the default
> Accelerate-linked numpy / scipy will be in the same performance range
> as those linked to the MKL, but I am happy to be corrected.
> 
Getting around 30 s wall time here on a not so recent 4-core iMac, so that
would seem to fit (iirc Accelerate should actually largely be using the same
machine code as MKL).
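
A sketch of the benchmark being compared here (the problem size and the use of
random test data are assumptions on my part):

import numpy as np

n = 10000
testA = np.random.rand(n, n)
testb = np.random.rand(n)

# in IPython:  %time testx = np.linalg.solve(testA, testb)
testx = np.linalg.solve(testA, testb)
print(np.allclose(testA.dot(testx), testb))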

Cheers,
Derek



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deprecate fromstring() for text reading?

2015-11-04 Thread Derek Homeier
On 3 Nov 2015, at 6:03 pm, Chris Barker - NOAA Federal  
wrote:
> 
> I was more aiming to point out a situation where the NumPy's text file reader 
> was significantly better than the Pandas version, so we would want to make 
> sure that we properly benchmark any significant changes to NumPy's text 
> reading code. Who knows where else NumPy beats Pandas?
> Indeed. For this example, I think a fixed-with reader really is a different 
> animal, and it's probably a good idea to have a high performance one in 
> Numpy. Among other things, you wouldn't want it to try to auto-determine data 
> types or anything like that.
> 
> I think what's on the table now is to bring in a new delimited reader -- I.e. 
> CSV in its various flavors.
> 
To add my own handful of change, or at least another data point: I had been
looking into both the pandas and the Astropy fast readers as a fast
loadtxt/genfromtxt replacement; at the time I found the Astropy cparser source
somewhat easier to dig into, although looking now Pandas' parser.pyx seems
clear enough as well.
Some comparison of the two can be found at
http://astropy.readthedocs.org/en/stable/io/ascii/fast_ascii_io.html#speed-gains

Unfortunately the Astropy fast reader currently does not support fixed-width
format either, and adding this functionality would require modifications to
the tokenizer C code - not sure how extensive.
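
As an aside, np.genfromtxt can already handle simple fixed-width files by
passing the field widths as the delimiter; a minimal sketch (the file name and
widths are made up):

import numpy as np

data = np.genfromtxt('fixed.dat', delimiter=(8, 8, 12))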

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: HDF5 for Python 2.5.0

2015-04-09 Thread Derek Homeier
On 9 Apr 2015, at 9:41 pm, Andrew Collette andrew.colle...@gmail.com wrote:
 
 Congrats! Also btw, you might want to switch to a new subject line format
 for these emails -- the mention of Python 2.5 getting hdf5 support made me
 do a serious double take before I figured out what was going on, and 2.6 and
 2.7 will be even worse :-)
 
 Ha!  Didn't even think of that.  For our next release I guess we'll
 have to go straight to h5py 3.5.

You may have to hurry though ;-)
“Monday, March 30, 2015
Python 3.5.0a3 has been released. This is the third alpha release of Python
3.5, which will be the next major release of Python. Python 3.5 is still under
heavy development and is far from complete.”

3 alpha releases in 7 weeks…

On a more serious note though, h5py 2.5.x in the subject would be perfectly
clear, I think, and would also help to distinguish these from pytables releases.

Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt

2014-10-26 Thread Derek Homeier
On 26 Oct 2014, at 02:21 pm, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com 
wrote:

 Im not sure why the memory doubling is necessary. Isnt it possible to 
 preallocate the arrays and write to them? I suppose this might be inefficient 
 though, in case you end up reading only a small subset of rows out of a 
 mostly corrupt file? But that seems to be a rather uncommon corner case.
 
 Either way, id say a doubling of memory use is fair game for numpy. 
 Generality is more important than absolute performance. The most important 
 thing is that temporary python datastructures are avoided. That shouldn't be 
 too hard to accomplish, and would realize most of the performance and memory 
 gains, I imagine.

Preallocation is not straightforward because the parser needs to be able, in
general, to work with streamed input.
I think I even still have a branch on github bypassing this on request (by
keyword argument).
But a factor 2 is already a huge improvement over the factor ~6 coming from
the current text readers buffering the entire input as a list of lists of
Python strings, not to speak of the vast performance gain from using a parser
implemented in C like pandas’ - in fact, one of the last times this subject
came up, one suggestion was to steal pandas.read_csv and adapt it as required.
Someone also posted some code, or a draft thereof, for using resizable arrays
quite a while ago, which would reduce the memory overhead for very large
arrays.
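
A minimal sketch of that resizable-array idea (not any actual numpy code, and
assuming comma-separated float input): the array is grown by doubling, which
caps the temporary overhead at roughly a factor of two:

import numpy as np

def read_floats(stream, ncols):
    data = np.empty((1024, ncols))
    nrows = 0
    for line in stream:
        if nrows == data.shape[0]:          # array full: double its size
            grown = np.empty((2 * data.shape[0], ncols))
            grown[:nrows] = data
            data = grown
        data[nrows] = [float(x) for x in line.split(',')]
        nrows += 1
    return data[:nrows].copy()              # trim the unused tail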

Cheers,
Derek



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-04 Thread Derek Homeier
On 4 Oct 2014, at 08:37 pm, Ariel Rokem aro...@gmail.com wrote:

 >>> import numpy as np
 >>> np.__version__
 '1.9.0'
 >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
 [array([[ 2.,  2., -1.],
        [ 2.,  2., -1.]]), array([[-0.5,  2.5,  5.5],
        [ 1. ,  1. ,  1. ]])]
 
 On the other hand: 
 >>> import numpy as np
 >>> np.__version__
 '1.8.2'
 >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
 [array([[ 2.,  2., -1.],
        [ 2.,  2., -1.]]), array([[ 1. ,  2.5,  4. ],
        [ 1. ,  1. ,  1. ]])]
 
 For what it's worth, the 1.8 version of this function seems to be in 
 agreement with the Matlab equivalent function ('gradient'): 
 >> gradient([[1, 2, 6]; [3, 4, 5]])
 ans =
     1.0000    2.5000    4.0000
     1.0000    1.0000    1.0000
 
 This seems like a regression to me, but maybe it's an improvement? 
 
Technically yes, the function has been changed to use 2nd-order differences
where possible, as described in the docstring. Someone forgot to update the
example though, which still quotes the 1.8 results.
And if the loss of Matlab-compliance is seen as a disadvantage, maybe there is
a case for re-enabling the old behaviour via a keyword argument?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-04 Thread Derek Homeier

Hi Ariel,

 I think that the docstring in 1.9 is fine (has the 1.9 result). The docs 
 online (for all of numpy) are still on version 1.8, though. 
 
 I think that enabling the old behavior might be useful, if only so that I can 
 write code that behaves consistently across these two versions of numpy. For 
 now, I might just copy over the 1.8 code into my project. 
 
Hmm, I got this with 1.9.0:

Examples
--------
>>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> np.gradient(x)
array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])
>>> np.gradient(x, 2)
array([ 0.5 ,  0.75,  1.25,  1.75,  2.25,  2.5 ])

>>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
[array([[ 2.,  2., -1.],
        [ 2.,  2., -1.]]),
 array([[ 1. ,  2.5,  4. ],
        [ 1. ,  1. ,  1. ]])]

In [5]: x =np.array([1, 2, 4, 7, 11, 16], dtype=np.float)

In [6]: print(np.gradient(x))
[ 0.5  1.5  2.5  3.5  4.5  5.5]

In [7]: print(np.gradient(x, 2))
[ 0.25  0.75  1.25  1.75  2.25  2.75]
…

I think there is a point for supporting the old behaviour besides
backwards-compatibility or any sort of Matlab-compliance, as I’d probably like
to be able to restrict a function to linear/1st-order differences in cases
where I know the input to be not well-behaved.

+1 for an order=2 or maxorder=2 flag
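
As a sketch of what the two edge treatments amount to for the first sample of
the example array above (with unit spacing; the order/maxorder keyword itself
is only a proposal here):

import numpy as np

x = np.array([1, 2, 4, 7, 11, 16], dtype=float)

first_order = x[1] - x[0]                       # 1.0, the 1.8.x edge value
second_order = (-3*x[0] + 4*x[1] - x[2]) / 2    # 0.5, the 1.9.0 edge value
print(first_order, second_order)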

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters

2014-08-27 Thread Derek Homeier
On 26 Aug 2014, at 09:05 pm, Adrian Altenhoff adrian.altenh...@inf.ethz.ch 
wrote:

 But you are right that the problem with using the first_values, which should 
 of course be valid,
 somehow stems from the use of usecols, it seems that in that loop
 
for (i, conv) in user_converters.items():
 
 i in user_converters and in usecols get out of sync. This certainly looks 
 like a bug, the entire way of
 modifying i inside the loop appears a bit dangerous to me. I’ll have look if 
 I can make this safer.
 Thanks.
 
 As long as your data don’t actually contain any missing values you might 
 also simply use np.loadtxt.
 Ok, wasn't aware of that function so far. I will try that!
 
It was first_values that needed to be addressed by the original indices.
I have created a short test from your case and submitted a fix at
https://github.com/numpy/numpy/pull/5006

Cheers,
Derek
 
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters

2014-08-26 Thread Derek Homeier
Hi Adrian,

 I tried to load data from a csv file into numpy using genfromtxt. I need
 only a subset of the columns and want to apply some conversions to the
 data. attached is a minimal script showing the error.
 In brief, I want to load columns 1,2 and 4. But in the converter
 function for the 4th column, I get the 3rd value. The issue does not
 occur if I also load the 3rd column.
 Did I somehow misunderstand how the function is supposed to work or is
 this indeed a bug?

not sure whether to call it a bug; the error seems to arise before reading any
actual data (even on reading from an empty string): when genfromtxt is checking
the filling_values used to substitute missing or invalid data, it is apparently
testing on default testing values of 1 or -1, which your conversion scheme does
not know about. Although I think it is rather the user’s responsibility to
provide valid converters, probably the documentation should at least be updated
to make users aware of this requirement.
I see two possible fixes/workarounds (see also the sketch below):

provide a keyword argument filling_values=[0, 0, '1:1']
or add the default filling values to your relEnum dictionary, e.g. { … '-1':-1, '1':-1}
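
A purely hypothetical sketch of the first workaround (the column indices,
delimiter, file name and relEnum mapping are made up here, since the original
attachment is not reproduced):

import numpy as np

relEnum = {'1:1': 0, '1:n': 1, 'n:1': 2, 'n:n': 3}

data = np.genfromtxt('data.csv', delimiter=',', usecols=(0, 1, 3),
                     converters={3: lambda rel: relEnum[rel.decode()]},
                     filling_values=[0, 0, '1:1'])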

Could you check if this works for your case?

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters

2014-08-26 Thread Derek Homeier
Hi Adrian,

 not sure whether to call it a bug; the error seems to arise before reading 
 any actual data
 (even on reading from an empty string); when genfromtxt is checking the 
 filling_values used
 to substitute missing or invalid data it is apparently testing on default 
 testing values of 1 or -1
 which your conversion scheme does not know about. Although I think it is 
 rather the user’s
 responsibility to provide valid converters, probably the documentation 
 should at least be
 updated to make them aware of this requirement.
 I see two possible fixes/workarounds:
 
 provide an keyword argument filling_values=[0,0,'1:1’]
 This workaround seems to work, but I doubt that the actual problem is
 the converter function I pass. The '-1', which is used as the testing
 value is the first_values from the 3rd column (line 1574 in npyio.py),
 but the converter is defined for column 4. by setting the filling_values
 to an array of length 3, this obviously makes the problem disappear. But
 I think if the first row is used, it should also use the values from the
 column for which the converter is defined.

it is certainly related to the converter function because a KeyError for the
dictionary you provide is raised:

  File "test.py", line 13, in <module>
    3: lambda rel: relEnum[rel.decode()]})
  File "/sw/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1581, in genfromtxt
    missing_values=missing_values[i],)
  File "/sw/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 784, in update
    tester = func(testing_value or asbytes('1'))
  File "test.py", line 13, in <lambda>
    3: lambda rel: relEnum[rel.decode()]})
KeyError: '-1'

But you are right that the problem with using the first_values, which should of
course be valid, somehow stems from the use of usecols; it seems that in that
loop

for (i, conv) in user_converters.items():

i in user_converters and in usecols get out of sync. This certainly looks like
a bug; the entire way of modifying i inside the loop appears a bit dangerous to
me. I’ll have a look if I can make this safer.

As long as your data don’t actually contain any missing values you might also
simply use np.loadtxt.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate

2014-08-05 Thread Derek Homeier
On 5 Aug 2014, at 11:27 pm, Matthew Brett matthew.br...@gmail.com wrote:

 OSX wheels built and tested and uploaded OK :
 
 http://wheels.scikit-image.org
 
 https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958
 
 Will test against the scipy stack later on today.

Built and tested against the Fink Python installation under OS X.
Seems to resolve one of a couple of f2py test errors appearing with 1.8.1 on
Python 3.3 and 3.4:

======================================================================
ERROR: test_return_real.TestCReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named 'c_ext_return_real'

is gone on 3.4 now but still present on 3.3. Two errors of this kind (with
different numbers) remain:

ERROR: test_return_real.TestF90ReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named '_test_ext_module_5415'

NumPy version 1.8.2rc1
NumPy is installed in /sw/lib/python3.4/site-packages/numpy
Python version 3.4.1 (default, Aug  3 2014, 21:02:44) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
nose version 1.3.3

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] length - sticks algorithm

2014-07-29 Thread Derek Homeier
On 29 Jul 2014, at 02:43 pm, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Jul 29, 2014 at 12:47 PM, Josè Luis Mietta 
 joseluismie...@yahoo.com.ar wrote:
 Robert, thanks for your help!
 
 Now I have: 
 
 * Q nodes (Q stick-stick intersections)
 * a list 'NODES'=[(x,y,i,j)_1, …, (x,y,i,j)_Q], where each element
 (x,y,i,j) represents the intersection point (x,y) of the sticks i and j.
 * a matrix 'H' with Q elements {H_k,l}.
 H_k,l = 0 if nodes 'k' and 'l' aren't joined by an edge, and H_k,l = R_k,l = the
 electrical resistance associated with the union of the nodes 'k' and 'l'
 (directly proportional to the length of the edge that connects these nodes).
 * a list 'nodes_resistances' = [R_1, …, R_Q].
 
 All nodes with 'j' (or 'i') = N+1 have a electric potential 'V' respect all 
 nodes with 'j' or 'i' = N.
 
 Now i must apply NODAL ANALYSIS for determinate the electrical current 
 through each of the edges, and the net current (see attached files). I have 
 no ideas about how to do that. Can you help me? 
 
 Please do not send largish binary attachments to this list. I do not know 
 off-hand how to do this, but it looks like the EE201 document you attached 
 tells you how. It is somewhat beyond the scope of this mailing list to help 
 you understand that document, sorry.
 
And it is not a good idea to post copyrighted journal articles to a list where
they will end up in a public list archive (even if not immediately
recognisable as such).

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] problems with mailing list ?

2014-07-18 Thread Derek Homeier
On 18 Jul 2014, at 01:07 pm, josef.p...@gmail.com wrote:

 Are the problems with sending out the messages with the mailing lists?
 
 I'm getting some replies without original messages, and in some threads I 
 don't get replies, missing part of the discussions.
 
There seem to be problems with the Scipy list server; my last mails to
astr...@scipy.org have taken 12-18 hours before they made it to the list, and
some people here reported messages staying in the void for several days. But I
think it’s been reported to Enthought already.

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] genfromtxt universal newline support

2014-06-30 Thread Derek Homeier
Hi all,

I was just having a new look into the mess that is, imo, the support for
automatic line-ending recognition in genfromtxt and, more generally, the Python
file openers.
I am glad at least reading gzip files is no longer entirely broken in Python 3,
but actually detecting in particular “old Mac” style CR line endings currently
only works for uncompressed and bzip2 files under 2.6/2.7.
This is largely because genfromtxt wants to open everything in binary mode,
which arguably makes no sense for ASCII text files with numbers. I think the
only reason this works in 2.x is that the ‘U’ reading mode overrides the ‘b’.

So on the Python side what actually works for automatic line-ending detection
is:

Python          2.6   2.7   3.2   3.3/3.4
uncompressed:   U     U     t     t
gzip:           E     N     E     t
bzip2:          U     U     E     t*
lzma:           -     -     -     t*

U - works with mode ‘rU’
E - mode ‘rU’ raises an error
N - mode ‘rU’ is accepted, but does not detect CR (‘\r’) line endings
    (actually I think ‘U’ is simply internally discarded by gzip.open() in 2.7.4+)
t - works with mode ‘rt’ (default with plain open())
* - requires the '.open()' rather than the '.XXXFile()' method of bz2/lzma
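
For reference, a sketch of the text-mode openers corresponding to the ‘t’
column above (Python 3.3+ only; the file names are placeholders):

import gzip
import bz2
import lzma

with open('data.txt', 'rt') as f:           # plain file, universal newlines
    lines = f.readlines()
with gzip.open('data.txt.gz', 'rt') as f:   # gzip text mode, new in 3.3
    lines = f.readlines()
with bz2.open('data.txt.bz2', 'rt') as f:   # bz2.open, new in 3.3
    lines = f.readlines()
with lzma.open('data.txt.xz', 'rt') as f:   # lzma, new in 3.3
    lines = f.readlines()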

Therefore I’d propose the changes in
https://github.com/dhomeier/numpy/commit/995ec93

to extend universal newline recognition as far as possible with the above
openers.
There are some potential issues with this:

1. Switching to ‘rt’ mode for Python 3.x means that np.lib._datasource.open()
does not return byte strings by itself, so genfromtxt has to use asbytes() on
the returned lines.
Since this occurs only in two places, I don’t see a major problem with this.
2. In the tests I had to work around the lack of fileobj support in bz2.BZ2File
by using os.system(‘bzip2 …’) on the temporary file, which might not work on
all systems.
In particular I’d expect it to fail under Windows, but it’s not clear to me how
far the entire mkstemp thing works under Windows...

As a final note, http://bugs.python.org/issue13989#msg153127 suggests a
workaround that might make this work with gzip.open() (and perhaps bz2?) on 3.2
as well.
I am not sure how high 3.2 support is ranking for the near future; for the
moment I am not strongly inclined to implement it…

Grateful for comments or tests (especially under Windows!) of the commit(s) 
above -

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt universal newline support

2014-06-30 Thread Derek Homeier

On 30 Jun 2014, at 04:39 pm, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jun 30, 2014 at 12:33 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
 genfromtxt and loadtxt need an almost full rewrite to fix the botched
 python3 conversion of these functions. There are a couple threads
 about this on this list already.
 There are numerous PRs fixing stuff in these functions which I
 currently all -1'd because we need to fix the underlying unicode
 issues first.
 I have a PR were I started this for loadtxt but it is incredibly
 annoying to try to support all the broken use cases the function
 accidentally supported.
 
 1.9 beta still uses the broken functions because I had no time to get
 this done correctly.
 But we should probably put a big fat future warning into the release
 notes that genfromtxt and loadtxt may stop working for your binary
 streams.

What binary streams?

 That will probably allow us to start fixing these functions.
 
 +1 to doing the proper fix instead of piling up buggy hacks. Do we
 understand the difference between the current code and the proper
 code well enough to detect cases where they differ and issue warnings
 in those cases specifically?

Does it make sense to keep maintaining both functions at all? IIRC the idea
that loadtxt would be the faster version of the two has been discarded long
ago, thus it seems there is very little, if anything, loadtxt can do that
cannot be done just as well by genfromtxt. The main compatibility issue is
probably the different default behaviour and interface of the two, but perhaps
that might be best solved by replacing loadtxt with another genfromtxt wrapper?
A real need, which had also been discussed at length, is a truly performant
text IO function (i.e. one using a compiled ASCII number parser, and optimally
also a more memory-efficient one), but unfortunately all people interested in
implementing this seem to have drifted away (not excluding myself from this)…

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt universal newline support

2014-06-30 Thread Derek Homeier
On 30 Jun 2014, at 04:56 pm, Nathaniel Smith n...@pobox.com wrote:

 A real need, which had also been discussed at length, is a truly performant 
 text IO
 function (i.e. one using a compiled ASCII number parser, and optimally also 
 a more
 memory-efficient one), but unfortunately all people interested in 
 implementing this
 seem to have drifted away (not excluding myself from this)…
 
 It's possible we could steal some code from Pandas for this. IIRC they
 have C/Cython text parsing routines. (It's also an interesting
 question whether they've fixed the unicode/binary issues, might be
 worth checking before rewriting from scratch...)

Good point; last time I was playing with Pandas it was not any faster, but now
a 10x speedup speaks for itself. Their C engine does not support generic
whitespace separators, but that could probably be addressed in a numpy
implementation.

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt universal newline support

2014-06-30 Thread Derek Homeier
On 30.06.2014, at 23:10, Jeff Reback jeffreb...@gmail.com wrote:

 In pandas 0.14.0, generic whitespace IS parsed via the c-parser, e.g. 
 specifying '\s+' as a separator. Not sure when you were playing last with 
 pandas, but the c-parser has been in place since late 2012. (version 0.8.0)
 
 http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#text-parsing-api-changes

Ah, I did not see the '\s' syntax in the documentation and thought ' *' would
be the only option.
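
A quick sketch of what that looks like in practice ('table.txt' is a
placeholder for a whitespace-delimited file):

import pandas as pd
import numpy as np

df = pd.read_csv('table.txt', sep=r'\s+', header=None)  # handled by the C parser
arr = np.asarray(df)                                    # back to a numpy array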

Thanks,

Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.7.2rc1 release

2013-11-15 Thread Derek Homeier
On 13.11.2013, at 3:07AM, Charles R Harris charlesr.har...@gmail.com wrote:

 Python 2.4 fixes at  https://github.com/numpy/numpy/pull/4049.

Thanks for the fixes; builds under OS X 10.5 now as well. There are two test
errors (or maybe a nose problem?):

NumPy version 1.7.2rc1
NumPy is installed in /sw/src/fink.build/root-numpy-py24-1.7.2rc1-1/sw/lib/python2.4/site-packages/numpy
Python version 2.4.4 (#1, Jan  5 2011, 03:05:41) [GCC 4.0.1 (Apple Inc. build 5493)]
nose version 1.3.0
...
ERROR: Failure: SkipTest (Skipping test: test_special_values
Numpy is using complex functions (e.g. sqrt) provided by yourplatform's C
library. However, they do not seem to behave accordingto C99 -- so C99 tests
are skipped.)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.4/site-packages/nose/failure.py", line 37, in runTest
    if isinstance(self.exc_val, BaseException):
NameError: global name 'BaseException' is not defined

======================================================================
ERROR: Failure: SkipTest (Skipping test: test_special_values
Numpy is using complex functions (e.g. sqrt) provided by yourplatform's C
library. However, they do not seem to behave accordingto C99 -- so C99 tests
are skipped.)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.4/site-packages/nose/failure.py", line 37, in runTest
    if isinstance(self.exc_val, BaseException):
NameError: global name 'BaseException' is not defined



Cheers,
Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.7.2rc1 release

2013-11-12 Thread Derek Homeier
Hi,

On 03.11.2013, at 5:42PM, Julian Taylor jtaylor.deb...@googlemail.com wrote:

 I'm happy to announce the release candidate of Numpy 1.7.2.
 This is a bugfix only release supporting Python 2.4 - 2.7 and 3.1 - 3.3.

on OS X 10.5, build and tests succeed for Python 2.5-3.3, but Python 2.4.4 
fails with

/sw/bin/python2.4 setup.py build
Running from numpy source directory.
Traceback (most recent call last):
  File "setup.py", line 214, in ?
    setup_package()
  File "setup.py", line 191, in setup_package
    from numpy.distutils.core import setup
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/core.py", line 25, in ?
    from numpy.distutils.command import config, config_compiler, \
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/command/build_ext.py", line 16, in ?
    from numpy.distutils.system_info import combine_paths
  File "/sw/src/fink.build/numpy-py24-1.7.2rc1-1/numpy-1.7.2rc1/numpy/distutils/system_info.py", line 235
    finally:
          ^
SyntaxError: invalid syntax

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Removal of numarray and oldnumeric packages.

2013-09-23 Thread Derek Homeier
On 23.09.2013, at 7:03PM, Charles R Harris charlesr.har...@gmail.com wrote:

 I have gotten no feedback on the removal of the numarray and oldnumeric 
 packages. Consequently the removal will take place on 9/28. Scream now or 
 never...

The only thing I'd care about is the nd_image subpackage, but as far as I can
see, that's already just a wrapper to import scipy.ndimage. I take it there are
no pure numpy implementations for the likes of map_coordinates, right?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt and gzip

2013-06-11 Thread Derek Homeier
On 05.06.2013, at 9:52AM, Ted To rainexpec...@theo.to wrote:

 From the list archives (2011), I noticed that there is a bug in the
 python gzip module that causes genfromtxt to fail with python 2 but this
 bug is not a problem for python 3.  When I tried to use genfromtxt and
 python 3 with a gzip'ed csv file, I instead got:
 
 IOError: Mode rbU not supported
 
 Is this a bug?  I am using python 3.2.3 and numpy 1.7.1 from the
 experimental Debian repository.

Interesting, it used to be the other way round indeed - at least Python3's gzip
module was believed to work with 'U' mode (universal newline conversion).
This was apparently fixed in Python 2.7.3:
http://bugs.python.org/issue5148

but from the closing comment I'd take it should indeed _not_ be used in Python 3:

"The data corruption issue is now fixed in the 2.7 branch.

In 3.x, using a mode containing 'U' results in an exception rather than silent
data corruption.
Additionally, gzip.open() has supported text modes (rt/wt/at) and newline
translation since 3.3"

Checking the various Python versions on OS X 10.8 I found:

2.6.8: fails similarly to the older 2.x, i.e. gzip opens with 'rbU', but then
fails upon reading (possibly randomly) with

/sw/lib/python2.6/gzip.pyc in _read_eof(self)
    302         if crc32 != self.crc:
    303             raise IOError("CRC check failed %s != %s" % (hex(crc32),
--> 304                                                          hex(self.crc)))

2.7.5: works as to be expected with the resolution of 5148 above.

3.1.5: works as well, which could just mean that the exception mentioned above
has not made it into the 3.1.x branch…

3.2.5+3.3.2: gzip.open raises the exception as documented.

This looks like the original issue - that the universal newline conversion
should not be passed to gzip.open (where it is meaningless or even harmful) -
is still not resolved; ideally the 'U' flag should probably be caught in
_datasource.py.
I take it from the comments on issue 5148 that 3.3's gzip module offers
alternative methods to do the newline conversion, but for 3.1, 3.2 and 2.6 this
might still have to be done within either _datasource.py or genfromtxt; however
I have no idea if anyone has come up with a good solution for this by now…

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] printing array in tabular form

2013-05-10 Thread Derek Homeier
On 10.05.2013, at 1:20PM, Sudheer Joseph sudheer.jos...@yahoo.com wrote:

 If some one has a quick way I would like to learn from them or get a 
 referecence 
 where the formatting part is described which was 
 my intention while posting here. As I have been using fortran I just tried 
 to use it to explain my requirement
 
Admittedly the formatting options in Python can be confusing to beginners,
precisely because they are much more powerful than in many other languages. As
already pointed out, formats of the type '(5i5)' are very common in Fortran
programs and thus readily supported by that language. np.savetxt is just a
convenience function to support a number of similarly common output types, and
it can create csv, tab-separated, or plenty of other outputs from a numpy array
just out of the box.
But you added to the confusion as you did not make it clear that you were not
just requiring a plain csv file such as your Fortran example would create (and
the first version did not even have the commas); since this is a rather
non-standard form you will just have to write a short loop yourself, whether
you are using Fortran or Python.

  Infact the program which should read this file 
 requires it in specified format which should look like
 IL = 1,2,3,4,5
  1,2,3,4,5
  1,2,3,4,5
 
The formats are all documented at
http://docs.python.org/2/library/string.html#format-specification-mini-language
One important thing to know is that you can pretty much add (i.e. concatenate)
them like strings:

print(("%6s"+4*"%d,"+"%d\n") % (("IL = ",)+tuple(IL[:5])))

or, perhaps a bit clearer:

fmt = "%6s"+4*"%d,"+"%d\n"
print_t = ("IL = ",)+tuple(IL[:5])
print(fmt % print_t)

The other important bit to keep in mind is that all arguments have to be passed
as tuples.
This should allow you to write a loop to print with a header or an empty header
column for the subsequent lines as you see fit.
Except for the string field, which is explicitly formatted with %s here, this
is mostly equivalent to the example Henry just posted.

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] printing array in tabular form

2013-05-10 Thread Derek Homeier
On 10.05.2013, at 2:51PM, Daniele Nicolodi dani...@grinta.net wrote:

 If you wish to format numpy arrays preceding them with a variable name,
 the following is a possible solution that gives the same formatting as
 in your example:
 
 import numpy as np
 import sys
 
 def format(out, v, name):
     header = "{} = ".format(name)
     out.write(header)
     np.savetxt(out, v, fmt="%d", delimiter=", ",
                newline="\n" + " " * len(header))
     out.write("\n")
 
 IL = np.array([range(5), ] * 5)
 format(sys.stdout, IL, "IL")

That is a quite ingenious way to use savetxt functionality to write that extra
column!

Only two comments:

Don't call that function format, as it would mask the 'format' builtin!

In the present version it will only work with a file handle; to print to a
file you would need to pass it as format(open(fname, 'a'), … or check for that
case inside the function.

Cheers,
Derek
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] printing array in tabular form

2013-05-07 Thread Derek Homeier
Dear Sudheer,

On 07.05.2013, at 11:14AM, Sudheer Joseph sudheer.jos...@yahoo.com wrote:

 I need to print few arrays in a tabular form for example below 
 array IL has 25 elements, is there an easy way to print this as 5x5 comma 
 separated table? in python
 
 IL=[]
 for i in np.arange(1,bno+1):
IL.append(i)
 print(IL)
 
assuming you want this table printed to a file, savetxt does just what you
need. In brief, for your case

np.savetxt("file.txt", np.array(IL).reshape(-1, 5), fmt='%5d', delimiter=',')

should print it in the requested form; you can refer to the savetxt
documentation for further options.

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Please stop bottom posting!!

2013-04-12 Thread Derek Homeier
On 12.04.2013, at 2:14AM, Charles R Harris charlesr.har...@gmail.com wrote:

 On Thu, Apr 11, 2013 at 5:49 PM, Colin J. Williams cjwilliam...@gmail.com 
 wrote:
 On 11/04/2013 7:20 PM, Paul Hobson wrote:
 On Wed, Apr 3, 2013 at 4:28 PM, Doug Coleman doug.cole...@gmail.com wrote:
 
 Also, gmail bottom-posts by default. It's transparent to gmail users. I'd 
 imagine they are some of the biggest offenders.
 
 Interesting. Mine go to the top by default and I always have to expand the 
 quoted text, trim down as necessary, and then reply below the relevant bits. 
 A quick gander at gmail's setting doesn't offer anything obvious. I'll dig 
 deeper later.
 
 
 ___
 NumPy-Discussion mailing list
 
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 Bottom posting seems to be the accepted Usenet standard.
 
 I don't care, can't someone can make a decision, so that we all do the same 
 thing?
 
 Please develop a rationale or toss a coin and let us know.  Numpy needs a 
 BDFL (or a shorter term, if you wish).
 
 
 It's always been bottom posting.

In German this kind of faux pas is usually labelled TOFU, for "text on top,
full quote underneath", and I think it has been a bit overlooked so far that
the full-quote part probably is the bigger problem.
IOW a call to try and trim the OP more rigorously should help a lot, and I'd
think most people then can agree on bottom posting (and I know the issue with
mail clients doing that automatically - the thread in question looks quite
readable in Mountain Lion's Mail.app, but is a nightmare on Snow Leopard!).

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Trouble building numpy on different version of OSX.

2013-02-14 Thread Derek Homeier
On 14.02.2013, at 3:55PM, Steve Spicklemire st...@spvi.com wrote:

 I got Xcode 4.6 from the App Store. I don't think it's the SDK since the 
 python 2.7 version builds fine. It's just the 3.2 version that doesn't have 
 the -I/Library/Frameworks/Python.Framework/Versions/3.2/include/python3.2m in 
 the compile options line. When I run setup for 2.7 I see the right include. 
 I'm just not sure where setup is building those options, and why they're not 
 working on 10.7 and 10.8 and python3.2. Strange!

Where did you get the python3.2 from? Building the 1.7.0 release works for me
under 10.8 and Xcode 4.6, both with the system-provided /usr/bin/python2.7 and
with fink-installed versions of python2.7 and python3.2, but in no case is it
linking or including any 10.6 SDK:

C compiler: gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

compile options: '-Inumpy/core/include 
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/include/numpy 
-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath 
-Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort 
-Inumpy/core/include -I/sw/include/python3.2m 
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/src/multiarray 
-Ibuild/src.macosx-10.8-x86_64-3.2/numpy/core/src/umath -c'

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how do I specify maximum line length when using savetxt?

2012-12-05 Thread Derek Homeier
On 06.12.2012, at 12:40AM, Mark Bakker wrote:

 I guess I wasn't explicit enough.
 Say I have an array with 100 numbers and I want to write it to a file
 with 6 numbers on each line (and hence, only 4 on the last line).
 Can I use savetxt to do that?
 What other easy tool does numpy have to do that?

I've just been looking into a similar case and I think there is no easy tool
for this - i.e. nothing comparable to Fortran's '(6e10.3)' or the like, so if
your array does not reshape to a Nx6 array, you'd probably have to write
something customised yourself. It would not be terribly difficult to add such
functionality to savetxt, but then, unless you want the output file to be more
human-readable, there is not really a strong case for writing a shape (100,)
array into 16 lines plus an incomplete one - it just would not play well with
reading back in and then determining the right shape automatically…
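
For what it's worth, a short sketch of getting 6 values per line plus an
incomplete last row out of savetxt anyway (the output file name and format are
arbitrary choices here):

import numpy as np

a = np.arange(100.)
ncol = 6
nfull = (a.size // ncol) * ncol

with open('out.txt', 'w') as f:
    np.savetxt(f, a[:nfull].reshape(-1, ncol), fmt='%10.3e')
    if nfull < a.size:                      # write the incomplete last line
        np.savetxt(f, a[nfull:].reshape(1, -1), fmt='%10.3e')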

HTH,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple Loadtxt question

2012-11-28 Thread Derek Homeier
On 29.11.2012, at 1:21AM, Robert Love wrote:

 I have a file with thousands of lines like this:
 
 Signal was returned in 204 microseconds
 Signal was returned in 184 microseconds
 Signal was returned in 199 microseconds
 Signal was returned in 4274 microseconds
 Signal was returned in 202 microseconds
 Signal was returned in 189 microseconds
 
 I try to read it like this:
 
 
 data = np.loadtxt('dummy.data', dtype={'names':('label','times','musec'), 
 'fmts':('|S23','i8','|S13')})
 
 It fails, I think, because it wants a string format and field for each of the 
 words 'Signal' 'was' 'returned' etc.
 
 Can I make it treat that whole string before the number as one string, one 
 field?  All I really care about is the numbers anyway.
 
Then how about 

np.loadtxt('dummy.data', usecols=(4, ))

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Synonym standards

2012-07-27 Thread Derek Homeier
On 27.07.2012, at 3:27PM, Benjamin Root wrote:

  I would prefer not to use:  from xxx import *,
 
  because of the name pollution.
 
  The name  convention that I copied above facilitates avoiding the pollution.
 
  In the same spirit, I've used:
  import pylab as plb
 
 But in that same spirit, using np and plt separately is preferred.
 
 
 Namespaces are one honking great idea -- let's do more of those!
 from http://www.python.org/dev/peps/pep-0020/
 
 Absolutely correct.  The namespace pollution is exactly why we encourage 
 converts to move over from the pylab mode to separating out the numpy and 
 pyplot namespaces.  There are very subtle issues that arise when doing from 
 pylab import * such as overriding the built-in any and all.  The only 
 real advantage of the pylab mode over separating out numpy and pyplot is 
 conciseness, which many matlab users expect at first.

It unfortunately also comes with the convenience of using the ipython --pylab
mode - does anyone know how to turn the import * part off, or how to create a
similar working environment with ipython that does keep namespaces clean?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Synonym standards

2012-07-27 Thread Derek Homeier
On 27 Jul 2012, at 17:58, Tony Yu wrote:

 On Fri, Jul 27, 2012 at 11:39 AM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
 On 27.07.2012, at 3:27PM, Benjamin Root wrote:
 
   I would prefer not to use:  from xxx import *,
  
   because of the name pollution.
  
   The name  convention that I copied above facilitates avoiding the 
   pollution.
  
   In the same spirit, I've used:
   import pylab as plb
 
  But in that same spirit, using np and plt separately is preferred.
 
 
  Namespaces are one honking great idea -- let's do more of those!
  from http://www.python.org/dev/peps/pep-0020/
 
  Absolutely correct.  The namespace pollution is exactly why we encourage 
  converts to move over from the pylab mode to separating out the numpy and 
  pyplot namespaces.  There are very subtle issues that arise when doing 
  from pylab import * such as overriding the built-in any and all.  The 
  only real advantage of the pylab mode over separating out numpy and pyplot 
  is conciseness, which many matlab users expect at first.
 
 It unfortunately also comes with the convenience of using the ipython 
 --pylab mode -
  does anyone know how to turn the import * part off, or how to create a 
 similar working
 environment with ipython that does keep namespaces clean?
 
 Cheers,
 Derek
 
 
  There's a config flag that you can add to your ipython profile:
 
 c.TerminalIPythonApp.pylab_import_all = False
 
 For example, my profile is in ~/.ipython/profile_default/ipython_config.py
 
thanks, that was exactly what I was looking for - together with 

c.TerminalIPythonApp.exec_lines = ['import sys',
   'import numpy as np',
   'import matplotlib as mpl',
   'import matplotlib.pyplot as plt']

etc. to have the shortcuts.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Synonym standards

2012-07-27 Thread Derek Homeier
On 27.07.2012, at 8:30PM, Fernando Perez fperez@gmail.com wrote:

 On Fri, Jul 27, 2012 at 9:43 AM, Derek Homeier
 de...@astro.physik.uni-goettingen.de wrote:
 thanks, that was exactly what I was looking for - together with
 
 c.TerminalIPythonApp.exec_lines = ['import sys',
   'import numpy as np',
   'import matplotlib as mpl',
   'import matplotlib.pyplot as plt']
 
 Note that if you do this only and don't use %pylab interactively or
 the --pylab flag, then you will *not* get the proper non-blocking
 control of the matplotlib event loop integrated with the terminal or
 qtconsole.
 
 In summary, following Tony's suggestion is enough to give you:
 
 - event loop integration when you do --pylab at the prompt or %pylab in 
 ipython.
 - the np, mpl and plt shortcuts
 - no 'import *' at all.
 
 So that should be sufficient, but you should still use --pylab or
 %pylab to indicate to IPython that you want the mpl event loops to
 work in conjunction with the shell.

Yes, I was aware of that; without the pylab option, at least with the macosx
backend, windows either would not draw and refresh properly, or would block the
shell after a draw() or show(); that's why I was asking how to avoid the
'import *' with it. I have not used the %pylab builtin before, though.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] silly isscalar question

2012-05-29 Thread Derek Homeier
On 29 May 2012, at 15:00, Mark Bakker wrote:

 Why does isscalar('hello') return True?
 
 I thought it would check for a number?

No, it checks for something that is of 'scalar type', which probably can be
translated as 'not equivalent to an array'. Since strings can form numpy
arrays, I guess the logic behind this is that the string is the atomic block of
an array of dtype 'S' - for comparison, np.isscalar(['hello']) = False.
I note the fine distinction between np.isscalar( ('hello') ) and
np.isscalar( ('hello'), )...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] silly isscalar question

2012-05-29 Thread Derek Homeier
On 29 May 2012, at 15:42, Nathaniel Smith wrote:

 I note the fine distinction between np.isscalar( ('hello') ) and 
 np.isscalar( ('hello'), )...
 
 NB you mean np.isscalar( ('hello',) ), which creates a single-element
 tuple. A trailing comma attached to a value in Python normally creates
 a tuple, but in a function argument list it is treated as separating
 arguments instead, and a trailing empty argument is ignored. The
 parentheses need to be around the comma to hide it from from the
 argument list parsing rule so that the tuple rule can see it.
 (Probably you know this, but for anyone reading the archives later...)

Correct, sorry for the typo!
I was actually puzzled by what seemed to me automatic unpacking of the simple
case ('hello') as compared to ('hello', ); I only now looked up that in Python
syntax it is indeed the comma that makes the tuple, not the parentheses, the
latter only becoming necessary to protect the comma as you describe above.
I just stumbled on this as in several cases numpy's rules for creating arrays
from tuples are slightly different from those for creating arrays from lists.
Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-08 Thread Derek Homeier
On 06.05.2012, at 8:16AM, Paul Anton Letnes wrote:

 All tests for 1.6.2rc1 pass on
 Mac OS X 10.7.3
 python 2.7.2
 gcc 4.2 (Apple)

Passing as well on 10.6 x86_64 and on 10.5.8 ppc with
python 2.5.6/2.6.6/2.7.2 and Apple gcc 4.0.1,
but I am getting one failure on Lion (same with Python 2.5.6+2.6.7):

Python version 2.7.3 (default, May  6 2012, 15:05:35) [GCC 4.2.1 Compatible Apple Clang 3.0 (tags/Apple/clang-211.12)]
nose version 1.1.2
======================================================================
FAIL: Test basic arithmetic function errors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 323, in test_floating_exceptions
    lambda a,b:a*b, ft_tiny, ft_tiny)
  File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 271, in assert_raises_fpe
    "Type %s did not raise fpe error '%s'." % (ftype, fpeerr))
  File "/sw/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, in assert_
    raise AssertionError(msg)
AssertionError: Type <type 'numpy.complex64'> did not raise fpe error ''.

----------------------------------------------------------------------
Ran 3551 tests in 130.778s

FAILED (KNOWNFAIL=3, SKIP=4, failures=1)

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to check type of array?

2012-03-29 Thread Derek Homeier
On 29 Mar 2012, at 13:54, Chao YUE wrote:

 how can I check type of array in if condition expression?
 
 In [75]: type(a)
 Out[75]: type 'numpy.ndarray'
 
 In [76]: a.dtype
 Out[76]: dtype('int32')
 
 a.dtype=='int32'?

this and 

a.dtype=='i4'
a.dtype==np.int32

all work. For a more general check (e.g. if it is any type of integer), you can 
do

np.issubclass_(a.dtype.type, np.integer)

See also help(np.issubdtype)

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to check type of array?

2012-03-29 Thread Derek Homeier
On 29 Mar 2012, at 14:49, Robert Kern wrote:

 all work. For a more general check (e.g. if it is any type of integer), you 
 can do
 
 np.issubclass_(a.dtype.type, np.integer)
 
 I don't recommend using that. Use np.issubdtype(a.dtype, np.integer) instead.

Sorry, you're right, this works the same way - I had the impression from the 
documentation 
that tests like np.issubdtype(np.int16, np.integer) would not work, but they do.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to Extract the Number of Rows and Columns in a Matrix

2012-03-26 Thread Derek Homeier
On 27.03.2012, at 1:26AM, Olivier Delalleau wrote:

 len(M) will give you the number of rows of M.
 For columns I just use M.shape[1] myself, I don't know if there exists a 
 shortcut.
 

You can use tuple unpacking, if that helps keep your code more concise…

nrow, ncol = M.shape

Cheers,
Derek

 Le 26 mars 2012 19:03, Stephanie Cooke cooke.stepha...@gmail.com a écrit :
 Hello,
 
 I would like to extract the number of rows and columns of a matrix
 individually. The shape command outputs the rows and columns together,
 but are there commands that will separately give the rows and
 separately give the columns?
 
 Thanks

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] AttributeError with shape command

2012-03-26 Thread Derek Homeier
On 27.03.2012, at 2:07AM, Stephanie Cooke wrote:

 I am new to numpy. When I try to use the command array.shape, I get
 the following error:
 
 AttributeError: 'list' object has no attribute 'shape'
 
 Is anyone familiar with this type of error?

It means 'array' actually is not one, more precisely, not an object of type 
np.ndarray. 
How did you create your array? If it originates just from a list of numbers, 
you can 
create an array from it by 'np.array(list)' (assuming previous 'import numpy as 
np'). 
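For instance (minimal sketch):

>>> import numpy as np
>>> data = [1.5, 2.0, 3.5]     # a plain Python list has no .shape
>>> a = np.array(data)
>>> a.shape
(3,)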

It's also possible that a function has returned a list of arrays where you 
might have 
expected a single array - so it really depends on the circumstances. 

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Trying to read 500M txt file using numpy.genfromtxt within ipython shell

2012-03-20 Thread Derek Homeier
On 20 Mar 2012, at 14:40, Chao YUE wrote:

 I would be in agree. thanks!
 I use gawk to separate the file into many files by year, then it would be 
 easier to handle.
 anyway, it's not a good practice to produce such huge line txt files

Indeed it's not, but it's also not good practice to load the entire content 
of text files as python lists into memory, as unfortunately all the numpy 
readers are still doing. But this has been discussed on this list and 
improvements are under way. 
For your problem at hand the textreader Warren Weckesser recently 
made known - can't find the post right now, but you can find it at

https://github.com/WarrenWeckesser/textreader

might be helpful. It is still under construction, but for a plain csv file such 
as yours it should be working already. And since the text parsing is 
implemented in C, it should also give you a huge speedup for your 1/2 GB!

For additional profiling, similar to what David suggested, it would certainly 
be a good idea to read in smaller chunks of the file and write it directly to 
the netCDF file. Note that you can already read single lines at a time with the 
likes of

from StringIO import StringIO
f = open('file.txt', 'r')
np.genfromtxt(StringIO(f.next()), delimiter=',')

but I don't think it would work this way with textreader, and iterating such a small 
loop over lines in Python would defeat the point of using a fast reader. 
As your actual data would be < 1 GB in numpy, memory usage with textreader 
should also not be critical yet.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] remove redundant dimension in a ndarray?

2012-03-16 Thread Derek Homeier
Dear Chao,

 Do we have a function in numpy that can automatically shrink a ndarray with 
 redundant dimension?
 
 like I have a ndarray with shape of (13,1,1,160,1), now I have written a 
 small function to change the array to dimension of (13,160) [reduce the extra 
 dimension with length as 1].
 but I just would like to know maybe there is already something which can do 
 this there ?

np.squeeze
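E.g., for the shape you quote (sketch):

>>> import numpy as np
>>> a = np.zeros((13, 1, 1, 160, 1))
>>> np.squeeze(a).shape
(13, 160)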

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file

2012-01-23 Thread Derek Homeier
On 23 Jan 2012, at 21:15, Emmanuel Mayssat wrote:

 Is there a way to save a structured array in a text file?
 My problem is not so much in the saving procedure, but rather in the
 'reloading' procedure.
 See below
 
 
 In [3]: import numpy as np
 
 In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 'f8')])
 
 In [5]: r.tofile('toto.txt',sep='\n')
 
 bash-4.2$ cat toto.txt
 ('1', 1, 1.0)
 ('1', 1, 1.0)
 ('1', 1, 1.0)
 
 In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)
 ---
 ValueErrorTraceback (most recent call last)
 /home/cls1fs/clseng/10/ipython-input-7-b07ba265ede7 in module()
  1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype)
 
 ValueError: Unable to read character files of that array type

I think most of the np.fromfile functionality works for binary input; for 
reading text 
input np.loadtxt and np.genfromtxt are the (currently) recommended functions. 
It is a bit tricky to read the format generated by tofile() in the above example, 
but 
the following should work:

cnv =  {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')}
r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype)

Generally loadtxt works more smoothly together with savetxt, but the latter 
unfortunately 
does not offer an easy way to save structured arrays (note to self and others 
currently 
working on npyio: definitely room for improvement!).

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Saving and loading a structured array from a TEXT file

2012-01-23 Thread Derek Homeier
On 23 Jan 2012, at 22:07, Derek Homeier wrote:

 In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 
 'f8')])
 
 In [5]: r.tofile('toto.txt',sep='\n')
 
 bash-4.2$ cat toto.txt
 ('1', 1, 1.0)
 ('1', 1, 1.0)
 ('1', 1, 1.0)
 
 
 cnv =  {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')}
 r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype)
 
 Generally loadtxt works more smoothly together with savetxt, but the latter 
 unfortunately 
 does not offer an easy way to save structured arrays (note to self and others 
 currently 
 working on npyio: definitely room for improvement!).

For the record, in that example

np.savetxt('toto.txt', r, fmt='%s,%d,%f')

would work as well, saving you the custom converter for loadtxt - it could just 
become tedious 
to work out the format for more complex structures, so an option to construct 
this automatically 
from r.dtype could certainly be a nice enhancement. 
Just wondering, is there something like the inverse operator to 
np.format_parser, i.e. 
mapping each dtype to a default print format specifier?
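In the absence of such a helper, a rough sketch of building the fmt string from the 
dtype - the kind-to-format mapping below is just my own guess at sensible defaults:

import numpy as np

kindmap = {'i': '%d', 'u': '%d', 'f': '%.18e', 'S': '%s'}
r = np.ones(3, dtype=[('name', '|S5'), ('foo', 'i8'), ('bar', 'f8')])
# build e.g. '%s,%d,%.18e' from the field dtypes
fmt = ','.join(kindmap.get(r.dtype[name].kind, '%s') for name in r.dtype.names)
np.savetxt('toto.txt', r, fmt=fmt)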

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 'Advanced' save and restore operation

2012-01-23 Thread Derek Homeier
On 24 Jan 2012, at 01:45, Olivier Delalleau wrote:

 Note sure if there's a better way, but you can do it with some custom load 
 and save functions:
 
  with open('f.txt', 'w') as f:
 ... f.write(str(x.dtype) + '\n')
 ... numpy.savetxt(f, x)
 
  with open('f.txt') as f:
 ... dtype = f.readline().strip()
 ... y = numpy.loadtxt(f).astype(dtype)
 
 I'm not sure how that'd work with structured arrays though. For the dict of 
 parameters you'd have to write your own load/save piece of code too if you 
 need a clean text file.
 
 -=- Olivier
 
 2012/1/23 Emmanuel Mayssat emays...@gmail.com
 After having saved data, I need to know/remember the data dtype to
 restore it correctly.
 Is there a way to save the dtype with the data?
 (I guess the header parameter of savedata could help, but they are
 only available in v2.0+ )
 
 I would like to save several related structured array and a dictionary
 of parameters into a TEXT file.
 Is there an easy way to do that?
 (maybe xml file, or maybe archive zip file of other files, or . )
 
 Any recommendation is helpful.

asciitable might be of some help, but to implement all of your required 
functionality, 
you'd probably still have to implement your own Reader class:

http://cxc.cfa.harvard.edu/contrib/asciitable/

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] find location of maximum values

2012-01-04 Thread Derek Homeier
On 04.01.2012, at 5:10AM, questions anon wrote:

 Thanks for your responses but I am still having difficuties with this 
 problem. Using argmax gives me one very large value and I am not sure what it 
 is. 
 There shouldn't be any issues with the shape. The latitude and longitude are 
 the same shape always (covering a state) and the temperature (TSFC) data are 
 hourly for a whole month. 

There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape

One needs more information on the structure of these data to say anything 
definite, 
but if e.g. your TSFC data have a time and a location dimension, argmax will 
per default return the index for the flattened array (see the argmax 
documentation 
for details, and how to use the axis keyword to get a different output). 
This might be the very large value you mention, and if your location data have 
fewer 
dimensions, the index will easily be out of range. As Ben wrote, you'd need 
extra work to 
find the maximum location, depending on what maximum you are actually looking 
for. 

As a speculative example, let's assume you have the temperature data in an 
array(ntime, nloc) and the position data in array(nloc). Then 

TSFC.argmax(axis=1) 

would give you the index for the hottest place for each hour of the month 
(i.e. actually an array of ntime indices, and pointer to so many different 
locations). 

To locate the maximum temperature for the entire month, your best way would 
probably 
be to first extract the array of (monthly) maximum temperatures in each 
location as

tmax = TSFC.max(axis=0)

which would have (in this example) the shape (nloc,), so you could directly use 
it to index 

LAT[tmax.argmax()]   etc. 
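Putting that speculative example together as a runnable sketch (all shapes and 
values below are made up purely for illustration):

import numpy as np

ntime, nloc = 744, 200                        # hypothetical: hourly data, 200 locations
TSFC = 20. + 15.*np.random.rand(ntime, nloc)  # made-up temperatures
LAT = np.linspace(-39., -34., nloc)           # made-up location latitudes
LON = np.linspace(141., 150., nloc)           # made-up location longitudes

hottest_per_hour = TSFC.argmax(axis=1)        # index of hottest location for each hour
tmax = TSFC.max(axis=0)                       # monthly maximum at each location
hot_lat = LAT[tmax.argmax()]                  # latitude of the overall hottest spot
hot_lon = LON[tmax.argmax()]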

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] structured array with submember assign

2011-12-26 Thread Derek Homeier
On 26.12.2011, at 7:37PM, Fabian Dill wrote:

 I have a problem with a structured numpy array.
 I create is like this:
 tiles = numpy.zeros((header[width], header[height],3), dtype =  
 numpy.uint8)
 and later on, assignments such as this:
 tiles[x, y,0] = 3
 
 Now uint8 is not sufficient anymore, but only for the first of the 3 values.
 uint16 for all of them would use too much ram (increase of 1-3 GB)
 
 I have tried using structured arrays, but the dtype is essentially always a 
 tuple.
 
 tiles = numpy.zeros((header[width], header[height], 1), dtype =  
 u2,u1,u1)
 
 tiles[x, y,0] = 0
 TypeError: expected an object with a buffer interface
 

If you create a structured array, you probably don't want the third dimension, 
as the 
structure already spans three fields, and to assign to it you either need to 
address 
the fields explicitly (with the default field names 'f0', 'f1', 'f2'), or use 
an array with 
corresponding dtype: 

>>> dt = "u2,u1,u1"
>>> tiles = numpy.zeros((2,3), dtype=dt)
>>> tiles
array([[(0, 0, 0), (0, 0, 0), (0, 0, 0)],
       [(0, 0, 0), (0, 0, 0), (0, 0, 0)]], 
      dtype=[('f0', '<u2'), ('f1', '|u1'), ('f2', '|u1')])

>>> tiles['f0'][0] = 1
>>> tiles[0,1] = np.array((3,4,5), dtype=dt)
>>> tiles
array([[(1, 0, 0), (3, 4, 5), (1, 0, 0)],
       [(0, 0, 0), (0, 0, 0), (0, 0, 0)]], 
      dtype=[('f0', '<u2'), ('f1', '|u1'), ('f2', '|u1')])

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy1.6.1 install fortran compiler error

2011-12-20 Thread Derek Homeier
Hi Jack,

 In order to install scipy,  I am trying to install numpy 1.6.1. on GNU/linux 
 redhat  2.6.18. 
 
 But, I got error about fortran compiler. 
 
 I have gfortran. I do not have f77/f90/g77/g90.
 
that's good!

 I run :
 python setup.py build --fcompiler=gfortran
 
 It woks well and tells me that 
 
 customize Gnu95FCompiler
 Found executable /usr/bin/gfortran
 
 But, i run: 
 
 building library npymath sources
 customize GnuFCompiler
 Could not locate executable g77
 Could not locate executable f77
 customize IntelFCompiler
 Could not locate executable ifort
 Could not locate executable ifc
 customize LaheyFCompiler
 Could not locate executable lf95
 customize PGroupFCompiler
 Could not locate executable pgf90
 Could not locate executable pgf77
 customize AbsoftFCompiler
 Could not locate executable f90
 customize NAGFCompiler
 Found executable /usr/bin/f95
 customize Gnu95FCompiler
 customize Gnu95FCompiler using config
 C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 
 -Wall -Wstrict-prototypes -fPIC
 
 Do I have to install f77/f90/g77/g90 ? 
 
You did not send any actual error message here, so it's difficult to tell 
where exactly your install failed. But gfortran is preferred over f77 etc. 
and should in fact be automatically selected (without the 
'--fcompiler=gfortran'), 
it is apparently also found in the right place. 
Could you send us the last lines of output with the error itself, or possibly 
everything following a line starting with "Traceback..."; 
and also the output of `gfortran -v`?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy1.6.1 install fortran compiler error

2011-12-20 Thread Derek Homeier
On 20.12.2011, at 9:01PM, Jack Bryan wrote:

 customize Gnu95FCompiler using config
 C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 
 -Wall -Wstrict-prototypes -fPIC
 
 compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core 
 -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath 
 -Inumpy/core/include 
 -I/remote/dcnl/Ding/backup_20100716/python272/include/python2.7 -c'
 gcc: _configtest.c
 gcc -pthread _configtest.o -o _configtest
 success!
 removing: _configtest.c _configtest.o _configtest
 C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 
 -Wall -Wstrict-prototypes -fPIC
 
 
 
 
 compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core 
 -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath 
 -Inumpy/core/include -I/mypath/python272/include/python2.7 -c'
 gcc: _configtest.c
 gcc -pthread _configtest.o -o _configtest
 _configtest
 failure.
 
The blas failures further up are non-fatal, but I am not sure about the 
_configtest.c, 
or why it once succeeds, then fails again - anyway the installation appears to 
have 
finished. 

 at the end:
 
 running install_egg_info
 Removing /mypath/numpy/lib/python2.7/site-packages/numpy-1.6.1-py2.7.egg-info
 Writing /mypath/numpy/lib/python2.7/site-packages/numpy-1.6.1-py2.7.egg-info
 running install_clib
 
 Then 
 
 I got: 
 
 python
 Python 2.7.2 (default, Dec 20 2011, 12:32:10)
 [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
 Type help, copyright, credits or license for more information.
 
  import numpy
 Traceback (most recent call last):
   File stdin, line 1, in module
 ImportError: No module named numpy
 
 
 I have updated PATH for bin and lib of numpy. 
 
You will need '/mypath/numpy/lib/python2.7/site-packages' in your 
PYTHONPATH - have you done that, and does it show up with 

>>> import sys
>>> sys.path

in the Python shell?

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-06 Thread Derek Homeier
On 06.12.2011, at 11:13PM, Wes McKinney wrote:

 This isn't the place for this discussion but we should start talking
 about building a *high performance* flat file loading solution with
 good column type inference and sensible defaults, etc. It's clear that
 loadtable is aiming for highest compatibility-- for example I can read
 a 2800x30 file in  50 ms with the read_table / read_csv functions I
 wrote myself recent in Cython (compared with loadtable taking  1s as
 quoted in the pull request), but I don't handle European decimal
 formats and lots of other sources of unruliness. I personally don't
 believe in sacrificing an order of magnitude of performance in the 90%
 case for the 10% case-- so maybe it makes sense to have two functions
 around: a superfast custom CSV reader for well-behaved data, and a
 slower, but highly flexible, function like loadtable to fall back on.
 I think R has two functions read.csv and read.csv2, where read.csv2 is
 capable of dealing with things like European decimal format.

Generally I agree, there's a good case for that, but I have to point out that 
the 1s time quoted there was with all auto-detection extravaganza turned on. 
Actually, if I remember the discussions right, in default, single-pass reading 
mode, it comes even close to genfromtxt and loadtxt (on my machine 
150-200 ms for 2800 rows x 30 columns real*8). Originally loadtxt was intended 
to be that no-frills, fast reader, but in practice it is rarely faster than 
genfromtxt as the conversion from input strings to Python objects seems to 
be the bottleneck most of the time. Speeding that up using Cython certainly 
would be a big gain (and then there also is the request to make loadtxt 
memory-efficient, which I have failed to follow up on for weeks and weeks…)

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loop through values in a array and find maximum as looping

2011-12-06 Thread Derek Homeier
On 07.12.2011, at 5:07AM, Olivier Delalleau wrote:

 I *think* it may work better if you replace the last 3 lines in your loop by:
 
 a=all_TSFC[0]
 if len(all_TSFC)  1:
 N.maximum(a, TSFC, out=a)
 
 Not 100% sure that would work though, as I'm not entirely confident I 
 understand your code.
 
 -=- Olivier
 
 2011/12/6 questions anon questions.a...@gmail.com
 Something fancier I think, 
 I am able to compare the result with my previous method so I can easily see I 
 am doing something wrong.
 see code below:
 
 
 all_TSFC=[]
 for (path, dirs, files) in os.walk(MainFolder):
 for dir in dirs:
 print dir
 path=path+'/'
 for ncfile in files:
 if ncfile[-3:]=='.nc':
 print dealing with ncfiles:, ncfile
 ncfile=os.path.join(path,ncfile)
 ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
 TSFC=ncfile.variables['T_SFC'][:]
 fillvalue=ncfile.variables['T_SFC']._FillValue
 TSFC=MA.masked_values(TSFC, fillvalue)
 ncfile.close()
 all_TSFC.append(TSFC)
 a=TSFC[0]
 for b in TSFC[1:]:
 N.maximum(a,b,out=a)
 
I also understood TSFC is already the array you want to work on, so above 
you'd just take a slice and overwrite the result in the next file iteration 
anyway. 
Iterating over the list all_TSFC should be correct, but I understood you 
don't want to load the entire input into memory in your working code.
Then you can simply skip the list, just need to take care of initial conditions - 
something like the following should do:

path=path+'/'
a = None
for ncfile in files:
    if ncfile[-3:]=='.nc':
        print "dealing with ncfiles:", ncfile
        ncfile=os.path.join(path,ncfile)
        ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        ncfile.close()
        if not is instance(a,N.ndarray):
            a=TSFC
        else:
            N.maximum(a, TSFC, out=a)

HTH,
Derek

 big_array=N.ma.concatenate(all_TSFC)
 Max=big_array.max(axis=0)
 print max is, Max,a is, a
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loop through values in a array and find maximum as looping

2011-12-06 Thread Derek Homeier
On 07.12.2011, at 5:54AM, questions anon wrote:

 sorry the 'all_TSFC' is for my other check of maximum using concatenate and 
 N.max, I know that works so I am comparing it to this method. The only reason 
 I need another method is for memory error issues. 
 I like the code I have written so far as it makes sense to me. I can't get 
 the extra examples I have been given to work and that is most likely because 
 I don't understand them, these are the errors I get :
 
 Traceback (most recent call last):
   File d:\plot_summarystats\test_plot_remove_memoryerror_max.py, line 46, 
 in module
 N.maximum(a,TSFC,out=a)
 ValueError: non-broadcastable output operand with shape (106,193) doesn't 
 match the broadcast shape (721,106,193)
 
 and
 
OK, then it seems we did not indeed grasp the entire scope of the problem - 
since you have initialised a from the previous array TSFC (not from TSFC[0]?!), 
this can 
only mean the arrays read in come in different shapes? I don't quite understand 
how the 
previous version did not raise an error then; but if you only want the 
(106,193)-subarray 
you have indeed to keep the loop 
    for b in TSFC[:]:
        N.maximum(a,b,out=a)

But you would have to find some way to distinguish between ndim=2 and ndim=3 
input, 
if really both can occur...
 
 Traceback (most recent call last):
   File d:\plot_summarystats\test_plot_remove_memoryerror_max.py, line 45, 
 in module
 if not instance(a, N.ndarray):
 NameError: name 'instance' is not defined
 
Sorry, typing error (or devious auto-correct?) - this should be 'isinstance()'

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] upsample or scale an array

2011-12-03 Thread Derek Homeier
On 03.12.2011, at 6:22PM, Robin Kraft wrote:

 That does repeat the elements, but doesn't get them into the desired order.
 
 In [4]: print a
 [[1 2]
  [3 4]]
 
 In [7]: np.tile(a, 4)
 Out[7]: 
 array([[1, 2, 1, 2, 1, 2, 1, 2],
[3, 4, 3, 4, 3, 4, 3, 4]])
 
 In [8]: np.tile(a, 4).reshape(4,4)
 Out[8]: 
 array([[1, 2, 1, 2],
[1, 2, 1, 2],
[3, 4, 3, 4],
[3, 4, 3, 4]])
 
 It's close, but I want to repeat the elements along the two axes, effectively 
 stretching it by the lower right corner:
 
 array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
 
 It would take some more reshaping/axis rolling to get there, but it seems 
 doable.
 
 Anyone know what combination of manipulations would work with the result of 
 np.tile?
 
Rolling was the keyword:

print(np.rollaxis(np.tile(a, 4).reshape(2,2,-1), 2, 1).reshape(4,4))
[[1 1 2 2]
 [1 1 2 2]
 [3 3 4 4]
 [3 3 4 4]]

I leave the generalisation and timing up to you, but it seems for 
a = np.arange(M**2).reshape(M,-1)

np.rollaxis(np.tile(a, N**2).reshape(M,N,-1), 2, 1).reshape(M*N,-1) 

should do the trick.
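A quick sketch of that generalisation (M and N here chosen arbitrarily):

import numpy as np

M, N = 3, 4                                  # original size and blow-up factor
a = np.arange(M**2).reshape(M, -1)
b = np.rollaxis(np.tile(a, N**2).reshape(M, N, -1), 2, 1).reshape(M*N, -1)
# each a[i, j] now fills the N x N block b[i*N:(i+1)*N, j*N:(j+1)*N]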

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] upsample or scale an array

2011-12-03 Thread Derek Homeier
On 03.12.2011, at 6:47PM, Olivier Delalleau wrote:

 Ah sorry, I hadn't read carefully enough what you were trying to achieve. I 
 think the double repeat solution looks like your best option then.

Considering that it is a lot shorter than fixing the tile() result, you 
are probably right (I've only now looked closer at the repeat() 
solution ;-). I'd still be interested in the performance - since I think 
none of the reshape or rollaxis operations actually move any data 
in memory (for numpy > 1.6), it might still be faster. 
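For anyone who wants to try, a hedged sketch of such a comparison (array size and
factor are arbitrary; timing e.g. with IPython's %timeit):

import numpy as np

a = np.arange(500*500).reshape(500, -1)
N = 4

def via_repeat(a, N):
    # the double-repeat solution
    return a.repeat(N, axis=0).repeat(N, axis=1)

def via_rollaxis(a, N):
    # the tile/rollaxis variant from my previous mail (assumes a square array)
    M = a.shape[0]
    return np.rollaxis(np.tile(a, N**2).reshape(M, N, -1), 2, 1).reshape(M*N, -1)

assert (via_repeat(a, N) == via_rollaxis(a, N)).all()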

Cheers,
Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data

2011-12-01 Thread Derek Homeier
On 1 Dec 2011, at 17:39, Charles R Harris wrote:

 On Thu, Dec 1, 2011 at 6:52 AM, Thouis (Ray) Jones tho...@gmail.com wrote:
 Is this expected behavior?
 
  np.array([-345,4,2,'ABC'])
 array(['-34', '4', '2', 'ABC'], dtype='|S3')
 
 
 
 Given that strings should be the result, this looks like a bug. It's a bit of 
 a corner case that probably slipped through during the recent work on 
 casting. There needs to be tests for these sorts of things, so if you find 
 more oddities post them so we can add them.

As it is not dependent on the string appearing before or after the numbers, 
numerical values appear to always be processed first before any string 
transformation, 
even if you explicitly specify the string format - consider the following 
(1.6.1):

>>> np.array((2, 12,0.1+2j))
array([  2.0+0.j,  12.0+0.j,   0.1+2.j])

>>> np.array((2, 12,0.001+2j))
array([  2.e+00+0.j,   1.2000e+01+0.j,   1.e-03+2.j])

>>> np.array((2, 12,0.001+2j), dtype='|S8')
array(['2', '12', '(0.001+2'], dtype='|S8')

- notice the last value is only truncated because it had first been converted 
into 
a standard complex representation, so maybe the problem is already in the way 
Python treats the input. 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.array() of mixed integers and strings can truncate data

2011-12-01 Thread Derek Homeier
On 1 Dec 2011, at 21:35, Chris Barker wrote:

 On 12/1/2011 9:15 AM, Derek Homeier wrote:
 np.array((2, 12,0.001+2j), dtype='|S8')
  array(['2', '12', '(0.001+2'], dtype='|S8')
 
 - notice the last value is only truncated because it had first been 
 converted into
 a standard complex representation, so maybe the problem is already in the 
 way
 Python treats the input.
 
 no -- it's truncated because you've specified a 8 char long string, and 
 the string representation of complex is longer than that. I assume that 
 numpy is using the objects __str__ or __repr__:
 
 In [13]: str(0.001+2j)
 Out[13]: '(0.001+2j)'
 
 In [14]: repr(0.001+2j)
 Out[14]: '(0.001+2j)'
 
That's what I meant with the Python-side of the issue, but you're right, 
there is no 
numerical conversion involved. 

 I think the only bug we've identified here is that numpy is selecting 
 the string size based on the longest string input, rather than checking 
 to see how long the string representation of the numeric input is as 
 well. if there is a long-enough string in there, it works fine:
 
 In [15]: np.array([-345,4,2,'ABC', 'abcde'])
 Out[15]:
 array(['-345', '4', '2', 'ABC', 'abcde'],
   dtype='|S5')
 
 An open question is what it should do if you specify the length of the 
 string dtype, but one of the values can't be fit into that size. At this 
 point, it truncates, but should it raise an error?

I would probably raise a warning rather than an error - I think if the user 
explicitly specifies 
a string length, they should be aware that the data might be truncated (and 
might even 
want this behaviour). 
Another issue could be that the string representation can look quite 
different from what 
has been typed in, like 

In [95]: np.array(('abcdefg', 12,  0.1+2j), dtype='|S12')
Out[95]: array(['abcdefg', '12', '(1e-05+2j)'], dtype='|S12')

but then I think one has to accept that _ 0.1+2j _ is not a string and thus 
cannot be 
guaranteed to be represented in that exact way - it can be either understood as 
a 
numerical object or not at all (i.e. one should just type it in as a string - 
with quotes - 
if one wants string-behaviour). 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?

2011-10-25 Thread Derek Homeier
Hi,

On 25 Oct 2011, at 21:14, Pauli Virtanen wrote:

 25.10.2011 20:29, Matthew Brett kirjoitti:
 [clip]
 In [7]: (res-1) / 2**32
 Out[7]: 8589934591.98
 
 In [8]: np.float((res-1) / 2**32)
 Out[8]: 4294967296.0
 
 Looks like a bug in the C library installed on the machine, then.
 
 It's either in wontfix territory for us, or in the cast to doubles 
 before formatting one. In the latter case, one would have to maintain a 
 list of broken C libraries (ugh).

as it appears to be a Tiger-only problem, probably the former?

On 25 Oct 2011, at 21:13, Matthew Brett wrote:

 [mb312@jerry ~]$ uname -a
 Darwin jerry.bic.berkeley.edu 8.11.0 Darwin Kernel Version 8.11.0: Wed
 Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power
 Macintosh powerpc

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Printing individual array elements with at least 15 significant digits

2011-10-15 Thread Derek Homeier
On 15.10.2011, at 9:21PM, Hugo Gagnon wrote:

 I need to print individual elements of a float64 array to a text file.
 However in the file I only get 12 significant digits, the same as with:
 
 a = np.zeros(3)
 a.fill(1./3)
 print a[0]
 0.
 len(str(a[0])) - 2
 12
 
 whereas
 
 len(repr(a[0])) - 2
 17
 
 which makes more sense since I am expecting at least 15 significant
 digits…
 
 So how can I print a np.float64 with at least 15 significant digits
 (without repr!)?

You mean like 
 '%.15e' % (1./3)
'3.333e-01'
? 

If you are using e.g. savetxt to print to the file, you can specify the format 
the same way (actually the default for savetxt is already %.18e, which 
should satisfy your demands).

HTH,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Float128 integer comparison

2011-10-15 Thread Derek Homeier
On 15.10.2011, at 9:42PM, Aronne Merrelli wrote:

 
 On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,
 
 Continuing the exploration of float128 - can anyone explain this behavior?
 
  np.float64(9223372036854775808.0) == 9223372036854775808L
 True
  np.float128(9223372036854775808.0) == 9223372036854775808L
 False
  int(np.float128(9223372036854775808.0)) == 9223372036854775808L
 True
  np.round(np.float128(9223372036854775808.0)) == 
  np.float128(9223372036854775808.0)
 True
 
  
 I know little about numpy internals, but while fiddling with this, I noticed 
 a possible clue:
 
  np.float128(9223372036854775808.0) == 9223372036854775808L
 False
  np.float128(4611686018427387904.0) == 4611686018427387904L
 True
  np.float128(9223372036854775808.0) - 9223372036854775808L
 Traceback (most recent call last):
   File stdin, line 1, in module
 TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long'
  np.float128(4611686018427387904.0) - 4611686018427387904L
 0.0
 
 
 My speculation - 9223372036854775808L is the first integer that is too big to 
 fit into a signed 64 bit integer. Python is OK with this but that means it 
 must be containing that value in some more complicated object. Since you 
 don't get the type error between float64() and long:
 
  np.float64(9223372036854775808.0) - 9223372036854775808L
 0.0
 
 Maybe there are some unimplemented pieces in numpy for dealing with 
 operations between float128 and python arbitrary longs? I could see the == 
 test just producing false in that case, because it defaults back to some 
 object equality test which isn't actually looking at the numbers.

That seems to make sense, since even upcasting from a np.float64 still lets the 
test fail:
>>> np.float128(np.float64(9223372036854775808.0)) == 9223372036854775808L
False
while
>>> np.float128(9223372036854775808.0) == np.uint64(9223372036854775808L)
True

and 
>>> np.float128(9223372036854775809) == np.uint64(9223372036854775809L)
False
>>> np.float128(np.uint(9223372036854775809L)) == np.uint64(9223372036854775809L)
True

Showing again that the normal casting to, or reading in of, a np.float128 
internally inevitably 
calls the python float(), as already suggested in one of the parallel threads 
(I think this 
also came up with some of the tests for precision) - leading to different 
results than 
when you can convert from a np.int64 - this makes the outcome look even weirder:

>>> np.float128(9223372036854775807.0) - np.float128(np.int64(9223372036854775807))
1.0
>>> np.float128(9223372036854775296.0) - np.float128(np.int64(9223372036854775807))
1.0
>>> np.float128(9223372036854775295.0) - np.float128(np.int64(9223372036854775807))
-1023.0
>>> np.float128(np.int64(9223372036854775296)) - np.float128(np.int64(9223372036854775807))
-511.0

simply due to the nearest np.float64 always being equal to MAX_INT64 in the two 
first cases 
above (or anything in between)... 

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt

2011-10-11 Thread Derek Homeier
Hi Nils,

On 11 Oct 2011, at 16:34, Nils Wagner wrote:

 How do I use genfromtxt to read a file with the following 
 lines
 
  11  2.2592365264892578D+01
  22  2.2592365264892578D+01
  13  2.669845581055D+00
  33  2.2592365264892578D+01
  24  2.669845581055D+00
  44  2.2592365264892578D+01
  35  2.669845581055D+00
  55  2.2592365264892578D+01
  46  2.669845581055D+00
  66  2.2592365264892578D+01
  17  2.9814243316650391D+00
  77  1.7259031295776367D+01
  28  2.9814243316650391D+00
  88  1.7259031295776367D+01
 ...
 
 
 names =(i,j,v)
 A = 
 np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','d')],names=names)
 V = A[:]['v']
 
 V
 array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN, 
 NaN,  NaN,  NaN,
 NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN, 
 NaN,  NaN,  NaN,
 NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN, 
 NaN,  NaN,  NaN,
 NaN,  NaN,  NaN])
 
 yields NaN, while
 
 convertfunc = lambda x: x.replace('D','E')
 names =(i,j,v)
 A = 
 np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','|S24')],names=names,converters={v:convertfunc})
 V = A[:]['v'].astype(float)
 V
 array([ 22.59236526,  22.59236526,   2.6698, 
 22.59236526,
  2.6698,  22.59236526,   2.6698, 
 22.59236526,
  2.6698,  22.59236526,   2.98142433, 
 17.2590313 ,
  2.98142433,  17.2590313 ,   2.98142433, 
  2.98142433,
  2.6698,  22.59236526,   2.98142433, 
  2.98142433,
  2.6698,  22.59236526,   2.98142433, 
  2.98142433,
  2.6698,  22.59236526,   2.98142433, 
  2.98142433,
  2.6698,  22.59236526,   2.98142433, 
  2.6698,
 17.2590313 ,   2.98142433,   2.6698, 
 17.2590313 ])
 
 
 works fine.

It took me a moment to figure out what the actual remaining problem was, 
but I expect you'd prefer it to load directly into a float record?
The problem is simply that the converter _replaces_ the default converter 
function (which would be float(x) in this case), rather than operating on top 
of it.

Try instead
convertfunc = lambda x: float(x.replace('D','E')) 
and you should be ready to use ('v', 'd') as dtype (BTW, specifying 'names' 
is redundant in the above example). 
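I.e., something along these lines (a sketch of the same call with the modified 
converter):

import numpy as np

convertfunc = lambda x: float(x.replace('D', 'E'))
A = np.genfromtxt('bmll.mtl',
                  dtype=[('i', 'int'), ('j', 'int'), ('v', 'd')],
                  converters={'v': convertfunc})
V = A['v']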

This behaviour is only hinted at in the docstring example, so maybe the 
documentation should be clearer here.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nice float - integer conversion?

2011-10-11 Thread Derek Homeier
On 11 Oct 2011, at 20:06, Matthew Brett wrote:

 Have I missed a fast way of doing nice float to integer conversion?
 
 By nice I mean, rounding to the nearest integer, converting NaN to 0,
 inf, -inf to the max and min of the integer range?  The astype method
 and cast functions don't do what I need here:
 
 In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
 Out[40]: array([1, 0, 0, 0], dtype=int16)
 
 In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf]))
 Out[41]: array([1, 0, 0, 0], dtype=int16)
 
 Have I missed something obvious?

np.[a]round comes closer to what you wish (is there consensus 
that NaN should map to 0?), but not quite there, and it's not really 
consistent either!

In [42]: c = np.zeros(4, np.int16)
In [43]: d = np.zeros(4, np.int32)
In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c)
Out[44]: array([2, 0, 0, 0], dtype=int16)

In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d)
Out[45]: array([  2, -2147483648, -2147483648, -2147483648], 
dtype=int32)

Perhaps a starting point to harmonise this behaviour and get it closer to 
your expectations (it still would not be really nice having to define the 
output array first, I guess)...
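For what it's worth, a rough sketch of such a "nice" conversion - assuming that
NaN -> 0 and clipping infinities to the integer range is indeed what you want:

import numpy as np

def nice_astype(x, dtype=np.int16):
    # round to nearest; NaN -> 0, +/-inf -> max/min of the target integer type
    info = np.iinfo(dtype)
    y = np.asarray(x, dtype=np.float64)
    y = np.where(np.isnan(y), 0., y)
    return np.clip(np.rint(y), info.min, info.max).astype(dtype)

nice_astype([1.6, np.nan, np.inf, -np.inf])
# -> array([     2,      0,  32767, -32768], dtype=int16)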

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nice float - integer conversion?

2011-10-11 Thread Derek Homeier
On 11.10.2011, at 9:18PM, josef.p...@gmail.com wrote:
 
 In [42]: c = np.zeros(4, np.int16)
 In [43]: d = np.zeros(4, np.int32)
 In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c)
 Out[44]: array([2, 0, 0, 0], dtype=int16)
 
 In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d)
 Out[45]: array([  2, -2147483648, -2147483648, -2147483648], 
 dtype=int32)
 
 Perhaps a starting point to harmonise this behaviour and get it closer to
 your expectations (it still would not be really nice having to define the
 output array first, I guess)...
 
 what numpy is this?
 
This was 1.6.1
I did suppress a RuntimeWarning that was raised on the first call, though:
In [33]: np.around([1.67,np.nan,np.inf,-np.inf], decimals=1, out=d)
/sw/lib/python2.7/site-packages/numpy/core/fromnumeric.py:37: RuntimeWarning: 
invalid value encountered in multiply
  result = getattr(asarray(obj),method)(*args, **kwds)

 np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
 array([ 1, -32768, -32768, -32768], dtype=int16)
 np.__version__
 '1.5.1'
 a = np.ones(4, np.int16)
 a[:]=np.array([1.6, np.nan, np.inf, -np.inf])
 a
 array([ 1, -32768, -32768, -32768], dtype=int16)
 
 
 I thought we get ValueError to avoid nan to zero bugs
 
 a[2] = np.nan
 Traceback (most recent call last):
  File pyshell#22, line 1, in module
a[2] = np.nan
 ValueError: cannot convert float NaN to integer

On master, an integer out raises a TypeError for any float input - not sure I'd 
consider that an improvement…

>>> np.__version__
'2.0.0.dev-8f689df'
>>> np.around([1.6,-23.42, -13.98, 0.14], out=c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2277, in around
    return _wrapit(a, 'round', decimals, out)
  File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to provided 
output parameter (typecode 'h') according to the casting rule 'same_kind'

I thought the NaN might have been dealt  with first, before casting to int, 
but that doesn't seem to be the case (on master, again):

>>> np.around([1.6,np.nan,np.inf,-np.inf])
array([  2.,  nan,  inf, -inf])
>>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int16)
array([2, 0, 0, 0], dtype=int16)
>>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int32)
array([  2, -2147483648, -2147483648, -2147483648], dtype=int32)

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] what to use in buildbot config for numpy testing

2011-09-07 Thread Derek Homeier
On 07.09.2011, at 10:52PM, Chris Kees wrote:

 Is there a recommended way to run the numpy test suite as a buildbot
 test? Just run ad python -c import numpy; numpy.test as ShellCommand
 object?

It would be numpy.test() [or numpy.test('full')]; then it depends on what you 
need 
as the return value of your test. I am using for package verification

python -c 'import numpy, sys; ret=numpy.test("full"); 
sys.exit(2*len(ret.errors+ret.failures))'

so python will return with a value != 0 in case of an unsuccessful test (which 
it 
otherwise would not do). But this is just within a simple shell script. 

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] load from text files Pull Request Review

2011-09-06 Thread Derek Homeier
On 02.09.2011, at 11:45PM, Christopher Jordan-Squire wrote:
 
 and unfortunately it's for 1D-arrays only).
 
 That's not bad for this use -- make a row a struct dtype, and you've got
 a 1-d array anyway -- you can optionally convert to a 2-d array after
 the fact.
 
 I don't know why I didn't think of using fromiter() when I build
 accumulator.  Though what I did is a bit more flexible -- you can add
 stuff later on, too, you don't need to do it allat once.
 
 
 I'm unsure how to use fromiter for missing data. It sounds like a
 potential solution when no data is missing, though.

Strange I haven't thought about it before either; I guess for record arrays it 
comes more natural to view them as a collection of 1D arrays. 
However, you'd need to construct a list or something of ncolumn iterators from 
the input - should not be too hard; but then 
how do you feed the ncolumn fromiter() instances synchronously from that?? 
As far as I can see there is no way to make them read one item at a time, 
row by row. Then there are additional complications with multi-D dtypes, 
and in your case, especially datetime instances, but the problem that all 
columns 
have to be read in in parallel really seems to be the showstopper here. 
Of course for flat 2D arrays of data (all the same dtype) this would work 
with 
simply reshaping the array - that's probably even the most common use case for 
loadtxt, but that method lacks way too much generality for my taste.
Back to accumulator, I suppose. 
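(For completeness, a tiny sketch of that flat-2D fromiter case - hypothetical
whitespace-separated float data:)

import numpy as np
from StringIO import StringIO

text = StringIO("1.0 2.0 3.0\n4.0 5.0 6.0\n")
ncols = 3
flat = np.fromiter((float(v) for line in text for v in line.split()),
                   dtype=float)
data = flat.reshape(-1, ncols)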

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient way to load a 1Gb file?

2011-09-06 Thread Derek Homeier
On 02.09.2011, at 1:47AM, Russell E. Owen wrote:

 I've made a pull request 
 https://github.com/numpy/numpy/pull/144
 implementing that option as a switch 'prescan'; could you review it in 
 particular regarding the following:
 
 Is the option reasonably named and documented?
 
 In the case the allocated array does not match the input data (which 
 really should never happen), right now just a warning is issued, 
 filling any excess buffer with zeros or discarding remaining input data - 
 should this rather raise an IndexError?
 
 No prediction if/when I might be able to provide this for genfromtxt, sorry!
 
 Cheers,
Derek
 
 This looks like a great improvement to me! I think the name is well 
 chosen and the help is very clear.
 
Thanks for your feedback, just a few quick comments:

 A few comments:
 - Might you rename the variable l? It is easily confused with the 
 digit 1.
 - I don't understand the l < n_valid test, so this may be off base, but 
 I'm surprised that you first massage the data and then raise an 
 exception. Is the massaged data any use after the exception is raised? 
 Naively I would expect you to issue a warning instead of raising an 
 exception if you are going to handle the error by massaging the data.
 
The exception is currently caught right after the loop, which might seem a bit 
illogical, but the idea was to handle both cases in one place (if l > n_valid, 
trying to assign to X[l] will also raise an IndexError, meaning there are data 
left in the input that could not be stored) - so the present version indeed 
just issues a warning for both cases, but that could easily be changed...

 (It is a pity that your patch duplicates so much parsing code, but I 
 don't see a better way to do it. Putting conditionals in the parsing 
 loop to decide how to handle each line based on prescan would presumably 
 slow things down too much.)

That was my idea behind it - otherwise I would also have considered moving 
it out into its own functions, but as long as the entire code more or less fits 
into 
one editor window, this did not appear an obstacle to me. 

The main update on the issue is however, that all this is currently on hold 
because some concerns have been raised about not using dynamic resizing 
instead (the extra reading pass would break streamed input), and we have 
been discussing the best way to do this in another thread related to pull 
request 
https://github.com/numpy/numpy/pull/143 
(which implements similar functionality, plus a lot more, for a genfromtxt-like 
function). So don't be surprised if the loadtxt patch comes back later, in a 
completely revised form…

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] load from text files Pull Request Review

2011-09-02 Thread Derek Homeier
On 30.08.2011, at 6:21PM, Chris.Barker wrote:

 I've submitted a pull request for a new method for loading data from
 text files into a record array/masked record array.
 
 Click on the link for more info, but the general idea is to create a
 regular expression for what entries should look like and loop over the
 file, updating the regular expression if it's wrong. Once the types
 are determined the file is loaded line by line into a pre-allocated
 numpy array.
 
 nice stuff.
 
 Have you looked at my accumulator class, rather than pre-allocating? 
 Less the class itself than that ideas behind it. It's easy enough to do, 
 and would keep you from having to run through the file twice. The cost 
 of memory re-allocation as the array grows is very small.
 
 I've posted the code recently, but let me know if you want it again.

I agree it would make a very nice addition, and could complement my 
pre-allocation option for loadtxt - however there I've also been made 
aware that this approach breaks streamed input etc., so the buffer.resize(…) 
methods in accumulator would be the better way to go. 
For loadtable this is not quite as straightforward, though, because the type 
auto-detection, strictly done, requires to scan the entire input, because a 
column full of int could still produce a float in the last row… 
I'd say one just has to accept that this kind of auto-detection is incompatible 
with input streams, and with the necessity to scan the entire data first 
anyway, 
pre-allocating the array makes sense as well. 

For better consistency with what people have likely got used to from npyio, 
I'd recommend some minor changes:

make spaces the default delimiter

enable automatic decompression (given the modularity, could you simply 
use np.lib._datasource.open() like genfromtxt?)

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] load from text files Pull Request Review

2011-09-02 Thread Derek Homeier
On 02.09.2011, at 5:50PM, Chris.Barker wrote:

 hmmm -- it seems you could jsut as well be building the array as you go, 
 and if you hit a change in the imput, re-set and start again.
 
 In my tests, I'm pretty sure that the time spent file io and string 
 parsing swamp the time it takes to allocate memory and set the values.
 
 So there is little cost, and for the common use case, it would be faster 
 and cleaner.
 
 There is a chance, of course, that you might have to re-wind and start 
 over more than once, but I suspect that that is the rare case.
 
I still haven't studied your class in detail, but one could probably actually 
just create a copy of the array read in so far, e.g. changing it from a 
dtype=[('f0', 'i8'), ('f1', 'f8')] to dtype=[('f0', 'f8'), ('f1', 'f8')]  
as required - 
or even first implement it as a list or dict of arrays, that could be 
individually 
changed and only create a record array from that at the end. 
The required copying and extra memory use would definitely pale compared 
to the text parsing or the current memory usage for the input list. 
In my loadtxt version [https://github.com/numpy/numpy/pull/144] just parsing 
the text for comment lines adds ca. 10% time, while any of the array allocation 
and copying operations should at most be at the 1% level.
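A rough sketch of what I mean by such a dtype change (copying field by field into
a re-typed array; field names here are just placeholders):

import numpy as np

old = np.zeros(5, dtype=[('f0', 'i8'), ('f1', 'f8')])
new_dt = np.dtype([('f0', 'f8'), ('f1', 'f8')])   # promote the int column to float
new = np.empty(old.shape, dtype=new_dt)
for name in old.dtype.names:
    new[name] = old[name]                         # converts each column as needed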
 
 enable automatic decompression (given the modularity, could you simply
 use np.lib._datasource.open() like genfromtxt?)
 
 I _think_this would benefit from a one-pass solution as well -- so you 
 don't need to de-compress twice.

Absolutely; on compressed data the time for the extra pass jumps up to +30-50%.

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] load from text files Pull Request Review

2011-09-02 Thread Derek Homeier
On 02.09.2011, at 6:16PM, Christopher Jordan-Squire wrote:

 I hadn't thought of that. Interesting idea. I'm surprised that
 completely resetting the array could be faster.
 
I had experimented a bit with the fromiter function, which also increases 
the output array as needed, and this creates negligible overhead compared
to parsing the text input (it is implemented in C, though, I don't know how 
the .resize() calls would compare to that; and unfortunately it's for 1D-arrays 
only).

 In my tests, I'm pretty sure that the time spent file io and string
 parsing swamp the time it takes to allocate memory and set the values.
 
 In my tests, at least for a medium sized csv file (about 3000 rows by
 30 columns), about 10% of the time was determine the types in the
 first read through and 90% of the time was sticking the data in the
 array.
 
This would be consistent with my experience (basically testing for comment 
characters and the length of line.split(delimiter) in the first pass). 

 However, that particular test took more time for reading in because
 the data was quoted (so converting '3,25' to a float took between
 1.5x and 2x as long as '3.25') and the datetime conversion is costly.
 
 Regardless, that suggests making the data loading faster is more
 important than avoiding reading through the file twice. I guess that
 intuition probably breaks if the data doesn't fit until memory,
 though. But I haven't worked with extremely large data files before,
 so I'd appreciate refutation/confirmation of my priors.
 
The lion's share in the data loading time, by my experience, is still the 
string 
operations (like the comma conversion you quote above), so I'd always 
expect any subsequent manipulations of the numpy array data to be very fast
compared to that. Maybe this changes slightly with more complex data types like 
string records or datetime instances, but as you indicate, even for those the 
conversion seems to dominate the cost. 

Cheers,
Derek
--

Derek Homeier  Centre de Recherche Astrophysique de Lyon
ENS Lyon  46, Allée d'Italie
69364 Lyon Cedex 07, France  +33 1133 47272-8894





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] saving groups of numpy arrays to disk

2011-08-26 Thread Derek Homeier
On 25.08.2011, at 8:42PM, Chris.Barker wrote:

 On 8/24/11 9:22 AM, Anthony Scopatz wrote:
You can use Python pickling, if you do *not* have a requirement for:
 
 I can't recall why, but it seem pickling of numpy arrays has been 
 fragile and not very performant.
 
Hmm, the pure Python version might be, but, I've used cPickle for a long time 
and never noted any stability problems. And it is still noticeably faster than 
pytables, in my experience. Still, for the sake of a standardised format I'd 
go with HDF5 any time now (and usually prefer h5py now when starting 
anything new - my pytables implementation mentioned above likely is not 
the most efficient compared to cPickle). 
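(As an aside, a minimal h5py sketch for that kind of round-tripping - file and
dataset names are of course arbitrary:)

import numpy as np
import h5py

a = np.arange(10.)
with h5py.File('out.h5', 'w') as f:
    f.create_dataset('a', data=a, compression='gzip')
with h5py.File('out.h5', 'r') as f:
    b = f['a'][...]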

But with the usual disclaimers, you should be able to simply use cPickle 
as a drop-in replacement in the example below.

Cheers,
Derek

On 21.08.2011, at 2:24PM, Pauli Virtanen wrote:

 You can use Python pickling, if you do *not* have a requirement for:
 
 - real persistence, i.e., being able to easily read the data years later
 - a standard data format
 - access from non-Python programs
 - safety against malicious parties (unpickling can execute some code
  in the input -- although this is possible to control)
 
 then you can use Python pickling:
 
   import pickle
 
   file = open('out.pck', 'wb')
   pickle.dump(tree, file, protocol=pickle.HIGHEST_PROTOCOL)
   file.close()
 
   file = open('out.pck', 'rb')
   tree = pickle.load(file)
   file.close()

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] array_equal and array_equiv comparison functions for structured arrays

2011-08-26 Thread Derek Homeier
Hi,

as the subject says, the array_* comparison functions currently do not operate 
on structured/record arrays. Pull request 
https://github.com/numpy/numpy/pull/146
implements these comparisons.

There are two commits, differing in their interpretation whether two 
arrays with different field names, but identical data, are equivalent; i.e.

res = array_equiv(array((1,2), dtype=[('i','i4'),('v','f8')]),
  array((1,2), dtype=[('n','i4'),('f','f8')]))

is True in the current HEAD, but False in its parent.
Feedback and additional comments are invited. 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Trim a numpy array in numpy.

2011-08-16 Thread Derek Homeier
Hi Hongchun,

On 16 Aug 2011, at 23:19, Hongchun Jin wrote:

 I have a question regarding how to trim a string array in numpy. 
 
  import numpy as np
  x = np.array(['aaa.hdf', 'bbb.hdf', 'ccc.hdf', 'ddd.hdf'])
 
 I expect to trim a certain part of each element in the array, for example 
 '.hdf', giving me ['aaa', 'bbb', 'ccc', 'ddd']. Of course, I can do a loop 
 thing. However, in my actual dataset, I have more than one million elements 
 in such an array. So I am wondering is there a faster and better way to do 
 it, like STRMID function in IDL?  I try to google it, but it turns out that I 
 can not find any discussion about it.  Thanks. 
 
For a case like above, if you really have all constant length strings and want 
to truncate to a fixed length, you could simply do 

x.astype('|S3')
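which for the array quoted above should give something like

>>> x = np.array(['aaa.hdf', 'bbb.hdf', 'ccc.hdf', 'ddd.hdf'])
>>> x.astype('|S3')
array(['aaa', 'bbb', 'ccc', 'ddd'], 
      dtype='|S3')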

For more complex cases like trimming regex patterns I can't think of a numpy 
solution right now, coding the loop in cython might be a better bet there...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Trim a numpy array in numpy.

2011-08-16 Thread Derek Homeier
On 16 Aug 2011, at 23:51, Hongchun Jin wrote:

 Thanks Derek for  the quick reply. But I am sorry, I did not make it clear in 
 my last email.  Assume I have an array like 
 ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf'
 
  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf'
 
  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ...,
 
  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'
 
  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'
 
  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf']
 
 I need to get the sub-string for date and time, for example,  
 
 '2008-01-31T23-56-35ZD' in the middle of each element. In more general cases, 
 the sub-string could be any part of the string in such an array.  I hope to 
 assign the start and stop of the sub-string when I am subsetting it.  
 
Well, maybe I was a bit too quick in my reply - see the documentation for 
np.char for some vectorized array operations that might be of use. 
Unfortunately, operations like 'lstrip' and 'rstrip' don't do exactly what you 
might expect them to, but you could use for example 
np.char.split(x,'.') 
to create an array of lists of substrings and then deal with them; something 
like removing the '.hdf' suffix would already require a somewhat lengthy 
nesting of calls:

np.char.rstrip(np.char.rstrip(np.char.rstrip(np.char.rstrip(x, 'f'), 'd'), 
'h'), '.')

To also remove the leading substring in your case clearly would lead to a very 
clumsy expression...

It turns out however, something like the above for a similar test case with a 
length 10 array takes about 3 times longer than the np.char.split() way; 
but even that is slower than a direct loop over string functions:

In [6]: %timeit -n 10 y = np.char.split(x, '.')
10 loops, best of 3: 188 ms per loop

In [7]: %timeit -n 10 y = np.char.split(x, '.'); z = np.fromiter( (l[1] for l 
in y), dtype='|S3', count=x.shape[0])
10 loops, best of 3: 218 ms per loop

In [8]: %timeit -n 10 z = np.fromiter( (l.split('.')[1] for l in x), 
dtype='|S3', count=x.shape[0])
10 loops, best of 3: 143 ms per loop

So it seems all of the vectorization in np.char is not that great after all 
(and the direct loop might still be acceptable for 1.e6 elements...)!
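For the fixed-width filenames from your example the substring positions are
constant, so a simple sketch along the same fromiter lines (assuming the prefix
length really never changes) would be:

import numpy as np

fnames = np.array(['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf',
                   'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'])
# the date/time stamp always occupies characters 31:52 of these names
stamps = np.fromiter((s[31:52] for s in fnames), dtype='|S21',
                     count=fnames.shape[0])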

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient way to load a 1Gb file?

2011-08-10 Thread Derek Homeier
On 10 Aug 2011, at 19:22, Russell E. Owen wrote:

 A coworker is trying to load a 1Gb text data file into a numpy array 
 using numpy.loadtxt, but he says it is using up all of his machine's 6Gb 
 of RAM. Is there a more efficient way to read such text data files?

The npyio routines (loadtxt as well as genfromtxt) first read in the entire 
data as lists, which creates of course significant overhead, but is not easy to 
circumvent, since numpy arrays are immutable - so you have to first store the 
numbers in some kind of mutable object. One could write a custom parser that 
tries to be somewhat more efficient, e.g. first reading in sub-arrays from a 
smaller buffer. Concatenating those sub-arrays would still require about twice 
the memory of the final array. I don't know if using the array.array type 
(which is mutable) is much more efficient than a list...
To really avoid any excess memory usage you'd have to know the total data size 
in advance - either by reading in the file in a first pass to count the rows, 
or explicitly specifying it to a custom reader. Basically, assuming a 
completely regular file without missing values etc., you could then read in the 
data like 

X = np.zeros((n_lines, n_columns), dtype=float)
delimiter = ' '
for n, line in enumerate(file(fname, 'r')):
X[n] = np.array(line.split(delimiter), dtype=float)

(adjust delimiter and dtype as needed...)
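If the number of lines is not known beforehand, a somewhat leaner sketch
(untested on files of that size) would chain all fields into np.fromiter, which
at least avoids the temporary list of lists:

import numpy as np

def loadtxt_lean(fname, n_columns, dtype=float, delimiter=None):
    # flatten all fields of all lines into one iterator; fromiter fills a
    # 1-D array directly (it may still reallocate while growing)
    with open(fname, 'r') as fh:
        flat = (dtype(field) for line in fh for field in line.split(delimiter))
        data = np.fromiter(flat, dtype=dtype)
    return data.reshape(-1, n_columns)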

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient way to load a 1Gb file?

2011-08-10 Thread Derek Homeier
On 10 Aug 2011, at 22:03, Gael Varoquaux wrote:

 On Wed, Aug 10, 2011 at 04:01:37PM -0400, Anne Archibald wrote:
 A 1 Gb text file is a miserable object anyway, so it might be desirable
 to convert to (say) HDF5 and then throw away the text file.
 
 +1

There might be concerns about ensuring data accessibility against throwing the 
text file away, but converting to HDF5 would be an elegant solution for reading it in 
without the memory issues, too (I must confess though, I've regularly read ~ 
1GB ASCII files into memory - with decent virtual memory management it did not 
turn out too bad...)
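For the record, a minimal sketch of such a conversion with h5py (file and
dataset names made up):

import numpy as np
import h5py

data = np.loadtxt('big_table.txt')          # or any leaner reader
f = h5py.File('big_table.h5', 'w')
f.create_dataset('data', data=data, compression='gzip')
f.close()

# later, reopen and slice without loading everything back into memory:
f = h5py.File('big_table.h5', 'r')
subset = f['data'][:1000]
f.close()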

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [ANN] Cython 0.15

2011-08-07 Thread Derek Homeier
On 7 Aug 2011, at 04:09, Sturla Molden wrote:

 Den 06.08.2011 11:18, skrev Dag Sverre Seljebotn:
 We are excited to announce the release of Cython 0.15, which is a huge
 step forward in achieving full Python language coverage as well as
 many new features, optimizations, and bugfixes.
 
 
 
 This is really great. With Cython progressing like this, I might soon 
 have written my last line of Fortran. :-)

+1 (except the bit about writing Fortran, probably ;-)

I am only getting 4 errors with Python 3.1 + 3.2 (Mac OS X 10.6/x86_64):
compiling (cpp) and running numpy_bufacc_T155, numpy_cimport, numpy_parallel, 
numpy_test...
I could not find much documentation about the runtests.py script (like how to 
figure out the exact gcc command used), but I am happy to send more details 
wherever requested. Adding a '-v' flag prints the following additional info:

numpy_bufacc_T155.c: In function ‘PyInit_numpy_bufacc_T155’:
numpy_bufacc_T155.c:3652: warning: ‘return’ with no value, in function 
returning non-void
.numpy_bufacc_T155.cpp: In function ‘PyObject* PyInit_numpy_bufacc_T155()’:
numpy_bufacc_T155.cpp:3652: error: return-statement with no value, in function 
returning ‘PyObject*’
Enumpy_cimport.c: In function ‘PyInit_numpy_cimport’:
numpy_cimport.c:3327: warning: ‘return’ with no value, in function returning 
non-void
.numpy_cimport.cpp: In function ‘PyObject* PyInit_numpy_cimport()’:
numpy_cimport.cpp:3327: error: return-statement with no value, in function 
returning ‘PyObject*’
Enumpy_parallel.c: In function ‘PyInit_numpy_parallel’:
numpy_parallel.c:3824: warning: ‘return’ with no value, in function returning 
non-void
.numpy_parallel.cpp: In function ‘PyObject* PyInit_numpy_parallel()’:
numpy_parallel.cpp:3824: error: return-statement with no value, in function 
returning ‘PyObject*’
Enumpy_test.c: In function ‘PyInit_numpy_test’:
numpy_test.c:11611: warning: ‘return’ with no value, in function returning 
non-void
.numpy_test.cpp: In function ‘PyObject* PyInit_numpy_test()’:
numpy_test.cpp:11611: error: return-statement with no value, in function 
returning ‘PyObject*’

This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed, 
With Python 2.5-2.7 all 5536 tests are passing!

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [ANN] Cython 0.15

2011-08-07 Thread Derek Homeier
On 7 Aug 2011, at 22:31, Paul Anton Letnes wrote:

 Looks like you have done some great work! I've been using f2py in the past, 
 but I always liked the idea of cython - gradually wrapping more and more code 
 as the need arises. I read somewhere that fortran wrapping with cython was 
 coming - dare I ask what the status on this is? Is it a goal for cython to 
 support easy fortran wrapping at all?
 
Don't know if there is one besides fwrap, but 
http://pypi.python.org/pypi/fwrap/0.1.1
builds and tests OK on python 2.[5-7]. So I am bound to continue my Fortran 
writing...

  Keep up the good work!

Absolutely agreed!
Derek



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [ANN] Cython 0.15

2011-08-07 Thread Derek Homeier
On 7 Aug 2011, at 23:27, Dag Sverre Seljebotn wrote:

 Enumpy_test.c: In function ‘PyInit_numpy_test’:
 numpy_test.c:11611: warning: ‘return’ with no value, in function returning 
 non-void
 .numpy_test.cpp: In function ‘PyObject* PyInit_numpy_test()’:
 numpy_test.cpp:11611: error: return-statement with no value, in function 
 returning ‘PyObject*’
 
 This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed,
 With Python 2.5-2.7 all 5536 tests are passing!
 
 I believe this is http://projects.scipy.org/numpy/ticket/1919
 
 Can you confirm?
 
 I don't think there's anything we can do on the Cython end to fix this, 
 if the report is correct.

Yes, the proposed patch fixes the errors! I have added a comment to the ticket, 
hopefully this can be merged soon. 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] longlong format error with Python = 2.6 in scalartypes.c

2011-08-04 Thread Derek Homeier
Hi,

commits c15a807e and c135371e (thus most immediately addressed to Mark, but I 
am sending this to the list hoping for more insight on the issue) introduce a 
test failure with Python 2.5+2.6 on Mac:

FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime)
--
Traceback (most recent call last):
  File 
/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py, 
line 219, in test_timedelta_scalar_construction
assert_equal(str(np.timedelta64(3, 's')), '3 seconds')
  File /Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py, line 
313, in assert_equal
raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: '%lld seconds'
 DESIRED: '3 seconds'

due to the "%lld" format passed to PyUString_FromFormat in scalartypes.c. 
In the current npy_common.h I found the comment 
 *  in Python 2.6 the %lld formatter is not supported. In this
 *  case we work around the problem by using the %zd formatter.
though I did not notice that problem when I cleaned up the NPY_LONGLONG_FMT 
definitions in that file (and it is not entirely clear whether the comment only 
pertains to Windows...). Anyway changing the formatters in scalartypes.c to 
"%zd" as well removes the failure and still works with Python 2.7 and 3.2 (at 
least on Mac OS). However I am wondering if 
a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python version 
(and if %zu is a valid formatter), and 
b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h

I am attaching a patch implementing a), but only the quick and dirty solution 
to b).

Cheers,
Derek



npy_longlong_fmt.patch
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Segmentation Fault in Numpy.test()

2011-08-02 Thread Derek Homeier
On 2 Aug 2011, at 18:57, Thomas Markovich wrote:

 It appears that uninstalling python 2.7 and installing the scipy  
 superpack with the apple standard python removes the

Did the superpack installer automatically install numpy to the  
python2.7 directory when present? Even if so, I reckon you could  
simply reinstall python2.7 after the numpy installation (still calling  
python2.6 to use numpy of course...).

 segfaulting behavior from numpy. Now it appears that just scipy is  
 segfaulting at test test_arpack.test_hermitian_modes(True, std- 
 hermitian, 'F', 2, 'SM', None, 0.5, function aslinearoperator at  
 0x1043b1848) ... Segmentation fault

Which architecture is this? Being on Snow Leopard, probably x86_64...
I remember encountering similar problems on PPC, which I suspect are  
related to stability issues with Apple's Accelerate framework.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice

2011-08-02 Thread Derek Homeier
On 2 Aug 2011, at 19:15, Christopher Barker wrote:

 In [32]: s = numpy.array(a, dtype=tfc_dtype)
 ---
 TypeError Traceback (most recent  
 call last)

 /Users/cbarker/ipython console in module()

 TypeError: expected a readable buffer object

 OK -- I can see why you'd expect that to work. However, the trick with
 structured dtypes is that the dimensionality of the inputs can be less
 than obvious -- you are passing in a 1-d list of 4 numbers -- do you
 want a 1-d array? or ? -- in this case, it's pretty obvious (as a  
 human)
 what you would want -- you have a dtype with four fields, and you're
 passing in four numbers, but there are so many possible combinations
 that numpy doesn't try to be smart about it. So as a rule, you  
 need to
 be quite specific when working with structured dtypes.

 However, the default is for numpy to map tuples to dtypes, so if you
 pass in a tuple instead, it works:

 In [34]: t = tuple(a)

 In [35]: s = numpy.array(t, dtype=tfc_dtype)

 In [36]: s
 Out[36]:
 array((32000L, 0.789131, 0.00805999, 3882.22),
   dtype=[('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom',  
 'f8')])

 you were THIS close!

Thanks for the detailed discussion! BTW this works also without  
explicitly converting the words one by one:

In [1]:  l = '  32000  7.89131E-01  8.05999E-03  3.88222E+03'
In [2]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e',  
'f8'),('fom', 'f8')])
In [3]: numpy.array(tuple(l.split()), dtype=tfc_dtype)
Out[3]:
array((32000L, 0.789131, 0.00805999, 3882.22),
   dtype=[('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom',  
'f8')])

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Can I index array starting with 1?

2011-07-28 Thread Derek Homeier
On 29.07.2011, at 1:19AM, Stéfan van der Walt wrote:

 On Thu, Jul 28, 2011 at 4:10 PM, Anne Archibald
 aarch...@physics.mcgill.ca wrote:
 Don't forget the everything-looks-like-a-nail approach: make all your
 arrays one bigger than you need and ignore element zero.
 
 Hehe, why didn't I think of that :)
 
 I guess the kind of problem I struggle with more frequently is books
 written with summations over -m to +n.  In those cases, it's often
 convenient to use the mapping function, so that I can enter the
 formulas as they occur.

I don't want to open any cans of worms at this point, but given that Fortran90 
supports such indexing (arbitrary limits, including negative ones), there 
definitely are use cases for it (or rather, instances where it is very 
convenient at least, like in Stéfan's books). So I am wondering how much it 
would take to implement such an enhancement for the standard ndarray...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Can I index array starting with 1?

2011-07-28 Thread Derek Homeier
On 29.07.2011, at 1:38AM, Anne Archibald wrote:

 The can is open and the worms are everywhere, so:
 
 The big problem with one-based indexing for numpy is interpretation.
 In python indexing, -1 is the last element of the array, and ranges
 have a specific meaning. In a hypothetical one-based indexing scheme,
 would the last element be element 0? if not, what does looking up zero
 do? What about ranges - do ranges still include the first endpoint and
 not the second? I suppose one could choose the most pythonic of the
 1-based conventions, but do any of them provide from-the-end indexing
 without special syntax?
 
I forgot, this definitely needs to be preserved for ndarray!

 Once one had decided what to do, implementation would be pretty easy -
 just make a subclass of ndarray that replaces the indexing function.

In fact, Stéfan's reshuffling trick does nearly everything I would expect for 
using negative indices, maybe the only functionality needed to implement is 
1. define an attribute like x.start that could tell appropriate functions (e.g. 
for print(x) or plot(x)) the zero-point, so x would be evaluated e.g. at 
x[-5], wrapping around at x[-1], x[0] to x[-6]... Should have the advantage 
that anything that's not yet aware of this attribute could simply ignore it. 
2. allow to automatically set this starting point when creating something like 
x = np.zeros(-5:7) or setting a shape to (-5:7) - but maybe the latter is 
leading into very dangerous territory already...
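Just to make point 1 above a bit more concrete, a minimal sketch of such a
subclass (only plain integer indexing is remapped here; slices and the
shape-setting syntax of point 2 are left out):

import numpy as np

class OffsetArray(np.ndarray):
    # sketch: an ndarray carrying a zero-point attribute `start`
    def __new__(cls, data, start=0):
        obj = np.asarray(data).view(cls)
        obj.start = start
        return obj

    def __array_finalize__(self, obj):
        self.start = getattr(obj, 'start', 0)

    def __getitem__(self, index):
        # indices >= start are shifted; anything else is passed through,
        # so ordinary Python negative indexing still works for start >= 0
        if isinstance(index, (int, np.integer)) and index >= self.start:
            index = index - self.start
        return np.ndarray.__getitem__(self, index)

x = OffsetArray(np.arange(12), start=-5)
x[-5], x[0], x[6]        # -> (0, 5, 11)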

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2

2011-07-08 Thread Derek Homeier
On 07.07.2011, at 7:16PM, Robert Pyle wrote:

 .../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922:
  RuntimeWarning: invalid value encountered in absolute
  return all(less_equal(absolute(x-y), atol + rtol * absolute(y)))
 
 
 Everything else completes with 3 KNOWNFAILs and 1 SKIP.  This warning is not 
 new to this release; I've seen it before but haven't  tried tracking it down 
 until today.
 
 It arises in allclose().  The comments state "If either array contains NaN, 
 then False is returned." but no test for NaN is done, and NaNs are indeed 
 what cause the warning.
 
 Inserting
 
if any(isnan(x)) or any(isnan(y)):
return False
 
 before current line number 1916 in numeric.py seems to  fix it.

The same warning is still present in the current master, I just never paid 
attention to it because the tests still pass (it does correctly identify NaNs 
because they are not less_equal the tolerance), but of course this should be 
properly fixed as you suggest. 
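For illustration, what currently happens (a sketch - depending on your error
state the warning may or may not actually be printed):

import numpy as np

x = np.array([1.0, np.nan])
y = np.array([1.0, 2.0])
# returns False already, since NaN is never less_equal the tolerance, but the
# comparison may first emit the RuntimeWarning quoted above; the suggested
# isnan pre-check would short-circuit before that point.
print(np.allclose(x, y))        # False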

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] broacasting question

2011-06-30 Thread Derek Homeier
On 30.06.2011, at 7:32PM, Thomas K Gamble wrote:

 I'm trying to convert some IDL code to python/numpy and i'm having some 
 trouble understanding the rules for broadcasting during some operations.
 example:
 
 given the following arrays:
 a = array((2048,3577), dtype=float)
 b = array((256,25088), dtype=float)
 c = array((2048,3136), dtype=float)
 d = array((2048,3136), dtype=float)
 
 do:
 a = b * c + d
 
 In IDL, the computation is done without complaint and all array sizes are 
 preserved.  In python I get a value error concerning broadcasting.  I can 
 force it to work by taking slices, but the resulting size would be a = 
 (256x3136) rather than (2048x3577).  I admit that I don't understand IDL (or 
 python to be honest) well enough to know how it handles this to be able to 
 replicate the result properly.  Does it only operate on the smallest 
 dimensions ignoring the larger indices leaving their values unchanged?  Can 
 someone explain this to me?

If IDL does such operations silently I'd probably rather be troubled about it...
Assuming you actually meant something like a = np.ndarray((2048,3577)) 
(because np.array((2048,3577), dtype=float) would simply be the 2-vector [ 
2048. 3577.]), the shape of a indeed matches in no way the other ones. 
While b,c,d do have the same total size, thus something like

b.reshape((2048,3136)) * c + d 

will work, meaning the first 8 rows of b b[:8] would be concatenated to the 
first row of the output, and so on... 
Since the total size is still smaller than a, I could only venture something is 
done like 

np.add(b.reshape(2048,3136) * c, d, out=a[:,:3136])

But to say whether this is really the equivalent result to what IDL does, one 
would have to study the IDL manual in detail or directly compare the output 
(e.g. check what happens to the values in a[:,3136:]...)

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] broacasting question

2011-06-30 Thread Derek Homeier
On 30.06.2011, at 11:57PM, Thomas K Gamble wrote:

 np.add(b.reshape(2048,3136) * c, d, out=a[:,:3136])
 
 But to say whether this is really the equivalent result to what IDL does,
 one would have to study the IDL manual in detail or directly compare the
 output (e.g. check what happens to the values in a[:,3136:]...)
 
 Cheers,
  Derek
 
 Your post gave me the cluse I needed.
 
 I had my shapes slightly off in the example I gave, but if I try:
 
 a = reshape(b.flatten('F') * c.flatten('F') + d.flatten('F'), b.shape, 
 order='F')
 
 I get a result in line with the IDL result.
 
 Another example with different total size arrays:
 
 b = np.ndarray((2048,3577), dtype=float)
 c = np.ndarray((256,25088), dtype=float)
 
 a= reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, order='F')
 
 This also gives a result like that of IDL.

Right, I forgot to point out that there are at least 2 ways to bring the arrays 
into compatible shapes (that's the reason broadcasting does not work here, 
because numpy only does automatic broadcasting if there is an unambiguous way 
to do so). So the IDL arrays being Fortran-ordered is the essential bit of 
information here. 
Just two remarks: 
I. Assigning a = reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, 
order='F') 
as above will create a new array of shape c.shape - if you wanted to put your 
results into an existing array of shape(2048,3577), you'd still have to 
explicitly say a[:,:3136] = ...
II. The flatten() operations and the assignment above all create full copies of 
the arrays, thus the np.add ufunc above together with simple reshape operations 
might improve performance somewhat - however keeping the Fortran order also 
requires some costly transpositions, as for your last example 

a = np.divide(b.T[:3136].reshape(c.T.shape).T, c, out=a)

so YMMV...

Cheers,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy i/o

2011-06-27 Thread Derek Homeier
On 27.06.2011, at 6:36PM, Robert Kern wrote:

 Some late comments on the note (I was a bit surprised that HDF5 installation 
 seems to be a serious hurdle to many - maybe I've just been profiting from 
 the fink build system for OS X here - but I also was not aware that the 
 current netCDF is built on downwards-compatibility to the HDF5 standard, 
 something useful learnt again...:-)
 
 It's not so much that it's hard to build for lots of people. Rather,
 it would be quite difficult to include into numpy itself, particularly
 if we are just relying on distutils. numpy is too fundamental of a
 package to have extra dependencies.

I completely agree with that - I doubt that there is a standard data format 
that is both in wide use and simple enough to be easily included with numpy. 
Pyfits might be possible, but that again addresses probably a too specific user 
group. 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy i/o

2011-06-21 Thread Derek Homeier
On 21.06.2011, at 7:58PM, Neal Becker wrote:

 I think, in addition, that hdf5 is the only one that easily interoperates 
 with 
 matlab?
 
 speaking of hdf5, I see:
 
 pyhdf5io  0.7 - Python module containing high-level hdf5 load and save 
 functions.
 h5py  2.0.0 - Read and write HDF5 files from Python
 
 Any thoughts on the relative merits of these?

In my experience, HDF5 access usually approaches disk access speed, and random 
access to sub-datasets should be significantly faster than reading in the 
entire file, though I have not been able to test this. 
I have not heard about pyhdf5io (how does it work together with numpy?) - as 
alternative to h5py I'd rather recommend pytables, though I prefer the former 
for its cleaner/simpler interface (but that probably depends on your 
programming habits). 
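As a rough sketch of what I mean by random access (file name made up, and
again untested for actual timings):

import numpy as np
import h5py

f = h5py.File('big.h5', 'w')
f.create_dataset('x', data=np.random.rand(4000, 4000))
f.close()

f = h5py.File('big.h5', 'r')
whole = f['x'][...]            # reads the full 4000x4000 array from disk
block = f['x'][100:200, :50]   # reads only the requested sub-block
f.close()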

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] npyio - gzip 271 Error -3 while decompressing ?

2011-06-20 Thread Derek Homeier
Moin Denis,

On 20 Jun 2011, at 19:04, denis wrote:
  a separate question, have you run genfromtxt( xx.csv.gz ) lately ?

I haven't, and I was not particularly involved with it before this  
patch, so this would possibly be better addressed to the list.

 On on .../scikits.learn-0.8/scikits/learn/datasets/data.digits.csv.gz
 numpy 1.6.0, py 2.6 mac I get

X = np.genfromtxt( filename, delimiter=, )
  File /Library/Frameworks/Python.framework/Versions/2.6/lib/
 python2.6/site-packages/numpy/lib/npyio.py, line 1271, in genfromtxt
first_line = fhd.next()
  File /Library/Frameworks/Python.framework/Versions/2.6/lib/
 python2.6/gzip.py, line 438, in next
line = self.readline()
  File /Library/Frameworks/Python.framework/Versions/2.6/lib/
 python2.6/gzip.py, line 393, in readline
c = self.read(readsize)
  File /Library/Frameworks/Python.framework/Versions/2.6/lib/
 python2.6/gzip.py, line 219, in read
self._read(readsize)
  File /Library/Frameworks/Python.framework/Versions/2.6/lib/
 python2.6/gzip.py, line 271, in _read
uncompress = self.decompress.decompress(buf)
 zlib.error: Error -3 while decompressing: invalid distance too far
 back

 It would be nice to fix this too, if it hasn't been already.
 Btw the file gunzips fine.

I could reproduce that error for the gzip'ed csv files in that  
directory; it can be isolated to the underlying gzip call above -
fhd = gzip.open('digits.csv.gz', 'rbU'); fhd.next()
produces the same error for these files with all python2.x versions on  
my Mac, but not with python3.x. Also only with the 'U' mode specified,  
yet the same mode is parsing other .gz files just fine. I could not  
really track down what the 'U' flag is doing in gzip.py, but it is meant  
to request universal newline handling there. Also it's a mystery to  
me what is different in those files that would trigger the error. I  
even read them in with loadtxt() and wrote them back using constant  
line width and/or spaces as separators, still producing the same  
exception.
The obvious place to fix this (or work around a bug in python2's  
gzip.py, whatever), would be changing the open command in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rbU'))
to omit the 'U' at least with python2. Alternatively one could do a  
test read and catch the exception, to then re-open the file with mode  
'rb'...
Pierre, if you are reading this, can you comment how important the 'U'  
is for performance considerations or such?

HTH,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-06-19 Thread Derek Homeier
On 31 May 2011, at 21:28, Pierre GM wrote:

 On May 31, 2011, at 6:37 PM, Derek Homeier wrote:

 On 31 May 2011, at 18:25, Pierre GM wrote:

 On May 31, 2011, at 5:52 PM, Derek Homeier wrote:

 I think stuff like multiple delimiters should have been dealt with
 before, as the right place to insert the ndmin code (which includes
 the decision to squeeze or not to squeeze as well as to add
 additional
 dimensions, if required) would be right at the end before the
 'unpack'
 switch, or  rather replacing the bit:

  if usemask:
  output = output.view(MaskedArray)
  output._mask = outputmask
  if unpack:
  return output.squeeze().T
  return output.squeeze()

 But there it's already not clear to me how to deal with the
 MaskedArray case...

 Oh, easy.
 You need to replace only the last three lines of genfromtxt with the
 ones from loadtxt  (808-833). Then, if usemask is True, you need to
 use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
 Comments:
 * I would raise an exception if ndmin isn't correct *before* trying
 to read the file...
 * You could define a `collapse_function` that would be
 `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on
 the values of `usemask` and `ndmin`...

Thanks, that helped to clean up a little bit.

 If you have any question about numpy.ma, don't hesitate to contact
 me directly.

 Thanks for the directions! I was not sure about the usemask case
 because it presently does not invoke .squeeze() either...

 The idea is that if `usemask` is True, you build a second array (the  
 mask), that you attach to your main array at the very end (in the  
 `output=output.view(MaskedArray), output._mask = mask` combo...).  
 Afterwards, it's a regular MaskedArray that supports the .squeeze()  
 method...

OK, in both cases output.squeeze() is now used if ndim > ndmin and  
usemask is False - at least it does not break any tests, so it seems  
to work with MaskedArrays as well.

 On a
 possibly related note, genfromtxt also treats the 'unpack'ing of
 structured arrays differently from loadtxt (which returns a list of
 arrays in that case) - do you know if this is on purpose, or also
 rather missing functionality (I guess it might break  
 recfromtxt()...)?

 Keep in mind that I haven't touched genfromtxt since 8-10 months or  
 so. I wouldn't be surprised that it were lagging a bit behind  
 loadtxt in terms of development. Yes, there'll be some tweaking to  
 do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the  
 defaults) and others, but nothing major.

Well, at long last I got to implement the above and added the  
corresponding tests for genfromtxt - with the exception of the  
dimension-0 cases, since genfromtxt raises an error on empty files.  
There already is a comment it should perhaps better return an empty  
array, so I am putting that idea up for discussion here again.
I tried to devise a very basic test with masked arrays, just added to  
test_withmissing now.
I also implemented the same unpacking behaviour for structured arrays  
and just made recfromtxt set unpack=False to work (or should it issue  
a warning?).

The patches are up for review as commit 8ac01636 in my iocnv-wildcard  
branch:
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard
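For reference, the dimension handling roughly follows this logic (a Python
sketch of the idea described above, not the committed code):

import numpy as np
import numpy.ma as ma

def _collapse(output, ndmin, usemask, unpack):
    # squeeze away extra dimensions, then pad back up to `ndmin` using the
    # masked or plain atleast_*d variant, and transpose for unpack=True
    if ndmin not in (0, 1, 2):
        raise ValueError('Illegal value of ndmin keyword: %s' % ndmin)
    if output.ndim > ndmin:
        output = output.squeeze()
    atleast_nd = {1: ma.atleast_1d if usemask else np.atleast_1d,
                  2: ma.atleast_2d if usemask else np.atleast_2d}
    if ndmin > output.ndim:
        output = atleast_nd[ndmin](output)
    return output.T if unpack else output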

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Numpy 1.6.1 release candidate 1

2011-06-19 Thread Derek Homeier
Hi Ralf,

On 19 Jun 2011, at 12:28, Ralf Gommers wrote:

  numpy.test('full')
 Running unit tests for numpy
 NumPy version 1.6.1rc1
 NumPy is installed in /sw/lib/python3.2/site-packages/numpy
 Python version 3.2 (r32:88445, Mar  1 2011, 18:28:16) [GCC 4.0.1  
 (Apple Inc. build 5493)]
 nose version 1.0.0
 ...
 ==
 FAIL: test_return_character.TestF77ReturnCharacter.test_all
 --
 Traceback (most recent call last):
  File /sw/lib/python3.2/site-packages/nose/case.py, line 188, in  
 runTest
self.test(*self.arg)
  File /sw/lib/python3.2/site-packages/numpy/f2py/tests/ 
 test_return_character.py, line 78, in test_all
self.check_function(getattr(self.module, name))
  File /sw/lib/python3.2/site-packages/numpy/f2py/tests/ 
 test_return_character.py, line 12, in check_function
r = t(array('ab'));assert_( r==asbytes('a'),repr(r))
  File /sw/lib/python3.2/site-packages/numpy/testing/utils.py, line  
 34, in assert_
raise AssertionError(msg)
 AssertionError: b' '

 ==
 FAIL: test_return_character.TestF90ReturnCharacter.test_all
 --
 Traceback (most recent call last):
  File /sw/lib/python3.2/site-packages/nose/case.py, line 188, in  
 runTest
self.test(*self.arg)
  File /sw/lib/python3.2/site-packages/numpy/f2py/tests/ 
 test_return_character.py, line 136, in test_all
self.check_function(getattr(self.module.f90_return_char, name))
  File /sw/lib/python3.2/site-packages/numpy/f2py/tests/ 
 test_return_character.py, line 12, in check_function
r = t(array('ab'));assert_( r==asbytes('a'),repr(r))
  File /sw/lib/python3.2/site-packages/numpy/testing/utils.py, line  
 34, in assert_
raise AssertionError(msg)
 AssertionError: b' '

 (both errors are raised on the first function tested, t0).

 Could you open a ticket for these?  If it's f2py + py3.2 + ppc only  
 I'd like to ignore them for 1.6.1.

and 3.1, but I agree. http://projects.scipy.org/numpy/ticket/1871

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt converter question

2011-06-18 Thread Derek Homeier
On 18 Jun 2011, at 04:48, gary ruben wrote:

 Thanks guys - I'm happy with the solution for now. FYI, Derek's
 suggestion doesn't work in numpy 1.5.1 either.
 For any developers following this thread, I think this might be a nice
 use case for genfromtxt to handle in future.

Numpy 1.6.0 and above is handling it in the present.

 As a corollary of this problem, I wonder whether there's a
 human-readable text format for complex numbers that genfromtxt can
 currently easily parse into a complex array? Having the hard-coded
 value for the number of columns in the converter and the genfromtxt
 call goes against the philosophy of the function's ability to form an
 array of shape matching the input layout.

genfromtxt and (for regular data such as yours) loadtxt can also parse  
the standard
'1.0+3.14j ...' format practically automatically as long as you  
specify 'dtype=complex'
(and appropriate delimiter, if required). loadtxt does not handle  
dtype=np.complex64
or np.complex256 at this time due to a bug in the default converters;  
I have just created
a patch for that.
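E.g. (a small sketch, with StringIO standing in for a real file):

import numpy as np
from StringIO import StringIO

s = StringIO("1.0+3.14j 2.0-0.5j\n-1.0+0.0j 0.5+0.5j")
z = np.loadtxt(s, dtype=complex)     # -> 2x2 complex array, no converters needed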
In principle genfromtxt and loadtxt in numpy 1.6 can also handle cases  
similar to your input,
but you need to change the tuples at least to contain no spaces -
'( -1.4249, 1.7330)' -> '(-1.4249,+1.7330)'
or additionally insert another delimiter like ';' - otherwise it's  
hopeless to correctly infer
the number of columns.
With such input, np.genfromtxt(a, converters=cnv, dtype=complex) works  
as expected,
and I have also just created a patch that would allow users to more  
easily specify a
default converter for all input data rather than constructing one for  
every row.

A wider-reaching automatic detection of complex formats might be  
feasible for
genfromtxt, but I'd suspect it could create quite some overhead, as I  
can think of at least
two formats that should probably be considered - tuples as above, and  
pairs of
'real,imag' without parentheses. But feel free to create a ticket if  
you think this would be
an important enhancement.

Cheers,
Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Enhancements and bug fix for npyio

2011-06-18 Thread Derek Homeier
Hi,

in my experience numbers loaded from text files very often require  
identical conversion for all data (e.g. parsing Fortran-style double  
precision, or German vs. English decimals...).
Yet loadtxt and genfromtxt in such cases require the user to construct  
a dictionary of converters for every single data column,  a task that  
could be easily handled within the routines. I am therefore putting up  
a patch for testing that allows users to specify a
default converter for all input data as converters={-1: conv} (or  
alternatively {'all': conv}, leaving that open to discussion).
Please test the changeset (still lacking tests for the new option) at
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard
which also contains some bug fixes to handle complex input.
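A small sketch of the use case (Python 2 flavour; the last line shows what the
patch would reduce it to):

import numpy as np
from StringIO import StringIO

# identical conversion needed in every column (Fortran 'D' exponents and
# German decimal commas); with the current API the dict has to repeat it:
conv = lambda s: float(s.replace('D', 'E').replace(',', '.'))
data = StringIO("1,5D0 2,5D0\n3,0D0 4,0D0")
a = np.loadtxt(data, converters=dict.fromkeys(range(2), conv))
# with the proposed patch this would simply be converters={-1: conv}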

Awaiting your comments,
Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-17 Thread Derek Homeier
On 17.06.2011, at 8:05PM, Mark Wiebe wrote:

 On Thu, Jun 16, 2011 at 8:18 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
 On 17.06.2011, at 2:02AM, Mark Wiebe wrote:
 
  ok, that was a lengthy hunt, but it's in printing the string in 
  make_iso_8601_date:
 
 tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
 fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, 
  sublen, dts->year, substr);
 
  produces
 
   >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
  printed 4[62]: dts->year=1970: 
  numpy.datetime64('-03-23','D')
 
  It seems snprintf is not using the correct format for INT64 (as I 
  happened to do in fprintf before
  realising I had to use %lld ;-) - could it be this is a general issue, 
  which just does not show up
  on little-endian machines because they happen to pass the right half of 
  the int64 to printf?
  BTW, how is this supposed to be handled (in 4 digits) if the year is 
  indeed beyond the 32bit range
  (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it 
  to int32 before print.
 
  I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it 
  if it doesn't work... What is NumPy setting it to for that platform?
 
 Of course (just felt somewhat lost among all the #defines). It clearly seems 
 to be mis-constructed
 on PowerPC 32:
 NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT -> "Ld",
 but this does not seem to handle int64 on big-endian Macs - explicitly 
 printing "%Ld", dts->year
 also produces 0.
 Changing the snprintf format to %04 lld produces the correct output, so 
 if nothing else
 avails, I suggest to put something like
 
 #  elif (defined(__ppc__)  || defined(__ppc64__))
 #define LONGLONG_FMT   "lld"
 #define ULONGLONG_FMT  "llu"
 #  else
 
 into npy_common.h (or possibly simply defined(__APPLE__), since %lld seems 
 to
 work on 32bit i386 Macs just as well).
 
 Probably a minimally invasive change is best, also this kind of thing 
 deserves a comment explaining the problem that was encountered with the 
 specific platforms, so that in the future when people examine this part they 
 can understand why this is there. Do you want to make a pull request for this 
 change?
 
I'd go with the defined(__APPLE__) then, since %Ld produces wrong results on 
both 32bit platforms. More precisely, this print
"%Ld - %Ld", dts->year, dts->year 
produces "0 - 1970" on ppc and "1970 - 0" on i386, while "%lld - %lld" prints 
"1970 - 1970" on both archs. There still is an issue (I now remember this came 
up with a different test a few months ago), that none of the formats seems to 
be able to actually print numbers > 2**32 (or 2**31, don't remember), but this 
seemed out of reach for anyone on this list. 

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt converter question

2011-06-17 Thread Derek Homeier
Hi Gary,

On 17.06.2011, at 5:39PM, gary ruben wrote:
 Thanks for the hints Olivier and Bruce. Based on them, the following
 is a working solution, although I still have that itchy sense that genfromtxt
 should be able to do it directly.
 
 import numpy as np
 from StringIO import StringIO
 
 a = StringIO('''\
 (-3.9700,-5.0400) (-1.1318,-2.5693) (-4.6027,-0.1426) (-1.4249, 1.7330)
 (-5.4797, 0.) ( 1.8585,-1.5502) ( 4.4145,-0.7638) (-0.4805,-1.1976)
 ( 0., 0.) ( 6.2673, 0.) (-0.4504,-0.0290) (-1.3467, 1.6579)
 ( 0., 0.) ( 0., 0.) (-3.5000, 0.) ( 2.5619,-3.3708)
 ''')
 
 b = np.genfromtxt(a, dtype=str, delimiter=18)[:,:-1]
 b = np.vectorize(lambda x: complex(*eval(x)))(b)
 
 print b

It should, I think you were very close in your earlier attempt:

 On Sat, Jun 18, 2011 at 12:31 AM, Bruce Southey bsout...@gmail.com wrote:
 On 06/17/2011 08:51 AM, Olivier Delalleau wrote:
 
 2011/6/17 Bruce Southey bsout...@gmail.com
 
 On 06/17/2011 08:22 AM, gary ruben wrote:
 Thanks Olivier,
 Your suggestion gets me a little closer to what I want, but doesn't
 quite work. Replacing the conversion with
 
 c = lambda x:np.cast[np.complex64](complex(*eval(x)))
 b = np.genfromtxt(a,converters={0:c, 1:c, 2:c,
 3:c},dtype=None,delimiter=18,usecols=range(4))
 
 produces
 
 [[(-3.9702861-5.0396185j) (-1.1318000555-2.56929993629j)
   (-4.60270023346-0.14259905j) (-1.42490005493+1.7334005j)]
   [(-5.4797000885+0j) (1.8585381-1.5501999855j)
   (4.41450023651-0.763800024986j) (-0.480500012636-1.1976706j)]
   [0j (6.26730012894+0j) (-0.4503485-0.028991655j)
   (-1.34669995308+1.65789997578j)]
   [0j 0j (-3.5+0j) (2.56189990044-3.37080001831j)]]
 
 which is not yet an array of complex numbers. It seems close to the
 solution though.

You were just overdoing it by already creating an array with the converter, 
this apparently caused genfromtxt to create a structured array from the input 
(which could be converted back to an ndarray, but that can prove tricky as 
well) - similar, if you omit the dtype=None. The following

cnv = dict.fromkeys(range(4), lambda x: complex(*eval(x)))
b = np.genfromtxt(a,converters=cnv, dtype=None, delimiter=18, usecols=range(4))

directly produces a shape(4,4) complex array for me (you may have to apply an 
.astype(np.complex64) afterwards if so desired).

BTW I think this is an interesting enough case of reading non-trivially 
structured data that it deserves to appear on some examples or cookbook page.

HTH,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt converter question

2011-06-17 Thread Derek Homeier
On 17.06.2011, at 11:01PM, Olivier Delalleau wrote:

 You were just overdoing it by already creating an array with the converter, 
 this apparently caused genfromtxt to create a structured array from the 
 input (which could be converted back to an ndarray, but that can prove 
 tricky as well) - similar, if you omit the dtype=None. The following
 
 cnv = dict.fromkeys(range(4), lambda x: complex(*eval(x)))
 b = np.genfromtxt(a,converters=cnv, dtype=None, delimiter=18, 
 usecols=range(4))
 
 directly produces a shape(4,4) complex array for me (you may have to apply 
 an .astype(np.complex64) afterwards if so desired).
 
 BTW I think this is an interesting enough case of reading non-trivially 
 structured data that it deserves to appear on some examples or cookbook page.
 
 HTH,
Derek
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 I had tried that as well and it doesn't work with numpy 1.4.1 (I get an 
 object array). It may have been fixed in a later version.

OK, I was using the current master from github, but it works in 1.6.0 as well. 
I still noticed some differences between loadtxt and genfromtxt behaviour, e.g. 
where loadtxt would be able to take a string from the converter and 
automatically convert it to a number, whereas in genfromtxt the converter still 
had to include the float() or complex()...

Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Derek Homeier
Hi Mark,

On 16.06.2011, at 5:40PM, Mark Wiebe wrote:

  >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
 np.datetime64('-06-16','D')
 
 I've tried to track this down in datetime.c, but unsuccessfully so (i.e. I 
 could not connect it
 to any of the dts-year assignments therein).
 
 This is definitely perplexing. Probably the first thing to check is whether 
 it's breaking during parsing or printing. This should always produce the same 
 result:
 
  >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
 7070400
 
 But maybe the test_days_creation is already checking that thoroughly enough.
 
 Then, maybe printf-ing the year value at various stages of the printing, like 
 in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in 
 make_iso_8601_date. This would at least isolate where the year is getting 
 lost.
 
ok, that was a lengthy hunt, but it's in printing the string in 
make_iso_8601_date:

tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, 
dts->year, substr);

produces

>>> np.datetime64('1970-03-23 20:00:00Z', 'D')
printed 4[62]: dts->year=1970: 
numpy.datetime64('-03-23','D')

It seems snprintf is not using the correct format for INT64 (as I happened to 
do in fprintf before 
realising I had to use %lld ;-) - could it be this is a general issue, which 
just does not show up 
on little-endian machines because they happen to pass the right half of the 
int64 to printf?
BTW, how is this supposed to be handled (in 4 digits) if the year is indeed 
beyond the 32bit range 
(i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to 
int32 before print.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Derek Homeier
On 17.06.2011, at 2:02AM, Mark Wiebe wrote:

 ok, that was a lengthy hunt, but it's in printing the string in 
 make_iso_8601_date:
 
tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, 
 dts->year, substr);
 
 produces
 
  >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
 printed 4[62]: dts->year=1970: 
 numpy.datetime64('-03-23','D')
 
 It seems snprintf is not using the correct format for INT64 (as I happened 
 to do in fprintf before
 realising I had to use %lld ;-) - could it be this is a general issue, 
 which just does not show up
 on little-endian machines because they happen to pass the right half of the 
 int64 to printf?
 BTW, how is this supposed to be handled (in 4 digits) if the year is indeed 
 beyond the 32bit range
 (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to 
 int32 before print.
 
 I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if 
 it doesn't work... What is NumPy setting it to for that platform?
 
Of course (just felt somewhat lost among all the #defines). It clearly seems to 
be mis-constructed 
on PowerPC 32:
NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT -> "Ld", 
but this does not seem to handle int64 on big-endian Macs - explicitly printing 
"%Ld", dts->year 
also produces 0. 
Changing the snprintf format to %04 lld produces the correct output, so if 
nothing else 
avails, I suggest to put something like 

#  elif (defined(__ppc__)  || defined(__ppc64__))
 #define LONGLONG_FMT   "lld"  
 #define ULONGLONG_FMT  "llu"
#  else

into npy_common.h (or possibly simply defined(__APPLE__), since %lld seems to 
work on 32bit i386 Macs just as well).

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

