Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Pierre Haessig
Hi,
On 23/02/2012 02:24, Matthew Brett wrote:
 Luckily I was in fact using longdouble in the live code,
I had never used exotic floating point precision, so thanks for your post,
which made me take a look at the docstring and documentation.

If I got it right from the docstring, 'np.longdouble' and 'np.longfloat'
are both in fact 'np.float128'.
(numpy 1.5)

However, I was surprised that float128 is not mentioned in the array of
available types in the user guide.
http://docs.scipy.org/doc/numpy/user/basics.types.html
Is there a specific reason for this absence, or is it just a matter of
visiting the documentation wiki ;-)?

Additionally, I don't know what the writing guidelines of the user
guide are, but would it make sense to add some "new in numpy 1.x" notes,
as in the Python docs? I'm thinking here of np.float16: I know it exists
from messages on this mailing list, but my 1.5 doesn't have it.

Best,
Pierre

PS: I found float128 mentioned in the reference
http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types
However, it is not as easily readable as the user guide (which makes
sense!).

Do the following statements mean that those types are not available on
all platforms?
float96 96 bits, platform?  
float128 128 bits, platform?






Re: [Numpy-discussion] python geospatial package?

2012-02-23 Thread Vincent Schut
On 02/22/2012 10:45 PM, Chao YUE wrote:
 Hi all,

 Is anyone using a python geospatial package that can do jobs like
 intersection, etc.?  The job is like automatically extracting a region
 from a global map.

 thanks and cheers,

 Chao

Chao,

shapely would do this, though I found it had a bit of a steep learning 
curve. Or you could go the gdal/ogr way, which uses the geos library 
under the hood (if present) to do geometrical operations like 
intersections etc.
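
For example, a region test with shapely might look roughly like this (a
sketch with made-up coordinates):

from shapely.geometry import box, Point

# a lon/lat rectangle standing in for the region of interest
region = box(0.0, 40.0, 10.0, 50.0)   # (minx, miny, maxx, maxy)
print(region.contains(Point(5.0, 45.0)))                       # True
print(region.intersection(box(5.0, 45.0, 15.0, 55.0)).bounds)  # the overlap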

cheers,
Vincent.


 --
 ***
 Chao YUE
 Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
 UMR 1572 CEA-CNRS-UVSQ
 Batiment 712 - Pe 119
 91191 GIF Sur YVETTE Cedex
 Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
 







Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Francesc Alted
On Feb 23, 2012, at 3:06 AM, Pierre Haessig wrote:

 Hi,
 On 23/02/2012 02:24, Matthew Brett wrote:
 Luckily I was in fact using longdouble in the live code,
 I had never used exotic floating point precision, so thanks for your post,
 which made me take a look at the docstring and documentation.
 
 If I got it right from the docstring, 'np.longdouble' and 'np.longfloat'
 are both in fact 'np.float128'.
 (numpy 1.5)

That in fact depends on the platform you are using.  Typically, on 32-bit 
platforms, 'np.longfloat' and 'np.longdouble' are bound to 'np.float96', while 
on 64-bit ones they are bound to 'np.float128'.
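
A quick way to check what you get on a given box (a sketch; which aliases 
exist depends on the NumPy version and platform):

import numpy as np

print(np.longdouble is np.longfloat)          # two names for the same type
print(np.dtype(np.longdouble).itemsize * 8)   # 96 on 32-bit x86, 128 on x86-64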

 However, I was surprised that float128 is not mentioned in the array of
 available types in the user guide.
 http://docs.scipy.org/doc/numpy/user/basics.types.html
 Is there a specific reason for this absence, or is it just a matter of
 visiting the documentation wiki ;-)?

The reason is most probably that you cannot get a float96 or float128 whenever 
you want (depends on your architecture), so adding these types to the manual 
could be misleading.  However, I'd advocate documenting them while warning 
about platform portability issues.

 Additionally, I don't know what the writing guidelines of the user
 guide are, but would it make sense to add some "new in numpy 1.x" notes,
 as in the Python docs? I'm thinking here of np.float16: I know it exists
 from messages on this mailing list, but my 1.5 doesn't have it.

float16 was introduced in NumPy 1.6, IIRC.

 PS: I found float128 mentioned in the reference
 http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types
 However, it is not as easily readable as the user guide (which makes
 sense!).
 
 Do the following statements mean that those types are not available on
 all platforms?
 float96 96 bits, platform?  
 float128 128 bits, platform?

Exactly.  I'd update this to read:

float96    96 bits.  Only available on 32-bit (i386) platforms.
float128  128 bits.  Only available on 64-bit (AMD64) platforms.

-- Francesc Alted





Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Nathaniel Smith
On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted franc...@continuum.io wrote:
 Exactly.  I'd update this to read:

 float96    96 bits.  Only available on 32-bit (i386) platforms.
 float128  128 bits.  Only available on 64-bit (AMD64) platforms.

Except float96 is actually 80 bits. (Usually?) Plus some padding...

-- Nathaniel


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Pierre Haessig
On 23/02/2012 12:40, Francesc Alted wrote:
 However, I was surprised that float128 is not mentioned in the array of
  available types in the user guide.
  http://docs.scipy.org/doc/numpy/user/basics.types.html
  Is there a specific reason for this absence, or is it just a matter of
  visiting the documentation wiki ;-)?
 The reason is most probably that you cannot get a float96 or float128 
 whenever you want (depends on your architecture), so adding these types to 
 the manual could be misleading.  However, I'd advocate documenting them while 
 warning about platform portability issues.
 Do the following statements mean that those types are not available on
  all platforms?
  float96 96 bits, platform?  
  float128 128 bits, platform?
 Exactly.  I'd update this to read:

 float96    96 bits.  Only available on 32-bit (i386) platforms.
 float128  128 bits.  Only available on 64-bit (AMD64) platforms.

Thanks for the enlightenment!
I was not aware of this 96-bit/128-bit relationship.
-- 
Pierre





Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Francesc Alted

On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote:

 On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted franc...@continuum.io 
 wrote:
 Exactly.  I'd update this to read:
 
 float96    96 bits.  Only available on 32-bit (i386) platforms.
 float128  128 bits.  Only available on 64-bit (AMD64) platforms.
 
 Except float96 is actually 80 bits. (Usually?) Plus some padding…

Good point.  The thing is that they actually use 96 bits for storage purposes 
(this is due to alignment requirements).

Another quirk related to this is that MSVC automatically maps long double to 
64-bit doubles:

http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx

Not sure why they did that (portability issues?).
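
This is easy to check from Python, by the way (a sketch; the result depends on 
the compiler the interpreter was built with):

import ctypes

# 8 under MSVC (long double == double); typically 12 or 16 with gcc on x86
print(ctypes.sizeof(ctypes.c_longdouble))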

-- Francesc Alted





Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Francesc Alted
On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote:
 On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote:
 
 On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted franc...@continuum.io 
 wrote:
 Exactly.  I'd update this to read:
 
 float96    96 bits.  Only available on 32-bit (i386) platforms.
 float128  128 bits.  Only available on 64-bit (AMD64) platforms.
 
 Except float96 is actually 80 bits. (Usually?) Plus some padding…
 
 Good point.  The thing is that they actually use 96 bits for storage purposes 
 (this is due to alignment requirements).
 
 Another quirk related to this is that MSVC automatically maps long double 
 to 64-bit doubles:
 
 http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx
 
 Not sure why they did that (portability issues?).

Hmm, yet another quirk (this time in NumPy itself).  On 32-bit platforms:

In [16]: np.longdouble
Out[16]: numpy.float96

In [17]: np.finfo(np.longdouble).eps
Out[17]: 1.084202172485504434e-19

while on 64-bit ones:

In [8]: np.longdouble
Out[8]: numpy.float128

In [9]: np.finfo(np.longdouble).eps
Out[9]: 1.084202172485504434e-19

i.e. NumPy is saying that the eps (machine epsilon) is the same on both 
platforms, despite the fact that one uses 80-bit precision and the other 
128-bit precision.  For the 80-bit format, the eps should be:

In [5]: 1 / 2**63.
Out[5]: 1.0842021724855044e-19

[http://en.wikipedia.org/wiki/Extended_precision]

which is correctly stated by NumPy, while for 128-bit (quad precision), eps 
should be:

In [6]: 1 / 2**113.
Out[6]: 9.62964972193618e-35

[http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format]

If nobody objects, I'll file a bug about this.
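
One way to see the discrepancy is through the mantissa size that finfo reports 
(a sketch; by NumPy's convention, eps == 2**-nmant):

import numpy as np

info = np.finfo(np.longdouble)
print(info.nmant)          # 63 for 80-bit extended; 112 for IEEE binary128
print(2.0 ** -info.nmant)  # the eps implied by the mantissa size
print(info.eps)            # what NumPy actually reports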

-- Francesc Alted





Re: [Numpy-discussion] python geospatial package?

2012-02-23 Thread Kiko
2012/2/23 Vincent Schut sc...@sarvision.nl

 On 02/22/2012 10:45 PM, Chao YUE wrote:
  Hi all,
 
   Is anyone using a python geospatial package that can do jobs like
   intersection, etc.?  The job is like automatically extracting a region
   from a global map.
 
  thanks and cheers,
 
  Chao


Depending on what you want to do:

Shapely, GDAL/OGR, pyproj, Mapnik, Basemap,...


[Numpy-discussion] Special matrices with structure?

2012-02-23 Thread Jaakko Luttinen
Hi!

I was wondering whether it would be easy/possible/reasonable to have
classes for arrays that have special structure in order to use less
memory and speed up some computations?

For instance:
- a symmetric matrix could be stored in almost half the memory required by
a non-symmetric matrix
- a diagonal matrix only needs to store the diagonal vector
- a Toeplitz matrix only needs to store one or two vectors
- a sparse matrix only needs to store the non-zero elements (some
implementations exist in scipy.sparse)
- and so on

If such classes were implemented, it would be nice if they worked with
numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily.
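
For illustration, a minimal sketch of what the diagonal case could look like
(hypothetical code, not an existing NumPy class):

import numpy as np

class DiagonalMatrix(object):
    # an n x n diagonal matrix stored as just its length-n diagonal
    def __init__(self, diag):
        self.diag = np.asarray(diag)
    def dot(self, x):
        x = np.asarray(x)
        if x.ndim == 1:
            return self.diag * x               # D times a vector
        return self.diag[:, np.newaxis] * x    # D times a matrix: scale rows
    def __add__(self, other):
        if isinstance(other, DiagonalMatrix):
            return DiagonalMatrix(self.diag + other.diag)
        return np.diag(self.diag) + other      # fall back to a dense result
    def todense(self):
        return np.diag(self.diag)

D = DiagonalMatrix([1.0, 2.0, 3.0])
print(D.dot(np.ones((3, 2))))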

I believe this has been discussed before, but Google didn't help a lot...

Regards,
Jaakko


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Matthew Brett
Hi,

On Thu, Feb 23, 2012 at 4:23 AM, Francesc Alted franc...@continuum.io wrote:
 On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote:
 On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote:

 On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted franc...@continuum.io 
 wrote:
 Exactly.  I'd update this to read:

 float96    96 bits.  Only available on 32-bit (i386) platforms.
 float128  128 bits.  Only available on 64-bit (AMD64) platforms.

 Except float96 is actually 80 bits. (Usually?) Plus some padding…

 Good point.  The thing is that they actually use 96 bits for storage purposes 
 (this is due to alignment requirements).

 Another quirk related to this is that MSVC automatically maps long double 
 to 64-bit doubles:

 http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx

 Not sure why they did that (portability issues?).

 Hmm, yet another quirk (this time in NumPy itself).  On 32-bit platforms:

 In [16]: np.longdouble
 Out[16]: numpy.float96

 In [17]: np.finfo(np.longdouble).eps
 Out[17]: 1.084202172485504434e-19

 while on 64-bit ones:

 In [8]: np.longdouble
 Out[8]: numpy.float128

 In [9]: np.finfo(np.longdouble).eps
 Out[9]: 1.084202172485504434e-19

 i.e. NumPy is saying that the eps (machine epsilon) is the same on both 
 platforms, despite the fact that one uses 80-bit precision and the other 
 128-bit precision.  For the 80-bit format, the eps should be:

 In [5]: 1 / 2**63.
 Out[5]: 1.0842021724855044e-19

 [http://en.wikipedia.org/wiki/Extended_precision]

 which is correctly stated by NumPy, while for 128-bit (quad precision), eps 
 should be:

 In [6]: 1 / 2**113.
 Out[6]: 9.62964972193618e-35

 [http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format]

 If nobody objects, I'll file a bug about this.

There was half a proposal for renaming these guys in the interests of clarity:

http://mail.scipy.org/pipermail/numpy-discussion/2011-October/058820.html

I'd be happy to write this up as a NEP.

Best,

Matthew


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Charles R Harris
On Thu, Feb 23, 2012 at 5:23 AM, Francesc Alted franc...@continuum.io wrote:

 On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote:
  On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote:
 
  On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted franc...@continuum.io
 wrote:
  Exactly.  I'd update this to read:
 
   float96    96 bits.  Only available on 32-bit (i386) platforms.
  float128  128 bits.  Only available on 64-bit (AMD64) platforms.
 
  Except float96 is actually 80 bits. (Usually?) Plus some padding…
 
  Good point.  The thing is that they actually use 96 bits for storage
 purposes (this is due to alignment requirements).
 
  Another quirk related to this is that MSVC automatically maps long
 double to 64-bit doubles:
 
  http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx
 
  Not sure why they did that (portability issues?).

 Hmm, yet another quirk (this time in NumPy itself).  On 32-bit platforms:

 In [16]: np.longdouble
 Out[16]: numpy.float96

 In [17]: np.finfo(np.longdouble).eps
 Out[17]: 1.084202172485504434e-19

 while on 64-bit ones:

 In [8]: np.longdouble
 Out[8]: numpy.float128

 In [9]: np.finfo(np.longdouble).eps
 Out[9]: 1.084202172485504434e-19

 i.e. NumPy is saying that the eps (machine epsilon) is the same on both
 platforms, despite the fact that one uses 80-bit precision and the other
  128-bit precision.  For the 80-bit format, the eps should be:


That's correct. They are both extended precision (80 bits), but aligned on
32-bit/64-bit boundaries respectively. Sun provides a true quad precision,
also called float128, while on PPC long double is an odd combination of two
doubles.

Chuck



Re: [Numpy-discussion] Special matrices with structure?

2012-02-23 Thread Dag Sverre Seljebotn
On 02/23/2012 05:50 AM, Jaakko Luttinen wrote:
 Hi!

 I was wondering whether it would be easy/possible/reasonable to have
 classes for arrays that have special structure in order to use less
 memory and speed up some computations?

 For instance:
  - a symmetric matrix could be stored in almost half the memory required by
  a non-symmetric matrix
  - a diagonal matrix only needs to store the diagonal vector
  - a Toeplitz matrix only needs to store one or two vectors
  - a sparse matrix only needs to store the non-zero elements (some
  implementations exist in scipy.sparse)
  - and so on

 If such classes were implemented, it would be nice if they worked with
 numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily.

 I believe this has been discussed before, but Google didn't help a lot...

I'm currently working on a library for this. The catch is that I'm 
doing it as a work project, not a hobby project -- so only the features 
I strictly need for my PhD thesis really get priority. That means that 
it will only really be developed for use on clusters/MPI, not so much 
for single-node LAPACK.

I'd love to pair up with someone who could make sure the library is more 
generally useful, which is my real goal (if I ever get spare time again...).

The general idea of my approach is to have lazily evaluated expressions:

A = # ... diagonal matrix
B = # ... dense matrix

L = (give(A) + give(B)).cholesky() # only symbolic!
# give means: overwrite if you want to

explain(L) # prints what it will do if it computes L
L = compute(L) # does the computation

What the code above would do is:

  - First, determine that the fastest way of doing + is to take the 
elements in A and += them inplace to the diagonal in B
  - Then, do the Cholesky in

Note that if you change the types of the matrices, the optimal plan 
changes too. The goal is to facilitate writing general code which 
doesn't know the types of the matrices, yet still strings together the 
optimal chain of calls. This requires waiting with evaluation until an 
explicit compute call (which essentially does a compilation).
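
A toy sketch of the deferred-evaluation part in plain Python (names made up 
for illustration; this is not oomatrix's actual API):

import numpy as np

class Expr(object):
    # a node in a symbolic expression tree; nothing is computed here
    def __init__(self, op, args):
        self.op, self.args = op, args
    def __add__(self, other):
        return Expr('add', [self, other])
    def cholesky(self):
        return Expr('cholesky', [self])

class Leaf(Expr):
    def __init__(self, array):
        Expr.__init__(self, 'leaf', [])
        self.array = array

def compute(expr):
    # a real compiler would pattern-match on matrix types here and pick
    # the cheapest registered computation; this just evaluates naively
    if expr.op == 'leaf':
        return expr.array
    if expr.op == 'add':
        return compute(expr.args[0]) + compute(expr.args[1])
    return np.linalg.cholesky(compute(expr.args[0]))

A = Leaf(2.0 * np.eye(3))
B = Leaf(np.eye(3))
L = (A + B).cholesky()   # still symbolic
print(compute(L))        # evaluation happens here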

Adding matrix types and operations is done through pattern matching. 
One can provide code like this to supply optimized implementations for 
weird special cases:

@computation(RowMajorDense + ColMajorDense, RowMajorDense)
def add(a, b):
    # provide an optimized case for row-major + col-major, resulting
    # in row-major
    pass

@cost(add)
def add_cost(a, b):
    # provide an estimate for the cost of the above routine
    pass

The compiler looks at all the provided @computation implementations and 
determines the cheapest path.

My code is at https://github.com/dagss/oomatrix, but I certainly haven't 
done anything yet to make the codebase useful to anyone but me, so you 
probably shouldn't look at it, but rather ask me here.

Dag


Re: [Numpy-discussion] Special matrices with structure?

2012-02-23 Thread Dag Sverre Seljebotn
On 02/23/2012 09:47 AM, Dag Sverre Seljebotn wrote:
 On 02/23/2012 05:50 AM, Jaakko Luttinen wrote:
 Hi!

 I was wondering whether it would be easy/possible/reasonable to have
 classes for arrays that have special structure in order to use less
 memory and speed up some computations?

 For instance:
  - a symmetric matrix could be stored in almost half the memory required by
  a non-symmetric matrix
  - a diagonal matrix only needs to store the diagonal vector
  - a Toeplitz matrix only needs to store one or two vectors
  - a sparse matrix only needs to store the non-zero elements (some
  implementations exist in scipy.sparse)
  - and so on

 If such classes were implemented, it would be nice if they worked with
 numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily.

  I believe this has been discussed before, but Google didn't help a lot...

 I'm currently working on a library for this. The catch is that I'm
 doing it as a work project, not a hobby project -- so only the features
 I strictly need for my PhD thesis really get priority. That means that
 it will only really be developed for use on clusters/MPI, not so much
 for single-node LAPACK.

 I'd love to pair up with someone who could make sure the library is more
 generally useful, which is my real goal (if I ever get spare time
 again...).

 The general idea of my approach is to have lazily evaluated expressions:

 A = # ... diagonal matrix
 B = # ... dense matrix

 L = (give(A) + give(B)).cholesky() # only symbolic!
 # give means: overwrite if you want to

 explain(L) # prints what it will do if it computes L
 L = compute(L) # does the computation

 What the code above would do is:

 - First, determine that the fastest way of doing + is to take the
 elements in A and += them inplace to the diagonal in B
 - Then, do the Cholesky in

Sorry: Then, do the Cholesky inplace in the buffer of B, and use that for L.

Dag


 Note that if you change the types of the matrices, the optimal plan
 changes too. The goal is to facilitate writing general code which
 doesn't know the types of the matrices, yet still strings together the
 optimal chain of calls. This requires waiting with evaluation until an
 explicit compute call (which essentially does a compilation).

 Adding matrix types and operations is done through pattern matching.
 One can provide code like this to supply optimized implementations for
 weird special cases:

 @computation(RowMajorDense + ColMajorDense, RowMajorDense)
 def add(a, b):
     # provide an optimized case for row-major + col-major, resulting
     # in row-major
     pass

 @cost(add)
 def add_cost(a, b):
     # provide an estimate for the cost of the above routine
     pass

 The compiler looks at all the provided @computation implementations and
 determines the cheapest path.

 My code is at https://github.com/dagss/oomatrix, but I certainly haven't
 done anything yet to make the codebase useful to anyone but me, so you
 probably shouldn't look at it, but rather ask me here.

 Dag



Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Pierre Haessig
On 23/02/2012 17:28, Charles R Harris wrote:
 That's correct. They are both extended precision (80 bits), but
 aligned on 32bit/64bit boundaries respectively. Sun provides a true
 quad precision, also called float128, while on PPC long double is an
 odd combination of two doubles.
This is insane! ;-)
-- 
Pierre





Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Matthew Brett
Hi,

On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig
pierre.haes...@crans.org wrote:
 On 23/02/2012 17:28, Charles R Harris wrote:
 That's correct. They are both extended precision (80 bits), but
 aligned on 32bit/64bit boundaries respectively. Sun provides a true
 quad precision, also called float128, while on PPC long double is an
 odd combination of two doubles.
 This is insane! ;-)

I don't know if it's insane, but it is certainly very confusing, as
this thread and the previous one show.

The question is, what would be less confusing?

Best,

Matthew


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Mark Wiebe
On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig
 pierre.haes...@crans.org wrote:
  On 23/02/2012 17:28, Charles R Harris wrote:
  That's correct. They are both extended precision (80 bits), but
  aligned on 32bit/64bit boundaries respectively. Sun provides a true
  quad precision, also called float128, while on PPC long double is an
  odd combination of two doubles.
  This is insane! ;-)

 I don't know if it's insane, but it is certainly very confusing, as
 this thread and the previous one show.

 The question is, what would be less confusing?


One approach would be to never alias longdouble as float###. Especially
float128 seems to imply that it's the IEEE standard binary128 float, which
it is on some platforms, but not on most.

Cheers,
Mark



 Best,

 Matthew


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Matthew Brett
Hi,

On Thu, Feb 23, 2012 at 10:45 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig
 pierre.haes...@crans.org wrote:
  On 23/02/2012 17:28, Charles R Harris wrote:
  That's correct. They are both extended precision (80 bits), but
  aligned on 32bit/64bit boundaries respectively. Sun provides a true
  quad precision, also called float128, while on PPC long double is an
  odd combination of two doubles.
  This is insane! ;-)

 I don't know if it's insane, but it is certainly very confusing, as
 this thread and the previous one show.

 The question is, what would be less confusing?


 One approach would be to never alias longdouble as float###. Especially
 float128 seems to imply that it's the IEEE standard binary128 float, which
 it is on some platforms, but not on most.

It's virtually never IEEE binary128.  Yarik Halchenko found a real one
on an s/390 running Debian.  Some docs seem to suggest there are Sun
machines out there with binary128, as Chuck said.  So the vast
majority of numpy users with float128 have Intel 80-bit, and some have
PPC twin-float.

Do we all agree then that 'float128' is a bad name?

In the last thread, I had the feeling there was some consensus on
renaming Intel 80s to:

float128 -> float80_128
float96 -> float80_96

For those platforms implementing it, maybe

float128 -> float128_ieee

Maybe for PPC:

float128 -> float_pair_128

and, personally, I still think it would be preferable, and less
confusing, to encourage use of 'longdouble' instead of the various
platform specific aliases.

What do you think?

Best,

Matthew


Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Mark Wiebe
On Thu, Feb 23, 2012 at 10:55 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Thu, Feb 23, 2012 at 10:45 AM, Mark Wiebe mwwi...@gmail.com wrote:
  On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett matthew.br...@gmail.com
 
  wrote:
 
  Hi,
 
  On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig
  pierre.haes...@crans.org wrote:
    On 23/02/2012 17:28, Charles R Harris wrote:
   That's correct. They are both extended precision (80 bits), but
   aligned on 32bit/64bit boundaries respectively. Sun provides a true
   quad precision, also called float128, while on PPC long double is an
   odd combination of two doubles.
    This is insane! ;-)
 
  I don't know if it's insane, but it is certainly very confusing, as
  this thread and the previous one show.
 
  The question is, what would be less confusing?
 
 
  One approach would be to never alias longdouble as float###. Especially
  float128 seems to imply that it's the IEEE standard binary128 float,
 which
  it is on some platforms, but not on most.

 It's virtually never IEEE binary128.  Yarik Halchenko found a real one
 on an s/390 running Debian.  Some docs seem to suggest there are Sun
 machines out there with binary128, as Chuck said.  So the vast
 majority of numpy users with float128 have Intel 80-bit, and some have
 PPC twin-float.

 Do we all agree then that 'float128' is a bad name?

 In the last thread, I had the feeling there was some consensus on
 renaming Intel 80s to:

 float128 -> float80_128
 float96 -> float80_96

 For those platforms implementing it, maybe

 float128 -> float128_ieee

 Maybe for PPC:

 float128 -> float_pair_128

 and, personally, I still think it would be preferable, and less
 confusing, to encourage use of 'longdouble' instead of the various
 platform specific aliases.


+1, I think it's good for its name to correspond to the name in C/C++, so
that when people search for information on it they will find the relevant
information more easily. With a bunch of NumPy-specific aliases, it just
creates more hassle for everybody.

-Mark


 What do you think?

 Best,

 Matthew


[Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
dear all,

I haven't read all 180 e-mails, but I didn't see this on Travis's
initial list.

All of the existing flat file reading solutions I have seen are
not suitable for many applications, and they compare very unfavorably
to tools present in other languages, like R. Here are some of the
main issues I see:

- Memory usage: creating millions of Python objects when reading
  a large file results in horrendously bad memory utilization,
  which the Python interpreter is loath to return to the
  operating system. Any solution using the CSV module (like
  pandas's parsers-- which are a lot faster than anything else I
  know of in Python) suffers from this problem because the data
  come out boxed in tuples of PyObjects. Try loading a 1,000,000
  x 20 CSV file into a structured array using np.genfromtxt or
  into a DataFrame using pandas.read_csv and you will immediately
  see the problem. R, by contrast, uses very little memory.

- Performance: post-processing of Python objects results in poor
  performance. Also, for the actual parsing, anything regular
  expression based (like the loadtable effort over the summer,
  all apologies to those who worked on it), is doomed to
  failure. I think having a tool with a high degree of
  compatibility and intelligence for parsing unruly small files
  does make sense though, but it's not appropriate for large,
  well-behaved files.

- Need to factorize: as soon as there is an enum dtype in
  NumPy, we will want to enable the file parsers for structured
  arrays and DataFrame to be able to factorize / convert to
  enum certain columns (for example, all string columns) during
  the parsing process, and not afterward. This is very important
  for enabling fast groupby on large datasets and reducing
  unnecessary memory usage up front (imagine a column with a
  million values, with only 10 unique values occurring). This
  would be trivial to implement using a C hash table
  implementation like khash.h
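
A pure-Python sketch of that factorization step (the real implementation 
would use a C hash table such as khash, but the logic is the same):

import numpy as np

def factorize(values):
    # map each value to a small integer code, storing each unique value once
    codes = np.empty(len(values), dtype=np.int64)
    uniques = {}
    for i, v in enumerate(values):
        if v not in uniques:
            uniques[v] = len(uniques)
        codes[i] = uniques[v]
    labels = sorted(uniques, key=uniques.get)  # uniques in first-seen order
    return codes, labels

codes, labels = factorize(['a', 'b', 'a', 'a', 'c'])
print(codes)   # [0 1 0 0 2]
print(labels)  # ['a', 'b', 'c']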

To be clear: I'm going to do this eventually whether or not it
happens in NumPy because it's an existing problem for heavy
pandas users. I see no reason why the code can't emit structured
arrays, too, so we might as well have a common library component
that I can use in pandas and specialize to the DataFrame internal
structure.

It seems clear to me that this work needs to be done at the
lowest level possible, probably all in C (or C++?) or maybe
Cython plus C utilities.

If anyone wants to get involved in this particular problem right
now, let me know!

best,
Wes


[Numpy-discussion] mkl usage

2012-02-23 Thread Neal Becker
Is MKL only used for linear algebra?  Will it speed up, e.g., elementwise 
transcendental functions?



Re: [Numpy-discussion] mkl usage

2012-02-23 Thread Francesc Alted
On Feb 23, 2012, at 1:33 PM, Neal Becker wrote:

 Is MKL only used for linear algebra?  Will it speed up, e.g., elementwise 
 transcendental functions?

Yes, MKL comes with VML, which has this type of optimization:

http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm

Also, see some speedups with numexpr linked against MKL here:

http://code.google.com/p/numexpr/wiki/NumexprVML

See also how the native multi-threading implementation in numexpr beats MKL's 
(at least for this particular example).
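
For instance, evaluating an elementwise transcendental expression with numexpr 
looks like this (a sketch; assumes numexpr is installed, with or without VML):

import numpy as np
import numexpr as ne

x = np.random.rand(10000000)
# one multi-threaded pass over x, without large temporaries
y = ne.evaluate("sin(x)**2 + cos(x)**2")
print(np.allclose(y, 1.0))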

-- Francesc Alted





Re: [Numpy-discussion] mkl usage

2012-02-23 Thread Pauli Virtanen
On 23.02.2012 20:44, Francesc Alted wrote:
 On Feb 23, 2012, at 1:33 PM, Neal Becker wrote:
 
 Is MKL only used for linear algebra?  Will it speed up, e.g., elementwise 
 transcendental functions?
 
 Yes, MKL comes with VML, which has this type of optimization:

And also no, in the sense that Numpy and Scipy don't use VML.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pauli Virtanen
Hi,

On 23.02.2012 20:32, Wes McKinney wrote:
[clip]
 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.

If you do this, one useful aim could be to design the code such that it
can be used in loadtxt, at least as a fast path for common cases. I'd
really like to avoid increasing the number of APIs for text file loading.

-- 
Pauli Virtanen



Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Pierre Haessig
On 23/02/2012 20:08, Mark Wiebe wrote:
 +1, I think it's good for its name to correspond to the name in C/C++,
 so that when people search for information on it they will find the
 relevant information more easily. With a bunch of NumPy-specific
 aliases, it just creates more hassle for everybody.
I don't fully agree.

First, this assumes that people are C-educated, at least a bit. I got
some C education, but I spent most of my scientific programming time
sitting in front of Python, Matlab, and a bit of R (in that order). In
this context, double, float, long and short are all esoteric incantations.
Second, the C/C++ names are very imprecise with regard to their memory
content, and sometimes platform dependent. On the other hand, float64 is
very informative.

Also, how do these names scale with extended precision (where it's
available... ;-))? I wonder what may come after longdouble/longfloat:
what about hyperlongsuperfancyextendeddoublefloat? I find float1024
simpler ;-)

Now, because of all the specificities you described, this seems to be a
complex topic. I guess that good, documented aliases help people
understand this very complexity.

Best,
Pierre





Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Travis Oliphant
This is actually on my short-list as well --- it just didn't make it to the 
list. 

In fact, we have someone starting work on it this week.  It is his first 
project so it will take him a little time to get up to speed on it, but he will 
contact Wes and work with him and report progress to this list. 

Integration with np.loadtxt is a high priority.  I think loadtxt is now the 3rd 
or 4th text-reading interface I've seen in NumPy.  I have no interest in 
making a new one if we can avoid it.   But, we do need to make it faster with 
less memory overhead for simple cases like Wes describes.

-Travis



On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:

 Hi,
 
 On 23.02.2012 20:32, Wes McKinney wrote:
 [clip]
 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.
 
 If you do this, one useful aim could be to design the code such that it
 can be used in loadtxt, at least as a fast path for common cases. I'd
 really like to avoid increasing the number of APIs for text file loading.
 
 -- 
 Pauli Virtanen
 


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:08 PM, Travis Oliphant tra...@continuum.io wrote:
 This is actually on my short-list as well --- it just didn't make it to the 
 list.

 In fact, we have someone starting work on it this week.  It is his first 
 project so it will take him a little time to get up to speed on it, but he 
 will contact Wes and work with him and report progress to this list.

 Integration with np.loadtxt is a high priority.  I think loadtxt is now the 
 3rd or 4th text-reading interface I've seen in NumPy.  I have no interest 
 in making a new one if we can avoid it.   But, we do need to make it faster 
 with less memory overhead for simple cases like Wes describes.

 -Travis

Yeah, what I envision is just an infrastructural parsing engine to
replace the pure Python guts of np.loadtxt, np.genfromtxt, and the csv
module + Cython guts of pandas.read_{csv, table, excel}. It needs to
be somewhat adaptable to some of the domain specific decisions of
structured arrays vs. DataFrames-- like I use Python objects for
strings, but one consequence of this is that I can intern strings
(only one PyObject per unique string value occurring) where structured
arrays cannot, so you get much better performance and memory usage
that way. That's soon to change, though, I gather, at which point I'll
almost definitely (!) move to pointer arrays instead of dtype=object
arrays.

- Wes



 On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:

 Hi,

 On 23.02.2012 20:32, Wes McKinney wrote:
 [clip]
 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.

 If you do this, one useful aim could be to design the code such that it
 can be used in loadtxt, at least as a fast path for common cases. I'd
 really like to avoid increasing the number of APIs for text file loading.

 --
 Pauli Virtanen



Re: [Numpy-discussion] mkl usage

2012-02-23 Thread Neal Becker
Pauli Virtanen wrote:

 On 23.02.2012 20:44, Francesc Alted wrote:
 On Feb 23, 2012, at 1:33 PM, Neal Becker wrote:
 
 Is MKL only used for linear algebra?  Will it speed up, e.g., elementwise
 transcendental functions?
 
 Yes, MKL comes with VML, which has this type of optimization:
 
 And also no, in the sense that Numpy and Scipy don't use VML.
 

My question is:

Should I purchase MKL?

To what extent will it speed up my existing python code, without my having to 
exert (much) effort?

So that would be numpy/scipy.

I'd entertain trying other things, if it wasn't much effort.



Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Warren Weckesser
On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant tra...@continuum.io wrote:

 This is actually on my short-list as well --- it just didn't make it to
 the list.

 In fact, we have someone starting work on it this week.  It is his first
 project so it will take him a little time to get up to speed on it, but he
 will contact Wes and work with him and report progress to this list.

  Integration with np.loadtxt is a high priority.  I think loadtxt is now
 the 3rd or 4th text-reading interface I've seen in NumPy.  I have no
 interest in making a new one if we can avoid it.   But, we do need to make
 it faster with less memory overhead for simple cases like Wes describes.

 -Travis



I have a proof-of-concept CSV reader written in C (with a Cython
wrapper).  I'll put it on github this weekend.

Warren




 On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:

  Hi,
 
  On 23.02.2012 20:32, Wes McKinney wrote:
  [clip]
  To be clear: I'm going to do this eventually whether or not it
  happens in NumPy because it's an existing problem for heavy
  pandas users. I see no reason why the code can't emit structured
  arrays, too, so we might as well have a common library component
  that I can use in pandas and specialize to the DataFrame internal
  structure.
 
  If you do this, one useful aim could be to design the code such that it
  can be used in loadtxt, at least as a fast path for common cases. I'd
  really like to avoid increasing the number of APIs for text file loading.
 
  --
  Pauli Virtanen
 


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Wes -

I designed the recfile package to fill this need.  It might be a start.  

Some features: 

- the ability to efficiently read any subset of the data without
  loading the whole file.
- reads directly into a recarray, so no overheads.
- object oriented interface, mimicking recarray slicing.
- also supports writing

Currently it is fixed-width fields only.  It is C++, but it wouldn't be
hard to convert it to C if that is a requirement.  Also, it works for
binary or ASCII.

http://code.google.com/p/recfile/

the trunk is pretty far past the most recent release.

Erin Scott Sheldon

Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012:
 dear all,
 
 I haven't read all 180 e-mails, but I didn't see this on Travis's
 initial list.
 
 All of the existing flat file reading solutions I have seen are
 not suitable for many applications, and they compare very unfavorably
 to tools present in other languages, like R. Here are some of the
 main issues I see:
 
 - Memory usage: creating millions of Python objects when reading
   a large file results in horrendously bad memory utilization,
   which the Python interpreter is loath to return to the
   operating system. Any solution using the CSV module (like
   pandas's parsers-- which are a lot faster than anything else I
   know of in Python) suffers from this problem because the data
   come out boxed in tuples of PyObjects. Try loading a 1,000,000
   x 20 CSV file into a structured array using np.genfromtxt or
   into a DataFrame using pandas.read_csv and you will immediately
   see the problem. R, by contrast, uses very little memory.
 
 - Performance: post-processing of Python objects results in poor
   performance. Also, for the actual parsing, anything regular
   expression based (like the loadtable effort over the summer,
   all apologies to those who worked on it), is doomed to
   failure. I think having a tool with a high degree of
   compatibility and intelligence for parsing unruly small files
   does make sense though, but it's not appropriate for large,
   well-behaved files.
 
 - Need to factorize: as soon as there is an enum dtype in
   NumPy, we will want to enable the file parsers for structured
   arrays and DataFrame to be able to factorize / convert to
   enum certain columns (for example, all string columns) during
   the parsing process, and not afterward. This is very important
   for enabling fast groupby on large datasets and reducing
   unnecessary memory usage up front (imagine a column with a
   million values, with only 10 unique values occurring). This
   would be trivial to implement using a C hash table
   implementation like khash.h
 
 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.
 
 It seems clear to me that this work needs to be done at the
 lowest level possible, probably all in C (or C++?) or maybe
 Cython plus C utilities.
 
 If anyone wants to get involved in this particular problem right
 now, let me know!
 
 best,
 Wes
-- 
Erin Scott Sheldon
Brookhaven National Laboratory


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:19 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:

 On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant tra...@continuum.io
 wrote:

 This is actually on my short-list as well --- it just didn't make it to
 the list.

 In fact, we have someone starting work on it this week.  It is his first
 project so it will take him a little time to get up to speed on it, but he
 will contact Wes and work with him and report progress to this list.

  Integration with np.loadtxt is a high priority.  I think loadtxt is now
 the 3rd or 4th text-reading interface I've seen in NumPy.  I have no
 interest in making a new one if we can avoid it.   But, we do need to make
 it faster with less memory overhead for simple cases like Wes describes.

 -Travis



 I have a proof of concept CSV reader written in C (with a Cython
 wrapper).  I'll put it on github this weekend.

 Warren

Sweet, between this, the Continuum folks, and me and my guys I think we
can come up with something good that suits all our needs. We should set
up some realistic performance test cases that we can monitor via
vbench (wesm/vbench) while we work on the project.

- W




 On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:

  Hi,
 
  On 23.02.2012 20:32, Wes McKinney wrote:
  [clip]
  To be clear: I'm going to do this eventually whether or not it
  happens in NumPy because it's an existing problem for heavy
  pandas users. I see no reason why the code can't emit structured
  arrays, too, so we might as well have a common library component
  that I can use in pandas and specialize to the DataFrame internal
  structure.
 
  If you do this, one useful aim could be to design the code such that it
  can be used in loadtxt, at least as a fast path for common cases. I'd
  really like to avoid increasing the number of APIs for text file
  loading.
 
  --
  Pauli Virtanen
 


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon erin.shel...@gmail.com wrote:
 Wes -

 I designed the recfile package to fill this need.  It might be a start.

 Some features:

    - the ability to efficiently read any subset of the data without
      loading the whole file.
    - reads directly into a recarray, so no overheads.
    - object oriented interface, mimicking recarray slicing.
    - also supports writing

 Currently it is fixed-width fields only.  It is C++, but it wouldn't be
 hard to convert it to C if that is a requirement.  Also, it works for
 binary or ASCII.

    http://code.google.com/p/recfile/

 the trunk is pretty far past the most recent release.

 Erin Scott Sheldon

Can you relicense as BSD-compatible?

 Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012:
 dear all,

 I haven't read all 180 e-mails, but I didn't see this on Travis's
 initial list.

 All of the existing flat file reading solutions I have seen are
 not suitable for many applications, and they compare very unfavorably
 to tools present in other languages, like R. Here are some of the
 main issues I see:

 - Memory usage: creating millions of Python objects when reading
   a large file results in horrendously bad memory utilization,
    which the Python interpreter is loath to return to the
   operating system. Any solution using the CSV module (like
   pandas's parsers-- which are a lot faster than anything else I
   know of in Python) suffers from this problem because the data
   come out boxed in tuples of PyObjects. Try loading a 1,000,000
   x 20 CSV file into a structured array using np.genfromtxt or
   into a DataFrame using pandas.read_csv and you will immediately
   see the problem. R, by contrast, uses very little memory.

 - Performance: post-processing of Python objects results in poor
   performance. Also, for the actual parsing, anything regular
   expression based (like the loadtable effort over the summer,
   all apologies to those who worked on it), is doomed to
   failure. I think having a tool with a high degree of
   compatibility and intelligence for parsing unruly small files
   does make sense though, but it's not appropriate for large,
   well-behaved files.

 - Need to factorize: as soon as there is an enum dtype in
   NumPy, we will want to enable the file parsers for structured
   arrays and DataFrame to be able to factorize / convert to
   enum certain columns (for example, all string columns) during
   the parsing process, and not afterward. This is very important
   for enabling fast groupby on large datasets and reducing
   unnecessary memory usage up front (imagine a column with a
   million values, with only 10 unique values occurring). This
   would be trivial to implement using a C hash table
   implementation like khash.h

 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.

 It seems clear to me that this work needs to be done at the
 lowest level possible, probably all in C (or C++?) or maybe
 Cython plus C utilities.

 If anyone wants to get involved in this particular problem right
 now, let me know!

 best,
 Wes
 --
 Erin Scott Sheldon
 Brookhaven National Laboratory


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Éric Depagne
On Thursday, 23 February 2012 21:24:28, Wes McKinney wrote:
 
That would indeed be great. Reading large files is a real pain whatever 
python method is used.

BTW, could you tell us what you mean by large files?

cheers, 
Éric.

 Sweet, between this, the Continuum folks, and me and my guys I think we
 can come up with something good that suits all our needs. We should set
 up some realistic performance test cases that we can monitor via
 vbench (wesm/vbench) while we work on the project.
 
Un clavier azerty en vaut deux
--
Éric Depagne    e...@depagne.org


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 15:24:44 -0500 2012:
 On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon erin.shel...@gmail.com wrote:
  I designed the recfile package to fill this need.  It might be a start.
 Can you relicense as BSD-compatible?

If required, that would be fine with me.
-e

 
  Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012:
  dear all,
 
  I haven't read all 180 e-mails, but I didn't see this on Travis's
  initial list.
 
  All of the existing flat file reading solutions I have seen are
  not suitable for many applications, and they compare very unfavorably
  to tools present in other languages, like R. Here are some of the
  main issues I see:
 
  - Memory usage: creating millions of Python objects when reading
    a large file results in horrendously bad memory utilization,
    which the Python interpreter is loath to return to the
    operating system. Any solution using the CSV module (like
    pandas's parsers-- which are a lot faster than anything else I
    know of in Python) suffers from this problem because the data
    come out boxed in tuples of PyObjects. Try loading a 1,000,000
    x 20 CSV file into a structured array using np.genfromtxt or
    into a DataFrame using pandas.read_csv and you will immediately
    see the problem. R, by contrast, uses very little memory.
 
  - Performance: post-processing of Python objects results in poor
    performance. Also, for the actual parsing, anything regular
    expression based (like the loadtable effort over the summer,
    all apologies to those who worked on it), is doomed to
    failure. I think having a tool with a high degree of
    compatibility and intelligence for parsing unruly small files
    does make sense though, but it's not appropriate for large,
    well-behaved files.
 
  - Need to factorize: as soon as there is an enum dtype in
    NumPy, we will want to enable the file parsers for structured
    arrays and DataFrame to be able to factorize / convert to
    enum certain columns (for example, all string columns) during
    the parsing process, and not afterward. This is very important
    for enabling fast groupby on large datasets and reducing
    unnecessary memory usage up front (imagine a column with a
    million values, with only 10 unique values occurring). This
    would be trivial to implement using a C hash table
    implementation like khash.h
 
  To be clear: I'm going to do this eventually whether or not it
  happens in NumPy because it's an existing problem for heavy
  pandas users. I see no reason why the code can't emit structured
  arrays, too, so we might as well have a common library component
  that I can use in pandas and specialize to the DataFrame internal
  structure.
 
  It seems clear to me that this work needs to be done at the
  lowest level possible, probably all in C (or C++?) or maybe
  Cython plus C utilities.
 
  If anyone wants to get involved in this particular problem right
  now, let me know!
 
  best,
  Wes
  --
  Erin Scott Sheldon
  Brookhaven National Laboratory
-- 
Erin Scott Sheldon
Brookhaven National Laboratory


Re: [Numpy-discussion] mkl usage

2012-02-23 Thread Francesc Alted
On Feb 23, 2012, at 2:19 PM, Neal Becker wrote:

 Pauli Virtanen wrote:
 
 On 23.02.2012 20:44, Francesc Alted wrote:
 On Feb 23, 2012, at 1:33 PM, Neal Becker wrote:
 
  Is MKL only used for linear algebra?  Will it speed up, e.g., elementwise
  transcendental functions?
 
  Yes, MKL comes with VML, which has this type of optimization:
 
 And also no, in the sense that Numpy and Scipy don't use VML.
 
 
 My question is:
 
 Should I purchase MKL?
 
 To what extent will it speed up my existing python code, without my having to 
 exert (much) effort?
 
 So that would be numpy/scipy.

Pauli already answered you.  If you are restricted to using numpy/scipy and your 
aim is to accelerate the evaluation of transcendental functions, then there is 
no point in purchasing MKL.  If you can broaden your options and use numexpr, 
then I think you should consider it.

-- Francesc Alted





Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 20:32, Wes McKinney wrote:
 If anyone wants to get involved in this particular problem right
 now, let me know!
Hi Wes,

I'm totally outside the implementation issues you described, but I have
some million-line-long CSV files, so I experience some slowdown
when loading those.
I'll be very glad to use any upgraded loadtxt/genfromtxt/whatever function
once it's out!

Best,
Pierre

(and this reminds me shamefully that I still didn't take the time to
give a serious try at your pandas...)





Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 21:08, Travis Oliphant wrote:
 I think loadtxt is now the 3rd or 4th text-reading interface I've seen in 
 NumPy.  
Ok, now I understand why I got confused ;-)
-- 
Pierre





Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:31 PM, Éric Depagne e...@depagne.org wrote:
 On Thursday, 23 February 2012 at 21:24:28, Wes McKinney wrote:

 That would indeed be great. Reading large files is a real pain whatever
 Python method is used.

 BTW, could you tell us what you mean by large files?

 cheers,
 Éric.

Reasonably wide CSV files with hundreds of thousands to millions of
rows. I have a separate interest in JSON handling but that is a
different kind of problem, and probably just a matter of forking
ultrajson and having it not produce Python-object-based data
structures.

- Wes


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012:
 Reasonably wide CSV files with hundreds of thousands to millions of
 rows. I have a separate interest in JSON handling but that is a
 different kind of problem, and probably just a matter of forking
 ultrajson and having it not produce Python-object-based data
 structures.

As a benchmark, recfile can read an uncached file with 350,000 lines and
32 columns in about 5 seconds.  File size ~220M

-e
-- 
Erin Scott Sheldon
Brookhaven National Laboratory


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:55 PM, Erin Sheldon erin.shel...@gmail.com wrote:
 Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012:
 Reasonably wide CSV files with hundreds of thousands to millions of
 rows. I have a separate interest in JSON handling but that is a
 different kind of problem, and probably just a matter of forking
 ultrajson and having it not produce Python-object-based data
 structures.

 As a benchmark, recfile can read an uncached file with 350,000 lines and
 32 columns in about 5 seconds.  File size ~220M

 -e
 --
 Erin Scott Sheldon
 Brookhaven National Laboratory

That's pretty good. That's faster than pandas's csv-module+Cython
approach almost certainly (but I haven't run your code to get a read
on how much my hardware makes a difference), but that's not shocking
at all:

In [1]: df = DataFrame(np.random.randn(350000, 32))

In [2]: df.to_csv('/home/wesm/tmp/foo.csv')

In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv')
CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s
Wall time: 7.04 s

I must think that skipping the process of creating 11.2 million Python
string objects and then individually converting each of them to float
explains much of the difference.

Note for reference (I'm skipping the first row, which has the column
labels from above):

In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv',
dtype=None, delimiter=',', skip_header=1)
CPU times: user 24.17 s, sys: 0.48 s, total: 24.65 s
Wall time: 24.67 s

In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv',
delimiter=',', skiprows=1)
CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s
Wall time: 11.32 s

In this last case for example, around 500 MB of RAM is taken up for an
array that should only be about 80-90MB. If you're a data scientist
working in Python, this is _not good_.
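For reference, the 80-90 MB figure is just the raw float64 payload of the
array, computed from the numbers above:

rows, cols = 350000, 32
print(rows * cols * 8 / 1e6)   # 8 bytes per float64 -> 89.6 (MB)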

-W


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Gael Varoquaux
On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
 In this last case for example, around 500 MB of RAM is taken up for an
 array that should only be about 80-90MB. If you're a data scientist
 working in Python, this is _not good_.

But why, oh why, are people storing big data in CSV?

G


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Éric Depagne
 But why, oh why, are people storing big data in CSV?
Well, that's what scientists do :-)

Éric.
 
 G

An AZERTY keyboard is worth two
--
Éric Depagne e...@depagne.org


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Robert Kern
On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
 In this last case for example, around 500 MB of RAM is taken up for an
 array that should only be about 80-90MB. If you're a data scientist
 working in Python, this is _not good_.

 But why, oh why, are people storing big data in CSV?

Because everyone can read it. It's not so much storage as transmission.

-- 
Robert Kern


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012:
 That's pretty good. That's faster than pandas's csv-module+Cython
 approach almost certainly (but I haven't run your code to get a read
 on how much my hardware makes a difference), but that's not shocking
 at all:
 
 In [1]: df = DataFrame(np.random.randn(350000, 32))
 
 In [2]: df.to_csv('/home/wesm/tmp/foo.csv')
 
 In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv')
 CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s
 Wall time: 7.04 s
 
 I must think that skipping the process of creating 11.2 million Python
 string objects and then individually converting each of them to float
 explains much of the difference.
 
 Note for reference (I'm skipping the first row, which has the column
 labels from above):
 
 In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv',
 dtype=None, delimiter=',', skip_header=1)
 CPU times: user 24.17 s, sys: 0.48 s, total: 24.65 s
 Wall time: 24.67 s
 
 In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv',
 delimiter=',', skiprows=1)
 CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s
 Wall time: 11.32 s
 
 In this last case for example, around 500 MB of RAM is taken up for an
 array that should only be about 80-90MB. If you're a data scientist
 working in Python, this is _not good_.

It might be good to compare on recarrays, which are a bit more complex.
Can you try one of these .dat files?

http://www.cosmo.bnl.gov/www/esheldon/data/lensing/scat/05/

The dtype is

[('ra', 'f8'),
 ('dec', 'f8'),
 ('g1', 'f8'),
 ('g2', 'f8'),
 ('err', 'f8'),
 ('scinv', 'f8', 27)]

-- 
Erin Scott Sheldon
Brookhaven National Laboratory


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Benjamin Root
On Thu, Feb 23, 2012 at 3:14 PM, Robert Kern robert.k...@gmail.com wrote:

 On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
  On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
  In this last case for example, around 500 MB of RAM is taken up for an
  array that should only be about 80-90MB. If you're a data scientist
  working in Python, this is _not good_.
 
  But why, oh why, are people storing big data in CSV?

 Because everyone can read it. It's not so much storage as transmission.


Because their labmate/officemate/advisor is using Excel...

Ben Root


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 4:20 PM, Erin Sheldon erin.shel...@gmail.com wrote:
 Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012:
 That's pretty good. That's faster than pandas's csv-module+Cython
 approach almost certainly (but I haven't run your code to get a read
 on how much my hardware makes a difference), but that's not shocking
 at all:

 In [1]: df = DataFrame(np.random.randn(350000, 32))

 In [2]: df.to_csv('/home/wesm/tmp/foo.csv')

 In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv')
 CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s
 Wall time: 7.04 s

 I must think that skipping the process of creating 11.2 million Python
 string objects and then individually converting each of them to float
 explains much of the difference.

 Note for reference (I'm skipping the first row, which has the column
 labels from above):

 In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv',
 dtype=None, delimiter=',', skip_header=1)
 CPU times: user 24.17 s, sys: 0.48 s, total: 24.65 s
 Wall time: 24.67 s

 In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv',
 delimiter=',', skiprows=1)
 CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s
 Wall time: 11.32 s

 In this last case for example, around 500 MB of RAM is taken up for an
 array that should only be about 80-90MB. If you're a data scientist
 working in Python, this is _not good_.

 It might be good to compare on recarrays, which are a bit more complex.
 Can you try one of these .dat files?

    http://www.cosmo.bnl.gov/www/esheldon/data/lensing/scat/05/

 The dtype is

 [('ra', 'f8'),
  ('dec', 'f8'),
  ('g1', 'f8'),
  ('g2', 'f8'),
  ('err', 'f8'),
  ('scinv', 'f8', 27)]

 --
 Erin Scott Sheldon
 Brookhaven National Laboratory

Forgot this one that is also widely used:

In [28]: %time recs =
matplotlib.mlab.csv2rec('/home/wesm/tmp/foo.csv', skiprows=1)
CPU times: user 65.16 s, sys: 0.30 s, total: 65.46 s
Wall time: 65.55 s

ok with one of those dat files and the dtype I get:

In [18]: %time arr =
np.genfromtxt('/home/wesm/Downloads/scat-05-000.dat', dtype=dtype,
skip_header=0, delimiter=' ')
CPU times: user 17.52 s, sys: 0.14 s, total: 17.66 s
Wall time: 17.67 s

The difference is not so stark in this case. I don't produce structured arrays, though.

In [26]: %time arr =
read_table('/home/wesm/Downloads/scat-05-000.dat', header=None, sep='
')
CPU times: user 10.15 s, sys: 0.10 s, total: 10.25 s
Wall time: 10.26 s

- Wes
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 22:38, Benjamin Root wrote:
 labmate/officemate/advisor is using Excel...
... or an industrial partner with its windows-based software that can
export (when it works) some very nice field data from a proprietary
Honeywell data logger.

CSV data is better than no data! (and better than XLS data!)

About the *big* data aspect of Gael's question, this reminds me of a
software project saying [1] that I would distort the following way:
'' Q: How does a CSV data file get to be a million lines long?
   A: One line at a time! ''
And my experience with some time series measurements was really about
this: small changes in the data rate, a slightly longer acquisition
period, and that's it!

Pierre
(I shamefully confess I spent several hours writing *ad-hoc* Python
scripts full of regexps and generators just to fix various tiny details
of those CSV files... but in the end it worked!)

[1] I just quickly googled "one day at a time" for a reference and ended
up on http://en.wikipedia.org/wiki/The_Mythical_Man-Month





[Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3

2012-02-23 Thread Patrick Armstrong
Hi there,

I'm having a problem building NumPy on Python 2.7.1 and OS X 10.7.3. Here is my 
build log:

https://gist.github.com/1895377

Does anyone have any idea what might be happening? I get a very similar error 
when compiling with clang.

Installing a binary really isn't an option for me due to some specifics of my 
project. Does anyone have an idea what might be wrong?

Thanks.

--patrick


[Numpy-discussion] Announcing Theano 0.5

2012-02-23 Thread Pascal Lamblin
===
 Announcing Theano 0.5
===

This is a major version, with lots of new features, bug fixes, and some
interface changes (deprecated or potentially misleading features were
removed).

Upgrading to Theano 0.5 is recommended for everyone, but you should first make
sure that your code does not raise deprecation warnings with Theano 0.4.1.
Otherwise, in one case the results can change. In other cases, the warnings are
turned into errors (see below for details).

For those using the bleeding edge version in the
git repository, we encourage you to update to the `0.5` tag.

If you have updated to 0.5rc1 or 0.5rc2, you are highly encouraged to
update to 0.5, as some bugs introduced in those versions have now been
fixed, see items marked with '#' in the lists below.


What's New
--

Highlights:
 * Moved to github: http://github.com/Theano/Theano/
 * Old trac tickets moved to assembla tickets: 
http://www.assembla.com/spaces/theano/tickets
 * Theano vision: 
http://deeplearning.net/software/theano/introduction.html#theano-vision (Many 
people)
 * Theano with GPU works in some cases on Windows now. Still experimental. 
(Sebastian Urban)
 * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
   and dot(vector, vector). (James, Frédéric, Pascal)
 * C implementation of Alloc. (James, Pascal)
 * theano.grad() now also works with sparse variables. (Arnaud)
 * Macro to implement the Jacobian/Hessian with 
theano.tensor.{jacobian,hessian} (Razvan)
 * See the Interface changes.


Interface Behavior Changes:
 * The current default value of the parameter axis of
   theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
   numpy: None. i.e. operate on all dimensions of the tensor.
   (Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
   a warning since Theano 0.3 released Nov. 23rd, 2010)
 * The current output dtype of sum with input dtype [u]int* is now always
   [u]int64. You can specify the output dtype with a new dtype parameter
   to sum; that dtype is the one used for the summation. There was no
   warning about this in previous Theano versions. The consequence is
   that the sum is done in a dtype with more precision than before, so
   it could be slower but will be more resistant to overflow. This new
   behavior is the same as numpy's. (Olivier, Pascal)
   A sketch of this change and the axis change follows this list.
 # When using a GPU, detect faulty nvidia drivers. This was detected
   when running the Theano tests; now it is always tested. Faulty
   drivers result in wrong results for reduce operations. (Frederic B.)
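A short sketch of the two behavior changes above (signatures as described
in the notes; illustrative, not tested against 0.5):

import theano.tensor as T

x = T.imatrix('x')             # an int32 matrix
m = T.max(x)                   # axis now defaults to None: reduce over all dimensions, as in numpy
s = T.sum(x, dtype='int64')    # new dtype parameter: accumulate in int64 to resist overflow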


Interface Features Removed (most were deprecated):
 * The string modes FAST_RUN_NOGC and STABILIZE are no longer accepted
   (they were accepted only by theano.function()).
   Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
 * tensor.grad(cost, wrt) now always returns an object of the same type as wrt
   (list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
 * A few leftover tag.shape and Join.vec_length uses have been removed. (Frederic)
 * The .value attribute of shared variables is removed; use shared.set_value()
   or shared.get_value() instead (see the sketch after this list). (Frederic)
 * Theano config option home is not used anymore as it was redundant with 
base_compiledir.
   If you use it, Theano will now raise an error. (Olivier D.)
 * scan interface changes: (Razvan Pascanu)
   * The use of `return_steps` for specifying how many entries of the output
     to return has been removed. Instead, apply a subtensor to the output
     returned by scan to select a certain slice.
   * The inner function (that scan receives) should return its outputs and
     updates following this order: [outputs], [updates], [condition].
     One can skip any of the three if not used, but the order has to stay
     unchanged.
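Minimal sketches of two of these removals (call signatures taken from the
notes above; illustrative, not tested against 0.5):

import numpy as np
import theano
import theano.tensor as T

# shared variables: the .value attribute is gone
w = theano.shared(np.zeros(3), name='w')
w.set_value(np.ones(3))    # was: w.value = np.ones(3)
current = w.get_value()    # was: current = w.value

# scan: return_steps is gone; slice the returned output instead
xs = T.vector('xs')
outputs, updates = theano.scan(lambda x_t: 2 * x_t, sequences=xs)
last_three = outputs[-3:]  # was: return_steps=3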

Interface bug fix:
 * Rop in some cases should have returned a list of one Theano variable,
   but returned the variable itself. (Razvan)

New deprecations (will be removed in Theano 0.6; a warning is generated if
you use them):
 * tensor.shared() renamed to tensor._shared(). You probably want to
   call theano.shared() instead! (Olivier D.)


Bug fixes (incorrect results):
 * On CPU, if the convolution had received explicit shape information,
   it was not checked at runtime. This caused wrong results if the
   input shape was not the one expected. (Frederic, reported by Sander
   Dieleman)
 * Theoretical bug: in some cases GPUSum could have returned bad values.
   We were not able to reproduce this problem.
   * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction
     on this dim): 01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
 * div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. 
(James)
 * theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
   The grad is now disabled and returns an error. (Frederic)
 * An expression of the form 1 / (exp(x) +- constant) was systematically 
matched to 1 / 

Re: [Numpy-discussion] np.longlong casts to int

2012-02-23 Thread Matthew Brett
Hi,

On Thu, Feb 23, 2012 at 2:56 PM, Pierre Haessig
pierre.haes...@crans.org wrote:
 On 23/02/2012 20:08, Mark Wiebe wrote:
 +1, I think it's good for its name to correspond to the name in C/C++,
 so that when people search for information on it they will find the
 relevant information more easily. With a bunch of NumPy-specific
 aliases, it just creates more hassle for everybody.
 I don't fully agree.

 First, this assumes that people were C-educated, at least a bit. I got
 some C education, but I spent most of my scientific programming time
 sitting in front of Python, Matlab, and a bit of R (in that order). In
 this context, double, float, long and short are all esoteric incantations.
 Second, the C/C++ names are very imprecise with regard to their memory
 content, and sometimes platform dependent. On the other hand, float64 is
 very informative.

Right - no proposal to change float64 because it's not ambiguous - it
is both binary64 IEEE floating point format and 64 bit width.

The confusion here is for float128 - which is very occasionally IEEE
binary128 and can be at least two other things (PPC twin double, and
Intel 80 bit padded to 128 bits).  Some of us were also surprised to
find float96 is the same precision as float128 (being an 80 bit Intel
padded to 96 bits).
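One quick way to see what longdouble actually is on a given platform (a
sketch; the epsilon shown is for x86 80-bit extended precision and will
differ elsewhere):

import numpy as np

print(np.longdouble)                # float96 or float128, depending on platform and padding
print(np.finfo(np.longdouble).eps)  # ~1.08e-19 on Intel: 80-bit precision despite the name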

The renaming is an attempt to make it less confusing.   Do you agree
the renaming is less confusing?  Do you have another proposal?

Preferring 'longdouble' is precisely to flag up to people that they
may need to do some more research to find out what exactly that is.
Which is correct :)

Best,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Test survey that I have been putting together

2012-02-23 Thread Travis Oliphant
Hey all, 

I would like to gather concrete information about NumPy users and have some 
data to look at regarding the user base and features that are of interest.   

We have been putting together a survey that I would love feedback on from 
members of this list. If you have time and are interested in helping us 
gather information for improving NumPy, could you please fill out the 
following survey: 

https://www.surveymonkey.com/s/numpy_list_survey

After you complete the survey, I would really appreciate any feedback on 
questions that could be improved, removed, or added.  

Once we incorporate your feedback, we will distribute the survey more broadly 
and will report back the main results of the survey to this list. 

Thank you, 

-Travis
 


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Drew Frank
For convenience, here's a link to the mailing list thread on this topic
from a couple months ago:
http://thread.gmane.org/gmane.comp.python.numeric.general/47094 .

Drew


Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Paul Anton Letnes
Like others on this list, I've been confused a bit by the proliferation of
numpy interfaces for reading text. Would it be an idea to create some sort
of object-oriented solution for this purpose?

reader = np.FileReader('my_file.txt')
reader.loadtxt() # for backwards compat.; np.loadtxt could instantiate a reader 
and call this function if one wants to keep the interface
reader.very_general_and_typically_slow_reading(missing_data=True)
reader.my_files_look_like_this_plz_be_fast(fmt='%20.8e', separator=',', ncol=2)
reader.csv_read() # same as above, but with sensible defaults
reader.lazy_read() # returns a generator/iterator, so you can slice out a small 
part of a huge array, for instance, even when working with text (yes, 
inefficient)
reader.convert_line_by_line(myfunc) # line-by-line call myfunc, letting the 
user somehow convert easily to his/her format of choice: netcdf, hdf5, ... Not 
fast, but convenient
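A tiny sketch of what such a reader object could look like (names taken from
the proposal above; this is a hypothetical API, not anything that exists in
numpy):

import numpy as np

class FileReader(object):
    # Hypothetical single entry point for text reading.
    def __init__(self, path):
        self.path = path

    def loadtxt(self, **kwargs):
        # backwards-compatible path: delegate to the existing function
        return np.loadtxt(self.path, **kwargs)

    def lazy_read(self, dtype=float, sep=','):
        # generator over parsed rows, so huge files can be sliced lazily
        with open(self.path) as fh:
            for line in fh:
                yield np.array(line.strip().split(sep), dtype=dtype)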

Another option is to create a hierarchy of readers implemented as classes. Not 
sure if the benefits outweigh the disadvantages.

Just a crazy idea - it would at least gather all the file reading interfaces 
into one place (or one object hierarchy) so folks know where to look. The whole 
numpy namespace is a bit cluttered, imho, and for newbies it would be 
beneficial to use submodules to a greater extent than today - but that's a more 
long-term discussion.

Paul


On 23. feb. 2012, at 21:08, Travis Oliphant wrote:

 This is actually on my short-list as well --- it just didn't make it to the 
 list. 
 
 In fact, we have someone starting work on it this week.  It is his first 
 project so it will take him a little time to get up to speed on it, but he 
 will contact Wes and work with him and report progress to this list. 
 
 Integration with np.loadtxt is a high-priority.  I think loadtxt is now the 
 3rd or 4th text-reading interface I've seen in NumPy.  I have no interest 
 in making a new one if we can avoid it.   But, we do need to make it faster 
 with less memory overhead for simple cases like Wes describes.
 
 -Travis
 
 
 
 On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:
 
 Hi,
 
 On 23.02.2012 20:32, Wes McKinney wrote:
 [clip]
 To be clear: I'm going to do this eventually whether or not it
 happens in NumPy because it's an existing problem for heavy
 pandas users. I see no reason why the code can't emit structured
 arrays, too, so we might as well have a common library component
 that I can use in pandas and specialize to the DataFrame internal
 structure.
 
 If you do this, one useful aim could be to design the code such that it
 can be used in loadtxt, at least as a fast path for common cases. I'd
 really like to avoid increasing the number of APIs for text file loading.
 
 -- 
 Pauli Virtanen
 
