Re: [Numpy-discussion] loadtxt and usecols

2015-11-13 Thread Irvin Probst

On 11/11/2015 18:38, Sebastian Berg wrote:


Sounds fine to me, and considering the squeeze logic (which I think is
unfortunate, but it is not something you can easily change), I would be
for simply adding logic to accept a single integral argument and
otherwise not change anything.
[...]

As said before, the other/additional thing that might be very helpful is
trying to give a more useful error message.



I've modified my PR to (hopefully) match these requests.
https://github.com/numpy/numpy/pull/6656

Regards.

--
Irvin
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt and usecols

2015-11-11 Thread Sebastian Berg
On Di, 2015-11-10 at 17:39 +0100, Irvin Probst wrote:
> On 10/11/2015 16:52, Daπid wrote:
> > 42, is exactly the same as (42,). If you want a tuple of 
> > tuples, you have to do ((42,),), but then it raises: TypeError: list 
> > indices must be integers, not tuple.
> 
> My bad, I wrote that too fast, please forget this.
> 
> > I think loadtxt should be a tool to read text files in the least 
> > surprising fashion, and a text file is a 1 or 2D container, so it 
> > shouldn't return any other shapes.
> 
> And I *do* agree with the "shouldn't return any other shapes" part of 
> your phrase. What I was trying to say, admittedly with a very bogus 
> example, is that either loadtxt() should always output an array whose 
> shape matches the shape of the object passed to usecols, or it should 
> never do it, and I'm in favor of never.

Sounds fine to me, and considering the squeeze logic (which I think is
unfortunate, but it is not something you can easily change), I would be
for simply adding logic to accept a single integral argument and
otherwise not change anything.
I am personally against the flattening and even the array-like logic [1]
currently in the PR, it seems like arbitrary generality for my taste
without any obvious application.

As said before, the other/additional thing that might be very helpful is
trying to give a more useful error message.

- Sebastian


[1] Almost all 1-d array-likes will be sequences/iterables in any case,
those that are not are so obscure that there is no point in explicitly
supporting them.


> I'm perfectly aware that what I suggest would break the current behavior 
> of usecols=(2,), so I know it does not have the slightest probability of 
> being accepted, but still, I think that the "least surprising fashion" is 
> to always return a 2-D array because for many, many, many people a text 
> data file has N lines and M columns, and N=1 or M=1 is not a special case.
> 
> Anyway I will of course modify my PR according to any decision made here.
> 
> In your example:
> >
> > a=[[[2,],[],[],],[],[],[]]
> > foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)
> >
> > What would the shape of foo be?
> 
> As I said in my previous email:
> 
>  > should just work and return me a 2-D (or 1-D if you like) array with 
> the data I asked for
> 
> So, 1-D or 2-D it is up to you, but as long as there is no ambiguity in 
> which columns the user is asking for it should imho work.
> 
> Regards.





Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Benjamin Root
Just pointing out np.loadtxt(..., ndmin=2) will always return a 2D array.
Notice that without that option, the result is effectively squeezed. So if
you don't specify that option, and you load up a CSV file with only one
row, you will get a very differently shaped array than if you load up a CSV
file with two rows.
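A minimal illustration of that difference, using an in-memory file:

```python
import io
import numpy as np

one_row = "1.0 2.0 3.0\n"
two_rows = "1.0 2.0 3.0\n4.0 5.0 6.0\n"

# Without ndmin, a single-row file is squeezed down to 1-D...
print(np.loadtxt(io.StringIO(one_row)).shape)           # (3,)
print(np.loadtxt(io.StringIO(two_rows)).shape)          # (2, 3)

# ...while ndmin=2 always yields a 2-D result.
print(np.loadtxt(io.StringIO(one_row), ndmin=2).shape)  # (1, 3)
```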

Ben Root



Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Sebastian Berg
On Di, 2015-11-10 at 10:24 -0500, Benjamin Root wrote:
> Just pointing out np.loadtxt(..., ndmin=2) will always return a 2D
> array. Notice that without that option, the result is effectively
> squeezed. So if you don't specify that option, and you load up a CSV
> file with only one row, you will get a very differently shaped array
> than if you load up a CSV file with two rows.
> 

Oh, well I personally think that default squeeze is an abomination :).

Anyway, I just wanted to point out that it is two different possible
logics, and we have to pick one.
I have a slight preference for the indexing/array-like interpretation,
but I am aware that from a usage point of view the sequence one is
likely better.
I could throw in another option: throw an explicit error instead of the
general logic.

Anyway, I *really* do not have an opinion about what is better.

Array-like would only suggest that you also accept buffer interface
objects or array_interface stuff, which in this case is really
unnecessary, I think.

- Sebastian



Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Irvin Probst

On 10/11/2015 14:17, Sebastian Berg wrote:

Actually, it is the "sequence special case" type ;). (matlab does not
have this, since matlab always returns 2-D I realized).

As I said, if usecols is like indexing, the result should mimic:

arr = np.loadtxt(f)
arr = arr[usecols]

in which case a 1-D array is returned if you put in a scalar into
usecols (and you could even generalize usecols to higher dimensional
array-likes).
The way you implemented it -- which is fine, but I want to stress that
there is a real decision being made here --, you always see it as a
sequence but allow a scalar for convenience (i.e. always return a 2-D
array). It is a `sequence of ints or int` type argument and not an
array-like argument in my opinion.


I think we have two separate problems here:

The first one is whether loadtxt should always return a 2D array or 
should it match the shape of the usecol argument. From a CS guy's point of 
view I do understand your concern here. Now, from a teacher's point of view, 
I know many people expect to get a "matrix" (thank you Matlab...) and 
the "purity" of matching the dimension of the usecol variable will be 
seen by many people [1] as a nerdy, useless heaviness no one cares about (no 
offense). So whatever you, seasoned numpy devs from this mailing list, 
decide, I think it should be explained in the docstring with very clear 
wording.


My own opinion on this first problem is that loadtxt() should always 
return a 2D array, no less, no more. If I write np.loadtxt(f)[42] it 
means I want to read the whole file and then I explicitly ask for 
transforming the 2-D array loadtxt() returned into a 1-D array. Otoh, if 
I write loadtxt(f, usecol=42) it means I don't want to read the other 
columns and I want only this one, but it does not mean that I want to 
change the returned array from 2-D to 1-D. I know this new behavior 
might break a lot of existing code, as usecol=(42,) used to return a 1-D 
array, but usecol=42 also returns a 1-D array, so the current 
behavior is not consistent imho.


The second problem is about the wording in the docstring: when I see 
"sequence of int or int" I understand I will have to cast into a 1-D 
python list whatever wicked N-dimensional object I use to store my 
column indexes, or hope list(my_object) will do it fine. On the other 
hand, when I read "array-like" the function is telling me I don't have to 
worry about my object; as long as numpy knows how to cast it into an 
array it will be fine.


Anyway I think something like that:

import numpy as np
a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

should just work and return me a 2-D (or 1-D if you like) array with the 
data I asked for and I don't think "a" here is an int or a sequence of 
int (but it's a good example of why loadtxt() should not match the shape 
of the usecol argument).


To make it short, let the reading function read the data in a consistent 
and predictable way, and then let the user explicitly change the data's 
shape into anything he likes.


Regards.

[1] read non CS people trying to switch to numpy/scipy


Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Daπid
On 10 November 2015 at 16:07, Irvin Probst 
wrote:

> I know this new behavior might break a lot of existing code as
> usecol=(42,) used to return a 1-D array, but usecol=42, also
> returns a 1-D array so the current behavior is not consistent imho.


42, is exactly the same as (42,). If you want a tuple of tuples, you
have to do ((42,),), but then it raises: TypeError: list indices must
be integers, not tuple.

What numpy cares about is that whatever object you give it is iterable, and
its entries are ints, so usecol={0:'a', 5:'b'} is perfectly valid.

I think loadtxt should be a tool to read text files in the least surprising
fashion, and a text file is a 1 or 2D container, so it shouldn't return any
other shapes. Any fancy stuff one may want to do with the output should be
done with the typical indexing tricks. If I want a single column, I would
first be very surprised if I got a 2D array (I was bitten by this design in
MATLAB many many times). For the rare cases where I do want a "fake" 2D
array, I can make it explicit by expanding it with arr[:, np.newaxis], and
then I know that the shape will be (N, 1) and not (1, N). Thus, usecols
should be int or sequence of ints, and the result 1 or 2D.
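The newaxis expansion mentioned above, as a quick sketch:

```python
import numpy as np

col = np.array([1.0, 2.0, 3.0])   # a single column read as 1-D, shape (3,)
fake2d = col[:, np.newaxis]       # explicit "fake" 2-D column
print(fake2d.shape)               # (3, 1) -- a column, not a (1, 3) row
```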


In your example:

a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

What would the shape of foo be?


/David.


Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Irvin Probst

On 10/11/2015 16:52, Daπid wrote:
42,  is exactly the same as (42,) If you want a tuple of 
tuples, you have to do ((42,),), but then it raises: TypeError: list 
indices must be integers, not tuple.


My bad, I wrote that too fast, please forget this.

I think loadtxt should be a tool to read text files in the least 
surprising fashion, and a text file is a 1 or 2D container, so it 
shouldn't return any other shapes.


And I *do* agree with the "shouldn't return any other shapes" part of 
your phrase. What I was trying to say, admittedly with a very bogus 
example, is that either loadtxt() should always output an array whose 
shape matches the shape of the object passed to usecols, or it should 
never do it, and I'm in favor of never.
I'm perfectly aware that what I suggest would break the current behavior 
of usecols=(2,), so I know it does not have the slightest probability of 
being accepted, but still, I think that the "least surprising fashion" is 
to always return a 2-D array because for many, many, many people a text 
data file has N lines and M columns, and N=1 or M=1 is not a special case.


Anyway I will of course modify my PR according to any decision made here.

In your example:


a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

What would the shape of foo be?


As I said in my previous email:

> should just work and return me a 2-D (or 1-D if you like) array with 
the data I asked for


So, 1-D or 2-D it is up to you, but as long as there is no ambiguity in 
which columns the user is asking for it should imho work.


Regards.


Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Sebastian Berg
On Mo, 2015-11-09 at 20:36 +0100, Ralf Gommers wrote:
> 
> 
> On Mon, Nov 9, 2015 at 7:42 PM, Benjamin Root 
> wrote:
> My personal rule for flexible inputs like that is that it
> should be encouraged so long as it does not introduce
> ambiguity. Furthermore, allowing a scalar as an input doesn't
> add a cognitive disconnect on the user on how to specify
> multiple columns. Therefore, I'd give this a +1.
> 
> 
> On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst
>  wrote:
> Hi,
> I've recently seen many students, coming from Matlab,
> struggling against the usecols argument of loadtxt.
> Most of them tried something like:
> loadtxt("foo.bar", usecols=2) or the ones with better
> documentation reading skills tried loadtxt("foo.bar",
> usecols=(2)) but none of them understood they had to
> write usecols=[2] or usecols=(2,).
> 
> Is there a policy in numpy stating that this kind of
> argument must be a sequence?
> 
> 
> There isn't. In many/most cases it's array_like, which means scalar,
> sequence or array.
>  

Agree. I think we have, or should have, two types of things there (well,
three, since we certainly have "must be sequence"): args such as "axes",
which is typically just one, so we allow a scalar, but can often be
generalized to a sequence; and things that are array-likes (and
broadcasting).

So, if this is an array-like, however, the "correct" result could be
different by broadcasting between `1` and `(1,)` analogous to indexing
the full array with usecols:

usecols=1 result:
array([2, 3, 4, 5])

usecols=(1,) result [1]:
array([[2, 3, 4, 5]])

since a scalar row (so just one row) is read and not a 2D array. I tend
to say it should be an array-like argument and not a generalized
sequence argument, just wanted to note that, since I am not sure what
matlab does.

- Sebastian


[1] could go further and do `usecols=[[1]]` and get
`array([[[2, 3, 4, 5]]])`
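The indexing analogy (and the footnote) can be checked directly with plain indexing, using a small array standing in for the parsed file:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [2, 3, 4, 5]])

print(arr[1])            # scalar index -> 1-D: [2 3 4 5]
print(arr[[1]].shape)    # sequence index -> 2-D: (1, 4)
print(arr[[[1]]].shape)  # nested sequence -> 3-D: (1, 1, 4)
```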

> 
> I think that being able to pass an int or a sequence when a
> single column is needed would make this function a bit
> more user friendly for beginners. I would gladly
> submit a PR if no one disagrees.
> 
> +1
> 
> 
> Ralf
> 
> 
> 
> 





Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Irvin Probst

On 10/11/2015 09:19, Sebastian Berg wrote:

since a scalar row (so just one row) is read and not a 2D array. I tend
to say it should be an array-like argument and not a generalized
sequence argument, just wanted to note that, since I am not sure what
matlab does.


Hi,
By default Matlab reads everything, silently fails on whatever can't be 
converted into a float, and the user has to guess what was read and what 
was not.

Say you have a file like this:

2010-01-01 00:00:00 3.026
2010-01-01 01:00:00 4.049
2010-01-01 02:00:00 4.865


>> M=load('CONCARNEAU_2010.txt');
>> M(1:3,:)

ans =

   1.0e+03 *

    2.0100         0    0.0030
    2.0100    0.0010    0.0040
    2.0100    0.0020    0.0049


I think this is a terrible way of doing it, even if newcomers might find 
this handy. There are of course optional arguments (even regexps!) but 
to my knowledge almost no Matlab user even knows these arguments are there.


Anyway, I made a PR here https://github.com/numpy/numpy/pull/6656 with 
usecols as an array-like.


Regards.


Re: [Numpy-discussion] loadtxt and usecols

2015-11-10 Thread Sebastian Berg

Actually, it is the "sequence special case" type ;). (matlab does not
have this, since matlab always returns 2-D I realized).

As I said, if usecols is like indexing, the result should mimic:

arr = np.loadtxt(f)
arr = arr[usecols]

in which case a 1-D array is returned if you put in a scalar into
usecols (and you could even generalize usecols to higher dimensional
array-likes).
The way you implemented it -- which is fine, but I want to stress that
there is a real decision being made here --, you always see it as a
sequence but allow a scalar for convenience (i.e. always return a 2-D
array). It is a `sequence of ints or int` type argument and not an
array-like argument in my opinion.
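The practical difference between the two readings shows up only for a scalar usecols; plain indexing can stand in for the two interpretations (an illustration of the decision, not loadtxt's actual code path):

```python
import numpy as np

arr = np.arange(12.0).reshape(3, 4)  # stands in for the parsed file

# Indexing interpretation: usecols=1 mimics arr[:, 1] -> 1-D result.
print(arr[:, 1].shape)    # (3,)

# Sequence-with-scalar-convenience: usecols=1 is treated as (1,),
# mimicking arr[:, [1]] -> the result stays 2-D.
print(arr[:, [1]].shape)  # (3, 1)
```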

- Sebastian


> Regards.


Re: [Numpy-discussion] loadtxt and usecols

2015-11-09 Thread Benjamin Root
My personal rule for flexible inputs like that is that it should be
encouraged so long as it does not introduce ambiguity. Furthermore,
allowing a scalar as an input doesn't add a cognitive disconnect on the
user on how to specify multiple columns. Therefore, I'd give this a +1.

On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst  wrote:

> Hi,
> I've recently seen many students, coming from Matlab, struggling against
> the usecols argument of loadtxt. Most of them tried something like:
> loadtxt("foo.bar", usecols=2) or the ones with better documentation
> reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them
> understood they had to write usecols=[2] or usecols=(2,).
>
> Is there a policy in numpy stating that this kind of argument must be
> a sequence? I think that being able to pass an int or a sequence when a single
> column is needed would make this function a bit more user friendly for
> beginners. I would gladly submit a PR if no one disagrees.
>
> Regards.
>
> --
> Irvin


Re: [Numpy-discussion] loadtxt and usecols

2015-11-09 Thread Ralf Gommers
On Mon, Nov 9, 2015 at 7:42 PM, Benjamin Root  wrote:

> My personal rule for flexible inputs like that is that it should be
> encouraged so long as it does not introduce ambiguity. Furthermore,
> allowing a scalar as an input doesn't add a cognitive disconnect on the
> user on how to specify multiple columns. Therefore, I'd give this a +1.
>
> On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst <
> irvin.pro...@ensta-bretagne.fr> wrote:
>
>> Hi,
>> I've recently seen many students, coming from Matlab, struggling against
>> the usecols argument of loadtxt. Most of them tried something like:
>> loadtxt("foo.bar", usecols=2) or the ones with better documentation
>> reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them
>> understood they had to write usecols=[2] or usecols=(2,).
>>
>> Is there a policy in numpy stating that this kind of argument must be
>> a sequence?
>
>
There isn't. In many/most cases it's array_like, which means scalar,
sequence or array.


> I think that being able to pass an int or a sequence when a single column is
>> needed would make this function a bit more user friendly for beginners. I
>> would gladly submit a PR if no one disagrees.
>>
>
+1

Ralf


[Numpy-discussion] loadtxt and usecols

2015-11-09 Thread Irvin Probst

Hi,
I've recently seen many students, coming from Matlab, struggling against 
the usecols argument of loadtxt. Most of them tried something like:
loadtxt("foo.bar", usecols=2) or the ones with better documentation 
reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them 
understood they had to write usecols=[2] or usecols=(2,).
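The underlying gotcha is Python syntax rather than numpy: parentheses alone do not create a tuple, only the trailing comma does.

```python
# (2) is simply the integer 2 -- parentheses alone do not build a tuple;
# a trailing comma does, which is why usecols=(2) behaves like usecols=2.
print(type((2)))   # <class 'int'>
print(type((2,)))  # <class 'tuple'>
print((2) == 2)    # True
```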


Is there a policy in numpy stating that this kind of argument must be 
a sequence? I think that being able to pass an int or a sequence when a 
single column is needed would make this function a bit more user 
friendly for beginners. I would gladly submit a PR if no one disagrees.


Regards.

--
Irvin


Re: [Numpy-discussion] loadtxt ndmin option

2011-06-19 Thread Derek Homeier
On 31 May 2011, at 21:28, Pierre GM wrote:

 On May 31, 2011, at 6:37 PM, Derek Homeier wrote:

 On 31 May 2011, at 18:25, Pierre GM wrote:

 On May 31, 2011, at 5:52 PM, Derek Homeier wrote:

 I think stuff like multiple delimiters should have been dealt with
 before, as the right place to insert the ndmin code (which includes
 the decision to squeeze or not to squeeze as well as to add
 additional
 dimensions, if required) would be right at the end before the
 'unpack'
 switch, or  rather replacing the bit:

 if usemask:
     output = output.view(MaskedArray)
     output._mask = outputmask
 if unpack:
     return output.squeeze().T
 return output.squeeze()

 But there it's already not clear to me how to deal with the
 MaskedArray case...

 Oh, easy.
 You need to replace only the last three lines of genfromtxt with the
 ones from loadtxt  (808-833). Then, if usemask is True, you need to
 use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
 Comments:
 * I would raise an exception if ndmin isn't correct *before* trying
 to read the file...
 * You could define a `collapse_function` that would be
 `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on
 the values of `usemask` and `ndmin`...

Thanks, that helped to clean up a little bit.

 If you have any question about numpy.ma, don't hesitate to contact
 me directly.

 Thanks for the directions! I was not sure about the usemask case
 because it presently does not invoke .squeeze() either...

 The idea is that if `usemask` is True, you build a second array (the  
 mask), that you attach to your main array at the very end (in the  
 `output=output.view(MaskedArray), output._mask = mask` combo...).  
 Afterwards, it's a regular MaskedArray that supports the .squeeze()  
 method...

OK, in both cases output.squeeze() is now used if ndim > ndmin and  
usemask is False - at least it does not break any tests, so it seems  
to work with MaskedArrays as well.
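The squeeze-versus-ndmin logic under discussion can be sketched like this (a simplified sketch of the idea, not the actual patch; `apply_ndmin` is a hypothetical name):

```python
import numpy as np

def apply_ndmin(output, ndmin=0, usemask=False):
    # Validate ndmin before any work, as suggested above.
    if ndmin not in (0, 1, 2):
        raise ValueError("Illegal value of ndmin keyword: %s" % ndmin)
    # Remove extraneous dimensions only when we exceed the requested minimum...
    if output.ndim > ndmin:
        output = output.squeeze()
    # ...then pad back up to the minimum number of dimensions asked for,
    # using the ma variants when a masked array is being built.
    if output.ndim < ndmin:
        atleast_1d = np.ma.atleast_1d if usemask else np.atleast_1d
        atleast_2d = np.ma.atleast_2d if usemask else np.atleast_2d
        if ndmin == 1:
            output = atleast_1d(output)
        elif ndmin == 2:
            output = atleast_2d(output).T  # keep 1-D data as a column
    return output
```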

 On a
 possibly related note, genfromtxt also treats the 'unpack'ing of
 structured arrays differently from loadtxt (which returns a list of
 arrays in that case) - do you know if this is on purpose, or also
 rather missing functionality (I guess it might break  
 recfromtxt()...)?

 Keep in mind that I haven't touched genfromtxt for 8-10 months or  
 so. I wouldn't be surprised if it were lagging a bit behind  
 loadtxt in terms of development. Yes, there'll be some tweaking to  
 do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the  
 defaults) and others, but nothing major.

Well, at long last I got to implement the above and added the  
corresponding tests for genfromtxt - with the exception of the  
dimension-0 cases, since genfromtxt raises an error on empty files.  
There already is a comment that it should perhaps rather return an empty  
array, so I am putting that idea up for discussion here again.
I tried to devise a very basic test with masked arrays, just added to  
test_withmissing now.
I also implemented the same unpacking behaviour for structured arrays  
and just made recfromtxt set unpack=False to work (or should it issue  
a warning?).

The patches are up for review as commit 8ac01636 in my iocnv-wildcard  
branch:
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-06-03 Thread cgraves


Derek Homeier wrote:
 
 Hi Chris,
 
 On 31 May 2011, at 13:56, cgraves wrote:
 
 I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
 option,
 however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
 Should they have inherited the option? Who can make it happen?
 
 you are mistaken, genfromtxt is not using loadtxt (and could not  
 possibly, since it has the more complex parser to handle missing  
 data); thus ndmin could not be inherited automatically.
 It certainly would make sense to provide the same functionality for  
 genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
 I'd go ahead and file a feature (enhancement) request. I can't promise  
 I can take care of it myself, as I am less familiar with genfromtxt,  
 but I'd certainly have a look at it.
 
 Does anyone have an opinion whether this is a case for reopening (yet  
 again...)
 http://projects.scipy.org/numpy/ticket/1562
 or create a new ticket?
 

Thanks Derek. That would be greatly appreciated! Based on the 
follow-up messages in this thread, it looks like (hopefully) there will 
not be too much additional work in implementing it. For now I'll just 
use the temporary fix, a .reshape(-1), on any recfromtxt's that 
might read in a single row of data..

Kind regards,
Chris
-- 
View this message in context: 
http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31769169.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread cgraves


Ralf Gommers-2 wrote:
 
 On Fri, May 6, 2011 at 12:57 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
 

 On 6 May 2011, at 07:53, Ralf Gommers wrote:

 
   Looks okay, and I agree that it's better to fix it now. The timing
   is a bit unfortunate though, just after RC2. I'll have closer look
   tomorrow and if it can go in, probably tag RC3.
  
   If in the meantime a few more people could test this, that would be
   helpful.
  
   Ralf
  
   I agree, wish I had time to push this before rc2. I could add the
   explanatory comments
   mentioned above and switch to use the atleast_[12]d() solution, test
   that and push it
   in a couple of minutes, or should I better leave it as is now for
   testing?
 
  Quick follow-up: I just applied the above changes, added some tests to
  cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
  i386+ppc
  + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
  and do
  my (first) pull request...
 
  Go ahead, I'll have a look at it tonight. Thanks for testing on
  several Pythons, that definitely helps.


 Done, the request only appears on my repo
 https://github.com/dhomeier/numpy/

 is that correct? If someone could test it on Linux and Windows as
 well...

 
 Committed, thanks for all the work.
 
 The pull request was in the wrong place, that's a minor flaw in the github
 UI. After you press Pull Request you need to read the small print to see
 where it's going.
 
 Cheers,
 Ralf
 
 


Dear all,

I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin option,
however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
Should they have inherited the option? Who can make it happen?

Best,
Chris 

-- 
View this message in context: 
http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31740152.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
Hi Chris,

On 31 May 2011, at 13:56, cgraves wrote:

 I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
 option,
 however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
 Should they have inherited the option? Who can make it happen?

you are mistaken, genfromtxt is not using loadtxt (and could not  
possibly, since it has the more complex parser to handle missing  
data); thus ndmin could not be inherited automatically.
It certainly would make sense to provide the same functionality for  
genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
I'd go ahead and file a feature (enhancement) request. I can't promise  
I can take care of it myself, as I am less familiar with genfromtxt,  
but I'd certainly have a look at it.

Does anyone have an opinion whether this is a case for reopening (yet  
again...)
http://projects.scipy.org/numpy/ticket/1562
or create a new ticket?

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 4:53 PM, Derek Homeier wrote:

 Hi Chris,
 
 On 31 May 2011, at 13:56, cgraves wrote:
 
 I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
 option,
 however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
 Should they have inherited the option? Who can make it happen?
 
 you are mistaken, genfromtxt is not using loadtxt (and could not  
 possibly, since it has the more complex parser to handle missing  
 data); thus ndmin could not be inherited automatically.
 It certainly would make sense to provide the same functionality for  
 genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
 I'd go ahead and file a feature (enhancement) request. I can't promise  
 I can take care of it myself, as I am less familiar with genfromtxt,  
 but I'd certainly have a look at it.

Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be 
squeezed before being returned, right? You can add some test at the very end 
of genfromtxt to check what to do with the output (whether to squeeze it or 
not, whether to transpose it or not)... If you don't mind doing it, I'd be 
quite grateful (I don't have time to work on numpy these days, much to my 
regret). Don't forget to change the user manual as well...
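
The end-of-function check Pierre describes might look roughly like this (an illustrative sketch only; `finalize`, `output` and `unpack` are stand-ins for genfromtxt's locals, not the actual source):

```python
import numpy as np

def finalize(output, ndmin=0, unpack=False):
    # Sketch of the proposed tail: validate ndmin, squeeze, then pad
    # back up to the requested number of dimensions.
    if ndmin not in (0, 1, 2):
        raise ValueError("Illegal value of ndmin keyword: %s" % ndmin)
    output = output.squeeze()
    if ndmin == 1:
        output = np.atleast_1d(output)
    elif ndmin == 2:
        # Note: squeezing first loses row/column orientation for a lone
        # column, which is exactly the corner case debated in this thread.
        output = np.atleast_2d(output)
    return output.T if unpack else output

assert finalize(np.zeros((1, 5)), ndmin=2).shape == (1, 5)
assert finalize(np.array(5.0), ndmin=1).shape == (1,)
```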


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Bruce Southey
On 05/31/2011 10:18 AM, Pierre GM wrote:
 On May 31, 2011, at 4:53 PM, Derek Homeier wrote:

 Hi Chris,

 On 31 May 2011, at 13:56, cgraves wrote:

 I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin
 option,
 however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
 Should they have inherited the option? Who can make it happen?
 you are mistaken, genfromtxt is not using loadtxt (and could not
 possibly, since it has the more complex parser to handle missing
 data); thus ndmin could not be inherited automatically.
 It certainly would make sense to provide the same functionality for
 genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so
 I'd go ahead and file a feature (enhancement) request. I can't promise
 I can take care of it myself, as I am less familiar with genfromtxt,
 but I'd certainly have a look at it.
 Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be 
 squeezed before being returned, right ? You can add some test at the very end 
 of genfromtxt to check what to do with the output (whether to squeeze it or 
 not, whether to transpose it or not)... If you don't mind doing it, I'd be 
 quite grateful (I don't have time to work on numpy these days, much to my 
 regret). Don't forget to change the user manual as well...
(Different function so different ticket.)

Sure you can change the end of the code, but that may hide various 
problems. Unlike loadtxt, genfromtxt has a lot of flexibility, especially 
in handling missing values and using converter functions. So I think that 
some examples must be provided that cannot be handled by providing a 
suitable converter, or that require multiple assumptions about the input file 
(such as having more than one delimiter).

Bruce


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Benjamin Root
On Tue, May 31, 2011 at 10:33 AM, Bruce Southey bsout...@gmail.com wrote:

 On 05/31/2011 10:18 AM, Pierre GM wrote:
  On May 31, 2011, at 4:53 PM, Derek Homeier wrote:
 
  Hi Chris,
 
  On 31 May 2011, at 13:56, cgraves wrote:
 
  I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin
  option,
  however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
  Should they have inherited the option? Who can make it happen?
  you are mistaken, genfromtxt is not using loadtxt (and could not
  possibly, since it has the more complex parser to handle missing
  data); thus ndmin could not be inherited automatically.
  It certainly would make sense to provide the same functionality for
  genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so
  I'd go ahead and file a feature (enhancement) request. I can't promise
  I can take care of it myself, as I am less familiar with genfromtxt,
  but I'd certainly have a look at it.
  Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must
 be squeezed before being returned, right ? You can add some test at the very
 end of genfromtxt to check what to do with the output (whether to squeeze it
 or not, whether to transpose it or not)... If you don't mind doing it, I'd
 be quite grateful (I don't have time to work on numpy these days, much to my
 regret). Don't forget to change the user manual as well...
 (Different function so different ticket.)

 Sure you can change the end of the code but that may hide various
 problem. Unlike loadtxt, genfromtxt has a lot of flexibility especially
 handling missing values and using converter functions. So I think that
 some examples must be provided that can not be handled by providing a
 suitable converter or that require multiple assumptions about input file
 (such as having more than one delimiter).

 Bruce


At this point, I wonder if it might be smarter to create a .atleast_Nd()
function and use that everywhere it is needed.  Having similar logic
tailored for each loading function might be a little dangerous if bug fixes
are made to one, but not the others.

Ben Root


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 17:33, Bruce Southey wrote:

 It certainly would make sense to provide the same functionality for
 genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt),  
 so
 I'd go ahead and file a feature (enhancement) request. I can't  
 promise
 I can take care of it myself, as I am less familiar with genfromtxt,
 but I'd certainly have a look at it.
 Oh, that shouldn't be too difficult: 'ndmin' tells whether the  
 array must be squeezed before being returned, right ? You can add  
 some test at the very end of genfromtxt to check what to do with  
 the output (whether to squeeze it or not, whether to transpose it  
 or not)... If you don't mind doing it, I'd be quite grateful (I  
 don't have time to work on numpy these days, much to my regret).  
 Don't forget to change the user manual as well...
 (Different function so different ticket.)

 Sure you can change the end of the code but that may hide various
 problem. Unlike loadtxt, genfromtxt has a lot of flexibility  
 especially
 handling missing values and using converter functions. So I think that
 some examples must be provided that can not be handled by providing a
 suitable converter or that require multiple assumptions about input  
 file
 (such as having more than one delimiter).

I think stuff like multiple delimiters should have been dealt with  
before, as the right place to insert the ndmin code (which includes  
the decision to squeeze or not to squeeze as well as to add additional  
dimensions, if required) would be right at the end before the 'unpack'  
switch, or rather replacing the bit:

 if usemask:
 output = output.view(MaskedArray)
 output._mask = outputmask
 if unpack:
 return output.squeeze().T
 return output.squeeze()

But there it's already not clear to me how to deal with the  
MaskedArray case...

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 17:45, Benjamin Root wrote:

 At this point, I wonder if it might be smarter to create  
 a .atleast_Nd() function and use that everywhere it is needed.   
 Having similar logic tailored for each loading function might be a  
 little dangerous if bug fixes are made to one, but not the others.

Like a generalised version of .atleast_1d / .atleast_2d?
It would also have to include an .atmost_Nd functionality of some  
sort, to replace the .squeeze(), which would generally be a good idea (e.g.  
something like np.atleast_Nd(X, ndmin=0, ndmax=-1), where the default  
is not to reduce the maximum number of dimensions...).
But for the io routines the situation is a bit more complex, since  
different shapes are expected to be returned  depending on the text  
input (e.g. (1, N) for a single row vs. (N, 1) for a single column of  
data).
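
A generalised helper along those lines might look like the following (a hypothetical sketch; no such `np.atleast_nd` with an upper bound exists in NumPy):

```python
import numpy as np

def atleast_nd(x, ndmin=0, ndmax=None):
    """Hypothetical generalisation of np.atleast_1d/2d with an upper bound."""
    x = np.asanyarray(x)
    if ndmax is not None:
        while x.ndim > ndmax:
            # drop one length-1 axis at a time; stop if none are left
            ones = [i for i, n in enumerate(x.shape) if n == 1]
            if not ones:
                break
            x = x.squeeze(axis=ones[0])
    while x.ndim < ndmin:
        x = x[np.newaxis, ...]  # pad with leading axes up to ndmin
    return x
```

As noted above, this alone cannot decide between (1, N) and (N, 1) for the io routines; that requires knowing how many rows and columns were read.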

Cheers,
Derek


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
 
 I think stuff like multiple delimiters should have been dealt with  
 before, as the right place to insert the ndmin code (which includes  
 the decision to squeeze or not to squeeze as well as to add additional  
 dimensions, if required) would be right at the end before the 'unpack'  
 switch, or  rather replacing the bit:
 
 if usemask:
 output = output.view(MaskedArray)
 output._mask = outputmask
 if unpack:
 return output.squeeze().T
 return output.squeeze()
 
 But there it's already not clear to me how to deal with the  
 MaskedArray case...

Oh, easy.
You need to replace only the last three lines of genfromtxt with the ones from 
loadtxt  (808-833). Then, if usemask is True, you need to use ma.atleast_Xd 
instead of np.atleast_Xd. Et voilà.
Comments:
* I would raise an exception if ndmin isn't correct *before* trying to read the 
file... 
* You could define a `collapse_function` that would be `np.atleast_1d`, 
`np.atleast_2d`, `ma.atleast_1d`... depending on the values of `usemask` and 
`ndmin`...
If you have any question about numpy.ma, don't hesitate to contact me directly.
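
Pierre's `collapse_function` idea might be sketched like this (`pick_collapse` is a hypothetical name; both `np` and `ma` really do provide `atleast_1d`/`atleast_2d`):

```python
import numpy as np
import numpy.ma as ma

def pick_collapse(usemask, ndmin):
    # Choose the shape-adjusting function once, based on usemask and
    # ndmin, instead of branching at every return site.
    mod = ma if usemask else np
    if ndmin == 2:
        return mod.atleast_2d
    if ndmin == 1:
        return mod.atleast_1d
    return lambda a: a.squeeze()
```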



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 18:25, Pierre GM wrote:

 On May 31, 2011, at 5:52 PM, Derek Homeier wrote:

 I think stuff like multiple delimiters should have been dealt with
 before, as the right place to insert the ndmin code (which includes
 the decision to squeeze or not to squeeze as well as to add  
 additional
 dimensions, if required) would be right at the end before the  
 'unpack'
 switch, or  rather replacing the bit:

if usemask:
output = output.view(MaskedArray)
output._mask = outputmask
if unpack:
return output.squeeze().T
return output.squeeze()

 But there it's already not clear to me how to deal with the
 MaskedArray case...

 Oh, easy.
 You need to replace only the last three lines of genfromtxt with the  
 ones from loadtxt  (808-833). Then, if usemask is True, you need to  
 use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
 Comments:
 * I would raise an exception if ndmin isn't correct *before* trying  
 to read the file...
 * You could define a `collapse_function` that would be  
 `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on  
 the values of `usemask` and `ndmin`...
 If you have any question about numpy.ma, don't hesitate to contact  
 me directly.

Thanks for the directions! I was not sure about the usemask case  
because it presently does not invoke .squeeze() either... On a  
possibly related note, genfromtxt also treats the 'unpack'ing of  
structured arrays differently from loadtxt (which returns a list of  
arrays in that case) - do you know if this is on purpose, or also  
rather missing functionality (I guess it might break recfromtxt()...)?

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 6:37 PM, Derek Homeier wrote:

 On 31 May 2011, at 18:25, Pierre GM wrote:
 
 On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
 
 I think stuff like multiple delimiters should have been dealt with
 before, as the right place to insert the ndmin code (which includes
 the decision to squeeze or not to squeeze as well as to add  
 additional
 dimensions, if required) would be right at the end before the  
 'unpack'
 switch, or  rather replacing the bit:
 
   if usemask:
   output = output.view(MaskedArray)
   output._mask = outputmask
   if unpack:
   return output.squeeze().T
   return output.squeeze()
 
 But there it's already not clear to me how to deal with the
 MaskedArray case...
 
 Oh, easy.
 You need to replace only the last three lines of genfromtxt with the  
 ones from loadtxt  (808-833). Then, if usemask is True, you need to  
 use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
 Comments:
 * I would raise an exception if ndmin isn't correct *before* trying  
 to read the file...
 * You could define a `collapse_function` that would be  
 `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on  
 the values of `usemask` and `ndmin`...
 If you have any question about numpy.ma, don't hesitate to contact  
 me directly.
 
 Thanks for the directions! I was not sure about the usemask case  
 because it presently does not invoke .squeeze() either...

The idea is that if `usemask` is True, you build a second array (the mask), 
that you attach to your main array at the very end (in the 
`output=output.view(MaskedArray), output._mask = mask` combo...). Afterwards, 
it's a regular MaskedArray that supports the .squeeze() method...
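
A minimal illustration of that view/_mask combo (an assumed pattern matching the snippet quoted above, not the actual genfromtxt source):

```python
import numpy as np
import numpy.ma as ma

# Build the data and the mask separately, then attach the mask at the
# very end, as genfromtxt does.
data = np.array([1.0, -999.0, 3.0])
mask = np.array([False, True, False])   # True marks a missing entry
output = data.view(ma.MaskedArray)
output._mask = mask
# Afterwards it is a regular MaskedArray supporting .squeeze(), .T, etc.
assert output.count() == 2              # two unmasked values
assert ma.isMaskedArray(output.squeeze())
```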


 On a  
 possibly related note, genfromtxt also treats the 'unpack'ing of  
 structured arrays differently from loadtxt (which returns a list of  
 arrays in that case) - do you know if this is on purpose, or also  
 rather missing functionality (I guess it might break recfromtxt()...)?

Keep in mind that I haven't touched genfromtxt since 8-10 months or so. I 
wouldn't be surprised that it were lagging a bit behind loadtxt in terms of 
development. Yes, there'll be some tweaking to do for recfromtxt (it's OK for 
now if `ndmin` and `unpack` are the defaults) and others, but nothing major.





Re: [Numpy-discussion] loadtxt ndmin option

2011-05-07 Thread Ralf Gommers
On Fri, May 6, 2011 at 12:57 PM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:


 On 6 May 2011, at 07:53, Ralf Gommers wrote:

 
   Looks okay, and I agree that it's better to fix it now. The timing
   is a bit unfortunate though, just after RC2. I'll have closer look
   tomorrow and if it can go in, probably tag RC3.
  
   If in the meantime a few more people could test this, that would be
   helpful.
  
   Ralf
  
   I agree, wish I had time to push this before rc2. I could add the
   explanatory comments
   mentioned above and switch to use the atleast_[12]d() solution, test
   that and push it
   in a couple of minutes, or should I better leave it as is now for
   testing?
 
  Quick follow-up: I just applied the above changes, added some tests to
  cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
  i386+ppc
  + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
  and do
  my (first) pull request...
 
  Go ahead, I'll have a look at it tonight. Thanks for testing on
  several Pythons, that definitely helps.


 Done, the request only appears on my repo
 https://github.com/dhomeier/numpy/

 is that correct? If someone could test it on Linux and Windows as
 well...


Committed, thanks for all the work.

The pull request was in the wrong place, that's a minor flaw in the github
UI. After you press Pull Request you need to read the small print to see
where it's going.

Cheers,
Ralf


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-06 Thread Derek Homeier

On 6 May 2011, at 07:53, Ralf Gommers wrote:


  Looks okay, and I agree that it's better to fix it now. The timing
  is a bit unfortunate though, just after RC2. I'll have closer look
  tomorrow and if it can go in, probably tag RC3.
 
  If in the meantime a few more people could test this, that would be
  helpful.
 
  Ralf
 
  I agree, wish I had time to push this before rc2. I could add the
  explanatory comments
  mentioned above and switch to use the atleast_[12]d() solution, test
  that and push it
  in a couple of minutes, or should I better leave it as is now for
  testing?

 Quick follow-up: I just applied the above changes, added some tests to
 cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
 i386+ppc
 + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
 and do
 my (first) pull request...

 Go ahead, I'll have a look at it tonight. Thanks for testing on  
 several Pythons, that definitely helps.


Done, the request only appears on my repo
https://github.com/dhomeier/numpy/

is that correct? If someone could test it on Linux and Windows as  
well...

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 4. mai 2011, at 20.33, Benjamin Root wrote:

  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
 for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
 will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access
 etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the
 original corner cases that ndmin was supposed to address.  Keep in mind that
 np.loadtxt takes a one-column data file and a one-row data file down to the
 same shape.  I don't see how the current code is able to produce the correct
 array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
 counting the number of rows and columns read?  Could we use those to help
 guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would be
 very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).

 What if one does things the other way around - avoid calling squeeze until
 _after_ doing the atleast_Nd() magic? That way the row/column information
 should be conserved, right? Also, we avoid transposing, memory use, ...

 Oh, and someone could conceivably have a _looong_ 1D file, but would want
 it read as a 2D array.

 Paul



@Derek, good catch with noticing the error in the tests. We do still need to
handle the case I mentioned, however.  I have attached an example script to
demonstrate the issue.  In this script, I would expect the second-to-last
array to be a shape of (1, 5).  I believe that the single-row, multi-column
case would actually be the more common type of edge-case encountered by
users than the others.  Therefore, I believe that this ndmin fix is not
adequate until this is addressed.

@Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would
just undo whatever we had just done.  Also, wrt the transpose, a (1, 10)
array looks the same in memory as a (10, 1) array, right?
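
The two ndmin=2 constructions discussed earlier in the thread can be checked directly (a small sketch, independent of the patch):

```python
import numpy as np

# For a 1-D result X, the explicit reshape and the atleast_2d variant
# produce identical (N, 1) column arrays.
x = np.arange(5.0)
a = x.copy()
a.shape = (a.size, 1)       # explicit reshape to a column
b = np.atleast_2d(x).T      # atleast_2d gives (1, 5); .T gives (5, 1)
assert a.shape == b.shape == (5, 1)
assert np.array_equal(a, b)
```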

Ben Root


[Attachment: loadtest.py]


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 10:49 AM, Benjamin Root ben.r...@ou.edu wrote:



 On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:


 On 4. mai 2011, at 20.33, Benjamin Root wrote:

  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
 for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
 will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access
 etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the
 original corner cases that ndmin was supposed to address.  Keep in mind that
 np.loadtxt takes a one-column data file and a one-row data file down to the
 same shape.  I don't see how the current code is able to produce the correct
 array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
 counting the number of rows and columns read?  Could we use those to help
 guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would
 be very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).

 What if one does things the other way around - avoid calling squeeze until
 _after_ doing the atleast_Nd() magic? That way the row/column information
 should be conserved, right? Also, we avoid transposing, memory use, ...

 Oh, and someone could conceivably have a _looong_ 1D file, but would want
 it read as a 2D array.

 Paul



 @Derek, good catch with noticing the error in the tests. We do still need
 to handle the case I mentioned, however.  I have attached an example script
 to demonstrate the issue.  In this script, I would expect the second-to-last
 array to be a shape of (1, 5).  I believe that the single-row, multi-column
 case would actually be the more common type of edge-case encountered by
 users than the others.  Therefore, I believe that this ndmin fix is not
 adequate until this is addressed.


Apologies Derek, your patch does address the issue I raised.

Ben Root


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Paul Anton Letnes

On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
 On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
 On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
  de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written 
   for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it 
   will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the 
  original corner cases that ndmin was supposed to address.  Keep in mind 
  that np.loadtxt takes a one-column data file and a one-row data file down 
  to the same shape.  I don't see how the current code is able to produce the 
  correct array shape when ndmin=2.  Do we have some sort of counter in 
  loadtxt for counting the number of rows and columns read?  Could we use 
  those to help guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would be 
  very clear as to the code's intent.  I don't think we have to worry about 
  memory usage if we limit its use to only situations where ndmin is greater 
  than the number of dimensions of the array.  In those cases, the array is 
  either an empty result, a scalar value (in which memory access is trivial), 
  or 1-d (in which a transpose is cheap).
 
 What if one does things the other way around - avoid calling squeeze until 
 _after_ doing the atleast_Nd() magic? That way the row/column information 
 should be conserved, right? Also, we avoid transposing, memory use, ...
 
 Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
 read as a 2D array.
 
 Paul
 
 
 
 @Derek, good catch with noticing the error in the tests. We do still need to 
 handle the case I mentioned, however.  I have attached an example script to 
 demonstrate the issue.  In this script, I would expect the second-to-last 
 array to be a shape of (1, 5).  I believe that the single-row, multi-column 
 case would actually be the more common type of edge-case encountered by users 
 than the others.  Therefore, I believe that this ndmin fix is not adequate 
 until this is addressed.
 
 @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would 
 just undo whatever we had just done.  Also, wrt the transpose, a (1, 10) 
 array looks the same in memory as a (10, 1) array, right?
Agree. I thought more along the lines of (pseudocode-ish)
if ndmin == 0:
    squeeze()
if ndmin == 1:
    atleast_1d()
elif ndmin == 2:
    atleast_2d()
else:
    # I don't rightly know what would go here, maybe raise ValueError?

That would avoid the squeeze call before the atleast_Nd magic. But the code was 
changed, so I think my comment doesn't make sense anymore. It's probably fine 
the way it is!

Paul



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
  [...]
  @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
squeeze()
 if ndmin == 1:
atleast_1D()
 elif ndmin == 2:
atleast_2D()
 else:
I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the code
 was changed, so I think my comment doesn't make sense anymore. It's probably
 fine the way it is!

 Paul


I have thought of that too, but the problem with that approach is that after
reading the file, X will have 2 or 3 dimensions, regardless of how many
singleton dims were in the file.  A squeeze will always be needed.  Also,
the purpose of squeeze is opposite that of the atleast_*d() functions:
squeeze reduces dimensions, while atleast_*d will add dimensions.

Therefore, I re-iterate... the patch by Derek gets the job done.  I have
tested it for a wide variety of inputs for both regular arrays and record
arrays.  Is there room for improvements?  Yes, but I think that can wait for
later.  Derek's patch however fixes an important bug in the ndmin
implementation and should be included for the release.

Ben Root
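Ben's objection can be demonstrated directly: once squeeze has run, the row/column distinction is gone, and atleast_2d can only prepend an axis, turning a former column into a row (illustrative shapes only):

```python
import numpy as np

col = np.ones((5, 1))             # one-column result, e.g. from a 5-row file
flat = np.squeeze(col)            # squeeze drops the singleton dimension
print(flat.shape)                 # (5,)

# atleast_2d cannot recover the orientation: it always prepends the
# new axis, so the former column comes back as a row.
print(np.atleast_2d(flat).shape)  # (1, 5)
```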


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root ben.r...@ou.edu wrote:



  [...]
 I have thought of that too, but the problem with that approach is that
 after reading the file, X will have 2 or 3 dimensions, regardless of how
 many singleton dims were in the file.  A squeeze will always be needed.
 Also, the purpose of squeeze is opposite that of the atleast_*d()
 functions:  squeeze reduces dimensions, while atleast_*d will add
 dimensions.

 Therefore, I re-iterate... the patch by Derek gets the job done.  I have
 tested it for a wide variety of inputs for both regular arrays and record
 arrays.  Is there room for improvements?  Yes, but I think that can wait for
 later.  Derek's patch however fixes an important bug in the ndmin
 implementation and should be included for the release.

 Two questions: can you point me to the patch/ticket, and is this a
regression?

Thanks,
Ralf


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers ralf.gomm...@googlemail.comwrote:



  [...]
 Therefore, I re-iterate... the patch by Derek gets the job done.  I have
 tested it for a wide variety of inputs for both regular arrays and record
 arrays.  Is there room for improvements?  Yes, but I think that can wait for
 later.  Derek's patch however fixes an important bug in the ndmin
 implementation and should be included for the release.

 Two questions: can you point me to the patch/ticket, and is this a
 regression?

 Thanks,
 Ralf



I don't know if he did a pull-request or not, but here is the link he
provided earlier in the thread.

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

Technically, this is not a regression as the ndmin feature is new in this
release.  However, the problem that ndmin is supposed to address is not
fixed by the current implementation for the rc.  Essentially, a single-row,
multi-column file with ndmin=2 comes out as a Nx1 array which is the same
result for a multi-row, single-column file.  My feeling is that if we let
the current implementation stand as is, and developers use it in their code,
then fixing it in a later 

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:46 PM, Benjamin Root ben.r...@ou.edu wrote:



  [...]

 I don't know if he did a pull-request or not, but here is the link he
 provided earlier in the thread.

 https://github.com/dhomeier/numpy/compare/master...ndmin-cols

 Technically, this is not a regression as the ndmin feature is new in this
 release.


Yes right, I forgot this was a recent change.


 However, the problem that ndmin is supposed to address is not fixed by the
 current implementation for the rc.  Essentially, a single-row, multi-column
 file with ndmin=2 comes out as a Nx1 array which is the same result for a
 multi-row, single-column 

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Derek Homeier

On 5 May 2011, at 22:53, Derek Homeier wrote:


 However, the problem that ndmin is supposed to address is not fixed
 by the current implementation for the rc.  Essentially, a single-
 row, multi-column file with ndmin=2 comes out as a Nx1 array which
 is the same result for a multi-row, single-column file.  My feeling
 is that if we let the current implementation stand as is, and
 developers use it in their code, then fixing it in a later release
 would introduce more problems (maybe the devels would transpose the
 result themselves or something).  Better to fix it now in rc with
 the two lines of code (and the correction to the tests), then to
 introduce a buggy feature that will be hard to fix in future
 releases, IMHO.

 Looks okay, and I agree that it's better to fix it now. The timing
 is a bit unfortunate though, just after RC2. I'll have a closer look
 tomorrow and if it can go in, probably tag RC3.

 If in the meantime a few more people could test this, that would be
 helpful.

 Ralf

 I agree, wish I had time to push this before rc2. I could add the
 explanatory comments
 mentioned above and switch to use the atleast_[12]d() solution, test
 that and push it
 in a couple of minutes, or should I better leave it as is now for
 testing?

Quick follow-up: I just applied the above changes, added some tests to
cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
i386+ppc + 10.6 x86_64 (Python 2.7+3.2). So I'd be ready to push it to
my repo and do my (first) pull request...

Cheers,
Derek



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Fri, May 6, 2011 at 12:12 AM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:


 On 5 May 2011, at 22:53, Derek Homeier wrote:

 
  However, the problem that ndmin is supposed to address is not fixed
  by the current implementation for the rc.  Essentially, a single-
  row, multi-column file with ndmin=2 comes out as a Nx1 array which
  is the same result for a multi-row, single-column file.  My feeling
  is that if we let the current implementation stand as is, and
  developers use it in their code, then fixing it in a later release
  would introduce more problems (maybe the devels would transpose the
  result themselves or something).  Better to fix it now in rc with
  the two lines of code (and the correction to the tests), then to
  introduce a buggy feature that will be hard to fix in future
  releases, IMHO.
 
  Looks okay, and I agree that it's better to fix it now. The timing
  is a bit unfortunate though, just after RC2. I'll have a closer look
  tomorrow and if it can go in, probably tag RC3.
 
  If in the meantime a few more people could test this, that would be
  helpful.
 
  Ralf
 
  I agree, wish I had time to push this before rc2. I could add the
  explanatory comments
  mentioned above and switch to use the atleast_[12]d() solution, test
  that and push it
  in a couple of minutes, or should I better leave it as is now for
  testing?

 Quick follow-up: I just applied the above changes, added some tests to
 cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
 i386+ppc
 + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
 and do
 my (first) pull request...


Go ahead, I'll have a look at it tonight. Thanks for testing on several
Pythons, that definitely helps.

Ralf


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Derek Homeier
Hi Paul,

I've got back to your suggestion re. the ndmin flag for loadtxt from a few 
weeks ago...

On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote:

 1562:
  I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code, however.
 
 See comments on Trac as well.
 
 Your patch is better, but there is one thing I disagree with.
 808    if X.ndim < ndmin:
 809        if ndmin == 1:
 810            X.shape = (X.size, )
 811        elif ndmin == 2:
 812            X.shape = (X.size, 1)
 The last line should be:
 812            X.shape = (1, X.size)
 If someone wants a 2D array out, they would most likely expect a one-row file 
 to come out as a one-row array, not the other way around. IMHO.

I think you are completely right for the test case with one row. More generally 
though, 
since a file of N rows and M columns is read into an array of shape (N, M), 
ndmin=2 
should enforce X.shape = (1, X.size) for single-row input, and X.shape = 
(X.size, 1) 
for single-column input.
I thought this would be handled automatically by preserving the original 2 
dimensions, 
but apparently with single-row/multi-column input an extra dimension 1 is 
prepended 
when the array is returned from the parser. I've put up a fix for this at 

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

and also tested the patch against 1.6.0.rc2.

Cheers,
Derek
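For reference, the orientation rule Derek describes here is the behaviour np.loadtxt(..., ndmin=2) has in current NumPy releases (a quick check, not part of the original thread):

```python
import io
import numpy as np

row = np.loadtxt(io.StringIO("1 2 3"), ndmin=2)    # single-row file
col = np.loadtxt(io.StringIO("1\n2\n3"), ndmin=2)  # single-column file
print(row.shape)  # (1, 3) -- one row, three columns
print(col.shape)  # (3, 1) -- three rows, one column
```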



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Paul Anton Letnes

On 4. mai 2011, at 17.34, Derek Homeier wrote:

 Hi Paul,
 
 I've got back to your suggestion re. the ndmin flag for loadtxt from a few 
 weeks ago...
 
 On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote:
 
 1562:
 I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code, however.
 
 See comments on Trac as well.
 
 Your patch is better, but there is one thing I disagree with.
  808    if X.ndim < ndmin:
  809        if ndmin == 1:
  810            X.shape = (X.size, )
  811        elif ndmin == 2:
  812            X.shape = (X.size, 1)
  The last line should be:
  812            X.shape = (1, X.size)
 If someone wants a 2D array out, they would most likely expect a one-row 
 file to come out as a one-row array, not the other way around. IMHO.
 
 I think you are completely right for the test case with one row. More 
 generally though, 
 since a file of N rows and M columns is read into an array of shape (N, M), 
 ndmin=2 
 should enforce X.shape = (1, X.size) for single-row input, and X.shape = 
 (X.size, 1) 
 for single-column input.
 I thought this would be handled automatically by preserving the original 2 
 dimensions, 
 but apparently with single-row/multi-column input an extra dimension 1 is 
 prepended 
 when the array is returned from the parser. I've put up a fix for this at 
 
 https://github.com/dhomeier/numpy/compare/master...ndmin-cols
 
 and also tested the patch against 1.6.0.rc2.
 
 Cheers,
   Derek


Looks sensible to me at least!

But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
reintroduce the 'transposed' problem?

Paul



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Derek Homeier
On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:

 But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
 this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
 reintroduce the 'transposed' problem?

Yes, good point, one could replace the 
X.shape = (X.size, ) with X = np.atleast_1d(X), 
but for the ndmin=2 case, we'd need to replace 
X.shape = (X.size, 1) with X = np.atleast_2d(X).T - 
not sure which solution is more efficient in terms of memory access etc...

Cheers,
Derek
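The two variants Derek compares are both cheap on a 1-D array: reshaping in place and transposing the result of atleast_2d are metadata-only operations (views, no data copy). A small equivalence check (illustrative only; the .T trick is safe here because X is known to be 1-D):

```python
import numpy as np

X = np.arange(4.0)           # 1-D result, e.g. from a single-column file

a = X.copy()
a.shape = (a.size, 1)        # in-place reshape, as in the original patch

b = np.atleast_2d(X).T       # (1, 4) view, transposed to a (4, 1) view

print(a.shape, b.shape)      # (4, 1) (4, 1)
print(np.array_equal(a, b))  # True
```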



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Benjamin Root
On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:

 On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:

  But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
 for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
 will reintroduce the 'transposed' problem?

 Yes, good point, one could replace the
 X.shape = (X.size, ) with X = np.atleast_1d(X),
 but for the ndmin=2 case, we'd need to replace
 X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
 not sure which solution is more efficient in terms of memory access etc...

 Cheers,
 Derek


I can confirm that the current behavior is not sufficient for all of the
original corner cases that ndmin was supposed to address.  Keep in mind that
np.loadtxt takes a one-column data file and a one-row data file down to the
same shape.  I don't see how the current code is able to produce the correct
array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
counting the number of rows and columns read?  Could we use those to help
guide the ndmin=2 case?

I think that using atleast_1d(X) might be a bit overkill, but it would be
very clear as to the code's intent.  I don't think we have to worry about
memory usage if we limit its use to only situations where ndmin is greater
than the number of dimensions of the array.  In those cases, the array is
either an empty result, a scalar value (in which memory access is trivial),
or 1-d (in which a transpose is cheap).

Ben Root
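The collapse Ben describes is easy to reproduce; without ndmin the two files are indistinguishable after loading:

```python
import io
import numpy as np

r = np.loadtxt(io.StringIO("1 2 3"))    # one row, three columns
c = np.loadtxt(io.StringIO("1\n2\n3"))  # three rows, one column
print(r.shape, c.shape)  # (3,) (3,) -- both squeezed to the same 1-D shape
```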


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Paul Anton Letnes

On 4. mai 2011, at 20.33, Benjamin Root wrote:

 On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
 On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
  But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
  this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
  reintroduce the 'transposed' problem?
 
 Yes, good point, one could replace the
 X.shape = (X.size, ) with X = np.atleast_1d(X),
 but for the ndmin=2 case, we'd need to replace
 X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
 not sure which solution is more efficient in terms of memory access etc...
 
 Cheers,
Derek
 
 
 I can confirm that the current behavior is not sufficient for all of the 
 original corner cases that ndmin was supposed to address.  Keep in mind that 
 np.loadtxt takes a one-column data file and a one-row data file down to the 
 same shape.  I don't see how the current code is able to produce the correct 
 array shape when ndmin=2.  Do we have some sort of counter in loadtxt for 
 counting the number of rows and columns read?  Could we use those to help 
 guide the ndmin=2 case?
 
 I think that using atleast_1d(X) might be a bit overkill, but it would be 
 very clear as to the code's intent.  I don't think we have to worry about 
 memory usage if we limit its use to only situations where ndmin is greater 
 than the number of dimensions of the array.  In those cases, the array is 
 either an empty result, a scalar value (in which memory access is trivial), 
 or 1-d (in which a transpose is cheap).

What if one does things the other way around - avoid calling squeeze until 
_after_ doing the atleast_Nd() magic? That way the row/column information 
should be conserved, right? Also, we avoid transposing, memory use, ...

Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
read as a 2D array.

Paul




Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-04-04 Thread Bruce Southey
On 03/31/2011 12:02 PM, Derek Homeier wrote:
 On 31 Mar 2011, at 17:03, Bruce Southey wrote:

 This is an invalid ticket because the docstring clearly states that in
 3 different, yet critical places, that missing values are not handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
This function aims to be a fast reader for simply formatted
 files.  The
 `genfromtxt` function provides more sophisticated handling of,
 e.g.,
 lines with missing values.

 Really I am trying to separate the usage of loadtxt and genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was added
 in 2009. So really certain features of loadtxt have been  'kept' for
 backwards compatibility purposes yet these features can be 'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.

 OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
 you could probably say also for historical reasons since I have not
 used genfromtxt much so far.
 Anyway the docstring statement "Converters can also be used to
 provide a default value for missing data:"
 then appears quite misleading, or an invitation to abuse, if you will.
 This should better be removed from the documentation then, or users
 explicitly discouraged from using converters instead of genfromtxt
 (I don't see how you could completely prevent using converters in
 this way).

 The patch is incorrect because it should not include a space in the
 split() as indicated in the comment by the original reporter. Of
 The split('\r\n') alone caused test_dtype_with_object(self) to fail,
 probably
 because it relies on stripping the blanks. But maybe the test is ill-
 formed?

 course a corrected patch alone still is not sufficient to address the
 problem without the user providing the correct converter. Also you
 start to run into problems with multiple delimiters (such as one space
 versus two spaces) so you start down the path to add all the features
 that duplicate genfromtxt.
 Given that genfromtxt provides that functionality more conveniently,
 I agree again users should be encouraged to use this instead of
 converters.
 But the actual tab-problem causes in fact an issue not related to
 missing
 values at all (well, depending on what you call a missing value).
 I am describing an example on the ticket.

 Cheers,
   Derek

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
Okay I see that 1071 got closed which I am fine with.

I think that your following example should be a test because the two 
spaces should not be removed with a tab delimiter:
np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t', 
dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))

Thanks very much for fixing this!
Bruce





Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-04-04 Thread Charles R Harris
On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey bsout...@gmail.com wrote:

 On 03/31/2011 12:02 PM, Derek Homeier wrote:
  On 31 Mar 2011, at 17:03, Bruce Southey wrote:
 
  This is an invalid ticket because the docstring clearly states that in
  3 different, yet critical places, that missing values are not handled
  here:
 
  Each row in the text file must have the same number of values.
  genfromtxt : Load data with missing values handled as specified.
 This function aims to be a fast reader for simply formatted
  files.  The
  `genfromtxt` function provides more sophisticated handling of,
  e.g.,
  lines with missing values.
 
  Really I am trying to separate the usage of loadtxt and genfromtxt to
  avoid unnecessary duplication and confusion. Part of this is
  historical because loadtxt was added in 2007 and genfromtxt was added
  in 2009. So really certain features of loadtxt have been  'kept' for
  backwards compatibility purposes yet these features can be 'abused' to
  handle missing data. But I really consider that any missing values
  should cause loadtxt to fail.
 
  OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
  you could probably say also for historical reasons since I have not
  used genfromtxt much so far.
  Anyway the docstring statement Converters can also be used to
provide a default value for missing data:
  then appears quite misleading, or an invitation to abuse, if you will.
  This should better be removed from the documentation then, or users
  explicitly discouraged from using converters instead of genfromtxt
  (I don't see how you could completely prevent using converters in
  this way).
 
  The patch is incorrect because it should not include a space in the
  split() as indicated in the comment by the original reporter. Of
  The split('\r\n') alone caused test_dtype_with_object(self) to fail,
  probably
  because it relies on stripping the blanks. But maybe the test is ill-
  formed?
 
  course a corrected patch alone still is not sufficient to address the
  problem without the user providing the correct converter. Also you
  start to run into problems with multiple delimiters (such as one space
  versus two spaces) so you start down the path to add all the features
  that duplicate genfromtxt.
  Given that genfromtxt provides that functionality more conveniently,
  I agree again users should be encouraged to use this instead of
  converters.
  But the actual tab-problem causes in fact an issue not related to
  missing
  values at all (well, depending on what you call a missing value).
  I am describing an example on the ticket.
 
  Cheers,
Derek
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 Okay I see that 1071 got closed which I am fine with.

 I think that your following example should be a test because the two
 spaces should not be removed with a tab delimiter:
 np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
 dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))


Make a test and we'll put it in.

Chuck


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-04-04 Thread Bruce Southey

On 04/04/2011 11:20 AM, Charles R Harris wrote:



On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey bsout...@gmail.com wrote:


On 03/31/2011 12:02 PM, Derek Homeier wrote:
 On 31 Mar 2011, at 17:03, Bruce Southey wrote:

 This is an invalid ticket because the docstring clearly states
that in
 3 different, yet critical places, that missing values are not
handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
This function aims to be a fast reader for simply formatted
 files.  The
 `genfromtxt` function provides more sophisticated handling of,
 e.g.,
 lines with missing values.

 Really I am trying to separate the usage of loadtxt and
genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was
added
 in 2009. So really certain features of loadtxt have been
 'kept' for
 backwards compatibility purposes yet these features can be
'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.

 OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
 you could probably say also for historical reasons since I have not
 used genfromtxt much so far.
 Anyway the docstring statement Converters can also be used to
   provide a default value for missing data:
 then appears quite misleading, or an invitation to abuse, if you
will.
 This should better be removed from the documentation then, or users
 explicitly discouraged from using converters instead of genfromtxt
 (I don't see how you could completely prevent using converters in
 this way).

 The patch is incorrect because it should not include a space in the
 split() as indicated in the comment by the original reporter. Of
 The split('\r\n') alone caused test_dtype_with_object(self) to fail,
 probably
 because it relies on stripping the blanks. But maybe the test is
ill-
 formed?

 course a corrected patch alone still is not sufficient to
address the
 problem without the user providing the correct converter. Also you
 start to run into problems with multiple delimiters (such as
one space
 versus two spaces) so you start down the path to add all the
features
 that duplicate genfromtxt.
 Given that genfromtxt provides that functionality more conveniently,
 I agree again users should be encouraged to use this instead of
 converters.
 But the actual tab-problem causes in fact an issue not related to
 missing
 values at all (well, depending on what you call a missing value).
 I am describing an example on the ticket.

 Cheers,
   Derek

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
Okay I see that 1071 got closed which I am fine with.

I think that your following example should be a test because the two
spaces should not be removed with a tab delimiter:
np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))


Make a test and we'll put it in.

Chuck



I know!
Trying to write one made me realize that loadtxt is not handling string 
arrays correctly. So I have to check more on this as I think loadtxt is 
giving a 1-d array instead of a 2-d array.


I do agree with you Pierre but this is a nice corner case that Derek 
raised where a space does not necessarily mean a missing value when 
there is a tab delimiter:


data = StringIO("aa\tbb\n \t \ncc\tdd")
dt = np.dtype([('label', 'S2'), ('comment', 'S2')])
test = np.loadtxt(data, delimiter='\t', dtype=dt)
control = np.array([('aa', 'bb'), (' ', ' '), ('cc', 'dd')], dtype=dt)

So 'test' and 'control' should give the same array.
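A self-contained version of that check might look as follows (a sketch, not the actual test that was merged). Whether the all-whitespace middle line survives as (' ', ' ') or is skipped as blank was exactly the behaviour under dispute, so the sketch only pins down the unambiguous rows:

```python
import numpy as np
from io import StringIO

dt = np.dtype([('label', 'S2'), ('comment', 'S2')])
data = StringIO("aa\tbb\n \t \ncc\tdd")

# Tab-delimited structured read; blank fields are the contested case.
test = np.loadtxt(data, delimiter='\t', dtype=dt)

# The first and last rows are uncontroversial:
print(test['label'][0], test['comment'][0])    # b'aa' b'bb'
print(test['label'][-1], test['comment'][-1])  # b'cc' b'dd'
# The disputed question is what becomes of the " \t " line:
print(test)
```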

Bruce


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-04-04 Thread Charles R Harris
On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey bsout...@gmail.com wrote:

  On 04/04/2011 11:20 AM, Charles R Harris wrote:



 On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey bsout...@gmail.com wrote:

 On 03/31/2011 12:02 PM, Derek Homeier wrote:
   On 31 Mar 2011, at 17:03, Bruce Southey wrote:
 
  This is an invalid ticket because the docstring clearly states that in
  3 different, yet critical places, that missing values are not handled
  here:
 
  Each row in the text file must have the same number of values.
  genfromtxt : Load data with missing values handled as specified.
 This function aims to be a fast reader for simply formatted
  files.  The
  `genfromtxt` function provides more sophisticated handling of,
  e.g.,
  lines with missing values.
 
  Really I am trying to separate the usage of loadtxt and genfromtxt to
  avoid unnecessary duplication and confusion. Part of this is
  historical because loadtxt was added in 2007 and genfromtxt was added
  in 2009. So really certain features of loadtxt have been  'kept' for
  backwards compatibility purposes yet these features can be 'abused' to
  handle missing data. But I really consider that any missing values
  should cause loadtxt to fail.
 
  OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
  you could probably say also for historical reasons since I have not
  used genfromtxt much so far.
  Anyway the docstring statement Converters can also be used to
provide a default value for missing data:
  then appears quite misleading, or an invitation to abuse, if you will.
  This should better be removed from the documentation then, or users
  explicitly discouraged from using converters instead of genfromtxt
  (I don't see how you could completely prevent using converters in
  this way).
 
  The patch is incorrect because it should not include a space in the
  split() as indicated in the comment by the original reporter. Of
  The split('\r\n') alone caused test_dtype_with_object(self) to fail,
  probably
  because it relies on stripping the blanks. But maybe the test is ill-
  formed?
 
  course a corrected patch alone still is not sufficient to address the
  problem without the user providing the correct converter. Also you
  start to run into problems with multiple delimiters (such as one space
  versus two spaces) so you start down the path to add all the features
  that duplicate genfromtxt.
  Given that genfromtxt provides that functionality more conveniently,
  I agree again users should be encouraged to use this instead of
  converters.
  But the actual tab-problem causes in fact an issue not related to
  missing
  values at all (well, depending on what you call a missing value).
  I am describing an example on the ticket.
 
  Cheers,
Derek
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
  Okay I see that 1071 got closed which I am fine with.

 I think that your following example should be a test because the two
 spaces should not be removed with a tab delimiter:
 np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
 dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))


 Make a test and we'll put it in.

 Chuck


  I know!
 Trying to write one made me realize that loadtxt is not handling string
 arrays correctly. So I have to check more on this as I think loadtxt is
 giving a 1-d array instead of a 2-d array.


Tests often have that side effect.

snip

Chuck


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Ralf Gommers
On Thu, Mar 31, 2011 at 4:53 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
 paul.anton.let...@gmail.com wrote:

 On 26. mars 2011, at 21.44, Derek Homeier wrote:

  Hi Paul,
 
  having had a look at the other tickets you dug up,
 
  My opinions are my own, and in detail, they are:
  1752:
    I attach a possible patch. FWIW, I agree with the request. The
  patch is written to be compatible with the fix in ticket #1562, but
  I did not test that yet.
 
  Tested, see also my comments on Trac.

 Great!

  1731:
    This seems like a rather trivial feature enhancement. I attach a
  possible patch.
 
  Agreed. Haven't tested it though.

 Great!

  1616:
    The suggested patch seems reasonable to me, but I do not have a
  full list of what objects loadtxt supports today as opposed to what
  this patch will support.

 Looks like you got this one. Just remember to make it compatible with
 #1752. Should be easy.

  1562:
    I attach a possible patch. This could also be the default
  behavior to my mind, since the function caller can simply call
  numpy.squeeze if needed. Changing default behavior would probably
  break old code, however.
 
  See comments on Trac as well.

 Your patch is better, but there is one thing I disagree with.
 808    if X.ndim < ndmin:
 809        if ndmin == 1:
 810            X.shape = (X.size, )
 811        elif ndmin == 2:
 812            X.shape = (X.size, 1)
 The last line should be:
 812            X.shape = (1, X.size)
 If someone wants a 2D array out, they would most likely expect a one-row
 file to come out as a one-row array, not the other way around. IMHO.
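For reference, the behaviour argued for here is what NumPy ended up shipping with the ndmin keyword: a one-row file stays a one-row array. A minimal sketch:

```python
import numpy as np
from io import StringIO

# Without ndmin, a single data row is squeezed down to 1-D.
flat = np.loadtxt(StringIO("1 2 3"))
print(flat.shape)   # (3,)

# With ndmin=2 the row is kept as a one-row 2-D array,
# i.e. shape (1, 3) rather than (3, 1).
kept = np.loadtxt(StringIO("1 2 3"), ndmin=2)
print(kept.shape)   # (1, 3)
```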

  1458:
    The fix suggested in the ticket seems reasonable, but I have
  never used record arrays, so I am not sure  of this.
 
  There were some issues with Python3, and I also had some general
  reservations
  as noted on Trac - basically, it makes 'unpack' equivalent to
  transposing for 2D-arrays,
  but to splitting into fields for 1D-recarrays. My question was, what's
  going to happen
  when you get to 2D-recarrays? Currently this is not an issue since
  loadtxt can only
  read 2D regular or 1D structured arrays. But this might change if the
  data block
  functionality (see below) were to be implemented - data could then be
  returned as
  3D arrays or 2D structured arrays... Still, it would probably make
  most sense (or at
  least give the widest functionality) to have 'unpack=True' always
  return a list or iterator
  over columns.
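The transpose behaviour of 'unpack=True' for regular 2-D arrays, which the discussion takes as the baseline, can be sketched like this (plain float data, not a structured array):

```python
import numpy as np
from io import StringIO

data = "1.0 10.0\n2.0 20.0\n3.0 30.0"

# unpack=True transposes the result, so the columns of the file
# can be bound directly to separate names.
x, y = np.loadtxt(StringIO(data), unpack=True)
print(x)  # [1. 2. 3.]
print(y)  # [10. 20. 30.]
```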

 OK, I don't know recarrays, as I said.

  1445:
    Adding this functionality could break old code, as some old
  datafiles may have empty lines which are now simply ignored. I do
  not think the feature is a good idea. It could rather be implemented
  as a separate function.
  1107:
    I do not see the need for this enhancement. In my eyes, the
  usecols kwarg does this and more. Perhaps I am misunderstanding
  something here.
 
  Agree about #1445, and the bit about 'usecols' - 'numcols' would just
  provide a
  shorter call to e.g. read the first 20 columns of a file (well, not
  even that much
  over 'usecols=range(20)'...), don't think that justifies an extra
  argument.
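For illustration, reading just the first few columns with 'usecols', as mentioned above, is already a one-liner (sketch):

```python
import numpy as np
from io import StringIO

data = "1 2 3 4\n5 6 7 8"

# usecols=range(2) keeps only the first two columns,
# which is what a 'numcols' keyword would have done.
first_two = np.loadtxt(StringIO(data), usecols=range(2))
print(first_two)  # [[1. 2.] [5. 6.]]
```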
  But the 'datablocks' provides something new, that a number of people
  seem
  to miss from e.g. gnuplot (including me, actually ;-). And it would
  also satisfy the
  request from #1445 without breaking backwards compatibility.
  I've been wondering if could instead specify the separator lines
  through the
  parameter, e.g. blocksep=['None', 'blank','invalid'], not sure if
  that would make
  it more useful...

 What about writing a separate function, e.g. loadblocktxt, and have it
 separate the chunks and call loadtxt for each chunk? Just a thought. Another
 possibility would be to write a function that would let you load a set of
 text files in a directory, and return a dict of datasets, one per file. One
 could write a similar save-function, too. They would just need to call
 loadtxt/savetxt on a per-file basis.

  1071:
       It is not clear to me whether loadtxt is supposed to support
  missing values in the fashion indicated in the ticket.
 
  In principle it should at least allow you to, by the use of converters
  as described there.
  The problem is, the default delimiter is described as 'any
  whitespace', which in the
  present implementation obviously includes any number of blanks or
  tabs. These
  are therefore treated differently from delimiters like ',' or ''. I'd
  reckon there are
  too many people actually relying on this behaviour to silently change it
  (e.g. I know plenty of tables with columns separated by either one or
  several
  tabs depending on the length of the previous entry). But the tab is
  apparently also
  treated differently if explicitly specified with delimiter='\t' -
  and in that case using
  a converter à la {2: lambda s: float(s or 'Nan')} is working for
  fields in the middle of
  the line, but not at 

Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Bruce Southey
On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
 paul.anton.let...@gmail.com wrote:

 On 26. mars 2011, at 21.44, Derek Homeier wrote:

  Hi Paul,
 
  having had a look at the other tickets you dug up,
 
[snip]

  1071:
       It is not clear to me whether loadtxt is supposed to support
  missing values in the fashion indicated in the ticket.
 
  In principle it should at least allow you to, by the use of converters
  as described there.
  The problem is, the default delimiter is described as 'any
  whitespace', which in the
  present implementation obviously includes any number of blanks or
  tabs. These
  are therefore treated differently from delimiters like ',' or ''. I'd
  reckon there are
  too many people actually relying on this behaviour to silently change it
  (e.g. I know plenty of tables with columns separated by either one or
  several
  tabs depending on the length of the previous entry). But the tab is
  apparently also
  treated differently if explicitly specified with delimiter='\t' -
  and in that case using
  a converter à la {2: lambda s: float(s or 'Nan')} is working for
  fields in the middle of
  the line, but not at the end - clearly warrants improvement. I've
  prepared a patch
  working for Python3 as well.

 Great!

This is an invalid ticket because the docstring clearly states, in
3 different yet critical places, that missing values are not handled
here:

Each row in the text file must have the same number of values.
genfromtxt : Load data with missing values handled as specified.
This function aims to be a fast reader for simply formatted files.  The
`genfromtxt` function provides more sophisticated handling of, e.g.,
lines with missing values.

Really I am trying to separate the usage of loadtxt and genfromtxt to
avoid unnecessary duplication and confusion. Part of this is
historical because loadtxt was added in 2007 and genfromtxt was added
in 2009. So really certain features of loadtxt have been  'kept' for
backwards compatibility purposes yet these features can be 'abused' to
handle missing data. But I really consider that any missing values
should cause loadtxt to fail.

The patch is incorrect because it should not include a space in the
split() as indicated in the comment by the original reporter. Of
course a corrected patch alone still is not sufficient to address the
problem without the user providing the correct converter. Also you
start to run into problems with multiple delimiters (such as one space
versus two spaces) so you start down the path to add all the features
that duplicate genfromtxt.


Bruce


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Ralf Gommers
On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southey bsout...@gmail.com wrote:
 On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:


 On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
 paul.anton.let...@gmail.com wrote:

 On 26. mars 2011, at 21.44, Derek Homeier wrote:

  Hi Paul,
 
  having had a look at the other tickets you dug up,
 
 [snip]

  1071:
       It is not clear to me whether loadtxt is supposed to support
  missing values in the fashion indicated in the ticket.
 
  In principle it should at least allow you to, by the use of converters
  as described there.
  The problem is, the default delimiter is described as 'any
  whitespace', which in the
  present implementation obviously includes any number of blanks or
  tabs. These
  are therefore treated differently from delimiters like ',' or ''. I'd
  reckon there are
  too many people actually relying on this behaviour to silently change it
  (e.g. I know plenty of tables with columns separated by either one or
  several
  tabs depending on the length of the previous entry). But the tab is
  apparently also
  treated differently if explicitly specified with delimiter='\t' -
  and in that case using
  a converter à la {2: lambda s: float(s or 'Nan')} is working for
  fields in the middle of
  the line, but not at the end - clearly warrants improvement. I've
  prepared a patch
  working for Python3 as well.

 Great!

 This is an invalid ticket because the docstring clearly states that in
 3 different, yet critical places, that missing values are not handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
     This function aims to be a fast reader for simply formatted files.  The
    `genfromtxt` function provides more sophisticated handling of, e.g.,
    lines with missing values.

 Really I am trying to separate the usage of loadtxt and genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was added
 in 2009. So really certain features of loadtxt have been  'kept' for
 backwards compatibility purposes yet these features can be 'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.

I agree with you Bruce, but it would be easier to discuss this on the
tickets instead of here. Could you add your comments there please?

Ralf


 The patch is incorrect because it should not include a space in the
 split() as indicated in the comment by the original reporter. Of
 course a corrected patch alone still is not sufficient to address the
 problem without the user providing the correct converter. Also you
 start to run into problems with multiple delimiters (such as one space
 versus two spaces) so you start down the path to add all the features
 that duplicate genfromtxt.


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Bruce Southey
On 03/31/2011 10:08 AM, Ralf Gommers wrote:
 On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southeybsout...@gmail.com  wrote:
 On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
 charlesr.har...@gmail.com  wrote:

 On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
 paul.anton.let...@gmail.com  wrote:
 On 26. mars 2011, at 21.44, Derek Homeier wrote:

 Hi Paul,

 having had a look at the other tickets you dug up,

 [snip]
 1071:
   It is not clear to me whether loadtxt is supposed to support
 missing values in the fashion indicated in the ticket.
 In principle it should at least allow you to, by the use of converters
 as described there.
 The problem is, the default delimiter is described as 'any
 whitespace', which in the
 present implementation obviously includes any number of blanks or
 tabs. These
 are therefore treated differently from delimiters like ',' or ''. I'd
 reckon there are
 too many people actually relying on this behaviour to silently change it
 (e.g. I know plenty of tables with columns separated by either one or
 several
 tabs depending on the length of the previous entry). But the tab is
 apparently also
 treated differently if explicitly specified with delimiter='\t' -
 and in that case using
 a converter à la {2: lambda s: float(s or 'Nan')} is working for
 fields in the middle of
 the line, but not at the end - clearly warrants improvement. I've
 prepared a patch
 working for Python3 as well.
 Great!

 This is an invalid ticket because the docstring clearly states that in
 3 different, yet critical places, that missing values are not handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
  This function aims to be a fast reader for simply formatted files.  The
 `genfromtxt` function provides more sophisticated handling of, e.g.,
 lines with missing values.

 Really I am trying to separate the usage of loadtxt and genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was added
 in 2009. So really certain features of loadtxt have been  'kept' for
 backwards compatibility purposes yet these features can be 'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.
 I agree with you Bruce, but it would be easier to discuss this on the
 tickets instead of here. Could you add your comments there please?

 Ralf

'Easier' seems a contradiction when you have to use a captcha...
Sure, I will add more comments there.

Bruce


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Derek Homeier
On 31 Mar 2011, at 17:03, Bruce Southey wrote:

 This is an invalid ticket because the docstring clearly states that in
 3 different, yet critical places, that missing values are not handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
This function aims to be a fast reader for simply formatted  
 files.  The
`genfromtxt` function provides more sophisticated handling of,  
 e.g.,
lines with missing values.

 Really I am trying to separate the usage of loadtxt and genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was added
 in 2009. So really certain features of loadtxt have been  'kept' for
 backwards compatibility purposes yet these features can be 'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.

OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
you could probably say also for historical reasons since I have not
used genfromtxt much so far.
Anyway the docstring statement "Converters can also be used to
provide a default value for missing data:"
then appears quite misleading, or an invitation to abuse, if you will.
This should better be removed from the documentation then, or users
explicitly discouraged from using converters instead of genfromtxt
(I don't see how you could completely prevent using converters in
this way).
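As a side-by-side sketch of the two approaches discussed here, genfromtxt's built-in missing-value handling versus the converter idiom from the loadtxt docstring, consider a comma-delimited file with an empty trailing field (the column index and NaN substitution are chosen for illustration):

```python
import numpy as np
from io import StringIO

text = "1,2\n3,"   # second row has an empty last field

# genfromtxt treats the empty field as missing and, for float
# output, fills it with NaN by default.
a = np.genfromtxt(StringIO(text), delimiter=",")
print(a)  # second row, second column comes out as nan

# The converter idiom from the loadtxt docstring: substitute
# NaN by hand when the field string is empty.
b = np.loadtxt(StringIO(text), delimiter=",",
               converters={1: lambda s: float(s or "NaN")})
print(b)
```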

 The patch is incorrect because it should not include a space in the
 split() as indicated in the comment by the original reporter. Of

The split('\r\n') alone caused test_dtype_with_object(self) to fail, probably
because it relies on stripping the blanks. But maybe the test is ill-formed?

 course a corrected patch alone still is not sufficient to address the
 problem without the user providing the correct converter. Also you
 start to run into problems with multiple delimiters (such as one space
 versus two spaces) so you start down the path to add all the features
 that duplicate genfromtxt.

Given that genfromtxt provides that functionality more conveniently,
I agree again users should be encouraged to use this instead of  
converters.
But the actual tab-problem causes in fact an issue not related to  
missing
values at all (well, depending on what you call a missing value).
I am describing an example on the ticket.

Cheers,
Derek



Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Bruce Southey
On 03/31/2011 12:02 PM, Derek Homeier wrote:
 On 31 Mar 2011, at 17:03, Bruce Southey wrote:

 This is an invalid ticket because the docstring clearly states that in
 3 different, yet critical places, that missing values are not handled
 here:

 Each row in the text file must have the same number of values.
 genfromtxt : Load data with missing values handled as specified.
This function aims to be a fast reader for simply formatted
 files.  The
 `genfromtxt` function provides more sophisticated handling of,
 e.g.,
 lines with missing values.

 Really I am trying to separate the usage of loadtxt and genfromtxt to
 avoid unnecessary duplication and confusion. Part of this is
 historical because loadtxt was added in 2007 and genfromtxt was added
 in 2009. So really certain features of loadtxt have been  'kept' for
 backwards compatibility purposes yet these features can be 'abused' to
 handle missing data. But I really consider that any missing values
 should cause loadtxt to fail.

 OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
 you could probably say also for historical reasons since I have not
 used genfromtxt much so far.
 Anyway the docstring statement Converters can also be used to
   provide a default value for missing data:
 then appears quite misleading, or an invitation to abuse, if you will.
 This should better be removed from the documentation then, or users
 explicitly discouraged from using converters instead of genfromtxt
 (I don't see how you could completely prevent using converters in
 this way).

 The patch is incorrect because it should not include a space in the
 split() as indicated in the comment by the original reporter. Of
 The split('\r\n') alone caused test_dtype_with_object(self) to fail,
 probably
 because it relies on stripping the blanks. But maybe the test is ill-
 formed?

 course a corrected patch alone still is not sufficient to address the
 problem without the user providing the correct converter. Also you
 start to run into problems with multiple delimiters (such as one space
 versus two spaces) so you start down the path to add all the features
 that duplicate genfromtxt.
 Given that genfromtxt provides that functionality more conveniently,
 I agree again users should be encouraged to use this instead of
 converters.
 But the actual tab-problem causes in fact an issue not related to
 missing
 values at all (well, depending on what you call a missing value).
 I am describing an example on the ticket.

 Cheers,
   Derek

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
I am really not disagreeing that much with you. Rather that, as you have 
shown, it is very easy to increase the complexity of examples that 
loadtxt does not handle.  By missing value I mean when no data value is 
stored for the variable in the current observation (via Wikipedia) 
since encoded missing values (such as '.', 'NA' and 'NaN') can be 
recovered.


Bruce


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-31 Thread Charles R Harris
On Thu, Mar 31, 2011 at 8:42 AM, Ralf Gommers
ralf.gomm...@googlemail.comwrote:

 On Thu, Mar 31, 2011 at 4:53 AM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
  paul.anton.let...@gmail.com wrote:
 


snip

If you look in Trac under All Tickets by Milestone you'll find all
 nine tickets together under 1.6.0. Five are bug fixes, four are
 enhancements. There are some missing tests, but all tickets have
 proposed patches.


OK. I changed 1562 to enhancement because it adds a keyword. With that
change the current status looks like this.

Bug Fixes:

1163 -- convert int64 correctly
1458 -- make loadtxt unpack structured arrays
1071 -- loadtxt fails if the last column contains empty value, under
discussion
1565 -- duplicate of 1163

Enhancements:

1107 -- support for blocks of data, adds two keywords.
1562 -- add ndmin keyword to aid in getting correct dimensions, doesn't
apply on top of previous.
1616 -- remove use of readline so input isn't restricted to files.
1731 -- allow loadtxt to read given number of rows, adds keyword.
1752 -- return empty array when empty file encountered, conflicts with 1616.


Some of this might be better off in genfromtxt. None of the patches have
tests.

Chuck


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-30 Thread Charles R Harris
On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 26. mars 2011, at 21.44, Derek Homeier wrote:

  Hi Paul,
 
  having had a look at the other tickets you dug up,
 
  My opinions are my own, and in detail, they are:
  1752:
I attach a possible patch. FWIW, I agree with the request. The
  patch is written to be compatible with the fix in ticket #1562, but
  I did not test that yet.
 
  Tested, see also my comments on Trac.

 Great!

  1731:
This seems like a rather trivial feature enhancement. I attach a
  possible patch.
 
  Agreed. Haven't tested it though.

 Great!

  1616:
The suggested patch seems reasonable to me, but I do not have a
  full list of what objects loadtxt supports today as opposed to what
  this patch will support.

 Looks like you got this one. Just remember to make it compatible with
 #1752. Should be easy.

  1562:
I attach a possible patch. This could also be the default
  behavior to my mind, since the function caller can simply call
  numpy.squeeze if needed. Changing default behavior would probably
  break old code, however.
 
  See comments on Trac as well.

 Your patch is better, but there is one thing I disagree with.
 808    if X.ndim < ndmin:
 809        if ndmin == 1:
 810            X.shape = (X.size, )
 811        elif ndmin == 2:
 812            X.shape = (X.size, 1)
 The last line should be:
 812            X.shape = (1, X.size)
 If someone wants a 2D array out, they would most likely expect a one-row
 file to come out as a one-row array, not the other way around. IMHO.

  1458:
The fix suggested in the ticket seems reasonable, but I have
  never used record arrays, so I am not sure  of this.
 
  There were some issues with Python3, and I also had some general
  reservations
  as noted on Trac - basically, it makes 'unpack' equivalent to
  transposing for 2D-arrays,
  but to splitting into fields for 1D-recarrays. My question was, what's
  going to happen
  when you get to 2D-recarrays? Currently this is not an issue since
  loadtxt can only
  read 2D regular or 1D structured arrays. But this might change if the
  data block
  functionality (see below) were to be implemented - data could then be
  returned as
  3D arrays or 2D structured arrays... Still, it would probably make
  most sense (or at
  least give the widest functionality) to have 'unpack=True' always
  return a list or iterator
  over columns.

 OK, I don't know recarrays, as I said.

  1445:
Adding this functionality could break old code, as some old
  datafiles may have empty lines which are now simply ignored. I do
  not think the feature is a good idea. It could rather be implemented
  as a separate function.
  1107:
I do not see the need for this enhancement. In my eyes, the
  usecols kwarg does this and more. Perhaps I am misunderstanding
  something here.
 
  Agree about #1445, and the bit about 'usecols' - 'numcols' would just
  provide a
  shorter call to e.g. read the first 20 columns of a file (well, not
  even that much
  over 'usecols=range(20)'...), don't think that justifies an extra
  argument.
  But the 'datablocks' provides something new, that a number of people
  seem
  to miss from e.g. gnuplot (including me, actually ;-). And it would
  also satisfy the
  request from #1445 without breaking backwards compatibility.
  I've been wondering if one could instead specify the separator lines
  through the
  parameter, e.g. blocksep=['None', 'blank','invalid'], not sure if
  that would make
  it more useful...

 What about writing a separate function, e.g. loadblocktxt, and have it
 separate the chunks and call loadtxt for each chunk? Just a thought. Another
 possibility would be to write a function that would let you load a set of
 text files in a directory, and return a dict of datasets, one per file. One
 could write a similar save-function, too. They would just need to call
 loadtxt/savetxt on a per-file basis.

  1071:
   It is not clear to me whether loadtxt is supposed to support
  missing values in the fashion indicated in the ticket.
 
  In principle it should at least allow you to, by the use of converters
  as described there.
  The problem is, the default delimiter is described as 'any
  whitespace', which in the
  present implementation obviously includes any number of blanks or
  tabs. These
  are therefore treated differently from delimiters like ',' or ''. I'd
  reckon there are
  too many people actually relying on this behaviour to silently change it
  (e.g. I know plenty of tables with columns separated by either one or
  several
  tabs depending on the length of the previous entry). But the tab is
  apparently also
  treated differently if explicitly specified with delimiter='\t' -
  and in that case using
  a converter à la {2: lambda s: float(s or 'Nan')} is working for
  fields in the middle of
  the line, but not at the end - clearly warrants improvement. I've
  prepared a patch
  working for Python3 

Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-29 Thread Stéfan van der Walt
On Sun, Mar 27, 2011 at 12:09 PM, Paul Anton Letnes
paul.anton.let...@gmail.com wrote:
 I am sure someone has been using this functionality to convert floats to 
 ints. Changing will break their code. I am not sure how big a deal that would 
 be. Also, I am of the opinion that one should _first_ write a program that 
 works _correctly_, and only afterwards worry about performance.

While I'd agree in most cases, keep in mind that np.loadtxt is
supposed to be a fast but simpler alternative to np.genfromtxt.  If
np.loadtxt becomes much slower, there's not much need to keep these
separate any longer.

Regards
Stéfan


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-27 Thread Paul Anton Letnes

On 26. mars 2011, at 21.44, Derek Homeier wrote:

 Hi Paul,
 
 having had a look at the other tickets you dug up,
 
 My opinions are my own, and in detail, they are:
 1752:
   I attach a possible patch. FWIW, I agree with the request. The  
 patch is written to be compatible with the fix in ticket #1562, but  
 I did not test that yet.
 
 Tested, see also my comments on Trac.

Great!

 1731:
   This seems like a rather trivial feature enhancement. I attach a  
 possible patch.
 
 Agreed. Haven't tested it though.

Great!

 1616:
   The suggested patch seems reasonable to me, but I do not have a  
 full list of what objects loadtxt supports today as opposed to what  
 this patch will support.

Looks like you got this one. Just remember to make it compatible with #1752. 
Should be easy.

 1562:
   I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code, however.
 
 See comments on Trac as well.

Your patch is better, but there is one thing I disagree with.
808    if X.ndim < ndmin:
809        if ndmin == 1:
810            X.shape = (X.size, )
811        elif ndmin == 2:
812            X.shape = (X.size, 1)
The last line should be:
812            X.shape = (1, X.size)
If someone wants a 2D array out, they would most likely expect a one-row file 
to come out as a one-row array, not the other way around. IMHO.
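For reference, the reshaping behaviour under discussion can be sketched outside of loadtxt itself (a minimal illustration of the idea, with a hypothetical helper name, not the actual patch):

```python
import numpy as np

def apply_ndmin(X, ndmin=0):
    # Sketch of the proposed ndmin post-processing: pad the result up to
    # ndmin dimensions, keeping a one-row file as one row, i.e. shape
    # (1, N) rather than (N, 1).
    if ndmin not in (0, 1, 2):
        raise ValueError('Illegal value of ndmin keyword: %s' % ndmin)
    if X.ndim < ndmin:
        if ndmin == 1:
            X = X.reshape(X.size)
        elif ndmin == 2:
            X = X.reshape(1, X.size)
    return X

# A single parsed row stays a row when ndmin=2:
row = np.array([1.0, 2.0, 3.0])
apply_ndmin(row, ndmin=2).shape  # (1, 3)
```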

 1458:
   The fix suggested in the ticket seems reasonable, but I have  
 never used record arrays, so I am not sure  of this.
 
 There were some issues with Python3, and I also had some general  
 reservations
 as noted on Trac - basically, it makes 'unpack' equivalent to  
 transposing for 2D-arrays,
 but to splitting into fields for 1D-recarrays. My question was, what's  
 going to happen
 when you get to 2D-recarrays? Currently this is not an issue since  
 loadtxt can only
 read 2D regular or 1D structured arrays. But this might change if the  
 data block
 functionality (see below) were to be implemented - data could then be  
 returned as
 3D arrays or 2D structured arrays... Still, it would probably make  
 most sense (or at
 least give the widest functionality) to have 'unpack=True' always  
 return a list or iterator
 over columns.

OK, I don't know recarrays, as I said.

 1445:
   Adding this functionality could break old code, as some old  
 datafiles may have empty lines which are now simply ignored. I do  
 not think the feature is a good idea. It could rather be implemented  
 as a separate function.
 1107:
   I do not see the need for this enhancement. In my eyes, the  
 usecols kwarg does this and more. Perhaps I am misunderstanding  
 something here.
 
 Agree about #1445, and the bit about 'usecols' - 'numcols' would just  
 provide a
 shorter call to e.g. read the first 20 columns of a file (well, not  
 even that much
 over 'usecols=range(20)'...), don't think that justifies an extra  
 argument.
 But the 'datablocks' provides something new, that a number of people  
 seem
 to miss from e.g. gnuplot (including me, actually ;-). And it would  
 also satisfy the
 request from #1445 without breaking backwards compatibility.
 I've been wondering if one could instead specify the separator lines  
 through the
 parameter, e.g. blocksep=['None', 'blank','invalid'], not sure if  
 that would make
 it more useful...

What about writing a separate function, e.g. loadblocktxt, and have it separate 
the chunks and call loadtxt for each chunk? Just a thought. Another possibility 
would be to write a function that would let you load a set of text files in a 
directory, and return a dict of datasets, one per file. One could write a 
similar save-function, too. They would just need to call loadtxt/savetxt on a 
per-file basis.
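The separate-function idea can be sketched in a few lines (hypothetical name loadblocktxt; assumes blocks are separated by lines that strip down to a given separator, blank by default):

```python
import io
import numpy as np

def loadblocktxt(fname, blocksep='', **kwargs):
    # Hypothetical helper: split the input at separator lines (e.g. ''
    # for blank lines, or 'END') and call np.loadtxt once per chunk,
    # returning a list of arrays, one per block.
    fh = open(fname) if isinstance(fname, str) else fname
    try:
        blocks, chunk = [], []
        for line in fh:
            if line.strip() == blocksep:
                if chunk:
                    blocks.append(np.loadtxt(chunk, **kwargs))
                    chunk = []
            else:
                chunk.append(line)
        if chunk:
            blocks.append(np.loadtxt(chunk, **kwargs))
        return blocks
    finally:
        if isinstance(fname, str):
            fh.close()

# Two blank-line-separated blocks come back as two arrays:
blocks = loadblocktxt(io.StringIO('1 2\n3 4\n\n5 6\n'))
```

Passing blocksep='END' would cover the gnuplot-style files mentioned in #1445 without touching loadtxt itself.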

 1071:
  It is not clear to me whether loadtxt is supposed to support  
 missing values in the fashion indicated in the ticket.
 
 In principle it should at least allow you to, by the use of converters  
 as described there.
 The problem is, the default delimiter is described as 'any  
 whitespace', which in the
 present implementation obviously includes any number of blanks or  
 tabs. These
 are therefore treated differently from delimiters like ',' or ''. I'd  
 reckon there are
 too many people actually relying on this behaviour to silently change it
 (e.g. I know plenty of tables with columns separated by either one or  
 several
 tabs depending on the length of the previous entry). But the tab is  
 apparently also
 treated differently if explicitly specified with delimiter='\t' -  
 and in that case using
 a converter à la {2: lambda s: float(s or 'Nan')} is working for  
 fields in the middle of
 the line, but not at the end - clearly warrants improvement. I've  
 prepared a patch
 working for Python3 as well.

Great!

 1163:
 1565:
   These tickets seem to have the same origin of the problem. I  
 attach one 

Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Paul Anton Letnes
Hi!

I have had a look at the list of numpy.loadtxt tickets. I have never 
contributed to numpy before, so I may be doing stupid things - don't be afraid 
to let me know!

My opinions are my own, and in detail, they are:
1752:
I attach a possible patch. FWIW, I agree with the request. The patch is 
written to be compatible with the fix in ticket #1562, but I did not test that 
yet.
1731:
This seems like a rather trivial feature enhancement. I attach a possible 
patch.
1616:
The suggested patch seems reasonable to me, but I do not have a full list 
of what objects loadtxt supports today as opposed to what this patch will 
support.
1562:
I attach a possible patch. This could also be the default behavior to my 
mind, since the function caller can simply call numpy.squeeze if needed. 
Changing default behavior would probably break old code, however.
1458:
The fix suggested in the ticket seems reasonable, but I have never used 
record arrays, so I am not sure  of this.
1445:
Adding this functionality could break old code, as some old datafiles may 
have empty lines which are now simply ignored. I do not think the feature is a 
good idea. It could rather be implemented as a separate function.
1107:
I do not see the need for this enhancement. In my eyes, the usecols kwarg 
does this and more. Perhaps I am misunderstanding something here.
1071:
It is not clear to me whether loadtxt is supposed to support missing 
values in the fashion indicated in the ticket.
1163:
1565:
These tickets seem to have the same origin of the problem. I attach one 
possible patch. The previously suggested patches that I've seen will not 
correctly convert floats to ints, which I believe my patch will.

I hope you find this useful! Is there some way of submitting the patches for 
review in a more convenient fashion than e-mail?

Cheers,
Paul.



1562.patch
Description: Binary data


1163.patch
Description: Binary data


1731.patch
Description: Binary data


1752.patch
Description: Binary data



On 25. mars 2011, at 16.06, Charles R Harris wrote:

 Hi All,
 
 Could someone with an interest in loadtxt/savetxt look through the associated 
 tickets? A search on the tickets using either of those keys will return 
 fairly lengthy lists.
 
 Chuck



Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Pauli Virtanen
Hi,

Thanks!

On Sat, 26 Mar 2011 13:11:46 +0100, Paul Anton Letnes wrote:
[clip]
 I hope you find this useful! Is there some way of submitting the patches
 for review in a more convenient fashion than e-mail?

You can attach them on the trac to each ticket. That way they'll be easy 
to find later on.

Pauli



Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Derek Homeier
Hi,

On 26 Mar 2011, at 14:36, Pauli Virtanen wrote:

 On Sat, 26 Mar 2011 13:11:46 +0100, Paul Anton Letnes wrote:
 [clip]
 I hope you find this useful! Is there some way of submitting the  
 patches
 for review in a more convenient fashion than e-mail?

 You can attach them on the trac to each ticket. That way they'll be  
 easy
 to find later on.

I've got some comments on 1562, and I'd attach a revised patch then -  
just
a general question: should I then change Milestone to 1.6.0 and  
Version
to 'devel'?

 1562:
I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code,

Seems the fastest solution unless someone wants to change numpy.squeeze
as well. But the present patch does not call np.squeeze any more at  
all, so I
propose to restore that behaviour for X.ndim < ndmin to remain really  
backwards
compatible. It also seems easier to code when making the default  
ndmin=0.

Cheers,
Derek



Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Derek Homeier
Hi again,

On 26 Mar 2011, at 15:20, Derek Homeier wrote:

 1562:
   I attach a possible patch. This could also be the default
 behavior to my mind, since the function caller can simply call
 numpy.squeeze if needed. Changing default behavior would probably
 break old code,

 Seems the fastest solution unless someone wants to change  
 numpy.squeeze
 as well. But the present patch does not call np.squeeze any more at
 all, so I
 propose to restore that behaviour for X.ndim < ndmin to remain really
 backwards
 compatible. It also seems easier to code when making the default
 ndmin=0.

I've got another somewhat general question: since it would probably be  
nice to
have a test for this, I found one could simply add something along the  
lines of

assert_equal(a.shape, x.shape)

to test_io.py - test_shaped_dtype(self)
or should one generally create a new test for such things (might still  
be better
in this case, since test_shaped_dtype does not really test different  
ndim)?

Cheers,
Derek




Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Paul Anton Letnes
Hi Derek!

On 26. mars 2011, at 15.48, Derek Homeier wrote:

 Hi again,
 
 On 26 Mar 2011, at 15:20, Derek Homeier wrote:
 
 1562:
  I attach a possible patch. This could also be the default
 behavior to my mind, since the function caller can simply call
 numpy.squeeze if needed. Changing default behavior would probably
 break old code,
 
 Seems the fastest solution unless someone wants to change  
 numpy.squeeze
 as well. But the present patch does not call np.squeeze any more at
 all, so I
 propose to restore that behaviour for X.ndim < ndmin to remain really
 backwards
 compatible. It also seems easier to code when making the default
 ndmin=0.
 
 I've got another somewhat general question: since it would probably be  
 nice to
 have a test for this, I found one could simply add something along the  
 lines of
 
 assert_equal(a.shape, x.shape)
 
 to test_io.py - test_shaped_dtype(self)
 or should one generally create a new test for such things (might still  
 be better
 in this case, since test_shaped_dtype does not really test different  
 ndim)?
 
 Cheers,
   Derek

It would be nice to see your patch. I uploaded all of mine as mentioned. I'm no 
testing expert, but I am sure someone else will comment on it.

Paul.


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Charles R Harris
On Sat, Mar 26, 2011 at 8:53 AM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:

 Hi Derek!

 On 26. mars 2011, at 15.48, Derek Homeier wrote:

  Hi again,
 
  On 26 Mar 2011, at 15:20, Derek Homeier wrote:
 
  1562:
   I attach a possible patch. This could also be the default
  behavior to my mind, since the function caller can simply call
  numpy.squeeze if needed. Changing default behavior would probably
  break old code,
 
  Seems the fastest solution unless someone wants to change
  numpy.squeeze
  as well. But the present patch does not call np.squeeze any more at
  all, so I
  propose to restore that behaviour for X.ndim < ndmin to remain really
  backwards
  compatible. It also seems easier to code when making the default
  ndmin=0.
 
  I've got another somewhat general question: since it would probably be
  nice to
  have a test for this, I found one could simply add something along the
  lines of
 
  assert_equal(a.shape, x.shape)
 
  to test_io.py - test_shaped_dtype(self)
  or should one generally create a new test for such things (might still
  be better
  in this case, since test_shaped_dtype does not really test different
  ndim)?
 
  Cheers,
Derek

 It would be nice to see your patch. I uploaded all of mine as mentioned.
 I'm no testing expert, but I am sure someone else will comment on it.


I put all these patches together at
https://github.com/charris/numpy/tree/loadtxt-savetxt. Please pull from
there to continue work on loadtxt/savetxt so as to avoid conflicts in the
patches. One of the numpy tests is failing, I assume from patch conflicts,
and more tests for the tickets are needed in any case. Also, new keywords
should be added to the end, not put in the middle of existing keywords.

I haven't reviewed the patches, just tried to get them organized. Also, I
have Derek as the author on all of them, that can be changed if it is
decided the credit should go elsewhere ;) Thanks for the work you all have
been doing on these tickets.

Chuck


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-26 Thread Derek Homeier
Hi Paul,

having had a look at the other tickets you dug up,

 My opinions are my own, and in detail, they are:
 1752:
I attach a possible patch. FWIW, I agree with the request. The  
 patch is written to be compatible with the fix in ticket #1562, but  
 I did not test that yet.

Tested, see also my comments on Trac.

 1731:
This seems like a rather trivial feature enhancement. I attach a  
 possible patch.

Agreed. Haven't tested it though.

 1616:
The suggested patch seems reasonable to me, but I do not have a  
 full list of what objects loadtxt supports today as opposed to what  
 this patch will support.

 1562:
I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code, however.

See comments on Trac as well.

 1458:
The fix suggested in the ticket seems reasonable, but I have  
 never used record arrays, so I am not sure  of this.

There were some issues with Python3, and I also had some general  
reservations
as noted on Trac - basically, it makes 'unpack' equivalent to  
transposing for 2D-arrays,
but to splitting into fields for 1D-recarrays. My question was, what's  
going to happen
when you get to 2D-recarrays? Currently this is not an issue since  
loadtxt can only
read 2D regular or 1D structured arrays. But this might change if the  
data block
functionality (see below) were to be implemented - data could then be  
returned as
3D arrays or 2D structured arrays... Still, it would probably make  
most sense (or at
least give the widest functionality) to have 'unpack=True' always  
return a list or iterator
over columns.

 1445:
Adding this functionality could break old code, as some old  
 datafiles may have empty lines which are now simply ignored. I do  
 not think the feature is a good idea. It could rather be implemented  
 as a separate function.
 1107:
I do not see the need for this enhancement. In my eyes, the  
 usecols kwarg does this and more. Perhaps I am misunderstanding  
 something here.

Agree about #1445, and the bit about 'usecols' - 'numcols' would just  
provide a
shorter call to e.g. read the first 20 columns of a file (well, not  
even that much
over 'usecols=range(20)'...), don't think that justifies an extra  
argument.
But the 'datablocks' provides something new, that a number of people  
seem
to miss from e.g. gnuplot (including me, actually ;-). And it would  
also satisfy the
request from #1445 without breaking backwards compatibility.
I've been wondering if one could instead specify the separator lines  
through the
parameter, e.g. blocksep=['None', 'blank','invalid'], not sure if  
that would make
it more useful...

 1071:
   It is not clear to me whether loadtxt is supposed to support  
 missing values in the fashion indicated in the ticket.

In principle it should at least allow you to, by the use of converters  
as described there.
The problem is, the default delimiter is described as 'any  
whitespace', which in the
present implementation obviously includes any number of blanks or  
tabs. These
are therefore treated differently from delimiters like ',' or ''. I'd  
reckon there are
too many people actually relying on this behaviour to silently change it
(e.g. I know plenty of tables with columns separated by either one or  
several
tabs depending on the length of the previous entry). But the tab is  
apparently also
treated differently if explicitly specified with delimiter='\t' -  
and in that case using
a converter à la {2: lambda s: float(s or 'Nan')} is working for  
fields in the middle of
the line, but not at the end - clearly warrants improvement. I've  
prepared a patch
working for Python3 as well.
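The converter workaround described above can be spelled out like this (a sketch; the fill_nan helper name is mine, and it decodes bytes defensively because some numpy versions hand converters bytes rather than str):

```python
import io
import numpy as np

def fill_nan(s):
    # Map an empty field to NaN; decode first in case the converter
    # receives bytes instead of str (version-dependent behaviour).
    if isinstance(s, bytes):
        s = s.decode('latin1')
    return float(s or 'NaN')

# Comma-delimited data with an empty field in the second column:
data = io.StringIO('1,2.0,3\n4,,6\n')
arr = np.loadtxt(data, delimiter=',', converters={1: fill_nan})
```

With the default any-whitespace delimiter this cannot work, since consecutive blanks or tabs collapse and the empty field never reaches the converter, which is exactly the limitation discussed above.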

 1163:
 1565:
These tickets seem to have the same origin of the problem. I  
 attach one possible patch. The previously suggested patches that  
 I've seen will not correctly convert floats to ints, which I believe  
 my patch will.

+1, though I am a bit concerned that triggering a ValueError for every
element could impede performance. I'd probably still enclose it in an
if issubclass(typ, np.uint64) or issubclass(typ, np.int64):
just like in npio.patch. I also thought one might switch to  
int(float128(x)) in that
case, but at least for the given examples float128 cannot convert with  
more
accuracy than float64 (even on PowerPC ;-).
There were some dissenting opinions that trying to read a float into  
an int should
generally throw an exception though...
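The float-through fallback could look roughly like this as a converter (a sketch of the idea behind tickets 1163/1565, not the attached patch; to_int is a hypothetical name):

```python
import io
import numpy as np

def to_int(s):
    # Parse an integer column that may be written as '3' or '3.0'.
    # The detour through float() is where precision for very large
    # int64/uint64 values can be lost - the concern raised above.
    if isinstance(s, bytes):          # some numpy versions pass bytes
        s = s.decode('latin1')
    try:
        return int(s)
    except ValueError:
        return int(float(s))

# Reading a float-formatted column into an integer array:
data = np.loadtxt(io.StringIO('1.0 2\n3.0 4\n'),
                  converters={0: to_int}, dtype=int)
```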

And Chuck just beat me...

On 26 Mar 2011, at 21:25, Charles R Harris wrote:

 I put all these patches together at 
 https://github.com/charris/numpy/tree/loadtxt-savetxt 
 . Please pull from there to continue work on loadtxt/savetxt so as  
 to avoid conflicts in the patches. One of the numpy tests is  
 failing, I assume from patch conflicts, and more tests for the  
 tickets are needed in any case. Also, 

[Numpy-discussion] loadtxt/savetxt tickets

2011-03-25 Thread Charles R Harris
Hi All,

Could someone with an interest in loadtxt/savetxt look through the
associated tickets? A search on the tickets using either of those keys will
return fairly lengthy lists.

Chuck


Re: [Numpy-discussion] loadtxt stop

2010-09-19 Thread Zachary Pincus
 Though, really, it's annoying that numpy.loadtxt needs both the
 readline function *and* the iterator protocol. If it just used
 iterators, you could do:

 def truncator(fh, delimiter='END'):
     for line in fh:
         if line.strip() == delimiter:
             break
         yield line

 numpy.loadtxt(truncator(c))

 Maybe I'll try to work up a patch for this.
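A self-contained version of that generator for reference (repeating the definition so it runs on its own; modern loadtxt accepts any iterable of str lines):

```python
import io
import numpy as np

def truncator(fh, delimiter='END'):
    # Yield lines until the sentinel line is reached, then stop.
    for line in fh:
        if line.strip() == delimiter:
            break
        yield line

# Everything after 'END' is never seen by loadtxt:
c = io.StringIO('1 2.0 3.0\n2 4.5 5.7\nEND\nmore headers\n')
arr = np.loadtxt(truncator(c))
```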


http://projects.scipy.org/numpy/ticket/1616

Zach



 That seemed easy... worth applying? Won't break compatibility, because
 the previous loadtxt required both fname.readline and fname.__iter__,
 while this requires only the latter.


 Index: numpy/lib/npyio.py
 ===================================================================
 --- numpy/lib/npyio.py    (revision 8716)
 +++ numpy/lib/npyio.py    (working copy)
 @@ -597,10 +597,11 @@
              fh = bz2.BZ2File(fname)
          else:
              fh = open(fname, 'U')
 -    elif hasattr(fname, 'readline'):
 -        fh = fname
      else:
 -        raise ValueError('fname must be a string or file handle')
 +        try:
 +            fh = iter(fname)
 +        except:
 +            raise ValueError('fname must be a string or file handle')
      X = []
 
      def flatten_dtype(dt):
 @@ -633,14 +634,18 @@
 
      # Skip the first `skiprows` lines
      for i in xrange(skiprows):
 -        fh.readline()
 +        try:
 +            fh.next()
 +        except StopIteration:
 +            raise IOError('End-of-file reached before encountering data.')
 
      # Read until we find a line with some values, and use
      # it to estimate the number of columns, N.
      first_vals = None
      while not first_vals:
 -        first_line = fh.readline()
 -        if not first_line: # EOF reached
 +        try:
 +            first_line = fh.next()
 +        except StopIteration:
              raise IOError('End-of-file reached before encountering data.')
          first_vals = split_line(first_line)
      N = len(usecols or first_vals)




[Numpy-discussion] loadtxt stop

2010-09-17 Thread Neil Hodgson
Hi,

I've been looking around and couldn't spot anything on this.  Quite often I want to 
read a homogeneous block of data from within a file.  The skiprows option is 
great for missing out the section before the data starts, but if there is 
anything below then loadtxt will choke.  I wondered if there was a possibility 
to put an endmarker= ?

For example, if I want to load text from a large! file that looks like this

header line 
header line
1 2.0 3.0
2 4.5 5.7
...
500 4.3 5.4
END
more headers
more headers
1 2.0 3.0 3.14 1.1414
2 4.5 5.7 1.14 3.1459
...
500 4.3 5.4 0.000 0.001
END

Then I can use skiprows=2, but loadtxt will choke when it gets to 'END'.  To 
read t





Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Neil Hodgson
oops, I meant to save my post but I sent it instead - doh! 

In the end, the question was: is it worth adding start= and stop= markers into 
loadtxt to allow grabbing sections of a file between two known headers?  I 
imagine it's something that people come up against regularly.

Thanks,
Neil  





From: Neil Hodgson hodgson.n...@yahoo.co.uk
To: numpy-discussion@scipy.org
Sent: Fri, 17 September, 2010 14:17:12
Subject: loadtxt stop


Hi,

I've been looking around and couldn't spot anything on this.  Quite often I want to 
read a homogeneous block of data from within a file.  The skiprows option is 
great for missing out the section before the data starts, but if there is 
anything below then loadtxt will choke.  I wondered if there was a possibility 
to put an endmarker= ?

For example, if I want to load text from a large! file that looks like this

header line 
header line
1 2.0 3.0
2 4.5 5.7
...
500 4.3 5.4
END
more headers
more headers
1 2.0 3.0 3.14 1.1414
2 4.5 5.7 1.14 3.1459
...
500 4.3 5.4 0.000 0.001
END

Then I can use skiprows=2, but loadtxt will choke when it gets to 'END'.  To 
read t




Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Pierre GM

On Sep 17, 2010, at 2:40 PM, Neil Hodgson wrote:

 oops, I meant to save my post but I sent it instead - doh! 
 
 In the end, the question was: is it worth adding start= and stop= markers into 
 loadtxt to allow grabbing sections of a file between two known headers?  I 
 imagine it's something that people come up against regularly.

genfromtxt comes with skip_header and skip_footer that do what you want. 
Earlier this week, I corrected a bug w/ skip_footer on the SVN (now git) 
version of the sources. Please check it out.
Try to be as specific as possible with your input options, that'll make 
genfromtxt more efficient.
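For the file layout from the original post, that looks roughly like the following (a sketch grabbing the first block; assumes two header lines and counts everything from 'END' on as footer):

```python
import io
import numpy as np

text = '\n'.join([
    'header line',
    'header line',
    '1 2.0 3.0',
    '2 4.5 5.7',
    'END',
    'more headers',
]) + '\n'

# skip_header drops the two leading lines; skip_footer drops the
# trailing 'END' and 'more headers', leaving just the first data block.
block = np.genfromtxt(io.StringIO(text), skip_header=2, skip_footer=2)
```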


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Christopher Barker
Neil Hodgson wrote:
 In the end, the question was: is it worth adding start= and stop= markers 
 into loadtxt to allow grabbing sections of a file between two known 
 headers?  I imagine it's something that people come up against regularly.

maybe not so regular. However, a common use would be to be able to load 
only n rows, which also does not appear to be supported. That would be nice.

-Chris



 Thanks,
 Neil 
 
 
 *From:* Neil Hodgson hodgson.n...@yahoo.co.uk
 *To:* numpy-discussion@scipy.org
 *Sent:* Fri, 17 September, 2010 14:17:12
 *Subject:* loadtxt stop
 
 Hi,
 
 I've been looking around and couldn't spot anything on this.  Quite often I 
 want to read a homogeneous block of data from within a file.  The 
 skiprows option is great for missing out the section before the data 
 starts, but if there is anything below then loadtxt will choke.  I 
 wondered if there was a possibility to put an endmarker= ?
 
 For example, if I want to load text from a large! file that looks like this
 
 header line
 header line
 1 2.0 3.0
 2 4.5 5.7
 ...
 500 4.3 5.4
 END
 more headers
 more headers
 1 2.0 3.0 3.14 1.1414
 2 4.5 5.7 1.14 3.1459
 ...
 500 4.3 5.4 0.000 0.001
 END
 
 Then I can use skiprows=2, but loadtxt will choke when it gets to 
 'END'.  To read t
 
 
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Zachary Pincus
 In the end, the question was: is it worth adding start= and stop=  
 markers
 into loadtxt to allow grabbing sections of a file between two known
 headers?  I imagine it's something that people come up against  
 regularly.


Simple enough to wrap your file in a new file-like object that stops  
coughing up lines when the delimiter is found, no?

class TruncatingFile(object):
    def __init__(self, fh, delimiter='END'):
        self.fh = fh
        self.delimiter = delimiter
        self.done = False
    def readline(self):
        if self.done: return ''
        line = self.fh.readline()
        if line.strip() == self.delimiter:
            self.done = True
            return ''
        return line
    def __iter__(self):
        return self
    def next(self):
        line = self.fh.next()
        if line.strip() == self.delimiter:
            self.done = True
            raise StopIteration()
        return line

from StringIO import StringIO
c = StringIO("0 1\n2 3\nEND")
numpy.loadtxt(TruncatingFile(c))

Though, really, it's annoying that numpy.loadtxt needs both the  
readline function *and* the iterator protocol. If it just used  
iterators, you could do:

def truncator(fh, delimiter='END'):
    for line in fh:
        if line.strip() == delimiter:
            break
        yield line

numpy.loadtxt(truncator(c))

Maybe I'll try to work up a patch for this.
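On current NumPy no patch is even needed: loadtxt accepts any iterable of lines (including a generator), so the wrapper runs as-is; for example (modern io.StringIO standing in for the Python 2 version):

```python
import io
import numpy as np

def truncator(fh, delimiter='END'):
    # Yield lines until the sentinel line is seen, then stop.
    for line in fh:
        if line.strip() == delimiter:
            break
        yield line

c = io.StringIO("0 1\n2 3\nEND\n9 9\n")
a = np.loadtxt(truncator(c))  # only the rows before END are parsed
```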

Zach



On Sep 17, 2010, at 2:51 PM, Christopher Barker wrote:

 Neil Hodgson wrote:
 In the end, the question was: is it worth adding start= and stop=  
 markers
 into loadtxt to allow grabbing sections of a file between two known
 headers?  I imagine it's something that people come up against  
 regularly.

 maybe not so regular. However, a common use would be to be able to load
 only n rows, which also does not appear to be supported. That would  
 be nice.

 -Chris



 Thanks,
 Neil

 
 *From:* Neil Hodgson hodgson.n...@yahoo.co.uk
 *To:* numpy-discussion@scipy.org
 *Sent:* Fri, 17 September, 2010 14:17:12
 *Subject:* loadtxt stop

 Hi,

 I've been looking around and couldn't spot anything on this.  Quite often I
 want to read a homogeneous block of data from within a file.  The
 skiprows option is great for missing out the section before the data
 starts, but if there is anything below then loadtxt will choke.  I
 wondered if there was a possibility to put an endmarker= ?

 For example, if I want to load text from a large! file that looks  
 like this

 header line
 header line
 1 2.0 3.0
 2 4.5 5.7
 ...
 500 4.3 5.4
 END
 more headers
 more headers
 1 2.0 3.0 3.14 1.1414
 2 4.5 5.7 1.14 3.1459
 ...
 500 4.3 5.4 0.000 0.001
 END

 Then I can use skiprows=2, but loadtxt will choke when it gets to
 'END'.  To read t



 

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 -- 
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Zachary Pincus
 Though, really, it's annoying that numpy.loadtxt needs both the
 readline function *and* the iterator protocol. If it just used
 iterators, you could do:

 def truncator(fh, delimiter='END'):
     for line in fh:
         if line.strip() == delimiter:
             break
         yield line

 numpy.loadtxt(truncator(c))

 Maybe I'll try to work up a patch for this.


That seemed easy... worth applying? Won't break compatibility, because  
the previous loadtxt required both fname.readline and fname.__iter__,  
while this requires only the latter.


Index: numpy/lib/npyio.py
===
--- numpy/lib/npyio.py  (revision 8716)
+++ numpy/lib/npyio.py  (working copy)
@@ -597,10 +597,11 @@
  fh = bz2.BZ2File(fname)
  else:
  fh = open(fname, 'U')
-elif hasattr(fname, 'readline'):
-fh = fname
  else:
-raise ValueError('fname must be a string or file handle')
+  try:
+  fh = iter(fname)
+  except:
+  raise ValueError('fname must be a string or file handle')
  X = []

  def flatten_dtype(dt):
@@ -633,14 +634,18 @@

  # Skip the first `skiprows` lines
  for i in xrange(skiprows):
-fh.readline()
+try:
+fh.next()
+except StopIteration:
+raise IOError('End-of-file reached before  
encountering data.')

  # Read until we find a line with some values, and use
  # it to estimate the number of columns, N.
  first_vals = None
  while not first_vals:
-first_line = fh.readline()
-if not first_line: # EOF reached
+try:
+first_line = fh.next()
+except StopIteration:
  raise IOError('End-of-file reached before  
encountering data.')
  first_vals = split_line(first_line)
  N = len(usecols or first_vals)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Benjamin Root
On Fri, Sep 17, 2010 at 2:50 PM, Zachary Pincus zachary.pin...@yale.eduwrote:

  Though, really, it's annoying that numpy.loadtxt needs both the
  readline function *and* the iterator protocol. If it just used
  iterators, you could do:
 
  def truncator(fh, delimiter='END'):
      for line in fh:
          if line.strip() == delimiter:
              break
          yield line
 
  numpy.loadtxt(truncator(c))
 
  Maybe I'll try to work up a patch for this.


 That seemed easy... worth applying? Won't break compatibility, because
 the previous loadtxt required both fname.readline and fname.__iter__,
 while this requires only the latter.


 Index: numpy/lib/npyio.py
 ===
 --- numpy/lib/npyio.py  (revision 8716)
 +++ numpy/lib/npyio.py  (working copy)
 @@ -597,10 +597,11 @@
  fh = bz2.BZ2File(fname)
  else:
  fh = open(fname, 'U')
 -elif hasattr(fname, 'readline'):
 -fh = fname
  else:
 -raise ValueError('fname must be a string or file handle')
 +  try:
 +  fh = iter(fname)
 +  except:
 +  raise ValueError('fname must be a string or file handle')
  X = []

  def flatten_dtype(dt):
 @@ -633,14 +634,18 @@

  # Skip the first `skiprows` lines
  for i in xrange(skiprows):
 -fh.readline()
 +try:
 +fh.next()
 +except StopIteration:
 +raise IOError('End-of-file reached before
 encountering data.')

  # Read until we find a line with some values, and use
  # it to estimate the number of columns, N.
  first_vals = None
  while not first_vals:
 -first_line = fh.readline()
 -if not first_line: # EOF reached
 +try:
 +first_line = fh.next()
 +except StopIteration:
  raise IOError('End-of-file reached before
 encountering data.')
  first_vals = split_line(first_line)
  N = len(usecols or first_vals)


So, this code will still raise an error for an empty file.  Personally, I
consider that a bug because I would expect to receive an empty array.  I
could understand raising an error for a non-empty file that does not contain
anything useful.  For comparison, Matlab returns an empty matrix for loading
an empty text file.

This has been a long-standing annoyance for me, along with the behavior with
a single-line data file.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Zachary Pincus
On Sep 17, 2010, at 3:59 PM, Benjamin Root wrote:

 So, this code will still raise an error for an empty file.   
 Personally, I consider that a bug because I would expect to receive  
 an empty array.  I could understand raising an error for a non-empty  
 file that does not contain anything useful.  For comparison, Matlab  
 returns an empty matrix for loading an empty text file.

 This has been a long-standing annoyance for me, along with the  
 behavior with a single-line data file.

Agreed... I just wanted to make the patch as identical in behavior to  
the old version as possible. Though again, simple shims around loadtxt  
(as in my previous examples) can yield the desired behavior easily  
enough.
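One such shim, sketched against current NumPy (the wrapper name and the ncols argument are invented for illustration; the caller has to supply the column count so an empty result can still be shaped):

```python
import io
import numpy as np

def loadtxt_2d(fname, ncols, **kw):
    # Illustrative wrapper: always return a 2-D (nrows, ncols) array,
    # covering the empty-file and single-line cases discussed here.
    try:
        out = np.loadtxt(fname, **kw)
    except (IOError, ValueError):          # older NumPy raised on empty input
        return np.zeros((0, ncols))
    return out.reshape(-1, ncols) if out.size else np.zeros((0, ncols))

empty = loadtxt_2d(io.StringIO(""), 3)        # -> shape (0, 3)
one = loadtxt_2d(io.StringIO("1 2 3\n"), 3)   # -> shape (1, 3)
```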

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt stop

2010-09-17 Thread Benjamin Root
On Fri, Sep 17, 2010 at 3:04 PM, Zachary Pincus zachary.pin...@yale.eduwrote:

 On Sep 17, 2010, at 3:59 PM, Benjamin Root wrote:

  So, this code will still raise an error for an empty file.
  Personally, I consider that a bug because I would expect to receive
  an empty array.  I could understand raising an error for a non-empty
  file that does not contain anything useful.  For comparison, Matlab
  returns an empty matrix for loading an empty text file.
 
  This has been a long-standing annoyance for me, along with the
  behavior with a single-line data file.

 Agreed... I just wanted to make the patch as identical in behavior to
 the old version as possible. Though again, simple shims around loadtxt
 (as in my previous examples) can yield the desired behavior easily
 enough.


Fair enough.  No need to mix a bugfix with a feature request.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-07-27 Thread Benjamin Root
On Thu, Jun 24, 2010 at 1:53 PM, Benjamin Root ben.r...@ou.edu wrote:

 On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser 
 warren.weckes...@enthought.com wrote:

 Benjamin Root wrote:
  Hi,
 
  I was having the hardest time trying to figure out an intermittent bug
  in one of my programs.  Essentially, in some situations, it was
  throwing an error saying that the array object was not an array.  It
  took me a while, but then I figured out that my program was assuming
  that the object returned from a loadtxt() call was always a structured
  array (I was using dtypes).  However, if the data file being loaded
  only had one data record, then all you get back is a structured record.
 
  import numpy as np
  from StringIO import StringIO
 
  strData = StringIO("89.23 47.2\n13.2 42.2")
  a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
  print "Length Two"
  print a
  print a.shape
  print len(a)
 
  strData = StringIO("53.2 49.2")
  a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
  print "\n\nLength One"
  print a
  print a.shape
  try:
      print len(a)
  except TypeError as err:
      print "ERROR:", err
 
  Which gets me this output:
 
  Length Two
  [(89.234, 47.203)
   (13.199, 42.203)]
  (2,)
  2
 
 
  Length One
  (53.203, 49.203)
  ()
  ERROR: len() of unsized object
 
 
  Note that this isn't restricted to structured arrays.  For regular
  ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

 Exactly.  The last four lines of the function are:

X = np.squeeze(X)
if unpack:
return X.T
else:
return X

 
  >>> a = np.ones((1, 1, 1))
  >>> np.squeeze(a)[0]
  IndexError: 0-d arrays can't be indexed
 
  >>> strData = StringIO("53.2")
  >>> a = np.loadtxt(strData)
  >>> a[0]
  IndexError: 0-d arrays can't be indexed
 
  So, if you have multiple lines with multiple columns, you get a 2-D
  array, as expected.
  if you have a single line of data with multiple columns, you get a 1-D
  array.
  If you have a single column with many lines, you also get a 1-D array
  (which is probably expected, I guess).
  If you have a single column with a single line, you get a scalar
  (actually, a 0-D array).
 
  Is this a bug or a feature?  I can see the advantages of having
  loadtxt() returning the lowest # of dimensions that can hold the given
  data, but it leaves the code vulnerable to certain edge cases.  Maybe
  there is a different way I should be doing this, but I feel that this
  behavior at the very least should be included in the loadtxt
  documentation.
 

 It would be useful to be able to tell loadtxt to not call squeeze, so a
 program that reads column-formatted data doesn't have to treat the case
 of a single line specially.

 Warren


 I don't know if that is the best way to solve the problem.  In that case,
 you would always get a 2-D array, right?  Is that useful for those who have
 text data as a single column?  Maybe a mindim keyword (with None as default)
 and apply an appropriate atleast_Nd() call (or maybe have available an
 .atleast_nd() function?).  But, then what would this mean for structured
 arrays?  One might think that they want at least 2-D, but they really want
 at least 1-D.

 Ben Root

 P.S. - Taking this a step further, the functions completely fail in dealing
 with empty files...  In MATLAB, it returns an empty array (matrix?).


I am reviving this dead thread to note that I have filed ticket #1562 on
the numpy Trac about this issue: http://projects.scipy.org/numpy/ticket/1562

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Benjamin Root
Hi,

I was having the hardest time trying to figure out an intermittent bug in
one of my programs.  Essentially, in some situations, it was throwing an
error saying that the array object was not an array.  It took me a while,
but then I figured out that my program was assuming that the object returned
from a loadtxt() call was always a structured array (I was using dtypes).
However, if the data file being loaded only had one data record, then all
you get back is a structured record.

import numpy as np
from StringIO import StringIO

strData = StringIO("89.23 47.2\n13.2 42.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "Length Two"
print a
print a.shape
print len(a)

strData = StringIO("53.2 49.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "\n\nLength One"
print a
print a.shape
try:
    print len(a)
except TypeError as err:
    print "ERROR:", err

Which gets me this output:

Length Two
[(89.234, 47.203)
 (13.199, 42.203)]
(2,)
2


Length One
(53.203, 49.203)
()
ERROR: len() of unsized object


Note that this isn't restricted to structured arrays.  For regular ndarrays,
loadtxt() appears to mimic the behavior of np.squeeze():

>>> a = np.ones((1, 1, 1))
>>> np.squeeze(a)[0]
IndexError: 0-d arrays can't be indexed

>>> strData = StringIO("53.2")
>>> a = np.loadtxt(strData)
>>> a[0]
IndexError: 0-d arrays can't be indexed

So, if you have multiple lines with multiple columns, you get a 2-D array,
as expected.
if you have a single line of data with multiple columns, you get a 1-D
array.
If you have a single column with many lines, you also get a 1-D array (which
is probably expected, I guess).
If you have a single column with a single line, you get a scalar (actually,
a 0-D array).

Is this a bug or a feature?  I can see the advantages of having loadtxt()
returning the lowest # of dimensions that can hold the given data, but it
leaves the code vulnerable to certain edge cases.  Maybe there is a
different way I should be doing this, but I feel that this behavior at the
very least should be included in the loadtxt documentation.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Warren Weckesser
Benjamin Root wrote:
 Hi,

 I was having the hardest time trying to figure out an intermittent bug 
 in one of my programs.  Essentially, in some situations, it was 
 throwing an error saying that the array object was not an array.  It 
 took me a while, but then I figured out that my program was assuming 
 that the object returned from a loadtxt() call was always a structured 
 array (I was using dtypes).  However, if the data file being loaded 
 only had one data record, then all you get back is a structured record.

 import numpy as np
 from StringIO import StringIO

 strData = StringIO("89.23 47.2\n13.2 42.2")
 a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
 print "Length Two"
 print a
 print a.shape
 print len(a)

 strData = StringIO("53.2 49.2")
 a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
 print "\n\nLength One"
 print a
 print a.shape
 try:
     print len(a)
 except TypeError as err:
     print "ERROR:", err

 Which gets me this output:

 Length Two
 [(89.234, 47.203)
  (13.199, 42.203)]
 (2,)
 2


 Length One
 (53.203, 49.203)
 ()
 ERROR: len() of unsized object


 Note that this isn't restricted to structured arrays.  For regular 
 ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

Exactly.  The last four lines of the function are:

X = np.squeeze(X)
if unpack:
return X.T
else:
return X


 >>> a = np.ones((1, 1, 1))
 >>> np.squeeze(a)[0]
 IndexError: 0-d arrays can't be indexed

 >>> strData = StringIO("53.2")
 >>> a = np.loadtxt(strData)
 >>> a[0]
 IndexError: 0-d arrays can't be indexed

 So, if you have multiple lines with multiple columns, you get a 2-D 
 array, as expected.
 if you have a single line of data with multiple columns, you get a 1-D 
 array.
 If you have a single column with many lines, you also get a 1-D array 
 (which is probably expected, I guess).
 If you have a single column with a single line, you get a scalar 
 (actually, a 0-D array).

 Is this a bug or a feature?  I can see the advantages of having 
 loadtxt() returning the lowest # of dimensions that can hold the given 
 data, but it leaves the code vulnerable to certain edge cases.  Maybe 
 there is a different way I should be doing this, but I feel that this 
 behavior at the very least should be included in the loadtxt 
 documentation.


It would be useful to be able to tell loadtxt to not call squeeze, so a 
program that reads column-formatted data doesn't have to treat the case 
of a single line specially.

Warren


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Christopher Barker
Warren Weckesser wrote:
 Benjamin Root wrote:
 Note that this isn't restricted to structured arrays.  For regular 
 ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
 
 Exactly.  The last four lines of the function are:
 
 X = np.squeeze(X)
 if unpack:
 return X.T
 else:
 return X

 It would be useful to be able to tell loadtxt to not call squeeze, so a 
 program that reads column-formatted data doesn't have to treat the case 
 of a single line specially.

I agree -- it seem to me that every time I load data, I know what shape 
I expect the result to be -- I'd never want it to squeeze. It might be 
nice if you could specify the dimensionality of the array you want.


But for now: can you just do a reshape?

In [42]: strData = StringIO("53.2 49.2")

In [43]: a = np.loadtxt(strData, dtype=[('x', float), ('y', float)]).reshape((-1,))

In [45]: a.shape
Out[45]: (1,)
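For the record, this wish was eventually granted: loadtxt grew an ndmin keyword (in NumPy 1.6) that suppresses the squeeze, so the reshape dance becomes unnecessary; a quick sketch:

```python
import io
import numpy as np

# ndmin guarantees a minimum dimensionality instead of squeezing.
row = np.loadtxt(io.StringIO("53.2 49.2"), ndmin=2)   # shape (1, 2), not (2,)
val = np.loadtxt(io.StringIO("53.2"), ndmin=1)        # shape (1,), not 0-d
```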



-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Benjamin Root
On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser 
warren.weckes...@enthought.com wrote:

 Benjamin Root wrote:
  Hi,
 
  I was having the hardest time trying to figure out an intermittent bug
  in one of my programs.  Essentially, in some situations, it was
  throwing an error saying that the array object was not an array.  It
  took me a while, but then I figured out that my program was assuming
  that the object returned from a loadtxt() call was always a structured
  array (I was using dtypes).  However, if the data file being loaded
  only had one data record, then all you get back is a structured record.
 
  import numpy as np
  from StringIO import StringIO
 
  strData = StringIO("89.23 47.2\n13.2 42.2")
  a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
  print "Length Two"
  print a
  print a.shape
  print len(a)
 
  strData = StringIO("53.2 49.2")
  a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
  print "\n\nLength One"
  print a
  print a.shape
  try:
      print len(a)
  except TypeError as err:
      print "ERROR:", err
 
  Which gets me this output:
 
  Length Two
  [(89.234, 47.203)
   (13.199, 42.203)]
  (2,)
  2
 
 
  Length One
  (53.203, 49.203)
  ()
  ERROR: len() of unsized object
 
 
  Note that this isn't restricted to structured arrays.  For regular
  ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

 Exactly.  The last four lines of the function are:

X = np.squeeze(X)
if unpack:
return X.T
else:
return X

 
  >>> a = np.ones((1, 1, 1))
  >>> np.squeeze(a)[0]
  IndexError: 0-d arrays can't be indexed
 
  >>> strData = StringIO("53.2")
  >>> a = np.loadtxt(strData)
  >>> a[0]
  IndexError: 0-d arrays can't be indexed
 
  So, if you have multiple lines with multiple columns, you get a 2-D
  array, as expected.
  if you have a single line of data with multiple columns, you get a 1-D
  array.
  If you have a single column with many lines, you also get a 1-D array
  (which is probably expected, I guess).
  If you have a single column with a single line, you get a scalar
  (actually, a 0-D array).
 
  Is this a bug or a feature?  I can see the advantages of having
  loadtxt() returning the lowest # of dimensions that can hold the given
  data, but it leaves the code vulnerable to certain edge cases.  Maybe
  there is a different way I should be doing this, but I feel that this
  behavior at the very least should be included in the loadtxt
  documentation.
 

 It would be useful to be able to tell loadtxt to not call squeeze, so a
 program that reads column-formatted data doesn't have to treat the case
 of a single line specially.

 Warren


I don't know if that is the best way to solve the problem.  In that case,
you would always get a 2-D array, right?  Is that useful for those who have
text data as a single column?  Maybe a mindim keyword (with None as default)
and apply an appropriate atleast_Nd() call (or maybe have available an
.atleast_nd() function?).  But, then what would this mean for structured
arrays?  One might think that they want at least 2-D, but they really want
at least 1-D.

Ben Root

P.S. - Taking this a step further, the functions completely fail in dealing
with empty files...  In MATLAB, it returns an empty array (matrix?).
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] loadtxt raises an exception on empty file

2010-05-24 Thread Maria Liukis
Hello everybody,

I'm using numpy V1.3.0 and ran into a case when numpy.loadtxt('foo.txt') raised 
an exception:

>>> import numpy as np
>>> np.loadtxt('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py",
 line 456, in loadtxt
    raise IOError('End-of-file reached before encountering data.')
IOError: End-of-file reached before encountering data.


if provided file 'foo.txt' is empty.

Would anybody happen to know if it's a feature or a bug? I would expect it to 
return an empty array. 

numpy.fromfile() handles empty text files:

>>> np.fromfile('foo.txt', sep='\t\n ')
array([], dtype=float64)


Would anybody suggest a graceful way of handling empty files with 
numpy.loadtxt() (except for catching an IOError exception)?

Many thanks,
Masha

liu...@usc.edu



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt raises an exception on empty file

2010-05-24 Thread Vincent Davis
On Mon, May 24, 2010 at 4:14 PM, Maria Liukis liu...@usc.edu wrote:

 Hello everybody,

 I'm using numpy V1.3.0 and ran into a case when numpy.loadtxt('foo.txt')
 raised an exception:

 >>> import numpy as np
 >>> np.loadtxt('foo.txt')
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File
 "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py",
 line 456, in loadtxt
    raise IOError('End-of-file reached before encountering data.')
 IOError: End-of-file reached before encountering data.
 

 if provided file 'foo.txt' is empty.

 Would anybody happen to know if it's a feature or a bug? I would expect it
 to return an empty array.


Looking at the source for loadtxt

line 591
    # Read until we find a line with some values, and use
    # it to estimate the number of columns, N.
    first_vals = None
    while not first_vals:
        first_line = fh.readline()
        if first_line == '': # EOF reached
            raise IOError('End-of-file reached before encountering data.')

So it looks like it is not a bug although I am not sure why returning an
empty array would not be valid. But then what are you going to do with the
empty array?

Vincent




 numpy.fromfile() handles empty text files:

  >>> np.fromfile('foo.txt', sep='\t\n ')
 array([], dtype=float64)
 

 Would anybody suggest a graceful way of handling empty files with
 numpy.loadtxt() (except for catching an IOError exception)?

 Many thanks,
 Masha
 
 liu...@usc.edu



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Vincent Davis
720-301-3003
vinc...@vincentdavis.net
my blog: http://vincentdavis.net | LinkedIn: http://www.linkedin.com/in/vincentdavis
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt raises an exception on empty file

2010-05-24 Thread Nadav Horesh
You can just catch the exception and decide what to do with it:

try:
   data = np.loadtxt('foo.txt')
except IOError:
   data = 0  # Or something similar

  Nadav

-Original Message-
From: numpy-discussion-boun...@scipy.org on behalf of Maria Liukis
Sent: Tue 25-May-10 01:14
To: numpy-discussion@scipy.org
Subject: [Numpy-discussion] loadtxt raises an exception on empty file
 
Hello everybody,

I'm using numpy V1.3.0 and ran into a case when numpy.loadtxt('foo.txt') raised 
an exception:

>>> import numpy as np
>>> np.loadtxt('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py",
 line 456, in loadtxt
    raise IOError('End-of-file reached before encountering data.')
IOError: End-of-file reached before encountering data.


if provided file 'foo.txt' is empty.

Would anybody happen to know if it's a feature or a bug? I would expect it to 
return an empty array. 

numpy.fromfile() handles empty text files:

>>> np.fromfile('foo.txt', sep='\t\n ')
array([], dtype=float64)


Would anybody suggest a graceful way of handling empty files with 
numpy.loadtxt() (except for catching an IOError exception)?

Many thanks,
Masha

liu...@usc.edu



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] loadtxt and genfromtxt

2010-02-10 Thread Jonathan Stickel
I am new to python/numpy/scipy and new to this list.  I recently
migrated over from using Octave and am very impressed so far!

Recently I needed to load data from a text file and quickly found
numpy's loadtxt function.  However, there were missing data values,
which loadtxt does not handle.  After some amount of googling, I did
find genfromtxt which does exactly what I need.  It would have been
helpful if genfromtxt was included in the See Also portion of the
docstring for loadtxt.  Perhaps this is a simple oversight?  I see that
genfromtxt does mention loadtxt in its docstring.

Let me know if I should submit a bug somewhere, or if it is sufficient
to mention this small item on the list.

Thanks,
Jonathan

P.S.
My first send did not seem to go through.  Trying again; sorry if this 
is posted twice...

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt and genfromtxt

2010-02-10 Thread Ralf Gommers
On Thu, Feb 11, 2010 at 1:36 AM, Jonathan Stickel jjstic...@vcn.com wrote:

 I am new to python/numpy/scipy and new to this list.  I recently
 migrated over from using Octave and am very impressed so far!

 Recently I needed to load data from a text file and quickly found
 numpy's loadtxt function.  However, there were missing data values,
 which loadtxt does not handle.  After some amount of googling, I did
 find genfromtxt which does exactly what I need.  It would have been
 helpful if genfromtxt was included in the See Also portion of the
 docstring for loadtxt.  Perhaps this is a simple oversight?  I see that
 genfromtxt does mention loadtxt in its docstring.


Thanks, fixed: http://docs.scipy.org/numpy/docs/numpy.lib.io.loadtxt/


 Let me know if I should submit a bug somewhere, or if it is sufficient
 to mention this small item on the list.


If you find more such things, please consider creating an account in the doc
wiki I linked above and contributing directly. After account creation you'd
need to ask for edit rights on this list.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] loadtxt example problem ?

2009-05-04 Thread bruno Piguet
Hello,

  I'm new to numpy, and considering using loadtxt() to read a data file.

  As a starter, I tried the example of the doc page (
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) :


>>> from StringIO import StringIO   # StringIO behaves like a file object
>>> c = StringIO("0 1\n2 3")
>>> np.loadtxt(c)
I didn't get the expected answer, but:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\site-packages\numpy\core\numeric.py", line
725, in loadtxt
    X = array(X, dtype)
ValueError: setting an array element with a sequence.
(I'm using version 1.0.4 of numpy.)

I got the same problem on a Ms-Windows and a Linux Machine.

I could run the example by adding a \n at the end of c :
c = StringIO("0 1\n2 3\n")

Is it the normal and expected behaviour ?

Bruno.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt example problem ?

2009-05-04 Thread Ryan May
On Mon, May 4, 2009 at 3:06 PM, bruno Piguet bruno.pig...@gmail.com wrote:

 Hello,

   I'm new to numpy, and considering using loadtxt() to read a data file.

   As a starter, I tried the example of the doc page (
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) :


  from StringIO import StringIO   # StringIO behaves like a file object
  c = StringIO("0 1\n2 3")
  np.loadtxt(c)
 I didn't get the expected answer, but:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Python25\lib\site-packages\numpy\core\numeric.py", line 725, in 
 loadtxt
 X = array(X, dtype)
 ValueError: setting an array element with a sequence.


 (I'm using version 1.0.4 of numpy.)

 I got the same problem on an MS-Windows and a Linux machine.

 I could run the example by adding a "\n" at the end of c:
 c = StringIO("0 1\n2 3\n")


 Is this the normal and expected behaviour?

 Bruno.


It's a bug that's been fixed.  Numpy 1.0.4 is quite a bit out of date, so
I'd recommend updating to the latest (1.3).

Ryan


-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] loadtxt issues

2009-03-04 Thread Sturla Molden
On 2/11/2009 6:40 AM, A B wrote:
 Hi,
 
 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:
 
 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

Does this work for you?

import numpy

dt = {'names': ('gender', 'age', 'weight', 'bal'),
      'formats': ('i4', 'S4', 'f4', 'f4')}
with open('filename.txt', 'rt') as file:
    linelst = [line.strip('\n').split('|') for line in file]
n = len(linelst)
data = numpy.zeros(n, dtype=numpy.dtype(dt))
for i, (gender, age, weight, bal) in zip(range(n), linelst):
    data[i] = (int(gender), age, float(weight), float(bal))


S.M.
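For comparison, the same file also parses directly with loadtxt once a structured dtype and the '|' delimiter are given. A minimal sketch (using io.StringIO in place of a real file; the column names are taken from the dtype above):

```python
import numpy as np
from io import StringIO

# structured dtype matching the four '|'-separated columns
dt = np.dtype([('gender', 'i4'), ('age', 'S4'),
               ('weight', 'f4'), ('bal', 'f4')])
data = np.loadtxt(StringIO("12|h|34.5|44.5\n14552|bbb|34.5|42.5"),
                  dtype=dt, delimiter='|')
# each row becomes one record; columns are accessed by name, e.g. data['gender']
```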




Re: [Numpy-discussion] loadtxt issues

2009-03-04 Thread Sturla Molden
On 3/4/2009 12:57 PM, Sturla Molden wrote:

 Does this work for you?

Never mind, it seems my e-mail got messed up. I ought to keep them 
sorted by date...

S.M.



Re: [Numpy-discussion] loadtxt slow

2009-03-02 Thread Michael S. Gilbert
On Sun, 1 Mar 2009 14:29:54 -0500, Michael Gilbert wrote:
 i will send the current version to the list tomorrow when i have access
 to the system that it is on.

attached is my current version of loadtxt.  like i said, it's slower
for small data sets (because it reads through the whole data file
twice).  the first loop is used to figure out how much memory to
allocate, and i can optimize this by intelligently seeking through the
file.  but like i said, i haven't had the time to implement it.

all of the options should work, except for converters (i have never
used converters and i couldn't figure out exactly what it does based
on a quick read-through of the docs).
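the two-pass idea can be sketched roughly like this (a simplified illustration, not the attached implementation -- it assumes whitespace-delimited float data and a seekable file object):

```python
import numpy as np
from io import StringIO

def loadtxt_two_pass(f):
    """First pass counts rows/columns, second pass fills a preallocated array."""
    start = f.tell()
    nrows, ncols = 0, 0
    for line in f:                      # pass 1: sizing only
        parts = line.split()
        if parts:
            nrows += 1
            ncols = len(parts)
    out = np.empty((nrows, ncols))      # allocate once, up front
    f.seek(start)
    row = 0
    for line in f:                      # pass 2: parse into the array
        parts = line.split()
        if parts:
            out[row] = [float(p) for p in parts]
            row += 1
    return out

a = loadtxt_two_pass(StringIO("0 1\n2 3\n"))
```

this avoids growing a Python list value by value, at the cost of reading the file twice.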

best wishes,
mike




[Numpy-discussion] loadtxt slow

2009-03-01 Thread Gideon Simpson
So I have some data sets of about 16 floating point numbers stored  
in text files.  I find that loadtxt is rather slow.  Is this to be  
expected?  Would it be faster if it were loading binary data?

-gideon



Re: [Numpy-discussion] loadtxt slow

2009-03-01 Thread Michael Gilbert
On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:

 So I have some data sets of about 16 floating point numbers stored  
 in text files.  I find that loadtxt is rather slow.  Is this to be  
 expected?  Would it be faster if it were loading binary data?

i have run into this as well.  loadtxt uses a python list to allocate
memory for the data it reads in, so once you get to about 1/4th of your
available memory, it will start allocating the updated list (every
time it reads a new value from your data file) in swap instead of main
memory, which is ridiculously slow (in fact it causes my system to be
quite unresponsive and a jumpy cursor). i have rewritten loadtxt to be
smarter about allocating memory, but it is slower overall and doesn't
support all of the original arguments/options (yet).  i have some
ideas to make it smarter/more efficient, but have not had the time
to work on it recently.

i will send the current version to the list tomorrow when i have access
to the system that it is on.

best wishes,
mike


Re: [Numpy-discussion] loadtxt slow

2009-03-01 Thread Michael Gilbert
On Sun, 1 Mar 2009 14:29:54 -0500 Michael Gilbert wrote:
 i have rewritten loadtxt to be smarter about allocating memory, but 
 it is slower overall and doesn't support all of the original 
 arguments/options (yet).

i had meant to say that my version is slower for smaller data sets (when
you aren't close to your main memory limit), but it is orders of
magnitude faster for large data sets.


Re: [Numpy-discussion] loadtxt slow

2009-03-01 Thread Brent Pedersen
On Sun, Mar 1, 2009 at 11:29 AM, Michael Gilbert
michael.s.gilb...@gmail.com wrote:
 On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:

 So I have some data sets of about 16 floating point numbers stored
 in text files.  I find that loadtxt is rather slow.  Is this to be
 expected?  Would it be faster if it were loading binary data?

 i have run into this as well.  loadtxt uses a python list to allocate
 memory for the data it reads in, so once you get to about 1/4th of your
 available memory, it will start allocating the updated list (every
 time it reads a new value from your data file) in swap instead of main
 memory, which is ridiculously slow (in fact it causes my system to be
 quite unresponsive and a jumpy cursor). i have rewritten loadtxt to be
 smarter about allocating memory, but it is slower overall and doesn't
 support all of the original arguments/options (yet).  i have some
 ideas to make it smarter/more efficient, but have not had the time
 to work on it recently.

 i will send the current version to the list tomorrow when i have access
 to the system that it is on.

 best wishes,
 mike


to address the slowness, i use wrappers around savetxt/loadtxt that
save/load a .npy file
along with/instead of the .txt file. -- and the loadtxt wrapper checks
if the .npy is up-to-date.
code here:

http://rafb.net/p/dGBJjg80.html

of course it's still slow the first time. i look forward to your speedups.
-brentp
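The pastebin link above is long dead; a wrapper along the lines Brent describes can be reconstructed roughly like this (an assumed sketch, not his original code -- the sidecar file name and the mtime freshness check are my own choices):

```python
import os
import numpy as np

def loadtxt_cached(txt_path, **kwargs):
    """np.loadtxt wrapper that caches the parsed array as a .npy sidecar
    and reuses it while it is newer than the text file."""
    npy_path = txt_path + '.npy'
    if (os.path.exists(npy_path)
            and os.path.getmtime(npy_path) >= os.path.getmtime(txt_path)):
        return np.load(npy_path)        # fast path: binary cache is current
    arr = np.loadtxt(txt_path, **kwargs)
    np.save(npy_path, arr)              # refresh the cache
    return arr
```

Only the first load pays the text-parsing cost; later loads read the binary .npy, which is much faster.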


Re: [Numpy-discussion] loadtxt slow

2009-03-01 Thread Eric Firing
Gideon Simpson wrote:
 So I have some data sets of about 16 floating point numbers stored  
 in text files.  I find that loadtxt is rather slow.  Is this to be  
 expected?  Would it be faster if it were loading binary data?

Depending on the format you may be able to use numpy.fromfile, which I 
suspect would be much faster.  It only handles very simple ascii 
formats, though.

Eric
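As a rough illustration of this suggestion (hypothetical file name; note that np.fromfile with a text separator returns a flat 1-D array that you reshape yourself):

```python
import numpy as np
import os
import tempfile

# write a small whitespace-separated file (made-up data) to parse
path = os.path.join(tempfile.mkdtemp(), 'simple.txt')
with open(path, 'w') as f:
    f.write("1.0 2.0 3.0 4.0\n5.0 6.0 7.0 8.0\n")

data = np.fromfile(path, sep=' ')   # sep=' ' matches any whitespace, incl. newlines
data = data.reshape(-1, 4)          # restore the 2 x 4 row layout by hand
```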


Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread A B
On Tue, Feb 10, 2009 at 9:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt("sample_data.txt", dtype="string") but can't use
 'string' instead of "S4" and all my data is read into strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.


 works for me but not sure i understand the problem, did you try
 setting the delimiter?


 import numpy as np
 from cStringIO import StringIO

 txt = StringIO("""\
 12|h|34.5|44.5
 14552|bbb|34.5|42.5""")

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}
 a = np.loadtxt(txt, dtype=dt, delimiter="|")
 print a.dtype

I had tried both with and without the delimiter. In any event, it just
worked for me as well. Not sure what I was missing before. Anyway,
thank you.


Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread A B
On Wed, Feb 11, 2009 at 6:27 PM, A B python6...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt("sample_data.txt", dtype="string") but can't use
 'string' instead of "S4" and all my data is read into strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.


 works for me but not sure i understand the problem, did you try
 setting the delimiter?


 import numpy as np
 from cStringIO import StringIO

 txt = StringIO("""\
 12|h|34.5|44.5
 14552|bbb|34.5|42.5""")

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}
 a = np.loadtxt(txt, dtype=dt, delimiter="|")
 print a.dtype

 I had tried both with and without the delimiter. In any event, it just
 worked for me as well. Not sure what I was missing before. Anyway,
 thank you.


Actually, I was using two different machines and it appears that the
version of numpy available on Ubuntu is seriously out of date (1.0.4).
Wonder why ...
Version 1.2.1 on a RedHat box worked fine.


Re: [Numpy-discussion] loadtxt issues

2009-02-11 Thread Scott Sinclair
 2009/2/12 A B python6...@gmail.com:
 Actually, I was using two different machines and it appears that the
 version of numpy available on Ubuntu is seriously out of date (1.0.4).
 Wonder why ...

See the recent post here

http://projects.scipy.org/pipermail/numpy-discussion/2009-February/040252.html

Cheers,
Scott


[Numpy-discussion] loadtxt issues

2009-02-10 Thread A B
Hi,

How do I write a loadtxt command to read in the following file and
store each data point as the appropriate data type:

12|h|34.5|44.5
14552|bbb|34.5|42.5

Do the strings have to be read in separately from the numbers?

Why would anyone use 'S10' instead of 'string'?

dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
'S4','f4', 'f4')}

a = loadtxt("sample_data.txt", dtype=dt)

gives

ValueError: need more than 1 value to unpack

I can do a = loadtxt("sample_data.txt", dtype="string") but can't use
'string' instead of "S4" and all my data is read into strings.

Seems like all the examples on-line use either numeric or textual
input, but not both.

Thanks.


Re: [Numpy-discussion] loadtxt issues

2009-02-10 Thread Brent Pedersen
On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt("sample_data.txt", dtype="string") but can't use
 'string' instead of "S4" and all my data is read into strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.


works for me but not sure i understand the problem, did you try
setting the delimiter?


import numpy as np
from cStringIO import StringIO

txt = StringIO("""\
12|h|34.5|44.5
14552|bbb|34.5|42.5""")

dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
'S4','f4', 'f4')}
a = np.loadtxt(txt, dtype=dt, delimiter="|")
print a.dtype


Re: [Numpy-discussion] Loadtxt .bz2 support

2008-10-22 Thread Ryan May
Charles R Harris wrote:
 On Tue, Oct 21, 2008 at 1:30 PM, Ryan May [EMAIL PROTECTED] wrote:
 
 Hi,

 I noticed numpy.loadtxt has support for gzipped text files, but not for
 bz2'd files.  Here's a 3 line patch to add bzip2 support to loadtxt.

 Ryan

 --
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma

 Index: numpy/lib/io.py
 ===
 --- numpy/lib/io.py (revision 5953)
 +++ numpy/lib/io.py (working copy)
 @@ -320,6 +320,9 @@
 if fname.endswith('.gz'):
 import gzip
 fh = gzip.open(fname)
 +elif fname.endswith('.bz2'):
 +import bz2
 +fh = bz2.BZ2File(fname)
 else:
 fh = file(fname)
 elif hasattr(fname, 'seek'):

 
 Could you open a ticket for this? Mark it as an enhancement.
 

Done. #940

http://scipy.org/scipy/numpy/ticket/940

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Loadtxt .bz2 support

2008-10-21 Thread Ryan May
Hi,

I noticed numpy.loadtxt has support for gzipped text files, but not for
bz2'd files.  Here's a 3 line patch to add bzip2 support to loadtxt.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: numpy/lib/io.py
===
--- numpy/lib/io.py (revision 5953)
+++ numpy/lib/io.py (working copy)
@@ -320,6 +320,9 @@
 if fname.endswith('.gz'):
 import gzip
 fh = gzip.open(fname)
+elif fname.endswith('.bz2'):
+import bz2
+fh = bz2.BZ2File(fname)
 else:
 fh = file(fname)
 elif hasattr(fname, 'seek'):
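The suffix-based dispatch in the patch is easy to reuse standalone; here is a sketch of the same logic as a small helper (my adaptation, spelled with open() and bz2.open() rather than the Python 2 file() and BZ2File of the patch):

```python
def open_maybe_compressed(fname):
    """Open a text data file, transparently decompressing .gz / .bz2."""
    if fname.endswith('.gz'):
        import gzip
        return gzip.open(fname, 'rt')   # gzip-compressed text
    elif fname.endswith('.bz2'):
        import bz2
        return bz2.open(fname, 'rt')    # bzip2-compressed text
    return open(fname)                  # plain text file
```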

