Re: [Numpy-discussion] More loadtxt() changes

2008-11-28 Thread Manuel Metz
Pierre GM wrote: On Nov 27, 2008, at 3:08 AM, Manuel Metz wrote: Certainly, yes! Dealing with fixed-length fields would be necessary. The case I had in mind had both -- a separator (|) __and__ fixed-length fields -- and is probably very special in that sense. But such data files exist out

Re: [Numpy-discussion] More loadtxt() changes

2008-11-28 Thread Pierre GM
Manuel, Give me the week-end to come up with something. What you want is already doable with the current implementation of np.loadtxt, through the converter keyword. Support for missing data will be covered in a separate function, most likely to be put in numpy.ma.io at term. On Nov 28,

Re: [Numpy-discussion] More loadtxt() changes

2008-11-27 Thread Manuel Metz
Pierre GM wrote: On Nov 26, 2008, at 5:55 PM, Ryan May wrote: Manuel Metz wrote: Ryan May wrote: 3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a converter. The problem with this is that you have to pass in a converter

Re: [Numpy-discussion] More loadtxt() changes

2008-11-27 Thread Pierre GM
On Nov 27, 2008, at 3:08 AM, Manuel Metz wrote: Certainly, yes! Dealing with fixed-length fields would be necessary. The case I had in mind had both -- a separator (|) __and__ fixed-length fields -- and is probably very special in that sense. But such data files exist out there...
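
To make the "separator and fixed-length fields" case concrete, here is a minimal pure-Python sketch. The function name parse_fixed_width, the widths, and the sample line are all hypothetical and are not taken from the thread; the sketch only assumes that each field has a known width, that fields are separated by a single "|", and that an all-blank field stands for a missing value.

```python
def parse_fixed_width(line, widths, sep="|"):
    """Split a line made of fixed-width fields joined by a separator.

    A field that contains only blanks is reported as None (missing).
    """
    fields = []
    pos = 0
    for width in widths:
        raw = line[pos:pos + width]
        # An all-blank field strips to "", which is mapped to None.
        fields.append(raw.strip() or None)
        # Skip over the field itself and the separator that follows it.
        pos += width + len(sep)
    return fields


# Hypothetical sample: three fields of widths 4, 5 and 4, middle one blank.
row = parse_fixed_width("  12|     | 3.5", widths=(4, 5, 4))
```

Because the widths are fixed, a blank field cannot be confused with a missing separator, which is what makes this layout easier to handle than plain whitespace-delimited data.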

Re: [Numpy-discussion] More loadtxt() changes

2008-11-27 Thread Nils Wagner
On Thu, 27 Nov 2008 09:08:41 +0100 Manuel Metz [EMAIL PROTECTED] wrote: Pierre GM wrote: On Nov 26, 2008, at 5:55 PM, Ryan May wrote: Manuel Metz wrote: Ryan May wrote: 3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a

Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Manuel Metz
Ryan May wrote: Hi, I have a couple more changes to loadtxt() that I'd like to code up in time for 1.3, but I thought I should run them by the list before doing too much work. These are already implemented in some fashion in matplotlib.mlab.csv2rec(), but the code bases are different

Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread John Hunter
On Tue, Nov 25, 2008 at 11:23 PM, Ryan May [EMAIL PROTECTED] wrote: Updated patch attached. This includes: * Updated docstring * New tests * Fixes for previous issues * Fixes to make new tests actually work I appreciate any and all feedback. I'm having trouble applying your patch, so

Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Ryan May
John Hunter wrote: On Tue, Nov 25, 2008 at 11:23 PM, Ryan May [EMAIL PROTECTED] wrote: Updated patch attached. This includes: * Updated docstring * New tests * Fixes for previous issues * Fixes to make new tests actually work I appreciate any and all feedback. I'm having trouble

Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Ryan May
Manuel Metz wrote: Ryan May wrote: 3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a converter. The problem with this is that you have to pass in a converter for *every column* that will contain missing values. If you have a text
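
The per-column burden described above can be illustrated with a small sketch. It assumes a comma-delimited file in which the string "NA" marks a missing value; the marker, the sample data, and the helper name missing_to_nan are illustrative assumptions, not part of the proposed patch.

```python
import io

import numpy as np


def missing_to_nan(field):
    """Convert one text field to float, mapping 'NA' or blank to NaN."""
    # Depending on the NumPy version and the encoding argument, converters
    # may receive bytes or str, so both are handled here.
    if isinstance(field, bytes):
        field = field.decode()
    field = field.strip()
    return float("nan") if field in ("", "NA") else float(field)


text = "1.0,NA,3.0\n4.0,5.0,NA\n"

# The same converter has to be listed for every column that may hold
# a missing value: here columns 1 and 2.
converters = {1: missing_to_nan, 2: missing_to_nan}
data = np.loadtxt(io.StringIO(text), delimiter=",", converters=converters)
```

With only three columns this is manageable, but the dictionary grows with the number of columns that can contain gaps, which is the inconvenience the message points out.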

Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Pierre GM
On Nov 26, 2008, at 5:55 PM, Ryan May wrote: Manuel Metz wrote: Ryan May wrote: 3) Better support for missing values. The docstring mentions a way of handling missing values by passing in a converter. The problem with this is that you have to pass in a converter for *every column*

[Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Hi, I have a couple more changes to loadtxt() that I'd like to code up in time for 1.3, but I thought I should run them by the list before doing too much work. These are already implemented in some fashion in matplotlib.mlab.csv2rec(), but the code bases are different enough, that pretty much

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
Ryan, FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below (I also have unittest). Most of the work is actually inspired from matplotlib's

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Christopher Barker
Pierre GM wrote: FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below great, thanks! this could be very useful to me. Two comments: missing : string,

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
On Nov 25, 2008, at 12:30 PM, Christopher Barker wrote: missing : string, optional A string representing a missing value, irrespective of the column where it appears (e.g., ``'missing'`` or ``'unused'``). It might be nice if missing could be a sequence of strings, if there is

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Christopher Barker
Pierre GM wrote: would it possible to specify column header, rather than number here? A la mlab.csv2rec ? I'll have to take a look at that. following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong ? well, my use case is that I

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: Ryan, FYI, I've been coding over the last couple of weeks an extension of loadtxt for a better support of masked data, with the option to read column names in a header. Please find an example below (I also have unittest). Most of the work is actually inspired from

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
On Nov 25, 2008, at 2:06 PM, Ryan May wrote: 1) It looks like the function returns a structured array rather than a rec array, so that fields are obtained by doing a dictionary access. Since it's a dictionary access, is there any reason that the header needs to be munged to replace
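
The difference between the two access styles can be sketched as follows. The field names and values are made up for illustration. With item access, a field name containing a space is usable as-is; attribute access on a record array needs a name that is a valid Python identifier.

```python
import numpy as np

rows = [(1, 3.5), (2, 7.25)]

# Structured array: fields are looked up by string key, so a name with
# a space in it works without any munging.
raw = np.array(rows, dtype=[("station id", int), ("wind speed", float)])
speeds_by_key = raw["wind speed"]

# Record array: fields are also reachable as attributes, which is only
# convenient when the names are valid identifiers (hence the munging).
munged = np.array(rows, dtype=[("station_id", int), ("wind_speed", float)])
rec = munged.view(np.recarray)
speeds_by_attr = rec.wind_speed
```

Both lookups return the same column of values; the munged names matter only for the attribute form.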

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread John Hunter
On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote: A la mlab.csv2rec ? It could work with a bit more tweaking, basically following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong ? Actually, I'd like John to comment

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: On Nov 25, 2008, at 2:06 PM, Ryan May wrote: 1) It looks like the function returns a structured array rather than a rec array, so that fields are obtained by doing a dictionary access. Since it's a dictionary access, is there any reason that the header needs to be munged to

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
On Nov 25, 2008, at 2:26 PM, John Hunter wrote: Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions. Great. Could we think about getting that on for 1.3x, would you have time

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
On Nov 25, 2008, at 2:37 PM, Ryan May wrote: What about doing the parsing and type inference in a loop and holding onto the already split lines? Then loop through the lines with the converters that were finally chosen? In addition to making my usecase work, this has the benefit of not doing
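
The two-loop idea (infer types while keeping the split lines, then convert with the converters finally chosen) can be sketched in plain Python. This is not the code from the patch: the candidate order int, float, str and the name infer_and_convert are assumptions made for the illustration.

```python
# Candidate converters, from most to least restrictive. str never fails,
# so the upgrade loop below always terminates.
CANDIDATES = (int, float, str)


def infer_and_convert(lines, delimiter=","):
    """Infer one converter per column, then convert the stored rows."""
    # First loop: split each line once and keep the pieces.
    rows = [line.strip().split(delimiter) for line in lines if line.strip()]
    level = [0] * len(rows[0])
    for row in rows:
        for col, field in enumerate(row):
            # Upgrade this column's converter until the field parses.
            while True:
                try:
                    CANDIDATES[level[col]](field)
                    break
                except ValueError:
                    level[col] += 1
    # Second loop: reuse the split rows with the converters finally chosen.
    chosen = [CANDIDATES[i] for i in level]
    return [tuple(conv(f) for conv, f in zip(chosen, row)) for row in rows]


table = infer_and_convert(["1,2.5,a", "2,3,b"])
```

The second loop reuses the list of split rows rather than re-reading or re-splitting the input, so the extra cost is one more pass over data already held in memory.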

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
It shouldn't create any *extra* temporaries since we already make a list of lists before creating the final array. It just introduces an extra looping step. (I'd reuse the existing list of lists). Cool then, go for it. If my understanding of StringConverter is correct, tweaking the

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: Nope, we still need to double check whether there's any missing data in any field of the line we process, independently of the conversion. So there must be some extra loop involved, and I'd need a special function in numpy.ma to take care of that. So our options are *

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
On Nov 25, 2008, at 3:33 PM, Ryan May wrote: You couldn't run this loop on the array returned by np.loadtxt() (by masking on the appropriate fill value)? Yet an extra loop... Doable, yes... But meh.
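
Masking on a fill value after loading, as suggested in the quoted question, might look like the sketch below. The fill value -999.0 and the sample array are assumptions; the array stands in for whatever np.loadtxt() would have returned.

```python
import numpy as np

FILL = -999.0  # assumed placeholder written where data are missing

# Stand-in for the plain array returned by np.loadtxt().
data = np.array([[1.0, FILL, 3.0],
                 [4.0, 5.0, FILL]])

# Extra pass over the loaded array: mask every entry equal to FILL.
masked = np.ma.masked_values(data, FILL)

# Masked entries are ignored by reductions.
col_means = masked.mean(axis=0)
```

This works only when the fill value cannot occur as a legitimate data value, and it is the additional pass over the data that the reply is unenthusiastic about.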

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread John Hunter
On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED] wrote: On Nov 25, 2008, at 2:26 PM, John Hunter wrote: Yes, I've said on a number of occasions I'd like to see these functions in numpy, since a number of them make more sense as numpy methods than as stand alone functions.

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
OK then, I'll take care of that over the next few weeks... On Nov 25, 2008, at 4:56 PM, John Hunter wrote: On Tue, Nov 25, 2008 at 2:01 PM, Pierre GM [EMAIL PROTECTED] wrote: On Nov 25, 2008, at 2:26 PM, John Hunter wrote: Yes, I've said on a number of occasions I'd like to see these

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Travis E. Oliphant
John Hunter wrote: On Tue, Nov 25, 2008 at 12:16 PM, Pierre GM [EMAIL PROTECTED] wrote: A la mlab.csv2rec ? It could work with a bit more tweaking, basically following John Hunter's et al. path. What happens when the column names are unknown (read from the header) or wrong ? Actually,

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Travis E. Oliphant
Pierre GM wrote: OK then, I'll take care of that over the next few weeks... Thanks Pierre. -Travis ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
Oh don't mention... However, I'd be quite grateful if you could give an eye to the pb of mixing np.scalars and 0d subclasses of ndarray: looks like it's a C pb, quite out of my league... http://scipy.org/scipy/numpy/ticket/826

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: Sounds like a plan. Wouldn't mind getting more feedback from fellow users before we get too deep, however... Ok, I've attached, as a first cut, a diff against SVN HEAD that does (I think) what I'm looking for. It passes all of the old tests and passes my own quick test. A

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
Ryan, Quick comments: * I already have some unittests for StringConverter, check the file I attach. * Your str2bool will probably mess things up in upgrade compared to the one JDH had written (the one I sent you): you don't wanna use int(bool(value)), as it'll always give you 0 or 1 when
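
The message is cut off here, but one Python fact behind the warning is easy to check: bool() applied to any non-empty string is True and never raises, so a converter built on it cannot signal that a field is not a boolean. The sketch below contrasts that with a stricter converter. The name strict_str2bool and the accepted spellings are hypothetical; this is not the function written by JDH.

```python
def strict_str2bool(value):
    """Convert 'true'/'false' (any case) to bool, reject anything else."""
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Raising lets a caller fall back to a more general converter.
    raise ValueError("not a boolean: %r" % value)


# bool() on a non-empty string is always True, whatever the text says.
naive = int(bool("False"))

# The strict version distinguishes the two spellings...
parsed = strict_str2bool(" False ")

# ...and refuses input that is not a boolean at all.
try:
    strict_str2bool("3.14")
    rejected = False
except ValueError:
    rejected = True
```

A converter that raises on unrecognized input is what allows a type-upgrade scheme to move on to the next candidate type.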

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Charles R Harris
On Tue, Nov 25, 2008 at 5:00 PM, Pierre GM [EMAIL PROTECTED] wrote: snip All, another question: What's the best way to have some kind of sandbox for code like the one Ryan is writing ? So that we can try it, modify it, without committing anything to SVN yet ? Probably make a branch and do

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: Ryan, Quick comments: * I already have some unittests for StringConverter, check the file I attach. Ok, great. * Your str2bool will probably mess things up in upgrade compared to the one JDH had written (the one I sent you): you don't wanna use int(bool(value)), as

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Pierre GM
On Nov 25, 2008, at 10:02 PM, Ryan May wrote: Pierre GM wrote: * Your locked version of update won't probably work either, as you force the converter to output a string (you set the status to largest possible, that's the one that outputs strings). Why don't you set the status to the

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: On Nov 25, 2008, at 10:02 PM, Ryan May wrote: Pierre GM wrote: * Your locked version of update won't probably work either, as you force the converter to output a string (you set the status to largest possible, that's the one that outputs strings). Why don't you set the

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote: On Nov 25, 2008, at 10:02 PM, Ryan May wrote: Pierre GM wrote: * Your locked version of update won't probably work either, as you force the converter to output a string (you set the status to largest possible, that's the one that outputs strings). Why don't you set the