Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Ryan May
Pierre GM wrote: I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3', '4',
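In plain Python terms, the distinction argued for here is the one between `str.split()` with no argument, which collapses runs of whitespace, and `str.split(sep)` with an explicit separator, which splits on every occurrence. A minimal sketch, assuming a hypothetical `split_fields()` helper (not part of the proposal's code), that treats ' ' and '\t' exactly like ',':

```python
def split_fields(line, delimiter=None):
    """Hypothetical helper: split one line of text into fields.

    delimiter=None means "any run of whitespace" (str.split() with no
    argument); any explicit delimiter, including ' ' or a tab, is
    matched literally, exactly like ','.
    """
    line = line.rstrip("\r\n")
    if delimiter is None:
        return line.split()
    return line.split(delimiter)


print(split_fields("1,2,3,4", ","))      # ['1', '2', '3', '4']
print(split_fields("1 2 3 4", " "))      # ['1', '2', '3', '4']
print(split_fields("a b\tc d", "\t"))    # ['a b', 'c d']
print(split_fields("1  2\t3"))           # ['1', '2', '3']
```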

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Alan G Isaac
If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed? Alan Isaac ___ Numpy-discussion mailing list Numpy-discussion@scipy.org

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Christopher Barker
Pierre GM wrote: I can try, but in that case, please write me a unittest, so that I have a clear and unambiguous idea of what you expect. fair enough, though I'm not sure when I'll have time to do it. I do wonder if anyone else thinks it would be useful to have multiple delimiters as an
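The question about multiple delimiters is cut off here, but one way such an option could work is a regular-expression character class. A sketch, assuming a hypothetical `split_any()` helper that is not part of the proposal and only accepts single-character delimiters:

```python
import re


def split_any(line, delimiters):
    """Hypothetical helper: split on any single character in `delimiters`.

    re.escape() keeps characters such as '|', '.' or '-' from being read
    as regex syntax, so each delimiter is still matched literally.
    """
    pattern = "[" + "".join(re.escape(d) for d in delimiters) + "]"
    return re.split(pattern, line.rstrip("\r\n"))


print(split_any("1,2;3|4", ",;|"))   # ['1', '2', '3', '4']
```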

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM
On Dec 3, 2008, at 12:48 PM, Christopher Barker wrote: Pierre GM wrote: I can try, but in that case, please write me a unittest, so that I have a clear and unambiguous idea of what you expect. fair enough, though I'm not sure when I'll have time to do it. Oh, don't worry, nothing too

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM
On Dec 3, 2008, at 12:32 PM, Alan G Isaac wrote: If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed? Hopefully. I'm looking for the best way to do it. Do you have an

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Christopher Barker
by the way, should this work: io.loadtxt('junk.dat', delimiter=' ') for more than one space between numbers, like: 1 2 3 4 5 6 7 8 9 10 I get: io.loadtxt('junk.dat', delimiter=' ') Traceback (most recent call last): File "<stdin>", line 1, in <module> File
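The excerpt is cut off before the error message, so the cause here is an assumption: with a literal ' ' delimiter, every extra space yields an empty field, and an empty string cannot be converted to a number. A plain-Python illustration:

```python
line = "1  2   3"   # runs of more than one space

fields = line.split(" ")
print(fields)   # ['1', '', '2', '', '', '3']

try:
    values = [float(f) for f in fields]
except ValueError as exc:
    # float('') raises ValueError, much as a numeric converter would
    print("conversion failed:", exc)

# Splitting on any whitespace run avoids the empty fields entirely.
print([float(f) for f in line.split()])   # [1.0, 2.0, 3.0]
```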

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Christopher Barker
Pierre GM wrote: Oh, don't worry, nothing too fancy: give me a couple lines of input data and a line with what you expect. I just went and looked at the existing tests, and you're right, it's very easy -- my first foray into the new nose tests -- very nice! specify, say ',' as the
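A sketch of what "a couple lines of input data and a line with what you expect" could look like with the standard-library unittest module (nose also collects TestCase subclasses). `split_line` is a hypothetical stand-in, not the proposal's LineSplitter; the expected values encode the explicit-delimiter behavior argued for in this thread:

```python
import unittest


def split_line(line, delimiter=None):
    """Hypothetical stand-in for the splitter under test.

    str.split(None) collapses runs of whitespace; any explicit
    delimiter (',', ' ', a tab, ...) is matched literally.
    """
    return line.strip("\r\n").split(delimiter)


class TestExplicitDelimiter(unittest.TestCase):
    def test_space_is_literal(self):
        # An explicit ' ' behaves like ',': two spaces give an empty field.
        self.assertEqual(split_line("1 2  3\n", " "), ["1", "2", "", "3"])

    def test_tab_keeps_spaces_inside_fields(self):
        self.assertEqual(split_line("a b\tc d\n", "\t"), ["a b", "c d"])

    def test_no_delimiter_means_any_whitespace(self):
        self.assertEqual(split_line("1 2  3\t4\n"), ["1", "2", "3", "4"])
```

Saved in a `test_*.py` file, this is picked up by `python -m unittest` without further setup.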

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Christopher Barker
Alan G Isaac wrote: If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed? what I'd like to see is a version of loadtxt built on a slightly enhanced fromfile() -- that would be
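A sketch of the kind of fast path described here, assuming the data are already clean (purely numeric, no comments, headers or missing values). It uses `np.fromstring` in text mode, the in-memory counterpart of `np.fromfile` with the same `sep` handling; `fast_loadtxt` and its `ncols` argument are hypothetical:

```python
import numpy as np


def fast_loadtxt(text, ncols, dtype=float):
    """Hypothetical fast path for clean, purely numeric text.

    np.fromstring in text mode (sep given) does the parsing in C; a
    separator of ' ' matches any run of whitespace, newlines included.
    Comments, headers and missing values are NOT handled.
    """
    flat = np.fromstring(text, dtype=dtype, sep=" ")
    if flat.size % ncols:
        raise ValueError("number of values is not a multiple of ncols")
    return flat.reshape(-1, ncols)


data = fast_loadtxt("1 2 3 4 5\n6 7 8 9 10", ncols=5)
print(data.shape)   # (2, 5)
```

The trade-off is the one raised in the thread: the speed comes from skipping per-field Python work, so anything irregular has to go through the slower, more general path.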

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM
On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: by the way, should this work: io.loadtxt('junk.dat', delimiter=' ') for more than one space between numbers, like: 1 2 3 4 5 6 7 8 9 10 On the version I'm working on, both delimiter='' and delimiter=None (default) would

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Manuel Metz
Alan G Isaac wrote: If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed? Alan Isaac Hi all, that's going in the same direction I was thinking about. When I thought about

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Christopher Barker
Pierre GM wrote: On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote: for more than one space between numbers, like: 1 2 3 4 5 6 7 8 9 10 On the version I'm working on, both delimiter='' and delimiter=None (default) would give you the expected output. so empty string and

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Manuel Metz
Manuel Metz wrote: Alan G Isaac wrote: If I know my data is already clean and is handled nicely by the old loadtxt, will I be able to turn off the special handling in order to retain the old load speed? Alan Isaac Hi all, that's going in the same direction I was thinking about.

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Pierre GM
Manuel, Looks nice, I'm going to try to see how I can incorporate yours. Note that returning np.nan by default will not work w/ Python 2.6 if you want an int...
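The underlying issue is that NaN is a float value, so a missing entry in an integer column needs a different default. A sketch assuming a hypothetical `DEFAULT_FILL` table and `convert()` helper; the -1 sentinel is an arbitrary choice, not the proposal's:

```python
import math

# Hypothetical per-type fill values for missing fields. NaN exists only
# for floats: int(float('nan')) raises ValueError, so an integer column
# needs a sentinel of its own (-1 here is arbitrary).
DEFAULT_FILL = {int: -1, float: math.nan, str: ""}


def convert(field, kind):
    """Convert one text field with `kind`, filling blanks with a default."""
    field = field.strip()
    if field == "":
        return DEFAULT_FILL[kind]
    return kind(field)


row = [convert(f, k) for f, k in zip("3,,x".split(","), (int, float, str))]
print(row)   # [3, nan, 'x']
```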

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Alan G Isaac
On 12/2/2008 7:21 AM Joris De Ridder apparently wrote: As a historical note, we used to have scipy.io.read_array which at the time was considered by Travis too slow and too grandiose to be put in Numpy. As a consequence, numpy.loadtxt() was created which was simple and fast. Now it

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Alan G Isaac
On 12/2/2008 8:12 AM Alan G Isaac apparently wrote: I hope this consideration remains prominent in this thread. Is the disappearance of read_array the reason for this change? What happened to it? Apologies; it is only deprecated, not gone. Alan Isaac

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Joris De Ridder
On 1 Dec 2008, at 21:47, Stéfan van der Walt wrote: Hi Pierre 2008/12/1 Pierre GM [EMAIL PROTECTED]: * `genloadtxt` is the base function that does all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It
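The two outputs described here (filled data plus a boolean mask) map directly onto `numpy.ma`. A sketch with a hypothetical `load_with_mask()` helper; the `missing` and `fill` parameter names are illustrative only and are not the proposal's API:

```python
import numpy as np
import numpy.ma as ma


def load_with_mask(lines, delimiter=",", missing=("", "N/A"), fill=0):
    """Hypothetical sketch: parse integer rows while recording missing fields.

    Returns (data, mask): any field whose stripped text is in `missing`
    is replaced by `fill` in `data` and flagged True in `mask`.
    """
    rows, masks = [], []
    for line in lines:
        fields = [f.strip() for f in line.split(delimiter)]
        masks.append([f in missing for f in fields])
        rows.append([fill if f in missing else int(f) for f in fields])
    return np.array(rows), np.array(masks, dtype=bool)


data, mask = load_with_mask(["1,2,3", "4,,6", "7,N/A,9"])
marr = ma.array(data, mask=mask)   # combine the two outputs
print(marr.sum())   # 32: the two masked entries are ignored
```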

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Zachary Pincus
Hi Pierre, I've tested the new loadtxt briefly. Looks good, except that there's a minor bug when trying to use a specific white-space delimiter (e.g. \t) while still allowing other white-space within fields (e.g. spaces). Specifically, on line 115 in LineSplitter, we have:

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Ryan May
Zachary Pincus wrote: Specifically, on line 115 in LineSplitter, we have: self.delimiter = delimiter.strip() or None so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to None, which then causes the default behavior of any-whitespace-is-delimiter to be used.
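The failure is easy to reproduce in isolation, since `'\t'.strip()` is the empty string. A sketch comparing the quoted line with one possible fix; `fixed_delimiter` is hypothetical and keeps only None and '' as "any whitespace", passing every other delimiter through unchanged:

```python
def buggy_delimiter(delimiter):
    # The quoted line: '\t'.strip() is '', so a tab delimiter becomes None.
    return delimiter.strip() or None


def fixed_delimiter(delimiter):
    # Hypothetical fix: only None or '' mean "split on any whitespace".
    return delimiter or None


line = "first name\tlast name"
print(line.split(buggy_delimiter("\t")))   # ['first', 'name', 'last', 'name']
print(line.split(fixed_delimiter("\t")))   # ['first name', 'last name']
```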

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Ryan May
Pierre GM wrote: Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. A couple of quick nitpicks: 1) On line 186 (in the NameValidator class), you use excludelist.append() to append a list to the end of a list. I think you meant
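For reference (the suggested replacement is cut off in this excerpt): `list.append()` with a list argument adds that whole list as one nested element, whereas `list.extend()` adds its items one by one. The list contents below are made up for illustration:

```python
# Hypothetical contents; only the behavior of the list methods matters.
excludelist = ["return", "file", "print"]

nested = list(excludelist)
nested.append(["names", "dtype"])   # adds ONE element: the list itself
flat = list(excludelist)
flat.extend(["names", "dtype"])     # adds each item separately

print(nested)   # ['return', 'file', 'print', ['names', 'dtype']]
print(flat)     # ['return', 'file', 'print', 'names', 'dtype']
```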

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Pierre GM
On Dec 2, 2008, at 3:12 PM, Ryan May wrote: Pierre GM wrote: Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. A couple of quick nitpicks: 1) On line 186 (in the NameValidator class), you use excludelist.append() to append a

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Christopher Barker
Pierre GM wrote: I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3',

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Pierre GM
Chris, I can try, but in that case, please write me a unittest, so that I have a clear and unambiguous idea of what you expect. ANFSCD, have you tried the missing_values option ? On Dec 2, 2008, at 5:36 PM, Christopher Barker wrote: Pierre GM wrote: I think that treating an

[Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
All, Please find attached to this message another implementation of np.loadtxt, which focuses on missing values. It's basically a combination of John Hunter et al.'s mlab.csv2rec, Ryan May's patches and pieces of code I'd been working on over the last few weeks. Besides some helper classes

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
And now for the tests: Proposal : Here's an extension to np.loadtxt, designed to take missing values into account. from genload_proposal import * from numpy.ma.testutils import * import StringIO class TestLineSplitter(TestCase): # def test_nodelimiter(self): Test

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Stéfan van der Walt
2008/12/1 Pierre GM [EMAIL PROTECTED]: Please find attached to this message another implementation of Struggling to comply! Cheers Stéfan

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. Proposal : Here's an extension to np.loadtxt, designed to take missing values into account. import itertools import numpy as np import numpy.ma as ma def

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread John Hunter
On Mon, Dec 1, 2008 at 12:21 PM, Pierre GM [EMAIL PROTECTED] wrote: Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message. It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14',
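The excerpt ends before the actual problem, so this is a general sketch rather than a diagnosis: date columns are usually handled with an explicit per-column converter. It uses only the standard library (`csv`, `datetime`) and made-up sample data, and is independent of the proposal's code:

```python
import csv
import datetime
import io

# Hypothetical sample in the format described: ISO dates in column 0.
sample = "date,value\n2008-10-14,1.5\n2008-10-15,2.25\n"


def parse_date(text):
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


# One converter per column; the header row supplies the names.
converters = [parse_date, float]
reader = csv.reader(io.StringIO(sample))
names = next(reader)
rows = [tuple(conv(field) for conv, field in zip(converters, record))
        for record in reader]

print(names)     # ['date', 'value']
print(rows[0])   # (datetime.date(2008, 10, 14), 1.5)
```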

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Stéfan van der Walt
Hi Pierre 2008/12/1 Pierre GM [EMAIL PROTECTED]: * `genloadtxt` is the base function that does all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io I see the code length increased

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Ryan May
Stéfan van der Walt wrote: Hi Pierre 2008/12/1 Pierre GM [EMAIL PROTECTED]: * `genloadtxt` is the base function that does all the work. It outputs 2 arrays, one for the data (missing values being substituted by the appropriate default) and one for the mask. It would go in np.lib.io I

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Stéfan van der Walt
2008/12/1 Ryan May [EMAIL PROTECTED]: I've wondered about this being an issue. On one hand, you hate to make existing code noticeably slower. On the other hand, if speed is important to you, why are you using ascii I/O? More I than O! But I think numpy.fromfile, once fixed up, could fill

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
I agree, genloadtxt is a bit bloated, and it's no surprise it's slower than the initial one. I think that in order to be fair, comparisons must be performed with matplotlib.mlab.csv2rec, which also implements autodetection of the dtype. I'm quite in favor of keeping a lite version
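To make the autodetection cost concrete: inferring a type means an extra pass over every column before any conversion happens. A sketch with hypothetical `infer_type()` and `load_columns()` helpers that try int, then float, then fall back to str; this is one common strategy, not necessarily the one csv2rec or genloadtxt uses:

```python
def infer_type(values):
    """Hypothetical autodetection: the first of int, float that converts
    every value in a column, else str."""
    for kind in (int, float):
        try:
            for v in values:
                kind(v)
        except ValueError:
            continue   # some value failed; try the next, wider type
        return kind
    return str


def load_columns(rows, types=None):
    """Convert columns of text; autodetect only when `types` is None.

    Passing explicit `types` skips the inference pass entirely.
    """
    columns = list(zip(*rows))
    if types is None:
        types = [infer_type(col) for col in columns]
    return [[kind(v) for v in col] for kind, col in zip(types, columns)]


rows = [["1", "2.5", "a"], ["3", "4", "b"]]
print(load_columns(rows))   # [[1, 3], [2.5, 4.0], ['a', 'b']]
```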

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Christopher Barker
Stéfan van der Walt wrote: important to you, why are you using ascii I/O? ascii I/O is slow, so that's a reason in itself to want it not to be slower! More I than O! But I think numpy.fromfile, once fixed up, could fill this niche nicely. I agree -- for the simple cases, fromfile() could

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Christopher Barker
Pierre GM wrote: Another issue comes from the possibility to define the dtype automatically: Does all that get bypassed if the dtype(s) is specified? Is it still slow in that case? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR