Pierre GM wrote:
I think that treating an explicitly-passed-in ' ' delimiter as
identical to 'no delimiter' is a bad idea. If I say that ' ' is the
delimiter, or '\t' is the delimiter, this should be treated *just*
like ',' being the delimiter, where the expected output is:
['1', '2', '3', '4',
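The distinction being argued for is exactly the one Python's own str.split draws between an explicit single-space delimiter and the "any whitespace" default; a minimal sketch:

```python
# With an explicit ' ' delimiter, every space is a field boundary,
# so repeated spaces produce empty fields...
row = "1  2 3"
print(row.split(' '))   # ['1', '', '2', '3']

# ...while split() with no argument collapses runs of whitespace:
print(row.split())      # ['1', '2', '3']
```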
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
Alan Isaac
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
Pierre GM wrote:
I can try, but in that case, please write me a unittest, so that I
have a clear and unambiguous idea of what you expect.
fair enough, though I'm not sure when I'll have time to do it.
I do wonder if anyone else thinks it would be useful to have multiple
delimiters as an
On Dec 3, 2008, at 12:48 PM, Christopher Barker wrote:
Pierre GM wrote:
I can try, but in that case, please write me a unittest, so that I
have a clear and unambiguous idea of what you expect.
fair enough, though I'm not sure when I'll have time to do it.
Oh, don't worry, nothing too
On Dec 3, 2008, at 12:32 PM, Alan G Isaac wrote:
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
Hopefully. I'm looking for the best way to do it. Do you have an
by the way, should this work:
io.loadtxt('junk.dat', delimiter=' ')
for more than one space between numbers, like:
1 2 3 4 5
6 7 8 9 10
I get:
io.loadtxt('junk.dat', delimiter=' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File
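For data separated by runs of spaces, the default delimiter=None already copes; a minimal sketch (file contents inlined via StringIO rather than junk.dat):

```python
import io
import numpy as np

# With delimiter=None (the default), loadtxt splits on any run of
# whitespace, so multiple spaces between numbers are handled.
data = io.StringIO("1  2  3  4  5\n6  7  8  9  10\n")
a = np.loadtxt(data)
print(a.shape)  # (2, 5)
```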
Pierre GM wrote:
Oh, don't worry, nothing too fancy: give me a couple of lines of input
data and a line with what you expect.
I just went and looked at the existing tests, and you're right, it's
very easy -- my first foray into the new nose tests -- very nice!
specify, say ',' as the
Alan G Isaac wrote:
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
what I'd like to see is a version of loadtxt built on a slightly
enhanced fromfile() -- that would be
On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote:
by the way, should this work:
io.loadtxt('junk.dat', delimiter=' ')
for more than one space between numbers, like:
1 2 3 4 5
6 7 8 9 10
On the version I'm working on, both delimiter='' and delimiter=None
(default) would
Alan G Isaac wrote:
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
Alan Isaac
Hi all,
that's going in the same direction I was thinking about.
When I thought about
Pierre GM wrote:
On Dec 3, 2008, at 1:00 PM, Christopher Barker wrote:
for more than one space between numbers, like:
1 2 3 4 5
6 7 8 9 10
On the version I'm working on, both delimiter='' and delimiter=None
(default) would give you the expected output.
so empty string and
Manuel Metz wrote:
Alan G Isaac wrote:
If I know my data is already clean
and is handled nicely by the
old loadtxt, will I be able to turn
off the special handling in
order to retain the old load speed?
Alan Isaac
Hi all,
that's going in the same direction I was thinking about.
Manuel,
Looks nice, I'm gonna try to see how I can incorporate yours. Note that
returning np.nan by default will not work w/ Python 2.6 if you want an
int...
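The issue Pierre alludes to is that NaN is a float and cannot be cast to an integer; the failure is easy to reproduce:

```python
import numpy as np

# np.nan is a float; converting it to int raises ValueError, so it
# cannot serve as a default fill value for integer columns.
try:
    int(np.nan)
except ValueError as err:
    print("cannot use nan as an int default:", err)
```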
On 12/2/2008 7:21 AM Joris De Ridder apparently wrote:
As a historical note, we used to have scipy.io.read_array which at the
time was considered by Travis too slow and too grandiose to be put
in Numpy. As a consequence, numpy.loadtxt() was created which was
simple and fast. Now it
On 12/2/2008 8:12 AM Alan G Isaac apparently wrote:
I hope this consideration remains prominent
in this thread. Is the disappearance of
read_array the reason for this change?
What happened to it?
Apologies; it is only deprecated, not gone.
Alan Isaac
On 1 Dec 2008, at 21:47 , Stéfan van der Walt wrote:
Hi Pierre
2008/12/1 Pierre GM [EMAIL PROTECTED]:
* `genloadtxt` is the base function that does all the work. It
outputs 2 arrays, one for the data (missing values being substituted
by the appropriate default) and one for the mask. It
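The two-array output (data with defaults filled in, plus a mask of where values were missing) can be pictured with numpy.ma; the values below are made up for illustration:

```python
import numpy as np
import numpy.ma as ma

# Data array with -1.0 standing in for a missing value,
# and a boolean mask marking where the substitution happened.
data = np.array([[1.0, 2.0], [-1.0, 4.0]])
mask = np.array([[False, False], [True, False]])

# Combining the two gives a masked array: computations skip
# the masked entry rather than using the filled default.
arr = ma.array(data, mask=mask)
print(arr.sum())  # 7.0
```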
Hi Pierre,
I've tested the new loadtxt briefly. Looks good, except that there's a
minor bug when trying to use a specific white-space delimiter (e.g.
\t) while still allowing other white-space in fields
(e.g. spaces).
Specifically, on line 115 in LineSplitter, we have:
Zachary Pincus wrote:
Specifically, on line 115 in LineSplitter, we have:
self.delimiter = delimiter.strip() or None
so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
to None, which then causes the default behavior of any-whitespace-is-
delimiter to be used.
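One possible fix (a sketch, not Pierre's actual patch) is to fall back to None only when the delimiter is genuinely absent, so whitespace characters like '\t' survive:

```python
def clean_delimiter(delimiter):
    # Hypothetical replacement for `delimiter.strip() or None`:
    # only a missing/empty delimiter falls back to the
    # any-whitespace default; '\t' and ' ' are kept as-is.
    if not delimiter:
        return None
    return delimiter

print(repr(clean_delimiter('\t')))   # '\t' survives
print(repr('\t'.strip() or None))    # the buggy form yields None
```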
Pierre GM wrote:
Well, looks like the attachment is too big, so here's the
implementation. The tests will come in another message.
A couple of quick nitpicks:
1) On line 186 (in the NameValidator class), you use
excludelist.append() to append a list to the end of a list. I think you
meant
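Presumably the intended call is extend(); the difference in a nutshell (the list contents here are made up for illustration):

```python
# append() adds its argument as a single, nested element...
excludelist = ['return', 'file']
excludelist.append(['print', 'int'])
print(excludelist)   # ['return', 'file', ['print', 'int']]

# ...whereas extend() splices the elements in flat:
excludelist = ['return', 'file']
excludelist.extend(['print', 'int'])
print(excludelist)   # ['return', 'file', 'print', 'int']
```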
On Dec 2, 2008, at 3:12 PM, Ryan May wrote:
Pierre GM wrote:
Well, looks like the attachment is too big, so here's the
implementation. The tests will come in another message.
A couple of quick nitpicks:
1) On line 186 (in the NameValidator class), you use
excludelist.append() to append a
Pierre GM wrote:
I think that treating an explicitly-passed-in ' ' delimiter as
identical to 'no delimiter' is a bad idea. If I say that ' ' is the
delimiter, or '\t' is the delimiter, this should be treated *just*
like ',' being the delimiter, where the expected output is:
['1', '2', '3',
Chris,
I can try, but in that case, please write me a unittest, so that I
have a clear and unambiguous idea of what you expect.
ANFSCD, have you tried the missing_values option ?
On Dec 2, 2008, at 5:36 PM, Christopher Barker wrote:
Pierre GM wrote:
I think that treating an
All,
Please find attached to this message another implementation of
np.loadtxt, which focuses on missing values. It's basically a
combination of John Hunter's et al mlab.csv2rec, Ryan May's patches
and pieces of code I'd been working on over the last few weeks.
Besides some helper classes
And now for the tests:
Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.
from genload_proposal import *
from numpy.ma.testutils import *
import StringIO

class TestLineSplitter(TestCase):
    #
    def test_nodelimiter(self):
        "Test
2008/12/1 Pierre GM [EMAIL PROTECTED]:
Please find attached to this message another implementation of
Struggling to comply!
Cheers
Stéfan
Well, looks like the attachment is too big, so here's the
implementation. The tests will come in another message.
Proposal :
Here's an extension to np.loadtxt, designed to take missing values into account.
import itertools
import numpy as np
import numpy.ma as ma
def
On Mon, Dec 1, 2008 at 12:21 PM, Pierre GM [EMAIL PROTECTED] wrote:
Well, looks like the attachment is too big, so here's the implementation.
The tests will come in another message.
It looks like I am doing something wrong -- trying to parse a CSV file
with dates formatted like '2008-10-14',
Hi Pierre
2008/12/1 Pierre GM [EMAIL PROTECTED]:
* `genloadtxt` is the base function that does all the work. It
outputs 2 arrays, one for the data (missing values being substituted
by the appropriate default) and one for the mask. It would go in
np.lib.io
I see the code length increased
Stéfan van der Walt wrote:
Hi Pierre
2008/12/1 Pierre GM [EMAIL PROTECTED]:
* `genloadtxt` is the base function that does all the work. It
outputs 2 arrays, one for the data (missing values being substituted
by the appropriate default) and one for the mask. It would go in
np.lib.io
I
2008/12/1 Ryan May [EMAIL PROTECTED]:
I've wondered about this being an issue. On one hand, you hate to make
existing code noticeably slower. On the other hand, if speed is
important to you, why are you using ascii I/O?
More I than O! But I think numpy.fromfile, once fixed up, could
fill
I agree, genloadtxt is a bit bloated, and it's no surprise that it's
slower than the initial one. I think that in order to be fair,
comparisons must be performed with matplotlib.mlab.csv2rec, which
also implements autodetection of the dtype. I'm quite in favor
of keeping a lite version
Stéfan van der Walt wrote:
important to you, why are you using ascii I/O?
ascii I/O is slow, so that's a reason in itself to want it not to be slower!
More I than O! But I think numpy.fromfile, once fixed up, could
fill this niche nicely.
I agree -- for the simple cases, fromfile() could
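For the clean simple-data case, np.fromfile's text mode already sketches what such a no-frills reader looks like (the temp file and its contents are made up for illustration):

```python
import os
import tempfile
import numpy as np

# np.fromfile with a `sep` argument does a fast, bare-bones text read:
# no comments, no missing values, just numbers and a separator.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("1 2 3 4 5 6")
    fname = f.name

a = np.fromfile(fname, sep=' ')
os.unlink(fname)
print(a)  # [1. 2. 3. 4. 5. 6.]
```

Note that fromfile returns a flat array; any reshaping into rows is left to the caller.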
Pierre GM wrote:
Another issue comes from the possibility of defining the dtype
automatically:
Does all that get bypassed if the dtype(s) is specified? Is it still
slow in that case?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR
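As a point of comparison for Chris's question, the current loadtxt with an explicit dtype does no per-column guessing; every value is converted directly (a minimal sketch):

```python
import io
import numpy as np

# An explicit dtype means each token is converted straight to that
# type, with no autodetection pass over the data.
data = io.StringIO("1 2 3\n4 5 6\n")
a = np.loadtxt(data, dtype=np.int64)
print(a.dtype, a.sum())  # int64 21
```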