Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-28 Thread Lisandro Dalcin
On 2/27/08, Travis E. Oliphant [EMAIL PROTECTED] wrote:
> Did this discussion resolve with a fix that can go in before 1.0.5 is
> released?

I believe the answer is yes, but we have to choose:

1- Use the regexp-based solution of David.

2- Move to use 'index' instead of 'find' as proposed by Alan and
implemented by Christopher in example code.

In my view, (1) leaves more room for future improvements, but
(2) is far simpler if we are just looking for a fix. Someone will have
to decide which approach to follow.






-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-28 Thread Christopher Barker
Lisandro Dalcin wrote:
> On 2/27/08, Travis E. Oliphant [EMAIL PROTECTED] wrote:
>> Did this discussion resolve with a fix that can go in before 1.0.5 is
>> released?
>
> I believe the answer is yes, but we have to choose:
>
> 1- Use the regexp-based solution of David.

A good idea, but it's a feature expansion and needs more testing, so it's
not ready for 1.0.5.

> 2- Move to use 'index' instead of 'find' as proposed by Alan and
> implemented by Christopher in example code.

Robert's committed that, so we're done for now (though I hope to write a 
test or two soon...)

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread David Huard
I can look at it.

Would everyone be satisfied with a solution using regular expressions?
That is, looking for the following pattern:

pattern = re.compile(r"""
    ^\s*  # leading white space
    (.*)  # Data
    %s?   # Zero or one comment character
    (.*)  # Comments
    \s*$  # Trailing white space
    """ % comments, re.VERBOSE)

match = pattern.search(line)
line, comment = match.groups()

instead of

line = line[:line.find(comments)].strip()

By the way, is there a test function for loadtxt and savetxt? I couldn't
find one.


David

2008/2/26, Alan G Isaac [EMAIL PROTECTED]:

> On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
>> I believe the current 'loadtxt' function is broken
>
> I agree:
> URL: http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html
>
> Cheers,
>
> Alan Isaac






Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker

David Huard wrote:

> Would everyone be satisfied with a solution using regular expressions?


Maybe it's because regular expressions make me itch, but I think it's 
overkill for this.


The issue here is a result of what I consider a wart in Python's string
methods -- string.find() returns a valid index (-1) when it fails to
find anything. The usual way to work with this is to test for it:


print("test for comment not found:")
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print(line)

which does seem like a lot of extra code.
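To make the wart concrete: when the comment character is absent, find() returns -1, so the old one-liner slices with line[:-1] and silently drops the last character, which is where the "1 2 3 4" in the test output comes from. A minimal sketch contrasting the two, using hypothetical helper names (old_strip, strip_comment) that are not part of loadtxt:

```python
def old_strip(line, comments="#"):
    # The original one-liner. If `comments` is absent, find() returns -1,
    # the slice becomes line[:-1], and the last character is lost.
    return line[:line.find(comments)].strip()


def strip_comment(line, comments="#"):
    # Test for "not found" explicitly before slicing.
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


for sample in ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", " # 1 2 3 4 5"]:
    print(repr(old_strip(sample)), repr(strip_comment(sample)))
```

Only the second sample, with neither a comment nor a newline, differs between the two.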

In this case, that wasn't done, as most of the time there is a newline
at the end that can be thrown away anyway, so the -1 index is OK. That
inspired the following solution, which just adds an extra space every time:


print("simply pad the line with a space:")
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print(line)

It costs an extra string creation, but it's simple.


> pattern = re.compile(r"""
>     ^\s*  # leading white space
>     (.*)  # Data
>     %s?   # Zero or one comment character
>     (.*)  # Comments
>     \s*$  # Trailing white space
>     """ % comments, re.VERBOSE)


This pattern fails if the last character of the line is a comment
character, and if it is a comment-only line, though I'm sure that could
be fixed. I still prefer the Python string-method approaches, though.
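A likely cause of those failures: under re.VERBOSE, an unescaped # starts a pattern comment, so once %s is filled in with "#", the whole line `#?  # Zero or one comment character` is ignored and the pattern reduces to `^\s*(.*)(.*)\s*$`. That strips leading white space but never removes a comment, which matches the regex results in the test output for "1 2 3 4 5#" and " # 1 2 3 4 5". A minimal sketch of a repaired pattern follows. It assumes one line per call, and make_comment_pattern and regex_strip_comment are hypothetical names. It escapes the comment string with re.escape and makes the data group non-greedy so it stops at the first comment character.

```python
import re


def make_comment_pattern(comments="#"):
    # re.escape turns "#" into "\#", so VERBOSE mode reads it as a literal
    # character instead of the start of a pattern comment.
    return re.compile(r"""
        ^\s*          # leading white space
        (.*?)         # data (non-greedy, so it stops at the first comment)
        \s*           # white space before the comment or end of line
        (?:%s(.*))?   # optional comment character followed by the comment
        $             # end of string (or just before a final newline)
        """ % re.escape(comments), re.VERBOSE)


_DEFAULT_PATTERN = make_comment_pattern()


def regex_strip_comment(line, pattern=_DEFAULT_PATTERN):
    # Assumes `line` is a single line, optionally ending in a newline.
    return pattern.match(line).group(1)


for sample in ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", " # 1 2 3 4 5"]:
    print(repr(regex_strip_comment(sample)))
```

Whether this is worth it over the string-method versions is the judgment call the thread is debating; the regex mainly pays off if the comment marker itself needs to be a pattern.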


I've enclosed a little test script that gives these results:

old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5

with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

My suggestions work on all my test cases. We really should put these, 
and others, into a real unit test when this fix is added.


-Chris

#!/usr/bin/env python

"""
test of loadtxt issue
"""

import re

comments = "#"

SampleLines = ["1 2 3 4 5\n",
               "1 2 3 4 5",
               "1 2 3 4 5#",
               " # 1 2 3 4 5",
               ]


#SampleLines = ["a line with a comment # this is the comment",
#               "# a comment-only line",
#               "a line with no comment, and no newline",
#               "a line with a trailing comment character, and no newline#",
#               ]

print("old way -- this fails with no comment or newline")
for line in SampleLines:
    line = line[:line.find(comments)].strip()
    print(line)

print("with regular expression:")
pattern = re.compile(r"""
    ^\s*  # leading white space
    (.*)  # Data
    %s?   # Zero or one comment character
    (.*)  # Comments
    \s*$  # Trailing white space
    """ % comments, re.VERBOSE)

for line in SampleLines:
    match = pattern.search(line)
    line, comment = match.groups()
    print(line)

print("simply pad the line with a space:")
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print(line)

print("test for comment not found:")
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print(line)



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread David Huard
Hi Christopher,

The advantage of using regular expressions is that in this case it gives you
some flexibility that wasn't there before. For instance, if for any reason
there are two types of characters that coexist in the file to mark comments,
using

pattern = re.compile(comments)
for i, line in enumerate(fh):
    if i < skiprows:
        continue
    line = pattern.split(line)[0]

can take care of that automatically if comments is a regular expression.
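For example, assuming a file that mixes "#" and "%" comment markers (a hypothetical case; as Chris notes in the thread, loadtxt() does not support this), a single character class handles both. strip_any_comment and parse_lines are illustrative names, not loadtxt's API; the loop mirrors the skiprows shape of the snippet above and additionally strips white space and drops empty lines.

```python
import re

# Hypothetical: treat either "#" or "%" as the start of a comment.
comment_pattern = re.compile(r"[#%]")


def strip_any_comment(line, pattern=comment_pattern):
    # maxsplit=1: only the text before the first marker matters.
    return pattern.split(line, maxsplit=1)[0].strip()


def parse_lines(lines, skiprows=0, pattern=comment_pattern):
    # Skip header rows, strip comments, drop blank lines,
    # and split what remains on white space.
    rows = []
    for i, line in enumerate(lines):
        if i < skiprows:
            continue
        line = strip_any_comment(line, pattern)
        if line:
            rows.append(line.split())
    return rows
```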

Cheers,

David








Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Alan Isaac
On Wed, 27 Feb 2008, Christopher Barker wrote:
> The issue here is a result of what I consider a wart in Python's string
> methods -- string.find() returns a valid index (-1) when
> it fails to find anything.

Use index instead?
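str.index() behaves like str.find() but raises ValueError instead of returning -1, so a missing comment character can no longer pass for a valid slice index. A minimal sketch of that approach; strip_comment_index is a hypothetical helper name:

```python
def strip_comment_index(line, comments="#"):
    try:
        # index() raises ValueError when `comments` is absent,
        # instead of returning -1 like find().
        line = line[:line.index(comments)]
    except ValueError:
        pass  # no comment on this line: keep it whole
    return line.strip()
```

Note that the most common case in a data file, a line with no comment at all, is the one that takes the exception path, which bears on the speed question Chris raises about index() versus an if test.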

Cheers,
Alan Isaac






Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker
David Huard wrote:
> The advantage of using regular expressions is that in this case it gives
> you some flexibility that wasn't there before. For instance, if for any
> reason there are two types of characters that coexist in the file to mark
> comments, using
>
> pattern = re.compile(comments)
> can take care of that automatically if comments is a regular expression.

OK, but loadtxt() doesn't support that now anyway. I'm not writing the
code, nor using it at the moment, so it's fine with me either way, but
the re should certainly support the examples I gave that don't work now
(plus probably others; that's not a comprehensive list of possibilities).

-CHB

> 2008/2/27, Christopher Barker [EMAIL PROTECTED]
>
>> This pattern fails if the last character of the line is a comment
>> character, and if it is a comment-only line



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Lisandro Dalcin
Well, after all that's been said, I'm also fine with either approach. Anyway,
I would say that my personal preference is for the one using
'str.index', as it is the simplest change relative to the old code.

Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
drove a coworker crazy (he is a newbie to Python/NumPy).

BTW, I'm pretty sure that some time ago Guido agreed to the removal
of str.find in Py3k, but it is still there in the py3k repo. Feel free to
ask on python-dev if any of you consider it appropriate.

Regards,





-- 
Lisandro Dalcín


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Travis E. Oliphant
Lisandro Dalcin wrote:
> Well, after all that's been said, I'm also fine with either approach. Anyway,
> I would say that my personal preference is for the one using
> 'str.index', as it is the simplest change relative to the old code.
>
> Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
> drove a coworker crazy (he is a newbie to Python/NumPy).
>
> BTW, I'm pretty sure that some time ago Guido agreed to the removal
> of str.find in Py3k, but it is still there in the py3k repo. Feel free to
> ask on python-dev if any of you consider it appropriate.


Did this discussion resolve with a fix that can go in before 1.0.5 is 
released?

-Travis O.



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Wed, Feb 27, 2008 at 4:04 PM, Travis E. Oliphant
[EMAIL PROTECTED] wrote:
> Did this discussion resolve with a fix that can go in before 1.0.5 is
> released?

Fixed in r4827.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker
Robert Kern wrote:
> Fixed in r4827.

Thanks Robert. For the record, this is the fixed version:

comment_start = line.find(comments)
if comment_start > 0:
    line = line[:comments_start].strip()
else:
    line = line.strip()

Just as a matter of interest, why this, rather than line.index()? Are 
exceptions slower than an if test?

Also,

I don't see any io tests in:

numpy/lib/tests

Is that where they should be? It seems like a good idea to have a few...

If I did find the time to write some tests, how does one go about it
for this sort of thing? Do I put a couple of sample input files in SVN? Or
does the test code write out the sample files, then read them in to
test? Or maybe do it all in memory with cStringIO or something? Are
there any examples of tests of file-reading code that I could borrow from?

thanks,
-Chris







Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker
[EMAIL PROTECTED] wrote:
> Robert Kern wrote:
>> Fixed in r4827.
>
> Thanks Robert. For the record, this is the fixed version:
>
> comment_start = line.find(comments)
> if comment_start > 0:
>     line = line[:comments_start].strip()
> else:
>     line = line.strip()
>
> Just as a matter of interest, why this, rather than line.index()? Are
> exceptions slower than an if test?

Yes.
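A rough way to check this on your own machine is to time both versions with timeit on a line without a comment, which is the case that raises inside the index() version. Timings depend on the Python version and hardware, so no numbers are given here; with_find and with_index are hypothetical names.

```python
import timeit


def with_find(line, comments="#"):
    # if-test version: no exception is ever raised.
    i = line.find(comments)
    if i >= 0:
        line = line[:i]
    return line.strip()


def with_index(line, comments="#"):
    # try/except version: raises and catches ValueError when no comment.
    try:
        line = line[:line.index(comments)]
    except ValueError:
        pass
    return line.strip()


line = "1.0 2.0 3.0 4.0 5.0\n"  # no comment: index() raises every time
for func in (with_find, with_index):
    seconds = timeit.timeit(lambda: func(line), number=100000)
    print(func.__name__, round(seconds, 4))
```

On a line that does contain a comment neither version raises, so this comparison mainly measures the cost of raising and catching ValueError.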

> Also,
>
> I don't see any io tests in:
>
> numpy/lib/tests
>
> Is that where they should be? It seems like a good idea to have a few...

Yes.

> If I did find the time to write some tests, how does one go about it
> for this sort of thing? Do I put a couple of sample input files in SVN? Or
> does the test code write out the sample files, then read them in to
> test? Or maybe do it all in memory with cStringIO or something?

Any of the above, depending on the situation. Use cStringIO if you can.
Put files into numpy/lib/tests/data/ otherwise. Locate them using
os.path.join(os.path.dirname(__file__), 'data', 'mytestfile.dat').
Write things out at runtime *only* if you use tempfile correctly and
are sure you clean up properly after yourself whether the test passes
or fails.
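A minimal sketch of the in-memory and temporary-file styles, written for Python 3 (io.StringIO is the modern replacement for cStringIO) and using a hypothetical read_rows parser as a stand-in for loadtxt so the example runs without NumPy. tempfile.TemporaryDirectory removes the file whether the test passes or fails.

```python
import io
import os
import tempfile
import unittest


def read_rows(fh, comments="#"):
    # Hypothetical stand-in for loadtxt's line handling: drop comments,
    # skip blank lines, and convert the remaining fields to floats.
    rows = []
    for line in fh:
        i = line.find(comments)
        if i >= 0:
            line = line[:i]
        line = line.strip()
        if line:
            rows.append([float(x) for x in line.split()])
    return rows


# Covers the cases from this thread, including a missing final newline.
SAMPLE = "1 2 3 4 5\n# comment-only line\n1 2 3 4 5#\n1 2 3 4 5"
EXPECTED = [[1.0, 2.0, 3.0, 4.0, 5.0]] * 3


class TestReadRows(unittest.TestCase):
    def test_in_memory(self):
        # Preferred: everything stays in memory, nothing to clean up.
        self.assertEqual(read_rows(io.StringIO(SAMPLE)), EXPECTED)

    def test_temporary_file(self):
        # TemporaryDirectory deletes the directory and its contents when
        # the with-block exits, whether the assertion passes or fails.
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "sample.txt")
            with open(path, "w") as fh:
                fh.write(SAMPLE)
            with open(path) as fh:
                self.assertEqual(read_rows(fh), EXPECTED)
```

Saved as, say, test_readrows.py, it runs with the standard runner: `python -m unittest test_readrows`.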

> Are there any examples of tests of file-reading code that I could borrow from?

numpy/lib/tests/test_format.py

Unfortunately, they have been written for nose, which we haven't moved
numpy itself to yet.

-- 
Robert Kern



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Alan G Isaac
On Wed, 27 Feb 2008, Robert Kern apparently wrote:
> Fixed in r4827.

On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
> For the record, this is the fixed version:
> comment_start = line.find(comments)
> if comment_start > 0:
>     line = line[:comments_start].strip()
> else:
>     line = line.strip()


Three problems.
1. I do not see this change here: 
URL:http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
Am I looking in the wrong place?

2. Can I assume this was not cut and pasted?
Otherwise, I see two problems.

2a.  comment_start vs. comments_start (spelling)
2b.  > 0 instead of >= 0   (e.g., "#try me!" would not be skipped)

So I think the desired lines are actually::

comment_start = line.find(comments)
if comment_start >= 0:
    line = line[:comment_start].strip()
else:
    line = line.strip()
return line
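To see problem 2b concretely: a comment in the first column has find() index 0, which the committed `> 0` test treats as "no comment". The sketch below wraps both versions in hypothetical helper functions. The comments_start spelling (problem 2a) is corrected in both, since with the typo the committed version would likely fail with a NameError on any line with a comment after column 0.

```python
def strip_committed(line, comments="#"):
    # r4827 logic (spelling corrected): index 0 is treated like "not found".
    comment_start = line.find(comments)
    if comment_start > 0:
        line = line[:comment_start].strip()
    else:
        line = line.strip()
    return line


def strip_fixed(line, comments="#"):
    # Corrected test: any non-negative index marks a comment.
    comment_start = line.find(comments)
    if comment_start >= 0:
        line = line[:comment_start].strip()
    else:
        line = line.strip()
    return line


print(repr(strip_committed("#try me!")))
print(repr(strip_fixed("#try me!")))
```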

Cheers,
Alan Isaac





Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Thu, Feb 28, 2008 at 12:12 AM, Alan G Isaac [EMAIL PROTECTED] wrote:
> On Wed, 27 Feb 2008, Robert Kern apparently wrote:
>> Fixed in r4827.
>
> On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
>> For the record, this is the fixed version:
>> comment_start = line.find(comments)
>> if comment_start > 0:
>>     line = line[:comments_start].strip()
>> else:
>>     line = line.strip()
>
> Three problems.
> 1. I do not see this change here:
> URL:http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
> Am I looking in the wrong place?

I fixed the version in numpy/lib/io.py. I didn't know there was a
second version lying around. It was moved there in the lib_io
branch but was not removed from numpy/core during the merge.

> 2. Can I assume this was not cut and pasted?
> Otherwise, I see two problems.
>
> 2a.  comment_start vs. comments_start (spelling)
> 2b.  > 0 instead of >= 0   (e.g., "#try me!" would not be skipped)
>
> So I think the desired lines are actually::
>
> comment_start = line.find(comments)
> if comment_start >= 0:
>     line = line[:comment_start].strip()
> else:
>     line = line.strip()
> return line

The errors were real. They are now fixed, thank you.

-- 
Robert Kern
