Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On 2/27/08, Travis E. Oliphant [EMAIL PROTECTED] wrote:
> Did this discussion resolve with a fix that can go in before 1.0.5 is released?

I believe the answer is yes, but we have to choose:

1- Use the regexp-based solution of David.

2- Move to use 'index' instead of 'find', as proposed by Alan and implemented by Christopher in example code.

In my view, (1) is more powerful with regard to future improvements, but (2) is far simpler if we are looking for just a fix. Someone will have to decide which approach should be followed.

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
Lisandro Dalcin wrote:
> On 2/27/08, Travis E. Oliphant [EMAIL PROTECTED] wrote:
>> Did this discussion resolve with a fix that can go in before 1.0.5 is released?
>
> I believe the answer is yes, but we have to choose:
>
> 1- Use the regexp-based solution of David.

A good idea, but a feature expansion, and it needs more testing -- not ready for 1.0.5.

> 2- Move to use 'index' instead of 'find' as proposed by Alan and implemented by Christopher in example code.

Robert's committed that, so we're done for now (though I hope to write a test or two soon...)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE  (206) 526-6329   fax
Seattle, WA 98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
I can look at it.

Would everyone be satisfied with a solution using regular expressions? That is, looking for the following pattern:

    pattern = re.compile(r"""
        ^\s*  # leading white space
        (.*)  # Data
        %s?   # Zero or one comment character
        (.*)  # Comments
        \s*$  # Trailing white space
        """ % comments, re.VERBOSE)
    match = pattern.search(line)
    line, comment = match.groups()

instead of

    line = line[:line.find(comments)].strip()

By the way, is there a test function for loadtxt and savetxt? I couldn't find one.

David

2008/2/26, Alan G Isaac [EMAIL PROTECTED]:
> On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
>> I believe the current 'loadtxt' function is broken
>
> I agree:
> URL: http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html
>
> Cheers,
> Alan Isaac
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions?

Maybe it's because regular expressions make me itch, but I think it's overkill for this.

The issue here is a result of what I consider a wart in Python's string methods -- string.find() returns a valid index (-1) when it fails to find anything. The usual way to work with this is to test for it:

    print "test for comment not found:"
    for line in SampleLines:
        i = line.find(comments)
        if i == -1:
            line = line.strip()
        else:
            line = line[:i].strip()
        print line

which does seem like a lot of extra code.

In this case, that wasn't done, as most of the time there is a newline at the end that can be thrown away anyway, so the -1 index is OK. So that inspired the following solution -- just add an extra space every time:

    print "simply pad the line with a space:"
    for line in SampleLines:
        line += " "
        line = line[:(line).find(comments)].strip()
        print line

an extra string creation, but simple.

>     pattern = re.compile(r"""
>         ^\s*  # leading white space
>         (.*)  # Data
>         %s?   # Zero or one comment character
>         (.*)  # Comments
>         \s*$  # Trailing white space
>         """ % comments, re.VERBOSE)

This pattern fails if the last character of the line is a comment character, and if it is a comment-only line, though I'm sure that could be fixed. I still prefer the Python string methods approaches, though.

I've enclosed a little test code that gives these results (the last sample is a comment-only line, so the last line of output is empty where the comment was stripped):

    old way -- this fails with no comment or newline
    1 2 3 4 5
    1 2 3 4
    1 2 3 4 5

    with regular expression:
    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5#
    # 1 2 3 4 5

    simply pad the line with a space:
    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5

    test for comment not found:
    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5

My suggestions work on all my test cases. We really should put these, and others, into a real unit test when this fix is added.

-Chris

    #!/usr/bin/env python
    """
    test of loadtext issue
    """
    import re

    comments = "#"

    SampleLines = ["1 2 3 4 5\n",
                   "1 2 3 4 5",
                   "1 2 3 4 5#",
                   "# 1 2 3 4 5",
                   ]

    #SampleLines = ["a line with a comment # this is the comment",
    #               "# a comment-only line",
    #               "a line with no comment, and no newline",
    #               "a line with a trailing comment character, and no newline#",
    #               ]

    print "old way -- this fails with no comment or newline"
    for line in SampleLines:
        line = line[:line.find(comments)].strip()
        print line

    print "with regular expression:"
    pattern = re.compile(r"""
        ^\s*  # leading white space
        (.*)  # Data
        %s?   # Zero or one comment character
        (.*)  # Comments
        \s*$  # Trailing white space
        """ % comments, re.VERBOSE)
    for line in SampleLines:
        match = pattern.search(line)
        line, comment = match.groups()
        print line

    print "simply pad the line with a space:"
    for line in SampleLines:
        line += " "
        line = line[:(line).find(comments)].strip()
        print line

    print "test for comment not found:"
    for line in SampleLines:
        i = line.find(comments)
        if i == -1:
            line = line.strip()
        else:
            line = line[:i].strip()
        print line
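The -1 behaviour described in this message is easy to reproduce in isolation. The following is a minimal sketch in current Python 3 syntax (the thread itself predates it); the function names `strip_old` and `strip_checked` are made up for illustration and are not taken from NumPy.

```python
comments = "#"
sample_lines = ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5"]

def strip_old(line):
    # Original idiom: find() returns -1 on a miss, so the slice becomes
    # line[:-1] and silently drops the last character of the line.
    return line[:line.find(comments)].strip()

def strip_checked(line):
    # Test explicitly for the "not found" value before slicing.
    i = line.find(comments)
    return line.strip() if i == -1 else line[:i].strip()

old_results = [strip_old(s) for s in sample_lines]
checked_results = [strip_checked(s) for s in sample_lines]
print(old_results)      # the second entry has lost its final "5"
print(checked_results)
```

The old idiom only looks correct when the dropped character happens to be a newline, which is why the bug shows up on the last line of a file that lacks one.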
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
Hi Christopher,

The advantage of using regular expressions is that in this case it gives you some flexibility that wasn't there before. For instance, if for any reason there are two types of characters that coexist in the file to mark comments, using

    pattern = re.compile(comments)
    for i, line in enumerate(fh):
        if i < skiprows:
            continue
        line = pattern.split(line)[0]

can take care of that automatically if comments is a regular expression.

Cheers,

David
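As a rough illustration of the split-based idea, here is a self-contained sketch in Python 3. The generator name `data_lines`, the default pattern `#|%`, and the decision to drop lines that end up empty are assumptions made for this example; they are not part of loadtxt.

```python
import io
import re

def data_lines(fh, comments=r"#|%", skiprows=0):
    """Yield the data part of each line; `comments` is a regular expression."""
    pattern = re.compile(comments)
    for i, line in enumerate(fh):
        if i < skiprows:
            continue
        # Everything before the first comment marker is data.
        line = pattern.split(line)[0].strip()
        if line:  # assumption: skip lines that hold no data
            yield line

# The last line deliberately has no trailing newline.
text = "header to skip\n1 2 3 # hash comment\n% percent-only line\n4 5 6"
rows = list(data_lines(io.StringIO(text), skiprows=1))
print(rows)
```

Because `split` returns the whole line as its first element when nothing matches, this form has no "not found" special case to forget.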
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On Wed, 27 Feb 2008, Christopher Barker wrote:
> The issue here is a result of what I consider a wart in python's string methods -- string.find() returns a valid index (-1) when it fails to find anything.

Use index instead?

Cheers,
Alan Isaac
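For readers unfamiliar with the difference: `str.index` raises `ValueError` when the substring is absent, so a miss cannot be mistaken for a valid position. A minimal sketch of what the suggestion might look like follows; the helper name `strip_comment_index` is hypothetical.

```python
def strip_comment_index(line, comments="#"):
    """Remove a trailing comment, using str.index rather than str.find."""
    try:
        end = line.index(comments)  # raises ValueError when absent
    except ValueError:
        end = len(line)  # no marker: keep the whole line
    return line[:end].strip()

samples = ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5"]
index_results = [strip_comment_index(s) for s in samples]
print(index_results)
```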
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
David Huard wrote:
> The advantage of using regular expressions is that in this case it gives you some flexibility that wasn't there before. For instance, if for any reason there are two types of characters that coexist in the file to mark comments, using
>
>     pattern = re.compile(comments)
>
> can take care of that automatically if comments is a regular expression.

OK -- but loadtxt() doesn't support that now anyway. I'm not writing the code, nor using it at the moment, so it's fine with me either way, but the re should certainly support the examples I gave that don't work now (plus probably others; that's not a comprehensive list of possibilities).

-CHB

> 2008/2/27, Christopher Barker [EMAIL PROTECTED]:
>> This pattern fails if the last character of the line is a comment character, and if it is a comment only line
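One possible reading of why the verbose pattern misbehaves: under `re.VERBOSE` an unescaped `#` starts a pattern comment, so interpolating `#` as the marker removes it from the pattern, and the greedy first group then swallows the whole line. The sketch below is one way a pattern could cover the failing examples, not the fix that was committed; `split_data_and_comment` is a hypothetical name, and it assumes single lines as produced by iterating over a file.

```python
import re

def split_data_and_comment(line, comments="#"):
    """Return (data, comment_text) for one line; `comments` is a literal marker."""
    pattern = re.compile(r"""
        ^\s*          # leading white space
        (.*?)         # data (non-greedy: stops at the first marker)
        \s*           # white space before the marker or end of line
        (?:%s(.*))?   # optional marker, then the comment text
        $
        """ % re.escape(comments), re.VERBOSE)
    data, comment = pattern.match(line).groups()
    return data, comment or ""

samples = ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5"]
regex_results = [split_data_and_comment(s) for s in samples]
print(regex_results)
```

`re.escape` keeps the marker literal, which also gives up the "comments is a regular expression" flexibility discussed above; the two goals pull in different directions.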
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
Well, after all that said, I'm also fine with either approach. Anyway, I would say that my personal preference is for the one using 'str.index', as it is the simplest change relative to the old code.

Like Christopher, I rarely (never?) use 'loadtxt'. But this issue drove a coworker crazy (he is a newbie in python/numpy).

BTW, I'm pretty sure that some time ago Guido agreed to the removal of str.find for Py3k, but it is still there in the py3k repo. Feel free to ask on python-dev if any of you consider it appropriate.

Regards,

--
Lisandro Dalcín
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
Lisandro Dalcin wrote:
> Well, after all that said, I'm also fine with either approach. Anyway, I would say that my personal preference is for the one using 'str.index', as it is the simplest change relative to the old code.
>
> Like Christopher, I rarely (never?) use 'loadtxt'. But this issue drove a coworker crazy (he is a newbie in python/numpy).
>
> BTW, I'm pretty sure that some time ago Guido agreed to the removal of str.find for Py3k, but it is still there in the py3k repo. Feel free to ask on python-dev if any of you consider it appropriate.

Did this discussion resolve with a fix that can go in before 1.0.5 is released?

-Travis O.
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On Wed, Feb 27, 2008 at 4:04 PM, Travis E. Oliphant [EMAIL PROTECTED] wrote:
> Did this discussion resolve with a fix that can go in before 1.0.5 is released?

Fixed in r4827.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
Robert Kern wrote:
> Fixed in r4827.

Thanks Robert. For the record, this is the fixed version:

    comment_start = line.find(comments)
    if comment_start > 0:
        line = line[:comments_start].strip()
    else:
        line = line.strip()

Just as a matter of interest, why this, rather than line.index()? Are exceptions slower than an if test?

Also, I don't see any io tests in:

    numpy/lib/tests

Is that where they should be? It seems like a good idea to have a few...

If I did find the time to write some tests -- how does one go about it for this sort of thing? Do I put a couple of sample input files in SVN? Or does the test code write out the sample files, then read them in to test? Or maybe do it all in memory with StringIO or something.

Are there any examples of tests of file reading code that I could borrow from?

thanks,
-Chris
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker [EMAIL PROTECTED] wrote:
> Robert Kern wrote:
>> Fixed in r4827.
>
> Thanks Robert. For the record, this is the fixed version:
>
>     comment_start = line.find(comments)
>     if comment_start > 0:
>         line = line[:comments_start].strip()
>     else:
>         line = line.strip()
>
> Just as a matter of interest, why this, rather than line.index()? Are exceptions slower than an if test?

Yes.

> Also, I don't see any io tests in:
>
>     numpy/lib/tests
>
> Is that where they should be? It seems like a good idea to have a few...

Yes.

> If I did find the time to write some tests -- how does one go about it for this sort of thing? Do I put a couple sample input files in SVN? Or does the test code write out the sample files, then read them in to test? Or maybe do it all in memory with StringIO or something.

Any of the above, depending on the situation. Use cStringIO if you can. Put files into numpy/lib/tests/data/ otherwise. Locate them using os.path.join(os.path.dirname(__file__), 'data', 'mytestfile.dat'). Write things out at runtime *only* if you use tempfile correctly and are sure you clean up properly after yourself whether the test passes or fails.

> Are there any examples of tests of file reading code that I could borrow from?

numpy/lib/tests/test_format.py

Unfortunately, they have been written for nose, which we haven't moved to yet for numpy itself.

--
Robert Kern
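The two runtime strategies mentioned here (in-memory file objects, and temporary files that are always cleaned up) might look roughly like the sketch below. It is written for Python 3, where `io.StringIO` plays the role that cStringIO played in 2008. `read_rows` is a hypothetical stand-in for the reader under test, not loadtxt itself, and the test names are invented.

```python
import io
import os
import tempfile

def read_rows(fh, comments="#"):
    """Hypothetical reader under test: parse whitespace-separated floats."""
    rows = []
    for line in fh:
        i = line.find(comments)
        if i >= 0:
            line = line[:i]
        line = line.strip()
        if line:
            rows.append([float(x) for x in line.split()])
    return rows

def test_in_memory_no_trailing_newline():
    # No file on disk is needed: StringIO behaves like an open text file.
    fh = io.StringIO("1 2 3\n4 5 6")
    assert read_rows(fh) == [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

def test_with_tempfile():
    fd, path = tempfile.mkstemp(suffix=".dat")
    try:
        with os.fdopen(fd, "w") as out:
            out.write("# header\n1 2 3#")
        with open(path) as fh:
            assert read_rows(fh) == [[1.0, 2.0, 3.0]]
    finally:
        # Runs whether the assertion passes or fails.
        os.remove(path)

test_in_memory_no_trailing_newline()
test_with_tempfile()
```

The `try`/`finally` around the temporary file is what "clean up properly whether the test passes or fails" amounts to in practice.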
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On Wed, 27 Feb 2008, Robert Kern apparently wrote:
> Fixed in r4827.

On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
> For the record, this is the fixed version:
>
>     comment_start = line.find(comments)
>     if comment_start > 0:
>         line = line[:comments_start].strip()
>     else:
>         line = line.strip()

Three problems.

1. I do not see this change here:
   URL:http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
   Am I looking in the wrong place?

2. Can I assume this was not cut and paste? Otherwise, I see two problems.

   2a. comment_start vs. comments_start (spelling)

   2b. > 0 instead of >= 0 (e.g., "#try me!" would not be skipped)

So I think the desired lines are actually::

    comment_start = line.find(comments)
    if comment_start >= 0:
        line = line[:comment_start].strip()
    else:
        line = line.strip()
    return line

Cheers,
Alan Isaac
Re: [Numpy-discussion] loadtxt broken if file does not end in newline
On Thu, Feb 28, 2008 at 12:12 AM, Alan G Isaac [EMAIL PROTECTED] wrote:
> On Wed, 27 Feb 2008, Robert Kern apparently wrote:
>> Fixed in r4827.
>
> On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
>> For the record, this is the fixed version:
>>
>>     comment_start = line.find(comments)
>>     if comment_start > 0:
>>         line = line[:comments_start].strip()
>>     else:
>>         line = line.strip()
>
> Three problems.
>
> 1. I do not see this change here:
>    URL:http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
>    Am I looking in the wrong place?

I fixed the version in numpy/lib/io.py. I didn't know there was a second version lying around. It was moved there in the lib_io branch but did not get removed from numpy/core during the merge.

> 2. Can I assume this was not cut and paste? Otherwise, I see two problems.
>
>    2a. comment_start vs. comments_start (spelling)
>
>    2b. > 0 instead of >= 0 (e.g., "#try me!" would not be skipped)
>
> So I think the desired lines are actually::
>
>     comment_start = line.find(comments)
>     if comment_start >= 0:
>         line = line[:comment_start].strip()
>     else:
>         line = line.strip()
>     return line

The errors were real. They are now fixed, thank you.

--
Robert Kern
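The difference between the two comparisons matters only when the comment marker is the first character of the line. The sketch below contrasts them in Python 3; both function names are made up for the illustration, and the first reproduces only the comparison quoted in the thread, with the variable spelling already corrected.

```python
def strip_comment_gt(line, comments="#"):
    # Comparison as quoted in the thread: a marker at index 0 is missed.
    start = line.find(comments)
    if start > 0:
        return line[:start].strip()
    return line.strip()

def strip_comment_ge(line, comments="#"):
    # Corrected comparison: index 0 counts as "found".
    start = line.find(comments)
    if start >= 0:
        return line[:start].strip()
    return line.strip()

print(repr(strip_comment_gt("#try me!")))  # comment-only line survives intact
print(repr(strip_comment_ge("#try me!")))  # comment-only line becomes empty
```

Both versions agree on lines without a marker and on lines where the marker follows some data, which is why the slip is easy to overlook without a comment-only test case.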