Re: [ActivePython 2.5.1.1] Why does Python not return first line?
Dennis Lee Bieber wrote:
> Teletypes, OTOH, really did use one character to advance the platen by a line, and a second to move the print-head to the left. (and may have needed rub-out characters to act as timing delays while the print-head moved)

I remember writing a printer driver for a terminal which kept the print head's horizontal position in software. The trick to printing fast was to start a line feed, and use horizontal positioning (returns, backspace, space, tab) as part of the time delay required before printing the first actual printing character. Once we had to print a printing character, we used ASCII NULs (though I have seen rub-out used as well) to fill out the rest of the delay. Since we needed eight (or was it twelve?) chars of delay, this driver substantially improved the print speed for our listings.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list
Re: Why does Python not return first line?
On Mar 15, 6:25 pm, Gilles Ganault nos...@nospam.com wrote:
>     address = re_address.search(response)
>     if address:
>         address = address.group(1).strip()
>
>         #Important!
>         for item in ["\t","\r"," <br />"]:
>             address = address.replace(item,"")

As you found, your script works just fine; it's just that during terminal output the \r performs a carriage return, so whatever is printed after it overwrites what came before it on that line.

FWIW, I've rarely seen a \r by itself, even in Windows (where it's usually \r\n). Unix generally just outputs the \n, so my guess is that some other process which created the output removed the newline characters, but didn't account for the carriage return characters first.

Wiping out the \r characters as you did will solve your display issues, though any other code should read right past them.

~G
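To make the carriage-return effect described above concrete, here is a minimal sketch of a toy one-line "terminal". It is an illustration only, not how any real console is implemented: `render_line` and its `tab_width` parameter are hypothetical names, tab stops are assumed to sit every 8 columns, and the code is written for Python 3 (in 2.x, `print` is a statement).

```python
def render_line(text, tab_width=8):
    """Toy model of a terminal showing `text` on a single line (an assumption, not a real console)."""
    cells = []   # characters currently visible on the line
    col = 0      # cursor column
    for ch in text:
        if ch == "\r":
            col = 0                                    # carriage return: cursor to column 0, nothing erased
        elif ch == "\t":
            col = (col // tab_width + 1) * tab_width   # tab: jump to the next tab stop, nothing erased
        else:
            while len(cells) <= col:
                cells.append(" ")                      # pad if the cursor moved past the end of the line
            cells[col] = ch                            # overwrite whatever was in this cell
            col += 1
    return "".join(cells)


print(render_line("abcdef\rXY"))   # XYcdef

address = "3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY"
print(render_line("address is %s" % address))   # second half lands on top of the first
print(repr(address))                            # the string itself is intact
```

The string returned by the regex is complete; only its on-screen rendering is mangled, which is why stripping the \r fixes the display.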
Re: Why does Python not return first line?
On Mon, 16 Mar 2009 15:20:18 -0700, Falcolas wrote:
> FWIW, I've rarely seen a \r by itself, even in Windows (where it's usually \r\n). Unix generally just outputs the \n, so my guess is that some other process which created the output removed newline characters, but didn't account for the carriage return characters first.

\r is the line terminator for classic Mac. (I think OS X uses \n, but presumably Apple applications are smart enough to use either.)

I also remember a software package that allowed you to choose between \r\n and \n\r when exporting data to text files. It's been some years since I've used it -- by memory it was a custom EDI application for a rather large Australian hardware company. (Presumably their developers couldn't remember which came first, the \r or the \n, so they made it optional.)

The Unicode standard specifies that all of the following should be considered line terminators:

LF:    Line Feed, U+000A
CR:    Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL:   Next Line, U+0085
FF:    Form Feed, U+000C
LS:    Line Separator, U+2028
PS:    Paragraph Separator, U+2029

http://en.wikipedia.org/wiki/Newline#Unicode

so presumably if you're getting data from systems other than Windows or Unix, you could find any of these.

Aren't standards wonderful? There are so many to choose from.

--
Steven
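As a quick check of the list above, the following sketch feeds each terminator to `str.splitlines()`. It assumes Python 3, where `str` is Unicode text and `splitlines()` recognises all of these boundaries; the `samples` and `results` names are just for illustration.

```python
# One two-"line" sample string per terminator from the list above.
samples = {
    "LF": "a\nb",
    "CR": "a\rb",
    "CR+LF": "a\r\nb",
    "NEL": "a\x85b",
    "FF": "a\x0cb",
    "LS": "a\u2028b",
    "PS": "a\u2029b",
}

# str.splitlines() splits on every one of these, and treats CR+LF as a single break.
results = {name: text.splitlines() for name, text in samples.items()}

for name, parts in results.items():
    print(name, parts)
```

So for text that may come from an unknown system, `splitlines()` is a safer way to break it into lines than `split("\n")`.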
[ActivePython 2.5.1.1] Why does Python not return first line?
Hello

I'm stuck at why Python doesn't return the first line in this simple regex:

===
import re

response = "<span>Address :</span></td>\r\t\t<td>\r\t\t\t3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY\t\t</td>"

re_address = re.compile('<span>Address :</span></td>.+?<td>(.+?)</td>',re.I | re.S | re.M)

address = re_address.search(response)
if address:
    address = address.group(1).strip()
    print "address is %s" % address
else:
    print "address not found"
===
C:\test.py
London, NW8 9AY<br />
===

Could this be due to the non-printable characters like TAB or ENTER? FWIW, I think that the original web page I'm trying to parse is from a *nix host.

Thanks for any hint.
Re: [ActivePython 2.5.1.1] Why does Python not return first line?
On Mon, 16 Mar 2009 01:14:00 +0100, Gilles Ganault nos...@nospam.com wrote:
> I'm stuck at why Python doesn't return the first line in this simple regex

Found it: Python does extract the token, but displaying it requires removing hidden chars:

=
import re

response = "<span>Address :</span></td>\r\t\t<td>\r\t\t\t3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY\t\t</td>"

re_address = re.compile('<span>Address :</span></td>.+?<td>(.+?)</td>',re.I | re.S | re.M)

address = re_address.search(response)
if address:
    address = address.group(1).strip()

    #Important!
    for item in ["\t","\r"," <br />"]:
        address = address.replace(item,"")

    print "address is %s" % address
else:
    print "address not found"
=

HTH,
Re: [ActivePython 2.5.1.1] Why does Python not return first line?
On Sun, Mar 15, 2009 at 8:14 PM, Gilles Ganault nos...@nospam.com wrote:
> Hello
>
> I'm stuck at why Python doesn't return the first line in this simple regex:
>
> ===
> import re
>
> response = "<span>Address :</span></td>\r\t\t<td>\r\t\t\t3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY\t\t</td>"
>
> re_address = re.compile('<span>Address :</span></td>.+?<td>(.+?)</td>',re.I | re.S | re.M)
>
> address = re_address.search(response)
> if address:
>     address = address.group(1).strip()
>     print "address is %s" % address
> else:
>     print "address not found"
> ===
> C:\test.py
> London, NW8 9AY<br />
> ===
>
> Could this be due to the non-printable characters like TAB or ENTER? FWIW, I think that the original web page I'm trying to parse is from a *nix host.

Actually, the problem is that the only newlines you have in there are Mac OS Classic/Commodore newlines. Windows newlines date back to typewriters. There are two characters in a Windows newline: a carriage return (\r), which returns the cursor to the beginning of the line, and a linefeed (\n), which moves to the next line. I think what's happening is that Windows tries to duplicate the commands from the typewriter: it returns to the beginning of the line at the carriage return, but doesn't move to a new one. The second half of the text overwrites the first half, and you get the problem you're seeing.

The only way I can think of to fix this is to search for any carriage return not followed by a linefeed and add a linefeed in.
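The fix suggested above (find every carriage return that is not followed by a linefeed and add one) can be sketched with a negative lookahead. This is one possible way to do it, not necessarily what the poster had in mind; `add_missing_linefeeds` is a hypothetical helper name, and the snippet is written for Python 3.

```python
import re


def add_missing_linefeeds(text):
    """Turn every lone \\r into \\r\\n, leaving existing \\r\\n pairs untouched."""
    # (?!\n) is a negative lookahead: match \r only when the next character is not \n.
    return re.sub(r"\r(?!\n)", "\r\n", text)


fixed = add_missing_linefeeds("3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY")
print(repr(fixed))
```

Because existing \r\n pairs are skipped, running the function twice gives the same result as running it once.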
Re: Why does Python not return first line?
On Mar 16, 11:25 am, Gilles Ganault nos...@nospam.com wrote:
> On Mon, 16 Mar 2009 01:14:00 +0100, Gilles Ganault nos...@nospam.com wrote:
> > I'm stuck at why Python doesn't return the first line in this simple regex
>
> Found it: Python does extract the token, but displaying it requires removing hidden chars:
>
> =
> response = "<span>Address :</span></td>\r\t\t<td>\r\t\t\t3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY\t\t</td>"
>
> re_address = re.compile('<span>Address :</span></td>.+?<td>(.+?)</td>',re.I | re.S | re.M)
>
> address = re_address.search(response)
> if address:
>     address = address.group(1).strip()

When in doubt, use the repr() function (2.X) or the ascii() function (3.X); it will show you unambiguously exactly what you have in a string; in this case:

    '3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY'

>     #Important!
>     for item in ["\t","\r"," <br />"]:
>         address = address.replace(item,"")
>
>     print "address is %s" % address

and the result is:

    3 Abbey Road, St Johns WoodLondon, NW8 9AY

"WoodLondon" ??

Consider the possibility that whether the webpage originated on *x or not, the author inserted that <br /> with beneficial intent i.e. not just to annoy you. You may wish to replace it with something instead of discarding it.

If you really want the address to look tidy, you could do something like this:

    def norm_space(s):
        return ' '.join(s.split())

    tidy = ", ".join([norm_space(x) for x in address.replace('<br />', ',').strip(' ,').split(',')])

Perhaps the <br /> has even more significance (line break?) than a comma ... in which case you should split the address into lines first, and apply the tidy process to each line.

HTH,
John
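The last suggestion above (split the address into lines at each <br /> first, then tidy each line) might look like the sketch below. Only `norm_space` comes from the message; `address_lines` and the tolerant `<br\s*/?>` pattern are assumptions added for illustration, and the code is written for Python 3.

```python
import re


def norm_space(s):
    """Collapse every run of whitespace (including \\r and \\t) to a single space."""
    return " ".join(s.split())


def address_lines(raw):
    """Split on <br />-style tags, then tidy each resulting line (hypothetical helper)."""
    parts = re.split(r"<br\s*/?>", raw, flags=re.I)       # treat each <br /> as a line break
    lines = [norm_space(p).strip(" ,") for p in parts]    # tidy whitespace and stray commas
    return [line for line in lines if line]               # drop lines that ended up empty


raw = "3 Abbey Road, St Johns Wood <br />\r\t\t\tLondon, NW8 9AY"
for line in address_lines(raw):
    print(line)
```

Keeping the lines separate leaves the caller free to join them with ", " for a one-line display or with "\n" for a postal-style layout.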