Re: Any algorithm to preserve whitespaces?
On 1/24/13, Peter Otten __pete...@web.de wrote:
> Santosh Kumar wrote:
>
>> Yes, Peter got it right. Now, how can I replace:
>>
>>     script, givenfile = argv
>>
>> with something better that takes argv[1] as input file as well as reads input from stdin.
>>
>> By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:
>>
>>     Traceback (most recent call last):
>>       File "/home/santosh/bin/capitalizr", line 16, in <module>
>>         script, givenfile = argv
>>     ValueError: need more than 1 value to unpack
>>
>> I want both input methods.
>
> You can use argparse and its FileType:
>
>     import argparse
>     import sys
>
>     parser = argparse.ArgumentParser()
>     parser.add_argument("infile", type=argparse.FileType("r"), nargs="?", default=sys.stdin)
>     args = parser.parse_args()
>
>     for line in args.infile:
>         print line.strip().title()  # replace with your code

This works fine when I do `script.py inputfile.txt`; it capitalizes as expected. But it works unexpectedly if I do `cat inputfile.txt | script.py`; it leaves the first word of each line and then capitalizes the remaining ones.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Any algorithm to preserve whitespaces?
Santosh Kumar wrote:

> On 1/24/13, Peter Otten __pete...@web.de wrote:
>> Santosh Kumar wrote:
>>
>>> Yes, Peter got it right. Now, how can I replace:
>>>
>>>     script, givenfile = argv
>>>
>>> with something better that takes argv[1] as input file as well as reads input from stdin.
>>>
>>> By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:
>>>
>>>     Traceback (most recent call last):
>>>       File "/home/santosh/bin/capitalizr", line 16, in <module>
>>>         script, givenfile = argv
>>>     ValueError: need more than 1 value to unpack
>>>
>>> I want both input methods.
>>
>> You can use argparse and its FileType:
>>
>>     import argparse
>>     import sys
>>
>>     parser = argparse.ArgumentParser()
>>     parser.add_argument("infile", type=argparse.FileType("r"), nargs="?", default=sys.stdin)
>>     args = parser.parse_args()
>>
>>     for line in args.infile:
>>         print line.strip().title()  # replace with your code
>
> This works fine when I do `script.py inputfile.txt`; it capitalizes as expected. But it works unexpectedly if I do `cat inputfile.txt | script.py`; it leaves the first word of each line and then capitalizes the remaining ones.

I cannot reproduce that:

    $ cat title.py
    #!/usr/bin/env python
    import argparse
    import sys

    parser = argparse.ArgumentParser()
    parser.add_argument("infile", type=argparse.FileType("r"), nargs="?", default=sys.stdin)
    args = parser.parse_args()

    for line in args.infile:
        print line.strip().title()  # replace with your code
    $ cat inputfile.txt
    alpha beta gamma delta epsilon zeta
    $ cat inputfile.txt | ./title.py
    Alpha Beta Gamma Delta Epsilon Zeta
    $ ./title.py inputfile.txt
    Alpha Beta Gamma Delta Epsilon Zeta
Re: Any algorithm to preserve whitespaces?
But I can; see: http://pastebin.com/ZGGeZ71r
Re: Any algorithm to preserve whitespaces?
Santosh Kumar wrote:

> But I can; see: http://pastebin.com/ZGGeZ71r

You have messed with your cat command -- it adds line numbers. Therefore the output of

    cat somefile | ./argpa.py

differs from

    ./argpa.py somefile

Try

    ./argpa.py < somefile

to confirm my analysis.

As to why your capitalisation algorithm fails on those augmented lines: the number is separated from the rest of the line by a TAB -- therefore the first word is "1\tthis" and the only candidate to be capitalised is the "1". To fix this you could use regular expressions (which I wanted to avoid initially):

    >>> import re
    >>> parts = re.compile(r"(\s+)").split(" 1\tthis is it")
    >>> parts
    ['', ' ', '1', '\t', 'this', ' ', 'is', ' ', 'it']

Process every other part as you wish and then join all parts:

    >>> parts[::2] = [s.upper() for s in parts[::2]]
    >>> parts
    ['', ' ', '1', '\t', 'THIS', ' ', 'IS', ' ', 'IT']
    >>> print "".join(parts)
     1      THIS IS IT
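For readers following along, the split-and-join idea above can be wrapped into one function. This is only a Python 3 sketch: the name `capitalize_words` is made up for illustration, and upcasing just the first letter of each word (rather than `upper()` as in the interactive session) is an assumption about what the script is meant to do.

```python
import re

# The capturing group makes re.split() keep the whitespace separators.
_WHITESPACE = re.compile(r"(\s+)")

def capitalize_words(line):
    """Upcase the first letter of each word; leave all whitespace untouched."""
    parts = _WHITESPACE.split(line)
    # Even indices hold words (possibly empty strings), odd indices hold whitespace.
    parts[::2] = [word[:1].upper() + word[1:] for word in parts[::2]]
    return "".join(parts)

print(repr(capitalize_words(" 1\tthis  is it")))
```

Because tabs, runs of spaces and the trailing newline all land in the odd-numbered parts, they come back out exactly as they went in.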
Re: Any algorithm to preserve whitespaces?
I am in a problem.

    words = line.split(' ')

preserves whitespace, but the problem is that it writes an additional line after every line.

And:

    words = line.split()

works as I expect (it does not add an additional line after every line) but does not preserve whitespace.
Re: Any algorithm to preserve whitespaces?
Santosh Kumar wrote:

> I am in a problem.
>
>     words = line.split(' ')
>
> preserves whitespace, but the problem is that it writes an additional line after every line.

Strip off the newline at the end of the line with:

    line = line.rstrip("\n")
    words = line.split(" ")
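A small Python 3 illustration of where the extra line comes from, using an invented sample line: `split(" ")` splits on single spaces only, so the newline stays attached to the last word and is printed in addition to the newline that `print` adds.

```python
line = "alpha  beta\n"   # a line as read from a file, newline included

# The newline survives the split, glued to the last word.
print(line.split(" "))               # ['alpha', '', 'beta\n']

# Removing it first leaves clean words; the empty string marks the doubled space.
words = line.rstrip("\n").split(" ")
print(words)                         # ['alpha', '', 'beta']

# Joining with a single space restores the original spacing.
rebuilt = " ".join(word.capitalize() for word in words)
print(rebuilt)                       # Alpha  Beta
```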
Re: Any algorithm to preserve whitespaces?
On 01/23/2013 04:20 AM, Santosh Kumar wrote:

> I am in a problem.
>
>     words = line.split(' ')
>
> preserve whitespaces but the problem is it writes an additional line after every line.

Think about what you said. It might be clearer if you wrote:

    but the problem is it doesn't strip off the newline (which is whitespace).

You might want to fix it by doing an rstrip(), as Peter said, or you might want to check if the last character is \n, and delete it if so. Or you might want to fix the other logic where you use the reconstituted line, making sure it doesn't add an extra newline to a line that already has one.

Best answer depends on whether there might be other whitespace at the end of the line, and on whether you consider the newline part of the last field on the line.

Chances are that Peter's response is the one you want, but I had to point out that without a spec, we're really just guessing. For another example, suppose that some of the words in the file are separated by tabs. If so, perhaps you'd better rethink the whole split logic.

-- DaveA
Re: Any algorithm to preserve whitespaces?
Yes, Peter got it right. Now, how can I replace:

    script, givenfile = argv

with something better that takes argv[1] as input file as well as reads input from stdin.

By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:

    Traceback (most recent call last):
      File "/home/santosh/bin/capitalizr", line 16, in <module>
        script, givenfile = argv
    ValueError: need more than 1 value to unpack

I want both input methods.
Re: Any algorithm to preserve whitespaces?
On 01/23/2013 07:56 AM, Santosh Kumar wrote:

> Yes, Peter got it right. Now, how can I replace:
>
>     script, givenfile = argv
>
> with something better that takes argv[1] as input file as well as reads input from stdin.
>
> By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:
>
>     Traceback (most recent call last):
>       File "/home/santosh/bin/capitalizr", line 16, in <module>
>         script, givenfile = argv
>     ValueError: need more than 1 value to unpack
>
> I want both input methods.

That's up to your program logic to do. Check to see if the arguments have been provided, and if not, open sys.stdin. It's quite common for command-line utilities to do this, but most of them use an explicit parameter '-' to indicate that you want the command to use standard-in.

Again, you can code this any way you want. Personally I use one of the standard library command-line argument parsing modules, like optparse, but there are others that may be better.
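One standard-library option not mentioned in the thread is the `fileinput` module, which already implements this fallback: it reads the files named on the command line, uses standard input when none are given, and treats `-` as standard input. A minimal Python 3 sketch, where the function name `titled_lines` and the use of `title()` as the per-line processing are placeholders:

```python
import fileinput

def titled_lines(files=None):
    """Yield each input line title-cased, without its trailing newline.

    With files=None, fileinput reads the paths in sys.argv[1:], or
    standard input when there are none; a path of "-" also means stdin.
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            yield line.rstrip("\n").title()

# In a script, one would loop over titled_lines() and print each result.
```

This swaps argument parsing for a ready-made helper, so it fits best when the script takes nothing but input file names.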
Re: Any algorithm to preserve whitespaces?
Santosh Kumar wrote:

> Yes, Peter got it right. Now, how can I replace:
>
>     script, givenfile = argv
>
> with something better that takes argv[1] as input file as well as reads input from stdin.
>
> By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:
>
>     Traceback (most recent call last):
>       File "/home/santosh/bin/capitalizr", line 16, in <module>
>         script, givenfile = argv
>     ValueError: need more than 1 value to unpack
>
> I want both input methods.

You can use argparse and its FileType:

    import argparse
    import sys

    parser = argparse.ArgumentParser()
    parser.add_argument("infile", type=argparse.FileType("r"), nargs="?", default=sys.stdin)
    args = parser.parse_args()

    for line in args.infile:
        print line.strip().title()  # replace with your code

As this has the small disadvantage that infile is opened immediately I tend to use a slight variation:

    import argparse
    import sys
    from contextlib import contextmanager

    @contextmanager
    def xopen(filename):
        if filename is None or filename == "-":
            yield sys.stdin
        else:
            with open(filename) as instream:
                yield instream

    parser = argparse.ArgumentParser()
    parser.add_argument("infile", nargs="?")
    args = parser.parse_args()

    with xopen(args.infile) as instream:
        for line in instream:
            print line.strip().title()
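For anyone on Python 3, the second variant might look roughly like the sketch below. Wrapping the logic in a `main(argv=None)` function and using `rstrip("\n")` instead of `strip()` (so leading whitespace survives) are assumptions made here, not part of the original post.

```python
import argparse
import sys
from contextlib import contextmanager

@contextmanager
def xopen(filename):
    """Yield stdin for None or "-", otherwise open the file and close it afterwards."""
    if filename is None or filename == "-":
        yield sys.stdin
    else:
        with open(filename) as instream:
            yield instream

def main(argv=None):
    parser = argparse.ArgumentParser()
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument("infile", nargs="?")
    args = parser.parse_args(argv)
    # The file is only opened here, after argument parsing has succeeded.
    with xopen(args.infile) as instream:
        for line in instream:
            print(line.rstrip("\n").title())  # replace with your code
```

Calling `main()` with no arguments parses `sys.argv[1:]`, so both `script.py inputfile.txt` and `cat inputfile.txt | script.py` go through the same loop.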
Re: Any algorithm to preserve whitespaces?
Santosh Kumar wrote:

> Yes, Peter got it right.

Peter? Which Peter? What's "it" that he got right?

You have deleted all context from your post, so I have no idea what you are talking about. And whatever program you are using to post is stripping out threading information, so I can't tell what post you are replying to.

Please take careful note of the posting conventions used by the experienced regulars on this forum, and copy their style. That is for your benefit as well as ours.

> Now, how can I replace:
>
>     script, givenfile = argv
>
> with something better that takes argv[1] as input file as well as reads input from stdin.
>
> By input from stdin, I mean that currently when I do `cat foo.txt | capitalizr` it throws a ValueError error:
>
>     Traceback (most recent call last):
>       File "/home/santosh/bin/capitalizr", line 16, in <module>
>         script, givenfile = argv
>     ValueError: need more than 1 value to unpack
>
> I want both input methods.

The usual convention in Unix and Linux is that if the file name is "-", read from stdin instead. Something like this, untested:

    givenfile = sys.argv[1]
    if givenfile == '-':
        data = sys.stdin.read()
    else:
        data = open(givenfile).read()

Adding error checking etc. is left as an exercise.

-- Steven
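One possible take on that exercise, as a Python 3 sketch. The function name `read_input` is invented, and two behaviours are assumptions rather than anything stated above: a missing argument is treated the same as "-", and an unreadable file ends the program with a message.

```python
import sys

def read_input(argv):
    """Return the whole input as one string, from a file or from stdin."""
    # Assumption: no file argument at all means "read standard input".
    name = argv[1] if len(argv) > 1 else "-"
    if name == "-":
        return sys.stdin.read()
    try:
        with open(name) as infile:
            return infile.read()
    except OSError as err:
        # Covers a missing file, a directory, or insufficient permissions.
        sys.exit("cannot read {}: {}".format(name, err))
```

A script would call `read_input(sys.argv)`; the `with` block also makes sure the file is closed, which the untested snippet leaves to the garbage collector.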
Re: Any algorithm to preserve whitespaces?
On 01/23/2013 07:49 PM, Steven D'Aprano wrote:

> Santosh Kumar wrote:
>
>> Yes, Peter got it right.
>
> Peter? Which Peter? What's "it" that he got right?
>
> You have deleted all context from your post, so I have no idea what you are talking about.

Right.

> And whatever program you are using to post is stripping out threading information, so I can't tell what post you are replying to.

You're not entirely right here. Santosh's message threads correctly to mine when I look with Thunderbird. And mine is parallel to one by Peter Otten, who suggested rstrip() to get rid of the extra newline.

About 10% of your posts show up as top-level (starting new threads), even though I know you're careful. So there seems to be more than one threading protocol, and the multiple protocols are fighting each other. I'd love to see a spec that I could use to (manually?) check whether the threads are right or not.

The relevant timestamps (at least as seen from USA EST zone) are:

    Santosh at 4:20 am
    Peter Otten at 4:46 am
    DaveA at 5:34 am
    Santosh at 9:56 am
    Steven D'Aprano at 7:49 pm

But your message was a reply to Santosh's 9:56 am message.

(I'm deleting the rest, because I'm not responding to the commandline parsing question)

-- DaveA
Re: Any algorithm to preserve whitespaces?
On 19/01/13 21:13, Santosh Kumar wrote:

> I have a working script which takes argv[1] as an input, disassembles each line, and then each word. Then after it capitalizes all its words (upcases the first letter) and then prints it out on the stdout.
>
> That script does the capitalization work fine, but, when it reassembles the words, it does it like this:
>
>     lines.append(' '.join(words))
>
> The biggest problem is, even when the input file has many spaces, it strips it down to one.

replace:

    words = line.split()

with:

    words = line.split(' ')

> The whole script will look clumsy here. I have put it up on GitHub, here it is: https://github.com/santosh/capitalizr.py/blob/master/capitalizr

In general, when the script is just this short, it's better to put it directly in the message.
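The difference between the two calls, shown on an invented sample line (Python 3): with no argument, `split()` treats any run of whitespace as a single separator, while `split(' ')` splits at every single space and returns empty strings for the extra ones.

```python
line = "This line  contains   many spaces"

# Runs of spaces collapse, so the original spacing is lost after joining.
collapsed = " ".join(word.capitalize() for word in line.split())

# The empty strings stand in for the extra spaces, so joining with " "
# reproduces the original spacing.
preserved = " ".join(word.capitalize() for word in line.split(" "))

print(collapsed)   # This Line Contains Many Spaces
print(preserved)   # This Line  Contains   Many Spaces
```

Note that `split(' ')` only knows about spaces; tabs and the trailing newline stay inside the neighbouring words.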
Re: Any algorithm to preserve whitespaces?
On 01/19/2013 05:13 AM, Santosh Kumar wrote:

> I have a working script which takes argv[1] as an input, disassembles each line, and then each word. Then after it capitalizes all its words (upcases the first letter) and then prints it out on the stdout.
>
> That script does the capitalization work fine, but, when it reassembles the words, it does it like this:
>
>     lines.append(' '.join(words))
>
> The biggest problem is, even when the input file has many spaces, it strips it down to one.
>
> A file with this line:
>
>     This line   contains   many   spaces
>
> becomes:
>
>     This Line Contains Many Spaces
>
> The whole script will look clumsy here. I have put it up on GitHub, here it is: https://github.com/santosh/capitalizr.py/blob/master/capitalizr

You know that mystr.title() can do this?

- m

-- Lark's Tongue Guide to Python: http://lightbird.net/larks/
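A quick Python 3 check of that suggestion on an invented sample line, together with two caveats of `str.title()` that may or may not matter for the script: it starts a new word after every non-letter, apostrophes included, and it lowercases the remaining letters of each word.

```python
line = "This line  contains   many spaces"

# title() leaves all whitespace exactly where it was.
titled = line.title()
print(titled)     # This Line  Contains   Many Spaces

# Caveats: the letter after an apostrophe is upcased, and "NASA" loses its capitals.
caveat = "they're using NASA data".title()
print(caveat)     # They'Re Using Nasa Data
```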