Re: Any algorithm to preserve whitespaces?

2013-01-24 Thread Santosh Kumar
On 1/24/13, Peter Otten __pete...@web.de wrote:
 Santosh Kumar wrote:

 Yes, Peter got it right.

 Now, how can I replace:

 script, givenfile = argv

 with something better that takes argv[1] as input file as well as
 reads input from stdin.

 By input from stdin, I mean that currently when I do `cat foo.txt |
 capitalizr` it raises a ValueError:

 Traceback (most recent call last):
   File "/home/santosh/bin/capitalizr", line 16, in <module>
     script, givenfile = argv
 ValueError: need more than 1 value to unpack

 I want both input methods.

 You can use argparse and its FileType:

 import argparse
 import sys

 parser = argparse.ArgumentParser()
 parser.add_argument("infile", type=argparse.FileType("r"), nargs="?",
                     default=sys.stdin)
 args = parser.parse_args()

 for line in args.infile:
     print line.strip().title() # replace with your code


This works fine when I run `script.py inputfile.txt`; it capitalizes as
expected. But it behaves unexpectedly when I run `cat inputfile.txt |
script.py`: it leaves the first word of each line alone and capitalizes
the rest.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Any algorithm to preserve whitespaces?

2013-01-24 Thread Peter Otten
Santosh Kumar wrote:

 
 This works fine when I run `script.py inputfile.txt`; it capitalizes as
 expected. But it behaves unexpectedly when I run `cat inputfile.txt |
 script.py`: it leaves the first word of each line alone and capitalizes
 the rest.

I cannot reproduce that:

$ cat title.py 
#!/usr/bin/env python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("infile", type=argparse.FileType("r"), nargs="?",
                    default=sys.stdin)
args = parser.parse_args()

for line in args.infile:
    print line.strip().title() # replace with your code
$ cat inputfile.txt 
alpha beta
gamma delta epsilon
zeta
$ cat inputfile.txt | ./title.py 
Alpha Beta
Gamma Delta Epsilon
Zeta
$ ./title.py inputfile.txt 
Alpha Beta
Gamma Delta Epsilon
Zeta




Re: Any algorithm to preserve whitespaces?

2013-01-24 Thread Santosh Kumar
But I can; see: http://pastebin.com/ZGGeZ71r



Re: Any algorithm to preserve whitespaces?

2013-01-24 Thread Peter Otten
Santosh Kumar wrote:

 But I can; see: http://pastebin.com/ZGGeZ71r

You have messed with your cat command -- it adds line numbers.
Therefore the output of

cat somefile | ./argpa.py

differs from

./argpa.py somefile

Try

./argpa.py < somefile

to confirm my analysis. As to why your capitalisation algorithm fails on
those augmented lines: the number is separated from the rest of the line by
a TAB -- therefore the first word is "1\tthis" and the only candidate to be
capitalised is the "1". To fix this you could use regular expressions (which
I wanted to avoid initially):

>>> import re
>>> parts = re.compile(r"(\s+)").split(" 1\tthis is it")
>>> parts
['', ' ', '1', '\t', 'this', ' ', 'is', ' ', 'it']

Process every other part as you wish and then join all parts:

>>> parts[::2] = [s.upper() for s in parts[::2]]
>>> parts
['', ' ', '1', '\t', 'THIS', ' ', 'IS', ' ', 'IT']
>>> print "".join(parts)
 1  THIS IS IT
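
A minimal sketch of how the split-and-join idea above could be wrapped into a function, assuming Python 3. The name capitalize_words is hypothetical, and it upcases only the first letter of each word (the goal stated elsewhere in the thread) rather than calling upper() on the whole word as the demonstration does:

```python
import re

# The capturing group makes re.split() return the whitespace runs as well,
# so words sit at even indices and separators at odd indices.
_WHITESPACE = re.compile(r"(\s+)")


def capitalize_words(line):
    """Upcase the first letter of every word, leaving all whitespace untouched."""
    parts = _WHITESPACE.split(line)
    # word[:1] is "" for the empty strings that appear at the edges,
    # so those pass through unchanged.
    parts[::2] = [word[:1].upper() + word[1:] for word in parts[::2]]
    return "".join(parts)
```

Because the whitespace runs are captured and joined back unchanged, tabs, repeated spaces and the trailing newline all survive.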




Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Santosh Kumar
I have a problem.

words = line.split(' ')

preserves whitespace, but the problem is that it writes an additional line
after every line.


And:

words = line.split()

works as I expect (it does not add an additional line after every line) but
does not preserve whitespace.


Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Peter Otten
Santosh Kumar wrote:

 I have a problem.
 
 words = line.split(' ')
 
 preserves whitespace, but the problem is that it writes an additional line
 after every line.

Strip off the newline at the end of the line with:

line = line.rstrip("\n")
words = line.split(" ")
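
To see why the newline has to go first, here is a small sketch (Python 3; capitalize_line is a made-up name, not taken from capitalizr). split(" ") keeps an empty string for every extra space, so joining on " " restores the original spacing, but it also leaves the newline glued to the last word unless it is stripped:

```python
def capitalize_line(line):
    """Upcase the first letter of each space-separated word, keeping runs of spaces."""
    line = line.rstrip("\n")   # drop the line ending so it is not printed twice
    words = line.split(" ")    # "a  b" -> ["a", "", "b"]; "" marks each extra space
    return " ".join(word[:1].upper() + word[1:] for word in words)


# Without the rstrip, the newline stays attached to the last word:
with_newline = "two  spaces\n".split(" ")
# split() with no argument drops the newline but also collapses the spacing:
collapsed = "two  spaces\n".split()
```

The caller then adds the line ending back exactly once, for example by letting print() supply it.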





Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Dave Angel

On 01/23/2013 04:20 AM, Santosh Kumar wrote:

 I have a problem.

 words = line.split(' ')

 preserves whitespace, but the problem is that it writes an additional line
 after every line.



Think about what you said.  It might be clearer if you wrote:

but the problem is it doesn't strip off the newline (which is whitespace).

You might want to fix it by doing an rstrip(), as Peter said, or you
might want to check if the last character is "\n", and delete it if so.


Or you might want to fix the other logic where you use the reconstituted 
line, making sure it doesn't add an extra newline to a line that already 
has one.


Best answer depends on whether there might be other whitespace at the 
end of the line, and on whether you consider the newline part of the 
last field on the line.


Chances are that Peter's response is the one you want, but I had to 
point out that without a spec, we're really just guessing.  For another 
example, suppose that some of the words in the file are separated by 
tabs.  If so, perhaps you'd better rethink the whole split logic.
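
Two of the points above, sketched in Python 3 under the assumption that the line ending should be carried separately rather than thrown away. split_line_ending is a hypothetical helper; the last two lines only show that a tab is not a separator for split(" "):

```python
def split_line_ending(line):
    """Return (body, ending), where ending is CRLF, LF or the empty string."""
    for ending in ("\r\n", "\n"):
        if line.endswith(ending):
            return line[:-len(ending)], ending
    return line, ""


body, ending = split_line_ending("alpha\tbeta gamma\n")
# "alpha\tbeta" comes back as a single word because only " " separates.
words = body.split(" ")
```

Whether the ending is re-attached afterwards or left to print() is exactly the kind of decision that depends on the missing spec.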





--
DaveA


Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Santosh Kumar
Yes, Peter got it right.

Now, how can I replace:

script, givenfile = argv

with something better that takes argv[1] as input file as well as
reads input from stdin.

By input from stdin, I mean that currently when I do `cat foo.txt |
capitalizr` it raises a ValueError:

Traceback (most recent call last):
  File "/home/santosh/bin/capitalizr", line 16, in <module>
    script, givenfile = argv
ValueError: need more than 1 value to unpack

I want both input methods.


Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Michael Torrie
On 01/23/2013 07:56 AM, Santosh Kumar wrote:
 Yes, Peter got it right.
 
 Now, how can I replace:
 
 script, givenfile = argv
 
 with something better that takes argv[1] as input file as well as
 reads input from stdin.
 
 By input from stdin, I mean that currently when I do `cat foo.txt |
 capitalizr` it raises a ValueError:
 
 Traceback (most recent call last):
   File "/home/santosh/bin/capitalizr", line 16, in <module>
     script, givenfile = argv
 ValueError: need more than 1 value to unpack
 
 I want both input methods.

That's up to your program logic to do.  Check to see if the arguments
have been provided, and if not, read from sys.stdin.  It's quite common for
command-line utilities to do this, but most of them use an explicit
parameter '-' to indicate that you want the command to use standard input.
Again, you can code this any way you want.  Personally I use one of the
standard library command-line argument parsing modules, like optparse,
but there are others that may be better.
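
A minimal sketch of that check without any argument-parsing module, assuming Python 3. The function name open_input and its stdin parameter are made up for illustration:

```python
import sys


def open_input(argv, stdin=None):
    """Return the stream to read: the file named by argv[1], or stdin.

    stdin is used when no file name is given or when the name is "-".
    """
    if stdin is None:
        stdin = sys.stdin
    if len(argv) < 2 or argv[1] == "-":
        return stdin
    return open(argv[1])
```

In a script this would be called as open_input(sys.argv); the caller is responsible for closing a real file when it is done with it.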



Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Peter Otten
Santosh Kumar wrote:

 Yes, Peter got it right.
 
 Now, how can I replace:
 
 script, givenfile = argv
 
 with something better that takes argv[1] as input file as well as
 reads input from stdin.
 
 By input from stdin, I mean that currently when I do `cat foo.txt |
 capitalizr` it raises a ValueError:
 
 Traceback (most recent call last):
   File "/home/santosh/bin/capitalizr", line 16, in <module>
     script, givenfile = argv
 ValueError: need more than 1 value to unpack
 
 I want both input methods.

You can use argparse and its FileType:

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("infile", type=argparse.FileType("r"), nargs="?",
                    default=sys.stdin)
args = parser.parse_args()

for line in args.infile:
    print line.strip().title() # replace with your code


As this has the small disadvantage that infile is opened immediately I tend 
to use a slight variation:

import argparse
import sys
from contextlib import contextmanager

@contextmanager
def xopen(filename):
    if filename is None or filename == "-":
        yield sys.stdin
    else:
        with open(filename) as instream:
            yield instream

parser = argparse.ArgumentParser()
parser.add_argument("infile", nargs="?")
args = parser.parse_args()

with xopen(args.infile) as instream:
    for line in instream:
        print line.strip().title()
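
For readers on Python 3, where print is a function, the same idea might look like the sketch below. xopen is kept from the message above; title_lines and main are hypothetical names added so that nothing runs at import time:

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def xopen(filename):
    """Yield sys.stdin for None or "-", otherwise a file that is closed on exit."""
    if filename is None or filename == "-":
        yield sys.stdin
    else:
        with open(filename) as instream:
            yield instream


def title_lines(instream):
    """Return the stripped, title-cased lines of instream as a list."""
    return [line.strip().title() for line in instream]


def main(argv=None):
    """Parse the optional infile argument and print the converted lines."""
    parser = argparse.ArgumentParser()
    parser.add_argument("infile", nargs="?")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    with xopen(args.infile) as instream:
        for line in title_lines(instream):
            print(line)
```

The file is only opened once the with block is entered, which is the point of the variation: argument parsing can fail or print help without touching the file.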





Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Steven D'Aprano
Santosh Kumar wrote:

 Yes, Peter got it right.

Peter? Which Peter? What's it that he got right?

You have deleted all context from your post, so I have no idea what you are
talking about. And whatever program you are using to post is stripping out
threading information, so I can't tell what post you are replying to.

Please take careful note of the posting conventions used by the experienced
regulars on this forum, and copy their style. That is for your benefit as
well as ours.


 Now, how can I replace:
 
 script, givenfile = argv
 
 with something better that takes argv[1] as input file as well as
 reads input from stdin.
 
 By input from stdin, I mean that currently when I do `cat foo.txt |
 capitalizr` it raises a ValueError:
 
 Traceback (most recent call last):
   File "/home/santosh/bin/capitalizr", line 16, in <module>
     script, givenfile = argv
 ValueError: need more than 1 value to unpack
 
 I want both input methods.

The usual convention in Unix and Linux is that if the file name is "-", read
from stdin instead. Something like this, untested:


import sys

givenfile = sys.argv[1]
if givenfile == '-':
    data = sys.stdin.read()
else:
    data = open(givenfile).read()


Adding error checking etc. is left as an exercise.




-- 
Steven



Re: Any algorithm to preserve whitespaces?

2013-01-23 Thread Dave Angel

On 01/23/2013 07:49 PM, Steven D'Aprano wrote:

 Santosh Kumar wrote:

 Yes, Peter got it right.

 Peter? Which Peter? What's it that he got right?

 You have deleted all context from your post, so I have no idea what you are
 talking about.

Right.

 And whatever program you are using to post is stripping out
 threading information, so I can't tell what post you are replying to.


You're not entirely right here.  Santosh's message threads correctly to
mine when I look with Thunderbird. And mine is parallel to one by Peter
Otten, who suggested rstrip() to get rid of the extra newline.  About
10% of your posts show up as top-level (starting new threads), even
though I know you're careful.  So there seems to be more than one
threading protocol, and the multiple protocols are fighting each other.
I'd love to see a spec that I could use to (manually?) check whether
the threads are right or not.


The relevant timestamps (at least as seen from the USA EST zone) are
Santosh at 4:20 am
   Peter Otten at 4:46 am
   DaveA  at 5:34 am
  Santosh at 9:56 am
Steven D'Aprano at 7:49 pm

But your message was a reply to Santosh's 9:56 am message.

(I'm deleting the rest, because I'm not responding to the commandline 
parsing question)





--
DaveA


Re: Any algorithm to preserve whitespaces?

2013-01-19 Thread Lie Ryan

On 19/01/13 21:13, Santosh Kumar wrote:

 I have a working script which takes argv[1] as an input, disassembles
 each line, and then each word. It then capitalizes each word
 (upcases the first letter) and prints it out on stdout.

 That script does the capitalization work fine, but when it reassembles
 the words, it does it like this:

 lines.append(' '.join(words))

 The biggest problem is, even when the input file has many spaces, it
 strips them down to one.


replace:

words = line.split()

with:
words = line.split(' ')

 The whole script will look clumsy here. I have put it up on GitHub;
 here it is:
 https://github.com/santosh/capitalizr.py/blob/master/capitalizr


In general, when the script is just this short, it's better to put it
directly in the message.




Re: Any algorithm to preserve whitespaces?

2013-01-19 Thread Mitya Sirenef

On 01/19/2013 05:13 AM, Santosh Kumar wrote:

 I have a working script which takes argv[1] as an input, disassembles
 each line, and then each word. It then capitalizes each word
 (upcases the first letter) and prints it out on stdout.

 That script does the capitalization work fine, but when it reassembles
 the words, it does it like this:

 lines.append(' '.join(words))

 The biggest problem is, even when the input file has many spaces, it
 strips them down to one.

 A file with this line:

 This line contains    many   spaces

 becomes:

 This Line Contains Many Spaces


 The whole script will look clumsy here. I have put it up on GitHub;
 here it is: https://github.com/santosh/capitalizr.py/blob/master/capitalizr


You know that mystr.title() can do this?

 - m
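
For reference, a quick Python 3 check of what str.title() does with a line like the one in the example (the exact number of spaces here is an assumption). It keeps the whitespace as it is, but note two side effects that may or may not matter for this script: it lowercases the rest of each word and treats an apostrophe as a word boundary:

```python
line = "this line contains    many   spaces"
titled = line.title()                 # spacing between the words is left alone

apostrophe = "they're here".title()   # the letter after the apostrophe is upcased
mixed_case = "iPhone".title()         # letters after the first are lowercased
```

If either side effect is unwanted, splitting on whitespace and upcasing only the first character of each word avoids both.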


--
Lark's Tongue Guide to Python: http://lightbird.net/larks/
