On Mar 4, 2:30 pm, MRAB <pyt...@mrabarnett.plus.com> wrote: > Pete Emerson wrote: > > I've written my first python program, and would love suggestions for > > improvement. > > > I'm a perl programmer and used a perl version of this program to guide > > me. So in that sense, the python is "perlesque" > > > This script parses /etc/hosts for hostnames, and based on terms given > > on the command line (argv), either prints the list of hostnames that > > match all the criteria, or uses ssh to connect to the host if the > > number of matches is unique. > > > I am looking for advice along the lines of "an easier way to do this" > > or "a more python way" (I'm sure that's asking for trouble!) or > > "people commonly do this instead" or "here's a slick trick" or "oh, > > interesting, here's my version to do the same thing". > > > I am aware that there are performance improvements and error checking > > that could be made, such as making sure the file exists and is > > readable and precompiling the regular expressions and not calculating > > how many sys.argv arguments there are more than once. I'm not hyper > > concerned with performance or idiot proofing for this particular > > script. > > > Thanks in advance. > > > ######################################################## > > #!/usr/bin/python > > > import sys, fileinput, re, os > > > filename = '/etc/hosts' > > > hosts = [] > > > for line in open(filename, 'r'): > > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > > Use 'raw' strings for regular expressions. > > 'Normal' Python string literals use backslashes in escape sequences (eg, > '\n'), but so do regular expressions, and regular expressions are passed > to the 're' module in strings, so that can lead to a profusion of > backslashes! > > You could either escape the backslashes: > > match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line) > > or use a raw string: > > match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > > A raw string literal is just like a normal string literal, except that > backslashes aren't special. > > > if match is None or re.search('^(?:float|localhost)\.', line): > > continue > > It would be more 'Pythonic' to say "not match". > > if not match or re.search(r'^(?:float|localhost)\.', line): > > > hostname = match.group(1) > > count = 0 > > for arg in sys.argv[1:]: > > for section in hostname.split('.'): > > if section == arg: > > count = count + 1 > > Shorter is: > > count += 1 > > > break > > if count == len(sys.argv) - 1: > > hosts.append(hostname) > > > if len(hosts) == 1: > > os.system("ssh -A " + hosts[0]) > > else: > > print '\n'.join(hosts) > > You're splitting the hostname repeatedly. You could split it just once, > outside the outer loop, and maybe make it into a set. You could then > have: > > host_parts = set(hostname.split('.')) > count = 0 > for arg in sys.argv[1:]: > if arg in host_parts: > count += 1 > > Incidentally, the re module contains a cache, so the regular expressions > won't be recompiled every time (unless you have a lot of them and the re > module flushes the cache).
I think that you really just need a set intersection for the count part and this should work: host_parts = set(hostname.split('.')) count = len(host_parts.intersection(sys.argv[1:])) -- http://mail.python.org/mailman/listinfo/python-list