Re: python3 raw strings and \u escapes
On 05/30/2012 09:07 AM, ru...@yahoo.com wrote:
On 05/30/2012 05:54 AM, Thomas Rachel wrote:
Am 30.05.2012 08:52 schrieb ru...@yahoo.com:

This breaks a lot of my code because in python 2

    re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']

but in python 3 (the result of running 2to3),

    re.split (r'[\u3000]', 'A\u3000A') == ['A\u3000A']

I can remove the r prefix from the regex string but then if I have other regex backslash symbols in it, I have to double all the other backslashes -- the very thing that the r-prefix was invented to avoid. Or I can leave the r prefix and replace something like r'[ \u3000]' with r'[ 　]'. But that is confusing because one can't distinguish between the space character and the ideographic space character. It is also a problem if a reader of the code doesn't have a font that can display the character. Was there a reason for dropping the lexical processing of \u escapes in strings in python3 (other than to add another annoyance to a long list of python3 annoyances)?

Probably it is more consistent. Alas, it makes the whole thing incompatible with Py2. But if you think about it: why allow \u when \r, \n etc. are not processed either? Maybe the blame is elsewhere then...

If the re module interprets (in a regex string) the 2-character string consisting of r'\' followed by 'n' as a single newline character, then why wasn't re changed for Python 3 to interpret the 6-character string r'\u3000' as a single unicode character, to correspond with Python's lexer no longer doing that (as it did in Python 2)? And is there no choice for me but to choose between the two poor choices I mention above to deal with this problem?

There is a 3rd one: use r'[ ' + '\u3000' + ']'. Not very nice to read, but should do the trick...

I guess the +s could be left out, allowing something like

    '[ \u3000]' r'\w+ \d{3}'

but I'll have to try it a little; maybe just doubling backslashes won't be much worse. I did that for years in Perl and lived through it.
Just for some closure: there are many places in my code that I had/have to track down and change. But the biggest problem so far is a lexer module that is structured as many dozens of little functions, each with a docstring that is a regex string. The only way I found to change these and maintain sanity was to go through them, remove the r prefix from any strings that contain \u literals, and then double any other backslashes in the string. Since these are docstrings, creating them with executable code was awkward, and using adjacent string concatenation led to a very confusing mix of string styles. Strings that used concatenation often had a single logical regex structure (e.g. a character set [...]) split between two strings. The extra quote characters were as visually confusing as doubled backslashes in many cases. Strings with doubled backslashes, although harder to read, were much easier to edit reliably and, in their way, more regular. It does make this module look very Perlish though... :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: python3 raw strings and \u escapes
On 05/31/2012 03:10 PM, Chris Angelico wrote: On Fri, Jun 1, 2012 at 6:28 AM, ru...@yahoo.com wrote:

... a lexer module that is structured as many dozens of little functions, each with a docstring that is a regex string.

This may be a good opportunity to take a step back and ask yourself: Why so many functions, each with a regular expression in its docstring?

Because that's the way David Beazley designed Ply? http://dabeaz.com/ply/ Personally, I think it's an abuse of docstrings but he never asked me for my opinion... -- http://mail.python.org/mailman/listinfo/python-list
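[For readers unfamiliar with the convention being discussed: in Ply, each token function carries its regular expression in its docstring. A minimal illustrative sketch of that convention, without any actual Ply machinery -- the function name follows Ply's t_<TOKEN> naming style, but nothing here is Ply's API:]

```python
import re

# The raw-string docstring *is* the token's regex.
def t_NUMBER(t):
    r'\d+'
    return t

# A Ply-like lexer reads the pattern back out of the docstring:
pattern = t_NUMBER.__doc__
print(pattern)                                  # \d+
print(bool(re.fullmatch(pattern, '123')))       # True
```

This is why the docstrings in such a module must be literal regex text, and why the raw-string/\u-escape change bites so hard there.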
python3 raw strings and \u escapes
In python2, \u escapes are processed in raw unicode strings. That is, ur'\u3000' is a string of length 1 consisting of the IDEOGRAPHIC SPACE unicode character. In python3, \u escapes are not processed in raw strings: r'\u3000' is a string of length 6 consisting of a backslash, 'u', '3' and three '0' characters.

This breaks a lot of my code because in python 2

    re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']

but in python 3 (the result of running 2to3),

    re.split (r'[\u3000]', 'A\u3000A') == ['A\u3000A']

I can remove the r prefix from the regex string but then if I have other regex backslash symbols in it, I have to double all the other backslashes -- the very thing that the r-prefix was invented to avoid. Or I can leave the r prefix and replace something like r'[ \u3000]' with r'[ 　]'. But that is confusing because one can't distinguish between the space character and the ideographic space character. It is also a problem if a reader of the code doesn't have a font that can display the character.

Was there a reason for dropping the lexical processing of \u escapes in strings in python3 (other than to add another annoyance to a long list of python3 annoyances)? And is there no choice for me but to choose between the two poor choices I mention above to deal with this problem? -- http://mail.python.org/mailman/listinfo/python-list
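[A footnote for later readers of this thread: Python 3.3 eventually taught the re module itself to interpret \u and \U escapes inside patterns, so the Python 2 raw-string idiom works again -- this postdates the discussion above:]

```python
import re

# In Python 3.3+ the regex engine, not the lexer, expands \u3000,
# so a raw pattern again matches the IDEOGRAPHIC SPACE character:
print(re.split(r'[\u3000]', 'A\u3000A'))   # ['A', 'A']
```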
Re: python3 raw strings and \u escapes
On 05/30/2012 05:54 AM, Thomas Rachel wrote:
Am 30.05.2012 08:52 schrieb ru...@yahoo.com:

This breaks a lot of my code because in python 2

    re.split (ur'[\u3000]', u'A\u3000A') == [u'A', u'A']

but in python 3 (the result of running 2to3),

    re.split (r'[\u3000]', 'A\u3000A') == ['A\u3000A']

I can remove the r prefix from the regex string but then if I have other regex backslash symbols in it, I have to double all the other backslashes -- the very thing that the r-prefix was invented to avoid. Or I can leave the r prefix and replace something like r'[ \u3000]' with r'[ 　]'. But that is confusing because one can't distinguish between the space character and the ideographic space character. It is also a problem if a reader of the code doesn't have a font that can display the character. Was there a reason for dropping the lexical processing of \u escapes in strings in python3 (other than to add another annoyance to a long list of python3 annoyances)?

Probably it is more consistent. Alas, it makes the whole thing incompatible with Py2. But if you think about it: why allow \u when \r, \n etc. are not processed either? Maybe the blame is elsewhere then...

If the re module interprets (in a regex string) the 2-character string consisting of r'\' followed by 'n' as a single newline character, then why wasn't re changed for Python 3 to interpret the 6-character string r'\u3000' as a single unicode character, to correspond with Python's lexer no longer doing that (as it did in Python 2)? And is there no choice for me but to choose between the two poor choices I mention above to deal with this problem?

There is a 3rd one: use r'[ ' + '\u3000' + ']'. Not very nice to read, but should do the trick...

I guess the +s could be left out, allowing something like

    '[ \u3000]' r'\w+ \d{3}'

but I'll have to try it a little; maybe just doubling backslashes won't be much worse. I did that for years in Perl and lived through it. -- http://mail.python.org/mailman/listinfo/python-list
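[The suggestion above -- mixing an ordinary string (so the \u escape is processed by the lexer) with a raw string (so the regex backslashes survive) via adjacent-string concatenation -- can be sketched like this:]

```python
import re

# The first fragment is a normal string, so '\u3000' becomes the
# IDEOGRAPHIC SPACE character; the second is raw, so r'\d{3}' keeps
# its backslash for the regex engine.
pattern = '[ \u3000]' r'\d{3}'
print(re.findall(pattern, 'abc\u3000123'))   # ['\u3000123']
```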
Re: python3 raw strings and \u escapes
On 05/30/2012 10:46 AM, Terry Reedy wrote: On 5/30/2012 2:52 AM, ru...@yahoo.com wrote:

In python2, \u escapes are processed in raw unicode strings. That is, ur'\u3000' is a string of length 1 consisting of the IDEOGRAPHIC SPACE unicode character.

That surprised me until I rechecked the fine manual and found: "When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. ... When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \u and \U escape sequences are processed while all other backslashes are left in the string." When 'u' was removed in Python 3, a choice had to be made and the first must have seemed to be the obvious one, or perhaps the automatic one. In 3.3, 'u' is being restored. I have inquired on the pydev list whether the difference above should also be restored, and mentioned this thread.

As mentioned in a different message, another option might be to leave raw strings as is (more consistent, since all backslashes are treated the same) and have the re module un-escape \u (and similar) literals in regex strings (also more consistent, since that's what it does with '\\n', '\\t', etc.). I do realize though that this may have back-compatibility problems that make it impossible to do. -- http://mail.python.org/mailman/listinfo/python-list
2to3 inscrutable output
What is this output from 2to3 supposed to mean?

    $ cat mysub.py
    isinstance (3, (int,float))
    $ 2to3 -f isinstance mysub.py
    RefactoringTool: No changes to mysub.py
    RefactoringTool: Files that need to be modified:
    RefactoringTool: mysub.py

Why does mysub.py need to be modified, and how? -- http://mail.python.org/mailman/listinfo/python-list
Re: 2to3 for 2.7
On 05/27/2012 07:53 AM, Steven D'Aprano wrote: On Sat, 26 May 2012 19:37:33 -0700, ru...@yahoo.com wrote:

Is there a list of fixers I can tell 2to3 to use that will limit changes to things that will continue to run under python-2.7?

So you want a 2to2?

Yes. :-)

I suggest you read the Fine Manual and choose the fixers you want to apply yourself: http://docs.python.org/library/2to3.html That, plus a bit of trial-and-error at the interactive prompt, will soon tell you what works and what doesn't. But read on for my suggestions.

That, and the 2.6 and 2.7 "What's New" documents and the docs for the 3.x backported features mentioned therein... I've started to do just that, but if someone else has already distilled all this information...

I want to start the 2-3 trip by making my code as py3 compatible (under py2) as possible before going the rest of the way to py3, and having 2to3 help with this seems like a good idea.

Your project, your decision, but it doesn't sound like a good idea to me, unless your project is quite small or efficiency is not high on your list of priorities. You risk making your 2.7 version significantly slower and less efficient than your 2.6 version, but without actually gaining 3.x compatibility.

I can't really migrate my project until wxPython does. But I've read a number of conversion experiences, ranging from "ran 2to3 and everything was golden", to needing to make some serious design decisions (usually in the bytes/str area), to months of effort to get all the little glitches wrung out. So I have no idea what is in store for me. By doing some of the conversion now I can hopefully get a better sense of what is in store and get some of the work done earlier rather than later. There is also the generally useful heuristic of dividing a larger task into two smaller independent tasks... And finally, there is the question of maintaining 2/3 compatibility in a single codebase.
I don't have a hard requirement for this but if it is doable without too much effort, I would prefer to do so. ISTM that looking at what remains to do after the 2.7 code has been 3-ified as much as possible will allow me to make a better judgment about that.

(For what it's worth, I try to aim at 3.x compatibility as the priority, and if that means my code is a bit slower under 2.5-2.7, that's a price I'm willing to pay.) The problem is that many of the idioms that work well in Python 3 will be less efficient, and therefore slower, in Python 2.7. For example, consider this Python 2.x loop, iterating lazily over a dict efficiently:

I did not spend much time optimizing for performance when writing the code, so it probably doesn't make sense to worry about it now, unless a really large performance difference is likely (which seems to me unlikely given that I don't have any really large in-memory data). Thanks for the tip though; it is something I'll remain alert for.

[...] For what it's worth, I'd try these fixers: apply except exec execfile exitfunc has_key idioms ne next paren print raise repr tuple_params ws_comma xreadlines, plus "from __future__ import print_function", and see what breaks :) Also, don't forget future_builtins: http://docs.python.org/library/future_builtins.html Good luck, and if you do go ahead with this, please consider posting an update here, or writing a blog post with details of how successful it was.

Thanks for that list. Sans anything more definitive it is a good starting point. -- http://mail.python.org/mailman/listinfo/python-list
2to3 for 2.7
Is there a list of fixers I can tell 2to3 to use that will limit changes to things that will continue to run under python-2.7? I want to start the 2-3 trip by making my code as py3 compatible (under py2) as possible before going the rest of the way to py3, and having 2to3 help with this seems like a good idea. -- http://mail.python.org/mailman/listinfo/python-list
Re: Create directories and modify files with Python
On 04/30/2012 05:24 PM, deltaquat...@gmail.com wrote:

Hi, I would like to automate some simple tasks I'm doing by hand. Given a text file foobar.fo:

    073 1.819
    085 2.132
    100 2.456
    115 2.789

I need to create the directories 073, 085, 100, 115, and copy into each directory a modified version of the text file input.in:

    .
    .
    .
    foo = 1.5 ! edit this value
    .
    .
    .
    bar = 1.5 ! this one, too
    .
    .
    .

The modification consists in substituting the number in the above lines with the value associated with the directory in the file foobar.fo. Thus, the input.in file in the directory 100 will be:

    .
    .
    .
    foo = 2.456 ! edit this value
    .
    .
    .
    bar = 2.456 ! this one, too
    .
    .
    .

At first, I tried to write a bash script to do this. However, when and if the script works, I'll probably want to add more features to automate some other tasks. So I thought about using some other language, to have more flexible and maintainable code. I've been told that both Python and Perl are well suited for such tasks, but unfortunately I know neither of them. Can you show me how to write the script in Python? Thanks,

Perhaps something like this will get you started? To keep things simple (since this is illustrative code) there is little parameterization and no error handling. Apologies if Google screws up the formatting too badly.

    from __future__ import print_function              #1
    import os

    def main():
        listf = open ('foobar.fo')
        for line in listf:
            dirname, param = line.strip().split()      #7
            make_directory (dirname, param)

    def make_directory (dirname, param):
        os.mkdir (dirname)                             #11
        tmplf = open ('input.in')
        newf = open (dirname + '/' + 'input.in', 'w')  #13
        for line in tmplf:
            if line.startswith ('foo = ') or line.startswith ('bar = '):  #15
                line = line.replace (' 1.5 ', ' '+param+' ')              #16
            print (line, file=newf, end='')            #17

    if __name__ == '__main__': main()                  #19

#1: Not sure whether you're using Python 2 or 3. I ran this on Python 2.7 and think it will run on Python 3 if you remove this line.
#7: The strip() method removes the '\n' characters from the ends of the lines, as well as any other extraneous leading or trailing whitespace. The split() method here breaks the line into two pieces on the whitespace in the middle. See http://docs.python.org/library/stdtypes.html#string-methods

#11: This will create subdirectory 'dirname' relative to the current directory, of course. See http://docs.python.org/library/os.html#os.mkdir

#13: Usually it is more portable to use os.path.join() to concatenate path components, but since you stated you are on Linux (and / works on Windows too), creating the path with / is easier to follow in this example. For open() see http://docs.python.org/library/functions.html#open

#15: Depending on your data, you might want to use the re (regular expression) module here if the simple string substitution is not sufficient.

#16: For simplicity I just blindly replaced the 1.5 text in the string. Depending on your files, you might want to parameterize this or do something more robust or sophisticated.

#17: Since we did not strip the trailing '\n' off the lines we read from input.in, we use end='' to prevent print from adding an additional '\n'. See http://docs.python.org/library/functions.html#print

#19: This line is required to actually get your python file to do anything. :-)

Hope this gets you started. I think you will find doing this kind of thing in Python is much easier in the long run than with bash scripts. A decent resource for learning the basics of Python is the standard Python tutorial: http://docs.python.org/tutorial/index.html -- http://mail.python.org/mailman/listinfo/python-list
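[Building on notes #11-#17, a slightly more robust variant of make_directory is sketched below -- purely illustrative, using os.path.join as note #13 suggests and 'with' blocks so the files are closed promptly; the 'template' parameter is an addition not in the original code:]

```python
import os

def make_directory(dirname, param, template='input.in'):
    # Same logic as the example above, but with os.path.join for
    # portability and context managers for reliable file closing.
    os.mkdir(dirname)
    with open(template) as tmplf, \
         open(os.path.join(dirname, template), 'w') as newf:
        for line in tmplf:
            if line.startswith('foo = ') or line.startswith('bar = '):
                line = line.replace(' 1.5 ', ' ' + param + ' ')
            newf.write(line)
```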
argparse missing optparse capabilities?
I have optparse code that parses a command line containing intermixed positional and optional arguments, where the optional arguments set the context for the following positional arguments. For example,

    myprogram.py arg1 -c33 arg2 arg3 -c44 arg4

'arg1' is processed in a default context, 'arg2' and 'arg3' in context '33', and 'arg4' in context '44'. I am trying to do the same using argparse but it appears to be not doable in a documented way. Here is the working optparse code (which took 30 minutes to write using just the optparse docs):

    import optparse

    def append_with_pos (option, opt_str, value, parser):
        if getattr (parser.values, option.dest, None) is None:
            setattr (parser.values, option.dest, [])
        getattr (parser.values, option.dest).append ((value, len (parser.largs)))

    def opt_parse():
        p = optparse.OptionParser()
        p.add_option ('-c', type=int, action='callback', callback=append_with_pos)
        opts, args = p.parse_args()
        return args, opts

    if __name__ == '__main__':
        args, opts = opt_parse()
        print args, opts

Output from the command line above:

    ['arg1', 'arg2', 'arg3', 'arg4'] {'c': [(33, 1), (44, 3)]}

The -c values are stored as (value, arglist_position) tuples. Here is an attempt to convert to argparse using the guidelines in the argparse docs:

    import argparse

    class AppendWithPos (argparse.Action):
        def __call__ (self, parser, namespace, values, option_string=None):
            if getattr (namespace, self.dest, None) is None:
                setattr (namespace, self.dest, [])
            getattr (namespace, self.dest).extend ((values, len (parser.largs)))

    def arg_parse():
        p = argparse.ArgumentParser (description='description')
        p.add_argument ('src', nargs='*')
        p.add_argument ('-c', type=int, action=AppendWithPos)
        opts = p.parse_args()
        return opts

    if __name__ == '__main__':
        opts = arg_parse()
        print opts

This fails with

    AttributeError: 'ArgumentParser' object has no attribute 'largs'

and of course, the argparse parser object is not documented beyond how to instantiate it.
Even were that not a problem, argparse complains about unrecognised arguments for any positional arguments that occur after an optional one. I've been farting with this code for a day now. Any suggestions on how I can convince argparse to do what optparse does easily will be very welcome. (I tried parse_known_args() but that breaks help and requires me to detect truly unknown arguments.) (Python 2.7.1 if it matters and apologies if Google mangles the formatting of this post.) -- http://mail.python.org/mailman/listinfo/python-list
Re: argparse missing optparse capabilities?
On Jan 5, 1:05 am, ru...@yahoo.com wrote:

    class AppendWithPos (argparse.Action):
        def __call__ (self, parser, namespace, values, option_string=None):
            if getattr (namespace, self.dest, None) is None:
                setattr (namespace, self.dest, [])
            getattr (namespace, self.dest).extend ((values, len (parser.largs)))

I realized right after posting that the last line above should be, I think,

    getattr (namespace, self.dest).extend ((values, len (namespace.src)))

but that still doesn't help with the unrecognised arguments problem. -- http://mail.python.org/mailman/listinfo/python-list
Re: argparse missing optparse capabilities?
On 01/05/2012 02:19 AM, Ulrich Eckhardt wrote: Am 05.01.2012 09:05, schrieb ru...@yahoo.com:

I have optparse code that parses a command line containing intermixed positional and optional arguments, where the optional arguments set the context for the following positional arguments. For example, myprogram.py arg1 -c33 arg2 arg3 -c44 arg4: 'arg1' is processed in a default context, 'arg2' and 'arg3' in context '33', and 'arg4' in context '44'.

Question: How would you e.g. pass the string -c33 as first argument, i.e. to be parsed in the default context?

There will not be a need for that.

The point is that you separate the parameters in a way that makes it possible to parse them in a way that works 100%, not just a way that works in 99% of all cases.

I agree that one should strive for a syntax that works 100%, but in this case the simplicity and intuitiveness of the existing command syntax outweigh by far the need for having it work in very improbable corner cases. (And I'm sure I've seen this syntax used in other unix command line tools in the past, though I don't have time to look for examples now.) If argparse does not handle this syntax for some such purity reason (as opposed to, for example, it being hard to do in argparse's current design) then argparse is mistakenly putting purity before practicality.

For that reason, many command-line tools accept -- as a separator, so that cp -- -r -x will copy the file -r to the folder -x. In that light, I would consider restructuring your command line.

In my case that's not possible since I am replacing an existing tool with a Python application and changing the command line syntax is not an option.

I am trying to do the same using argparse but it appears to be not doable in a documented way.

As already hinted at, I don't think this is possible, and that is so by design.

Thanks for the confirmation. I guess that shows that optparse has a reason to exist beyond backwards compatibility.
-- http://mail.python.org/mailman/listinfo/python-list
Re: argparse missing optparse capabilities?
On 01/05/2012 11:46 AM, Ian Kelly wrote: On Thu, Jan 5, 2012 at 11:14 AM, Ian Kelly ian.g.ke...@gmail.com wrote: On Thu, Jan 5, 2012 at 1:05 AM, ru...@yahoo.com wrote:

I have optparse code that parses a command line containing intermixed positional and optional arguments, where the optional arguments set the context for the following positional arguments. For example, myprogram.py arg1 -c33 arg2 arg3 -c44 arg4: 'arg1' is processed in a default context, 'arg2' and 'arg3' in context '33', and 'arg4' in context '44'. I am trying to do the same using argparse but it appears to be not doable in a documented way. [...]

Sorry, I missed the second part of that. You seem to be right; as far as I can tell from tinkering with it, all the positional arguments have to be in a single group. If you have some positional arguments followed by an option followed by more positional arguments, and any of the arguments have a loose nargs quantifier ('?' or '*' or '+'), then you get an error.

OK, thanks for the second confirmation. I was hoping there was something I missed or some undocumented option to allow intermixed optional and positional arguments with argparse, but it appears not. I notice that optparse seems to intentionally provide this capability, since it offers a disable_interspersed_args() method. It is unfortunate that argparse chose not to provide backward compatibility for this, thus forcing some users to continue using a deprecated module. -- http://mail.python.org/mailman/listinfo/python-list
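[A postscript for later readers: Python 3.7 eventually added ArgumentParser.parse_intermixed_args(), which accepts exactly this kind of interleaving. It does not, however, record each option's position among the positionals, which the optparse callback earlier in the thread did, so it is only a partial replacement:]

```python
import argparse

p = argparse.ArgumentParser()
p.add_argument('src', nargs='*')
p.add_argument('-c', type=int, action='append')

# Positionals and optionals interleaved, as in the original command line:
ns = p.parse_intermixed_args(['arg1', '-c', '33', 'arg2', 'arg3', '-c', '44', 'arg4'])
print(ns.src, ns.c)   # ['arg1', 'arg2', 'arg3', 'arg4'] [33, 44]
```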
Re: Fixing the XML batteries
On Dec 13, 5:32 am, Stefan Behnel stefan...@behnel.de wrote: ... In Python 2.7/3.2, ElementTree has support for C14N serialisation, just pass the option method=c14n. Where in the Python docs can one find information about this? [previous post disappeared, sorry if I double posted or replied to author inadvertently.] -- http://mail.python.org/mailman/listinfo/python-list
Re: Fixing the XML batteries
On Dec 13, 5:32 am, Stefan Behnel stefan...@behnel.de wrote: ... In Python 2.7/3.2, ElementTree has support for C14N serialisation, just pass the option method=c14n. Where does one find information in the Python documentation about this? -- http://mail.python.org/mailman/listinfo/python-list
Re: Fixing the XML batteries
On Dec 13, 1:21 pm, Stefan Behnel stefan...@behnel.de wrote: ru...@yahoo.com, 13.12.2011 20:37: On Dec 13, 5:32 am, Stefan Behnel wrote:

In Python 2.7/3.2, ElementTree has support for C14N serialisation, just pass the option method=c14n.

Where does one find information in the Python documentation about this?

Hmm, interesting. I thought it had, but now when I click on the stdlib doc link to read the module source (hint, hint), ...

I realize the source is available (having had to use it way too many times in the past, not just with ET), but that does not justify omission from the docs. However, the point is moot since (as you say) it seems the Python-distributed ET doesn't contain the c14n feature.

I can see that it only has the hooks. The C14N support module of ET 1.3 was not integrated into the stdlib. Sorry for not verifying this earlier. So you actually need the external package for C14N support. See here: http://effbot.org/zone/elementtree-13-intro.htm http://hg.effbot.org/et-2009-provolone/src/tip/elementtree/elementtre... Just to emphasize this once again: it's not more than a single module that you can copy into your own code as a fallback import, or deploy in your local installations.

Right, but many times I try to avoid external dependencies when feasible. Thanks for the clarifications. -- http://mail.python.org/mailman/listinfo/python-list
Re: How to generate error when argument are not supplied and there is no explicit defults (in optparse)?
On 10/14/2011 03:29 PM, Peng Yu wrote:

Hi, the following code doesn't give me an error, even if I don't specify the value of filename in the command line arguments; filename just gets 'None'. I checked the manual, but I don't see a way to make OptionParser fail if an argument's value (which has no default explicitly specified) is not specified. I may have missed something in the manual. Could any expert let me know if there is a way to do so? Thanks!

    #!/usr/bin/env python
    from optparse import OptionParser

    usage = 'usage: %prog [options] arg1 arg2'
    parser = OptionParser(usage=usage)
    parser.set_defaults(verbose=True)
    parser.add_option('-f', '--filename')
    #(options, args) = parser.parse_args(['-f', 'file.txt'])
    (options, args) = parser.parse_args()
    print options.filename

You can check it yourself. I find I use a pretty standard pattern with optparse:

    def main (args, opts):
        ...

    def parse_cmdline ():
        p = OptionParser()
        p.add_option('-f', '--filename')
        options, args = p.parse_args()
        if not options.filename:
            p.error ("-f option required")
        if len (args) != 2:
            p.error ("Expected exactly 2 arguments")
        # Other checks can obviously be done here too.
        return args, options

    if __name__ == '__main__':
        args, opts = parse_cmdline()
        main (args, opts)

While one can probably subclass OptionParser or use callbacks to achieve the same end, I find the above approach simple and easy to follow. I also presume you know that you can have optparse produce a usage message by adding 'help' arguments to the add_option() calls? And as was mentioned in another post, argparse in Python 2.7 (or in earlier Pythons by downloading/installing it yourself) can do the checking you want. -- http://mail.python.org/mailman/listinfo/python-list
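[For comparison, the two hand-coded checks above can simply be declared in argparse, as the last paragraph mentions; a minimal sketch:]

```python
import argparse

p = argparse.ArgumentParser()
# required=True makes argparse itself error out if -f is missing.
p.add_argument('-f', '--filename', required=True)
# nargs=2 enforces exactly two positional arguments.
p.add_argument('args', nargs=2)

ns = p.parse_args(['-f', 'file.txt', 'arg1', 'arg2'])
print(ns.filename, ns.args)   # file.txt ['arg1', 'arg2']
```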
Re: Help with regular expression in python
On 08/19/2011 11:33 AM, Matt Funk wrote: On Friday, August 19, 2011, Alain Ketterlin wrote: Matt Funk matze...@gmail.com writes:

thanks for the suggestion. I guess I had found another way around the problem as well. But I really wanted to match the line exactly and I wanted to know why it doesn't work. That is less for the purpose of getting the thing to work and more because it greatly annoys me that I can't figure out why it doesn't work. I.e. why the expression is not matched {32} times. I just don't get it.

Because a line is not 32 times a number; it is a number followed by 31 times (a space followed by a number). Using Jason's regexp, you can build the regexp step by step:

    number = r"\d\.\d+e\+\d+"
    numbersequence = r"%s( %s){31}" % (number,number)

That didn't work either. Using the modified expression (where the (.+) matches the end of the line) as:

    number = r"\d\.\d+e\+\d+"
    numbersequence = r"%s( %s){31}(.+)" % (number,number)
    instance_linetype_pattern = re.compile(numbersequence)

the results obtained are:

    results: [(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]

so this matches the last number plus the string at the end of the line, but does not retain the previous numbers.

The secret is buried very unobtrusively in the re docs, where it has caught me out in the past. Specifically, in the docs for the match object's group() method: "If a group is contained in a part of the pattern that matched multiple times, the last match is returned." In addition to the findall solution someone else posted, another thing you could do is to explicitly express the groups in your re:

    number = r"\d\.\d+e\+\d+"
    groups = (r"( %s)" % number) * 31
    numbersequence = r"%s%s(.+)" % (number,groups)
    ...
    results = match_object.group (*range(1,33))

Or (what I would probably do), simply match the whole string of numbers and pull it apart later:

    number = r"\d\.\d+e\+\d+"
    numbersequence = r"(%s(?: %s){31})(.+)" % (number,number)
    ...
    results = (match_object.group(1)).split()

[None of this code is tested but it should be close enough to convey the general idea.] -- http://mail.python.org/mailman/listinfo/python-list
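[Since the thread's code was posted untested, here is the match-then-split variant run against a line shaped like the one quoted earlier; the sample data below is made up for illustration:]

```python
import re

number = r"\d\.\d+e\+\d+"
numbersequence = r"(%s(?: %s){31})(.+)" % (number, number)

# 31 numbers, a 32nd number, then the trailing description text.
line = "1.100000e+01 " * 31 + "2.199000e+01 : (instance: 0)\tsome description"
m = re.match(numbersequence, line)
values = m.group(1).split()
print(len(values), values[-1])   # 32 2.199000e+01
```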
Re: how to avoid leading white spaces
On 06/08/2011 03:01 AM, Duncan Booth wrote: ru...@yahoo.com wrote: On 06/06/2011 09:29 AM, Steven D'Aprano wrote:

Yes, but you have to pay the cost of loading the re engine, even if it is a one off cost, it's still a cost, [...]

At least part of the reason that there's no difference there is that the 're' module was imported in both cases:

Quite right. I should have thought of that. [...]

Steven is right to assert that there's a cost to loading it, but unless you jump through hoops it's not a cost you can avoid paying and still use Python.

I would say that it is effectively zero cost then. -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
On 06/07/2011 06:30 PM, Roy Smith wrote: On 06/06/2011 08:33 AM, rusi wrote:

Evidently for syntactic, implementation and cultural reasons, Perl programmers are likely to get (and then overuse) regexes faster than python programmers.

ru...@yahoo.com wrote:

I don't see how the different Perl and Python cultures themselves would make learning regexes harder for Python programmers.

Oh, that part's obvious. People don't learn things in a vacuum. They read about something, try it, fail, and ask for help. If, in one community, the response they get is, "I see what's wrong with your regex, you need to ...", and in another they get, "You shouldn't be using a regex there, you should use this string method instead...", it should not be a surprise that it's easier to learn about regexes in the first community.

I think we are just using different definitions of "harder". I said, immediately after the sentence you quoted, "At most I can see the Perl culture encouraging their use and the Python culture discouraging it, but that doesn't change the ease or difficulty of learning." Constantly being told not to use regexes certainly discourages one from learning them, but I don't think that's the same as their being *harder* to learn in Python. The syntax of regexes is, at least at the basic level, pretty universal, and it is in learning to understand that syntax that most of any difficulty lies. Whether to express a regex as /code (blue)|(red)/i in Perl or (r'code (blue)|(red)', re.I) in Python is a superficial difference, as is, say, using match results: '$alert = $1' vs 'alert = m.group(1)'.

A Google search for "python regular expression tutorial" produces lots of results, including the Python docs HOWTO. And because the syntax is pretty universal, leaving the "python" off that search string will yield many, many more that are applicable. Although one does get some "don't do that" responses to regex questions on this list (and some are good advice), there are usually answers too.
So I think of it as more of a Python culture thing, rather than it being actually harder to learn to use regexes in Python, although I see how one can view it your way too. -- http://mail.python.org/mailman/listinfo/python-list
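The superficial syntactic difference discussed above can be put side by side; a minimal sketch (the alert-line example is invented for illustration):

```python
import re

line = "Code Blue in ward 7"

# Perl: if ($line =~ /code (blue)|(red)/i) { $alert = $1; }
m = re.search(r'code (blue)|(red)', line, re.I)
if m:
    alert = m.group(1)  # corresponds to Perl's $1
    print(alert)        # -> Blue
```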
Re: how to avoid leading white spaces
On 06/06/2011 09:29 AM, Steven D'Aprano wrote: On Sun, 05 Jun 2011 23:03:39 -0700, ru...@yahoo.com wrote: [...] I would argue that the first, non-regex solution is superior, as it clearly distinguishes the multiple steps of the solution: * filter lines that start with CUSTOMER * extract fields in that line * validate fields (not shown in your code snippet) while the regex tries to do all of these in a single command. This makes the regex an all or nothing solution: it matches *everything* or *nothing*. This means that your opportunity for giving meaningful error messages is much reduced. E.g. I'd like to give an error message like: "found digit in customer name (field 2)" but with your regex, if it fails to match, I have no idea why it failed, so can't give any more meaningful error than "invalid customer line" and leave it to the caller to determine what makes it invalid. (Did I misspell CUSTOMER? Put a dot after the initial? Forget the code? Use two spaces between fields instead of one?) I agree that is a legitimate criticism. Its importance depends greatly on the purpose and consumers of the code. While such detailed error messages might be appropriate in a fully polished product, in my case, I often have to process files personally to extract information, or to provide code to others (who typically have at least some degree of technical sophistication) to do the same. In this case, being able to code something quickly, and adapt it quickly to changes, is more important than providing highly detailed error messages. The format is simple enough that "invalid customer line" and the line number is perfectly adequate. YMMV. As I said, regexes are a tool, like any tool, to be used appropriately. [...] 
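The stepwise alternative Steven describes, with a specific message at each failure point, might be sketched like this (the field layout and the messages are invented for illustration):

```python
import re

def parse_customer(line):
    # Each step checks one thing, so each failure gets its own message,
    # unlike an all-or-nothing regex match.
    if not line.startswith('CUSTOMER '):
        raise ValueError('line does not start with CUSTOMER')
    fields = line.split(None, 4)
    if len(fields) < 4:
        raise ValueError('too few fields in customer line')
    kw, initial, last_name, code = fields[:4]
    if not initial.isalpha():
        raise ValueError('found non-letter in customer name (field 2)')
    if not re.match(r'[A-Z]\d{3}$', code):
        raise ValueError('bad customer code (field 4): %r' % code)
    return initial, last_name, code

print(parse_customer('CUSTOMER J Smith A123 12 Main St'))  # -> ('J', 'Smith', 'A123')
```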
In addition to being wrong (loading is done once, compilation is typically done once or a few times, while the regex is used many times inside a loop so the overhead cost is usually trivial compared with the cost of starting Python or reading a file), this is another micro-optimization argument. Yes, but you have to pay the cost of loading the re engine; even if it is a one-off cost, it's still a cost. ~$ time python -c 'pass' -- real 0m0.015s, user 0m0.011s, sys 0m0.003s. ~$ time python -c 'import re' -- real 0m0.015s, user 0m0.011s, sys 0m0.003s. Or do you mean something else by loading the re engine? and sometimes (not always!) it can be significant. It's quite hard to write fast, tiny Python scripts, because the initialization costs of the Python environment are so high. (Not as high as for, say, VB or Java, but much higher than, say, shell scripts.) In a tiny script, you may be better off avoiding regexes because it takes longer to load the engine than to run the rest of your script! Do you have an example? I am having a hard time imagining that. Perhaps you are thinking of the time required to compile a RE? ~$ time python -c 'import re; re.compile(r"^[^()]*(\([^()]*\)[^()]*)*$")' -- real 0m0.017s, user 0m0.014s, sys 0m0.003s. Hard to imagine a case where 15 ms is fast enough but 17 ms is too slow. And that's without the diluting effect of actually doing some real work in the script. Of course a more complex regex would likely take longer. (The times vary greatly on my machine; I am quoting the most common lowest but not absolutely lowest results.) (Note that Apocalypse is referring to a series of Perl design documents and has nothing to do with regexes in particular.) But Apocalypse 5 specifically has everything to do with regexes. That's why I linked to that, and not (say) Apocalypse 2. Where did I suggest that you should have linked to Apocalypse 2? I wrote what I wrote to point out that the Apocalypse title was not a pejorative comment on regexes. 
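Rather than shelling out to `time` (which mostly measures interpreter startup), the compile and per-use costs being debated can be separated with timeit; a rough sketch:

```python
import re
import timeit

pattern = r'^[^()]*(\([^()]*\)[^()]*)*$'

# Cost of a re.compile() call.  Note: re caches compiled patterns, so
# after the first call this mostly measures the cache hit; calling
# re.purge() inside the loop would measure a cold compile instead.
compile_time = timeit.timeit(lambda: re.compile(pattern), number=1000) / 1000

# Per-use cost once compiled -- this is what dominates inside a loop.
rx = re.compile(pattern)
match_time = timeit.timeit(lambda: rx.match('abc (def) ghi'), number=10000) / 10000

print('compile: %.1e s/call, match: %.1e s/call' % (compile_time, match_time))
```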
I don't see how I could have been clearer. Possibly by saying what you just said here? I never suggested, or implied, or thought, that Apocalypse was a pejorative comment on *regexes*. The fact that I referenced Apocalypse FIVE suggests strongly that there are at least four others, presumably not about regexes. Nor did I ever suggest you did. Don't forget that you are not the only person reading this list. The comment was for the benefit of others. Perhaps you are being overly sensitive? [...] If regexes were more readable, as proposed by Wall, that would go a long way to reducing my suspicion of them. I am delighted to read that you find the new syntax more acceptable. Perhaps I wasn't as clear as I could have been. I don't know what the new syntax is. I was referring to the design principle of improving the readability of regexes. Whether Wall's new syntax actually does improve readability and ease of maintenance is a separate issue, one on which I don't have an opinion. I applaud his *intention* to reform regex syntax, without necessarily agreeing that he has done so. Thanks for clarifying. But since you earlier wrote in response to MRAB, http
Re: how to avoid leading white spaces
On 06/06/2011 08:33 AM, rusi wrote: For any significant language feature (take recursion for example) there are these issues: 1. Ease of reading/skimming (other's) code 2. Ease of writing/designing one's own 3. Learning curve 4. Costs/payoffs (eg efficiency, succinctness) of use 5. Debug-ability I'll start with 3. When someone of Kernighan's calibre (thanks for the link BTW) says that he found recursion difficult it could mean either that Kernighan is a stupid guy -- unlikely considering his other achievements. Or that C is not optimal (as compared to lisp say) for learning recursion. Just as a side comment, I didn't see anything in the link Chris Torek posted (repeated here since it got snipped: http://www.princeton.edu/~hos/frs122/precis/kernighan.htm) that said Kernighan found recursion difficult, just that it was perceived as expensive. Nor that the expense had anything to do with the programming language but rather was due to hardware constraints of the time. But maybe you are referring to some other source? Evidently for syntactic, implementation and cultural reasons, Perl programmers are likely to get (and then overuse) regexes faster than python programmers. If by get, you mean understand, then I'm not sure why the reasons you give should make a big difference. Regex syntax is pretty similar in both Python and Perl, and virtually identical in terms of learning their basics. There are some differences in how regexes are used between Perl and Python that I mentioned in http://groups.google.com/group/comp.lang.python/msg/39fca0d4589f4720?, but as I said there, that wouldn't, particularly in light of Python culture where one-liners and terseness are not highly valued, seem very important. And I don't see how the different Perl and Python cultures themselves would make learning regexes harder for Python programmers. 
At most I can see the Perl culture encouraging their use and the Python culture discouraging it, but that doesn't change the ease or difficulty of learning. And why do you say overuse regexes? Why isn't it the case that Perl programmers use regexes appropriately in Perl? Are you not arbitrarily applying a Python-centric standard to a different culture? What if a Perl programmer says that Python programmers under-use regexes? 1 is related but not the same as 3. Someone with courses in automata, compilers etc -- standard CS stuff -- is unlikely to find regexes a problem. Conversely an intelligent programmer without a CS background may find them more forbidding. I'm not sure of that. (Not sure it should be that way; perhaps it may be that way in practice.) I suspect that a good theoretical understanding of automata theory would be essential in writing a regex compiler, but I'm not sure it is necessary to use regexes. It does, I'm sure, give one a solid understanding of the limitations of regexes, but a practical understanding of those can be achieved without the full course, I think. -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
On 06/03/2011 08:05 PM, Steven D'Aprano wrote: On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote: I often find myself changing, for example, a startswith() to a RE when I realize that the input can contain mixed case Why wouldn't you just normalise the case? Because some of the text may be case-sensitive. Perhaps you misunderstood me. You don't have to throw away the unnormalised text, merely use the normalized text in the expression you need. Of course, if you include both case-sensitive and insensitive tests in the same calculation, that's a good candidate for a regex... or at least it would be if regexes supported that :) I did not choose a good example to illustrate what I find often motivates my use of regexes. You are right that for a simple .startswith() using a regex just in case is not a good choice, and in fact I would not do that. The process that I find often occurs is that I write (or am about to write) a string method solution and when I think more about the input data (which is seldom well-specified), I realize that using a regex I can get better error checking, do more of the parsing in one place, and adapt to changes in input format better than I could with a .startswith and a couple other such methods. Thus what starts as if line.startswith('CUSTOMER '): try: kw, first_initial, last_name, code, rest = line.split(None, 4) ... often turns into (sometimes before it is written) something like m = re.match(r'CUSTOMER (\w+) (\w+) ([A-Z]\d{3})', line) if m: first_initial, last_name, code = m.group(...) [...] or that I have to treat commas as well as spaces as delimiters. source.replace(',', ' ').split(' ') Uhgg. Create a whole new string just so you can split it on one rather than two characters? You say that like it's expensive. No, I said it like it was ugly. Doing things unrelated to the task at hand is ugly. And not very adaptable -- see my reply to Chris Torek's post. 
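The startswith-to-regex evolution described above might look like this in practice (the field layout is assumed for illustration):

```python
import re

line = 'CUSTOMER J Smith A123 12 Main St'

# First draft: filter with a string method, then split into fields.
if line.startswith('CUSTOMER '):
    kw, first_initial, last_name, code, rest = line.split(None, 4)

# Later draft: one regex filters, extracts and validates together.
m = re.match(r'CUSTOMER (\w+) (\w+) ([A-Z]\d{3}) (.*)', line)
if m:
    first_initial, last_name, code, rest = m.groups()

print(first_initial, last_name, code)  # -> J Smith A123
```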
I understand it is a common idiom and I use it myself, but in this case there is a cleaner alternative with re.split that expresses exactly what one is doing. And how do you know what the regex engine is doing under the hood? For all you know, it could be making hundreds of temporary copies and throwing them away. Or something. It's a black box. That's a silly argument. And how do you know what replace is doing under the hood? I would expect any regex processor to compile the regex into an FSM. As usual, I would expect to pay a small performance price for the generality, but that is a reasonable tradeoff in many cases. If it were a potential problem, I would test it. What I wouldn't do is throw away a useful tool because, golly, I don't know, maybe it'll be slow -- that's just a form of cargo cult programming. The fact that creating a whole new string to split on is faster than *running* the regex (never mind compiling it, loading the regex engine, and anything else that needs to be done) should tell you which does more work. Copying is cheap. Parsing is expensive. In addition to being wrong (loading is done once, compilation is typically done once or a few times, while the regex is used many times inside a loop so the overhead cost is usually trivial compared with the cost of starting Python or reading a file), this is another micro-optimization argument. I'm not sure why you've suddenly developed this obsession with wringing every last nanosecond out of your code. Usually it is not necessary. Have you thought of buying a faster computer? Or using C? *wink* Sorry, but I find re.split('[ ,]', source) states much more clearly exactly what is being done with no obfuscation. That's because you know regex syntax. And I'd hardly call the version with replace obfuscated. Certainly the regex is shorter, and I suppose it's reasonable to expect any reader to know at least enough regex to read that, so I'll grant you that this is a small win for clarity. 
A micro-optimization for readability, at the expense of performance. Obviously this is a simple enough case that the difference is minor, but when the pattern gets only a little more complex, the clarity difference becomes greater. Perhaps. But complicated tasks require complicated regexes, which are anything but clear. Complicated tasks require complicated code as well. As another post pointed out, there are ways to improve the clarity of a regex, such as the re.VERBOSE flag. There is no doubt that a regex encapsulates information much more densely than python string manipulation code. One should not be surprised that it might take as much time and effort to understand a one-line regex as a dozen (or whatever) lines of Python code that do the same thing. In most cases I'll bet, given equal fluency in regexes and Python, the regex will take less. [...] After doing this a number of times, one starts to use an RE right from the get go unless one is VERY sure that there will be no requirements creep. YAGNI
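The re.VERBOSE flag mentioned above lets a dense pattern carry its own comments; a small sketch reusing the thread's CUSTOMER example:

```python
import re

customer_rx = re.compile(r"""
    CUSTOMER\s+      # record keyword
    (\w+)\s+         # first initial
    (\w+)\s+         # last name
    ([A-Z]\d{3})     # customer code, e.g. A123
    """, re.VERBOSE)

m = customer_rx.match('CUSTOMER J Smith A123')
print(m.groups())  # -> ('J', 'Smith', 'A123')
```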
Re: how to avoid leading white spaces
On 06/03/2011 02:49 PM, Neil Cerutti wrote: On 2011-06-03, ru...@yahoo.com wrote: or that I have to treat commas as well as spaces as delimiters. source.replace(',', ' ').split(' ') Uhgg. Create a whole new string just so you can split it on one rather than two characters? Sorry, but I find re.split('[ ,]', source) It's quibbling to complain about creating one more string in an operation that already creates N strings. It's not the time it takes to create the string, it's the doing of things that aren't really needed to accomplish the task: the re.split says directly and with no extraneous actions, split 'source' on either spaces or commas. This of course is a trivial example, but used thoughtfully, REs allow you to be very precise about what you are doing, versus using tricks like substituting individual characters first so you can split on a single character afterwards. Here's another alternative: list(itertools.chain.from_iterable(elem.split(' ') for elem in source.split(','))) You seriously find that clearer than re.split('[ ,]', source) above? I have no further comment. :-) It's weird looking, but delimiting text with two different delimiters is weird. Perhaps, but real-world input data is often very weird. Try parsing a text database of a circa 1980 telephone company phone directory sometime. :-) [...] - they are another language to learn, a very cryptic and terse language; Chinese is cryptic too but there are a few billion people who don't seem to be bothered by that. Chinese *would* be a problem if you proposed it as the solution to a problem that could be solved by using a person's native tongue instead. My point was that cryptic is in large part an inverse function of knowledge. If I always go out of my way to avoid regexes, then likely I will never become comfortable with them and they will always seem cryptic. To someone who uses them more often, they will seem less cryptic. 
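For simple input the three alternatives traded back and forth here agree; a quick comparison (how each treats runs of adjacent delimiters differs, as later posts point out):

```python
import itertools
import re

source = 'foo bar,spam'

a = re.split('[ ,]', source)
b = source.replace(',', ' ').split(' ')
c = list(itertools.chain.from_iterable(
        elem.split(' ') for elem in source.split(',')))

print(a, b, c)  # all three give ['foo', 'bar', 'spam']
```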
They may never have the clarity of Python, but neither is Python code a very clear way to describe text patterns. As for needing to learn them (S D'A comment), shrug. Programmers are expected to learn new things all the time; many even do so for fun. REs (practical use that is) in the grand scheme of things are not that hard. They are, I think, a lot easier to learn than SQL, yet it is common here to see recommendations to use sqlite rather than an ad-hoc concoction of Python dicts. [...] - and thanks in part to Perl's over-reliance on them, there's a tendency among many coders (especially those coming from Perl) to abuse and/or misuse regexes; people react to that misuse by treating any use of regexes with suspicion. So you claim. I have seen more postings in here where REs were not used when they would have simplified the code, than I have seen regexes used when a string method or two would have done the same thing. Can you find an example or invent one? I simply don't remember such problems coming up, but I admit it's possible. Sure, the response to the OP of this thread. -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
On 06/03/2011 03:45 PM, Chris Torek wrote: On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote: [prefers] re.split ('[ ,]', source) This is probably not what you want in dealing with human-created text: re.split('[ ,]', 'foo bar, spam,maps') ['foo', '', 'bar', '', 'spam', 'maps'] Instead, you probably want a comma followed by zero or more spaces; or, one or more spaces: re.split(r',\s*|\s+', 'foo bar, spam,maps') ['foo', 'bar', 'spam', 'maps'] or perhaps (depending on how you want to treat multiple adjacent commas) even this: re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs') ['foo', 'bar', 'spam', 'maps', 'eggs'] Which to me, illustrates nicely the power of a regex to concisely localize the specification of an input format and adapt easily to changes in that specification. although eventually you might want to just give in and use the csv module. :-) (Especially if you want to be able to quote commas, for instance.) Which internally uses regexes, at least for the sniffer function. (The main parser is in C presumably for speed, this being a library module and all.) ... With regexes the code is likely to be less brittle than a dozen or more lines of mixed string functions, indexes, and conditionals. In article 94svm4fe7...@mid.individual.net Neil Cerutti ne...@norwich.edu wrote: [lots of snippage] That is the opposite of my experience, but YMMV. I suspect it depends on how familiar the user is with regular expressions, their abilities, and their limitations. I suspect so too at least in part. People relatively new to REs always seem to want to use them to count (to balance parentheses, for instance). People who have gone through the compiler course know better. :-) But also, a thing I think sometimes gets forgotten, is if the max nesting depth is finite, parens can be balanced with a regex. This is nice for the particularly common case of a nest depth of 1 (balanced but non-nested parens.) -- http://mail.python.org/mailman/listinfo/python-list
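The closing point about finite nesting depth can be made concrete: parentheses balanced to a depth of one form a regular language, so a plain regex can check them (a sketch; deeper fixed depths would need a correspondingly nested pattern):

```python
import re

# Balanced, non-nested parentheses: leading text, then any number of
# (parenthesized group + following text) repetitions.
depth1 = re.compile(r'^[^()]*(\([^()]*\)[^()]*)*$')

print(bool(depth1.match('a (b) c (d)')))  # True
print(bool(depth1.match('a (b (c)) d')))  # False: nested
print(bool(depth1.match('a (b c')))       # False: unbalanced
```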
Re: how to avoid leading white spaces
On 06/02/2011 07:21 AM, Neil Cerutti wrote: On 2011-06-01, ru...@yahoo.com wrote: For some odd reason (perhaps because they are used a lot in Perl), this group seems to have a great aversion to regular expressions. Too bad, because this is a typical problem where their use is the best solution. Python's str methods, when they're sufficient, are usually more efficient. Unfortunately, except for the very simplest cases, they are often not sufficient. I often find myself changing, for example, a startswith() to a RE when I realize that the input can contain mixed case or that I have to treat commas as well as spaces as delimiters. After doing this a number of times, one starts to use an RE right from the get go unless one is VERY sure that there will be no requirements creep. And to regurgitate the mantra frequently used to defend Python when it is criticized for being slow, the real question should be, are REs fast enough? The answer almost always is yes. Perl integrated regular expressions, while Python relegated them to a library. Which means that one needs one extra import re line that is not required in Perl. Since RE strings are compiled and cached, one often need not compile them explicitly. Using match results often requires more lines than in Perl: m = re.match(...) if m: do something with m rather than Perl's if m/.../ {do something with capture group globals} Any true Python fan should not find this a problem, the stock response being, what's the matter, your Enter key broken? There are thus a large class of problems that are best solved with regular expressions in Perl, but str methods in Python. Guess that depends on what one's definition of large is. There are a few simple things, admittedly common, that Python provides functions for that Perl uses REs for: replace(), for example. But so what? 
I don't know if Perl does it or not, but there is no reason why functions called with string arguments or REs with no magic characters can't be optimized to something about as efficient as a corresponding Python function. Such uses are likely to be naively counted as using an RE in Perl. I would agree though that the selection of string manipulation functions in Perl is not as nice or orthogonal as in Python, and that this contributes to a tendency to use REs in Perl when one doesn't need to. But that is a programmer tradeoff (as in Python) between fast-coding/slow-execution and slow-coding/fast-execution. I for one would use Perl's index() and substr() to identify and manipulate fixed patterns when performance was an issue. One runs into the same tradeoff in Python pretty quickly too, so I'm not sure I'd call that space between the two languages large. The other tradeoff, applying both to Perl and Python, is with maintenance. As mentioned above, even when today's requirements can be solved with some code involving several string functions, indexes, and conditionals, when those requirements change, it is usually a lot harder to modify that code than a RE. In short, although your observations are true to some extent, they are not sufficient to justify the anti-RE attitude often seen here. -- http://mail.python.org/mailman/listinfo/python-list
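The two-line Python idiom contrasted with Perl's above, and the module-level caching that makes explicit compilation usually optional, look like this (the error-line example is invented for illustration):

```python
import re

line = 'ERROR 404: not found'

# Perl: if ($line =~ /ERROR (\d+)/) { print $1; }
m = re.search(r'ERROR (\d+)', line)
if m:
    print(m.group(1))  # -> 404

# The module-level functions compile and cache the pattern internally,
# so explicit re.compile() mainly serves to hoist work out of a hot loop.
rx = re.compile(r'ERROR (\d+)')
print(rx.search(line).group(1))  # -> 404
```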
Re: how to avoid leading white spaces
On 06/03/2011 07:17 AM, Neil Cerutti wrote: On 2011-06-03, ru...@yahoo.com wrote: The other tradeoff, applying both to Perl and Python, is with maintenance. As mentioned above, even when today's requirements can be solved with some code involving several string functions, indexes, and conditionals, when those requirements change, it is usually a lot harder to modify that code than a RE. In short, although your observations are true to some extent, they are not sufficient to justify the anti-RE attitude often seen here. Very good article. Thanks. I mostly wanted to combat the notion that the alleged anti-RE attitude here might be caused by an opposition to Perl culture. I contend that the anti-RE attitude sometimes seen here is caused by dissatisfaction with regexes in general combined with an aversion to the re module. I agree that it's not that bad, but it's clunky enough that it does contribute to making it my last resort. But I questioned the reasons given (not as efficient, not built in, not often needed) for dissatisfaction with REs.[*] If those reasons are not strong, then is not their Perl-smell still a leading candidate for explaining the anti-RE attitude here? Of course the whole question, lacking some serious group-psychological investigation, is pure speculation anyway. [*] A reason for not using REs not mentioned yet is that REs take some time to learn. Thus, although most people will know how to use Python string methods, only a subset of those will be familiar with REs. But that doesn't seem like a reason for RE bashing either, since REs are easier to learn than SQL and one frequently sees recommendations here to use sqlite. -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
On 06/03/2011 08:25 AM, Steven D'Aprano wrote: On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote: On 06/02/2011 07:21 AM, Neil Cerutti wrote: Python's str methods, when they're sufficient, are usually more efficient. Unfortunately, except for the very simplest cases, they are often not sufficient. Maybe so, but the very simplest cases occur very frequently. Right, and I stated that. I often find myself changing, for example, a startswith() to a RE when I realize that the input can contain mixed case Why wouldn't you just normalise the case? Because some of the text may be case-sensitive. [...] or that I have to treat commas as well as spaces as delimiters. source.replace(',', ' ').split(' ') Uhgg. Create a whole new string just so you can split it on one rather than two characters? Sorry, but I find re.split('[ ,]', source) states much more clearly exactly what is being done with no obfuscation. Obviously this is a simple enough case that the difference is minor, but when the pattern gets only a little more complex, the clarity difference becomes greater. [...] re.split is about four times slower than the simple solution. If this processing is a bottleneck, by all means use a more complex hard-coded replacement for a regex. In most cases that won't be necessary. After doing this a number of times, one starts to use an RE right from the get go unless one is VERY sure that there will be no requirements creep. YAGNI. IAHNI. (I actually have needed it.) There's no need to use a regex just because you think that you *might*, someday, possibly need a regex. That's just silly. If and when requirements change, then use a regex. Until then, write the simplest code that will solve the problem you have to solve now, not the problem you think you might have to solve later. I would not recommend you use a regex instead of a string method solely because you might need a regex later. 
But when you have to spend 10 minutes writing a half-dozen lines of python versus 1 minute writing a regex, your evaluation of the possibility of requirements changing should factor into your decision. [...] In short, although your observations are true to some extent, they are not sufficient to justify the anti-RE attitude often seen here. I don't think that there's really an *anti* RE attitude here. It's more a skeptical, cautious attitude to them, as a reaction to the Perl "when all you have is a hammer, everything looks like a nail" love affair with regexes. Yes, as I said, the regex attitude here seems in large part to be a reaction to their frequent use in Perl. It seems anti- to me in that I often see cautions about their use but seldom see anyone pointing out that they are often a better solution than a mass of twisty little string methods and associated plumbing. There are a few problems with regexes: - they are another language to learn, a very cryptic and terse language; Chinese is cryptic too but there are a few billion people who don't seem to be bothered by that. - hence code using many regexes tends to be obfuscated and brittle; No. With regexes the code is likely to be less brittle than a dozen or more lines of mixed string functions, indexes, and conditionals. - they're over-kill for many simple tasks; - and underpowered for complex jobs, and even some simple ones; Right, like all tools (including Python itself) they are suited best for a specific range of problems. That range is quite wide. - debugging regexes is a nightmare; Very complex ones, perhaps. Nightmare seems an overstatement. - they're relatively slow; So is Python. In both cases, if it is a bottleneck then choosing another tool is appropriate. - and thanks in part to Perl's over-reliance on them, there's a tendency among many coders (especially those coming from Perl) to abuse and/or misuse regexes; people react to that misuse by treating any use of regexes with suspicion. So you claim. 
I have seen more postings in here where REs were not used when they would have simplified the code, than I have seen regexes used when a string method or two would have done the same thing. But they have their role to play as a tool in the programmer's toolbox. We agree. Regarding their syntax, I'd like to point out that even Larry Wall is dissatisfied with regex culture in the Perl community: http://www.perl.com/pub/2002/06/04/apo5.html You did see the very first sentence in this, right? "Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information." (Note that Apocalypse is referring to a series of Perl design documents and has nothing to do with regexes in particular.) Synopsis 05 is (AFAICT with a quick scan) a proposal for revising regex syntax. I didn't see anything about de-emphasizing them in Perl. (But I have no idea what is going on for Perl 6 so I could be wrong about that.) As for the original
Re: how to avoid leading white spaces
On Jun 1, 11:11 am, Chris Rebert c...@rebertia.com wrote: On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar Hi i have a file which contains data //ACCDJ EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB, // UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ ' //ACCT EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB, // UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCT ' //ACCUM EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB, // UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCUM ' //ACCUM1 EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB, // UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCUM1 ' i want to cut the white spaces which are in between single quotes after TABLE=. for example : 'ACCT[spaces] ' 'ACCUM ' 'ACCUM1 ' the above is the output of another python script but its having a leading spaces. Er, you mean trailing spaces. Since this is easy enough to be homework, I will only give an outline: 1. Use str.index() and str.rindex() to find the positions of the starting and ending single-quotes in the line. 2. Use slicing to extract the inside of the quoted string. 3. Use str.rstrip() to remove the trailing spaces from the extracted string. 4. Use slicing and concatenation to join together the rest of the line with the now-stripped inner string. Relevant docs: http://docs.python.org/library/stdtypes.html#string-methods For some odd reason (perhaps because they are used a lot in Perl), this group seems to have a great aversion to regular expressions. Too bad, because this is a typical problem where their use is the best solution. import re f = open('your file') for line in f: fixed = re.sub(r"(TABLE='\S+)\s+'$", r"\1'", line) print fixed, (The above is for Python-2, adjust as needed for Python-3) -- http://mail.python.org/mailman/listinfo/python-list
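A Python 3 rendering of the snippet above, run against one of the sample lines (the file is replaced by a literal string for the sketch):

```python
import re

data = """\
//ACCT EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCT      '
"""

for line in data.splitlines():
    # Trim the trailing spaces inside the quotes after TABLE=.
    fixed = re.sub(r"(TABLE='\S+)\s+'$", r"\1'", line)
    print(fixed)
```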
Re: checking if a list is empty
On 05/12/2011 12:13 AM, Steven D'Aprano wrote: [snip] http://www.codinghorror.com/blog/2006/07/separating-programming-sheep-from-non-programming-goats.html Shorter version: it seems that programming aptitude is a bimodal distribution, with very little migration from the can't program hump into the can program hump. There does seem to be a simple predictor for which hump you fall into: those who intuitively develop a consistent model of assignment (right or wrong, it doesn't matter, so long as it is consistent) can learn to program. Those who don't, can't. A later paper by the same authors... (http://www.eis.mdx.ac.uk/research/PhDArea/saeed/paper3.pdf) Abstract: [...] Despite a great deal of research into teaching methods and student responses, there have been to date no strong predictors of success in learning to program. Two years ago we appeared to have discovered an exciting and enigmatic new predictor of success in a first programming course. We now report that after six experiments, involving more than 500 students at six institutions in three countries, the predictive effect of our test has failed to live up to that early promise. We discuss the strength of the effects that have been observed and the reasons for some apparent failures of prediction. -- http://mail.python.org/mailman/listinfo/python-list
Re: opinion: comp lang docs style
On 01/04/2011 11:29 PM, Steven D'Aprano wrote: On Tue, 04 Jan 2011 15:17:37 -0800, ru...@yahoo.com wrote: If one wants to critique the 'Python Docs', especially as regards to usefulness to beginners, one must start with the Tutorial; and if one wants to use if statements as an example, one must start with the above. No. The language reference (LR) and standard library reference (SLR) must stand on their own merits. It is nice to have a good tutorial for those who like that style of learning. But it should be possible for a programmer with a basic understanding of computers and some other programming languages to understand how to program in python without referring to tutorials, explanatory websites, commercially published books, the source code, etc. No it shouldn't. That's what the tutorial is for. The language reference and standard library reference are there to be reference manuals, not to teach beginners Python. Yes it should. That's not what the tutorial is for. The (any) tutorial is for people new to python, often new to programming, who have the time and a learning style suitable for sitting down and going through a slow step-by-step exposition, much as one would get in a classroom. That is a perfectly valid way for someone in that target audience to learn python. Your (and Terry's) mistake is to presume that it is appropriate for everyone, perhaps because it worked for you personally. There is a large class of potential python users for whom a tutorial is highly suboptimal -- people who have some significant programming experience, who don't have the time or patience required to go through it getting information serially bit by bit, or whose learning style is "don't spoon-feed me, just tell me concisely what python does", and who fill in gaps on a need-to-know basis rather than linearly. I (and many others) don't need or want an explanation of how to use lists as a stack! 
A language reference manual should completely and accurately describe the language it documents. (That seems fairly obvious to me although there will be differing opinions of how precise one needs to be, etc.) Once it meets that minimum standard, its quality is defined by how effectively it transfers that information to its target audience. A good reference manual meets the learning needs of the target audience above admirably. I learned Perl (reputedly more difficult to learn than Python) from the Perl manpages and used it for many, many years before I ever bought a Perl book. I learned C mostly from Harbison and Steele's C: A Reference Manual. Despite several attempts at python using its reference docs, I never got a handle on it until I forked out money for Beazley's book. There is obviously nothing inherently difficult about python -- it's just that python's reference docs are written for people who already know python. Since limiting their scope that narrowly is not necessary, as other languages show, it is fair to say that python's reference docs are poorer. In any case, your assumption that any one documentation work should stand on its own merits is nonsense -- *nothing* stands alone. Everything builds on something else. Technical documentation is no different: it *must* assume some level of knowledge of its readers -- should it be aimed at Python experts, or average Python coders, or beginners, or beginners to programming, or at the very least is it allowed to assume that the reader already knows how to read? You can't satisfy all of these groups with one document, because their needs are different and in conflict. This is why you have different documentation -- tutorials and reference manuals and literate source code and help text are all aimed at different audiences. Expecting one document to be useful for all readers' needs is like expecting one data type to be useful for all programming tasks.
I defined (roughly) the target audience I was talking about when I wrote for a programmer with a basic understanding of computers and some other programming languages. Let's dispense with the 6th-grade arguments about people who don't know how to read, etc. Reasonable people might disagree on what a particular documentation work should target, and the best way to target it, but not on the need for different documentation for different targets. As I hope I clarified above, that was exactly my point too. There is a significant, unsatisfied gap between the audience that a tutorial aims at, and the audience that the reference docs as currently written seem to be aimed at. Since other language manuals incorporate this gap audience more or less successfully in their reference manuals, python's failure to do so is justification for calling them poor. (Of course they are poor in lots of other ways too but my original response was prompted by the erroneous claim that good (in my sense above) reference manuals were unnecessary because a tutorial exists.) -- http://mail.python.org/mailman
Re: opinion: comp lang docs style
On 01/05/2011 12:23 AM, Alice Bevan–McGregor wrote: On 2011-01-04 22:29:31 -0800, Steven D'Aprano said: In any case, your assumption that any one documentation work should stand on its own merits is nonsense -- *nothing* stands alone. +1 I responded more fully in my response to Steven, but you, like he, are taking "stand on its own merits" out of context. The context I gave was someone who wants a complete and accurate description of python and who understands programming with other languages but not python. How many RFCs still in use today don't start with: The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119 RFC 2119 is incorporated in the others by reference. It is purely a matter of technical convenience that those definitions, which are common to hundreds of RFCs, are factored out to a single common location. RFC 2119 is not a tutorial. I posted a response on the article itself, rather than pollute a mailing list with replies to a troll. The name calling was a rather large hint as to the intention of the opinion, either that or whoever translated the article (man or machine) was really angry at the time. ;) I can hint to my neighbor that his stereo is too loud by throwing a brick through his window. Neither that nor calling people arrogant ignoramus is acceptable in polite society. I am not naive, nor shocked, that c.l.p is not always polite, and normally would not have even commented on it except that 1) Terry Reedy is usually more polite and thoughtful, and 2) Xah Lee's post was not a troll -- it was a legitimate comment on free software documentation (including specifically python's) and while I don't agree with some of his particulars, the Python docs would be improved if some of his comments were considered rather than dismissed with mindless epithets like "troll" and "arrogant ignoramus". -- http://mail.python.org/mailman/listinfo/python-list
Re: opinion: comp lang docs style
On 01/04/2011 01:34 PM, Terry Reedy wrote: On 1/4/2011 1:24 PM, an Arrogant Ignoramus wrote: what he called an opinion piece. I normally do not respond to trolls, but while expressing his opinions, AI made statements that are factually wrong at least as regards Python and its practitioners. Given that most trolls include factually false statements, the above is inconsistent. And speaking of arrogant, it is just that to go around screaming troll about a posting relevant to the newsgroup it was posted in because you don't happen to agree with its content. In doing so you lower your own credibility. (Which is also not helped by your Arrogant Ignoramus name-calling.) [...] 2. AI also claims that this notation is 'incomprehensible'. Since incomprehensibility is clearly subjective, your claim that it is a factual error is every bit as hyperbolic as his. [...] 3. AI's complaint is deceptive and deficient in omitting any mention of the part of the docs *intended* to teach beginners: the Tutorial. The main doc pages list the Tutorial first, as what one should start with. That [...] If one wants to critique the 'Python Docs', especially as regards to usefulness to beginners, one must start with the Tutorial; and if one wants to use if statements as an example, one must start with the above. No. The language reference (LR) and standard library reference (SLR) must stand on their own merits. It is nice to have a good tutorial for those who like that style of learning. But it should be possible for a programmer with a basic understanding of computers and some other programming languages to understand how to program in python without referring to tutorials, explanatory websites, commercially published books, the source code, etc. The difficulty of doing that is a measure of the failure of the python docs to achieve a level of quality commensurate with the language itself. FWIW, I think the BNF in the LR is perfectly reasonable given the target audience I gave above.
The failure of the LR has more to do with missing or excessively terse material -- it concentrates too exclusively on syntax and insufficiently on semantics. Much of the relevant semantics information is currently mislocated in the SLR. -- http://mail.python.org/mailman/listinfo/python-list
Re: Performance: sets vs dicts.
On 09/02/2010 02:47 PM, Terry Reedy wrote: On 9/1/2010 10:57 PM, ru...@yahoo.com wrote: So while you may think most people rarely read the docs for basic language features and objects (I presume you don't mean to restrict your statement to only sets), I and most people I know *do* read them. And when I read them I expect them, as any good reference documentation does, to completely and accurately describe the behavior of the item I am reading about. If big-O performance is deemed an intrinsic behavior of an (operation of) an object, it should be described in the documentation for that object. However, big-O performance is intentionally NOT so deemed. The discussion, as I understood it, was about whether or not it *should* be so deemed. And I have and would continue to argue that it should not be, for multiple reasons. Yes, you have. And others have argued the opposite. Personally, I did not find your arguments very convincing, particularly that it would be misleading or that the limits necessarily imposed by a real implementation somehow invalidate the usefulness of O() documentation. But I acknowledged that there was not universal agreement that O() behavior should be documented in the reference docs by qualifying my statement with the word "if". But mostly my comments were directed towards some of the side comments in Raymond's post that I thought should not pass unchallenged. I think that some of the attitudes expressed (and shared by others) are likely the direct cause of many of the faults I find in the current documentation. -- http://mail.python.org/mailman/listinfo/python-list
Re: Performance: sets vs dicts.
On 09/01/2010 04:51 PM, Raymond Hettinger wrote: On Aug 30, 6:03 am, a...@pythoncraft.com (Aahz) wrote: That reminds me: one co-worker (who really should have known better ;-) had the impression that sets were O(N) rather than O(1). Although writing that off as a brain-fart seems appropriate, it's also the case that the docs don't really make that clear; it's implied from requiring elements to be hashable. Do you agree that there should be a comment? There probably ought to be a HOWTO or FAQ entry on algorithmic complexity that covers classes and functions where the algorithms are interesting. That will concentrate the knowledge in one place where performance is a main theme and where the various alternatives can be compared and contrasted. I think most users of sets rarely read the docs for sets. The few lines in the tutorial are enough so that most folks just get it and don't read more detail unless they are attempting something exotic. I think that attitude is very dangerous. There is a long history in this world of one group of people presuming what another group of people does or does not do or think. This seems to be a characteristic of human beings and is often used to promote one's own ideology. And even if you have hard evidence for what you say, why should 60% of people who don't read docs justify providing poor quality docs to the 40% that do? So while you may think most people rarely read the docs for basic language features and objects (I presume you don't mean to restrict your statement to only sets), I and most people I know *do* read them. And when I read them I expect them, as any good reference documentation does, to completely and accurately describe the behavior of the item I am reading about. If big-O performance is deemed an intrinsic behavior of an (operation of) an object, it should be described in the documentation for that object. Your use of the word "exotic" is also suspect.
I learned long ago to always click the advanced options box on dialogs because most developers/designers really don't have a clue about what users need access to. Our docs have gotten somewhat voluminous, No they haven't (relative to what they attempt to describe). The biggest problem with the docs is that they are too terse. They often appear to have been written by people playing a game of who can describe X in the minimum number of words that can still be defended as correct. While that may be fun, good docs are produced by considering how to describe something to the reader, completely and accurately, as effectively as possible. The test is not how few words were used, but how quickly the reader can understand the object or find the information being sought about the object. so it's unlikely that adding that particular needle to the haystack would have cured your colleague's brain-fart unless he had been focused on a single document talking about the performance characteristics of various data structures. I don't know the colleague any more than you do, so I feel comfortable saying that having it very likely *would* have cured that brain-fart. That is, he or she very likely would have needed to check some behavior of sets at some point and would have either noted the big-O characteristics in passing, or would have noted that such information was available, and would have returned to the documentation when the need for that information arose. The reference description of sets is the *one* canonical place to look for information about sets. There are people who don't read documentation, but one has to be very careful not to use the existence of such people as an excuse to justify sub-standard documentation. So I think relegating algorithmic complexity information to some remote document far from the description of the object it pertains to is exactly the wrong approach.
This is not to say that a performance HOWTO or FAQ in addition to the reference manual would not be good. -- http://mail.python.org/mailman/listinfo/python-list
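[Editorial aside, not part of the original thread: the complexity difference under discussion is easy to demonstrate. The sketch below assumes CPython's usual hash-based set implementation; the sizes and repeat counts are arbitrary illustrative choices.]

```python
import timeit

# Compare membership tests: average O(1) for a hash-based set
# versus an O(n) linear scan for a list of the same elements.
n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Probe the worst case for the list: its last element.
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)

print(f"list lookup: {t_list:.5f}s  set lookup: {t_set:.5f}s")
```

On a typical machine the set lookup is orders of magnitude faster, which is exactly the kind of behavioral fact the thread argues belongs in the reference description of sets.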
Re: How to convert (unicode) text to image?
On 08/30/2010 04:50 AM, Thomas Jollans wrote: On Monday 30 August 2010, it occurred to ru...@yahoo.com to exclaim: Face the facts dude. The Python docs have some major problems. They were pretty good when Python was a new, cool, project used by a handful of geeks. They are good relative to the average (whatever that is) open source project -- but that bar is so low as to be a string lying on the ground. Actually, the Python standard library reference manual is excellent. At least that's my opinion. Granted, it's not necessarily the best in the world. It could probably be better. But that goes for just about every documentation effort there is. What exactly are you comparing the Python docs to, I wonder? Obviously not something like Vala, but that goes without saying. kj said that the Perl docs were better. I can't comment on that. I also won't comment on the sorry mess that the language Perl is, either. There are a few documentation efforts that I recognize are actually better than the Python docs: Firstly, the MSDN Library docs for the .Net framework. Not that I refer to it much, but it is excellent, and it probably was a pretty darn expensive project too. Secondly, the libc development manual pages on Linux and the BSDs. Provided you know your way around the C library, they are really a top-notch reference. The Postgresql docs have always seemed pretty good to me. And I'll second kj's nomination of Perl. The Perl docs have plenty of faults but many years ago I was able to learn Perl with nothing more than those docs. It was well over five years later that I ever got around to buying a commercial Perl book. In contrast, I made several, honest efforts to learn Python the same way but found it impossible and never got a handle on it until I bought Lutz's and Beazley's books. (Of which Beazley's was by far the most useful; Lutz became a doorstop pretty quickly.
And yes, I knew about but didn't use the tutorial -- tutorials are one way of presenting information that aren't appropriate for everyone or in every situation, and the existence of one in no way excuses inadequate reference material.) If one is comparing the Python docs to others, comparing it to Beazley's book is informative. Most of the faults I find with the book are the places he took material from the Python docs nearly verbatim. The material he interprets and explains (usually quite tersely) is much clearer than similar material (if it even exists) in the Python docs. Finally, it is not really necessary to compare the Python docs to others to make a judgment -- simply looking at the hours taken to solve some problem that could have been avoided with a couple more sentences in the docs -- the number of hours spent trying to figure out some behavior by poring over the standard lib code -- the number of times one decides how to write code by trying it, with fingers crossed that one isn't relying on some accidental effect that will change with the next version or platform -- these can give a pretty good indication of the magnitude of the doc problems. I think one reason for the frequent "Python docs are great" opinions here is that eventually one figures out the hard way how things work, and tends to rely less on the docs as documentation, and more as a mnemonic. And for that the existing docs are adequate. -- http://mail.python.org/mailman/listinfo/python-list
Re: How to convert (unicode) text to image?
On 08/30/2010 01:14 PM, Terry Reedy wrote: On 8/30/2010 12:23 AM, ru...@yahoo.com wrote: The Python docs have some major problems. And I have no idea what you think they are. I have written about a few of them here in the past. I'm sure Google will turn up something. I have participated in 71 doc improvement issues on the tracker. Most of those I either initiated or provided suggestions. How many have you helped with? Certainly not 71. But there is, for example, http://bugs.python.org/issue1397474 Please note the date on it. -- http://mail.python.org/mailman/listinfo/python-list
Re: How to convert (unicode) text to image?
On 08/29/2010 08:21 PM, alex23 wrote: kj no.em...@please.post wrote: snip Sorry for the outburst, but unfortunately, PIL is not alone in this. Python is awash in poor documentation. [...] I have to conclude that the problem with Python docs is somehow systemic... Yes, if everyone else disagrees with you, the problem is obviously systemic. No, not everyone disagrees with him. There are many people who absolutely agree with him. What helps are concrete suggestions to the package maintainers about how these improvements could be made, rather than huge sprawling attacks on the state of Python documentation (and trying to tie it into the state of Python itself) as a whole. Nothing you quoted of what he wrote attempted to tie it into the state of Python itself. Instead, what we get are huge pointless rants like yours whenever someone finds that something isn't spelled out for them in exactly the way that they want. He never complained about spelling choices. These people are _volunteering_ their effort and their code, Yes, we all know that. all you're providing is an over-excess of hyperbole It is hardly convincing when one criticizes hyperbole with hyperbole. and punctuation. What is frustrating to me is seeing people like yourself spend far more time slamming these projects than actually contributing useful changes back. Face the facts, dude. The Python docs have some major problems. They were pretty good when Python was a new, cool, project used by a handful of geeks. They are good relative to the average (whatever that is) open source project -- but that bar is so low as to be a string lying on the ground. Your overly defensive and oppressive response does not help. All it (combined with similar knee-jerk responses) does is act to suppress any criticism, leaving the impression that the Python docs are really great, an assertion commonly made here and often left unchallenged. Responses like yours create a force that works to maintain the status quo.
-- http://mail.python.org/mailman/listinfo/python-list
Re: A question about the posibility of raise-yield in Python
On Jun 30, 10:48 am, John Nagle na...@animats.com wrote: On 6/30/2010 12:13 AM, Дамјан Георгиевски wrote: A 'raise-yield' expression would break the flow of a program just like an exception, going up the call stack until it would be handled, but also like yield it would be possible to continue the flow of the program from where it was raise-yield-ed. Bad idea. Continuing after an exception is generally troublesome. This was discussed during the design phase of Ada, and rejected. Since then, it's been accepted that continuing after an exception is a terrible idea. The stack has already been unwound, for example. What you want, in the situation you describe, is an optional callback, to be called in case of a fixable problem. Then the caller gets control, but without stack unwinding. Strangely, I was just thinking about something similar (non-stack-unwinding) the other day. Something like:

def caller():
    try:
        callee()
    except SomeError, exc:
        ...
    else exclist:
        ...

def callee():
    if error:
        raise SomeError()
    else:
        raise2 SomeWarning()

raise2 would create an exception object but, unlike raise, would save it in a list somewhere, and when callee() returned normally, the list would be made available to caller, possibly in a parameter to the try/except else clause as shown above. Obviously raise2 is a placeholder for some way to signal that this is a non-stack-unwinding exception. The use case addressed is to note exceptional conditions in a function that aren't exceptional enough to be fatal but which the caller may or may not care about. Similar to the warnings module but without the brokenness of doing I/O. Integrates well with the existing way of handling fatal exceptions. No idea if something like this is even remotely feasible. -- http://mail.python.org/mailman/listinfo/python-list
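[Editorial aside: the intent of the hypothetical raise2/else-exclist proposal above can be approximated in ordinary Python today by having the callee append non-fatal exception objects to a list that the caller inspects after a normal return. The names SomeWarning, callee, and caller echo the poster's sketch; the list-passing convention is an assumption of mine, not part of the proposal.]

```python
class SomeWarning(Exception):
    pass

def callee(data, notes):
    # Instead of "raise2 SomeWarning()", record the non-fatal
    # condition and keep going -- no stack unwinding occurs.
    if not data:
        notes.append(SomeWarning("empty input"))
        return 0
    return sum(data)

def caller(data):
    notes = []
    result = callee(data, notes)
    # Plays the role of the proposed "else exclist:" clause: the
    # collected exceptions are available after a normal return.
    for exc in notes:
        print("noted:", exc)
    return result

print(caller([]))      # collects a SomeWarning, still returns 0
print(caller([1, 2]))  # returns 3 with nothing noted
```

The warnings module serves a similar purpose but reports via I/O hooks; collecting exception objects keeps the machinery in-band, as the post suggests.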
Re: I strongly dislike Python 3
On Jun 30, 9:42 am, Michele Simionato michele.simion...@gmail.com wrote: Actually when debugging I use pdb which uses p (no parens) for printing, so having print or print() would not make any difference for me. Perhaps you don't use CJK strings much? p u'\u30d1\u30a4\u30c8\u30f3' gives quite a different result from print u'\u30d1\u30a4\u30c8\u30f3' at least in python2. Is this different in python3? -- http://mail.python.org/mailman/listinfo/python-list
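[Editorial note: yes, this did change. pdb's p command shows repr() of the value, and in Python 3 repr() of a str leaves printable non-ASCII characters literal (ascii() produces the old escaped form), so p and print differ far less for CJK text than they did in Python 2. A small Python 3 sketch:]

```python
s = '\u30d1\u30a4\u30c8\u30f3'  # katakana, the string from the post

# Python 2: repr(u'...') showed \uXXXX escapes; only print showed glyphs.
# Python 3: repr() keeps printable characters literal, so pdb's "p"
# command is readable; ascii() reproduces the old Python 2 appearance.
print(repr(s))   # quoted but readable glyphs
print(ascii(s))  # '\u30d1\u30a4\u30c8\u30f3'
print(s)         # the glyphs themselves
```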