On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
> > > Why don't you use re.findall?
> > >
> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
> >
> > I think I can see what you did there but it won't make sense to me - or
> > whoever looks at the code - in future.
> >
> > That answers your specific question. However, I am in awe of people who
> > can just "do" regular expressions and I thank you very much for what
> > would have been a monumental effort had I tried it.
>
> I feel the same way about regex. If I can find a way to write something
> without regex I very much prefer to as regex usually adds complexity and
> hurts readability.

I find "straight" regexps very easy to write. There are only a handful
of constructs, all of them very simple, and you just string them
together. But then I've used regexps for 30+ years, so of course they
feel natural to me.

(Reading regexps may be a bit harder, exactly because they are too
simple: There is no abstraction, so a complicated pattern results in a
long regexp.)

There are some extensions to regexps which are conceptually harder, like
lookahead and lookbehind or nested contexts in Perl. For those I may
need the manual (especially because they are new(ish) and every language
uses a different syntax for them), or I avoid them altogether.

Oh, and Python (just like Perl) allows you to embed whitespace and
comments into regexps, which helps readability a lot if you have to
write long regexps.

> You might find https://regex101.com/ to be useful for testing your regex.
> You can enter in sample data and see if it matches.
>
> If I understood what your regex was trying to do I might be able to suggest
> some python to do the same thing. Is it just removing numbers from text?

Not "removing" them (as I understood it), but extracting them (i.e.
finding and collecting them).

> > > re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)

\b         - a word boundary
[0-9]{2,7} - 2 to 7 digits
-          - a hyphen-minus
[0-9]{2}   - exactly 2 digits
-          - a hyphen-minus
[0-9]{2}   - exactly 2 digits
\b         - a word boundary

Seems quite straightforward to me. I'll be impressed if you can write
that in Python in a way which is easier to read.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
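
A minimal sketch of the whitespace-and-comments point, assuming the
feature meant is Python's re.VERBOSE flag. The pattern is the one from
the thread, with the per-piece explanation moved into comments inside
the pattern. The sample text and the names PATTERN and matches are
made up for illustration and do not come from the original posts.

```python
import re

# The pattern from the thread, written with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
PATTERN = re.compile(r"""
    \b          # a word boundary
    [0-9]{2,7}  # 2 to 7 digits
    -           # a hyphen-minus
    [0-9]{2}    # exactly 2 digits
    -           # a hyphen-minus
    [0-9]{2}    # exactly 2 digits
    \b          # a word boundary
""", re.VERBOSE)

# Hypothetical sample text, not taken from the thread.
txt = "Refs 1234-56-78 and 99-00-11, but not 1-22-33 or 12345678-12-34."

# findall collects (extracts) every non-overlapping match as a string.
matches = PATTERN.findall(txt)
print(matches)  # ['1234-56-78', '99-00-11']
```

The compact one-line form and the verbose form should return the same
list for the same input; the verbose form only changes how the pattern
reads in the source.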
-- https://mail.python.org/mailman/listinfo/python-list