Re: Question about metacharacter '*'
On 07/07/2014 19:51, rxjw...@gmail.com wrote:

Will you please do something about the double spaced google crap that you keep sending, I've already asked you twice.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
Re: Question about metacharacter '*'
On Mon, Jul 7, 2014 at 11:51 AM, wrote:
> Would you give me an example using your pattern: `.*` -- `.`?
> I try it, but it cannot pass. (of course, I use it incorrectly)

Those are two patterns.

Python 3.4.1 (default, Jul 7 2014, 13:22:02)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.fullmatch(r'.', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.', 'ab')
>>> re.fullmatch(r'.', '')
>>>
>>> re.fullmatch(r'.*', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.*', 'ab')
<_sre.SRE_Match object; span=(0, 2), match='ab'>
>>> re.fullmatch(r'.*', '')
<_sre.SRE_Match object; span=(0, 0), match=''>

-- Devin
Re: Question about metacharacter '*'
On Sunday, July 6, 2014 8:09:57 AM UTC-4, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM, wrote:
> > Hi,
> >
> > I just begin to learn Python. I do not see the usefulness of '*' in its
> > description below:
> >
> > The first metacharacter for repeating things that we'll look at is *. *
> > doesn't match the literal character *; instead, it specifies that the
> > previous character can be matched zero or more times, instead of exactly
> > once.
> >
> > For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> > characters), and so forth.
> >
> > It has to be used with other search constraints?
>
> (BTW, this is a regexp question, not really a Python question per se.)
>
> That's usually when it's useful, yeah. For example, [0-9] matches any
> of the characters 0 through 9. So to match a natural number written in
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> strings "1", "12", and "007", but not "" or "Jeffrey".
>
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.
>
> The power of regexps stems from the ability to mix and match all of
> the regexp pieces in pretty much any way you want.
>
> -- Devin

Would you give me an example using your pattern: `.*` -- `.`?
I try it, but it cannot pass. (of course, I use it incorrectly)
Re: Question about metacharacter '*'
On Sun, Jul 6, 2014 at 4:49 PM, MRAB wrote: > \d also matches more than just [0-9] in Unicode. I think that anything matched by \d will also be accepted by int(). >>> decimals = [c for c in (chr(i) for i in range(17 * 2**16)) if >>> unicodedata.category(c) == 'Nd'] >>> len(decimals) 460 >>> re.match(r'\d*', ''.join(decimals)).span() (0, 460) >>> int(''.join(decimals)) 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 >>> nondecimals = [c for c in (chr(i) for i in range(17 * 2**16)) if >>> unicodedata.category(c) in 'NoNl'] >>> len(nondecimals) 688 >>> re.findall(r'\d', ''.join(nondecimals)) [] -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about metacharacter '*'
The reason I did not use \d\d* or \d+ or ^\d+$ or any number of more-correct things was because the OP was new to regexps.

-- Devin

On Sun, Jul 6, 2014 at 3:49 PM, MRAB wrote:
> On 2014-07-06 18:41, Albert-Jan Roskam wrote:
>>> In article , Rick Johnson wrote:
>>>> As an aside i prefer to only utilize a "character set" when
>>>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>>>> can be expressed just as correctly (and less noisy IMHO) as
>>>> r"\d\d*".
>>>
>>> Even better, r"\d+"
>>
>> I tend to do that too, even though technically the two are not perfectly
>> equivalent. With the re.LOCALE flag LC_CTYPE is also affected, which affects
>> what is captured by \d but not by [0-9]
>
> \d also matches more than just [0-9] in Unicode.
Re: Question about metacharacter '*'
On 2014-07-06 18:41, Albert-Jan Roskam wrote:
>> In article , Rick Johnson wrote:
>>> As an aside i prefer to only utilize a "character set" when
>>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>>> can be expressed just as correctly (and less noisy IMHO) as
>>> r"\d\d*".
>>
>> Even better, r"\d+"
>
> I tend to do that too, even though technically the two are not perfectly
> equivalent. With the re.LOCALE flag LC_CTYPE is also affected, which affects
> what is captured by \d but not by [0-9]

\d also matches more than just [0-9] in Unicode.
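To illustrate the point about Unicode, here is a minimal sketch assuming Python 3, where str patterns are Unicode-aware by default. The sample string (two Arabic-Indic digits) and the variable names are chosen only for illustration; re.ASCII is the standard flag that restricts \d to [0-9].

```python
import re

# ARABIC-INDIC DIGIT THREE and ARABIC-INDIC DIGIT FOUR (illustrative sample).
arabic_indic = "\u0663\u0664"

# \d accepts any Unicode decimal digit in a str pattern.
unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None

# The explicit character set accepts ASCII digits only.
ascii_class = re.fullmatch(r"[0-9]+", arabic_indic) is not None

# re.ASCII narrows \d back down to [0-9].
ascii_flag = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII) is not None

# int() also understands these digits.
value = int(arabic_indic)

print(unicode_digits, ascii_class, ascii_flag, value)
```

If only ASCII digits should be accepted, either [0-9] or \d with re.ASCII seems the safer choice.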
Re: Question about metacharacter '*'
> In article , Rick Johnson wrote:
>
>> As an aside i prefer to only utilize a "character set" when
>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>> can be expressed just as correctly (and less noisy IMHO) as
>> r"\d\d*".
>
> Even better, r"\d+"

I tend to do that too, even though technically the two are not perfectly equivalent. With the re.LOCALE flag LC_CTYPE is also affected, which affects what is captured by \d but not by [0-9]
Re: Question about metacharacter '*'
On Sunday, July 6, 2014 12:38:23 PM UTC-5, Rick Johnson wrote:
> r'\s*#[^\n]'

Well, there i go not testing again!

r'\s*#[^\n]*'
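The difference between the two patterns can be sketched as follows. The sample text and variable names are made up for illustration, and the pattern is only a rough comment matcher (it does not know about '#' inside string literals, for instance).

```python
import re

# Illustrative input: an indented comment followed by a line of code.
source = "    # a comment\nx = 1"

# Without the trailing *, [^\n] consumes exactly one character after the hash.
truncated = re.match(r"\s*#[^\n]", source).group()

# With the *, [^\n]* runs on to the end of the line and stops at the newline.
whole_comment = re.match(r"\s*#[^\n]*", source).group()

print(repr(truncated))
print(repr(whole_comment))
```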
Re: Question about metacharacter '*'
On Sunday, July 6, 2014 11:47:38 AM UTC-5, Roy Smith wrote:
> Even better, r"\d+"
>
> >>> re.search(r'(\d\d*)', '111aaa222').groups()
> ('111',)
> >>> re.search(r'(\d+)', '111aaa222').groups()
> ('111',)

Yes, good catch! I had failed to reduce your original pattern down to its most fundamental aspects for the sake of completeness, and instead, opted to modify it in a manner that mirrored your example.

> Oddly enough, I prefer character sets to the backslash
> notation, but I suppose that's largely because when I
> first learned regexes, that new-fangled backslash stuff
> hadn't been invented yet. :-)

Ha, point taken! :-)

Character sets really shine when you need a fixed range of letters or numbers which are NOT defined by one of the "special characters" of \d \D \W \w, etc. Say you want to match any letters between "c" and "m" or the digits between "3" and "6". Defining that pattern using OR'd "char literals" would be a massive undertaking!

Another great use of character sets is skipping chars that don't match a "target". For instance, a Python comment will start with one hash char and proceed to the end of the line, which, when accounting for leading white-space, could be defined by the pattern:

r'\s*#[^\n]'

> Regex is also not as easy to use in Python as it is in a
> language like Perl where it's baked into the syntax. As a
> result, pythonistas tend to shy away from regex, and
> either never learn the full power, or let their skills
> grow rusty. Which is a shame, because for many tasks,
> there's no better tool.

Agreed, but unfortunately, like many other languages, Python has decided to import all the illogical aspects of regex syntax from other languages instead of creating a "new" regex syntax that is consistent and logical. They did the same thing with Tkinter, and what a nightmare!

And don't misunderstand my statements, i don't intend that we should create a syntax of verbosity, NO, we *CAN* keep the syntax succinct whilst eliminating the illogical and inconsistent aspects that plague our patterns. Will regex ever be easy to learn? Probably not, but it can be easier to use if only we put on our "big boy" pants and decide to do something about it!
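As a rough sketch of the character-set point, the following compares a range-based set with the equivalent chain of OR'd literals. The sample string and variable names are invented for illustration.

```python
import re

# Letters c through m and digits 3 through 6, written as one character set.
char_set = re.compile(r"[c-m3-6]")

# The same thing spelled out as alternation of single-character literals.
alternation = re.compile(r"c|d|e|f|g|h|i|j|k|l|m|3|4|5|6")

# Illustrative sample text.
sample = "abcxyz m 0123456789"

from_set = char_set.findall(sample)
from_alternation = alternation.findall(sample)

print(from_set)
print(from_set == from_alternation)
```

Both patterns find the same characters here; the set is simply shorter and easier to read.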
Re: Question about metacharacter '*'
In article , Rick Johnson wrote:
> As an aside i prefer to only utilize a "character set" when
> nothing else will suffice. And in this case r"[0-9][0-9]*"
> can be expressed just as correctly (and less noisy IMHO) as
> r"\d\d*".

Even better, r"\d+"

>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
>>> re.search(r'(\d+)', '111aaa222').groups()
('111',)

Oddly enough, I prefer character sets to the backslash notation, but I suppose that's largely because when I first learned regexes, that new-fangled backslash stuff hadn't been invented yet. :-)

I know I've said this before, but people should put more effort into learning regex. There are lots of good tools in Python (startswith, endswith, split, in, etc.) which handle many of the most common regex use cases. Regex is also not as easy to use in Python as it is in a language like Perl where it's baked into the syntax. As a result, pythonistas tend to shy away from regex, and either never learn the full power, or let their skills grow rusty. Which is a shame, because for many tasks, there's no better tool.
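For comparison, here is a small sketch of the string tools mentioned above next to rough regex equivalents. The sample strings and variable names are invented for illustration; the last pair shows a case where the regex version does something the plain method does not.

```python
import re

# Illustrative sample line.
line = "error: disk full"

# Prefix test: str.startswith versus a match anchored at the start.
starts_plain = line.startswith("error")
starts_regex = re.match(r"error", line) is not None

# Suffix test: str.endswith versus a search anchored at the very end.
ends_plain = line.endswith("full")
ends_regex = re.search(r"full\Z", line) is not None

# Substring test: the `in` operator versus an unanchored search.
contains_plain = "disk" in line
contains_regex = re.search(r"disk", line) is not None

# Splitting: str.split keeps stray spaces, re.split can absorb them.
fields_plain = "a, b,c".split(",")
fields_regex = re.split(r"\s*,\s*", "a, b,c")

print(starts_plain, ends_plain, contains_plain)
print(fields_plain, fields_regex)
```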
Re: Question about metacharacter '*'
[CONTINUED FROM LAST REPLY...]

Likewise if your intent is to filter out any match strings which contain non-digits, then define the start and stop points of the pattern:

# Match only if all are digits
>>> re.match(r'\d\d*$', '111aaa222') # fails
# Match only if all are digits and,
# allow leading white-space
>>> re.match(r'\s*\d\d*$', ' 111')
<_sre.SRE_Match object at 0x026D8410>
# But not trailing space!
>>> re.match(r'\s*\d\d*$', ' 111 ') # fails
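One caveat with anchoring on $ is worth a short sketch: in Python's re module, $ also matches just before a newline at the very end of the string, whereas \Z and fullmatch() (Python 3.4+) do not allow it. The sample string and variable names are illustrative only.

```python
import re

# Digits followed by a trailing newline, as read from a file, for example.
with_newline = "111\n"

# $ matches at the end of the string or just before a final newline.
dollar = re.match(r"\d+$", with_newline) is not None

# \Z matches only at the true end of the string.
strict_end = re.match(r"\d+\Z", with_newline) is not None

# fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"\d+", with_newline) is not None

print(dollar, strict_end, whole)
```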
Re: Question about metacharacter '*'
On Sunday, July 6, 2014 10:50:13 AM UTC-5, Devin Jeanpierre wrote:
> In related news, the regexp I gave for numbers will match "1a".

Well of course it matched, because your pattern defines "one or more consecutive digits". So it will match the "1" of "1a" and the "11" of "11a" likewise.

As an aside i prefer to only utilize a "character set" when nothing else will suffice. And in this case r"[0-9][0-9]*" can be expressed just as correctly (and less noisy IMHO) as r"\d\d*".

INTERACTIVE SESSION: Python 2.x

# Note: Grouping used for explicitness.
#
# Using character sets:
>>> import re
>>> re.search(r'([0-9][0-9]*)', '1a').groups()
('1',)
>>> re.search(r'([0-9][0-9]*)', '11a').groups()
('11',)
>>> re.search(r'([0-9][0-9]*)', '111aaa222').groups()
('111',)
#
# Same result without character sets:
>>> re.search(r'(\d\d*)', '1a').groups()
('1',)
>>> re.search(r'(\d\d*)', '11a').groups()
('11',)
>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
Re: Question about metacharacter '*'
In related news, the regexp I gave for numbers will match "1a".

-- Devin

On Sun, Jul 6, 2014 at 8:32 AM, MRAB wrote:
> On 2014-07-06 13:09, Devin Jeanpierre wrote:
>> On Sun, Jul 6, 2014 at 4:51 AM, wrote:
>>> Hi,
>>>
>>> I just begin to learn Python. I do not see the usefulness of '*' in its
>>> description below:
>>>
>>> The first metacharacter for repeating things that we'll look at is *. *
>>> doesn't match the literal character *; instead, it specifies that the
>>> previous character can be matched zero or more times, instead of exactly
>>> once.
>>>
>>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>>> characters), and so forth.
>>>
>>> It has to be used with other search constraints?
>>
>> (BTW, this is a regexp question, not really a Python question per se.)
>>
>> That's usually when it's useful, yeah. For example, [0-9] matches any
>> of the characters 0 through 9. So to match a natural number written in
>> decimal form, we might use the regexp [0-9][0-9]*, which matches the
>> strings "1", "12", and "007", but not "" or "Jeffrey".
>>
>> Another useful one is `.*` -- `.` matches exactly one character, no
>> matter what that character is. So, `.*` matches any string at all.
>
> Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
>
>> The power of regexps stems from the ability to mix and match all of
>> the regexp pieces in pretty much any way you want.
Re: Question about metacharacter '*'
On 2014-07-06 13:09, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM, wrote:
>> Hi,
>>
>> I just begin to learn Python. I do not see the usefulness of '*' in its
>> description below:
>>
>> The first metacharacter for repeating things that we'll look at is *. *
>> doesn't match the literal character *; instead, it specifies that the
>> previous character can be matched zero or more times, instead of exactly
>> once.
>>
>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>> characters), and so forth.
>>
>> It has to be used with other search constraints?
>
> (BTW, this is a regexp question, not really a Python question per se.)
>
> That's usually when it's useful, yeah. For example, [0-9] matches any
> of the characters 0 through 9. So to match a natural number written in
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> strings "1", "12", and "007", but not "" or "Jeffrey".
>
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.

Not quite. It won't match a '\n' unless the DOTALL flag is turned on.

> The power of regexps stems from the ability to mix and match all of
> the regexp pieces in pretty much any way you want.
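The newline caveat can be checked with a short sketch, assuming Python 3.4+ for fullmatch(). The sample text and variable names are illustrative only.

```python
import re

# Illustrative two-line input.
text = "first line\nsecond line"

# By default `.` does not match a newline, so `.*` cannot cover both lines.
default_match = re.fullmatch(r".*", text)

# With re.DOTALL, `.` matches any character, newline included.
dotall_match = re.fullmatch(r".*", text, flags=re.DOTALL)

# match() anchors only at the start, so `.*` stops just before the newline.
first_line = re.match(r".*", text).group()

print(default_match is None, dotall_match is not None, repr(first_line))
```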
Re: Question about metacharacter '*'
On Sun, Jul 6, 2014 at 4:51 AM, wrote:
> Hi,
>
> I just begin to learn Python. I do not see the usefulness of '*' in its
> description below:
>
> The first metacharacter for repeating things that we'll look at is *. *
> doesn't match the literal character *; instead, it specifies that the
> previous character can be matched zero or more times, instead of exactly
> once.
>
> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> characters), and so forth.
>
> It has to be used with other search constraints?

(BTW, this is a regexp question, not really a Python question per se.)

That's usually when it's useful, yeah. For example, [0-9] matches any of the characters 0 through 9. So to match a natural number written in decimal form, we might use the regexp [0-9][0-9]*, which matches the strings "1", "12", and "007", but not "" or "Jeffrey".

Another useful one is `.*` -- `.` matches exactly one character, no matter what that character is. So, `.*` matches any string at all.

The power of regexps stems from the ability to mix and match all of the regexp pieces in pretty much any way you want.

-- Devin
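A minimal sketch of the natural-number pattern above, assuming Python 3.4+ for fullmatch(). The helper name is_natural and the sample list are hypothetical, chosen only to mirror the strings mentioned in the message.

```python
import re

# One digit, followed by zero or more further digits.
NATURAL = re.compile(r"[0-9][0-9]*")


def is_natural(text):
    """Return True if `text` consists of one or more ASCII digits only."""
    return NATURAL.fullmatch(text) is not None


# The strings mentioned in the message above.
samples = ["1", "12", "007", "", "Jeffrey"]
accepted = [s for s in samples if is_natural(s)]

print(accepted)
```

Using fullmatch() rather than match() or search() is a deliberate choice here: it requires the whole string to fit the pattern, so a string such as "1a" is rejected.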
Question about metacharacter '*'
Hi,

I just begin to learn Python. I do not see the usefulness of '*' in its description below:

The first metacharacter for repeating things that we'll look at is *. * doesn't match the literal character *; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.

For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a characters), and so forth.

It has to be used with other search constraints?

Thanks,
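The quoted description can be tried out with a short sketch, assuming Python 3.4+ for fullmatch(). The first three test strings come from the quoted text; the last two and the variable names are added only for illustration.

```python
import re

# `a*` means "zero or more repetitions of the preceding a".
pattern = re.compile(r"ca*t")

# Strings to check against the pattern as a whole.
candidates = ["ct", "cat", "caaat", "cart", "at"]

# fullmatch() succeeds only if the entire string fits the pattern.
results = {s: pattern.fullmatch(s) is not None for s in candidates}

for text, matched in results.items():
    print(repr(text), "matches" if matched else "does not match")
```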