Re: Question about metacharacter '*'

2014-07-07 Thread Mark Lawrence

On 07/07/2014 19:51, rxjw...@gmail.com wrote:

Will you please do something about the double-spaced Google crap that
you keep sending? I've already asked you twice.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



--
https://mail.python.org/mailman/listinfo/python-list


Re: Question about metacharacter '*'

2014-07-07 Thread Devin Jeanpierre
On Mon, Jul 7, 2014 at 11:51 AM,   wrote:
> Would you give me an example using your pattern: `.*` -- `.`?
> I try it, but it cannot pass. (of course, I use it incorrectly)

Those are two patterns.

Python 3.4.1 (default, Jul  7 2014, 13:22:02)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.fullmatch(r'.', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.', 'ab')
>>> re.fullmatch(r'.', '')
>>>
>>> re.fullmatch(r'.*', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.*', 'ab')
<_sre.SRE_Match object; span=(0, 2), match='ab'>
>>> re.fullmatch(r'.*', '')
<_sre.SRE_Match object; span=(0, 0), match=''>

-- Devin


Re: Question about metacharacter '*'

2014-07-07 Thread rxjwg98
On Sunday, July 6, 2014 8:09:57 AM UTC-4, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM,   wrote:
> > Hi,
> >
> > I just begin to learn Python. I do not see the usefulness of '*' in its
> > description below:
> >
> > The first metacharacter for repeating things that we'll look at is *. *
> > doesn't match the literal character *; instead, it specifies that the
> > previous character can be matched zero or more times, instead of exactly
> > once.
> >
> > For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> > characters), and so forth.
> >
> > It has to be used with other search constraints?
>
> (BTW, this is a regexp question, not really a Python question per se.)
>
> That's usually when it's useful, yeah. For example, [0-9] matches any
> of the characters 0 through 9. So to match a natural number written in
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> strings "1", "12", and "007", but not "" or "Jeffrey".
>
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.
>
> The power of regexps stems from the ability to mix and match all of
> the regexp pieces in pretty much any way you want.
>
> -- Devin

Would you give me an example using your pattern: `.*` -- `.`?
I try it, but it cannot pass. (of course, I use it incorrectly)


Re: Question about metacharacter '*'

2014-07-07 Thread Ian Kelly
On Sun, Jul 6, 2014 at 4:49 PM, MRAB  wrote:
> \d also matches more than just [0-9] in Unicode.

I think that anything matched by \d will also be accepted by int().

>>> import re, unicodedata
>>> decimals = [c for c in (chr(i) for i in range(17 * 2**16))
...             if unicodedata.category(c) == 'Nd']
>>> len(decimals)
460
>>> re.match(r'\d*', ''.join(decimals)).span()
(0, 460)
>>> int(''.join(decimals))
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
>>> nondecimals = [c for c in (chr(i) for i in range(17 * 2**16))
...                if unicodedata.category(c) in ('No', 'Nl')]
>>> len(nondecimals)
688
>>> re.findall(r'\d', ''.join(nondecimals))
[]


Re: Question about metacharacter '*'

2014-07-06 Thread Devin Jeanpierre
The reason I did not use \d\d* or \d+ or ^\d+$ or any number of
more-correct things was because the OP was new to regexps.

-- Devin

On Sun, Jul 6, 2014 at 3:49 PM, MRAB  wrote:
> On 2014-07-06 18:41, Albert-Jan Roskam wrote:
>>
>>
>>
>>
>>> In article ,
>>> Rick Johnson  wrote:
>>>
>>>> As an aside i prefer to only utilize a "character set" when
>>>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>>>> can be expressed just as correctly  (and less noisy IMHO) as
>>>> r"\d\d*".
>>>
>>>
>>> Even better, r"\d+"
>>
>>
>> I tend to do that too, even though technically the two are not perfectly
>> equivalent. With the re.LOCALE flag LC_ctype is also affected, which affects
>> what is captured by \d but not by [0-9]
>>
> \d also matches more than just [0-9] in Unicode.
>


Re: Question about metacharacter '*'

2014-07-06 Thread MRAB

On 2014-07-06 18:41, Albert-Jan Roskam wrote:
>> In article ,
>> Rick Johnson  wrote:
>>
>>> As an aside i prefer to only utilize a "character set" when
>>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>>> can be expressed just as correctly  (and less noisy IMHO) as
>>> r"\d\d*".
>>
>> Even better, r"\d+"
>
> I tend to do that too, even though technically the two are not perfectly
> equivalent. With the re.LOCALE flag LC_ctype is also affected, which affects
> what is captured by \d but not by [0-9]


\d also matches more than just [0-9] in Unicode.
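
To make that concrete, here is a minimal sketch, assuming Python 3 (where str patterns are Unicode by default). The character used, ARABIC-INDIC DIGIT THREE (U+0663), is just an arbitrary example of a non-ASCII decimal digit:

```python
import re

# Example character chosen for illustration: ARABIC-INDIC DIGIT THREE.
digit = '\u0663'

unicode_match = re.fullmatch(r'\d', digit)          # \d accepts any Unicode decimal digit
charset_match = re.fullmatch(r'[0-9]', digit)       # the explicit set is ASCII-only
ascii_match = re.fullmatch(r'\d', digit, re.ASCII)  # re.ASCII restricts \d to [0-9]

print(unicode_match is not None)  # True
print(charset_match is not None)  # False
print(ascii_match is not None)    # False
```

So [0-9], or \d together with re.ASCII, is the safer choice when only the ten ASCII digits should be accepted.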



Re: Question about metacharacter '*'

2014-07-06 Thread Albert-Jan Roskam



>In article ,
> Rick Johnson  wrote:
>
>> As an aside i prefer to only utilize a "character set" when
>> nothing else will suffice. And in this case r"[0-9][0-9]*"
>> can be expressed just as correctly  (and less noisy IMHO) as
>> r"\d\d*".
>
>Even better, r"\d+"

I tend to do that too, even though technically the two are not perfectly 
equivalent. With the re.LOCALE flag, LC_CTYPE is also affected, which affects 
what is captured by \d but not by [0-9].


Re: Question about metacharacter '*'

2014-07-06 Thread Rick Johnson
On Sunday, July 6, 2014 12:38:23 PM UTC-5, Rick Johnson wrote:

> r'\s*#[^\n]'

Well, there i go not testing again!

r'\s*#[^\n]*'
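
A quick check of the difference between the two patterns, as a sketch; the sample line is made up for illustration:

```python
import re

line = '    # a comment\nx = 1'  # hypothetical input

# Without the trailing *, [^\n] consumes exactly one character after the hash.
short = re.match(r'\s*#[^\n]', line).group()
# With the *, [^\n]* runs on to the end of the line.
full = re.match(r'\s*#[^\n]*', line).group()

print(repr(short))  # '    # '
print(repr(full))   # '    # a comment'
```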


Re: Question about metacharacter '*'

2014-07-06 Thread Rick Johnson
On Sunday, July 6, 2014 11:47:38 AM UTC-5, Roy Smith wrote:
> Even better, r"\d+"
> >>> re.search(r'(\d\d*)', '111aaa222').groups()
> ('111',)
> >>> re.search(r'(\d+)', '111aaa222').groups()
> ('111',)

Yes, good catch! I had failed to reduce your original
pattern down to its most fundamental aspects for the sake
of completeness, and instead, opted to modify it in a manner
that mirrored your example. 

> Oddly enough, I prefer character sets to the backslash
> notation, but I suppose that's largely because when I
> first learned regexes, that new-fangled backslash stuff
> hadn't been invented yet. :-) 

Ha, point taken! :-)

Character sets really shine when you need a fixed range of
letters or numbers which are NOT defined by one of the
"special characters" of \d \D \W \w, etc... 

Say you want to match any letters between "c" and "m" or the
digits between "3" and "6". Defining that pattern using OR'd
"char literals" would be a massive undertaking!
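
A small sketch of that comparison; the sample string is hypothetical, and the spelled-out alternation is only there to show what the set saves you from writing:

```python
import re

# One set covering the letters c..m and the digits 3..6.
pattern = re.compile(r'[c-m3-6]')

# The same thing spelled out with alternation, for comparison.
spelled_out = re.compile(r'c|d|e|f|g|h|i|j|k|l|m|3|4|5|6')

sample = 'abcxyz m 0123456789'  # hypothetical test string
set_hits = pattern.findall(sample)
alt_hits = spelled_out.findall(sample)

print(set_hits)              # ['c', 'm', '3', '4', '5', '6']
print(set_hits == alt_hits)  # True
```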

Another great use of character sets is skipping chars that
don't match a "target". For instance, a python comment will
start with one hash char and proceed to the end of the
line, which, when accounting for leading white-space,
could be defined by the pattern:

r'\s*#[^\n]'

> Regex is also not as easy to use in Python as it is in a
> language like Perl where it's baked into the syntax.  As a
> result, pythonistas tend to shy away from regex, and
> either never learn the full power, or let their skills
> grow rusty. Which is a shame, because for many tasks,
> there's no better tool.

Agreed, but unfortunately like many other languages, Python
has decided to import all the illogic of regex syntax from
other languages instead of creating a "new" regex syntax
that is consistent and logical. They did the same thing with
Tkinter, and what a nightmare!

And don't misunderstand my statements, i don't intend that
we should create a syntax of verbosity, NO, we *CAN* keep
the syntax succinct whilst eliminating the illogical and
inconsistent aspects that plague our patterns.

Will regex ever be easy to learn? Probably not, but it can
be easier to use if only we put on our "big boy" pants and
decide to do something about it!



Re: Question about metacharacter '*'

2014-07-06 Thread Roy Smith
In article ,
 Rick Johnson  wrote:

> As an aside i prefer to only utilize a "character set" when
> nothing else will suffice. And in this case r"[0-9][0-9]*"
> can be expressed just as correctly  (and less noisy IMHO) as
> r"\d\d*".

Even better, r"\d+"

>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
>>> re.search(r'(\d+)', '111aaa222').groups()
('111',)

Oddly enough, I prefer character sets to the backslash notation, but I 
suppose that's largely because when I first learned regexes, that 
new-fangled backslash stuff hadn't been invented yet. :-)

I know I've said this before, but people should put more effort into 
learning regex.  There are lots of good tools in Python (startswith, 
endswith, split, in, etc) which handle many of the most common regex use 
cases.  Regex is also not as easy to use in Python as it is in a 
language like Perl where it's baked into the syntax.  As a result, 
pythonistas tend to shy away from regex, and either never learn the full 
power, or let their skills grow rusty.  Which is a shame, because for 
many tasks, there's no better tool.
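
A rough illustration of where the string methods are enough and where a regex starts to pay off. The file name below is made up for the example:

```python
import re

name = 'report_2014-07-06.csv'  # hypothetical file name

# Simple prefix/suffix/substring tests need no regex at all.
has_prefix = name.startswith('report_')
has_suffix = name.endswith('.csv')
has_year = '2014' in name
print(has_prefix, has_suffix, has_year)  # True True True

# Pulling out the date parts is where a regex starts to earn its keep.
match = re.search(r'(\d{4})-(\d{2})-(\d{2})', name)
date_parts = match.groups()
print(date_parts)  # ('2014', '07', '06')
```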


Re: Question about metacharacter '*'

2014-07-06 Thread Rick Johnson
[CONTINUED FROM LAST REPLY...]

Likewise if your intent is to filter out any match strings
which contain non-digits, then define the start and stop
points of the pattern:

# Match only if all are digits
>>> re.match(r'\d\d*$', '111aaa222') # fails

# Match only if all are digits and,
# allow leading white-space
>>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object at 0x026D8410>
# But not trailing space!
>>> re.match(r'\s*\d\d*$', '   111 ') # fails
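
One caveat worth a sketch: in Python, `$` also matches just before a newline at the very end of the string, so a single trailing '\n' slips through. The stricter alternatives shown here are `\Z` and `re.fullmatch`; the latter assumes Python 3.4 or later, unlike the 2.x session above:

```python
import re

text = '111\n'  # digits followed by a trailing newline

dollar = re.match(r'\d\d*$', text)     # matches: $ allows one final newline
strict = re.match(r'\d\d*\Z', text)    # None: \Z is the true end of the string
whole = re.fullmatch(r'\d\d*', text)   # None: the newline is not a digit

print(dollar is not None)  # True
print(strict is None)      # True
print(whole is None)       # True
```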


Re: Question about metacharacter '*'

2014-07-06 Thread Rick Johnson
On Sunday, July 6, 2014 10:50:13 AM UTC-5, Devin Jeanpierre wrote:
> In related news, the regexp I gave for numbers will match "1a".

Well of course it matched, because your pattern defines "one
or more consecutive digits". So it will match the "1" of
"1a" and the "11" of "11a" likewise.

As an aside i prefer to only utilize a "character set" when
nothing else will suffice. And in this case r"[0-9][0-9]*"
can be expressed just as correctly  (and less noisy IMHO) as
r"\d\d*".


 INTERACTIVE SESSION: Python 2.x

# Note: Grouping used for explicitness.

#
# Using character sets:
>>> import re
>>> re.search(r'([0-9][0-9]*)', '1a').groups()
('1',)
>>> re.search(r'([0-9][0-9]*)', '11a').groups()
('11',)
>>> re.search(r'([0-9][0-9]*)', '111aaa222').groups()
('111',)

#
# Same result without character sets:
>>> re.search(r'(\d\d*)', '1a').groups()
('1',)
>>> re.search(r'(\d\d*)', '11a').groups()
('11',)
>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)


Re: Question about metacharacter '*'

2014-07-06 Thread Devin Jeanpierre
In related news, the regexp I gave for numbers will match "1a".
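
A short sketch of why, and of two ways to insist on the whole string, assuming Python 3.4+ for re.fullmatch:

```python
import re

pattern = r'[0-9][0-9]*'

prefix = re.match(pattern, '1a')             # succeeds: matches the leading "1"
whole = re.fullmatch(pattern, '1a')          # None: the "a" is left over
anchored = re.match(r'^[0-9][0-9]*$', '1a')  # None as well, via explicit anchors

print(prefix.group())  # 1
print(whole)           # None
print(anchored)        # None
```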

-- Devin

On Sun, Jul 6, 2014 at 8:32 AM, MRAB  wrote:
> On 2014-07-06 13:09, Devin Jeanpierre wrote:
>>
>> On Sun, Jul 6, 2014 at 4:51 AM,   wrote:
>>>
>>> Hi,
>>>
>>> I just begin to learn Python. I do not see the usefulness of '*' in its
>>> description below:
>>>
>>>
>>>
>>>
>>> The first metacharacter for repeating things that we'll look at is *. *
>>> doesn't
>>> match the literal character *; instead, it specifies that the previous
>>> character
>>> can be matched zero or more times, instead of exactly once.
>>>
>>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>>> characters), and so forth.
>>>
>>>
>>>
>>> It has to be used with other search constraints?
>>
>>
>> (BTW, this is a regexp question, not really a Python question per se.)
>>
>> That's usually when it's useful, yeah. For example, [0-9] matches any
>> of the characters 0 through 9. So to match a natural number written in
>> decimal form, we might use the regexp [0-9][0-9]*, which matches the
>> strings "1", "12", and "007", but not "" or "Jeffrey".
>>
>> Another useful one is `.*` -- `.` matches exactly one character, no
>> matter what that character is. So, `.*` matches any string at all.
>>
> Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
>
>
>> The power of regexps stems from the ability to mix and match all of
>> the regexp pieces in pretty much any way you want.
>>
>


Re: Question about metacharacter '*'

2014-07-06 Thread MRAB

On 2014-07-06 13:09, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM,   wrote:
>> Hi,
>>
>> I just begin to learn Python. I do not see the usefulness of '*' in its
>> description below:
>>
>> The first metacharacter for repeating things that we'll look at is *. * doesn't
>> match the literal character *; instead, it specifies that the previous character
>> can be matched zero or more times, instead of exactly once.
>>
>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>> characters), and so forth.
>>
>> It has to be used with other search constraints?
>
> (BTW, this is a regexp question, not really a Python question per se.)
>
> That's usually when it's useful, yeah. For example, [0-9] matches any
> of the characters 0 through 9. So to match a natural number written in
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> strings "1", "12", and "007", but not "" or "Jeffrey".
>
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.


Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
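
A minimal sketch of the difference, assuming Python 3.4+ for re.fullmatch:

```python
import re

text = 'one\ntwo'

default = re.fullmatch(r'.*', text)            # None: . will not cross the '\n'
dotall = re.fullmatch(r'.*', text, re.DOTALL)  # matches the whole string

print(default)                 # None
print(dotall.group() == text)  # True
```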


> The power of regexps stems from the ability to mix and match all of
> the regexp pieces in pretty much any way you want.





Re: Question about metacharacter '*'

2014-07-06 Thread Devin Jeanpierre
On Sun, Jul 6, 2014 at 4:51 AM,   wrote:
> Hi,
>
> I just begin to learn Python. I do not see the usefulness of '*' in its
> description below:
>
>
>
>
> The first metacharacter for repeating things that we'll look at is *. * 
> doesn't
> match the literal character *; instead, it specifies that the previous 
> character
> can be matched zero or more times, instead of exactly once.
>
> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> characters), and so forth.
>
>
>
> It has to be used with other search constraints?

(BTW, this is a regexp question, not really a Python question per se.)

That's usually when it's useful, yeah. For example, [0-9] matches any
of the characters 0 through 9. So to match a natural number written in
decimal form, we might use the regexp [0-9][0-9]*, which matches the
strings "1", "12", and "007", but not "" or "Jeffrey".
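
As a sketch, the five strings above can be checked like this. It assumes Python 3.4+ and uses fullmatch so that the whole string has to match:

```python
import re

natural = re.compile(r'[0-9][0-9]*')

# Map each candidate string to whether the whole string matches.
results = {s: bool(natural.fullmatch(s)) for s in ['1', '12', '007', '', 'Jeffrey']}

for s, ok in results.items():
    print(repr(s), ok)
```

The first three come out True and the last two False.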

Another useful one is `.*` -- `.` matches exactly one character, no
matter what that character is. So, `.*` matches any string at all.

The power of regexps stems from the ability to mix and match all of
the regexp pieces in pretty much any way you want.

-- Devin


Question about metacharacter '*'

2014-07-06 Thread rxjwg98
Hi,

I just begin to learn Python. I do not see the usefulness of '*' in its
description below:




The first metacharacter for repeating things that we'll look at is *. * doesn't
match the literal character *; instead, it specifies that the previous character
can be matched zero or more times, instead of exactly once.

For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
characters), and so forth. 
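
For reference, a small sketch of that description in Python, assuming Python 3.4+ for re.fullmatch. The last two test strings are added here for contrast and are not from the quoted text:

```python
import re

# 'cot' and 'caat!' are extra, hypothetical inputs that should not match.
candidates = ['ct', 'cat', 'caaat', 'cot', 'caat!']
matches = {s: bool(re.fullmatch(r'ca*t', s)) for s in candidates}

for s, ok in matches.items():
    print(s, ok)
```

Here 'ct', 'cat' and 'caaat' match, while 'cot' and 'caat!' do not.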



It has to be used with other search constraints?


Thanks,