On Mon, Nov 10, 2014 at 03:08:23PM -0800, Clayton Kirkwood wrote:

> Also of confusion, the library reference says:
>
> Match objects always have a boolean value of True. Since match() and
> search() return None when there is no match, you can test whether there was
> a match with a simple if statement:
>
> match = re.search(pattern, string)
> if match:
>     process(match)
>
> blah = re.search(
>     r'<\w\w>(\w{3})\.\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)', line)
> >>> blah
> <_sre.SRE_Match object; span=(45, 73), match='<BR>Nov. 09, 07:15:46 PM EST'>
> >>> if blah: print(blah)
> <_sre.SRE_Match object; span=(45, 73), match='<BR>Nov. 09, 07:15:46 PM EST'>
> >>> if blah == True: print(blah)
> No print out
>
> To me, this doesn't *appear* to be quite true.

I think you are misreading a plain English expression, namely to "have
a boolean value", as "is a boolean value". If I said:

    The binary string '0b1001' has a numeric value of 9.

I don't think anyone would interpret that sentence as meaning that
Python treats the string as equal to the int:

    '0b1001' == 9  # returns False

but rather that converting the string to an int returns 9:

    int('0b1001', 0) == 9  # returns True

Somebody unfamiliar with Python might (wrongly) believe that Python
requires an explicit bool() conversion, just as Python requires an
explicit int() conversion, but to avoid that misapprehension, the docs
show an example of the correct idiomatic code to use. You tried it
yourself and saw that it works:

    if blah: print(blah)

prints blah, exactly as the docs suggest. As you can see from the
printed string value of blah, it is a Match object, and it behaves like
True in conditionals (if statements).

On the other hand, this piece of code does something completely
different:

    s = "<_sre.SRE_Match object; span=(45, 73), match='<BR>Nov. 09, 07:15:46 PM EST'>"
    if blah == s: print(blah)

First it checks whether blah equals the given string, then it tests the
result of that comparison. Not surprisingly, that doesn't print
anything. Match objects are not strings, and although they do have a
printable string representation, they are not equal to that
representation.

Nor are they equal to True:

    if blah == True: print(blah)  # also fails to print anything

The comparison "blah == True" returns False, as it should, and the if
clause does not run.

Match objects might not be equal to True, however they are true, in the
same way that my car is not equal to red, but it is red. (You'll have to
take my word for it that I actually do own a red car.)

[...]

> I would expect len(sizeof, whatever)(blah) to return the number of (in this
> case) matches, so 5. Doing a search suggests what is important: the number
> of matches. Why else would you do a search, normally.

The number of groups in a match is comparatively unimportant. The
*content* of the matched groups is important. Consider this regular
expression:

    regex = r'(\w*?)\s*=\s*\$(\d*)'

That has two groups. It *always* has two groups, regardless of what it
matches:

    py> re.match(regex, "profit = $10").groups()
    ('profit', '10')
    py> re.match(regex, "loss = $3456").groups()
    ('loss', '3456')

I can imagine writing code that looks like:

    key, amount = mo.groups()
    if key == 'profit':
        handle_profit(amount)
    elif key == 'loss':
        handle_loss(amount)
    else:
        raise ValueError('unexpected key "%s"' % key)

but I wouldn't expect to write code like this:

    t = mo.groups()
    if len(t) == 2:
        handle_two_groups(t)
    else:
        raise ValueError('and a miracle occurs')

It truly would be a miracle, or perhaps a bug is more accurate, if the
regular expression r'(\w*?)\s*=\s*\$(\d*)' ever returned a match object
with fewer than, or more than, two groups. That would be like:

    mylist = [1, 2]
    if len(mylist) != 2:
        raise ValueError

The only time you don't know how many groups are in a Match object is if
the regular expression itself was generated programmatically, and that's
very unusual.

> That could then be used in the range()
> It would be nice to have the number of arguments.
> I would expect len(blah.group()) to be 5, because that is the relevant
> number of elements returned from group. And that is the basic thing that
> group is about; the groups, what they are and how many there are. I
> certainly wouldn't want len(group) to return the number of characters, in
> this case, 28 (which it does:>{{{
>
>
> >>> blah.group()
> '<BR>Nov. 09, 07:15:46 PM EST'

MatchObject.group() with no arguments is like a default argument of 0,
which returns the entire matched string. For many purposes, that is all
you need; you may not care about the individual groups in the regex.

> >>> len(blah.group())
> 28

What would you expect len('<BR>Nov. 09, 07:15:46 PM EST') to return?
There are 28 characters, so returning anything other than 28 would be a
terrible bug. There is no way that len() can tell the difference between
any of these:

    len('<BR>Nov. 09, 07:15:46 PM EST')
    len(blah.group())
    len('<BR>Nov. %s, 07:15:46 PM EST' % '09')
    s = '<BR>Nov. 09, 07:15:46 PM EST'; len(s)
    len((lambda: '<BR>Nov. 09, 07:15:46 PM EST')())

or any other of an infinite number of ways to get the same string. All
len() sees is the string, not where it came from.

If you want to know how many groups are in the regex, *look at it*:

    r'<\w\w>(\w{3})\.\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)'

has five groups. Or call groups() and count the number of items returned:

    len(blah.groups())

> I didn't run group to find out the number of characters in a string, I ran
> it to find out something about blah and its matches.

Well, of course nobody is stopping you from calling blah.group() to find
out the number of groups, in the same way that nobody is stopping you
from calling int('123456') to find out the time of day. But in both
cases you will be disappointed. You have to use the correct tool for the
correct job, and blah.group() returns the entire matching string, not a
tuple of groups. For that, you call blah.groups() (note plural).


-- 
Steven
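
For anyone who wants to try the distinctions discussed in this message
at the interpreter, here is a minimal sketch. The sample `line` is made
up for illustration (only the matched text comes from the thread), and
the variable names other than `blah` are arbitrary.

```python
import re

# Hypothetical input line; only the matched portion comes from the thread.
line = "some leading text <BR>Nov. 09, 07:15:46 PM EST trailing text"

pattern = r'<\w\w>(\w{3})\.\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)'
blah = re.search(pattern, line)

# A Match object is truthy, but it is not equal to True.
is_truthy = bool(blah)
equals_true = (blah == True)

# group() with no arguments returns the whole matched string ...
whole = blah.group()
# ... while groups() (plural) returns a tuple of the captured groups.
parts = blah.groups()

# The number of groups is fixed by the pattern, not by what it matched.
group_count = re.compile(pattern).groups

print(is_truthy, equals_true)
print(whole, len(whole))
print(parts, len(parts), group_count)
```

The length of `whole` counts characters, the length of `parts` counts
groups, and `group_count` reads the same number off the compiled
pattern without needing a match at all.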