Brian D wrote:
> On Jun 11, 9:22 am, Brian D <brianden...@gmail.com> wrote:
>> On Jun 11, 2:01 am, Lie Ryan <lie.1...@gmail.com> wrote:
>>> 504cr...@gmail.com wrote:
>>>> I've encountered a problem with my RegEx learning curve -- how to
>>>> escape hash characters # in strings being matched, e.g.:
>>>>
>>>> >>> string = re.escape('123#abc456')
>>>> >>> match = re.match('\d+', string)
>>>> >>> print match
>>>> <_sre.SRE_Match object at 0x00A6A800>
>>>> >>> print match.group()
>>>> 123
>>>>
>>>> The correct result should be:
>>>>
>>>> 123456
>>>>
>>>> I've tried to escape the hash symbol in the match string without
>>>> result.
>>>>
>>>> Any ideas? Is the answer something I overlooked in my lurching Python
>>>> schooling?
>>>
>>> As you're not being clear on what you wanted, I'm just guessing this is
>>> what you wanted:
>>>
>>> >>> s = '123#abc456'
>>> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
>>> '123456'
>>> >>> s = '123#this is a comment and is ignored456'
>>> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
>>> '123456'
>>
>> Sorry I wasn't more clear. I positively appreciate your reply. It
>> provides half of what I'm hoping to learn. The hash character is
>> actually a desirable hook to identify a data entity in a scraping
>> routine I'm developing, but not a character I want in the scrubbed
>> data.
>>
>> In my application, the hash makes a string of alphanumeric characters
>> unique from other alphanumeric strings. The strings I'm looking for
>> are actually manually-entered identifiers, but a real machine-created
>> identifier shouldn't contain that hash character. The correct pattern
>> should be 'A1234509', but is instead often merely entered as '#12345'
>> when the first character, representing an alphabet sequence for the
>> month, and the last two characters, representing a two-digit year, can
>> be assumed. Identifying the hash character in a RegEx match is a way
>> of trapping the string and transforming it into its correct
>> machine-generated form.
>>
>> I'm surprised it's been so difficult to find an example of the hash
>> character in a RegEx string -- for exactly this type of situation,
>> since it's so common in the real world that people want to put a pound
>> symbol in front of a number.
>>
>> Thanks!
>
> By the way, other forms the strings can take in their manually created
> forms:
>
> A#12345
> #1234509
>
> Garbage in, garbage out -- I know. I wish I could tell the people
> entering the data how challenging it is to work with what they
> provide, but it is, after all, a screen-scraping routine.
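
On the escaping question itself: as far as I know, '#' is an ordinary character in a Python pattern and needs no escaping, unless the pattern is compiled with re.VERBOSE, where an unescaped '#' starts a comment. re.escape() is meant for building a pattern from literal text, so applying it to the string being searched only changes the data. A small sketch of both points follows; the variable names are mine and the sample strings are taken from the thread:

```python
import re

s = '123#abc456'

# In a normal pattern '#' is just a literal character.
plain = re.search(r'#(\d+)', '#12345')
print(plain.group(1))

# Under re.VERBOSE an unescaped '#' begins a comment, so it must be escaped.
verbose = re.compile(r'''
    \#        # a literal hash
    (\d+)     # the digits that follow it
''', re.VERBOSE)
print(verbose.search('#12345').group(1))

# re.escape() rewrites the text for use inside a pattern; used on the
# subject string it inserts a backslash into the data instead.
escaped = re.escape(s)
print(escaped)

# One way to get the originally requested '123456': join every digit run.
digits = ''.join(re.findall(r'\d+', s))
print(digits)
```

If only the digits matter, the last form avoids having to describe what sits between them.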
Perhaps it's like this?

>>> import re
>>> # you can use re.search if that suits better
>>> a = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', 'A#12345')
>>> b = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', '#1234509')
>>> a.group(0)
'A#12345'
>>> a.group(1)
'A'
>>> a.group(2)
'12345'
>>> a.group(3)
>>> b.group(0)
'#1234509'
>>> b.group(1)
''
>>> b.group(2)
'12345'
>>> b.group(3)
'09'
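
To go from those groups to the machine-style form 'A1234509', something along these lines might work. It is only a sketch: normalize_id, default_month and default_year are names I made up, and the defaults stand in for whatever month letter and year your scraping routine can assume from context.

```python
import re

# Optional month letter, a hash, five digits, optional two-digit year.
ID_PATTERN = re.compile(r'([A-Z]?)#(\d{5})(\d\d)?')

def normalize_id(raw, default_month='A', default_year='09'):
    """Return the machine-style identifier for a hand-entered one, or None.

    default_month and default_year are hypothetical placeholders for the
    values the caller can assume; they are not part of the original data.
    """
    m = ID_PATTERN.fullmatch(raw.strip())
    if m is None:
        return None  # no hash-style identifier found
    month, number, year = m.groups()
    # An absent letter matches as '' and an absent year as None; both are
    # falsy, so 'or' substitutes the assumed default.
    return (month or default_month) + number + (year or default_year)

for raw in ('#12345', 'A#12345', '#1234509'):
    print(raw, '->', normalize_id(raw))
```

fullmatch() requires the whole string to be an identifier; when the identifiers are embedded in a larger scraped page, ID_PATTERN.search() or ID_PATTERN.sub() with a replacement function would be the closer fit.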