[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-11 Thread Alec Koumjian

Alec Koumjian <akoumj...@gmail.com> added the comment:

Thanks, Matthew. I did not realize I could access either of those. I should be 
able to build a helper function now to do what I want.
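
A minimal sketch of what such a helper might look like. The reply it responds to is not quoted here, so the approach is an assumption: it relies on `Pattern.groupindex` (name to group number) and `Match.span()`, written against the standard-library `re` module, which spells named groups `(?P<name>...)`. The name `tag_groups` is hypothetical, and the sketch assumes groups do not nest or overlap.

```python
import re


def tag_groups(match):
    """Return match.string with <label>...</label> around each group span.

    Hypothetical helper: the label is the group name when there is one,
    otherwise the group number. Assumes groups do not nest or overlap.
    """
    # Invert groupindex so group numbers map back to names.
    names = {index: name for name, index in match.re.groupindex.items()}
    inserts = []
    for index in range(1, match.re.groups + 1):
        start, end = match.span(index)
        if start == -1:  # group did not take part in the match
            continue
        inserts.append((start, end, names.get(index, str(index))))
    # Work from right to left so earlier offsets stay valid.
    text = match.string
    for start, end, label in sorted(inserts, reverse=True):
        text = f"{text[:start]}<{label}>{text[start:end]}</{label}>{text[end:]}"
    return text


m = re.search(r"(?P<a_thing>\w+) (\w+) (?P<another_thing>\w+)",
              "This is a demo sentence.")
tagged = tag_groups(m)
```

Here `tagged` is `<a_thing>This</a_thing> <2>is</2> <another_thing>a</another_thing> demo sentence.`, the output requested in the original message.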

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2011-07-10 Thread Alec Koumjian

Alec Koumjian <akoumj...@gmail.com> added the comment:

I apologize if this is the wrong place for this message. I did not see the link 
to a separate list.

First let me explain what I am trying to accomplish. I would like to be able to 
take an unknown regular expression that contains both named and unnamed groups 
and tag their location in the original string where a match was found. Take the 
following redundantly simple example:

>>> a_string = r"This is a demo sentence."
>>> pattern = r"(?<a_thing>\w+) (\w+) (?<another_thing>\w+)"
>>> m = regex.search(pattern, a_string)

What I want is a way to insert named/numbered tags into the original string, so 
that it looks something like this:

r"<a_thing>This</a_thing> <2>is</2> <another_thing>a</another_thing> demo sentence."

The syntax doesn't have to be exactly like that, but you get the idea. I have
inserted the names and/or indices of the groups into the original string, 
around the span that the groups occupy. 

This task is exceedingly difficult with the current implementation, unless I am 
missing something obvious. We could call the groups by index, the groups as a 
tuple, or the groupdict:

>>> m.group(1)
'This'
>>> m.groups()
('This', 'is', 'a')
>>> m.groupdict()
{'another_thing': 'a', 'a_thing': 'This'}

If all I wanted was to tag the groups by index, it would be a simple function.
I would be able to call m.spans() for each index in the length of m.groups()
and insert the <> and </> tags around the right indices.

The hard part is finding out how to find the spans of the named groups. Do any 
of you have a suggestion?
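
For reference, a short sketch of how these spans can be looked up with the standard-library `re` module. This is an assumption about the intended usage, not a statement about the regex module under discussion, which may differ; note that `re` spells named groups `(?P<name>...)`.

```python
import re

# Assumption: stdlib `re`, which writes named groups as (?P<name>...).
a_string = "This is a demo sentence."
pattern = re.compile(r"(?P<a_thing>\w+) (\w+) (?P<another_thing>\w+)")
m = pattern.search(a_string)

# Match.span() accepts a group name as well as a group number.
named_span = m.span("a_thing")

# Pattern.groupindex maps each group name to its group number.
name_to_index = dict(pattern.groupindex)

# Spans of every group, named or not, numbered from 1.
all_spans = [m.span(i) for i in range(1, pattern.groups + 1)]
```

With this input, `named_span` is `(0, 4)` and `all_spans` covers the three words `This`, `is` and `a`.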

It would make more sense from my perspective, if each group was an object that 
had its own .span property. It would work like this with the above example:

>>> first = m.group(1)
>>> first.name()
'a_thing'
>>> second = m.group(2)
>>> second.name()
None


You could still call .spans() on the Match object itself, but it would query 
its children group objects for the data. Overall I think this would be a much 
more Pythonic approach, especially given that you have added subscripting and 
key lookup.

So instead of this:
>>> m['a_thing']
'This'
>>> type(m['a_thing'])
<type 'str'>

You could have:
>>> m['a_thing']
'This'
>>> type(m['a_thing'])
'regex.Match.Group object'

With the noted benefit of this:
>>> m['a_thing'].span()
(0, 4)
>>> m['a_thing'].index()
1
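
The per-group object proposed above can be approximated with a thin wrapper around an ordinary match. The sketch below is only an illustration: `Group` and `groups_of` are hypothetical names, it uses the standard-library `re` module rather than the regex module, and it exposes `name`, `span` and `index` as plain attributes instead of the methods shown in the proposal.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Group:
    """Hypothetical record describing one capture group of a match."""
    index: int
    name: Optional[str]      # None for unnamed groups
    text: Optional[str]      # None if the group did not participate
    span: Tuple[int, int]    # (-1, -1) if the group did not participate


def groups_of(match):
    """Build one Group per capture group, in group-number order."""
    names = {index: name for name, index in match.re.groupindex.items()}
    return [
        Group(i, names.get(i), match.group(i), match.span(i))
        for i in range(1, match.re.groups + 1)
    ]


m = re.search(r"(?P<a_thing>\w+) (\w+) (?P<another_thing>\w+)",
              "This is a demo sentence.")
first, second, third = groups_of(m)
```

Here `first.name` is `'a_thing'`, `first.span` is `(0, 4)`, and `second.name` is `None`, mirroring the behaviour described in the proposal.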


Maybe I'm missing a major point or functionality here, but I've been poring
over the docs and don't currently think what I'm trying to achieve is possible.

Thank you for taking the time to read all this.

-Alec

--
nosy: +akoumjian
versions:  -Python 3.3
