Re: [Python-Dev] hierarchicial named groups extension to the re library

2005-04-03 Thread Pierre Barbier de Reuille

[EMAIL PROTECTED] wrote:
> Nicolas Fleury wrote:
> >
> [...]
> Actually, I ~would~ like to limit it to just named groups.
> I reckon, if you're not going to bother naming a group, then why would
> you have any interest in it.
> I guess it's up for discussion how confusing this "new" way of thinking
> could be and what drawbacks it might have.

I would find it interesting to match every group without naming them! For
example, if a group's position within its parent group is what carries the
meaning, why bother with names? If you just let the user skip the
compression stage, that would do the trick!

That leads me to a question: would it be possible to use integers instead
of strings as the names of unnamed groups? That way, you could access
unnamed groups by their rank in their parent group, for example.

A small example of what I would want:
>>> buf="123 234 345, 123 256, and 123 289"
>>> regex=r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
>>> x
{ 0: {'_value': "123 234 345,", 0: "123", 1: " 234", 2: " 345"},
  1: {'_value': " 123 256,", 0: " 123", 1:" 256"},
  'logic': {'_value': 'and'},
  3: {'_value': " 123 289", 1: " 123", 2:" 289"} }
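
The nested result above can be approximated with the standard re module alone, which may help make the request concrete. The sketch below is not re2's implementation: extract() and numbered() are hypothetical helpers, the group name `logic` is taken from the expected output, and unnamed groups are keyed by their 0-based rank within the parent group (an assumption; the last entry in the example above uses 1 and 2 instead).

```python
import re

buf = "123 234 345, 123 256, and 123 289"

# Pattern from the message, with the named group spelled out as `logic`.
PATTERN = re.compile(r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$')


def numbered(text):
    # Hypothetical helper: key each number found in `text` by its rank.
    node = {'_value': text}
    for rank, sub in enumerate(re.finditer(r' *\d+', text)):
        node[rank] = sub.group(0)
    return node


def extract(text):
    # Hypothetical stand-in for the proposed pat2.extract().
    m = PATTERN.match(text)
    if m is None:
        return None
    result = {}
    rank = 0
    # In the stdlib a repeated group only remembers its last repetition,
    # so the comma-terminated runs are rescanned one by one.
    head = text[:m.start('logic')]
    for outer in re.finditer(r'( *\d+)+,', head):
        result[rank] = numbered(outer.group(0))
        rank += 1
    result['logic'] = {'_value': m.group('logic')}
    rank += 1  # the named group still occupies a rank in its parent
    result[rank] = numbered(m.group(4))
    return result


x = extract(buf)
```

With the buffer above, x[0][1] is " 234" and x['logic']['_value'] is 'and'. The second scanning pass is the part a hierarchical extension would presumably make unnecessary.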
Pierre
Regards.
Chris.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/pierre.barbier%40cirad.fr
--
Pierre Barbier de Reuille
INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP
Botanique et Bio-informatique de l'Architecture des Plantes
TA40/PSII, Boulevard de la Lironde
34398 MONTPELLIER CEDEX 5, France
tel   : (33) 4 67 61 65 77
fax   : (33) 4 67 61 56 68


Re: [Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-03 Thread Phillip J. Eby
At 08:48 AM 4/3/05 +0200, Martin v. Löwis wrote:
> I personally think that the proposed functionality should *not* live
> in a separate module, but somehow be integrated into SRE.

+1.

> Whether or
> not the proposed functionality is useful in the first place, I don't
> know. I never have nested named groups in my regular expressions.

Neither have I, but only because it doesn't do what re2 does.  :)

I'd like to suggest that the addition also allow you to match a group by a 
named reference, thus allowing a complete grammar to be formed.  Of course, 
I don't know if the underlying regular expression engine could actually do 
that, but it would be nice if it could, since it would allow simple 
grammars to be more easily parsed without recourse to a more complex 
parsing module.
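
One way to picture the named-reference idea with today's re module is to expand rule names textually before compiling. This is only a sketch under assumptions: the RULES table, the `{name}` reference syntax and expand() are all hypothetical, and because the expansion is plain substitution it cannot express recursive rules, which is exactly the part that would need support from the engine.

```python
import re

# Hypothetical rule table; `{name}` refers to another rule.
RULES = {
    'number': r'\d+',
    'item': r'{number} [a-z]+ [a-z]+',
    'verse': r'{item}(?:, {item})*',
}


def expand(name, rules):
    # Replace each {rule} reference by that rule's own expansion,
    # wrapped in a non-capturing group. A recursive rule would never finish.
    def substitute(match):
        return '(?:' + expand(match.group(1), rules) + ')'
    return re.sub(r'\{(\w+)\}', substitute, rules[name])


verse = re.compile(expand('verse', RULES))
ok = verse.fullmatch("12 drummers drumming, 11 pipers piping") is not None
bad = verse.fullmatch("12 drummers, 11 pipers") is not None
```

Here `ok` is true and `bad` is false. Note that the `{name}` syntax collides with regex quantifiers such as `{2,3}`, so a real design would need a different spelling.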



[Python-Dev] longobject.c & ob_size

2005-04-03 Thread Michael Hudson
Asking mostly out of curiosity, how hard would it be to have longs store
their sign bit somewhere less aggravating?  It seems to me that the
top bit of ob_digit[0] is always 0, for example, and although I'm sure
this would result in no less convolution in longobject.c, it'd be
considerably more localized convolution.

Cheers,
mwh

-- 
   CDATA is not an integration strategy.
-- from Twisted.Quotes


Re: [Python-Dev] longobject.c & ob_size

2005-04-03 Thread Martin v. Löwis
Michael Hudson wrote:
> Asking mostly out of curiosity, how hard would it be to have longs store
> their sign bit somewhere less aggravating?  It seems to me that the
> top bit of ob_digit[0] is always 0, for example, and although I'm sure
> this would result in no less convolution in longobject.c, it'd be
> considerably more localized convolution.

I think the amount of special-casing that you need would remain the
same - i.e. you would have to mask out the sign before performing
the algorithms, then bring it back in. Masking out the bit from digit[0]
might slow down the algorithms somewhat, because you would probably mask
it out from every digit, not only digit[0] (or else test for digit[0],
and that test would then be performed for all digits).

You would also have to keep the special case for 0L, which has
ob_size==0 (i.e. doesn't have digit[0]).

That said, I think the change could be implemented within a few hours,
taking a day to make the testsuite run again; depending on the review
process, you might need two releases to fix the bugs (but then, it
is also reasonable to expect to get it right the first time).
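
To make the trade-off concrete, here is a small Python model of the two layouts being compared. It is an illustration, not the C code in longobject.c: the 15-bit digit width matches the long implementation of that era, but the function names and the use of Python lists for the digits are assumptions of this sketch.

```python
SHIFT = 15                  # payload bits per digit
MASK = (1 << SHIFT) - 1     # 0x7fff
SIGN_BIT = 1 << SHIFT       # the spare top bit of a 16-bit digit


def encode_size_sign(n):
    # Current layout: the sign lives in ob_size, digits hold abs(n).
    digits = []
    m = abs(n)
    while m:
        digits.append(m & MASK)
        m >>= SHIFT
    ob_size = -len(digits) if n < 0 else len(digits)
    return ob_size, digits


def encode_digit_sign(n):
    # Proposed layout: ob_size is never negative, and the sign is
    # stored in the spare top bit of digit[0].
    ob_size, digits = encode_size_sign(n)
    if ob_size < 0:
        digits[0] |= SIGN_BIT
    return abs(ob_size), digits


def decode_digit_sign(ob_size, digits):
    if ob_size == 0:        # zero has no digit[0], so it stays a special case
        return 0
    negative = bool(digits[0] & SIGN_BIT)
    value = 0
    for i in reversed(range(ob_size)):
        # Masking every digit is the simple way to drop the sign bit.
        value = (value << SHIFT) | (digits[i] & MASK)
    return -value if negative else value
```

In this model, encode_size_sign(-5) gives (-1, [5]) while encode_digit_sign(-5) gives (1, [32773]). The zero check and the per-digit mask in decode_digit_sign() correspond to the two costs described above.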
Regards,
Martin


Re: [Python-Dev] Re: hierarchicial named groups extension to the re library

2005-04-03 Thread Gustavo Niemeyer
Greetings,

> If this kind of functionality would fall on immediate rejection for
> some reason, even writing the PEP might be pointless. If the
[...]

In my opinion the functionality is useful.

> I personally think that the proposed functionality should *not* live
> in a separate module, but somehow be integrated into SRE. Whether or
[...]

Agreed. I propose to integrate this functionality into the SRE syntax,
so that this special kind of group may be used when explicitly wanted.
This would avoid backward compatibility problems, would give each
regular expression a single meaning, and would allow interleaving
hierarchical/non-hierarchical groups.

I offer myself to integrate the change once we decide on the right
way to implement it, and achieve consensus on its adoption.

Best regards,

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] hierarchicial named groups extension to the re library

2005-04-03 Thread Gustavo Niemeyer
Greetings Chris,

> Well, that would be something I'd want to discuss here.  As I'm not
> sure if I actually ~want~ to match the API of the re module.

If this feature is considered a good addition for the standard
library, integrating it on re would be an interesting option.
But given what you say above, I'm not sure if *you* want to
make it a part of re itself.

[...]
> IMO If you don't bother to name a group then you probably aren't going
> to be interested in it anyway - so why keeping a reference to it?

That's not true. There's a lot of code out there that makes genuine
use of unnamed groups. The (?:...) syntax is for when the group's
content is not needed.
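
For instance, with the standard re module (the date string is just an illustrative input):

```python
import re

date = "2005-04-03"

# Unnamed capturing groups are addressed by position.
year, month, day = re.match(r'(\d{4})-(\d{2})-(\d{2})', date).groups()

# (?:...) groups the same text but records nothing.
only_month = re.match(r'(?:\d{4})-(\d{2})-(?:\d{2})', date).groups()
```

The first match yields three values without any names; the second yields only the month, because the other two groups are non-capturing.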

> If you only wanted to extract the numbers from those verses...
> 
> >>> regex='^(((?P<number>\d+) ([^,]+))(, )?)*$'
> >>> pat2=re2.compile(regex)
> >>> x=pat2.extract(buf)
> >>> x
> {'number': ['12', '11', '10']}
> 
> Before the compression stage the _Match object actually looked like this:
> 
> {'_group0': {'_value': '12 drummers drumming, 11 pipers piping, 10
> lords
[...]
> '10'}}]}}
> 
> But the compression algorithm collected the named groups and brought
> them to the surface, to return the much nicer looking:
> 
> {'number': ['12', '11', '10']}

I confess I haven't thought about how that could be cleanly
implemented, but both outputs you present above look inadequate
in my opinion. Regular expressions already have a widely adopted
meaning. If we're going to introduce new features, we should try
to do that without breaking the current well known meanings they
have.
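
For comparison, this is what the existing re module returns for the quoted pattern. The input string is an assumption, since the original buffer is truncated in the quote, and the group is named `number` as the quoted output suggests.

```python
import re

# Assumed input; the real buffer is not shown in full in the thread.
buf = "12 drummers drumming, 11 pipers piping, 10 lords leaping"

m = re.match(r'^(((?P<number>\d+) ([^,]+))(, )?)*$', buf)

# Current meaning: a repeated group reports only its last repetition.
last_number = m.group('number')

# Collecting every repetition takes a second, explicit pass.
numbers = [n.group('number')
           for n in re.finditer(r'(?P<number>\d+) [^,]+', buf)]
```

Here `last_number` is '10', while `numbers` holds all three values. Any integrated design would have to leave the first behaviour unchanged for existing patterns.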

> > I find the feature very interesting, but being used to live without it,
> > I have difficulty evaluating its usefulness.
> 
> Yes - this is a good point too, because it ~is~ different from the re
> library.  re2 aims to do all that searching, grouping, iterating and
> collecting and constructing work for you.
[...]
> Actually, I ~would~ like to limit it to just named groups.
> I reckon, if you're not going to bother naming a group, then why would
> you have any interest in it.
> I guess it's up for discussion how confusing this "new" way of thinking
> could be and what drawbacks it might have.

Your target does indeed seem to be a new kind of regular expression.
In that case, I'm not sure "re2" is the right name for it, given
that you haven't written an improved SRE, but a completely new
kind of regular expression matching which depends on SRE itself
rather than extending it in a compatible way.

While I would like to see *some* kind of successive matching
implemented in SRE (besides the Scanner which is already available),
I'm not in favor of that specific implementation.
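
The Scanner mentioned here is `re.Scanner`, an undocumented class in the re module. A minimal sketch of how it is typically used follows; the lexicon and the input string are made up for illustration, and since the class is undocumented its behaviour should be checked against the Python version in use.

```python
import re

# Each lexicon entry is (pattern, action); an action of None skips the match.
scanner = re.Scanner([
    (r'\d+', lambda sc, tok: ('number', tok)),
    (r'[a-z]+', lambda sc, tok: ('word', tok)),
    (r'[ ,]+', None),
])

# scan() returns the list of action results and the unmatched remainder.
tokens, rest = scanner.scan("12 drummers, 11 pipers")
```

This gives a flat token list, which is a different thing from the nested result discussed in this thread.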

I'm open to discuss that further.

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] longobject.c & ob_size

2005-04-03 Thread Armin Rigo
Hi Michael,

On Sun, Apr 03, 2005 at 04:14:16PM +0100, Michael Hudson wrote:
> Asking mostly out of curiosity, how hard would it be to have longs store
> their sign bit somewhere less aggravating?

As I guess your goal is to get rid of all the "if (size < 0) size = -size" in
object.c and friends, I should point out that longobject.c has set out an
example that might have been followed by C extension writers.  Maybe it is too
late to say now that ob_size cannot be negative any more :-(


Armin


Re: [Python-Dev] hierarchicial named groups extension to the re library

2005-04-03 Thread ottrey

Hi Gustavo!

On 4/4/2005, "Gustavo Niemeyer" <[EMAIL PROTECTED]> wrote:
>> Well, that would be something I'd want to discuss here.  As I'm not
>> sure if I actually ~want~ to match the API of the re module.
>
>If this feature is considered a good addition for the standard
>library, integrating it on re would be an interesting option.
>But given what you say above, I'm not sure if *you* want to
>make it a part of re itself.
>

After taking in the great comments made in this discussion, I'm now
thinking that it ~would~ be best to try and integrate the new
functionality with the existing re library (matching the current API),
as there is (at least some) re2 functionality that I think could fit
neatly into the existing re API.

As you say:
> This would avoid backward compatibility problems, would give each
> regular expression a single meaning, and would allow interleaving
> hierarchical/non-hierarchical groups.

>If we're going to introduce new features, we should try
>to do that without breaking the current well known meanings they
>have.

Agreed.

>I'm not in favor of that specific implementation.
>
>I'm open to discuss that further.

And I'm happy to work on a proposal that attempts to implement the new
functionality in a backward-compatible, integrated way.

> I offer myself to integrate the change

Thanx!  That'd be great.

> once we decide on the right way to implement it,
> and achieve consensus on its adoption.

Great.
So I'll conclude from this discussion that (some implementation of) re2
is indeed worth adding to the re library (once we achieve consensus).

And as for creating a PEP...

>Josiah Carlson wrote:
>In general, if developers can readily agree that a functionality should
>be added (i.e. it is "obvious" for some reason), it is added right away.
>Otherwise, a PEP should be written, and reviewed by the community

I'd like to call the current functionality a "work in progress",
i.e. I'd like to work on it more, taking on board the comments made here.

I'd also like to take this discussion off the python-dev list now and
shift it to pyre2.  (possibly to come back with a more polished proposal.)

We've set up a development wiki here:

  http://py.redsoft.be/pyre2/wiki/

(feel free to add any more suggestions.)

And there is also a mailing list, if anyone is interested and would like
to subscribe:

  http://lists.sourceforge.net/lists/listinfo/pyre2-devel


Regards.

Chris.