Steven D'Aprano writes:

 > I mean, if all you are doing is splitting the source by some separators 
 > regardless of order, surely this does the same job and is *vastly* more 
 > obvious?
 > 
 > >>> re.split(r'[:;]', 'foo:bar;baz')
 > ['foo', 'bar', 'baz']

"Obvious" yes, but it's also easy to invest that call with semantics
(eg, "just three segments because that's the allowed syntax") that it
doesn't possess.  You haven't stated how many elements it should be
split into, nor whether the separator characters are permitted in
components, nor whether this component is the whole input and this
regexp defines the whole syntax.  The point of the "well-known idiom"
is to specify most of that (and it doesn't take much more to
specify all of it; specifying "no separators in components" is the
most space-consuming part of the expression!)

Your other alternatives have the same potential issues.
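
To make the contrast concrete, here is a small sketch. The "structured" pattern below is an assumption standing in for the well-known idiom (the exact idiom isn't quoted here): exactly three components, ':' before ';', no separator characters inside a component, matched against the whole input.

```python
import re

source = "foo:bar;baz"

# re.split accepts the separators in any order and any number of times.
print(re.split(r"[:;]", source))              # ['foo', 'bar', 'baz']
print(re.split(r"[:;]", "foo;bar:baz:qux"))   # ['foo', 'bar', 'baz', 'qux']

# Hypothetical structured idiom: exactly three components, ':' before ';',
# and no separator characters inside any component.
STRUCTURED = re.compile(r"([^:;]*):([^:;]*);([^:;]*)")

m = STRUCTURED.fullmatch(source)
print(m.groups() if m else None)              # ('foo', 'bar', 'baz')
print(STRUCTURED.fullmatch("foo;bar:baz:qux"))  # None
```

The split call happily accepts the second input; the structured pattern rejects it, which is exactly the extra semantics a reader might wrongly assume the split call carries.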

 > > But that's characteristic of many examples.
 > 
 > Great. Then for *those* structured examples you can happily write your 
 > regex and put the separators in the order you expect.
 > 
 > But I'm talking about *unstructured* examples where you don't know the 
 > order of the separators, you want to split on whichever one comes first 
 > regardless of the order, and you need to know which separator that
 > was.

That's easy enough to do with a (relatively unknown to some ;-)
regular expression:

    re.match("([^;:]*)([;:])(.*)", source)
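
Wrapped up as a str.partition lookalike, that regexp might be used as below. The helper name `partition_any` is hypothetical, and adding re.DOTALL (so the tail may contain newlines) is my assumption, not part of the one-liner above.

```python
import re

# Head: everything before the first ':' or ';'.  Sep: that separator.
# Tail: the rest.  re.DOTALL lets '.*' also consume newlines in the tail.
FIRST_SEP = re.compile(r"([^;:]*)([;:])(.*)", re.DOTALL)

def partition_any(source):
    """Hypothetical helper: split at whichever of ':' or ';' comes first.

    Mirrors str.partition: if no separator is present, return (source, '', '').
    """
    m = FIRST_SEP.match(source)
    if m is None:
        return (source, "", "")
    return m.groups()

print(partition_any("foo:bar;baz"))   # ('foo', ':', 'bar;baz')
print(partition_any("foo;bar:baz"))   # ('foo', ';', 'bar:baz')
print(partition_any("foobar"))        # ('foobar', '', '')
```

The middle group tells you which separator was found, which is the "you need to know which separator that was" requirement.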

The question is whether the need is frequent enough, and that regexp
hard enough to understand (or ugly enough), to warrant another method
or an incompatible extension to str.partition (and str.rpartition).[1]

 > > Examples where the order of separators doesn't matter?  In most of the
 > > examples I need, swapping order is a parse error.
 > 
 > Okay, then you *mostly* don't need this.

I already knew that.  Without real examples, I can't judge whether I'm
pro-status quo or pro-serving-the-nonuniversal-but-still-useful-case.

 > str.partition does *one* three way split, into (head, sep, tail).
 > If you want to continue to partition the tail, you have to call
 > it again.

I'm much more favorable to proposals where str.partition and
str.rpartition split at *one* point, but the OP seemed to intend to do
more work (but not arbitrary amounts!) per call.
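
For comparison, a multi-separator partition that still splits at *one* point could look roughly like this, without any regexp. The name `partition_at_first` and the tie-break rule (at equal positions, prefer the longer separator) are assumptions for illustration only; further splitting is done by calling it again on the tail, as with str.partition.

```python
def partition_at_first(s, seps):
    """Hypothetical sketch: split s once, at the earliest occurrence of any
    separator in seps, returning (head, sep, tail) like str.partition.

    Assumed tie-break: at the same index, the longer separator wins.
    """
    best = None  # (index, -len(sep), sep); smallest tuple wins
    for sep in seps:
        if not sep:
            raise ValueError("empty separator")  # as str.partition does
        i = s.find(sep)
        if i != -1:
            key = (i, -len(sep), sep)
            if best is None or key < best:
                best = key
    if best is None:
        return (s, "", "")
    i, _, sep = best
    return (s[:i], sep, s[i + len(sep):])

# One split per call; continue on the tail if you want more.
head, sep, tail = partition_at_first("foo:bar;baz", (":", ";"))
print(head, sep, tail)                          # foo : bar;baz
print(partition_at_first(tail, (":", ";")))     # ('bar', ';', 'baz')
```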

 > I'm not sure I quite understand you there, but if I do, I would
 > prefer to split the string and then validate the head and tail
 > afterwards, rather than just have the regex fail.

For me, that often depends on how hard I'm willing to work to support
users.  If the only user is myself, that's very often zero.  In the
case of the "well-known idiom", the only ways the regexp can fail
involve the wrong number of separators.  I'd be willing to impose that
burden on users with a "wrong number of separators" message.  Another
case is where I want an efficient parser for the vast majority of
conformant cases and am willing to do redundant work for the error
cases.
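
A rough sketch of that "fast path for conformant input, redundant work only on error" approach follows. The syntax (exactly "head:middle;tail", ':' before ';', no separators inside components) and the function name `parse` are assumptions for illustration; since this assumed syntax also fixes the order, it has a second error message that the idiom discussed above may not need.

```python
import re

# Hypothetical syntax: exactly "head:middle;tail", no ':' or ';' in components.
PATTERN = re.compile(r"([^:;]*):([^:;]*);([^:;]*)")

def parse(source):
    """Fast path: a single fullmatch for conformant input.
    Slow path: only on failure, redo work to build a useful message."""
    m = PATTERN.fullmatch(source)
    if m is not None:
        return m.groups()
    n_colon = source.count(":")
    n_semi = source.count(";")
    if (n_colon, n_semi) != (1, 1):
        raise ValueError(
            f"wrong number of separators: expected one ':' and one ';', "
            f"got {n_colon} and {n_semi}"
        )
    # Exactly one of each, yet no match: ';' must precede ':'.
    raise ValueError("separators out of order: expected ':' before ';'")

print(parse("foo:bar;baz"))   # ('foo', 'bar', 'baz')
```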



Footnotes: 
[1]  Here "incompatible" means that people writing code that must
support previous versions of Python can't use it.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WG57TUZEEPD73QNFNCRR2LOA5NEZFP4J/