Tim Chase [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
r.findall(s)
['(a c)b(c d)', 'e']
Ah, it's exactly what I want! I thought the left and right
sides of | were equivalent, but that is not true.
In theory, they *should* be
Hunh! I thought pyparsing was included with Debian.
(http://packages.debian.org/stable/source/pyparsing)
Yes, it's available. Laziness is the main factor
here...however, it's simply an apt-get install pyparsing
away.
And is downloading a package really such a hardship?
What, are you on
I've got some strings to split. They are mainly words, but some words
are inside a pair of brackets and should be considered as one unit. I
prefer to use re.split, but haven't written a working one after hours
of work.
Example:
a (b c) d [e f g] h i
should be split into
[a, (b c), d, [e f g],
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
Qiangning Hong wrote:
I've got some strings to split. They are mainly words, but some words
are inside a pair of brackets and should be considered as one unit. I
prefer to use re.split, but haven't written a working one after hours
of work.
Example:
er,
...|\[[^\]]*\]|...
^_^
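The one-character correction above changes the output more than it might seem. A minimal sketch, assuming the example string from the original question (the names `buggy` and `fixed` are illustrative only), of what the missing `\]` does:

```python
import re

s = 'a (b c) d [e f g] h i'

# As first posted: the bracket alternative never consumes the closing ']'.
buggy = re.findall(r'\([^\)]*\)|\[[^\]]*|\S+', s)

# With the closing \] restored.
fixed = re.findall(r'\([^\)]*\)|\[[^\]]*\]|\S+', s)

print(buggy)  # ['a', '(b c)', 'd', '[e f g', ']', 'h', 'i']
print(fixed)  # ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```

In the first form the stray ']' is picked up by \S+ as a token of its own.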
faulkner wrote:
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
Qiangning Hong wrote:
I've got some strings to split. They are mainly words, but some words
are inside a pair of brackets and should be considered as one unit. I
prefer to use re.split, but haven't
faulkner wrote:
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
sorry i forgot to give a limitation: if a letter is next to a bracket,
they should be considered as one word. i.e.:
a(b c) d becomes [a(b c), d]
because there is no blank between a and (.
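To see why this extra requirement breaks the earlier suggestion, here is a small sketch (the variable names are illustrative only) that runs the three-way alternation, with its closing \] in place, on the new example:

```python
import re

s = 'a(b c) d'

# At 'a' neither bracket alternative can start, so \S+ takes over and
# runs up to the first space, cutting the parenthesised group in half.
result = re.findall(r'\([^\)]*\)|\[[^\]]*\]|\S+', s)

print(result)  # ['a(b', 'c)', 'd'] rather than the wanted ['a(b c)', 'd']
```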
--
faulkner wrote:
er,
...|\[[^\]]*\]|...
^_^
That's why it is nice to use re.VERBOSE:
def splitup(s):
    return re.findall(r'''
        \( [^\)]* \) |
        \[ [^\]]* \] |
        \S+
        ''', s, re.VERBOSE)
Much less error prone this way
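A further benefit of re.VERBOSE is that each alternative can carry its own comment. The following is only a sketch of that idea, not code from the thread; the name `TOKEN` is made up for illustration:

```python
import re

# In VERBOSE mode, whitespace outside character classes is ignored and
# '#' starts a comment that runs to the end of the line.
TOKEN = re.compile(r'''
    \( [^\)]* \)   # a parenthesised group, spaces included
  | \[ [^\]]* \]   # a square-bracketed group, spaces included
  | \S+            # otherwise, a run of non-whitespace
''', re.VERBOSE)

def splitup(s):
    return TOKEN.findall(s)

print(splitup('a (b c) d [e f g] h i'))
# ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```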
--
- Justin
--
a (b c) d [e f g] h i
should be split into
[a, (b c), d, [e f g], h, i]
As speed is a factor to consider, it's best if a
single-line regular expression can handle this. I tried
this but failed:
re.split(r'(?![\(\[].*?)\s+(?!.*?[\)\]])', s). It works
for (a b) c but not
Qiangning Hong wrote:
faulkner wrote:
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
sorry i forgot to give a limitation: if a letter is next to a bracket,
they should be considered as one word. i.e.:
a(b c) d becomes [a(b c), d]
because there is no blank between a and (.
This variation seems
Tim Chase wrote:
import re
s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
r.findall(s)
['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd',
'[e f g]', 'h', 'i']
[...]
However, the above monstrosity passes
Simon Forman wrote:
def splitup(s):
    return re.findall(r'''
        \S*\( [^\)]* \)\S* |
        \S*\[ [^\]]* \]\S* |
        \S+
        ''', s, re.VERBOSE)
Yours is the same as Tim's: it can't handle a word with two or more
bracket pairs either.
I tried to change the \S*\([^\)]*\)\S*
but it can't pass this one: (a c)b(c d) e. The above regex
gives ['(a c)b(c', 'd)', 'e'], but the correct result should
be ['(a c)b(c d)', 'e'].
Ah...the picture is becoming a little more clear:
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
r.findall(s)
['(a c)b(c d)', 'e']
It also
Qiangning Hong wrote:
Tim Chase wrote:
import re
s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
r.findall(s)
['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd',
'[e f g]', 'h', 'i']
[...]
Simon Forman wrote:
What are the desired results in cases like this:
(a b)[c d] or (a b)(c d) ?
[(a b)[c d]], [(a b)(c d)]
--
http://mail.python.org/mailman/listinfo/python-list
Tim Chase wrote:
Ah...the picture is becoming a little more clear:
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
r.findall(s)
['(a c)b(c d)', 'e']
It also works on my original test data, and is a cleaner regexp
than the original.
The clearer the problem, the clearer the answer. :)
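As a quick check, the final pattern can be run against the cases raised earlier in the thread. This is a sketch only; the `cases` table is assembled here from the examples quoted in the discussion and is not something any poster wrote:

```python
import re

r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# Inputs and wanted results, collected from the posts above.
cases = {
    'a (b c) d [e f g] h i': ['a', '(b c)', 'd', '[e f g]', 'h', 'i'],
    'a(b c) d': ['a(b c)', 'd'],
    '(a c)b(c d) e': ['(a c)b(c d)', 'e'],
    '(a b)[c d]': ['(a b)[c d]'],
    '(a b)(c d)': ['(a b)(c d)'],
}

for text, expected in cases.items():
    assert r.findall(text) == expected, (text, r.findall(text))
```

Nested brackets such as (a (b) c) were not part of the stated requirements, and this pattern does not attempt to handle them.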
Tim Chase [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
I'm sure there's a *much* more elegant pyparsing solution to
this, but I don't have the pyparsing module on this machine.
It's much better/clearer and will be far more readable when
you come back to it later.
However, the
Ah, I had just made the same change!
from pyparsing import *
wrd = Word(alphas)
parenList = "(" + SkipTo(")") + ")"
brackList = "[" + SkipTo("]") + "]"
listExpr = ZeroOrMore( Combine( OneOrMore( parenList | brackList | wrd ) ) )
t = "a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)z"
print
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
r.findall(s)
['(a c)b(c d)', 'e']
Ah, it's exactly what I want! I thought the left and right
sides of | were equivalent, but that is not true.
In theory, they *should* be equal. I was baffled by the nonparity
of the situation. You *should be
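One plausible reading of the surprise about the two sides of | (an interpretation, not something the posts spell out) is that Python's re module tries alternatives from left to right and keeps the first one that lets the match proceed. A small sketch, with illustrative variable names, of how the order changes the result:

```python
import re

s = '(a c)b(c d) e'

# Bracket alternatives first: a whole "(...)" group is tried before \S.
groups_first = re.findall(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+', s)

# \S first: it matches the '(' itself, so the group alternative is never
# reached at that position and the match stops at the first space.
nonspace_first = re.findall(r'(?:\S|\([^\)]*\)|\[[^\]]*\])+', s)

print(groups_first)    # ['(a c)b(c d)', 'e']
print(nonspace_first)  # ['(a', 'c)b(c', 'd)', 'e']
```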
Paul McGuire wrote:
Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing,
vs. 0.13ms for re, so about 15x faster for re. If psyco is used (and we
skip the first call, which incurs all the compiling overhead), the speed
difference drops to about 7-10x. I did try
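For anyone wanting to repeat this kind of measurement, here is a rough sketch using the standard timeit module on the re side only. The helper `ms_per_call` and the repeat count are assumptions made for illustration, not the harness used for the figures above, and the numbers will differ by machine and Python version:

```python
import re
import timeit

r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
t = 'a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)z'

def ms_per_call(func, number=10000):
    """Average wall-clock milliseconds per call over `number` runs."""
    return timeit.timeit(func, number=number) / number * 1000.0

re_ms = ms_per_call(lambda: r.findall(t))
print('re.findall: %.4f ms per call' % re_ms)
```

If pyparsing is installed, the same helper could be handed a callable that runs listExpr.parseString(t) to get a comparable figure.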