Re: splitting words with brackets

2006-07-27 Thread Paul McGuire
Tim Chase wrote: r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') r.findall(s) ['(a c)b(c d)', 'e'] Ah, it's exactly what I want! I thought the left and right sides of | are equal, but it is not true. In theory, they *should* be

Re: splitting words with brackets

2006-07-27 Thread Tim Chase
Hunh! I thought pyparsing was included with Debian. (http://packages.debian.org/stable/source/pyparsing) Yes, it's available. Laziness is the main factor here...however, it's simply an apt-get install pyparsing away. And is downloading a package really such a hardship? What, are you on

splitting words with brackets

2006-07-26 Thread Qiangning Hong
I've got some strings to split. They are mainly words, but some words are inside a pair of brackets and should be considered as one unit. I prefer to use re.split, but haven't written a working one after hours of work. Example: "a (b c) d [e f g] h i" should be split into ["a", "(b c)", "d", "[e f g]",

Re: splitting words with brackets

2006-07-26 Thread faulkner
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) Qiangning Hong wrote: I've got some strings to split. They are mainly words, but some words are inside a pair of brackets and should be considered as one unit. I prefer to use re.split, but haven't written a working one after hours of work. Example:

Re: splitting words with brackets

2006-07-26 Thread faulkner
er, ...|\[[^\]]*\]|... ^_^ faulkner wrote: re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) Qiangning Hong wrote: I've got some strings to split. They are mainly words, but some words are inside a pair of brackets and should be considered as one unit. I prefer to use re.split, but haven't
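A small check of the correction above, comparing the pattern as first posted with the fixed one. The variable names are only for illustration.

```python
import re

s = "a (b c) d [e f g] h i"

# Pattern as first posted: the closing \] is missing from the second alternative.
typo = re.compile(r"\([^\)]*\)|\[[^\]]*|\S+")
# Pattern with the correction from the follow-up post.
fixed = re.compile(r"\([^\)]*\)|\[[^\]]*\]|\S+")

typo_tokens = typo.findall(s)
fixed_tokens = fixed.findall(s)

print(typo_tokens)   # ['a', '(b c)', 'd', '[e f g', ']', 'h', 'i']
print(fixed_tokens)  # ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```

Without the closing \], the bracketed group stops just before "]" and the stray "]" is picked up by \S+ as a token of its own.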

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
faulkner wrote: re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) Sorry, I forgot to give a limitation: if a letter is next to a bracket, they should be considered as one word, e.g. "a(b c) d" becomes ["a(b c)", "d"] because there is no blank between a and (.
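To see why the extra limitation matters, here is a short sketch running the corrected three-alternative pattern from earlier in the thread on the new example. It is only an illustration of the failure, not a fix.

```python
import re

# Corrected pattern from earlier in the thread (closing \] included).
pattern = re.compile(r"\([^\)]*\)|\[[^\]]*\]|\S+")

tokens = pattern.findall("a(b c) d")
print(tokens)  # ['a(b', 'c)', 'd']
```

The match starts at "a", where neither bracket alternative applies, so \S+ takes over and runs straight through "(" up to the first blank. The wanted result is ['a(b c)', 'd'].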

Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
faulkner wrote: er, ...|\[[^\]]*\]|... ^_^ That's why it is nice to use re.VERBOSE: def splitup(s): return re.findall(''' \( [^\)]* \) | \[ [^\]]* \] | \S+ ''', s, re.VERBOSE) Much less error prone this way. - Justin
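Laid out over several lines, the re.VERBOSE version might look like the sketch below. The raw string prefix and the inline comments are additions for illustration; the pattern itself is the one from the post.

```python
import re

def splitup(s):
    # In VERBOSE mode, whitespace in the pattern is ignored and "#" starts a comment.
    return re.findall(r"""
        \( [^\)]* \)    # a parenthesised group, e.g. (b c)
      | \[ [^\]]* \]    # a square-bracketed group, e.g. [e f g]
      | \S+             # any other run of non-whitespace characters
    """, s, re.VERBOSE)

print(splitup("a (b c) d [e f g] h i"))
# ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```

With one alternative per line, a missing closing \] is much easier to spot.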

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
"a (b c) d [e f g] h i" should be split into ["a", "(b c)", "d", "[e f g]", "h", "i"] As speed is a factor to consider, it's best if there is a single line regular expression that can handle this. I tried this but failed: re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s). It works for "(a b) c" but not

Re: splitting words with brackets

2006-07-26 Thread Simon Forman
Qiangning Hong wrote: faulkner wrote: re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) Sorry, I forgot to give a limitation: if a letter is next to a bracket, they should be considered as one word, e.g. "a(b c) d" becomes ["a(b c)", "d"] because there is no blank between a and (. This variation seems

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Tim Chase wrote: import re s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i' r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+') r.findall(s) ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', '[e f g]', 'h', 'i'] [...] However, the above monstrosity passes

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Simon Forman wrote: def splitup(s): return re.findall(''' \S*\( [^\)]* \)\S* | \S*\[ [^\]]* \]\S* | \S+ ''', s, re.VERBOSE) Yours has the same problem as Tim's: it can't handle a word with two or more bracket pairs either. I tried to change the \S*\([^\)]*\)\S*
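A quick sketch of the variation quoted above, showing the case it handles and the one it does not. The function name splitup_adjacent is made up here to keep it apart from the earlier splitup.

```python
import re

def splitup_adjacent(s):
    # \S* on either side lets characters touching a bracket pair join the token.
    return re.findall(r"""
        \S* \( [^\)]* \) \S*
      | \S* \[ [^\]]* \] \S*
      | \S+
    """, s, re.VERBOSE)

one_pair = splitup_adjacent("a(b c) d")
two_pairs = splitup_adjacent("(a c)b(c d) e")

print(one_pair)   # ['a(b c)', 'd']
print(two_pairs)  # ['(a c)b(c', 'd)', 'e']
```

Each alternative allows only one bracket pair per token, so the trailing \S* swallows the start of the second pair, "b(c", and the match ends at the blank inside it.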

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
but it can't pass this one: "(a c)b(c d) e". The above regex gives out ['(a c)b(c', 'd)', 'e'], but the correct one should be ['(a c)b(c d)', 'e'] Ah...the picture is becoming a little more clear: r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') r.findall(s) ['(a c)b(c d)', 'e'] It also
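The pattern above can be tried against the test strings that appear in this thread. This is only a sketch; the helper name split_words is an assumption, not something from the posts.

```python
import re

# One token is a run of: a (...) group, a [...] group, or a single non-blank character.
WORD = re.compile(r"(?:\([^\)]*\)|\[[^\]]*\]|\S)+")

def split_words(s):
    return WORD.findall(s)

print(split_words("(a c)b(c d) e"))
# ['(a c)b(c d)', 'e']
print(split_words("a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i"))
# ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', '[e f g]', 'h', 'i']
print(split_words("(a b)[c d]"))
# ['(a b)[c d]']
```

Because the whole alternation is repeated with +, a token can contain any number of bracket pairs mixed with ordinary characters, and blanks only count when they sit inside a closed pair.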

Re: splitting words with brackets

2006-07-26 Thread Simon Forman
Qiangning Hong wrote: Tim Chase wrote: import re s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i' r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+') r.findall(s) ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', '[e f g]', 'h', 'i'] [...]

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Simon Forman wrote: What are the desired results in cases like this: "(a b)[c d]" or "(a b)(c d)"? ["(a b)[c d]"], ["(a b)(c d)"] -- http://mail.python.org/mailman/listinfo/python-list

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Tim Chase wrote: Ah...the picture is becoming a little more clear: r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') r.findall(s) ['(a c)b(c d)', 'e'] It also works on my original test data, and is a cleaner regexp than the original. The clearer the problem, the clearer the answer. :)

Re: splitting words with brackets

2006-07-26 Thread Paul McGuire
Tim Chase wrote: I'm sure there's a *much* more elegant pyparsing solution to this, but I don't have the pyparsing module on this machine. It's much better/clearer and will be far more readable when you come back to it later. However, the

Re: splitting words with brackets

2006-07-26 Thread Paul McGuire
Ah, I had just made the same change! from pyparsing import * wrd = Word(alphas) parenList = "(" + SkipTo(")") + ")" brackList = "[" + SkipTo("]") + "]" listExpr = ZeroOrMore( Combine( OneOrMore( parenList | brackList | wrd ) ) ) t = "a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)z" print

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') r.findall(s) ['(a c)b(c d)', 'e'] Ah, it's exactly what I want! I thought the left and right sides of | are equal, but it is not true. In theory, they *should* be equal. I was baffled by the nonparity of the situation. You *should be
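On the two sides of | not being equal: Python's re module tries alternatives from left to right and keeps the first one that matches, not the longest. The sketch below moves \S to the front of the pattern to show the effect; the reordered pattern is made up for illustration and does not come from the thread.

```python
import re

s = "(a c)b(c d) e"

# Bracket alternatives first, single character last (the pattern from the thread).
brackets_first = re.compile(r"(?:\([^\)]*\)|\[[^\]]*\]|\S)+")
# Same alternatives with \S moved to the front (hypothetical reordering).
char_first = re.compile(r"(?:\S|\([^\)]*\)|\[[^\]]*\])+")

good = brackets_first.findall(s)
bad = char_first.findall(s)

print(good)  # ['(a c)b(c d)', 'e']
print(bad)   # ['(a', 'c)b(c', 'd)', 'e']
```

With \S first, "(" is consumed as an ordinary character before the bracket alternative gets a chance, so the blank inside the brackets ends the token.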

Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
Paul McGuire wrote: Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing, vs. 0.13ms for re's, so about 15x faster for re's. If psyco is used (and we skip the first call, which incurs all the compiling overhead), the speed difference drops to about 7-10x. I did try
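For anyone who wants to repeat the re side of such a timing, a rough sketch with the standard timeit module follows. The repeat count is an arbitrary assumption, only the regex is measured (no pyparsing or psyco), and the numbers will differ from machine to machine.

```python
import re
import timeit

pattern = re.compile(r"(?:\([^\)]*\)|\[[^\]]*\]|\S)+")
sample = "a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)z"

calls = 10000  # arbitrary repeat count, chosen only for this sketch
total_seconds = timeit.timeit(lambda: pattern.findall(sample), number=calls)
per_call_ms = total_seconds / calls * 1000

print(f"{per_call_ms:.4f} ms per findall call")
```

Compiling the pattern once outside the timed call keeps the compile cost out of the measurement.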