Re: Regexp Neg. set of chars HowTo?

Paul McGuire Thu, 21 Dec 2006 03:56:04 -0800

On Dec 20, 7:40 am, durumdara <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I want to replace some seqs. in a html.
> Let:
> a-
> b
> = ab
>
> but:
> xxx -
> b
> must be unchanged, because it is not word split.
>
> I want to search and replace with re, but I don't know how to neg. this
> set ['\ \n\t'].
>
> This time I use full set without these chars, but neg. is better and
> shorter.
>
> Ok, I can use [^\s], but I want to know, how to neg. set of chars.
> sNorm1= '([^[\ \t\n]]{1})\-\<br\ \/\>\n' - this is not working.
>
> Thanks for the help:
> dd
>
> sNorm1= '([%s]{1})\-\<br\ \/\>\n'
> c = range(0, 256)
> c.remove(32)
> c.remove(13)
> c.remove(10)
> c.remove(9)
> s = ["\\%s" % (hex(v).replace('00x', '')) for v in c]
> sNorm1 = sNorm1 % ("".join(s))
> print sNorm1
>
> def Normalize(Text):
>
>     rx = re.compile(sNorm1)
>     def replacer(match):
>         return match.group(1)
>     return rx.sub(replacer, Text)
>
> print Normalize('a -<br />\nb')
> print Normalize('a-<br />\nb')
> sys.exit()


It looks like you are trying to de-hyphenate words that have been
broken across line breaks.

Well, this isn't a regexp solution; it uses pyparsing instead.  I've
also added a number of other test cases which may be problematic for an
re.

-- Paul

from pyparsing import makeHTMLTags,Literal,Word,alphas,Suppress

brTag,brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace() # don't skip whitespace before matching this

collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
                + Word(alphas)
# define action to replace expression with the word before hyphen
# concatenated with the word after the <BR> tag
collapse.setParseAction(lambda toks: toks[0]+toks[1])

# parenthesized single-argument print works on both Python 2 and 3
print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a-<br/>\nb'))
print(collapse.transformString('a-<br>\nb'))
print(collapse.transformString('a- <BR clear=all>\nb'))
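
On the negated-set question itself: a character class is negated by
putting ^ right after the opening bracket, e.g. [^ \t\r\n].  The quoted
attempt '[^[\ \t\n]]' nests a second pair of brackets, so re reads it as
the class [^[\ \t\n] followed by a literal ']'.  Below is a minimal re
sketch of that idea, not part of the original post; the names
dehyphenate and normalize and the simplified <br> pattern are
assumptions for illustration, and tags with attributes (like the last
pyparsing test case) are not handled.

```python
import re

# [^ \t\r\n] is the negated set: any one character that is NOT a
# space, tab, carriage return or newline.
# <br\s*/?> accepts <br>, <br/> and <br />; attributes are not handled.
dehyphenate = re.compile(r'([^ \t\r\n])-<br\s*/?>\r?\n')

def normalize(text):
    # keep the character before the hyphen; drop the hyphen,
    # the <br> tag and the line break that follows it
    return dehyphenate.sub(r'\1', text)

print(normalize('a -<br />\nb'))   # space before the hyphen: unchanged
print(normalize('a-<br />\nb'))    # ab
print(normalize('a-<br/>\nb'))     # ab
print(normalize('a-<br>\nb'))      # ab
```

Since \s already stands for whitespace, [^\s] or simply \S would do the
same job as the explicit negated set here.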

-- 
http://mail.python.org/mailman/listinfo/python-list
