On Dec 20, 7:40 am, durumdara <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I want to replace some seqs. in a html.
> Let:
> a-
> b
> = ab
>
> but:
> xxx -
> b
> must be unchanged, because it is not word split.
>
> I want to search and replace with re, but I don't know how to neg. this
> set ['\ \n\t'].
>
> This time I use full set without these chars, but neg. is better and
> shorter.
>
> Ok, I can use [^\s], but I want to know, how to neg. set of chars.
> sNorm1= '([^[\ \t\n]]{1})\-\<br\ \/\>\n' - this is not working.
>
> Thanks for the help:
> dd
>
> sNorm1= '([%s]{1})\-\<br\ \/\>\n'
> c = range(0, 256)
> c.remove(32)
> c.remove(13)
> c.remove(10)
> c.remove(9)
> s = ["\\%s" % (hex(v).replace('00x', '')) for v in c]
> sNorm1 = sNorm1 % ("".join(s))
> print sNorm1
>
> def Normalize(Text):
>
>     rx = re.compile(sNorm1)
>     def replacer(match):
>         return match.group(1)
>     return rx.sub(replacer, Text)
>
> print Normalize('a -<br />\nb')
> print Normalize('a-<br />\nb')
> sys.exit()

It looks like you are trying to de-hyphenate words that have been
broken across line breaks. This isn't a regexp solution; it uses
pyparsing instead. I've also added a number of other test cases which
may be problematic for an re.

-- Paul

from pyparsing import makeHTMLTags, Literal, Word, alphas, Suppress

# makeHTMLTags returns expressions for the opening and closing tags
brTag, brEndTag = makeHTMLTags("br")
hyphen = Literal("-")
hyphen.leaveWhitespace()  # don't skip whitespace before matching this
collapse = Word(alphas) + Suppress(hyphen) + Suppress(brTag) \
           + Word(alphas)

# define action to replace expression with the word before hyphen
# concatenated with the word after the <BR> tag
collapse.setParseAction(lambda toks: toks[0]+toks[1])

print(collapse.transformString('a -<br />\nb'))
print(collapse.transformString('a-<br />\nb'))
print(collapse.transformString('a-<br/>\nb'))
print(collapse.transformString('a-<br>\nb'))
print(collapse.transformString('a- <BR clear=all>\nb'))
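
On the regexp question in the quoted post: a set is negated by putting ^ as the first character inside the brackets, so "anything except space, tab, CR, LF" is [^ \t\r\n], with no nested brackets. Below is a minimal sketch along those lines, assuming the goal is to join only when a non-whitespace character sits directly before the hyphen. The names DEHYPHENATE and normalize are made up for illustration and are not from the original post.

```python
import re

# [^ \t\r\n] is a negated character class: it matches any single
# character except space, tab, carriage return, and newline.
# Hypothetical pattern for illustration, not the poster's code.
DEHYPHENATE = re.compile(r'([^ \t\r\n])-<br\s*/?>\s*', re.IGNORECASE)

def normalize(text):
    # Keep the character before the hyphen; drop the hyphen, the <br>
    # tag, and any whitespace that follows the tag.
    return DEHYPHENATE.sub(r'\1', text)

print(normalize('a -<br />\nb'))   # space before the hyphen: left alone
print(normalize('a-<br />\nb'))
print(normalize('a-<br/>\nb'))
print(normalize('a-<br>\nb'))
```

This pattern is deliberately simple: it does not handle whitespace between the hyphen and the tag, or tags with attributes such as <BR clear=all>, which is where the pyparsing version is more forgiving.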