On Tue, 6 Apr 2004 18:05:32 -0400
gohaku <[EMAIL PROTECTED]> wrote:

> Hi everyone,
> I have some ( actually many ) records in a Database that I want to
> "clean"
> Some of these records contain Unicode Text ( Mostly East-Asian )
>
> I have tried matching for "\W+" and "\S+" but that is not what I am
> looking for because I would like to keep "&" and "-"
>
> Thanks in advance.
> -gohaku
Hello.

The right solution depends on what kind of contamination is mixed into your records. If the contamination consists of unassigned code points, which should not be used in text, \p{Assigned}+ may be useful.

SADAHIRO Tomoyuki
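The \p{Assigned}+ suggestion can be sketched in Perl roughly as follows. This is a minimal illustration, not a complete cleaning routine: the helper name strip_unassigned and the sample record are hypothetical, and it assumes the record has already been decoded into a Perl character string and that U+0378 is unassigned in the Unicode tables of the perl being used. Because "&", "-", spaces, digits and CJK ideographs are all assigned characters, they survive; only unassigned code points are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the source below contains literal CJK text
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: keep only runs of assigned code points and join them,
# which drops every code point that is unassigned in this perl's Unicode tables.
sub strip_unassigned {
    my ($text) = @_;
    return join '', $text =~ /(\p{Assigned}+)/g;
}

# Hypothetical sample record: East-Asian text, "&" and "-",
# followed by U+0378 (assumed unassigned).
my $record = "東京 & 大阪 - 2004\x{0378}";
my $clean  = strip_unassigned($record);
print "$clean\n";                      # prints: 東京 & 大阪 - 2004
```

An equivalent one-liner is s/\P{Assigned}+//g, which deletes the unassigned runs instead of collecting the assigned ones. If the contamination is something else (for example control characters or mis-decoded bytes), a different property or an explicit character class would be needed instead.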