Re: [PHP] preg for unicode strings?
On Sat, November 5, 2005 3:02 pm, Andy Pieters wrote:
> Hi List
>
> I am doing some data validation and the following regexp fails
>
> [\W]
>
> When using characters like £ or €
>
> Obviously because they are technically more than one character, even
> though they are only displayed as one.
>
> The script is encoded in UTF-8
>
> Anybody know a fix for this?

You could use http://php.net/utf8_decode on it first, and then validate...

I dunno if that would allow any nasties to get past, but at least it should validate the input as legal, I think...

I always feel overwhelmed by all this multi-lingual character-encoding multi-byte stuff, frankly.

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] preg for unicode strings?
Andy,

try this one:

/^[a-zA-Z]{3}|\p{Sc}$/u

You don't want to put \p{Sc} in square brackets, as \p{Sc} itself already is a character class.

Umm... Kinda not making myself clear here, am I? You just don't want to. It's 5am here and I gotta go to bed ;p

Regards,
Niels

Andy Pieters:
> Hi
>
> Thank you for your reply.
>
> My regexp was /^([a-zA-Z]{3,}|[\W])/
>
> Meaning match any string that is either 3 letters or 1 non-word character
>
> I'd like to change this to 3 letters or 1 currency character
>
> So I changed the regexp accordingly
>
> /^([a-zA-Z]{3,}|[\p{Sc}])/u
>
> And I tested with £ but it fails.
>
> Any ideas?
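Since `|` has the lowest precedence in a regular expression, the pattern suggested above reads as "`^[a-zA-Z]{3}` or `\p{Sc}$`", so the anchors each apply to only one branch. A minimal sketch of the "3 letters or 1 currency character at the start" check, assuming PHP 7 or later with a PCRE build that has Unicode property support; the helper name `starts_with_code_or_currency` is hypothetical and the grouping is an assumption about the intent, not something stated in the thread:

```php
<?php
// Hypothetical helper: true if $s starts with 3 or more ASCII letters
// or with a single Unicode currency symbol (general category Sc).
function starts_with_code_or_currency(string $s): bool
{
    // (?: ... ) groups both alternatives so that ^ applies to each of them.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^(?:[a-zA-Z]{3,}|\p{Sc})/u', $s) === 1;
}

// "\xC2\xA3" is the UTF-8 byte sequence for the pound sign (U+00A3).
var_dump(starts_with_code_or_currency("\xC2\xA3" . "100"));
var_dump(starts_with_code_or_currency("USD 100"));
var_dump(starts_with_code_or_currency("12 apples"));
```

The byte escapes are used only so the example does not depend on how the script file itself is encoded.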
[PHP] preg for unicode strings?
Hi List

I am doing some data validation and the following regexp fails

[\W]

When using characters like £ or €

Obviously because they are technically more than one character, even though they are only displayed as one.

The script is encoded in UTF-8

Anybody know a fix for this?

With kind regards

Andy

-- 
Now listening to The Prophet - I Can't Stand It on amaroK
Geek code: www.vlaamse-kern.com/geek
Registered Linux User No 379093
If life was for sale, what would be its price?
www.vlaamse-kern.com/sas/ for free php utilities
Re: [PHP] preg for unicode strings?
On 11/5/05, Andy Pieters wrote:
> I am doing some data validation and the following regexp fails
>
> [\W]
>
> When using characters like £ or €
>
> The script is encoded in UTF-8

Are you using the 'u' modifier to put PCRE in UTF-8 mode?

preg_match('/\W/u', $text);

-robin
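The effect of the 'u' modifier mentioned above can be seen by counting how often `\W` matches. This is a small sketch, assuming PHP 7 or later; the helper `count_non_word` is a hypothetical name for illustration. Without the modifier PCRE works on single bytes, so the count for a multi-byte character depends on the bytes and the active locale; with it, each of the two symbols is treated as one character.

```php
<?php
// UTF-8 byte sequences: pound sign U+00A3 is C2 A3, euro sign U+20AC is E2 82 AC.
$pound = "\xC2\xA3";
$euro  = "\xE2\x82\xAC";

// Hypothetical helper: number of \W matches in $text,
// with or without the u (UTF-8) modifier.
function count_non_word(string $text, bool $utf8): int
{
    $pattern = $utf8 ? '/\W/u' : '/\W/';
    $n = preg_match_all($pattern, $text);
    return $n === false ? 0 : $n;
}

// strlen() counts bytes, which is why one displayed symbol can look like several.
printf("bytes:      %d, %d\n", strlen($pound), strlen($euro));
printf("with /u:    %d, %d\n", count_non_word($pound, true), count_non_word($euro, true));
printf("without /u: %d, %d\n", count_non_word($pound, false), count_non_word($euro, false));
```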
Re: [PHP] preg for unicode strings?
Andy,

you might want to check out http://www.regular-expressions.info/unicode.html

Please note two things while using the described syntax:

1. You have to additionally use the u modifier.
2. While \p{Ll} for instance works in PHP, \p{Lowercase_Letter} doesn't.

Regards,
Niels

> Hi List
>
> I am doing some data validation and the following regexp fails
>
> [\W]
>
> When using characters like £ or €
>
> Obviously because they are technically more than one character, even
> though they are only displayed as one.
>
> The script is encoded in UTF-8
>
> Anybody know a fix for this?
>
> With kind regards
>
> Andy
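To illustrate the two notes above, here is a short sketch that uses the short property names `\p{Ll}` and `\p{Lu}` together with the u modifier. It assumes PHP 7 or later with Unicode property support in PCRE; the sample string is made up for the example.

```php
<?php
// Sample text: "A", then "ñ" (U+00F1, UTF-8 bytes C3 B1), then "b".
$text = "A\xC3\xB1b";

// \p{Ll} is the short name for lowercase letters.
// The u modifier makes PCRE read $text as UTF-8, so "ñ" is one character.
$lower = preg_match_all('/\p{Ll}/u', $text, $m);

// \p{Lu} is the short name for uppercase letters.
$upper = preg_match_all('/\p{Lu}/u', $text);

printf("lowercase: %d, uppercase: %d\n", $lower, $upper);
```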
Re: [PHP] preg for unicode strings?
Hi

Thank you for your reply.

My regexp was /^([a-zA-Z]{3,}|[\W])/

Meaning match any string that is either 3 letters or 1 non-word character

I'd like to change this to 3 letters or 1 currency character

So I changed the regexp accordingly

/^([a-zA-Z]{3,}|[\p{Sc}])/u

And I tested with £ but it fails.

Any ideas?

With kind regards

Andy

On Sunday 06 November 2005 02:11, Niels Ganser wrote:
> Andy,
>
> you might want to check out
> http://www.regular-expressions.info/unicode.html
>
> Please note two things while using the described syntax:
>
> 1. You have to additionally use the u modifier.
> 2. While \p{Ll} for instance works in PHP, \p{Lowercase_Letter} doesn't.
>
> Regards,
> Niels
>
> > Hi List
> >
> > I am doing some data validation and the following regexp fails
> >
> > [\W]
> >
> > When using characters like £ or €
> >
> > Obviously because they are technically more than one character, even
> > though they are only displayed as one.
> >
> > The script is encoded in UTF-8
> >
> > Anybody know a fix for this?
> >
> > With kind regards
> >
> > Andy