On 07/04/2014 18:04, Ihe Onwuka wrote:
On Mon, Apr 7, 2014 at 5:49 PM, David Carlisle <[email protected]>
wrote:

No, just that if you are writing a vocabulary-specific regex you need
to use vocabulary-specific regex terms. If I'm looking for words
in English I tend to use [a-z], even if some people try to sneak
accents into cafe or naive :-)


Well, mine is not a regional vocabulary scenario. The backtick
appears in a title which is used to create a URL which (I believe)
will not tolerate such characters.

Well then, the grave accent is the least of your concerns with \w.

URI letters are defined as ALPHA (%x41-5A and %x61-7A), i.e. [a-zA-Z], so
that doesn't allow accented letters, or Greek or Cyrillic, or the tens of
thousands of other characters included in \w.

https://tools.ietf.org/html/rfc3986
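
To make the gap between ALPHA and \w concrete, here is a minimal sketch in Python. It is an illustration only: Python's regex flavour is not the XPath/XQuery one discussed in this thread, and its \w class is defined differently, so treat it as an analogy. The names URI_ALPHA, WORD and samples are made up for the example.

```python
import re

# ASCII letters only, mirroring the ALPHA rule quoted above.
URI_ALPHA = re.compile(r"[A-Za-z]+")
# Python's \w on str patterns is Unicode-aware (letters, digits, underscore).
# Assumption: this only approximates \w in XPath/XQuery regexes.
WORD = re.compile(r"\w+")

samples = {
    "ascii": "cafe",
    "accented": "caf\u00e9",                       # cafe with e-acute
    "greek": "\u03bb\u03cc\u03b3\u03bf\u03c2",     # Greek word "logos"
    "cyrillic": "\u0441\u043b\u043e\u0432\u043e",  # Russian word "slovo"
}

# For each sample: (matches ASCII letters only, matches \w)
results = {
    name: (bool(URI_ALPHA.fullmatch(text)), bool(WORD.fullmatch(text)))
    for name, text in samples.items()
}

for name, (is_alpha, is_word) in results.items():
    print(f"{name}: ALPHA-only={is_alpha} \\w={is_word}")
```

Only the plain ASCII sample passes both checks; the others pass \w but fall outside ALPHA.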

Of course, most user-facing systems such as HTML or XML allow a much
wider set of characters in href attributes and SYSTEM identifiers and
leave it to the system to %-encode according to the somewhat arcane URI
rules, cf. the IRI or LEIRI syntax.
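
As a rough illustration of that %-encoding step, the sketch below uses Python's standard urllib.parse.quote. It is not the IRI or LEIRI algorithm itself, just UTF-8 percent-encoding of everything outside the unreserved set; the function name encode_segment and the sample title are invented for the example.

```python
from urllib.parse import quote


def encode_segment(title: str) -> str:
    """Percent-encode a title for use as a single URI path segment.

    safe="" means no reserved characters are left alone; ASCII letters,
    digits and "_.-~" are never escaped by quote().
    """
    return quote(title, safe="")


# Hypothetical title containing an accented letter, a space and backticks.
encoded = encode_segment("caf\u00e9 `x`")
print(encoded)
```

The accented letter becomes its two UTF-8 bytes (%C3%A9), the space becomes %20 and each backtick becomes %60, so the result contains only characters a URI allows.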

David



