> I will write my own if I have to. But before I do, I'd like to understand as
> many details as possible about the specifics of Twitter's RFC 3986 behavior.
This is the regex I'm using, which is known to work:

    $x =~ s/([^-0-9a-zA-Z._~])/"%".uc(unpack("H2",$1))/eg;

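In short, ASCII letters, digits, and the four characters - . _ ~ (the
"unreserved" set in RFC 3986, section 2.3) are NOT URL encoded.
Everything else is, as a percent sign followed by two uppercase hex digits.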
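
To make the rule concrete, here is a minimal sketch that wraps the same
substitution in a sub and runs it over a few ASCII strings. The name
escape_ascii is made up for illustration; it is not part of any library.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner above. ASCII input only.
sub escape_ascii {
    my ($x) = @_;
    # Anything outside A-Z a-z 0-9 - . _ ~ becomes %XX (uppercase hex).
    $x =~ s/([^-0-9a-zA-Z._~])/"%".uc(unpack("H2",$1))/eg;
    return $x;
}

print escape_ascii("Hello World!"), "\n";   # Hello%20World%21
print escape_ascii("a+b=c~d"), "\n";        # a%2Bb%3Dc~d
print escape_ascii("x-y_z.1"), "\n";        # x-y_z.1 (nothing to encode)
```

Note that a space comes out as %20, never as +, and that + itself is
encoded as %2B.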
Note that this routine is not 100% UTF-8 safe as written: it expects $x to
already hold bytes, not wide characters. I have other code that handles
that, so you may need to do the same, as your library warrants.
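
One possible way to cover the UTF-8 case, sketched under my own
assumptions and not taken from the code mentioned above: convert the
string to UTF-8 bytes with encode_utf8 from the core Encode module, then
run the same substitution so every match is a single byte. The name
rfc3986_escape is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: assumes the argument is a Perl character string.
sub rfc3986_escape {
    my ($text) = @_;
    # Encode to UTF-8 bytes first, so each match below is one byte
    # and unpack("H2", ...) always sees a value in 0x00-0xFF.
    my $bytes = encode_utf8($text);
    $bytes =~ s/([^-0-9a-zA-Z._~])/"%".uc(unpack("H2",$1))/eg;
    return $bytes;
}

print rfc3986_escape("caf\x{e9}"), "\n";   # caf%C3%A9
print rfc3986_escape("\x{263A}"), "\n";    # %E2%98%BA
```

If your input is already a UTF-8 byte string, skip the encode_utf8 step,
or the non-ASCII bytes will be encoded twice.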
--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * [email protected]
-- People are weird. -- Law & Order SVU ---------------------------------------