> I will write my own if I have to. But before I do, I'd like to understand as
> many details as possible about the specifics of Twitter's RFC 3986 behavior.

This is the regex I'm using, which is known to work:

        $x =~ s/([^-0-9a-zA-Z._~])/"%".uc(unpack("H2",$1))/eg;

In short, ASCII letters, digits, and the four characters -._~ (RFC 3986's
"unreserved" set) are NOT URL encoded. Every other byte is, as %XX with
uppercase hex digits.

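Note that this routine is not 100% UTF-8 safe as written. unpack("H2") emits
one byte per character, so non-ASCII text has to be converted to UTF-8 bytes
first. I have other code that handles that; you may need to do the same,
depending on what your library provides.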
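One way to handle the UTF-8 case is a minimal sketch like the one below. It is not the author's "other code", and the helper name rfc3986_encode is hypothetical. The sketch encodes the string to UTF-8 bytes with the core Encode module and then applies the same substitution. It assumes the input is a decoded Perl character string. Passing bytes that are already UTF-8 would double-encode them.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original post): percent-encode a
# decoded Perl character string per RFC 3986.
sub rfc3986_encode {
    my ($str) = @_;
    # Assumption: $str holds characters, not raw bytes. Converting to
    # UTF-8 first means every element is a single octet (< 256), so
    # unpack("H2", ...) yields the correct two hex digits.
    my $bytes = encode('UTF-8', $str);
    # Same substitution as in the post: leave A-Z a-z 0-9 - . _ ~ alone,
    # turn every other byte into %XX with uppercase hex.
    $bytes =~ s/([^-0-9a-zA-Z._~])/"%".uc(unpack("H2",$1))/eg;
    return $bytes;
}

print rfc3986_encode("caf\x{e9} & cr\x{e8}me"), "\n";
# prints caf%C3%A9%20%26%20cr%C3%A8me
```

A multi-byte character such as U+00E9 thus becomes two escapes (%C3%A9), one per UTF-8 byte.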

------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- People are weird. -- Law & Order SVU ---------------------------------------
