I've recently switched to using this regex for pulling out links, and
haven't spotted any issues with extra characters surrounding the
links as yet.
/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d?[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:\'".,<>?«»“”‘’\s]))/
It was post
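For anyone who wants to try the pattern above outside a /.../ literal, here is a minimal Python sketch. It assumes the slash delimiters and the leading (?i) can be swapped for re.IGNORECASE; the names URL_PATTERN and extract_links are made up for illustration and are not from the original post.

```python
import re

# The regex quoted above, with the /.../ delimiters removed and the
# inline (?i) replaced by the re.IGNORECASE flag (an assumption about
# how to port it, not part of the original post).
URL_PATTERN = re.compile(
    r"""\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d?[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))""",
    re.IGNORECASE,
)


def extract_links(text):
    """Return every substring of text that the pattern treats as a link."""
    return [match.group(1) for match in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "See http://example.com/foo_(bar). Also www.example.org, ok?"
    for link in extract_links(sample):
        print(link)
```

On the sample sentence, the balanced parentheses stay inside the first link while the sentence-ending period and the comma after the second link are left out.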
This might be relevant to your interests:
http://daringfireball.net/2009/11/liberal_regex_for_matching_urls
Something definitely changed in the twitter web front-end code which
is borking url matching as of a month or so ago...
-Chad
On Fri, Dec 18, 2009 at 2:44 AM, Harshad RJ wrote:
Although not an API issue, it might be good to track it as such, because
Twitter clients can then follow exactly the same policies that the Twitter
web interface does.
If there is a standard regular expression that can be used for detecting a
URL, it could be published as a guideline in the API documentation.
A closing parenthesis followed by a space seems like a pretty safe bet too. I'm
sure those rules have been worked out long ago - the RFC was published in '94.
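A rough Python sketch of that kind of heuristic, for illustration only. It grabs everything up to the next whitespace and then trims the tail. Note one added assumption that is not in the message above: a trailing ")" is dropped only when the candidate has more closing than opening parentheses, so links that end in a balanced "(...)" survive. The names CANDIDATE, trim_trailing and extract are hypothetical.

```python
import re

# Anything from the scheme up to the next whitespace is a candidate.
# Restricting to http/https is a simplifying assumption for this sketch.
CANDIDATE = re.compile(r"https?://\S+")


def trim_trailing(url):
    """Drop trailing sentence punctuation and any unmatched ')'."""
    while url:
        last = url[-1]
        if last in ".,;:!?":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More ')' than '(' means the last one most likely belongs
            # to the surrounding sentence rather than to the link.
            url = url[:-1]
        else:
            break
    return url


def extract(text):
    """Return trimmed link candidates found in text."""
    return [trim_trailing(m.group(0)) for m in CANDIDATE.finditer(text)]
```

This is still a guess about author intent: a link that really does end in a period or an unbalanced parenthesis will be cut short.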
> Date: Thu, 17 Dec 2009 07:55:14 -0800
> Subject: [twitter-dev] Re: URLification
> From: dba...@gmail.com
> To: twitter-development-talk@googlegroups.com
> > Date: Thu, 17 Dec 2009 05:48:31 -0800
> > Subject: [twitter-dev] Re: URLification
> > From: dba...@gmail.com
> > To: twitter-development-talk@googlegroups.com
>
> > Periods and parentheses are valid url characters. Assuming that an
> > adjacent period or c
True, but Yahoo! Mail and others do get it right.
It's been a few years since I stopped worrying about sending an email with a
URL at the end of a sentence. I wonder how they do it.
> Date: Thu, 17 Dec 2009 05:48:31 -0800
> Subject: [twitter-dev] Re: URLification
> From: dba...@gmail.com
Periods and parentheses are valid url characters. Assuming that an
adjacent period or closing parenthesis is not part of the url is a
gamble. The most sensible urlification includes all valid characters
until it finds one that clearly delimits the url such as a space.
http://www.ietf.org/rfc/rfc17
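To make the trade-off concrete, here is a minimal Python sketch of the "take everything up to a clear delimiter" approach described above. It assumes whitespace is the only delimiter and that only http/https links matter; GREEDY_URL and greedy_urls are hypothetical names used just for this example.

```python
import re

# Consume every non-whitespace character after the scheme. Whitespace
# is the only delimiter here, per the description above; limiting the
# scheme to http/https is an assumption made to keep the sketch short.
GREEDY_URL = re.compile(r"https?://\S+")


def greedy_urls(text):
    """Return link candidates, including any trailing punctuation."""
    return GREEDY_URL.findall(text)


if __name__ == "__main__":
    print(greedy_urls("Read http://example.com/a_(b). Thanks"))
```

With this approach a period that ends the sentence is kept as part of the link, which is exactly the case the rest of the thread is arguing about.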