Hello Saifi,

You can also try this one:

((https?|ftp)://[\w\d+\?\.\-:#...@%/&=~_]*)

I modified it slightly because I was using this regex in my Java code, so it may have an extra '\' here and there. Patterns along these lines are a common way to extract URLs in web-mining and web-indexing code.

Regards,
Dhiraj Chawla

> Hi all:
>
> There is an HTML file with more than 5000 entries that look like
>
> <li><a href="http://blogtrader.net/dcaoyuan/feed/entries/atom"
> title="subscribe"><img src="p_files/feed-icon-10x10.png" alt=
> "(feed)"></a> <a href="http://blogtrader.net/" title=
> "BlogTrader">Caoyuan Deng</a></li>
> ...
> ...
> ...
>
> and i need to extract the URL links.
>
> Here is my PERL solution.
>
> #!/usr/bin/env perl
>
> $ok = open(FH, "<", "p.htm");
>
> foreach (<FH>)
> {
>   if ( /href=\"*[^\">]*/ )
>   {
>     print "$& \n";
>   }
> }
>
> close(FH);
> --
>
> Is there a better way to express the regular expression and
> print the URL links ?
>
> All suggestions and code snippets are welcome :)
>
>
> thanks
> Saifi.
>
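For example, here is a minimal Java sketch that applies the URL pattern above to one of the entries from the question. The class and method names (UrlExtractor, extractUrls, extractHrefs) are just for illustration. The pattern is kept exactly as posted, including the "..." inside the character class, which in a [...] class only adds a literal '.'. The second method is my own capture-group variant of the href regex from the question, not part of the pattern above: it keeps only the quoted attribute value instead of printing the whole match ($& in the Perl code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    // The URL pattern from the reply, written as a Java string literal
    // (note the doubled backslashes). Kept exactly as posted.
    static final Pattern URL = Pattern.compile(
            "((https?|ftp)://[\\w\\d+\\?\\.\\-:#...@%/&=~_]*)");

    // Illustrative alternative to the Perl test /href=\"*[^\">]*/:
    // group 1 captures only the double-quoted value of an href attribute.
    static final Pattern HREF = Pattern.compile("href\\s*=\\s*\"([^\"]*)\"");

    // Collects every substring of text matched by the URL pattern.
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Collects the contents of every double-quoted href attribute in text.
    public static List<String> extractHrefs(String text) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            hrefs.add(m.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // One entry from the question, joined onto a single line.
        String entry = "<li><a href=\"http://blogtrader.net/dcaoyuan/feed/entries/atom\" "
                + "title=\"subscribe\"><img src=\"p_files/feed-icon-10x10.png\" alt="
                + "\"(feed)\"></a> <a href=\"http://blogtrader.net/\" title="
                + "\"BlogTrader\">Caoyuan Deng</a></li>";
        System.out.println(extractUrls(entry));
        System.out.println(extractHrefs(entry));
    }
}
```

On this entry both methods return the two blogtrader.net URLs and skip the relative image path. The difference is that the URL pattern finds absolute http/https/ftp links anywhere in the text but stops at any character outside its class (for example ';', ',' or '('), while the href variant returns whatever is quoted in the attribute, relative links included. The same capture-group idea carries over to Perl: loop with while (/href="([^"]*)"/g) and print $1.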

