On 20 Nov 98, at 16:59, [EMAIL PROTECTED] wrote:

> What about the fact that browsers have been dumbed down to take www.blah.com as a 
>"valid" URL ?

> And what about ftp://blah.blah.blah/ or gopher://blah.blah.blah/ ?

What I actually said is that you can look for strings containing a valid
character, a dot, and another valid character, and then scan left and right
to find a plausible terminator (a space, a dot, a comma, or an invalid
character)... Don't ask me for the flowchart; I'm sure you can imagine the
details (strings containing "..", etc.).
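A minimal Python sketch of that scan. It assumes a hypothetical character set `URL_CHARS` for what may appear inside a URL. The seed is an alphanumeric character, a dot, and another alphanumeric character. From there the scan expands in both directions until it hits a character outside that set. A trailing dot is treated as sentence punctuation, and anything containing ".." is rejected. The name `find_candidates` is illustrative only.

```python
import string
from typing import List

# Hypothetical set of characters assumed to be allowed inside a URL.
# Anything else (space, comma, quotes, brackets...) terminates the scan.
URL_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#=&%+@")


def find_candidates(text: str) -> List[str]:
    """Return URL-like substrings: seed on alnum + '.' + alnum, then expand."""
    candidates = []
    n = len(text)
    i = 1
    while i < n - 1:
        # Seed: a dot with an alphanumeric character on each side.
        if text[i] == "." and text[i - 1].isalnum() and text[i + 1].isalnum():
            # Scan left until a character that cannot be part of a URL.
            start = i
            while start > 0 and text[start - 1] in URL_CHARS:
                start -= 1
            # Scan right the same way.
            end = i + 1
            while end < n and text[end] in URL_CHARS:
                end += 1
            # A trailing dot most likely ends the sentence, not the URL.
            candidate = text[start:end].rstrip(".")
            # Reject implausible strings such as "foo..bar.com".
            if ".." not in candidate:
                candidates.append(candidate)
            i = end  # resume scanning after this candidate
        else:
            i += 1
    return candidates
```

On "Get it at www.blah.com, or ftp://blah.blah.blah/ today." this returns both addresses. On the microsoft.com sentence, though, it returns www.microsoft.com.It, which is exactly the ambiguity that character-level rules can't resolve.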
But I think that in a file with no formatting rules, or one edited by a
not-so-careful person, it would be impossible to isolate the URLs reliably.
For example: "You'll find this wonderful piece of software at
www.microsoft.com.It would be available..." Is the valid URL
www.microsoft.com or www.microsoft.com.It? Since .it is itself a valid
top-level domain, both readings are plausible.

You can refine the search by adding a rule that the domain must end with a
"valid" suffix such as .com, .it, .net, etc., but as the example above shows,
this is not a definitive answer either.
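A sketch of the suffix rule. It uses a small hypothetical whitelist, `VALID_SUFFIXES`, as a stand-in for a complete list of top-level domains. The helper names are illustrative too. The example shows why the rule doesn't settle the microsoft.com case: both candidates end in a whitelisted suffix.

```python
from typing import Set

# Hypothetical, deliberately tiny whitelist; a real one would list every TLD.
VALID_SUFFIXES: Set[str] = {"com", "net", "org", "edu", "gov", "it"}


def host_of(candidate: str) -> str:
    """Strip an optional scheme, any path, and any port, leaving the host."""
    if "://" in candidate:
        candidate = candidate.split("://", 1)[1]
    host = candidate.split("/", 1)[0]
    return host.split(":", 1)[0]


def has_valid_suffix(candidate: str) -> bool:
    """True if the last dot-separated label of the host is whitelisted."""
    last_label = host_of(candidate).rsplit(".", 1)[-1].lower()
    return last_label in VALID_SUFFIXES


# Both readings of the ambiguous example pass, so the rule cannot choose.
for c in ["www.microsoft.com", "www.microsoft.com.It"]:
    print(c, has_valid_suffix(c))
```

The filter does throw out obvious non-URLs such as readme.txt. It cannot, however, decide between two candidates that both end in a real domain.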

I think a more reasonable hypothesis is to assume there is some sort of
formatting (is that the right word?)... The alternatives, having a person
do it by hand or writing an AI program, seem unlikely.
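If some formatting can be assumed, extraction becomes simple. The following sketch assumes one convention of the time: the recommendation in the appendix of RFC 1738 to wrap URLs in angle brackets, optionally prefixed with "URL:" (e.g. <URL:ftp://foo.bar/>). As a fallback it accepts an explicit scheme prefix such as ftp:// or gopher://. The scheme list and the function name are illustrative assumptions.

```python
import re
from typing import List

# Assumed convention (RFC 1738, appendix): URLs in running text are wrapped
# in angle brackets, optionally with a "URL:" prefix, e.g. <URL:ftp://foo.bar/>.
DELIMITED = re.compile(r"<(?:URL:)?([^<>\s]+)>")

# Fallback: an explicit scheme prefix also counts as formatting.
# This scheme list is an illustrative assumption, not exhaustive.
SCHEMED = re.compile(r"\b(?:https?|ftp|gopher)://[^\s<>\"]+")


def extract_formatted_urls(text: str) -> List[str]:
    """Return URLs that are explicitly marked up; ignore bare host names."""
    urls = DELIMITED.findall(text)
    # Blank out the delimited spans so they are not matched a second time.
    rest = DELIMITED.sub(" ", text)
    for match in SCHEMED.findall(rest):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        urls.append(match.rstrip(".,;:)"))
    return urls
```

Bare names like www.microsoft.com.It are simply ignored here. That is the trade-off: you only find what the author bothered to mark up, but what you find is unambiguous.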
-------------------------------------------
Ivan Sergio Borgonovo [EMAIL PROTECTED]
Webmaster Gorilla Bookstore http://www.gorilla.it
Tel. +39 2 3311105/34530455 Fax. +39 2 34531591
Via Mac Mahon 9, Milano, Italy
-------------------------------------------
____________________________________________________________________
--------------------------------------------------------------------
 Join The Web Consultants Association :  Register on our web site Now
Web Consultants Web Site : http://just4u.com/webconsultants
If you lose the instructions All subscription/unsubscribing can be done
directly from our website for all our lists.
---------------------------------------------------------------------