On 20 Nov 98, at 16:59, [EMAIL PROTECTED] wrote:
> What about the fact that browsers have been dumbed down to take www.blah.com as a
>"valid" URL ?
> And what about ftp://blah.blah.blah/ or gopher://blah.blah.blah/ ?
What I actually suggested is to look for a string of the form
valid char + dot + valid char, and then scan left and right from there
until you find a plausible terminator (space, dot, comma, invalid char)...
Don't ask me for the flow chart... I'm sure you can work out the details
(strings containing "..", etc.).
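
To make the scan a bit more concrete, here is a rough Python sketch of the idea. The character sets, the name find_candidates and the choice to drop anything containing ".." are my own assumptions for illustration, not a tested or complete rule set:

```python
import string

# Assumption: which characters count as "valid". These sets are illustrative only.
HOST_CHARS = set(string.ascii_letters + string.digits + "-")
URL_CHARS = HOST_CHARS | set("._~:/?#@%&=+")

def find_candidates(text):
    """Return candidate URLs found by expanding around 'char.char' patterns."""
    found = []
    n = len(text)
    i = 1
    while i < n - 1:
        # Anchor: a dot with a valid host character on both sides.
        if text[i] == "." and text[i - 1] in HOST_CHARS and text[i + 1] in HOST_CHARS:
            left = i
            while left > 0 and text[left - 1] in URL_CHARS:
                left -= 1
            right = i
            while right < n - 1 and text[right + 1] in URL_CHARS:
                right += 1
            # Trailing dots, commas and colons are treated as sentence punctuation.
            candidate = text[left:right + 1].rstrip(".,:")
            # Hypothetical rule: a candidate containing ".." is rejected.
            if ".." not in candidate:
                found.append(candidate)
            i = right + 1
        else:
            i += 1
    return found

print(find_candidates("See ftp://blah.blah.blah/ or www.blah.com, thanks."))
```

Note that this only proposes candidates; it cannot tell where a URL really ends when the surrounding text is sloppy.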
But I think that in a file with no formatting rules, or one edited by
a not very careful person, it would be impossible to isolate the URLs
reliably (e.g.: "You'll find this wonderful piece of software at
www.microsoft.com.It would be available..." Is the valid URL
www.microsoft.com or www.microsoft.com.It?)
You can refine the search by requiring the domain to end with a
"valid" suffix such as .com, .it, .net, etc., but as I showed above,
this is not the definitive answer either...
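
A small Python sketch of why the suffix rule does not settle it. The suffix list and the name plausible_hosts are made up for the example (a real list would be much longer), so take it as an illustration only:

```python
# Assumption: a tiny illustrative suffix list, not a complete one.
KNOWN_SUFFIXES = {"com", "it", "net", "org"}

def plausible_hosts(candidate):
    """Return every prefix of a dotted name that ends in a known suffix."""
    labels = candidate.split(".")
    readings = []
    # A host needs at least two labels, so start with the second one.
    for end in range(2, len(labels) + 1):
        if labels[end - 1].lower() in KNOWN_SUFFIXES:
            readings.append(".".join(labels[:end]))
    return readings

print(plausible_hosts("www.microsoft.com.It"))
```

For the example above it returns both www.microsoft.com and www.microsoft.com.It, so the suffix check narrows things down but still leaves the choice open.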
I think a more reasonable hypothesis is to assume there is some
sort of formatting (is this the right word?)... Without it you would
need a person or an AI program to pick out the URLs, and I think both
are unlikely.
-------------------------------------------
Ivan Sergio Borgonovo [EMAIL PROTECTED]
Webmaster Gorilla Bookstore http://www.gorilla.it
Tel. +39 2 3311105/34530455 Fax. +39 2 34531591
Via Mac Mahon 9, Milano, Italy
-------------------------------------------