I thought the root dir could be something definable, and might therefore be different from the domain name? For example, would the root dir for "www.geocities.com/Athens/111/delphi/docs/sockets.html" be "www.geocities.com/Athens/111/delphi"? But you say it would be "www.geocities.com", correct?
Thanks for the replies - this is helpful.

<-----Original Message----->
>From: Francois PIETTE [francois.pie...@skynet.be]
>Sent: 5/2/2009 2:21:07 AM
>To: email@example.com
>Subject: Re: [twsocket] webpage image source
>
>> Hi, I'm using httpcli to save a webpage html doc and I extract all of
>> its image locations to a text file by saving the '<IMG SRC=' tags.
>> Afterward I want to download all of the images, but how can I determine
>> the TRUE location of the images? For example, say the image tag is:
>> '<IMG SRC='test.com/photo.jpg'' - for all I know, "test.com" could just
>> be a directory on the server or it could be the website. Another
>> example, say the image tag is: '<IMG SRC='/photo.jpg'' - so the image is
>> in the root directory of the website, but who knows what the root
>> directory is? It may simply be 'test.com', or if the html doc is located
>> in a subdirectory, it may be something like 'test.com/users/me'.
>>
>> So, what is the appropriate way to determine the actual true location of
>> these images from the 'IMG' tags?
>
>If the image URL starts with "/" then it is an absolute URL. Just prepend
>the website URL and you have the image URL.
>If the image URL doesn't start with "/", then it is a relative URL. You
>must prepend the URL of the page where you found the image, excluding
>the document itself.
>
>Example: Assuming you are getting a page from
>"http://www.mysite.com/docs/page.html".
>If you find an image source URL as "/photo.jpg" then the complete URL is
>"http://www.mysite.com/photo.jpg"
>If you find an image with URL "test.com/photo.jpg" then the complete URL is
>"http://www.mysite.com/docs/test.com/photo.jpg"
>
>
>> but who knows what the root directory is?
>
>The root directory is always easy to find. It is the URL starting from
>"http:" up to the first "/" after the host name. In my above example, the
>root is simply "http://www.mysite.com".
>
>--
>francois.pie...@overbyte.be
>The author of the freeware multi-tier middleware MidWare
>The author of the freeware Internet Component Suite (ICS)
>http://www.overbyte.be
>
>--
>To unsubscribe or change your settings for TWSocket mailing list
>please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
>Visit our website at http://www.overbyte.be
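
For reference, the two rules in the quoted reply can be sketched in a few lines. This is Python rather than Delphi, purely to illustrate the logic; site_root and resolve_image_url are hypothetical helper names and are not part of ICS or httpcli. The sketch assumes the image source is either a path starting with "/" or a plain relative path. It does not handle full URLs, "../" segments, or a page that sets its own base URL. Python's standard urljoin covers the more general cases and is shown alongside for comparison.

```python
from urllib.parse import urljoin, urlsplit


def site_root(page_url):
    """Return scheme plus host, e.g. 'http://www.mysite.com'."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"


def resolve_image_url(page_url, src):
    """Apply the two rules from the quoted reply (hypothetical helper)."""
    root = site_root(page_url)
    if src.startswith("/"):
        # Leading "/": the path is taken from the site root.
        return root + src
    # Otherwise: relative to the page's directory, i.e. the page
    # path with the document name removed.
    path = urlsplit(page_url).path
    page_dir = path.rsplit("/", 1)[0] if "/" in path else ""
    return root + page_dir + "/" + src


page = "http://www.mysite.com/docs/page.html"
print(resolve_image_url(page, "/photo.jpg"))
print(resolve_image_url(page, "test.com/photo.jpg"))

# The standard library resolver gives the same answers for these inputs.
print(urljoin(page, "/photo.jpg"))
print(urljoin(page, "test.com/photo.jpg"))
```

Under these rules the root for "http://www.geocities.com/Athens/111/delphi/docs/sockets.html" comes out as "http://www.geocities.com", so "/photo.jpg" on that page would resolve to "http://www.geocities.com/photo.jpg" no matter how deep the page sits.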