Re: [twsocket] webpage image source

Xxxx Xxxx Sat, 02 May 2009 12:19:26 -0700

I thought the root dir could be something definable, and therefore
might differ from the domain name?
For example, would the root dir for
"www.geocities.com/Athens/111/delphi/docs/sockets.html" be
"www.geocities.com/Athens/111/delphi"? But you say it would be
"www.geocities.com", correct?


Thanks for the replies - this is helpful.


<-----Original Message-----> 
>From: Francois PIETTE [francois.pie...@skynet.be]
>Sent: 5/2/2009 2:21:07 AM
>To: twsocket@elists.org
>Subject: Re: [twsocket] webpage image source
>
>> Hi, I'm using httpcli to save a webpage html doc and I extract all of
>> its image locations to a text file by saving the '<IMG SRC=' tags.
>> Afterward I want to download all of the images, but how can I determine
>> the TRUE location of the images? For example, say the image tag is
>> <IMG SRC='test.com/photo.jpg'>: for all I know, "test.com" could just
>> be a directory on the server or it could be the website. Another
>> example, say the image tag is <IMG SRC='/photo.jpg'>: the image is
>> in the root directory of the website, but who knows what the root
>> directory is? It may simply be 'test.com', or if the html doc is located
>> in a subdirectory, it may be something like 'test.com/users/me'.
>>
>> So, what is the appropriate way to determine the actual true location of
>> these images from the 'IMG' tags?
>
>If the image URL starts with "/" then it is an absolute URL. Just prepend
>the website URL and you have the image URL.
>If the image URL doesn't start with "/", then it is a relative URL. You
>must prepend the URL of the page where you found the image, excluding
>the document itself.
>
>Example: Assuming you are getting a page from
>"http://www.mysite.com/docs/page.html".
>If you find an image source URL as "/photo.jpg" then the complete URL is
>"http://www.mysite.com/photo.jpg".
>If you find an image with URL "test.com/photo.jpg" then the complete URL is
>"http://www.mysite.com/docs/test.com/photo.jpg".
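The two rules above can be sketched in a few lines. This is a minimal illustration in Python rather than Delphi/ICS, and it relies on the standard library's urljoin instead of the string prepending described above. The function name resolve_image_url is made up for this example; the page URL and image sources are the ones from the example in this message.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url: str, src: str) -> str:
    """Resolve an IMG SRC value against the URL of the page it was found on.

    resolve_image_url is a hypothetical helper name; urljoin does the work.
    """
    # A src starting with "/" replaces the whole path of the page URL.
    # Any other relative src replaces only the last path segment (the document).
    return urljoin(page_url, src)


page = "http://www.mysite.com/docs/page.html"
print(resolve_image_url(page, "/photo.jpg"))
print(resolve_image_url(page, "test.com/photo.jpg"))
```

urljoin also returns the src unchanged when it is already a complete URL with its own scheme (for example one starting with "http://"), a case the two rules above do not cover.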
>
>
>> but who knows what the root directory is?
>
>The root directory is always easy to find. It is the URL starting from
>"http:" up to, but not including, the first "/" after the host name. In
>my above example, the root is simply
>"http://www.mysite.com".
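As a rough sketch of that rule, again in Python rather than Delphi/ICS: the root is just the scheme plus the host part of the page URL, whatever directories follow. The function name site_root is invented for this example, and the "http://" prefix on the geocities address is an assumption, since the address was quoted without a scheme.

```python
from urllib.parse import urlsplit


def site_root(page_url: str) -> str:
    """Return scheme plus host of a page URL (hypothetical helper name)."""
    parts = urlsplit(page_url)
    # netloc is everything between "//" and the first following "/".
    return f"{parts.scheme}://{parts.netloc}"


print(site_root("http://www.mysite.com/docs/page.html"))
# Assumed "http://" scheme; the address in the question had none.
print(site_root("http://www.geocities.com/Athens/111/delphi/docs/sockets.html"))
```

Under this rule the geocities page has the root "http://www.geocities.com", not ".../Athens/111/delphi": a src of "/photo.jpg" on that page would point at "http://www.geocities.com/photo.jpg".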
>
>--
>francois.pie...@overbyte.be
>The author of the freeware multi-tier middleware MidWare
>The author of the freeware Internet Component Suite (ICS)
>http://www.overbyte.be
>
>-- 
>To unsubscribe or change your settings for TWSocket mailing list
>please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
>Visit our website at http://www.overbyte.be
>.
> 


