Re: [twsocket] webpage image source

2009-05-04 Thread Xxxx Xxxx
I tried HttpCli1.RcvdStream := nil in OnLocationChange but the images
are still invalid.
Any other ideas?


-Original Message- 
From: Francois PIETTE [francois.pie...@skynet.be]
Sent: 5/3/2009 2:08:43 PM
To: twsocket@elists.org
Subject: Re: [twsocket] webpage image source

Probably the data stream contains data from the non-relocated URL as well as
the image from the relocated URL. Use OnLocationChange to clear the
data stream.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be


-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] webpage image source

2009-05-04 Thread Francois PIETTE
I tried HttpCli1.RcvdStream := nil in OnLocationChange but the images
 are still invalid.

This doesn't remove the content from RcvdStream; it just disables access to it
by the component. You should instead clear the content (how depends on the
type of stream you are using), or destroy and recreate the stream.
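
A minimal sketch of that clearing step in Delphi, assuming the application created RcvdStream as a TMemoryStream or TFileStream (both can be truncated through Size) and that the handler is assigned to the component's OnLocationChange event. The handler name is hypothetical, and the fragment belongs in a form unit that already uses the ICS HTTP client unit (OverbyteIcsHttpProt in recent ICS versions, which is an assumption about the installed release). It only illustrates the advice; the thread does not confirm that it cures the invalid images.

```delphi
// Hypothetical handler, assigned to HttpCli1.OnLocationChange.
procedure TForm1.HttpCli1LocationChange(Sender: TObject);
var
  HttpCli: THttpCli;
begin
  HttpCli := Sender as THttpCli;
  if Assigned(HttpCli.RcvdStream) then
  begin
    // Throw away any body the redirecting response already wrote ...
    HttpCli.RcvdStream.Size := 0;
    // ... so the relocated response is stored from the start of the stream.
    HttpCli.RcvdStream.Position := 0;
  end;
end;
```

Truncating keeps the same stream object attached to the component, so any other reference the application holds to that stream stays valid.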

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be




Re: [twsocket] webpage image source

2009-05-04 Thread Xxxx Xxxx
Ok, this is exactly what I did per your advice:

procedure TForm1.HttpCli1MainLocationChange(Sender: TObject);
var
  HttpCli: THttpCli;
begin
  HttpCli := Sender as THttpCli;
  HttpCli.RcvdStream.Destroy;
  HttpCli.RcvdStream := TMemoryStream.Create;
end;

Same problem still. Google.com still results in an invalid image
(logo.gif).


-Original Message- 
From: Francois PIETTE [francois.pie...@skynet.be]
Sent: 5/4/2009 1:42:50 PM
To: twsocket@elists.org
Subject: Re: [twsocket] webpage image source

I tried HttpCli1.RcvdStream := nil in OnLocationChange but the images
 are still invalid.

This doesn't remove the content from RcvdStream; it just disables access to it
by the component. You should instead clear the content (how depends on the
type of stream you are using), or destroy and recreate the stream.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be



Re: [twsocket] webpage image source

2009-05-03 Thread Xxxx Xxxx
One more question: when I try to retrieve some images using the appended
www., the image does not download, but when I omit the www., it
downloads correctly.
Try doing a GET request for notepad.com/images/firebug.gif and it
works. Try www.notepad.com/images/firebug.gif and it does not.
Either way will work in my web browser, but with HttpCli it only works
without www.

Is it possible to determine which format to use with HttpCli besides
doing a HEAD every time?

Thanks for the help again.


-Original Message- 
From: Francois PIETTE [francois.pie...@skynet.be]
Sent: 5/2/2009 2:38:16 PM
To: twsocket@elists.org
Subject: Re: [twsocket] webpage image source

 I thought that the root dir could be something definable, and therefore
 may be different than the domain name?
 Like the root dir for
 www.geocities.com/Athens/111/delphi/docs/sockets.html would be
 www.geocities.com/Athens/111/delphi? But you say it would be
 www.geocities.com, correct?

Correct: The root is www.geocities.com.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be



Re: [twsocket] webpage image source

2009-05-03 Thread Francois PIETTE
 One more question: when I try to retrieve some images using the appended
 www., the image does not download, but when I omit the www., it
 downloads correctly.
 Try doing a GET request for notepad.com/images/firebug.gif and it
 works. Try www.notepad.com/images/firebug.gif and it does not.
 Either way will work in my web browser, but with HttpCli it only works
 without www.

Actually it works if you allow HttpCli to follow relocation.
The real location of the image is without www, but the server does a
relocation if you prepend www. Your browser automatically follows
relocations.

Use the HttpTst sample program to experiment with that URL.
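
As a rough illustration of a download that follows relocation, here is a minimal Delphi sketch. It assumes a THttpCli component named HttpCli1 on the form; the method name DownloadImage and its parameters are hypothetical. It uses the synchronous Get call and writes the body to a memory stream owned by the caller.

```delphi
// Hypothetical helper method of the form; HttpCli1 is a THttpCli on that form.
procedure TForm1.DownloadImage(const AUrl, AFileName: string);
var
  Data: TMemoryStream;
begin
  Data := TMemoryStream.Create;
  try
    HttpCli1.URL              := AUrl;
    HttpCli1.FollowRelocation := True;  // follow redirects, as a browser does
    HttpCli1.RcvdStream       := Data;  // the response body is written here
    try
      HttpCli1.Get;                     // synchronous; may raise an exception on failure
    finally
      HttpCli1.RcvdStream := nil;       // detach before Data is freed
    end;
    if HttpCli1.StatusCode = 200 then
      Data.SaveToFile(AFileName);
  finally
    Data.Free;
  end;
end;
```

Detaching RcvdStream in the finally block keeps the component from holding a reference to a stream that has already been freed.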

 Is it possible to determine which format to use with HttpCli besides
 doing a HEAD every time?

I don't understand what you mean.
HttpCli doesn't care about the data format. It is just an HTTP transport layer,
whatever data is transported.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be



Re: [twsocket] webpage image source

2009-05-03 Thread Xxxx Xxxx
This is what I thought also, but I have FollowRelocation set to True and
it still downloads these images as invalid - like in the 'firebug.gif'
example. There is data there, but the image won't display.




Re: [twsocket] webpage image source

2009-05-03 Thread Xxxx Xxxx
Even the main logo of google.com vs. www.google.com gives me the same
problem:
Try doing a GET on google.com/intl/en_ALL/images/logo.gif - this does
not download correctly for me with 'FollowRelocation' set to 'True'.




Re: [twsocket] webpage image source

2009-05-03 Thread Francois PIETTE
Probably the data stream contains data from the non-relocated URL as well as
the image from the relocated URL. Use OnLocationChange to clear the
data stream.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be

- Original Message - 
From: Xxxx Xxxx delphi...@mail2computers.com
To: twsocket@elists.org
Sent: Sunday, May 03, 2009 8:37 PM
Subject: Re: [twsocket] webpage image source


 This is what I thought also, but I have FollowRelocation set to True and
 it still downloads these images as invalid - like in the 'firebug.gif'
 example. There is data there, but the image won't display.




Re: [twsocket] webpage image source

2009-05-02 Thread Francois PIETTE
 Hi, I'm using httpcli to save a webpage html doc and I extract all of
 its image locations to a text file by saving the 'IMG SRC=' tags.
 Afterward I want to download all of the images, but how can I determine
 the TRUE location of the images? For example, say the image tag is:
 'IMG SRC='test.com/photo.jpg'' - for all I know, test.com could just
 be a directory on the server or it could be the website. Another
 example, say the image tag is: 'IMG SRC='/photo.jpg'' - so the image is
 in the root directory of the website, but who knows what the root
 directory is? It may simply be 'test.com', or if the html doc is located
 in a subdirectory, it may be something like 'test.com/users/me'.

 So, what is the appropriate way to determine the actual true location of
 these images from the 'IMG' tags?

If the image URL starts with "/" then it is an absolute path. Just prepend
the website root URL and you have the image URL.
If the image URL doesn't start with "/", then it is a relative URL. You
must prepend the URL of the page where you found the image, excluding
the document name itself.

Example: Assuming you are getting a page from
"http://www.mysite.com/docs/page.html".
If you find an image source URL "/photo.jpg" then the complete URL is
"http://www.mysite.com/photo.jpg".
If you find an image with URL "test.com/photo.jpg" then the complete URL is
"http://www.mysite.com/docs/test.com/photo.jpg".
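
The two rules above can be sketched in Delphi with plain string handling. This is a simplified illustration, not a full RFC 1808 resolver: it assumes the page URL is complete (it includes the scheme) and does not handle "../" segments, query strings or a BASE tag. The function names SiteRoot, PageFolder and ResolveImageUrl are hypothetical.

```delphi
program ResolveDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Root of a page URL: everything before the first '/' that follows 'scheme://'.
function SiteRoot(const PageUrl: string): string;
var
  P, I: Integer;
begin
  Result := PageUrl;
  P := Pos('://', PageUrl);
  if P = 0 then
    Exit;
  for I := P + 3 to Length(PageUrl) do
    if PageUrl[I] = '/' then
    begin
      Result := Copy(PageUrl, 1, I - 1);
      Exit;
    end;
end;

// Page URL without the document name, keeping the trailing '/'.
function PageFolder(const PageUrl: string): string;
var
  I: Integer;
begin
  I := LastDelimiter('/', PageUrl);
  if I <= Pos('://', PageUrl) + 2 then
    Result := PageUrl + '/'        // no path part, e.g. http://www.mysite.com
  else
    Result := Copy(PageUrl, 1, I);
end;

function ResolveImageUrl(const PageUrl, Src: string): string;
begin
  if Pos('://', Src) > 0 then
    Result := Src                            // already a complete URL
  else if (Src <> '') and (Src[1] = '/') then
    Result := SiteRoot(PageUrl) + Src        // path from the site root
  else
    Result := PageFolder(PageUrl) + Src;     // relative to the page's folder
end;

begin
  // Prints http://www.mysite.com/photo.jpg
  Writeln(ResolveImageUrl('http://www.mysite.com/docs/page.html', '/photo.jpg'));
  // Prints http://www.mysite.com/docs/test.com/photo.jpg
  Writeln(ResolveImageUrl('http://www.mysite.com/docs/page.html', 'test.com/photo.jpg'));
end.
```

The two calls reproduce the example URLs given above.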


 but who knows what the root directory is?

The root directory is always easy to find. It is the URL starting from
"http:" up to, but not including, the first "/" after the host name. In my
example above, the root is simply "http://www.mysite.com".

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be



Re: [twsocket] webpage image source

2009-05-02 Thread Angus Robertson - Magenta Systems Ltd
 Hi, I'm using httpcli to save a webpage html doc and I extract all of
 its image locations to a text file by saving the 'IMG SRC=' tags.
 Afterward I want to download all of the images

There is already a free ICS component TMagHttp descended from THttpCli
that does all this, it parses a web page, generates a list of files on
that page, and optionally downloads them.  

http://www.magsys.co.uk/delphi/magxfer.asp

There is an EXE demo in the zip. 


 but how can I determine
 the TRUE location of the images? For example, say the image tag is:
 'IMG SRC='test.com/photo.jpg'' - for all I know, test.com could
 just be a directory on the server or it could be the website.

There is no HTTP in the URL, so it's a local file. 

Angus



Re: [twsocket] webpage image source

2009-05-02 Thread RTT

RFC1738 - Uniform Resource Locators (URL)
http://www.faqs.org/rfcs/rfc1738.html

RFC1808 - Relative Uniform Resource Locators
http://www.faqs.org/rfcs/rfc1808.html

 Hi, I'm using httpcli to save a webpage html doc and I extract all of
 its image locations to a text file by saving the 'IMG SRC=' tags.
 Afterward I want to download all of the images, but how can I determine
 the TRUE location of the images? For example, say the image tag is:
 'IMG SRC='test.com/photo.jpg'' - for all I know, test.com could just
 be a directory on the server or it could be the website. Another
 example, say the image tag is: 'IMG SRC='/photo.jpg'' - so the image is
 in the root directory of the website, but who knows what the root
 directory is? It may simply be 'test.com', or if the html doc is located
 in a subdirectory, it may be something like 'test.com/users/me'.

 So, what is the appropriate way to determine the actual true location of
 these images from the 'IMG' tags?

 Much thanks in advance. 
   



Re: [twsocket] webpage image source

2009-05-02 Thread Fastream Technologies
Hello,

If the string contains ://, it is a complete URL. Otherwise it is on the
current domain: if it starts with /, it is a path from the site root;
if not, it is a relative path.

Regards,

SZ

On 5/2/09, Xxxx Xxxx delphi...@mail2computers.com wrote:

 Hi, I'm using httpcli to save a webpage html doc and I extract all of
 its image locations to a text file by saving the 'IMG SRC=' tags.
 Afterward I want to download all of the images, but how can I determine
 the TRUE location of the images? For example, say the image tag is:
 'IMG SRC='test.com/photo.jpg'' - for all I know, test.com could just
 be a directory on the server or it could be the website. Another
 example, say the image tag is: 'IMG SRC='/photo.jpg'' - so the image is
 in the root directory of the website, but who knows what the root
 directory is? It may simply be 'test.com', or if the html doc is located
 in a subdirectory, it may be something like 'test.com/users/me'.

 So, what is the appropriate way to determine the actual true location of
 these images from the 'IMG' tags?

 Much thanks in advance.




Re: [twsocket] webpage image source

2009-05-02 Thread Xxxx Xxxx
I thought that the root dir could be something definable, and therefore
may be different than the domain name?
Like the root dir for
www.geocities.com/Athens/111/delphi/docs/sockets.html would be
www.geocities.com/Athens/111/delphi? But you say it would be
www.geocities.com, correct?

Thanks for the replies - this is helpful.


-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] webpage image source

2009-05-02 Thread RTT

Just read sections 3 and 4
http://www.faqs.org/rfcs/rfc1808.html

Section 3 explains how to establish the base URL of the page you are
parsing. The base URL is needed later to resolve relative URLs.
The method for resolving relative URLs is explained in section 4.
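
A minimal sketch of those two steps, assuming Python's standard html.parser
and urljoin stand in for whatever parser you use in Delphi. The class and
function names (ImageCollector, image_urls) are invented for this example.
The base URL defaults to the URL the page was retrieved from and is replaced
when the page carries a BASE element; every IMG SRC is then resolved against
that base.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect IMG SRC values and the base URL of one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url  # default base: the URL the page came from
        self.base_seen = False
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "base" and not self.base_seen and attrs.get("href"):
            # A BASE element embedded in the page overrides the default base.
            self.base_url = urljoin(self.base_url, attrs["href"])
            self.base_seen = True
        elif tag == "img" and attrs.get("src"):
            self.sources.append(attrs["src"])


def image_urls(page_url, html):
    """Return the complete URL of every image referenced by the page."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    collector.close()
    return [urljoin(collector.base_url, src) for src in collector.sources]
```

So a page can move the base for relative paths with a BASE element, but a SRC
starting with / still resolves against the host of that base.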
 I thought that the root dir could be something definable, and therefore
 may be different than the domain name?
 Like the root dir for
 www.geocities.com/Athens/111/delphi/docs/sockets.html would be
 www.geocities.com/Athens/111/delphi? But you say it would be
 www.geocities.com, correct?

 Thanks for the replies - this is helpful.


