Re: [twsocket] HTTPcli: source path question

2010-09-09 Thread Anton S.
Hello SZ!

> You must parse the HTML for this. We use a Delphi HTML parser which I
> downloaded from sourceforge for this but sometimes it raises an exception.
> Search for that and if you cannot find it I will do my best to search it for
> you in our projects...

Actually, I'm trying to extend Angus' Magenta Copy, so HTML parsing is already
implemented.

Zvone,
right! I'll take a closer look at the headers.

Thanks to all for the answers!

-- 
Anton
--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] HTTPcli: source path question

2010-09-08 Thread Anton S.
Francois wrote:
> In the HTTP world, there is no real directory concept. There are only
> documents. It happens that some webservers, if configured to do so, display
> a directory listing if the default document is missing. That directory
> listing is an HTML page built automatically by the webserver.

Yes, I've realized that already.

> This is not always the case.
> I would not rely on that behaviour.

Zvone wrote:
> So you cannot really know how folders are structured on the server
> just by looking at the URL.

Sad :( That's what I was afraid of...

Well, then I have a question: maybe you have some ideas on how to organize a
recursive download. For example, if the user started downloading
www.example.com/path/index.html, we should also accept
www.example.com/path/logo.jpg and so on, but not www.example.com/index.php. If
the user started at www.example.com/path/foo, we should accept
www.example.com/path/foo/index.php but NOT www.example.com/path/bar.jpg.
Applications like Wget do support this behavior, but the question is how they
do it.
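
The acceptance rule in the examples above could be sketched roughly as
follows. This is only an illustration in Python, not how Wget actually does
it. The names scope_prefix and in_scope are made up for the sketch, and the
"last segment contains a dot means it is a document" test is an assumed
heuristic: as discussed in this thread, the URL alone cannot really tell a
file from a directory. URLs must include the scheme (http://) for urlsplit
to find the host.

```python
import posixpath
from urllib.parse import urlsplit


def scope_prefix(start_url):
    """Return (host, path prefix) bounding the recursive download."""
    parts = urlsplit(start_url)
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if "." in last:
        # ASSUMPTION: a last segment with a dot is a document,
        # so the scope is its parent "directory".
        path = path[: len(path) - len(last)]
    elif not path.endswith("/"):
        # ASSUMPTION: otherwise the last segment is a "directory".
        path += "/"
    return parts.netloc.lower(), path


def in_scope(start_url, candidate_url):
    """True if candidate_url is on the same host and below the prefix."""
    host, prefix = scope_prefix(start_url)
    cand = urlsplit(candidate_url)
    # Collapse "." and ".." so "/path/foo/../bar.jpg" cannot escape.
    cand_path = posixpath.normpath(cand.path or "/")
    if cand.netloc.lower() != host:
        return False
    return cand_path == prefix.rstrip("/") or cand_path.startswith(prefix)
```

With the start URL http://www.example.com/path/foo this accepts
/path/foo/index.php and rejects /path/bar.jpg; with the start URL
http://www.example.com/path/index.html it accepts /path/logo.jpg and rejects
/index.php.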

--
Anton


Re: [twsocket] HTTPcli: source path question

2010-09-08 Thread Fastream Technologies
Hello Anton,

You must parse the HTML for this. We use a Delphi HTML parser that I
downloaded from SourceForge for this, though it sometimes raises an
exception. Search for it, and if you cannot find it I will do my best to dig
it out of our projects for you...

Regards,

SZ

On Wed, Sep 8, 2010 at 10:15 AM, Anton S. an...@rambler.ru wrote:

> Francois wrote:
> > In the HTTP world, there is no real directory concept. There are only
> > documents. It happens that some webservers, if configured to do so,
> > display a directory listing if the default document is missing. That
> > directory listing is an HTML page built automatically by the webserver.
>
> Yes, I've realized that already.
>
> > This is not always the case.
> > I would not rely on that behaviour.
>
> Zvone wrote:
> > So you cannot really know how folders are structured on the server
> > just by looking at the URL.
>
> Sad :( That's what I was afraid of...
>
> Well, then I have a question: maybe you have some ideas on how to organize
> a recursive download. For example, if the user started downloading
> www.example.com/path/index.html, we should also accept
> www.example.com/path/logo.jpg and so on, but not www.example.com/index.php.
> If the user started at www.example.com/path/foo, we should accept
> www.example.com/path/foo/index.php but NOT www.example.com/path/bar.jpg.
> Applications like Wget do support this behavior, but the question is how
> they do it.



Re: [twsocket] HTTPcli: source path question

2010-09-08 Thread Zvone
> Well, then I have a question: maybe you have some ideas on how to organize
> a recursive download. For example, if the user started downloading
> www.example.com/path/index.html, we should also accept
> www.example.com/path/logo.jpg and so on, but not www.example.com/index.php.
> If the user started at www.example.com/path/foo, we should accept
> www.example.com/path/foo/index.php but NOT www.example.com/path/bar.jpg.
> Applications like Wget do support this behavior, but the question is how
> they do it.

An HTTP reply consists of a header and a document. In the header you can
find useful info about the type of the document being served.
Wget uses this info to determine the filename and to hint at the directory
structure. It parses HTML, but not in a way that recreates the server's
folder structure. Rather, it creates a browsable structure that you can open
in your web browser.
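
As a rough illustration of reading that header info (a Python sketch, not
Wget's code and not an ICS API): the function name classify_response and the
plain dict of headers are assumptions for the example. It uses the Content-Type
header to decide whether a document is HTML worth scanning for links, and the
Content-Disposition header, when present, for a suggested filename.

```python
from email.message import Message

# Content types treated as HTML here; the choice is an assumption.
HTML_TYPES = ("text/html", "application/xhtml+xml")


def classify_response(headers):
    """Return (content_type, suggested_filename, is_html) for a reply.

    `headers` is a plain dict of response header names to values.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # Media type only, lowercased, without parameters such as charset.
    content_type = msg.get_content_type()
    # Filename from Content-Disposition, or None if the server sent none.
    filename = msg.get_filename()
    return content_type, filename, content_type in HTML_TYPES
```

When the server sends no Content-Disposition, the filename comes back as None
and the program has to derive a name from the URL instead.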

Basically, for each document you receive you have to scan for
<a href="link"> links (and possibly also CSS-based links) and organize them
into a folder structure internally in your program. You also need to
look at the <base> link in the HTML head, if it exists.
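
A minimal sketch of that scan, in Python with the standard html.parser
module (the class and function names are made up for the example). It
collects href and src attributes, honours a <base href> if one is present,
and returns absolute URLs. CSS-based links such as url(...) are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href/src values and the document's <base href>, if any."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url   # replaced if a <base href> is found
        self.raw = []          # links exactly as written in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base = urljoin(self.base, attrs["href"])
        elif tag in ("a", "link") and attrs.get("href"):
            self.raw.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.raw.append(attrs["src"])

    def links(self):
        # Resolve after parsing so the base URL applies to every link.
        return [urljoin(self.base, href) for href in self.raw]


def extract_links(html, page_url):
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.links()
```

Each returned URL can then be checked against whatever rule decides which
documents belong to the download before it is queued.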

To create a browsable structure, the <a href> links in downloaded documents
sometimes need to be modified as well, to point to a different location.
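
For example, the rewritten link can be computed as a relative path between
the two saved files. The sketch below is Python and rests on assumptions:
files are saved under host/path, and a URL ending in a slash is saved as
index.html. Both function names are invented for the example.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a relative file path of the form host/path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"   # ASSUMPTION: default name for "folders"
    return parts.netloc + path


def relative_link(from_url, to_url):
    """Link to put in the saved copy of from_url so it points at to_url."""
    start = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start)
```

So a link from http://www.example.com/path/foo/index.php to
http://www.example.com/path/logo.jpg would be rewritten to ../logo.jpg.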


Re: [twsocket] HTTPcli: source path question

2010-09-07 Thread Francois PIETTE

> Currently I'm starting some research on HTTP downloads with ICS THttpCli.
> I want to add recursive download functionality but ran into the
> impossibility of distinguishing a file from a directory.

In the HTTP world, there is no real directory concept. There are only
documents. It happens that some webservers, if configured to do so, display
a directory listing if the default document is missing. That directory
listing is an HTML page built automatically by the webserver.


> Then I noticed that requests to a folder without a trailing slash (GET
> /somepath/foo/bar) are redirected to the location with the slash
> (/somepath/foo/bar/), so it's easy to tell it's a directory.

This is not always the case. Here again, it happens that either the web
designer or the webserver by itself redirects the client to the location
ending with a slash when one is missing.


> I'm not very familiar with the HTTP specs and don't know whether this is
> mandatory behavior or whether I can rely on it.

I would not rely on that behaviour.
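
If the redirect is used at all, it is probably best treated as a hint and
never as proof. A small Python sketch of such a check follows; the function
name is hypothetical, and `location` stands for the value of the Location
header in the redirect reply.

```python
from urllib.parse import urljoin, urlsplit


def is_slash_redirect(request_url, location):
    """True if the redirect only appends a trailing slash to the path.

    A hint that the server treats the URL as a folder; servers are
    free not to redirect this way, so absence proves nothing.
    """
    source = urlsplit(request_url)
    # Location may be relative, so resolve it against the request URL.
    target = urlsplit(urljoin(request_url, location))
    return (
        target.netloc.lower() == source.netloc.lower()
        and not source.path.endswith("/")
        and target.path == source.path + "/"
    )
```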


--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be



Re: [twsocket] HTTPcli: source path question

2010-09-07 Thread Zvone
> Then I noticed that requests to a folder without a trailing slash (GET
> /somepath/foo/bar) are redirected to the location with the slash
> (/somepath/foo/bar/), so it's easy to tell it's a directory.

This depends on how the server is configured to treat the trailing slash. In
most cases it will treat it as access to a folder and look for default files
there (index.htm, index.html, index.php, default.asp, default.aspx,
etc.). But this can easily be changed by simply editing the .htaccess
files on Apache, for example, so even if the web server is configured one
way, navigating to a certain folder with different .htaccess
directives can change this behaviour completely.

You will see that WordPress, for example, has an option for how it
displays the URL path, as folders or as an HTML file, but in reality this is
just a choice of format which will be parsed later by index.php or
whatever. It is just a modification of .htaccess.

So you cannot really know how folders are structured on the server
just by looking at the URL. Furthermore, a lot of servers are
configured for virtual hosting, meaning a single host serves hundreds or
even thousands of sites that share the same IP address (each just has
its own user account directory configured on the server).