> Well, then I have a question: maybe you have some ideas of how to organize
> recursive download: for example, if user started to download
> www.example.com/path/index.html, we should also accept
> www.example.com/path/logo.jpg and so on, but not www.example.com/index.php.
> If user started www.example.com/path/foo, we should accept
> www.example.com/path/foo/index.php but NOT www.example.com/path/bar.jpg.
> Applications like Wget do support this behavior but the question is how they
> do it.
An HTTP reply consists of headers and a document body. The headers carry useful information about the type of the document being served, and Wget uses this information to determine the filename and to hint at the directory structure. It parses HTML, but not in order to build a folder structure; rather, it produces a browsable structure that you can open in your web browser.

Basically, for each document you receive you have to scan for <a href="link"> links (and possibly also CSS-based links) and organize them into a folder structure internally in your program. You also need to honor the <base> element in the HTML head, if one exists. To make the result browsable, the <a href> links in the downloaded documents sometimes have to be rewritten as well, so that they point to a different location.
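
To make the steps above a bit more concrete, here is a rough sketch in Python (not ICS code, and not a description of how Wget is actually implemented). It collects links from a page, resolves them against <base> or the page URL, and then checks whether each one lies under the directory the download started from. The names LinkCollector, extract_links, scope_prefix and in_scope are made up for this example. The rule "a last path segment containing a dot is a file, otherwise it is a directory" is only an assumption chosen to match the two cases in the question; a real client would probably also want to look at the Content-Type header.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import posixpath


class LinkCollector(HTMLParser):
    """Collect href/src targets and remember the first <base href>, if any."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href") and self.base is None:
            self.base = attrs["href"]
        elif tag in ("a", "link") and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.links.append(attrs["src"])


def extract_links(html, page_url):
    """Return absolute URLs found in html, resolved against <base> or page_url."""
    parser = LinkCollector()
    parser.feed(html)
    base = urljoin(page_url, parser.base) if parser.base else page_url
    return [urljoin(base, link) for link in parser.links]


def scope_prefix(start_url):
    """Guess the directory that bounds the download (heuristic, an assumption)."""
    path = urlsplit(start_url).path or "/"
    last = posixpath.basename(path)
    if "." in last:
        # Looks like a file name: the scope is the directory containing it.
        path = posixpath.dirname(path)
    return path.rstrip("/") + "/"


def in_scope(start_url, candidate_url):
    """True if candidate is on the same host and under the start directory."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    if start.netloc.lower() != cand.netloc.lower():
        return False
    # normpath collapses "../" segments so they cannot escape the scope.
    cand_path = posixpath.normpath(cand.path or "/")
    return (cand_path + "/").startswith(scope_prefix(start_url))


if __name__ == "__main__":
    start = "http://www.example.com/path/index.html"
    page = '<a href="logo.jpg">logo</a> <a href="../index.php">home</a>'
    for url in extract_links(page, start):
        print(url, in_scope(start, url))
```

With the start URL above, logo.jpg resolves to a URL under /path/ and is accepted, while ../index.php resolves to /index.php and is rejected. Starting from http://www.example.com/path/foo instead, the same check accepts /path/foo/index.php and rejects /path/bar.jpg. Rewriting the links for local browsing would be a separate pass over the saved files.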