Static Mirror of DB-Driven Site
Hi all,

Assume I have a site that I want to create a static mirror of. Normally this site is database-driven, but I figure that if I spider the entire site and map all the GET URLs to static URLs, I can have a full mirror.

Has anyone known of this being done successfully?

How would I get Apache to see the page names as full names (for example, a page named exec.pl?name=blah&foo=bar actually being a file rather than a command)? Or would I rewrite all of the above to be URL-encoded? If so, how do I do this from within wget?

-Dan Mahoney

--
Zaren
Christ almighty... my EYES! They're melting!
-Zaren, Efnet #macintosh, in response to:
www.geocities.com/CollegePark/Classroom/1944
The WEBSITE DESIGN class that gave my fiancee a D.

Dan Mahoney
Techie, Sysadmin, WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144 AIM: LarpGM
Site: http://www.gushi.org
---
Re: Static Mirror of DB-Driven Site
Dan Mahoney, System Admin wrote:
> Assume I have a site that I want to create a static mirror of. Normally
> this site is database driven, but I figure if I spider the entire site,
> and map all the GET URLS to static urls I can have a full mirror.
>
> Has anyone known of this being successfully done? How would I get apache
> to see the page names as full names (for example a page named
> exec.pl?name=blah&foo=bar actually being a file rather than a command?)

Wget should already do what you want (provided that the file system where you will be mirroring the results can handle things like ?, =, and & in a file name). Wget does not care how Apache processes a URL; it only cares that when it does a GET of a URL, some object is returned.

The issue for you will be making sure that all the things you want to mirror are referenced as links on the site. How does a person visiting your site know that blah is a valid value for name or that bar is a valid value for foo? If they learn this by clicking on a link, then everything should work as you want.

However, if the user must supply the values for name and foo (perhaps by entering them in a form), then there is no way for wget to know those values. If that is the case, you will have to construct your own list of URLs with all the combinations of name and foo that you want to mirror.

HTH.

Tony
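Tony's last suggestion, building your own list of URLs for every combination of name and foo, could be sketched roughly as follows. This is only an illustration: the host www.example.com, the value lists, and the helper name build_urls are all assumptions, not anything taken from the real site.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical base URL and parameter values; substitute the real ones.
BASE_URL = "http://www.example.com/exec.pl"
NAMES = ["blah", "other"]
FOOS = ["bar", "baz"]


def build_urls(base_url, names, foos):
    """Return one URL per (name, foo) combination."""
    return [
        # urlencode() joins the pairs with '&' and escapes unsafe characters.
        f"{base_url}?{urlencode({'name': name, 'foo': foo})}"
        for name, foo in product(names, foos)
    ]


if __name__ == "__main__":
    # One URL per line, suitable for saving to a file.
    for url in build_urls(BASE_URL, NAMES, FOOS):
        print(url)
```

If the output is saved to a file, that file can be handed to wget with its -i (--input-file) option, so the form-only pages are fetched along with whatever the spider finds by following links.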
wget future (was Re: Not 100% rfc 1738 compliance for FTP URLs =bug)
On Thu, 13 Mar 2003, Max Bowsher wrote:
> David Balazic wrote:
> > So it is do it yourself, huh? :-)
>
> More to the point, *no one* is available who has cvs write access.

What if, for the time being, the task of keeping track of submissions for wget was done with its Debian package?

http://bugs.debian.org/wget
http://packages.qa.debian.org/wget

That way, at least some of the work of incorporating, releasing, and testing these code submissions can be accomplished, making things perhaps slightly easier when the wget authors get back.

/end lame idea
Re: Static Mirror of DB-Driven Site
On Mon, 17 Mar 2003, Tony Lewis wrote:
> Dan Mahoney, System Admin wrote:
> > Assume I have a site that I want to create a static mirror of. Normally
> > this site is database driven, but I figure if I spider the entire site,
> > and map all the GET URLS to static urls I can have a full mirror.
> >
> > Has anyone known of this being successfully done? How would I get apache
> > to see the page names as full names (for example a page named
> > exec.pl?name=blah&foo=bar actually being a file rather than a command?)
>
> Wget should already do what you want (provided that the file system where
> you will be mirroring the results can handle things like ?, =, and & in a
> file name).

Can most Unix systems? I thought the question mark was the only thing at issue. If that's the case, is there a way to save the file names URL-encoded? Actually, that seems to be a bad idea. But is there some way to universally translate links from one format to another?

> Wget does not care how Apache processes a URL; it only cares that when it
> does a GET of a URL that some object is returned.
>
> The issue for you will be making sure that all the things you want to
> mirror are referenced as links on the site. How does a person visiting
> your site know that blah is a valid value for name or that bar is a valid
> value for foo? If they learn this by clicking on a link, then everything
> should work as you want.
>
> However, if the user must supply the value for name and foo (perhaps by
> entering them in a form) then there is no way for wget to know those
> values. If that is the case, you will have to construct your own list of
> URLs with all the combinations of name and foo that you want to mirror.
>
> HTH.
>
> Tony

--
I'm sorry, that is [EMAIL PROTECTED], but they did not say 'Exsqueeze Me' A Long Time Ago in a Galaxy Far Far Away.
-Richard Bozzello, on Jar Jar Binks

Dan Mahoney
Techie, Sysadmin, WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144 AIM: LarpGM
Site: http://www.gushi.org
---
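On the question of translating links from one format to another: typical Unix file systems accept any byte in a file name except / and NUL, so ?, =, and & can all be stored. The harder part is the link, because a literal ? in a URL starts the query string; written as %3F it stays part of the path, and the web server should then look for a file whose name contains the question mark. A minimal sketch of that translation, assuming the mirror is flat (one directory) and using two hypothetical helper names, static_name and static_href:

```python
from urllib.parse import quote, urlsplit


def static_name(url):
    """Local file name for a dynamic URL: last path segment plus '?query'."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        name += "?" + parts.query
    return name


def static_href(url):
    """Link text that points at static_name(url) as a file, not a query."""
    # quote() percent-encodes '?' as %3F; '=' and '&' are left readable.
    return quote(static_name(url), safe="=&")


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    url = "http://www.example.com/cgi/exec.pl?name=blah&foo=bar"
    print(static_name(url))   # exec.pl?name=blah&foo=bar
    print(static_href(url))   # exec.pl%3Fname=blah&foo=bar
```

The file on disk keeps the readable name from static_name, and only the links inside the mirrored pages use the encoded form from static_href, so nothing has to be stored URL-encoded.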