Static Mirror of DB-Driven Site

2003-03-17 Thread Dan Mahoney, System Admin
Hi all,

Assume I have a site that I want to create a static mirror of.  Normally
this site is database driven, but I figure if I spider the entire site,
and map all the GET URLS to static urls I can have a full mirror.  Has
anyone known of this being successfully done?  How would I get apache to
see the page names as full names (for example a page named
exec.pl?name=blah&foo=bar actually being a file rather than a command?)

(or would I rewrite all the above to be url-encoded?  If so how do I do
this from within wget).

-Dan Mahoney

--

<Zaren> Christ almighty...  my EYES!  They're melting!

-Zaren, Efnet #macintosh, in response to:

www.geocities.com/CollegePark/Classroom/1944
The WEBSITE DESIGN class that gave my fiancee a D.

Dan Mahoney
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---




Re: Static Mirror of DB-Driven Site

2003-03-17 Thread Tony Lewis
Dan Mahoney, System Admin wrote:

> Assume I have a site that I want to create a static mirror of.  Normally
> this site is database driven, but I figure if I spider the entire site,
> and map all the GET URLS to static urls I can have a full mirror.  Has
> anyone known of this being successfully done?  How would I get apache to
> see the page names as full names (for example a page named
> exec.pl?name=blah&foo=bar actually being a file rather than a command?)

Wget should already do what you want (provided that the file system where
you will be mirroring the results can handle things like ?, =, and &
in a file name). Wget does not care how Apache processes a URL; it only
cares that when it does a GET of a URL that some object is returned.
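Whether a given file system accepts such names can be checked directly. The following is a minimal sketch, not part of wget; the function name `filesystem_accepts` and the sample file name are assumptions made for illustration. It tries to create a file with the awkward name in a temporary directory and reports whether that worked.

```python
import os
import tempfile


def filesystem_accepts(name):
    """Return True if a file called `name` can be created in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        try:
            # Creating the file is the actual test; the content is irrelevant.
            with open(path, "w") as fh:
                fh.write("test")
        except OSError:
            # The operating system or file system rejected the name.
            return False
        return os.path.exists(path)


if __name__ == "__main__":
    # Hypothetical name, modelled on the URL discussed in this thread.
    print(filesystem_accepts("exec.pl?name=blah&foo=bar"))
```

The check only covers the temporary directory's file system, so it should be run on the volume that will actually hold the mirror if that is a different one.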

The issue for you will be making sure that all the things you want to mirror
are referenced as links on the site. How does a person visiting your site
know that blah is a valid value for name or that bar is a valid value
for foo? If they learn this by clicking on a link, then everything should
work as you want.

However, if the user must supply the value for name and foo (perhaps by
entering them in a form) then there is no way for wget to know those values.
If that is the case, you will have to construct your own list of URLs with
all the combinations of name and foo that you want to mirror.
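Building such a list by hand gets tedious, so here is a rough sketch of generating it. The base URL and the value lists are made up for illustration; the real values would have to come from the database behind the site.

```python
from itertools import product
from urllib.parse import urlencode

# Assumptions for illustration only: a hypothetical host and invented values.
BASE = "http://www.example.com/exec.pl"
NAMES = ["blah", "quux"]
FOOS = ["bar", "baz"]


def build_urls(base, names, foos):
    """Return one URL for every combination of `name` and `foo`."""
    urls = []
    for name, foo in product(names, foos):
        # urlencode also escapes any characters that are unsafe in a query.
        query = urlencode({"name": name, "foo": foo})
        urls.append(f"{base}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_urls(BASE, NAMES, FOOS):
        print(url)
```

If the output is saved one URL per line, the file can be handed to wget with its `-i` (`--input-file`) option.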

HTH.

Tony



wget future (was Re: Not 100% rfc 1738 compliance for FTP URLs => bug)

2003-03-17 Thread Aaron S. Hawley
On Thu, 13 Mar 2003, Max Bowsher wrote:

> > David Balazic wrote:
> >
> > So it is do it yourself , huh ? :-)
>
> More to the point, *no one* is available who has cvs write access.

what if for the time being the task of keeping track of submissions for
wget was done with its debian package?

http://bugs.debian.org/wget
http://packages.qa.debian.org/wget

that way, at least some of the work of incorporating and releasing and
testing these code submissions can be accomplished, making things perhaps
slightly easier when the wget authors get back.

/end lame idea


Re: Static Mirror of DB-Driven Site

2003-03-17 Thread Dan Mahoney, System Admin
On Mon, 17 Mar 2003, Tony Lewis wrote:

> Dan Mahoney, System Admin wrote:
>
> > Assume I have a site that I want to create a static mirror of.  Normally
> > this site is database driven, but I figure if I spider the entire site,
> > and map all the GET URLS to static urls I can have a full mirror.  Has
> > anyone known of this being successfully done?  How would I get apache to
> > see the page names as full names (for example a page named
> > exec.pl?name=blah&foo=bar actually being a file rather than a command?)
>
> Wget should already do what you want (provided that the file system where
> you will be mirroring the results can handle things like ?, =, and &
> in a file name).

Can most unix systems?  I thought the question mark was the only thing at
issue.  If that's the case, is there a way to save the filenames
url-encoded?

Actually that seems to be a bad idea.  But is there some way to
universally translate links from one format to another?
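One way to think about the translation, as a sketch rather than a description of anything wget does: keep the file on disk under its literal name, and percent-encode that name in the links that point to it. The first unescaped "?" in a URL starts the query string, so a link to a file whose name contains "?" has to write it as %3F. The function names `local_name` and `static_href` below are hypothetical.

```python
from urllib.parse import quote, urlsplit


def local_name(url):
    """Return the file name a mirror could use: last path segment plus query."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        # Keep the literal "?" so the name still shows where it came from.
        name += "?" + parts.query
    return name


def static_href(url):
    """Return a relative link to that file, with special characters escaped."""
    # safe="" makes quote() escape everything except letters, digits and _.-~
    return quote(local_name(url), safe="")


if __name__ == "__main__":
    url = "http://www.example.com/exec.pl?name=blah&foo=bar"  # hypothetical
    print(local_name(url))   # exec.pl?name=blah&foo=bar
    print(static_href(url))  # exec.pl%3Fname%3Dblah%26foo%3Dbar
```

Escaping "=" and "&" as well is more than strictly needed, but it keeps the rule simple: everything outside the unreserved set gets encoded.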

> Wget does not care how Apache processes a URL; it only
> cares that when it does a GET of a URL that some object is returned.
>
> The issue for you will be making sure that all the things you want to mirror
> are referenced as links on the site. How does a person visiting your site
> know that blah is a valid value for name or that bar is a valid value
> for foo? If they learn this by clicking on a link, then everything should
> work as you want.
>
> However, if the user must supply the value for name and foo (perhaps by
> entering them in a form) then there is no way for wget to know those values.
> If that is the case, you will have to construct your own list of URLs with
> all the combinations of name and foo that you want to mirror.
>
> HTH.
>
> Tony


--

I'm sorry, that is [EMAIL PROTECTED], but they did not say 'Exsqueeze Me' A Long Time 
Ago in a Galaxy Far Far Away.

-Richard Bozzello, on Jar Jar Binks

---