[EMAIL PROTECTED] writes:

> Will wget build me such a copy of the entire site?  Full interlinked
> and spiderable?

Yes, with several "buts".

1. Your site should be written and interlinked in fairly discernible
   HTML.  No image rollovers linked only through JavaScript.  No CSS
   imports.

2. Banners are usually a problem, although probably not in your case.
   Since they are off-site, Wget converts them to full links
   (http://...), but Google shouldn't mind.

3. Wget cannot make the URLs on your site short and nice.  It will
   follow the redirects provided by mod_rewrite, but replacing the
   links in the HTML pages will be up to you.
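To make point 3 concrete, the mod_rewrite setup being referred to would
look roughly like the following.  This is only a sketch; the path and
query-string names are taken from the shop.cgi example quoted below, and
your actual rule will differ:

```apache
# Sketch of a rule mapping the short static URL back to the dynamic one.
# /add/cart1 is served by shop.cgi?action=add&templ=cart1 internally.
RewriteEngine On
RewriteRule ^add/(\w+)$ /shop.cgi?action=add&templ=$1 [L]
```

Wget will happily fetch pages through such rules; it just won't rewrite
the dynamic links inside your HTML into the short form for you.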

The command to make the copy would be something like
`wget --mirror --convert-links --html-extension URL'.  If your site
includes images from another host, you'll probably need to add
`--span-hosts -D DOMAIN-TO-SPAN'.  See the info documentation for more
details.
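Spelled out, the invocations would look something like this (the URL and
host names are placeholders, not your actual site):

```shell
# Mirror the site, fix up links for local browsing, and add .html
# extensions to pages served without one.
wget --mirror --convert-links --html-extension http://www.example.org/

# If images come from another host, also span to that host's domain:
wget --mirror --convert-links --html-extension \
     --span-hosts --domains=www.example.org,images.example.org \
     http://www.example.org/
```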

> I am thinking of using a tool to turn the dynamic URLs into short
> static URLs, e.g.
> mydomain/shop.cgi?action=add&templ=cart1  -> mydomain/add/cart1
> Such a "Dynamic2Static Rewriting" can be triggered by cron.
> The indexed static URLs will be rewritten by mod_rewrite.
>
> What's a good Linux tool for that string replacement?
> A string-replacement table with regular expressions is required:
> action=add&templ=cart1  -> mydomain/add/cart1
> action=add&templ=cart2  -> mydomain/add/cart2

Different people use different tools.  For simple in-place regexp
substitutions, the one-liner `perl -pi -e 's/FOO/BAR/g' FILES...' is
probably a good choice.
