Wget, schm-wget. Wget works great for getting a single file or a very
simple all-under-this-tree mirror, but on a whole site it can take forever.
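
(For the record, if you do stick with wget, the usual mirroring incantation
is something along these lines - example.com and the output directory are
just placeholders:

    wget --mirror --convert-links --page-requisites --no-parent \
         -P ./site-mirror http://example.com/

--mirror turns on recursion and timestamping, --convert-links rewrites the
links so the copy works offline, and --page-requisites grabs the images,
css and js each page needs.)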

Try httrack - http://www.httrack.com/. Ignore the pretty little screenshots;
the Linux command-line version does the same job, it just requires a bit
more command-line-fu. It handles simple javascript links, is intelligent
about fetching requisites (images, css etc) from off-domain without trying
to cache the whole internet, is multi-threaded - and is actually designed
specifically for the purpose of making a static, offline copy of a website.
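
A typical invocation looks roughly like this (the man page has a
near-identical example; example.com and the output path here are just
placeholders):

    httrack "http://example.com/" -O ./example-mirror "+*.example.com/*" -v

-O sets the output directory, the +*.example.com/* scan rule tells it which
URLs are fair game so it doesn't wander off-site, and -v keeps it chatty so
you can watch what it's fetching.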

The user's guide at http://www.httrack.com/html/fcguide.html walks through
most of the common scenarios for you, and $DISTRO should be able to apt-get
install it for you. Urrr.. or do the same with whatever broken tool distros
unfortunate enough not to have apt-get are stuck with.
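
If it isn't already on the box, installing it is usually a one-liner,
something like:

    sudo apt-get install httrack   # Debian/Ubuntu and friends
    sudo yum install httrack       # RPM-based distros, assuming the repos carry it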

On Tue, Jun 3, 2008 at 2:20 PM, Peter Rundle <[EMAIL PROTECTED]>
wrote:

> I'm looking for some recommendations for a *simple* Linux based tool to
> spider a web site and pull the content back into plain html files, images,
> js, css etc.
>
> I have a site written in PHP which needs to be hosted temporarily on a
> server which is incapable (read only does static content). This is not a
> problem from a temp presentation point of view as the default values for
> each page will suffice. So I'm just looking for a tool which will quickly
> pull the real site (on my home php capable server) into a directory that I
> can zip and send to the internet addressable server.
>
> I know there's a lot of code out there, I'm asking for recommendations.
>
> TIA's
>
> Pete
>
> --
> SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
> Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
>
>


-- 
There is nothing more worthy of contempt than a man who quotes himself -
Zhasper, 2004
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
