Re: OT help with wget

2022-02-17 Thread Chris Johnson
Alternatively, exporting via Google's Takeout service will include images etc.
See things like
https://www.peggyktc.com/2021/08/back-up-all-your-blogger-account-data.html
for details.


On Thu, Feb 17, 2022 at 4:08 PM MacFH - C E Macfarlane - News wrote:
> [...]



Re: OT help with wget

2022-02-17 Thread MacFH - C E Macfarlane - News
I've sometimes used the bash script below, which I originally wrote to
download a section of a site concerning some hardware I owned, when the
site looked as though it was about to close down. In fact it did, so in
due course I was glad that I'd had the foresight.


I usually find that it gets related images, etc., except where they
aren't stored in the same subtree as the root document.
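
In that case wget can be told to follow page requisites onto other hosts
or branches; a sketch, with placeholder domains (not something the script
below does by itself):

wget -r -np -p -H -D example.com,images.example.com https://example.com/docs/

Here -p fetches page requisites such as images, and -H with -D lets wget
span hosts, but only to the domains listed.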


(Note: beware unintended line wrap.)

wGetList.sh
===

#!/bin/bash
#   (bash rather than sh: the ${1:0:4} substring expansion below is a bashism)

#   The name of the directory containing this script
DIRY="${0%/*}/"
# echo "Directory: ${DIRY}"

#   The filename of this script
SCRIPT="${0##*/}"
# echo "Filename: ${SCRIPT}"

#   User Agent string  -  Firefox
UAGENT="--user-agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0'"

#   Options for wget command:
#   -F treat an input file (-i) as HTML; -r recurse; -c resume partial files;
#   -nc don't re-fetch existing files; -t 3 tries per URL; -T 60s timeout;
#   -w 20 --random-wait pause between retrievals; --waitretry=60 back off on retries
OPTIONS="-F -r -c -nc -t 3 -T 60 --retry-connrefused -w 20 --random-wait --waitretry=60 --no-check-certificate ${UAGENT}"

#   First argument is either a file of URLs to retrieve or a single URL;
#   any further arguments are passed through to wget
LIST=""
LOG=""
if [ "${1:0:4}" != "http" ]
then
    if [ -f "${1}" ]
    then
        LIST="-i ${1}"
        LOG="${1%.*}.log"
        shift
        OPTIONS="${OPTIONS} ${LIST} ${*}"
    else
        echo "WARNING - url list file '$1' not found!"
        exit 1
    fi
else
    OPTIONS="${OPTIONS} ${*}"
    LOG="${1##*//}"
    LOG="${LOG%%/*}.log"
fi

#   WGET the files in the list of URLs, in the background, logging to ${LOG}
echo "echo ${OPTIONS} | xargs wget > \"${LOG}\" 2>&1"
echo ${OPTIONS} | xargs wget > "${LOG}" 2>&1 &
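
For reference, a couple of hypothetical invocations (the URL and the file
name are placeholders). The first logs to www.example.com.log; the second
extracts links from the given file (the -F option treats it as HTML), logs
to urls.log, and passes -p -k straight through to wget:

./wGetList.sh https://www.example.com/docs/
./wGetList.sh urls.html -p -k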


On 17/02/2022 17:03, Dave Widgery wrote:

> [...]





OT help with wget

2022-02-17 Thread Dave Widgery
Hi
Sorry, I know this is very OT, but I thought there might be a few people
here who might be able to help, possibly by emailing me directly.
We have several blogs (using Google's Blogger) that my wife has created
over the years, and I want to create local copies of them on my PC. I used
the following command:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://.blogspot.com

It created a full structure of the blog on my PC, but it still relies on
links to external websites for the images. Can anybody suggest how to get
it to also download copies of all the images?
Again, sorry for the OT post, but I have been going round in circles for a while.
Dave
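
One way to pull those externally hosted images down is to let wget span
hosts, restricted to the hosts that serve them; a sketch, assuming the
images live on Blogger's usual image hosts (check the image URLs in the
saved pages, as the actual hosts may differ):

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --span-hosts --domains=blogspot.com,bp.blogspot.com,googleusercontent.com http://.blogspot.com

--span-hosts (-H) allows recursion onto other hosts, and --domains limits
it to the ones listed, so the mirror doesn't wander across the whole web.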

___
get_iplayer mailing list
get_iplayer@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/get_iplayer