Re: [Bug-wget] Help: Why wget wall clock time much higher than download time?

2019-06-20 Thread David Bodin
Tim,

Genuine thanks for your response--and especially for your contribution of
wget2. I ran into an issue setting it up (on my AWS AMI) and can't find any
resources online that address it.

I followed the build instructions you provided
<https://gitlab.com/gnuwget/wget2/blob/master/README.md>, but after building
and trying "*wget [url]*" I first ran into "*Failed to connect: Wget has
been built without TLS support*". I found the solution
<https://github.com/rockdaboot/wget2/issues/201>, fixed it with
"*sudo yum -y install gnutls-devel*", confirmed the fix by re-running
"./configure" and checking "SSL/TLS support: yes", then rebuilt and tried
"*wget [url]*" again, but now I run into:

TLS False Start requested but Wget built with insufficient GnuTLS version

WARNING: OCSP is not available in this version of GnuTLS.

ERROR: The certificate is not trusted.

ERROR: The certificate doesn't have a known issuer.

Failed to connect: Certificate error

But when I try to install/update "*gnutls*", I'm informed:

Package gnutls-2.12.23-21.18.amzn1.x86_64 already installed and latest
version

so I'm not sure how to proceed, since yum reports this as the newest
available package.
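
If it helps to know where I'm headed next: my current plan is to build a
newer GnuTLS from source into a local prefix and point wget2's configure at
it via pkg-config, roughly like the sketch below (the prefix path is just a
placeholder, and I understand GnuTLS itself needs nettle and libtasn1
available first). I'm not sure this is the right route on an Amazon Linux
AMI, so corrections are welcome:

# inside an unpacked recent GnuTLS source tree
./configure --prefix=$HOME/local/gnutls
make && make install

# inside the wget2 source tree, rebuild against the local GnuTLS
PKG_CONFIG_PATH=$HOME/local/gnutls/lib/pkgconfig ./configure
make && sudo make install

# at runtime the new library may also need to be on the loader path
export LD_LIBRARY_PATH=$HOME/local/gnutls/lib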

Thanks in advance for any help you can provide.

Sincerely,
Dave

P.S.
I'm going to use wget2, but wanted to briefly follow up on my original
question with wget to hopefully learn a little more.

1.) Thanks for the note on not using "*--random-wait*" in the future, but
the flag made no difference: the command finished in 35s with or without it.
Yet with 248 files, even a minimum wait of 0.5s per file (the best case)
should have added around 124s of waiting on its own.

2.) When I ran my wget command with "*--no-clobber*", it correctly
downloaded all files the first time, and on a second run it recognized that
everything was already downloaded and finished almost immediately. I then
tried to parallelize the downloads by running multiple instances at once
(wget --no-clobber [url] & wget --no-clobber [url]), but they didn't
download multiple files at the same time. I expected the first instance to
start downloading a file, the second instance to see that and skip ahead to
the next file still needed, and the two to work through all the files in
parallel. Do you know why that isn't what happened?
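
For what it's worth, the next thing I was planning to try is to split the
work explicitly: fetch the HTML first, pull the asset URLs out of it, and
fan them out to several wget processes with xargs. A rough sketch of what I
have in mind is below; the example.com URL, the saved-page filename, and the
grep extraction step are just placeholders for the real page:

# 1. fetch just the HTML page
wget --user-agent Mozilla --execute robots=off -O page.html https://example.com/page/

# 2. pull the asset URLs out of the saved page (placeholder extraction step)
grep -oE 'https?://[^"]+' page.html | sort -u > urls.txt

# 3. fan out: up to 8 wget processes, one URL each, skipping files already on disk
xargs -P 8 -n 1 wget --no-clobber --user-agent Mozilla < urls.txt

That still leaves the '*--convert-links*' rewriting for a final pass, so I'm
not sure it's worth it compared with just switching to wget2.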

Many thanks.


On Thu, Jun 20, 2019 at 12:55 AM Tim Rühsen  wrote:

> On 6/17/19 10:32 PM, David Bodin wrote:
> > *wget --page-requisites --span-hosts --convert-links --adjust-extension
> > --execute robots=off --user-agent Mozilla --random-wait
> > https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/*
> >
> > This command above provides the following stats:
> >
> > Total wall clock time: 35s
> >
> > Downloaded: 248 files, 39M in 4.2s (9.36 MB/s)
> >
> > This website takes about 5 seconds to download and display all files on a
> > hard refresh in the browser.
> >
> > Why is the wall clock time *significantly longer* than the download time
> > and is there a way to make it faster?
>
> First of all, --random-wait waits 0.5 to 1.5 seconds after each
> downloaded page. Don't use it - there have been times when web servers
> blocked fast clients, but that shouldn't be the case today.
>
> Wget uses just one connection for downloading, no compression by
> default, no http/2.
>
> You can try Wget2, which uses as many parallel connections as you like,
> compression by default, and HTTP/2 where possible. Depending on the HTTP
> server, Wget2 is often 10x faster than Wget just with its default
> settings.
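>
> For example, something along these lines should spread the page
> requisites over 16 parallel connections (adjust the thread count to taste
> and check wget2 --help for the full option list):
>
> wget2 --max-threads=16 --page-requisites --span-hosts
> https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/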
>
> You can find the latest Wget2 tarball at
> https://gnuwget.gitlab.io/wget2/wget2-latest.tar.gz.
>
> Instructions on how to build are at
> https://gitlab.com/gnuwget/wget2/blob/master/README.md
>
> Regards, Tim
>
>


[Bug-wget] Help: Why wget wall clock time much higher than download time?

2019-06-17 Thread David Bodin
*wget --page-requisites --span-hosts --convert-links --adjust-extension
--execute robots=off --user-agent Mozilla --random-wait
https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/*

This command above provides the following stats:

Total wall clock time: 35s

Downloaded: 248 files, 39M in 4.2s (9.36 MB/s)

This website takes about 5 seconds to download and display all files on a
hard refresh in the browser.

Why is the wall clock time *significantly longer* than the download time
and is there a way to make it faster?


Re: [Bug-wget] Help: 'wget --page-requisites' is slow

2019-06-15 Thread David Bodin
Sorry, I just wanted to clarify that the time difference I'm seeing is a
direct comparison between the wget command and a hard refresh of the
webpage, so no caching is assisting with the page request times.

On Sat, Jun 15, 2019 at 3:49 PM David Bodin  wrote:

> Hello wget community,
>
> *Goal*
> My goal is to download a single webpage to be fully functional offline in
> the same time it takes a browser to request and show the page.
>
> *Problem*
> The following command downloads a page and makes it fully functional
> offline, but it takes approximately 35 seconds, whereas the browser
> requests and shows the page in about 5 seconds. Can someone please help me
> understand why my *wget* command is taking *so much longer* and how I can
> make it faster? Or are there any locations or chat groups where I can seek
> help? Sincere thanks in advance for any help anyone can provide.
>
> *wget --page-requisites --span-hosts --convert-links --adjust-extension
> --execute robots=off --user-agent Mozilla --random-wait
> https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/*
>
> *More info & attempted solutions*
>
>    1. I removed '*--random-wait*' because I thought it might be adding
>    time for each file request, but this did nothing.
>    2. I thought the HTTPS protocol might slow things down with extra
>    round trips for each file, so I added '*--no-check-certificate*', but
>    this did nothing.
>    3. I read there could be an issue with IPv6 so I added '*--inet4-only*',
>    but this did nothing.
>    4. I read that DNS lookups could slow things down so I added
>    '*--no-dns-cache*', but this did nothing.
>    5. I thought perhaps *wget* was downloading the assets sequentially,
>    one at a time, so I tried running between 3 and 16 commands/processes
>    concurrently, removing '*--convert-links*' and adding '*--no-clobber*',
>    in the hope that multiple files would be downloaded at the same time;
>    once all files were downloaded I could run the command again, dropping
>    '*--no-clobber*' and '*--page-requisites*' and adding '*--convert-links*',
>    to make the page fully functional offline. But this did nothing. I
>    also thought multiple processes would speed things up by overlapping
>    the latency of the HTTPS handshakes, but I didn't observe that.
>6. I read an article about running the command as root user in case
>there were any limits on a given user, but this did nothing.
>
> Sincere thanks in advance, again,
> Dave
>


[Bug-wget] Help: 'wget --page-requisites' is slow

2019-06-15 Thread David Bodin
Hello wget community,

*Goal*
My goal is to download a single webpage to be fully functional offline in
the same time it takes a browser to request and show the page.

*Problem*
The following command downloads a page and makes it fully functional
offline, but it takes approximately 35 seconds, whereas the browser requests
and shows the page in about 5 seconds. Can someone please help me understand
why my *wget* command is taking *so much longer* and how I can make it
faster? Or are there any locations or chat groups where I can seek help?
Sincere thanks in advance for any help anyone can provide.

*wget --page-requisites --span-hosts --convert-links --adjust-extension
--execute robots=off --user-agent Mozilla --random-wait
https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/*

*More info & attempted solutions*

   1. I removed '*--random-wait*' because I thought it might be adding time
   for each file request, but this did nothing.
   2. I thought the HTTPS protocol might slow things down with extra round
   trips for each file, so I added '*--no-check-certificate*', but this did
   nothing.
   3. I read there could be an issue with IPv6 so I added '*--inet4-only*',
   but this did nothing.
   4. I read that DNS lookups could slow things down so I added
   '*--no-dns-cache*', but this did nothing.
   5. I thought perhaps *wget* was downloading the assets sequentially, one
   at a time, so I tried running between 3 and 16 commands/processes
   concurrently, removing '*--convert-links*' and adding '*--no-clobber*',
   in the hope that multiple files would be downloaded at the same time;
   once all files were downloaded I could run the command again, dropping
   '*--no-clobber*' and '*--page-requisites*' and adding '*--convert-links*',
   to make the page fully functional offline (a sketch of what I ran is
   below, after this list). But this did nothing. I also thought multiple
   processes would speed things up by overlapping the latency of the HTTPS
   handshakes, but I didn't observe that.
   6. I read an article about running the command as root user in case
   there were any limits on a given user, but this did nothing.
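
Concretely, the two-pass attempt in step 5 looked roughly like the sketch
below (the URL is shortened here just for readability; each instance used
the full page URL from above):

# pass 1: several instances in parallel, each skipping files already on disk
wget --page-requisites --span-hosts --adjust-extension --execute robots=off \
  --user-agent Mozilla --no-clobber https://www.invisionapp.com/inside-design/... &
wget --page-requisites --span-hosts --adjust-extension --execute robots=off \
  --user-agent Mozilla --no-clobber https://www.invisionapp.com/inside-design/... &
wait

# pass 2: once everything is on disk, rewrite the links for offline use
wget --span-hosts --convert-links --adjust-extension --execute robots=off \
  --user-agent Mozilla https://www.invisionapp.com/inside-design/...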

Sincere thanks in advance, again,
Dave