Re: [Bug-wget] Help: 'wget --page-requisites' is slow

2019-06-15 Thread David Bodin
Sorry, just wanted to clarify that the time difference I'm seeing is a
direct comparison between the wget command and a hard refresh of the
webpage, so there is no caching to assist with the page request times.



[Bug-wget] Help: 'wget --page-requisites' is slow

2019-06-15 Thread David Bodin
Hello wget community,

*Goal*
My goal is to download a single webpage so that it is fully functional
offline, in roughly the same time it takes a browser to request and render
the page.

*Problem*
The following command downloads a page and makes it fully functional
offline, but it takes approximately 35 seconds, whereas the browser
requests and renders the page in about 5 seconds. Can someone please help
me understand why my *wget* command is taking *so much longer* and how I
can make it faster? Or are there other places or chat groups where I can
seek help? Sincere thanks in advance for any help anyone can provide.

*wget --page-requisites --span-hosts --convert-links --adjust-extension
--execute robots=off --user-agent Mozilla --random-wait
https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/
*
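
If it helps, a rough way to show where the time goes (just a sketch;
'wget.log' is an arbitrary file name) is to time the run and keep the log,
since wget prints a start timestamp for every request it makes:

  time wget --page-requisites --span-hosts --convert-links --adjust-extension \
       --execute robots=off --user-agent Mozilla --output-file=wget.log \
       https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/
  # each request shows up in wget.log as a line starting with
  # "--<date> <time>--  <url>"; large gaps between consecutive timestamps
  # would suggest the time is going into per-request latency rather than
  # into the transfers themselves
  grep '^--' wget.log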

*More info & attempted solutions*

   1. I removed '*--random-wait*' because I thought it might be adding time
   for each file request, but this did nothing.
   2. I thought the HTTPS protocol might slow things down with extra round
   trips for each file, so I added '*--no-check-certificate*', but this did
   nothing.
   3. I read there could be an issue with IPv6, so I added '*--inet4-only*',
   but this did nothing.
   4. I read that DNS could slow things down, so I added '*--no-dns-cache*',
   but this did nothing.
   5. I thought perhaps *wget* was downloading the assets sequentially, one
   at a time, so I ran between 3 and 16 copies of the command concurrently,
   removing '*--convert-links*' and adding '*--no-clobber*' in the hope that
   multiple files would be downloaded at the same time; once everything was
   downloaded, I planned to run the command again without '*--no-clobber*'
   and '*--page-requisites*' but with '*--convert-links*' to make the page
   fully functional offline (see the sketch after this list). This did
   nothing. I had also expected the concurrent processes to hide some of
   the per-request HTTPS latency, but I didn't observe that.
   6. I read an article about running the command as the root user, in case
   there were any limits on a given user, but this did nothing.
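
For concreteness, the two-pass attempt in step 5 looked roughly like this
(a sketch, not the exact commands; the number of processes varied):

  # pass 1: several identical runs in parallel; --no-clobber makes each
  # process skip files that are already on disk
  for i in 1 2 3 4; do
    wget --page-requisites --span-hosts --adjust-extension \
         --execute robots=off --user-agent Mozilla --no-clobber \
         https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/ &
  done
  wait

  # pass 2: with everything on disk, repeat without --no-clobber and
  # --page-requisites, adding --convert-links so the saved page points
  # at the local copies
  wget --span-hosts --convert-links --adjust-extension \
       --execute robots=off --user-agent Mozilla \
       https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/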

Sincere thanks in advance, again,
Dave


Re: [Bug-wget] --progress=dot:giga

2019-06-15 Thread Greg Knittl

Hi Tim,

I would find an additional column showing the estimated absolute
(wall-clock) completion time useful, as I often try to calculate this in
my head.


I think I only care about the dots because they show that the download is
proceeding normally. Wget already outputs a message if it retries. If
there is a situation where the download is hanging but wget is not
restarting it, I would prefer a message rather than having to guess what
is happening from dots not appearing. With sufficient messages in place it
would be possible to skip the dots entirely, or to output a brief status
message instead of the dots.


Whether there are dots or not, let the user select a time or per cent 
complete interval for status updates. I typically pipe wget progress 
output to a file. Usually I don't look at the output at all. In most 
cases it would be sufficient if wget output an initial row after say 1% 
complete and then I could signal wget if I want it to output another 
progress line.


If wget must have dots, then instead of the mega and giga options, perhaps 
let the user set the dot size and calibrate the left-column units around 
that. If wget allowed percent-complete or time intervals, those would 
effectively determine the dot size, although likely not a round number.
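
For reference, the fixed dot styles current wget ships are, as far as I
recall from the manual (URL is just a placeholder):

  wget --progress=dot:default URL  # 1 KB/dot, 10-dot clusters, 50 dots (50 KB) per line
  wget --progress=dot:binary  URL  # 8 KB/dot, 16-dot clusters, 48 dots (384 KB) per line
  wget --progress=dot:mega    URL  # 64 KB/dot, 8-dot clusters, 48 dots (3 MB) per line
  wget --progress=dot:giga    URL  # 1 MB/dot, 8-dot clusters, 32 dots (32 MB) per line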


Personally I'm fine with 32 dots per line. I find blocks of 8 easier to 
read than blocks of 10 (not that I usually read them). But if anyone cares 
and has time to code it, perhaps add an option to specify the dot cluster 
size and the number of clusters per line.


thanks,
Greg

On 2019-06-12 2:58 p.m., Tim Rühsen wrote:

Hi Greg,

looking at it when downloading a large file, I have to agree.

A dynamic multiplier on the left would be nice (K->M->G).

I added this as a comment to https://gitlab.com/gnuwget/wget2/issues/342.

Thanks for the feedback.

Regards, Tim

On 12.06.19 06:19, Greg Knittl wrote:

Hi,

--progress=dot:giga on wget 1.17.1 outputs 32 dots per line where each
dot represents 1MB downloaded.

The cumulative total at the left of each line is in KB. I would find MB
easier to understand: it matches up with the dots and better matches the
scale of gigabyte files. I still often have download speeds measured in
KB/sec, but I don't recall ever comparing download speed to the
cumulative total on the left.
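
For illustration (made-up values), a dot:giga line currently looks roughly
like this, with the left column in KB:

  32768K ........ ........ ........ ........ 12% 8.32M 4m12s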

I would ask that you at least consider this for wget2. I personally don't
have any programs that parse this column of wget output, so I would be
fine if it changed in the current wget1, but I would understand if you
consider it an API that you don't want to break.

thanks,
Greg