Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-12 Thread Dale R. Worley
Paul Wagner  writes:
> That's what the OP thinks, too.  I attributed the slow startup to DNS 
> resolution.

Depending on your circumstances, one way to fix that is set up a local
caching-only DNS server.  Direct ordinary processes to use that.  Then
the first lookup is expensive, but the caching server saves the
resolution and answers later queries very quickly.
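One common way to do that is dnsmasq in caching-only mode; the fragment
below is a sketch (the upstream server address is an assumption, and
unbound or systemd-resolved would serve equally well):

```
# /etc/dnsmasq.conf -- caching-only resolver
listen-address=127.0.0.1   # answer queries from this host only
no-resolv                  # don't take upstreams from /etc/resolv.conf
server=9.9.9.9             # forward cache misses here (assumed upstream)
cache-size=1000            # number of names to keep cached

# Then point the system resolver at it, e.g. in /etc/resolv.conf:
#   nameserver 127.0.0.1
```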

Dale



Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-11 Thread Paul Wagner

Dear all,

On 12.09.2018 03:51, wor...@alum.mit.edu wrote:

Tim Rühsen  writes:

Thanks for the pointer to coproc, never heard of it ;-) (That means I
never had a problem that needed coproc).

Anyway, copying the script results in a file '[1]' with bash 4.4.23.


Yeah, I'm not surprised there are bugs in it.

Also, wget -i - waits to start downloading until stdin has been closed.
How can you circumvent that?


The more I think about the original problem, the more puzzled I am.  The
OP said that starting wget for each URL took a long time.  But my
experience is that starting processes is quite quick.  (I once modified
tar to compress each file individually with gzip before writing it to an
Exabyte tape.  On a much slower processor than modern processors, the
writing was not delayed by starting a process for each file written.)

I suspect the delay is not starting wget but establishing the initial
HTTP connection to the server.


That's what the OP thinks, too.  I attributed the slow startup to DNS 
resolution.



Probably a better approach to the problem is to download the files in
batches of N consecutive URLs, where N is large enough that the HTTP
startup time is well less than the total download time.  Process each
batch with a separate invocation of wget, and exit the loop when an
attempted batch doesn't create any new downloaded files (or, the last
file in the batch doesn't exist), indicating there are no more files to
download.


Neat idea.  Finally, I solved it by estimating the number of chunks from 
the total running time and the duration of each chunk.  But thanks for 
giving it a thought!
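For the record, that estimate amounts to a ceiling division of the total
running time by the chunk duration.  A trivial sketch, with both
durations invented for illustration:

```shell
# Estimate the number of chunks from the total running time and the
# duration of one chunk (both values are assumptions, in seconds).
total=3600   # total programme length
chunk=10     # duration of one segment
n=$(( (total + chunk - 1) / chunk ))   # ceiling division in integer arithmetic
echo "$n"
```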


Regards,

Paul




Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-11 Thread Dale R. Worley
Tim Rühsen  writes:
> Thanks for the pointer to coproc, never heard of it ;-) (That means I
> never had a problem that needed coproc).
>
> Anyway, copying the script results in a file '[1]' with bash 4.4.23.

Yeah, I'm not surprised there are bugs in it.

> Also, wget -i - waits to start downloading until stdin has been closed.
> How can you circumvent that?

The more I think about the original problem, the more puzzled I am.  The
OP said that starting wget for each URL took a long time.  But my
experience is that starting processes is quite quick.  (I once modified
tar to compress each file individually with gzip before writing it to an
Exabyte tape.  On a much slower processor than modern processors, the
writing was not delayed by starting a process for each file written.)

I suspect the delay is not starting wget but establishing the initial
HTTP connection to the server.

Probably a better approach to the problem is to download the files in
batches of N consecutive URLs, where N is large enough that the HTTP
startup time is well less than the total download time.  Process each
batch with a separate invocation of wget, and exit the loop when an
attempted batch doesn't create any new downloaded files (or, the last
file in the batch doesn't exist), indicating there are no more files to
download.
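The batch loop above might be sketched as follows.  The URL pattern and
batch size are assumptions, and a local stand-in function replaces wget
(simulating a server that has 47 segments) so the sketch runs without a
network; for the real thing, substitute something like `wget -q -i -`
for `fetch`:

```shell
#!/usr/bin/env bash
# Sketch of the batching idea.  The stand-in "fetch" creates a file for
# every segment the simulated server has (numbers 1..47) and silently
# skips the rest, mimicking wget creating no file on a 404.
fetch() {
    while read -r url; do
        s=${url##*_}; s=${s%.mp4}                 # extract the segment number
        (( s <= 47 )) && : > "segment_${s}.mp4"   # "download" existing segments
    done
}

n=20   # batch size: large enough that HTTP startup cost is amortised
i=1
while true; do
    for ((j = i; j < i + n; j++)); do
        printf 'http://domain.com/path/segment_%d.mp4\n' "$j"
    done | fetch
    # Stop when the last file of the batch was not created,
    # i.e. the server has no more segments.
    [[ -e "segment_$((i + n - 1)).mp4" ]] || break
    i=$((i + n))
done
```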

Dale



Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-11 Thread Tim Rühsen
On 9/11/18 5:34 AM, Dale R. Worley wrote:
> Paul Wagner  writes:
>> Now I tried
>>
>>{ i=1; while [[ $i != 100 ]]; do echo 
>> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4 
>> -i -
>>
>> which works like a charm *as long as the 'generator process' is finite*, 
>> i.e. the loop is actually programmed as in the example.  The problem is 
>> that it would be much easier if I could let the loop run forever, let 
>> wget get whatever is there and then fail after the counter extends to a 
>> segment number not available anymore, which would in turn fail the whole 
>> pipe.
> 
> Good God, this finally motivates me to learn about Bash coprocesses.
> 
> I think the answer is something like this:
> 
> coproc wget -O foo.mp4 -i -
> 
> i=1
> while true
> do
> rm -f foo.mp4
> echo "http://domain.com/path/segment_$((i++)).mp4" >&$wget[1]
> sleep 5
> # The only way to test for non-existence of the URL is whether the
> # output file exists.
> [[ ! -e foo.mp4 ]] && break
> # Do whatever you already do to wait for foo.mp4 to be completed and
> # then use it.
> done
> 
> # Close wget's input.
> exec $wget[1]<&-
> # Wait for it to finish.
> wait $wget_pid
> 
> Dale

Thanks for the pointer to coproc, never heard of it ;-) (That means I
never had a problem that needed coproc).

Anyway, copying the script results in a file '[1]' with bash 4.4.23.

Also, wget -i - waits to start downloading until stdin has been closed.
How can you circumvent that?
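The stray '[1]' file is explained by the redirection: with a simple
command after `coproc`, bash names the fd array COPROC (and the pid
COPROC_PID), so the unbraced `$wget[1]` expands to an empty string plus
the literal `[1]`, and the shell creates a file of that name.  A minimal
runnable sketch of the corrected fd handling, with `cat` standing in for
`wget -O foo.mp4 -i -` so it runs without a network:

```shell
#!/usr/bin/env bash
# Corrected coproc fd handling (bash >= 4.3 for the array-subscript
# redirection).  `cat > out.txt` stands in for `wget -O foo.mp4 -i -`.
coproc { cat > out.txt; }

# Feed one URL to the coprocess's stdin -- note the braces and quotes.
printf 'http://domain.com/path/segment_1.mp4\n' >&"${COPROC[1]}"

# Close the write end so the coprocess sees EOF, then reap it.
exec {COPROC[1]}>&-
wait "$COPROC_PID"
```

Even with the redirection fixed, wget 1.x still reads all of `-i -`
before it starts downloading, so this answers the '[1]' puzzle but not
the streaming question.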

Regards, Tim





Re: [Bug-wget] wget in a 'dynamic' pipe

2018-09-10 Thread Dale R. Worley
Paul Wagner  writes:
> Now I tried
>
>{ i=1; while [[ $i != 100 ]]; do echo 
> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4 
> -i -
>
> which works like a charm *as long as the 'generator process' is finite*, 
> i.e. the loop is actually programmed as in the example.  The problem is 
> that it would be much easier if I could let the loop run forever, let 
> wget get whatever is there and then fail after the counter extends to a 
> segment number not available anymore, which would in turn fail the whole 
> pipe.

Good God, this finally motivates me to learn about Bash coprocesses.

I think the answer is something like this:

coproc wget -O foo.mp4 -i -

i=1
while true
do
rm -f foo.mp4
echo "http://domain.com/path/segment_$((i++)).mp4" >&$wget[1]
sleep 5
# The only way to test for non-existence of the URL is whether the
# output file exists.
[[ ! -e foo.mp4 ]] && break
# Do whatever you already do to wait for foo.mp4 to be completed and
# then use it.
done

# Close wget's input.
exec $wget[1]<&-
# Wait for it to finish.
wait $wget_pid

Dale



Re: [Bug-wget] wget in a 'dynamic' pipe

2018-07-19 Thread Tim Rühsen


On 19.07.2018 17:24, Paul Wagner wrote:
> Dear wgetters,
> 
> apologies if this has been asked before.
> 
> I'm using wget to download DASH media files, i.e. a number of URLs in
> the form domain.com/path/segment_1.mp4, domain.com/path/segment_2.mp4,
> ..., which represent chunks of audio or video, and which are to be
> combined to form the whole programme.  I used to call individual
> instances of wget for each chunk and combine them, which was dead slow. 
> Now I tried
> 
>   { i=1; while [[ $i != 100 ]]; do echo
> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
> -i -
> 
> which works like a charm *as long as the 'generator process' is finite*,
> i.e. the loop is actually programmed as in the example.  The problem is
> that it would be much easier if I could let the loop run forever, let
> wget get whatever is there and then fail after the counter extends to a
> segment number not available anymore, which would in turn fail the whole
> pipe.  Turns out that
> 
>   { i=1; while true; do echo
> "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
> -i -
> 
> hangs in the sense that the first process loops forever while wget
> doesn't even bother to start retrieving.  Am I right assuming that wget
> waits until the file specified by -i is actually fully written?  Is
> there any way to change this behaviour?
> 
> Any help appreciated.  (I'm using wget 1.19.1 under cygwin.)

Hi Paul,

Wget2 behaves the way you need, so you can feed it from an endless loop
without it hanging.

It should build under Cygwin without problems, though my last test was a
while ago.

See https://gitlab.com/gnuwget/wget2

Latest tarball is
https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz

or latest git
git clone https://gitlab.com/gnuwget/wget2.git


Regards, Tim


