Re: [Bug-wget] wget in a 'dynamic' pipe
Paul Wagner writes:

> That's what the OP thinks, too. I attributed the slow startup to DNS
> resolution.

Depending on your circumstances, one way to fix that is to set up a local caching-only DNS server and direct ordinary processes to use it. Then the first lookup is expensive, but the caching server saves the resolution and answers later queries very quickly.

Dale
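A minimal sketch of such a caching-only resolver, assuming dnsmasq on a systemd-based system (package names, config paths, and resolv.conf handling vary by platform):

    # Run dnsmasq as a local caching-only resolver on 127.0.0.1.
    # Assumes dnsmasq reads drop-in files from /etc/dnsmasq.d.
    sudo tee /etc/dnsmasq.d/local-cache.conf >/dev/null <<'EOF'
    listen-address=127.0.0.1
    cache-size=1000
    EOF
    sudo systemctl restart dnsmasq

    # Then direct ordinary processes at it, e.g. as the first line of
    # /etc/resolv.conf:
    #     nameserver 127.0.0.1

With that in place, only the first lookup of domain.com pays the full resolution cost; every later wget invocation is answered from the local cache.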
Re: [Bug-wget] wget in a 'dynamic' pipe
Dear all,

On 12.09.2018 03:51, wor...@alum.mit.edu wrote:

> Tim Rühsen writes:
>> Thanks for the pointer to coproc, never heard of it ;-) (That means I
>> never had a problem that needed coproc).
>>
>> Anyway, copy-pasting and running the script results in a file named
>> '[1]' with bash 4.4.23.
>
> Yeah, I'm not surprised there are bugs in it.
>
>> Also, wget -i - waits with downloading until stdin has been closed.
>> How can you circumvent that?
>
> The more I think about the original problem, the more puzzled I am.
> The OP said that starting wget for each URL took a long time. But my
> experience is that starting processes is quite quick. (I once modified
> tar to compress each file individually with gzip before writing it to
> an Exabyte tape. Even on a much slower processor than today's, the
> writing was not delayed by starting a process for each file written.)
> I suspect the delay is not starting wget but establishing the initial
> HTTP connection to the server.

That's what the OP thinks, too. I attributed the slow startup to DNS resolution.

> Probably a better approach to the problem is to download the files in
> batches of N consecutive URLs, where N is large enough that the HTTP
> startup time is well less than the total download time. Process each
> batch with a separate invocation of wget, and exit the loop when an
> attempted batch doesn't create any new downloaded files (or the last
> file in the batch doesn't exist), indicating there are no more files
> to download.

Neat idea. In the end I solved it by estimating the number of chunks from the total running time and the duration of each chunk. But thanks for giving it a thought!

Regards,

Paul
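Paul's estimation workaround might look like the sketch below; the running time and per-segment duration are invented placeholders, since the actual numbers were not given in the thread:

    #!/usr/bin/env bash
    # Derive the segment count from the programme length and the
    # duration of one DASH segment, then generate a finite URL list.
    total_secs=3600   # total running time (placeholder)
    chunk_secs=4      # duration of one segment (placeholder)
    n=$(( (total_secs + chunk_secs - 1) / chunk_secs ))  # round up

    for (( i = 1; i <= n; i++ )); do
        echo "http://domain.com/path/segment_${i}.mp4"
    done | wget -O foo.mp4 -i -

Because the generator is finite, wget's read-everything-first behaviour on -i - is harmless here.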
Re: [Bug-wget] wget in a 'dynamic' pipe
Tim Rühsen writes:

> Thanks for the pointer to coproc, never heard of it ;-) (That means I
> never had a problem that needed coproc).
>
> Anyway, copy-pasting and running the script results in a file named
> '[1]' with bash 4.4.23.

Yeah, I'm not surprised there are bugs in it.

> Also, wget -i - waits with downloading until stdin has been closed.
> How can you circumvent that?

The more I think about the original problem, the more puzzled I am. The OP said that starting wget for each URL took a long time. But my experience is that starting processes is quite quick. (I once modified tar to compress each file individually with gzip before writing it to an Exabyte tape. Even on a much slower processor than today's, the writing was not delayed by starting a process for each file written.) I suspect the delay is not starting wget but establishing the initial HTTP connection to the server.

Probably a better approach to the problem is to download the files in batches of N consecutive URLs, where N is large enough that the HTTP startup time is well less than the total download time. Process each batch with a separate invocation of wget, and exit the loop when an attempted batch doesn't create any new downloaded files (or the last file in the batch doesn't exist), indicating there are no more files to download.

Dale
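A rough sketch of the batching approach described above (not part of the original mail; the URL scheme is taken from the thread, the batch size is an assumption):

    #!/usr/bin/env bash
    # Download N consecutive segment URLs per wget invocation; stop as
    # soon as the last file of a batch was not created, i.e. the
    # counter ran past the final segment.
    N=20   # batch size; tune so per-run startup cost is amortised
    i=1
    while true
    do
        last=$(( i + N - 1 ))
        for (( j = i; j <= last; j++ )); do
            echo "http://domain.com/path/segment_${j}.mp4"
        done | wget -q -i -
        # Without -O, wget saves each URL under its own basename, so a
        # missing segment_<last>.mp4 marks the end of the programme.
        [[ -e "segment_${last}.mp4" ]] || break
        i=$(( last + 1 ))
    done

If the final segment count is not a multiple of N, the partial last batch downloads whatever exists and then breaks, as intended.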
Re: [Bug-wget] wget in a 'dynamic' pipe
On 9/11/18 5:34 AM, Dale R. Worley wrote:

> Paul Wagner writes:
>> Now I tried
>>
>>    { i=1; while [[ $i != 100 ]]; do echo
>>    "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
>>    -i -
>>
>> which works like a charm *as long as the 'generator process' is
>> finite*, i.e. the loop is actually programmed as in the example. The
>> problem is that it would be much easier if I could let the loop run
>> forever, let wget get whatever is there and then fail after the
>> counter extends to a segment number not available anymore, which
>> would in turn fail the whole pipe.
>
> Good God, this finally motivates me to learn about Bash coprocesses.
>
> I think the answer is something like this:
>
>     coproc wget -O foo.mp4 -i -
>
>     i=1
>     while true
>     do
>         rm -f foo.mp4
>         echo "http://domain.com/path/segment_$((i++)).mp4" >&$wget[1]
>         sleep 5
>         # The only way to test for non-existence of the URL is whether
>         # the output file exists.
>         [[ ! -e foo.mp4 ]] && break
>         # Do whatever you already do to wait for foo.mp4 to be
>         # completed and then use it.
>     done
>
>     # Close wget's input.
>     exec $wget[1]<&-
>     # Wait for it to finish.
>     wait $wget_pid
>
> Dale

Thanks for the pointer to coproc, never heard of it ;-) (That means I never had a problem that needed coproc).

Anyway, copy-pasting and running the script results in a file named '[1]' with bash 4.4.23.

Also, wget -i - waits with downloading until stdin has been closed. How can you circumvent that?

Regards,

Tim
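The file named '[1]' is a quoting bug in the script: in >&$wget[1], bash expands $wget (which is unset, because an unnamed coproc's FDs live in the COPROC array) and treats the literal remainder [1] as a file name. A corrected sketch, not from the original thread, using the named-coproc form so the FD array and PID variable get predictable names:

    #!/usr/bin/env bash
    # coproc NAME { cmd; } puts the FDs in ${NAME[0]} (read from cmd)
    # and ${NAME[1]} (write to cmd), and the PID in $NAME_PID.
    coproc wget { wget -O foo.mp4 -i -; }

    i=1
    while true
    do
        rm -f foo.mp4
        # Braces and quotes matter: plain $wget[1] is what created '[1]'.
        echo "http://domain.com/path/segment_$((i++)).mp4" >&"${wget[1]}"
        sleep 5
        [[ ! -e foo.mp4 ]] && break
        # ... wait for foo.mp4 to be complete, then use it ...
    done

    # Close wget's stdin (the write end, hence >&-, not <&-).
    fd=${wget[1]}
    exec {fd}>&-
    # The PID variable is $wget_PID, not $wget_pid.
    wait "$wget_PID"

This fixes the shell-level bugs but not Tim's second observation: wget 1.x still reads the whole -i file before it starts downloading, so foo.mp4 never appears and the loop gives up after the first URL.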
Re: [Bug-wget] wget in a 'dynamic' pipe
Paul Wagner writes:

> Now I tried
>
>    { i=1; while [[ $i != 100 ]]; do echo
>    "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
>    -i -
>
> which works like a charm *as long as the 'generator process' is
> finite*, i.e. the loop is actually programmed as in the example. The
> problem is that it would be much easier if I could let the loop run
> forever, let wget get whatever is there and then fail after the
> counter extends to a segment number not available anymore, which
> would in turn fail the whole pipe.

Good God, this finally motivates me to learn about Bash coprocesses.

I think the answer is something like this:

    coproc wget -O foo.mp4 -i -

    i=1
    while true
    do
        rm -f foo.mp4
        echo "http://domain.com/path/segment_$((i++)).mp4" >&$wget[1]
        sleep 5
        # The only way to test for non-existence of the URL is whether
        # the output file exists.
        [[ ! -e foo.mp4 ]] && break
        # Do whatever you already do to wait for foo.mp4 to be
        # completed and then use it.
    done

    # Close wget's input.
    exec $wget[1]<&-
    # Wait for it to finish.
    wait $wget_pid

Dale
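Since coprocesses are fairly obscure, here is a minimal self-contained demonstration of the machinery the script above relies on (a generic sketch, not from the original mail):

    #!/usr/bin/env bash
    # Start a coprocess: cat runs in the background with both of its
    # standard streams connected to us through pipes.
    coproc cop { cat; }

    echo "hello" >&"${cop[1]}"    # write to the coprocess's stdin
    read -r line <&"${cop[0]}"    # read its stdout
    echo "$line"                  # prints: hello

    fd=${cop[1]}
    exec {fd}>&-                  # close its stdin so cat sees EOF
    wait "$cop_PID"

cat is used here because it writes each chunk straight through; a stdio-buffered filter such as tr would need something like stdbuf -oL to keep the read from blocking forever.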
Re: [Bug-wget] wget in a 'dynamic' pipe
On 19.07.2018 17:24, Paul Wagner wrote:

> Dear wgetters,
>
> apologies if this has been asked before.
>
> I'm using wget to download DASH media files, i.e. a number of URLs of
> the form domain.com/path/segment_1.mp4, domain.com/path/segment_2.mp4,
> ..., which represent chunks of audio or video, and which are to be
> combined to form the whole programme. I used to call individual
> instances of wget for each chunk and combine them, which was dead
> slow. Now I tried
>
>    { i=1; while [[ $i != 100 ]]; do echo
>    "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
>    -i -
>
> which works like a charm *as long as the 'generator process' is
> finite*, i.e. the loop is actually programmed as in the example. The
> problem is that it would be much easier if I could let the loop run
> forever, let wget get whatever is there and then fail after the
> counter extends to a segment number not available anymore, which
> would in turn fail the whole pipe. Turns out that
>
>    { i=1; while true; do echo
>    "http://domain.com/path/segment_$((i++)).mp4"; done } | wget -O foo.mp4
>    -i -
>
> hangs in the sense that the first process loops forever while wget
> doesn't even bother to start retrieving. Am I right in assuming that
> wget waits until the file specified by -i is actually fully written?
> Is there any way to change this behaviour?
>
> Any help appreciated. (I'm using wget 1.19.1 under Cygwin.)

Hi Paul,

Wget2 behaves like what you need, so you can run it with an endless loop without wget2 hanging. It should build under Cygwin without problems, though my last test was a while ago.

See https://gitlab.com/gnuwget/wget2

The latest tarball is https://alpha.gnu.org/gnu/wget/wget2-1.99.1.tar.gz, or clone the latest git:

    git clone https://gitlab.com/gnuwget/wget2.git

Regards,

Tim
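Assuming wget2 accepts wget's -O and -i options (it aims for command-line compatibility, but this is worth verifying), Paul's endless generator should then work unchanged apart from the program name:

    {
        i=1
        while true; do
            echo "http://domain.com/path/segment_$((i++)).mp4"
        done
    } | wget2 -O foo.mp4 -i -

wget2 starts fetching while stdin is still open, and if wget2 exits (e.g. on a failed retrieval), the generator's next echo receives SIGPIPE and the whole pipeline ends on its own, which is what the original mail asked for.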