Re: Wget: Adding a prefix to downloaded files?
On 12/17/19 5:08 PM, Daniel Stenberg wrote:
> On Tue, 17 Dec 2019, Tim Rühsen wrote:
>
>> wget has no parallel processing. Wget2 has.
>
> This phrasing could use some clarity.
>
> Does this mean that Wget2 is not Wget? Is 'Wget2' the name of a separate
> tool/project?
>
> Or is Wget2 actually Wget version 2? Because if it is, then Wget has
> parallel processing, in version two...
>
> I think it would help to have this clarified.

Good point. Wget2 is Wget version 2 *and* a tool of its own, so both can be installed side by side. Wget2 is a drop-in replacement and will eventually succeed Wget. Both tools belong to the GNU Wget project.

Regards, Tim
Re: Wget: Adding a prefix to downloaded files?
On Tue, 17 Dec 2019, Tim Rühsen wrote:

> wget has no parallel processing. Wget2 has.

This phrasing could use some clarity.

Does this mean that Wget2 is not Wget? Is 'Wget2' the name of a separate tool/project?

Or is Wget2 actually Wget version 2? Because if it is, then Wget has parallel processing, in version two...

I think it would help to have this clarified.

--
 / daniel.haxx.se
Re: Wget: Adding a prefix to downloaded files?
Hi Michel,

wget has no parallel processing. Wget2 has.

Regards, Tim

On 12/17/19 12:56 PM, michel.kempene...@telenet.be wrote:
> Hi Tim,
>
> It seems completely logical that Wget, or any application for that matter,
> works through an input list sequentially.
> But the resulting order might depend upon whether Wget only handles a single
> file at a time, or whether it is capable of processing several files in
> parallel.
> I suppose the answer is a single file only, as I cannot find anything about
> parallel processing in the manual.
> But I wouldn't put money on it.
> [...]
RE: Wget: Adding a prefix to downloaded files?
Hi Tim,

It seems completely logical that Wget, or any application for that matter, works through an input list sequentially. But the resulting order might depend upon whether Wget only handles a single file at a time, or whether it is capable of processing several files in parallel. I suppose the answer is a single file only, as I cannot find anything about parallel processing in the manual. But I wouldn't put money on it.

On the other hand, I may have been tricked by the settings of Windows Explorer when wondering whether the file size had an impact. Indeed, when I try to double-check this behavior, it turns out that the downloads are simply executed too quickly to visually confirm anything of the kind!

Also, you are right in pointing out that the target directory is in fact "ruled" by local settings (e.g. a folder in Windows Explorer), including the sort order, which can have a confusing effect. Some further testing taught me that in this particular case I also needed to change the time switch of the DOS DIR command. Indeed,

DIR /O:D /T:C

sorts files by D(ate), and uses the C(reation date) to do so (the defaults being W (= last written) for the time field, and "sort-of-alphabetically" if no O(rdering) switch is applied). See:
https://ss64.com/nt/dir.html
https://devblogs.microsoft.com/oldnewthing/20140304-00/?p=1603

Windows Explorer offers many more possibilities apart from its default values ("Date Created" and/or "Date Modified", I'm not really sure). See the following screenshot (if that's of any use; I'm not sure whether this forum keeps them):

The problem is that Windows Explorer itself does not explain what they mean... So in a sense they are useless. That's not just a remark, when you know that the default "Date created" in Windows Explorer does NOT give the same output as the (apparent) DOS equivalent!!

The same goes for the other date types offered by Windows Explorer: none of them matches the output of the above DIR command... ("Date acquired", "Date archived", "Date completed", "Date received", "Date released", and "Date sent" are even empty.) Typical MS clumsiness, I guess.

If you want an idea of the mess MS keeps making of date/time fields, have a look here:
https://superuser.com/questions/147525/what-is-the-date-column-in-windows-7-explorer-it-matches-no-date-column-from
Apparently, their meaning changes between versions (Win7 or Win10), and even among Win10 releases... Go figure!

Nevertheless, thanks to your feedback I've been able to confirm that this is indeed not a Wget issue. I suppose I can use this info to work around Wget's missing option for a prefix/counter (which remains the bottom line and triggered this question in the first place).

PS:
The workaround you suggest is of the same type as the other ones mentioned before. Yes, it could be done by calling Wget as often as there are images to download, and (externally) adding a prefix (counter) to every single download. But any such workaround would miss out on the efficiency of feeding Wget a plain input txt file. And I can only repeat that such a feature could add some power to Wget, as it would avoid cumbersome workarounds.

Thanks again for all the feedback received,

MK

From: "Tim Rühsen"
To: "Michel Kempeneers", "bug-wget"
Sent: Friday, 13 December 2019 15:39:24
Subject: Re: Wget: Adding a prefix to downloaded files?
[...]
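To check the download order without relying on Explorer's date columns, here is a minimal Python sketch that lists a folder's files sorted by a chosen timestamp, roughly what `DIR /O:D` does. The function name `list_by_time` is hypothetical. Two caveats: `st_ctime` is the creation time on Windows (as with `/T:C`) but the inode change time on Linux/macOS (what `ls -c` sorts by), and GNU Wget by default sets a downloaded file's modification time from the server's Last-Modified header, so "last written" order may not match download order unless `--no-use-server-timestamps` is passed.

```python
import os
from pathlib import Path


def list_by_time(directory, field="st_mtime", oldest_first=True):
    """Return the names of the files in `directory`, ordered by a stat field.

    field="st_mtime": last written (like DIR's default /T:W).
    field="st_ctime": creation time on Windows (like /T:C), but inode
    change time on Linux/macOS (what `ls -c` sorts by).
    """
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    # Sort by the timestamp; ties are broken by name so the result is stable.
    files.sort(key=lambda p: (getattr(os.stat(p), field), p.name),
               reverse=not oldest_first)
    return [p.name for p in files]
```

For example, `list_by_time(r".\Images", field="st_ctime")` on Windows should give the same order as `DIR /O:D /T:C`; subfolders are skipped.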
Re: Wget: Adding a prefix to downloaded files?
On 12/12/19 1:25 PM, michel.kempene...@telenet.be wrote:
> Hi,
>
> I run into a particular problem when I'm trying to download a bunch of URLs I
> grouped together in file "input.txt" like this:
>
> wget -nv -a log.txt -P .\Images\ -i input.txt
>
> Some of these files are huge, hence take a long time to download.
> As a consequence, they will not appear in the same sorting order in the
> download folder as in the input folder, and that's a problem, as this order
> has its importance.

Since wget works sequentially, why do you think the order of downloads has something to do with the file size?

If 'Images' is a fresh and empty directory *and* all files download OK, the order in the directory is the same as the order in input.txt. At least a sane file system should keep the order (is NTFS sane?).

Then, what is irritating: 'dir' or 'ls' tools like to use a certain sort order by default. E.g. here on GNU/Linux 'ls' orders the output files alphabetically by name. 'ls -rc' prints them in reverse order of ctime (the last status change, not true creation time), oldest first, then newer files, which seems to be what you want.

In short, wget likely is not your problem. Find out what it really is and you can find a mitigation.

As a 'dumb' workaround, save your files into a temp directory, then move them to Images\ in the order of occurrence in input.txt.

Regards, Tim
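The temp-directory workaround can also add the counter Michel asked for while the files are being moved. A minimal Python sketch, assuming each URL was saved under the last segment of its path (no query strings, percent-escapes, redirects, or duplicate names), which simplifies wget's real naming rules; `local_name` and `move_in_order` are hypothetical helper names:

```python
import shutil
from pathlib import Path
from urllib.parse import urlsplit


def local_name(url):
    """Guess the file name wget used: the last segment of the URL path.

    Simplified assumption: no query strings, percent-escapes, redirects
    or duplicate names; a path ending in '/' is saved as index.html.
    """
    return urlsplit(url).path.rsplit("/", 1)[-1] or "index.html"


def move_in_order(url_list_file, temp_dir, target_dir, width=3):
    """Move files from temp_dir to target_dir in input-file order,
    prefixing each with a zero-padded counter (001_, 002_, ...)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    urls = [line.strip() for line in Path(url_list_file).read_text().splitlines()
            if line.strip()]
    moved = []
    for position, url in enumerate(urls, start=1):
        src = Path(temp_dir) / local_name(url)
        if not src.is_file():
            # Download failed (or the name guess is wrong): skip it, and keep
            # the counter tied to the line position in the input file.
            continue
        dst = target / f"{position:0{width}d}_{src.name}"
        shutil.move(str(src), str(dst))
        moved.append(dst.name)
    return moved
```

Because the counter comes from each URL's position in input.txt, a failed download leaves a gap in the numbering instead of shifting every later file, and an alphabetical listing of the target folder then reproduces the input order.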
RE: Wget: Adding a prefix to downloaded files?
Richard,

I take it that is a "no, impossible"? :-)

Rest assured: if the order weren't essential, and if there were an obvious workaround, I wouldn't have bothered asking. As far as I can tell, there is no logic in the source's sort order. It's the way it is, and certainly not the result of some sorting algorithm (if only it were!). And it's in that very order that I copied the URLs into my input file.

I don't know sed, but the alternative you suggest would be to loop through the individual URLs, prefixing every output file. I had considered that as well (it can be done with a basic batch file), but wanted to find out first whether a single Wget call via an input file is possible, as this sounds much more efficient and is presumably also quicker.

Thanks for your suggestions anyway.

M.

From: "Richard Thomas"
To: "Michel Kempeneers"
Sent: Thursday, 12 December 2019 18:15:04
Subject: Re: Wget: Adding a prefix to downloaded files?

> I'd look at why it's important you maintain the sort-order. Options would be to not require that. Or you could pre-sort the input folder so the output folder can be sorted with the same algorithm. Another option would be to generate the wget commands with your prefixes using something like sed.
Re: Wget: Adding a prefix to downloaded files?
I'd look at why it's important you maintain the sort order. One option would be to not require that. Or you could pre-sort the input folder so the output folder can be sorted with the same algorithm. Another option would be to generate the wget commands with your prefixes using something like sed.

On 12/12/2019 6:25 AM, michel.kempene...@telenet.be wrote:
> Hi,
>
> I run into a particular problem when I'm trying to download a bunch of URLs I grouped together in file "input.txt" like this:
>
> wget -nv -a log.txt -P .\Images\ -i input.txt
>
> Some of these files are huge, hence take a long time to download. As a consequence, they will not appear in the same sorting order in the download folder as in the input folder, and that's a problem, as this order has its importance.
>
> I tried working around this problem by sorting the downloaded files using the "Date Created" info in Windows Explorer (that's a field which is not displayed by default, but it can be added to the folder pane as an extra column). But for reasons I don't understand, this order is also different, maybe because Windows only considers a file as created when it is complete?? Hence, that's no solution either.
>
> So I wonder: is there a way to add a prefix (or even a counter) to a file when it is downloaded using Wget? This would be a piece of cake if it could be done in the input file "input.txt", but obviously, these are external URLs, hence they cannot be touched.
>
> Other tools such as DownThemAll (aka "dTa"), the Firefox or Chrome extension which I use when a GUI can be used (as opposed to using a command line tool in a batch file), offer a prefix as a variable for the mask which can be applied to the names of the target files. This prefix is certainly not perfect (lacking, among other things, some flexibility), but at least it allows me to impose the desired order.
>
> So, does Wget have a similar construct? Or is there another solution which would preserve the source's file order?
>
> M.
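Richard's suggestion (generate one wget command per URL with the prefix baked in) and Michel's batch-file loop both come down to giving each file an explicit output name with `-O`. A minimal Python sketch that builds those command lines, reusing `-nv` and `-a log.txt` from the original command; `prefixed_commands` and `run_all` are hypothetical names, and the naming rule (last path segment, `index.html` for a trailing slash) is an assumption:

```python
import subprocess
from pathlib import Path
from urllib.parse import urlsplit


def prefixed_commands(urls, out_dir="Images", log="log.txt", width=3):
    """Build one wget argument list per URL, saving to out_dir/NNN_name.

    Naming assumption: last URL path segment, index.html for a trailing '/'.
    """
    commands = []
    for position, url in enumerate(urls, start=1):
        name = urlsplit(url).path.rsplit("/", 1)[-1] or "index.html"
        out = Path(out_dir) / f"{position:0{width}d}_{name}"
        # -O gives the exact output path, so the folder is part of it here.
        commands.append(["wget", "-nv", "-a", log, "-O", str(out), url])
    return commands


def run_all(commands):
    """Run the commands one after another (sequentially, like wget -i)."""
    for cmd in commands:
        # Create the target folder up front rather than relying on wget to do it.
        Path(cmd[-2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=False)
```

Typical use would be `run_all(prefixed_commands([l.strip() for l in open("input.txt") if l.strip()]))`. Since `-a` appends, log.txt accumulates across the separate calls. As Michel notes, this gives up the single `wget -i input.txt` run in exchange for one wget process per file.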