Re: Wget: Adding a prefix to downloaded files?

2019-12-17 Thread Tim Rühsen
On 12/17/19 5:08 PM, Daniel Stenberg wrote:
> On Tue, 17 Dec 2019, Tim Rühsen wrote:
> 
>> wget has no parallel processing. Wget2 has.
> 
> This phrasing could use some clarity.
> 
> Does this mean that Wget2 is not Wget? Is 'Wget2' the name of a separate
> tool/project?
> 
> Or is Wget2 actually Wget version 2? Because if it is, then Wget has
> parallel processing, in version two...
> 
> I think it would help to have this clarified.

Good point.

Wget2 is Wget version 2 *and* a tool of its own, so both can be installed
side by side. Wget2 is a drop-in replacement and will succeed Wget at
some point. Both tools belong to the GNU Wget project.

Regards, Tim





Re: Wget: Adding a prefix to downloaded files?

2019-12-17 Thread Daniel Stenberg

On Tue, 17 Dec 2019, Tim Rühsen wrote:

> wget has no parallel processing. Wget2 has.

This phrasing could use some clarity.

Does this mean that Wget2 is not Wget? Is 'Wget2' the name of a separate
tool/project?

Or is Wget2 actually Wget version 2? Because if it is, then Wget has parallel
processing, in version two...

I think it would help to have this clarified.

--

 / daniel.haxx.se


Re: Wget: Adding a prefix to downloaded files?

2019-12-17 Thread Tim Rühsen
Hi Michel,

wget has no parallel processing. Wget2 has.

Regards, Tim



RE: Wget: Adding a prefix to downloaded files?

2019-12-17 Thread michel . kempeneers
Hi Tim, 

It seems completely logical that Wget --- or any application for that matter 
--- works through an input list sequentially. 
But the resulting order might depend upon whether Wget only handles a single 
file at a time, or whether it is capable of processing several files in 
parallel. 
I suppose the answer is a single file only, as I cannot find anything about 
parallel processing in the Manual. 
But I wouldn't put money on it. 

On the other hand, I may have been tricked by the settings of Windows Explorer
when wondering whether the file size had an impact.
Indeed, when I try to double-check this behavior, it turns out that the
downloads simply complete too quickly to confirm anything of the kind visually!

Also, you are right in pointing out that the target directory is in fact
"ruled" by local settings (e.g. a folder in Windows Explorer), including the
sort order, which can have a confusing effect.
Some further testing taught me that in this particular case I also needed to
change the time switch of the DOS DIR command. Indeed,

DIR /O:D /T:C

sorts files by D(ate), and uses the C(reation date) to do so.
(The defaults are W (= last written) for the date, and
"sort-of-alphabetically" if no O(rdering) switch is applied. See:
https://ss64.com/nt/dir.html
https://devblogs.microsoft.com/oldnewthing/20140304-00/?p=1603 )


Windows Explorer offers many more possibilities apart from its default values 
("Date Created" and/or "Date Modified", I'm not really sure). 
See the following screen shot (if that's of any use; I'm not sure if this forum 
persists them): 


The problem is that Windows Explorer itself does not explain what they
mean... So in a sense they are useless.
That's not just a passing remark: the default "Date created" in
Windows Explorer does NOT give the same output as its (apparent) DOS equivalent!

The same goes for the other date types offered by Windows Explorer: none of
them matches the output of the above DIR command...
("Date acquired", "Date archived", "Date completed", "Date received", "Date
released", and "Date sent" are even empty.)
Typical MS clumsiness, I guess.

If you want an example of the mess MS keeps making of Date/Time fields, have a
look here:
https://superuser.com/questions/147525/what-is-the-date-column-in-windows-7-explorer-it-matches-no-date-column-from
Apparently, their meaning changes between versions (Win7 or Win10), and even
among Win10 releases... Go figure!

Nevertheless, thanks to your feedback I've been able to confirm that this is
indeed not a Wget issue.
I suppose I can use this info to work around Wget's missing option for a
prefix/counter (which remains the bottom line and triggered this question in
the first place).

PS:
The workaround you suggest is of the same type as the ones mentioned
before.
Yes, it could be done by calling Wget once per image to download, and
(externally) adding a prefix (counter) to every single download.
But any such workaround would miss out on the efficiency of feeding Wget a
plain input txt file.
And I can only repeat that such a feature could add some power to Wget, as it
would avoid cumbersome workarounds.

Thx again for all the feedback received, 

MK 




Re: Wget: Adding a prefix to downloaded files?

2019-12-13 Thread Tim Rühsen
On 12/12/19 1:25 PM, michel.kempene...@telenet.be wrote:
> Hi, 
> 
> I run into a particular problem when I'm trying to download a bunch of URLs I 
> grouped together in file "input.txt" like this: 
> 
> wget -nv -a log.txt -P .\Images\ -i input.txt 
> 
> Some of these files are huge, hence take a long time to download. 
> As a consequence, they will not appear in the same sorting order in the 
> download folder as in the input folder, and that's a problem, as this order 
> has its importance. 

Since wget works sequentially, why do you think the order of downloads
has something to do with the file size ?

If 'Images' is a fresh and empty directory *and* all files download OK,
the order in the directory is the same as the order in input.txt. At
least a sane file system should keep the order (is NTFS sane ?).

Then, what is irritating: 'dir' or 'ls' tools like to use a certain sort
order by default. E.g. here on GNU/Linux 'ls' orders the output files
alphabetically by name. 'ls -rc' sorts by ctime (last status change, in
practice the time of download) in reverse order (oldest first, then newer
files), which seems to be what you want.
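A minimal sketch of the same kind of listing in Python, for cases where
neither 'ls' nor 'dir' gives the wanted order. The function name
list_oldest_first and its 'field' parameter are made up for this example;
st_mtime and st_ctime are the standard os.stat() fields. Note that st_ctime
means "last metadata change" on Unix but "creation time" on Windows, and
that wget, as far as I know, sets the modification time from the server's
timestamp by default, so st_mtime may not reflect the download order.

```python
from pathlib import Path


def list_oldest_first(directory, field="st_mtime"):
    """Return the file names in `directory`, oldest first.

    `field` is the name of an os.stat() timestamp attribute,
    e.g. "st_mtime" or "st_ctime" (hypothetical parameter for this sketch).
    """
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    # Sort by the chosen timestamp; fall back to the name for equal timestamps.
    files.sort(key=lambda p: (getattr(p.stat(), field), p.name))
    return [p.name for p in files]
```

Calling list_oldest_first("Images", field="st_ctime") would then list the
download directory by the ctime-like timestamp discussed above.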

In short, wget likely is not your problem. Find out what it really is
and you can find a mitigation.

As a 'dumb' work-around, save your files into a temp directory, then
move them to Images\ in the order of occurrence in input.txt.
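A rough sketch of this work-around in Python, with one addition that is not
part of the suggestion above: each file gets a zero-padded counter prefix
while it is moved, so the order survives whatever sorting the file system or
file manager applies. It assumes every file was saved under the last path
segment of its URL (no renames, no ".1" suffixes); the function name
move_in_input_order and the directory arguments are made up for the example.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def move_in_input_order(url_file, temp_dir, final_dir):
    """Move downloaded files to final_dir in the order listed in url_file."""
    temp_dir, final_dir = Path(temp_dir), Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    lines = Path(url_file).read_text(encoding="utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    width = max(3, len(str(len(urls))))  # zero-pad so names sort correctly
    moved = []
    for number, url in enumerate(urls, start=1):
        # Assumption: the local file name is the last path segment of the URL.
        name = PurePosixPath(urlsplit(url).path).name
        source = temp_dir / name
        if not name or not source.is_file():
            continue  # download failed or was saved under another name
        target = final_dir / f"{number:0{width}d}_{name}"
        shutil.move(str(source), str(target))
        moved.append(target.name)
    return moved
```

The counter follows the position in input.txt, so a failed download leaves a
gap in the numbering rather than shifting the files after it.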

Regards, Tim





RE: Wget: Adding a prefix to downloaded files?

2019-12-12 Thread michel . kempeneers
Richard, 

I take it that is a "no, impossible"? :-) 

Rest assured: 
if the order weren't essential, and if there were an obvious workaround, I 
wouldn't have bothered asking. 
As far as I can tell, there is no logic in the source's sort-order. It's the 
way it is, and certainly not the result of some sorting algorithm. (if only it 
were!) 
And it's that very way I copied the URLs into my input file. 

I don't know sed, but the alternative you suggest would be to loop through the 
individual URLs and prefix every output file. 
I had considered that as well (it can be done with a basic batch file), but 
wanted to find out first whether a single Wget call via an input file is 
possible, as this sounds much more efficient and is presumably also quicker. 

Thx for your suggestions anyway. 

M. 



From: "Richard Thomas"  
To: "Michel Kempeneers"  
Sent: Thursday, 12 December 2019 18:15:04 
Subject: Re: Wget: Adding a prefix to downloaded files? 

I'd look at why it's important you maintain the sort-order. Options 
would be to not require that. Or you could pre-sort the input folder so 
the output folder can be sorted with the same algorithm. Another option 
would be to generate the wget commands with your prefixes using 
something like sed. 



Re: Wget: Adding a prefix to downloaded files?

2019-12-12 Thread Richard Thomas
I'd look at why it's important you maintain the sort-order. Options 
would be to not require that. Or you could pre-sort the input folder so 
the output folder can be sorted with the same algorithm. Another option 
would be to generate the wget commands with your prefixes using 
something like sed.
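A minimal sketch of that last option, written in Python rather than sed. It
builds one wget call per URL and uses wget's -O option to give each output
file a numbered prefix; -nv and -a are taken from the original command. The
function name prefixed_wget_commands, the "Images" directory and the
"NNN_name" naming scheme are only assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def prefixed_wget_commands(urls, out_dir="Images", log="log.txt"):
    """Build one wget argument list per URL, numbering the output files."""
    urls = [u.strip() for u in urls if u.strip()]
    width = max(3, len(str(len(urls))))  # zero-pad so names sort correctly
    commands = []
    for number, url in enumerate(urls, start=1):
        # Assumption: the last path segment of the URL is a usable file name.
        name = PurePosixPath(urlsplit(url).path).name or "index.html"
        # out_dir must already exist; wget -O does not create directories.
        target = f"{out_dir}/{number:0{width}d}_{name}"
        commands.append(["wget", "-nv", "-a", log, "-O", target, url])
    return commands
```

Each returned list can be passed to subprocess.run(), or joined and written
to a batch file. The price is one wget process per URL instead of a single
call with -i.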


On 12/12/2019 6:25 AM, michel.kempene...@telenet.be wrote:

Hi,

I run into a particular problem when I'm trying to download a bunch of 
URLs I grouped together in file "input.txt" like this:


wget -nv -a log.txt -P .\Images\ -i input.txt

Some of these files are huge, hence take a long time to download.
As a consequence, they will not appear in the same sorting order in 
the download folder as in the input folder, and that's a problem, as 
this order has its importance.


I tried working around this problem by sorting the downloaded files 
using the "Date Created" info in Windows Explorer (that's a field 
which is not displayed by default, but it can be added to the folder 
pane as an extra column)
But for reasons I don't understand, this order is also different --- 
maybe because Windows only considers a file as created when it is 
complete??

Hence, that's no solution either.

So I wonder:
is there a way to add a prefix (or even a counter) to a file when it 
is downloaded using Wget?
This would be a piece of cake if it could be done in the input file 
"input.txt", but obviously, these are external URLs, hence they cannot 
be touched.
Other tools like DownThemAll (aka "dTa"), the Firefox or Chrome 
extension which I use when a GUI can be used (as opposed to using a 
command line tool in a batch file), offer a prefix as a variable for 
the mask which can be applied to the names of the target files. This 
prefix is certainly not perfect (lacking, among other things, some 
flexibility), but at least it allows imposing the desired order.

So, does Wget have a similar construct?
Or is there another solution which would preserve the source's file order?


M.