Re: Working with index.html

2022-05-23 Thread Great Zverre
Hi!

Thanks for your response!
First of all I have the following version of wget:
# wget --version
GNU Wget 1.20.3 built on linux-gnu.

To reproduce the issue, could you please run the following commands (this will 
take a couple of minutes):
1. mkdir test
2. cd test
3. mkdir -p releases.hashicorp.com/consul/1.12.0
4. wget -w 10s -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/

I get the following output:
**
--2022-05-23 11:03:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 
151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com 
(releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’

releases.hashicorp.com/consul/index.html    [ <=>    ]  19.51K  --.-KB/s    in 0s

Last-modified header missing -- time-stamps turned off.
2022-05-23 11:03:18 (66.0 MB/s) - ‘releases.hashicorp.com/consul/index.html’ 
saved [19979]

Loading robots.txt; please ignore errors.
--2022-05-23 11:03:28--  https://releases.hashicorp.com/robots.txt
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 23 [text/plain]
Saving to: ‘releases.hashicorp.com/robots.txt’

releases.hashicorp.com/robots.txt   100%[=>]  23  --.-KB/s    in 0s

2022-05-23 11:03:28 (1.53 MB/s) - ‘releases.hashicorp.com/robots.txt’ saved 
[23/23]

--2022-05-23 11:03:38--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory

Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
^C
***
What is your output? Thank you!
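
The "Is a directory" failure in the log above can also be shown without wget or network access. The following is only a minimal offline sketch, assuming a POSIX shell; the temporary directory and the variable names are made up for illustration and are not part of the original reproduction. It pre-creates the path as a directory (as in step 3) and then tries to open the same path for writing as a regular file.

```shell
# Offline sketch of the collision; no wget or network involved.
# The temporary directory and variable names are illustrative assumptions.
tmp=$(mktemp -d)
target="$tmp/releases.hashicorp.com/consul/1.12.0"

# As in step 3 of the reproduction: the path already exists as a directory.
mkdir -p "$target"

# Opening that same path for writing as a regular file fails.
# The subshell keeps the shell's own error message out of the output.
if ( : > "$target" ) 2>/dev/null; then
    result="written"
else
    result="cannot write: path is a directory"
fi
echo "$result"

# Clean up the temporary tree.
rm -rf "$tmp"
```

A path cannot be a regular file and a directory at the same time, which is why saving the page under the name 1.12.0 fails once releases.hashicorp.com/consul/1.12.0/ exists locally.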

> On 21 May 2022, at 12:10, Tim Rühsen  wrote:
> 
> Hi,
> 
> I can not reproduce this issue with wget 1.21.3 nor with current wget2.
> 
> Please make sure you use the latest version of wget.
> 
> Regards, Tim
> 
> On 16.05.22 18:39, Great Zverre wrote:
>> Hello guys!
>> I’m using wget to make a mirror of https://releases.hashicorp.com but I 
>> don’t want to make a full mirror, rather I’d like to have a mirror of 
>> certain “subfolders” of this site (e.g. terraform, consul etc.). So I do 
>> this using the following command:
>> wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
>> The problem is that at first I get the following result
>> **
>> $ wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
>> --2022-05-16 16:28:18--  https://releases.hashicorp.com/consul/
>> Resolving releases.hashicorp.com (releases.hashicorp.com)... 
>> 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
>> Connecting to releases.hashicorp.com 
>> (releases.hashicorp.com)|151.101.193.183|:443... connected.
>> HTTP request sent, awaiting response...
>>   HTTP/1.1 200 OK
>>   Connection: keep-alive
>>   Content-Type: text/html
>>   ETag: TvHhjlva/+c=
>>   X-Api-Version: 0.1.2
>>   X-Request-Id: 8a74122b-c155-88ff-511e-8d0d93155b2e
>>   X-Amz-Cf-Pop: AMS50-C1
>>   X-Amz-Cf-Id: Pdzhym0uq3XXjsZ_PxS8xvkntM0IsSCQtakE2EvgwC0v0tYMPJwCzQ==
>>   Age: 61398
>>   Access-Control-Allow-Origin: *
>>   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>>   X-XSS-Protection: 1; mode=block
>>   X-Content-Type-Options: nosniff
>>   X-Frame-Options: sameorigin
>>   Accept-Ranges: bytes
>>   Date: Mon, 16 May 2022 16:28:18 GMT
>>   Vary: Origin, Accept-Encoding
>>   transfer-encoding: chunked
>> Length: unspecified [text/html]
>> Saving to: ‘releases.hashicorp.com/consul/index.html’
>> releases.hashicorp.com/consul/index.html    [ <=>    ]  19.51K  --.-KB/s    in 0s
>> Last-modified header missing -- time-stamps turned off.
>> 2022-05-16 16:28:18 (45.4 MB/s) - ‘releases.hashicorp.com/consul/index.html’ 
>> saved [19979]
>> **
>> We can see that whatever is there at https://releases.hashicorp.com/consul/ 
>> gets saved to local releases.hashicorp.com/consul/index.html which is fine, 
>> exactly what I want. But when it comes to the first href from the 
>> releases.hashicorp.com/consul/index.html I get the following:
>> **
>> --2022-05-16 16:30:21--  https://rel

Working with index.html

2022-05-16 Thread Great Zverre
Hello guys!

I’m using wget to make a mirror of https://releases.hashicorp.com but I don’t 
want to make a full mirror, rather I’d like to have a mirror of certain 
“subfolders” of this site (e.g. terraform, consul etc.). So I do this using the 
following command:

wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/

The problem is that at first I get the following result

**
$ wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
--2022-05-16 16:28:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 
151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com 
(releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Connection: keep-alive
  Content-Type: text/html
  ETag: TvHhjlva/+c=
  X-Api-Version: 0.1.2
  X-Request-Id: 8a74122b-c155-88ff-511e-8d0d93155b2e
  X-Amz-Cf-Pop: AMS50-C1
  X-Amz-Cf-Id: Pdzhym0uq3XXjsZ_PxS8xvkntM0IsSCQtakE2EvgwC0v0tYMPJwCzQ==
  Age: 61398
  Access-Control-Allow-Origin: *
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-Frame-Options: sameorigin
  Accept-Ranges: bytes
  Date: Mon, 16 May 2022 16:28:18 GMT
  Vary: Origin, Accept-Encoding
  transfer-encoding: chunked
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’

releases.hashicorp.com/consul/index.html    [ <=>    ]  19.51K  --.-KB/s    in 0s

Last-modified header missing -- time-stamps turned off.
2022-05-16 16:28:18 (45.4 MB/s) - ‘releases.hashicorp.com/consul/index.html’ 
saved [19979]
**

We can see that whatever is there at https://releases.hashicorp.com/consul/ 
gets saved to local releases.hashicorp.com/consul/index.html which is fine, 
exactly what I want. But when it comes to the first href from the 
releases.hashicorp.com/consul/index.html I get the following:
**
--2022-05-16 16:30:21--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Connection: keep-alive
  Content-Type: text/html
  X-Api-Version: 0.1.2
  X-Request-Id: ca8c47f5-2e54-b09a-adde-6e8cf5e92d45
  ETag: 8p+ndCqEoYc=
  X-Amz-Cf-Pop: AMS50-C1
  X-Amz-Cf-Id: qA5XZEv2hZReEYoZD29GRsD_M6u76VLv6g-usgKJAzTCQm_SyWVFRA==
  Age: 27384
  Access-Control-Allow-Origin: *
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-Frame-Options: sameorigin
  Accept-Ranges: bytes
  Date: Mon, 16 May 2022 16:30:21 GMT
  Vary: Origin, Accept-Encoding
  transfer-encoding: chunked
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory

Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
**
We can see that it tries to save whatever is there at 
https://releases.hashicorp.com/consul/1.12.0 into 
releases.hashicorp.com/consul/1.12.0, not 
releases.hashicorp.com/consul/1.12.0/index.html as I would prefer.
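
The naming seen in the two logs can be summarized in a small sketch. This is a simplified model written for illustration, not wget's actual code: the function name url_to_local_path is hypothetical, and query strings, ports, and options that change file naming are ignored. It only captures the rule that a URL ending in "/" is saved as index.html inside a directory, while any other URL is saved under its last path component.

```shell
# Simplified model of how the local file name is derived in the logs above.
# url_to_local_path is a hypothetical helper, not part of wget.
url_to_local_path() {
    # Drop the scheme ("https://") to keep host plus path.
    path=${1#*://}
    case $path in
        */) printf '%s\n' "${path}index.html" ;;  # directory-style URL
        *)  printf '%s\n' "$path" ;;              # named after the last component
    esac
}

url_to_local_path https://releases.hashicorp.com/consul/
url_to_local_path https://releases.hashicorp.com/consul/1.12.0
```

The first call prints releases.hashicorp.com/consul/index.html and the second prints releases.hashicorp.com/consul/1.12.0, matching the "Saving to" and "Cannot write to" paths in the logs. The only difference between the two cases is the trailing slash in the requested URL.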

The mind-blowing fact is that this worked well for me just a couple of weeks 
ago with the same invocation. It would produce index.html not only at the 
root but at the leaves as well. Something has definitely changed on the server, 
but how can I address the issue? As it works now, I have no way to maintain my 
mirror properly, because without these index.html files I simply can’t offer 
the mirror to my users.