URL:
  <http://savannah.gnu.org/bugs/?50320>

                 Summary: Bad link conversion with mixed HTTP/HTTPS content
plus --mirror --adjust-extension
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Wed 15 Feb 2017 06:08:54 PM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: Thomas Claveirole
        Originator Email: thomas.claveirole@green-communications.f
             Open/Closed: Open
         Discussion Lock: Any
                 Release: trunk
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

Hello,

I set up a local web server that serves the following page:

<!DOCTYPE html>
<html>
  <head>
    <title>Wget test</title>
  </head>
  <body>
    <script src="http://localhost/wget-test/script.js?foo=bar"></script>
    <script src="https://localhost/wget-test/script.js?foo=bar"></script>
  </body>
</html>

when /wget-test/ is requested over either HTTP or HTTPS, as well as a
/wget-test/script.js resource (regardless of the scheme and query string; the
content of this file is irrelevant).

Then,

wget --mirror --adjust-extension --convert-links http://localhost/wget-test/
rewrites the script links as follows:

<!DOCTYPE html>
<html>
  <head>
    <title>Wget test</title>
  </head>
  <body>
    <script src="script.js%3Ffoo=bar"></script>
    <script src="script.js%3Ffoo=bar.html"></script>
  </body>
</html>

Note that the second link has an incorrect .html suffix appended.  On the
filesystem, the downloaded file does not have this suffix, so the link is
broken.  I guess the correct behavior should be not to append the .html
suffix, but I am unsure whether two URLs that differ only in scheme (http://
vs. https://) should be considered the same resource and rewritten to point to
the same location.
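
To make the broken link easy to spot in a larger mirror, here is a minimal Python sketch that lists the relative script links in a converted page whose target file is missing on disk. It is not part of Wget; the names SrcCollector and broken_script_links are hypothetical, and it assumes a filesystem that allows "?" in file names (as on GNU/Linux) and that converted links are percent-encoded relative paths, as in the output above.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class SrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def broken_script_links(page_path):
    """Return relative script links in page_path whose target file is missing."""
    page = Path(page_path)
    collector = SrcCollector()
    collector.feed(page.read_text(encoding="utf-8"))
    broken = []
    for src in collector.sources:
        parts = urlsplit(src)
        if parts.scheme or parts.netloc:
            continue  # absolute URL: not a converted local link
        # Converted links are percent-encoded ("%3F" stands for "?"),
        # so decode them before looking the file up on disk.
        target = page.parent / unquote(parts.path)
        if not target.exists():
            broken.append(src)
    return broken
```

Run against the mirrored index.html from this report, such a check would be expected to flag only the link carrying the extra .html suffix, since that file does not exist on disk.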

(This test case was derived from trying to mirror a much bigger site and it
took me some time to pinpoint the issue.  The bug also arises when multiple
pages from the website link to the same resource using mixed http and https
schemes -- which is a more realistic scenario.)
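
When narrowing down which resources of a larger site are affected, it may help to find URLs that are referenced under both schemes. The following is a minimal Python sketch under that assumption; mixed_scheme_groups is a hypothetical helper, not a Wget feature, and it does not normalize default ports or percent-encoding.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def mixed_scheme_groups(urls):
    """Group URLs that are identical except for an http/https scheme."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https"):
            # Key on everything except the scheme (host compared case-insensitively).
            key = (parts.netloc.lower(), parts.path, parts.query)
            groups[key].add(url)
    # Keep only the groups that are actually referenced under both schemes.
    return [
        sorted(group)
        for group in groups.values()
        if len({urlsplit(u).scheme for u in group}) > 1
    ]
```

Feeding it the list of URLs extracted from the site's pages yields one group per resource that is linked with both http:// and https://, which are the candidates for the conversion problem described here.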

Looking at the bug tracker, I get the feeling that this bug might be related
to #50173 and #25340, but this is unclear to me.

Attached is a debug log for:
wget -o wget.log --debug --no-check-certificate --mirror --adjust-extension
--convert-links http://localhost/wget-test/
with my setup.

Regards,
Thomas Claveirole



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Wed 15 Feb 2017 06:08:54 PM UTC  Name: wget.log  Size: 9kB   By: None

<http://savannah.gnu.org/bugs/download.php?file_id=39762>

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50320>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/

