Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
> | 5.  Reference Resolution
> |
> |    This section defines the process of resolving a URI reference
> |    within a context that allows relative references so that the result
> |    is a string matching the syntax rule of Section 3.
>
> -- which doesn't really say *who* is supposed to be doing this, but I
> believe it's meant to be understood as 'whenever manipulating URIs'.

Whenever manipulating URIs *within a context that allows relative
references*.  That does not apply to the server as far as I can tell.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
I just tried the very simple Dillo graphical browser, and it doesn't
have this problem.

As I said before, this problem also occurs on the following pages:

https://m.phys.org
https://m.techxplore.com

I think this should be corrected in the new release of Lynx.

Alejandro Lieber
Rosario Argentina
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
Mouse wrote:

> 2396 does specifically say that
>
>    URI that are hierarchical in nature use the slash "/" character for
>    separating hierarchical components.  For some file systems, a "/"
>    character (used to denote the hierarchical structure of a URI) is the
>    delimiter used to construct a file name hierarchy, and thus the URI
>    path will look similar to a file pathname.  This does NOT imply that
>    the resource is a file or that the URI maps to an actual filesystem
>    pathname.
>
> So speaking of /./ as "a reference to the current directory" is, at
> least, misleading; path components in URIs/URLs do not need to bear any
> relationship to directory structure anywhere.  I also have not found
> any indication that . or .. components are special in absolute
> URIs/URLs; again, perhaps that's just because I haven't found the right
> reference.

It looks like RFC 3986 is the current state of the art, and specifically
https://tools.ietf.org/html/rfc3986#section-5.2.4 for this.  This is
part of section 5:

| 5.  Reference Resolution
|
|    This section defines the process of resolving a URI reference
|    within a context that allows relative references so that the result
|    is a string matching the syntax rule of Section 3.

-- which doesn't really say *who* is supposed to be doing this, but I
believe it's meant to be understood as 'whenever manipulating URIs'.
That is, both the client (Lynx) & the server (Apache) should be
modifying '/./' => '/'.  Both are at fault.

The RFC never mentions HTTPS and uses HTTP all over the place, but I
think this is simply because HTTP is being used as a standard example
scheme, and URIs are meant to be uniform across schemes.

> So I think lynx is at fault for not handling relative path resolution
> correctly.  Depending on what I've failed to find, the webserver may
> also be at fault - does anyone have any pointers to the RFC(s) I've
> missed?

Does this suffice?

I add another quote from 3986 (sec. 1.2.3):

|    It is often the case that a group or "tree" of documents has been
|    constructed to serve a common purpose, wherein the vast majority
|    of URI references in these documents point to resources within the
|    tree rather than outside it.  Similarly, documents located at a
|    particular site are much more likely to refer to other resources at
|    that site than to resources at remote sites.  Relative referencing
|    of URIs allows document trees to be partially independent of their
|    location and access scheme.  For instance, it is possible for a
|    single set of hypertext documents to be simultaneously accessible
|    and traversable via each of the "file", "http", and "ftp" schemes
|    if the documents refer to each other with relative references.
|    Furthermore, such document trees can be moved, as a whole, without
|    changing any of the relative references.

This seems to make it clear that (1) the designers of the whole concept
of 'URI schemes' are strongly thinking of them mapping to filesystems
and (2) that they really believe in the cross-scheme concordance of
URIs.  So this applies to HTTPS whether or not HTTPS is mentioned or
even existed at the time of 3986 publication.

|    A relative reference (Section 4.2) refers to a resource by
|    describing the difference within a hierarchical name space between
|    the reference context and the target URI.  The reference resolution
|    algorithm, presented in Section 5, defines how such a reference is
|    transformed to the target URI.

This bit *could* be taken as an oblique suggestion that only the client
(Lynx), which is composing the relative reference onto the base URI of
the source document, is responsible.  I don't believe it's meant that
way.

|    All URI references are parsed by generic syntax parsers when used.

-- this seems like a clumsy way of saying 'thou shalt run the
canonicalization code whenever operating on a URI'; '/./' should never
be present in the final output.  The next sentence reiterates the use of
that assumption:

|    However, because hierarchical processing has no effect on an absolute
|    URI used in a reference unless it contains one or more dot-segments
|    (complete path segments of "." or "..", as described in Section 3.3),
|    URI scheme specifications can define opaque identifiers by
|    disallowing use of slash characters, question mark characters, and
|    the URIs "scheme:." and "scheme:..".

>Bela<
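For readers who want to see what the "Remove Dot Segments" routine of
RFC 3986 section 5.2.4 does to the path in question, here is a minimal
sketch in Python.  The function name remove_dot_segments is chosen for
illustration only; this is not code from Lynx or Apache, and it says
nothing about *who* is obliged to run it.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4 (illustrative, not Lynx's code)."""
    output = []  # completed path segments, each with its leading "/"
    while path:
        if path.startswith("../"):          # rule A
            path = path[3:]
        elif path.startswith("./"):         # rule A
            path = path[2:]
        elif path.startswith("/./"):        # rule B: "/./x" -> "/x"
            path = path[2:]
        elif path == "/.":                  # rule B
            path = "/"
        elif path.startswith("/../"):       # rule C: drop previous segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                 # rule C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):           # rule D
            path = ""
        else:                               # rule E: move one segment over
            start = 1 if path.startswith("/") else 0
            cut = path.find("/", start)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/./page2.html"))         # /page2.html
print(remove_dot_segments("/./baz/./bletch.html"))  # /baz/bletch.html
print(remove_dot_segments("/a/b/c/./../../g"))      # /a/g
```

The last input is the worked example given in the RFC itself; the first
is the path from this thread.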
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
Wearing my pedant hat...

>>> [... stuff about /./ in https: URLs...]
> However, their web server is grossly at fault.  '/./' in a URL is
> just a reference to the current directory;

Sometimes.  1738 2.1 says that

   In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

so I think this is wrong as stated; depending on the scheme, /./ may or
may not mean anything special.  1738 does not mention the https: scheme
at all; of the http: scheme it describes / only to say that

   Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.  The "/" character may be used within HTTP to designate a
   hierarchical structure.

which is notably equivocal.

So far I have failed to find any spec for the https: scheme; presumably
I've just missed something.

2396 does specifically say that

   URI that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.  For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname.  This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem
   pathname.

So speaking of /./ as "a reference to the current directory" is, at
least, misleading; path components in URIs/URLs do not need to bear any
relationship to directory structure anywhere.  I also have not found
any indication that . or .. components are special in absolute
URIs/URLs; again, perhaps that's just because I haven't found the right
reference.

However, the language cited upthread from 1808 also occurs in 2396, in
a slightly different form (5.2 item 6, which describes something very
much like UNIX-style path resolution; subitem c is especially close to
the upthread quote).  As I said, I haven't found anything about the
https: scheme, but it seems reasonable to assume that its spec
(wherever it's hiding) specifies that it's hierarchical.

So I think lynx is at fault for not handling relative path resolution
correctly.  Depending on what I've failed to find, the webserver may
also be at fault - does anyone have any pointers to the RFC(s) I've
missed?

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
Ian Collier wrote:

> The link is written (some syntax elided):
>
>     load more
>
> Most other browsers, including links, don't copy this dot into the URL
> when following the link, so they don't experience a problem.
>
> Is Lynx correct to copy the dot?  I think not.  According to RFC 1808
> in Section 4 where it describes an algorithm for resolving relative
> URLs, in Step 6:
>
>    a) All occurrences of "./", where "." is a complete path
>       segment, are removed.
>
> This carries through into RFC 3986 where section 5.2.4 describes a
> "Remove Dot Segments" algorithm.

Lynx appears to be slightly at fault here.

However, their web server is grossly at fault.  '/./' in a URL is just
a reference to the current directory; www.foo.bar/baz/bletch.html should
be understood identically to www.foo.bar/./baz/./bletch.html.  I don't
remember ever having heard of an HTTP server which gets this wrong,
before today.

(Server claims to be running some unspecified version of Apache.  I
don't believe any version of Apache is ever likely to have had this
problem.  There may be a default rewrite rule that is always present,
which they've somehow managed to delete?)

The third (probably inadvertent) culprit is the web page / page author
itself.  IF one is in the unique circumstance of using an incompetent
HTTP server which chokes on '.' references in a path, one should
definitely avoid constructing such paths.  The reference should read:

    load more

==

For practical purposes, regardless of whose fault this is: having
arrived at https://m.medicalxpress.com/./page2.html (and received a
404), you can fix the situation by hitting 'E' (edit current page URL);
arrow back far enough to erase the './', leaving only
https://m.medicalxpress.com/page2.html; hit Enter, and you're there.

>Bela<
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
The problem appears when you go to:

https://m.medicalxpress.com/

and then you click:

load more

The response is:

"The Page Cannot be Found

Sorry, the page you were looking for could not be found.  The page
might have been removed, had its name changed, or become temporarily
unavailable."

Alejandro Lieber
Rosario Argentina

On 14/8/19 04:52, russellb...@gmail.com wrote:
> 'In various versions of Lynx, I cannot open the following address:
> 'https://m.medicalxpress.com/page2.html'
>
> Doesn't happen to me.  I've never seen lynx change a target
> address.  Addresses redirect often.  What does lynx log?
>
> russell bell
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
> In various versions of Lynx, I cannot open the following address:
> 'https://m.medicalxpress.com/page2.html'

As others have said, Lynx can open this page fine.  However, I think
the problem is the following:

If you browse https://m.medicalxpress.com/ in Lynx and find the
"load more" link, when you select this, Lynx tries to load the URL
https://m.medicalxpress.com/./page2.html and the server says that
this is not found.  The server does not expect the dot to be present
in the URL.

The link is written (some syntax elided):

    load more

Most other browsers, including links, don't copy this dot into the URL
when following the link, so they don't experience a problem.

Is Lynx correct to copy the dot?  I think not.  According to RFC 1808
in Section 4 where it describes an algorithm for resolving relative
URLs, in Step 6:

   a) All occurrences of "./", where "." is a complete path
      segment, are removed.

This carries through into RFC 3986 where section 5.2.4 describes a
"Remove Dot Segments" algorithm.

imc
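As a point of comparison, Python's standard urllib.parse.urljoin
resolves a relative reference against a base and drops the "." segment
along the way.  The sketch below is only an illustration of the expected
result: the href value "./page2.html" is an assumption inferred from the
URL Lynx ends up requesting, since the actual markup is elided above.

```python
from urllib.parse import urljoin

# Base document the "load more" link appears on (from the report above).
BASE = "https://m.medicalxpress.com/"

# ASSUMPTION: the link's href, inferred from the URL Lynx requested;
# the real markup is not shown in the thread.
HREF = "./page2.html"

resolved = urljoin(BASE, HREF)
print(resolved)  # https://m.medicalxpress.com/page2.html
```

This matches what the other browsers mentioned in the thread request,
with no "/./" left in the path.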
Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
Alejandro Lieber wrote:

> In various versions of Lynx, I cannot open the following address:
>
> https://m.medicalxpress.com/page2.html
>
> Lynx transforms this address adding ./ into:
>
> https://m.medicalxpress.com/./page2.html
>
> and the page cannot be found.
>
> The same happens in:
>
> https://m.phys.org/page2.html
>
> https://m.techxplore.com/page2.html
>
> This problem does not occur in Links and eLinks text browsers.

The address https://m.medicalxpress.com/page2.html opens fine for me in
lynx here.

A link on that page, titled "load more", appears to add the './',
leading to a 404.

So I'm not sure your problem is anything to do with anything lynx is
doing.
[Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
'In various versions of Lynx, I cannot open the following address:
'https://m.medicalxpress.com/page2.html'

Doesn't happen to me.  I've never seen lynx change a target
address.  Addresses redirect often.  What does lynx log?

russell bell
[Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
In various versions of Lynx, I cannot open the following address:

https://m.medicalxpress.com/page2.html

Lynx transforms this address by adding ./, giving:

https://m.medicalxpress.com/./page2.html

and the page cannot be found.

The same happens with:

https://m.phys.org/page2.html

https://m.techxplore.com/page2.html

This problem does not occur in the Links and eLinks text browsers.

Alejandro Lieber
Rosario Argentina