Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-17 Thread Mouse
> | 5. Reference Resolution
> |
> |This section defines the process of resolving a URI reference
> |within a context that allows relative references so that the result
> |is a string matching the <URI> syntax rule of Section 3.

> -- which doesn't really say *who* is supposed to be doing this, but I
> believe it's meant to be understood as 'whenever manipulating URIs'.

Whenever manipulating URIs *within a context that allows relative
references*.  That does not apply to the server as far as I can tell.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev


Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-16 Thread Alejandro Lieber
I just tried the very simple Dillo graphical browser, and it doesn't have
this problem.


As I said before, this problem also occurs in the following pages:

https://m.phys.org

https://m.techxplore.com

I think this should be fixed in the next release of Lynx.

Alejandro Lieber

Rosario   Argentina





Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Bela Lubkin
Mouse wrote:

> 2396 does specifically say that
>
>URI that are hierarchical in nature use the slash "/" character for
>separating hierarchical components.  For some file systems, a "/"
>character (used to denote the hierarchical structure of a URI) is the
>delimiter used to construct a file name hierarchy, and thus the URI
>path will look similar to a file pathname.  This does NOT imply that
>the resource is a file or that the URI maps to an actual filesystem
>pathname.
>
> So speaking of /./ as "a reference to the current directory" is, at
> least, misleading; path components in URIs/URLs do not need to bear any
> relationship to directory structure anywhere.  I also have not found
> any indication that . or .. components are special in absolute
> URIs/URLs; again, perhaps that's just because I haven't found the right
> reference.

It looks like RFC3986 is the current state of the art, and specifically
https://tools.ietf.org/html/rfc3986#section-5.2.4 for this.  This is
part of section 5:

| 5. Reference Resolution
|
|This section defines the process of resolving a URI reference
|within a context that allows relative references so that the result
|is a string matching the <URI> syntax rule of Section 3.

-- which doesn't really say *who* is supposed to be doing this, but I
believe it's meant to be understood as 'whenever manipulating URIs'.
That is, both the client (Lynx) & the server (Apache) should be
modifying '/./' => '/'.  Both are at fault.

The RFC never mentions HTTPS and uses HTTP all over the place, but I
think this is simply because HTTP is being used as a standard example
scheme, and URIs are meant to be uniform across schemes.

> So I think lynx is at fault for not handling relative path resolution
> correctly.  Depending on what I've failed to find, the webserver may
> also be at fault - does anyone have any pointers to the RFC(s) I've
> missed?

Does this suffice?

I add another quote from 3986 (sec. 1.2.3):

|It is often the case that a group or "tree" of documents has been
|constructed to serve a common purpose, wherein the vast majority
|of URI references in these documents point to resources within the
|tree rather than outside it.  Similarly, documents located at a
|particular site are much more likely to refer to other resources at
|that site than to resources at remote sites.  Relative referencing
|of URIs allows document trees to be partially independent of their
|location and access scheme.  For instance, it is possible for a
|single set of hypertext documents to be simultaneously accessible
|and traversable via each of the "file", "http", and "ftp" schemes
|if the documents refer to each other with relative references.
|Furthermore, such document trees can be moved, as a whole, without
|changing any of the relative references.

This seems to make it clear that (1) the designers of the whole concept
of 'URI schemes' were strongly thinking of them as mapping to filesystems
and (2) they really believe in the cross-scheme concordance of URIs.
So this applies to HTTPS whether or not HTTPS is mentioned or even
existed at the time of 3986 publication.

|A relative reference (Section 4.2) refers to a resource by
|describing the difference within a hierarchical name space between
|the reference context and the target URI.  The reference resolution
|algorithm, presented in Section 5, defines how such a reference is
|transformed to the target URI.
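To make that "transformation" concrete, here is a minimal Python sketch of the path-merging step that RFC 3986 section 5.2.3 ("Merge Paths") performs before dot segments are removed. The href `./page2.html` is an assumption, since the real link markup is elided in the thread, though it is consistent with the URL Lynx requested. The sketch shows that merging alone yields `/./page2.html`; in the RFC that result is then passed through remove_dot_segments (section 5.2.4). Whether Lynx actually skips that second step is not established here.

```python
def merge_paths(base_path: str, ref_path: str, base_has_authority: bool) -> str:
    """Sketch of RFC 3986 section 5.2.3 "Merge Paths" (illustration only)."""
    # Base URI has an authority (host) but an empty path: join with a single "/".
    if base_has_authority and base_path == "":
        return "/" + ref_path
    # Otherwise keep the base path up to and including its last "/"
    # (or nothing at all if it has no "/"), then append the reference path.
    return base_path[: base_path.rfind("/") + 1] + ref_path


if __name__ == "__main__":
    # Base https://m.medicalxpress.com/ has path "/"; ASSUMED href "./page2.html".
    print(merge_paths("/", "./page2.html", True))   # /./page2.html
    print(merge_paths("/b/c/d;p", "g", True))       # /b/c/g
```

The dot survives the merge by design; only the subsequent dot-segment removal turns `/./page2.html` into `/page2.html`.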

This bit *could* be taken as an oblique suggestion that only the client
(Lynx), who is composing the relative reference onto the base URI of the
source document, is responsible.  I don't believe it's meant that way.

|All URI references are parsed by generic syntax parsers when used.

-- this seems like a clumsy way of saying 'thou shalt run the
canonicalization code whenever operating on a URI'; '/./' should never
be present in the final output.  The next sentence reiterates the use of
that assumption:

|However, because hierarchical processing has no effect on an absolute
|URI used in a reference unless it contains one or more dot-segments
|(complete path segments of "." or "..", as described in Section 3.3),
|URI scheme specifications can define opaque identifiers by
|disallowing use of slash characters, question mark characters, and
|the URIs "scheme:." and "scheme:..".

>Bela<



Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Mouse
Wearing my pedant hat...

>>> [... stuff about /./ in https: URLs...]
> However, their web server is grossly at fault.  '/./' in a URL is
> just a reference to the current directory;

Sometimes.

1738 2.1 says that

   In general, URLs are written as follows:

   <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

so I think this is wrong as stated; depending on the scheme, /./ may or
may not mean anything special.

1738 does not mention the https: scheme at all; of the http: scheme it
describes / only to say that

   Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.  The "/" character may be used within HTTP to designate a
   hierarchical structure.

which is notably equivocal.  So far I have failed to find any spec for
the https: scheme; presumably I've just missed something.

2396 does specifically say that

   URI that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.  For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname.  This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem
   pathname.

So speaking of /./ as "a reference to the current directory" is, at
least, misleading; path components in URIs/URLs do not need to bear any
relationship to directory structure anywhere.  I also have not found
any indication that . or .. components are special in absolute
URIs/URLs; again, perhaps that's just because I haven't found the right
reference.

However, the language cited upthread from 1808 also occurs in 2396, in
a slightly different form (5.2 item 6, which describes something very
much like UNIX-style path resolution; subitem c is especially close to
the upthread quote).  As I said, I haven't found anything about the
https: scheme, but, it seems reasonable to assume that its spec
(wherever it's hiding) specifies that it's hierarchical.
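The UNIX-like path resolution described in 2396 5.2 step 6 survives in RFC 3986 as the "Remove Dot Segments" algorithm of section 5.2.4. Below is a minimal Python sketch of that algorithm's input/output buffer loop (steps 2A to 2E), written for illustration and checked against the two examples RFC 3986 gives in that section; it is not taken from Lynx or any other browser, and the function name simply mirrors the RFC's remove_dot_segments.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4 "Remove Dot Segments"."""
    inp = path
    out = []  # output buffer, stored as segments that include their leading "/"
    while inp:
        # 2A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # 2B: replace a leading "/./", or a complete "/.", with "/"
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        # 2C: replace a leading "/../", or a complete "/..", with "/" and
        #     remove the last segment (with its preceding "/") from the output
        elif inp.startswith("/../"):
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        # 2D: a bare "." or ".." is simply discarded
        elif inp in (".", ".."):
            inp = ""
        # 2E: move the first segment, including any leading "/", to the output
        else:
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


if __name__ == "__main__":
    print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g   (RFC example)
    print(remove_dot_segments("mid/content=5/../6"))  # mid/6  (RFC example)
    print(remove_dot_segments("/./page2.html"))       # /page2.html
```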

So I think lynx is at fault for not handling relative path resolution
correctly.  Depending on what I've failed to find, the webserver may
also be at fault - does anyone have any pointers to the RFC(s) I've
missed?




Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Bela Lubkin
Ian Collier wrote:

> The link is written (some syntax elided):
>
>  load more
>
> Most other browsers, including links, don't copy this dot into the URL
> when following the link, so they don't experience a problem.
>
> Is Lynx correct to copy the dot?  I think not.  According to RFC 1808
> in Section 4 where it describes an algorithm for resolving relative
> URLs, in Step 6:
>
>a) All occurrences of "./", where "." is a complete path
>   segment, are removed.
>
> This carries through into RFC 3986 where section 5.2.4 describes a
> "Remove Dot Segments" algorithm.

Lynx appears to be slightly at fault here.

However, their web server is grossly at fault.  '/./' in a URL is just a
reference to the current directory; www.foo.bar/baz/bletch.html should
be understood identically to www.foo.bar/./baz/./bletch.html.  I don't
remember ever having heard of an HTTP server which gets this wrong,
before today.  (Server claims to be running some unspecified version of
Apache.  I don't believe any version of Apache is ever likely to have
had this problem.  There may be a default rewrite rule that is always
present, and they've somehow managed to delete?)

The third (probably inadvertent) culprit is the web page / page author
itself.  IF one is in the unique circumstance of using an incompetent
HTTP server which chokes on '.' references in a path, one should
definitely avoid constructing such paths.  The reference should read:

  load more

==

For practical purposes, regardless of whose fault this is: having
arrived at https://m.medicalxpress.com/./page2.html (and received
a 404), you can fix the situation by hitting 'E' (edit current
page URL); arrow back far enough to erase a './', leaving only
https://m.medicalxpress.com/page2.html; hit Enter, and you're there.

>Bela<



Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Alejandro Lieber

The problem appears when you go to:

https://m.medicalxpress.com/

and then you click:

load more

The response is:
"The Page Cannot be Found

Sorry, the page you were looking for could not be found.

   The page might have been removed, had its name changed, or become 
temporarily unavailable."


Alejandro Lieber

Rosario   Argentina


On 14/8/19 04:52, russellb...@gmail.com wrote:

'In various versions of Lynx, I cannot open the following address:
'https://m.medicalxpress.com/page2.html'

Doesn't happen to me.  I've never seen lynx change a target
address.  Addresses redirect often.  What does lynx log?

russell bell



Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Ian Collier
> In various versions of Lynx, I cannot open the following address:
> 'https://m.medicalxpress.com/page2.html'

As others have said, Lynx can open this page fine.  However, I think
the problem is the following:

If you browse https://m.medicalxpress.com/ in Lynx and find the
"load more" link, when you select this, Lynx tries to load the URL
https://m.medicalxpress.com/./page2.html
and the server says that this is not found.  The server does not
expect the dot to be present in the URL.

The link is written (some syntax elided):

 load more

Most other browsers, including links, don't copy this dot into the URL
when following the link, so they don't experience a problem.
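What "not copying the dot" amounts to can be shown with Python's standard urllib.parse.urljoin, whose behavior was updated in Python 3.5 to follow RFC 3986, including dot-segment removal for relative references. The href ./page2.html is an assumption, since the link markup is elided above, though it matches the URL Lynx ended up requesting. This sketch says nothing about how Lynx or Links implement resolution internally.

```python
from urllib.parse import urljoin

# Base URL of the page carrying the "load more" link (from the thread).
BASE = "https://m.medicalxpress.com/"

# ASSUMPTION: the link's href is "./page2.html"; the real markup was elided.
HREF = "./page2.html"

# RFC 3986-style resolution: the "." segment is dropped while resolving.
resolved = urljoin(BASE, HREF)

# Plain concatenation skips dot-segment removal and reproduces the URL
# that the server answered with a 404.
naive = BASE + HREF

if __name__ == "__main__":
    print(resolved)  # https://m.medicalxpress.com/page2.html
    print(naive)     # https://m.medicalxpress.com/./page2.html
```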

Is Lynx correct to copy the dot?  I think not.  According to RFC 1808
in Section 4 where it describes an algorithm for resolving relative URLs,
in Step 6:

   a) All occurrences of "./", where "." is a complete path
  segment, are removed.

This carries through into RFC 3986 where section 5.2.4 describes a
"Remove Dot Segments" algorithm.

imc



Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Larry Hynes via Lynx-dev
Alejandro Lieber  wrote:
> In various versions of Lynx, I cannot open the following address:
> 
> https://m.medicalxpress.com/page2.html
> 
> Lynx transforms this address by adding ./, turning it into:
> 
> https://m.medicalxpress.com/./page2.html
> 
> and the page cannot be found.
> 
> The same happens in:
> 
> https://m.phys.org/page2.html
> 
> https://m.techxplore.com/page2.html
> 
> This problem does not occur in Links and eLinks text browsers.

The address

https://m.medicalxpress.com/page2.html

opens fine for me in lynx here. A link on that page, titled "load
more", appears to add the './', leading to a 404. So I'm not sure
your problem is anything to do with anything lynx is doing.



[Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread russellbell

'In various versions of Lynx, I cannot open the following address:
'https://m.medicalxpress.com/page2.html'

Doesn't happen to me.  I've never seen lynx change a target
address.  Addresses redirect often.  What does lynx log?

russell bell



[Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html

2019-08-14 Thread Alejandro Lieber

In various versions of Lynx, I cannot open the following address:

https://m.medicalxpress.com/page2.html

Lynx transforms this address by adding ./, turning it into:

https://m.medicalxpress.com/./page2.html

and the page cannot be found.

The same happens in:

https://m.phys.org/page2.html

https://m.techxplore.com/page2.html

This problem does not occur in Links and eLinks text browsers.

Alejandro Lieber
Rosario  Argentina

