Peter,

OK, thanks for the concrete example. So these are out there in the wild; I'll add a special case to handle them.

Thanks,

Sven

> On 05 Feb 2015, at 20:03, PBKResearch <[email protected]> wrote:
>
> Sven
>
> I agree the '//' case is weird, I would never use it myself. However, my
> requirement is to be able to parse and dissect web pages, particularly
> Wikipedia and Wiktionary pages, and they use this construction all the time.
> Mostly it occurs in link tags in page headers. I think the reason is that
> the individual pages are in, for example, en.wiktionary.org, but shared
> resources are in bits.wikimedia.org; hence the 'relative' address is in
> effect a complete path (in which case why not put 'http:' in front and make
> it an absolute address?).
>
> The problem arises in parsing with the Blanchard parser because it is
> designed as a validator, hence it follows up the links in the page header to
> make sure the resources exist. This is of no interest to me, I just want to
> get at the body of the page, but I carried on using it because it is very
> good at parsing the body. I had considered mutilating the parser by cutting
> out all the processing it does on link nodes. However, before that happened
> Monty pointed me to XMLHTMLParser and then Soup; these are just parsers, not
> validators, so as far as they are concerned the link addresses are just
> text.
>
> As I said, I am pretty sure I shall abandon the Blanchard parser and use one
> of the two that Monty identified - probably Soup. Hence I can ignore this
> problem from now on. Whether you think '//' worth including is for you to
> decide. The only argument I can see is that it was handled by the now
> deprecated Url class, so in theory it could be used by someone still using
> Pharo 2 or earlier, who would find problems on updating to Pharo 3 or later.
>
> Hope this helps
>
> Peter Kenny
>
> -----Original Message-----
> From: Sven Van Caekenberghe [mailto:[email protected]]
> Sent: 05 February 2015 17:28
> To: PBKResearch
> Cc: monty; Pharo Development List
> Subject: Re: ZnUrl>>#withRelativeReference:
>
> Peter,
>
> Thanks for the feedback. (CC-ing the list)
>
>> On 05 Feb 2015, at 18:16, PBKResearch <[email protected]> wrote:
>>
>> Sven
>>
>> Thanks for your efforts. I have tried ZnUrl>>#withRelativeReference:
>> on the examples I gave in my e-mail of 11 Jan. Unfortunately it gives
>> the same incorrect result as ZnUrl>>#inContextOf: in the case where
>> the relative address begins with '//'. Admittedly this is a rather
>> weird case, but RFC 3986 does acknowledge its existence (see para 4.2)
>> and it is dealt with correctly by the old
>> Url class>>#combine:withRelative: (in fact there is special coding for
>> this case in HierarchicalUrl>>#privateInitializeFromText:). (I am not
>> sure whether the pseudo-code in RFC 3986 sec 5 deals correctly with an
>> initial '//'; it is not considered explicitly, but I could not follow
>> all the ramifications of the case with initial '/'.)
>
> I would like to understand why you need it, it seems very weird to me, it
> was one of the few cases that I decided not to implement:
>
> In ZnUrlTests>>#testReferenceResolution
>
>   " '//g' -> 'http://g'. " "we do not support relative network path references (4.2)"
>
> In the RFC they say (page 26)
>
> <<
>   A relative reference that begins with two slash characters is termed
>   a network-path reference; such references are rarely used.
> >>
>
> Could you please give a concrete example of how/why this is useful ?
>
> Thx,
>
> Sven
>
>> I feel rather guilty that you have gone to so much trouble because,
>> thanks to Monty, I now have two alternatives to the Blanchard parser
>> (XMLHTMLParser and Soup). I shall pretty certainly be using one or
>> other of these in future, in place of the Blanchard parser, because
>> they provide more flexible ways of interrogating the resulting DOM -
>> and also because they are actively maintained. So from my point of
>> view there is now no need for you to pursue this any further - unless
>> you see this as a loose end to be tidied up.
>
> No problem, I am trying to make it right.
>
>> Thanks again
>>
>> Peter Kenny
>>
>> PS I can't post to the Pharo Development List, so I left that out of
>> the addressee list.
>>
>> -----Original Message-----
>> From: Sven Van Caekenberghe [mailto:[email protected]]
>> Sent: 05 February 2015 10:30
>> To: Pharo Development List
>> Cc: monty; [email protected]
>> Subject: ZnUrl>>#withRelativeReference:
>>
>> Hi,
>>
>> I added ZnUrl>>#withRelativeReference: which implements the process
>> described in section 5 of RFC 3986.
>>
>> https://pharo.fogbugz.com/f/cases/14855/Add-reference-resolution-to-ZnUrl
>>
>> Summary:
>>
>> In certain contexts (like links on a webpage) partial URLs are used
>> that must be interpreted relative to a base URL (like the URL of the
>> webpage itself).
>>
>> Example:
>>
>>   'http://www.site.com/static/html/home.html' asZnUrl
>>     withRelativeReference: '../js/menu.js'
>>
>>   => http://www.site.com/static/js/menu.js
>>
>> This was previously not possible with ZnUrl.
>>
>> If you know this stuff, please have a look.
>>
>> Monty ? Peter ?
>>
>> Sven
>>
>> PS: this is in #bleedingEdge for now
>>
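
For comparison, the two resolution cases discussed in this thread can be checked against an independent RFC 3986 implementation. The sketch below uses Python's standard `urllib.parse.urljoin`, not ZnUrl, so it says nothing about how ZnUrl handles these references internally. The helper name `resolve` is made up for illustration, the Wiktionary page and resource paths are hypothetical (only the two host names come from the thread), and `http://a/b/c/d;p?q` is the base URI that RFC 3986 section 5.4 uses for its examples.

```python
from urllib.parse import urljoin


def resolve(base: str, reference: str) -> str:
    """Resolve `reference` against `base` using the standard library.

    `resolve` is a hypothetical wrapper for this sketch; all the work
    is done by urllib.parse.urljoin.
    """
    return urljoin(base, reference)


# Ordinary relative-path reference (the example from the announcement).
print(resolve("http://www.site.com/static/html/home.html", "../js/menu.js"))
# http://www.site.com/static/js/menu.js

# Network-path reference (RFC 3986 section 4.2): only the scheme is
# inherited from the base; authority and path come from the reference.
print(resolve("http://a/b/c/d;p?q", "//g"))
# http://g

# Same case with the hosts mentioned in the thread; both paths are
# invented for illustration.
print(resolve("http://en.wiktionary.org/wiki/example",
              "//bits.wikimedia.org/example.css"))
# http://bits.wikimedia.org/example.css
```

In the network-path case the reference replaces everything after the scheme, which is why a page served from one host can point at shared resources on another without naming the scheme.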
