Peter,

OK, thanks for the concrete example. So these are out there in the wild; I'll
add a special case to handle them.
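RFC 3986 section 5.2.2 resolves a reference that starts with '//' (a network-path reference) by taking only the scheme from the base URL. The reference's own authority, path, query and fragment are kept. This may be why Wikimedia pages use the form: the same markup works whether the page is served over http or https. This is a guess and is not stated in the thread. The sketch below is in Python for brevity and is not ZnUrl code. `resolve_network_path` is a hypothetical helper, and the example URLs are illustrative.

```python
from urllib.parse import urlsplit


def resolve_network_path(base: str, ref: str) -> str:
    """Resolve a network-path reference such as '//host/path' against base.

    Following RFC 3986 section 5.2.2, a reference that carries its own
    authority keeps that authority, path, query and fragment; only the
    scheme is inherited from the base URL. This sketch skips the
    dot-segment removal that the RFC also applies to the path.
    """
    if not ref.startswith("//"):
        raise ValueError("not a network-path reference: " + ref)
    scheme = urlsplit(base).scheme  # e.g. 'http' or 'https'
    return scheme + ":" + ref
```

For example, resolving '//bits.wikimedia.org/static/style.css' against 'https://en.wiktionary.org/wiki/example' gives 'https://bits.wikimedia.org/static/style.css'. Against an http base it gives the http form.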

Thanks,

Sven

> On 05 Feb 2015, at 20:03, PBKResearch <[email protected]> wrote:
> 
> Sven
> 
> I agree the '//' case is weird, I would never use it myself. However, my
> requirement is to be able to parse and dissect web pages, particularly
> Wikipedia and Wiktionary pages, and they use this construction all the time.
> Mostly it occurs in link tags in page headers. I think the reason is that
> the individual pages are in, for example, en.wiktionary.org, but shared
> resources are in bits.wikimedia.org; hence the 'relative' address is in
> effect a complete path (in which case why not put 'http:' in front and make
> it an absolute address?).
> 
> The problem arises in parsing with the Blanchard parser because it is
> designed as a validator, hence it follows up the links in the page header to
> make sure the resources exist. This is of no interest to me, I just want to
> get at the body of the page, but I carried on using it because it is very
> good at parsing the body. I had considered mutilating the parser by cutting
> out all the processing it does on link nodes. However, before that happened
> Monty pointed me to XMLHTMLParser and then Soup; these are just parsers, not
> validators, so as far as they are concerned the link addresses are just
> text.
> 
> As I said, I am pretty sure I shall abandon the Blanchard parser and use one
> of the two that Monty identified - probably Soup. Hence I can ignore this
> problem from now on. Whether you think '//' is worth including is for you to
> decide. The only argument I can see is that it was handled by the now
> deprecated Url class, so in theory it could be used by someone still using
> Pharo 2 or earlier, who would find problems on updating to Pharo 3 or later.
> 
> Hope this helps
> 
> Peter Kenny
> 
> -----Original Message-----
> From: Sven Van Caekenberghe [mailto:[email protected]] 
> Sent: 05 February 2015 17:28
> To: PBKResearch
> Cc: monty; Pharo Development List
> Subject: Re: ZnUrl>>#withRelativeReference:
> 
> Peter,
> 
> Thanks for the feedback. (CC-ing the list)
> 
>> On 05 Feb 2015, at 18:16, PBKResearch <[email protected]> wrote:
>> 
>> Sven
>> 
>> Thanks for your efforts. I have tried ZnUrl>>#withRelativeReference:
>> on the examples I gave in my e-mail of 11 Jan. Unfortunately it gives
>> the same incorrect result as ZnUrl>>#inContextOf: in the case where
>> the relative address begins with '//'. Admittedly this is a rather
>> weird case, but RFC 3986 does acknowledge its existence (see para 4.2)
>> and it is dealt with correctly by the old Url class>>#combine:withRelative:
>> (in fact there is special coding for this case in
>> HierarchicalUrl>>#privateInitializeFromText:). (I am not sure whether
>> the pseudo-code in RFC 3986 sec 5 deals correctly with an initial '//';
>> it is not considered explicitly, but I could not follow all the
>> ramifications of the case with initial '/'.)
> 
> I would like to understand why you need it. It seems very weird to me; it
> was one of the few cases that I decided not to implement:
> 
> In ZnUrlTests>>#testReferenceResolution
> 
>  " '//g' -> 'http://g'. " "we do not support relative network path references (4.2)"
> 
> In the RFC they say (page 26)
> 
> << 
>   A relative reference that begins with two slash characters is termed
>   a network-path reference; such references are rarely used.
> >>
> 
> Could you please give a concrete example of how/why this is useful?
> 
> Thx,
> 
> Sven
> 
>> I feel rather guilty that you have gone to so much trouble because, 
>> thanks to Monty, I now have two alternatives to the Blanchard parser 
>> (XMLHTMLParser and Soup). I shall pretty certainly be using one or 
>> other of these in future, in place of the Blanchard parser, because 
>> they provide more flexible ways of interrogating the resulting DOM - 
>> and also because they are actively maintained. So from my point of 
>> view there is now no need for you to pursue this any further - unless you
>> see this as a loose end to be tidied up.
> 
> No problem, I am trying to get it right.
> 
>> Thanks again
>> 
>> Peter Kenny
>> 
>> PS I can't post to the Pharo Development List, so I left that out of 
>> the addressee list.
>> 
>> -----Original Message-----
>> From: Sven Van Caekenberghe [mailto:[email protected]]
>> Sent: 05 February 2015 10:30
>> To: Pharo Development List
>> Cc: monty; [email protected]
>> Subject: ZnUrl>>#withRelativeReference:
>> 
>> Hi,
>> 
>> I added ZnUrl>>#withRelativeReference: which implements the process 
>> described in section 5 of RFC 3986.
>> 
>> https://pharo.fogbugz.com/f/cases/14855/Add-reference-resolution-to-ZnUrl
>> 
>> Summary:
>> 
>> In certain contexts (like links on a webpage) partial URLs are used 
>> that must be interpreted relative to a base URL (like the URL of the 
>> webpage itself).
>> 
>> Example:
>> 
>> 'http://www.site.com/static/html/home.html' asZnUrl
>>     withRelativeReference: '../js/menu.js'
>> 
>> => http://www.site.com/static/js/menu.js
>> 
>> This was previously not possible with ZnUrl. 
>> 
>> If you know this stuff, please have a look. 
>> 
>> Monty? Peter?
>> 
>> Sven
>> 
>> PS: this is in #bleedingEdge for now
>> 
> 
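The announcement describes `ZnUrl>>#withRelativeReference:` as implementing the procedure in section 5 of RFC 3986. That procedure can be sketched in Python for comparison. This is an independent illustration, not ZnUrl's code. The function names are hypothetical, and the splitting regular expression is the one from RFC 3986 appendix B. The network-path ('//g') case under discussion is the `r_auth is not None` branch.

```python
import re

# Regular expression from RFC 3986 appendix B; undefined components are None.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(uri):
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    # RFC 3986 section 5.2.4; output holds segments with their leading "/".
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (and any leading "/") to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def merge(base_auth, base_path, ref_path):
    # RFC 3986 section 5.2.3: drop everything after the base's last "/".
    if base_auth is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def recompose(scheme, auth, path, query, frag):
    # RFC 3986 section 5.3.
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if frag is not None:
        result += "#" + frag
    return result


def resolve(base, ref):
    # RFC 3986 section 5.2.2 (strict parser).
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        t = (r_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_auth is not None:
        # Network-path reference: only the scheme comes from the base.
        t = (b_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_path == "":
        t = (b_scheme, b_auth, b_path, r_query if r_query is not None else b_query)
    elif r_path.startswith("/"):
        t = (b_scheme, b_auth, remove_dot_segments(r_path), r_query)
    else:
        merged = merge(b_auth, b_path, r_path)
        t = (b_scheme, b_auth, remove_dot_segments(merged), r_query)
    return recompose(*t, r_frag)
```

The RFC's example base is 'http://a/b/c/d;p?q'. Against it, resolve(base, '//g') gives 'http://g', which matches the expectation commented out in ZnUrlTests>>#testReferenceResolution. Likewise, resolve('http://www.site.com/static/html/home.html', '../js/menu.js') gives 'http://www.site.com/static/js/menu.js', as in the announcement.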

