Re: Enable strict IRI parsing in query parser?

Rob Vesse Wed, 17 Sep 2014 01:47:53 -0700

Closing the loop on this, I can confirm that the specific example is invalid:


RFC 7230 (HTTP 1.1) Section 2.7.1
(http://tools.ietf.org/html/rfc7230#section-2.7.1)

A sender MUST NOT generate an "http" URI with an empty host
   identifier.  A recipient that processes such a URI reference MUST
   reject it as invalid.


So actually the IRI validator is quite correct in rejecting the example
URI, because URIs of that form, while permitted by the generic syntax, are
not allowed by the specific scheme.
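
To make the distinction concrete, here is a minimal sketch using only
java.net.URI from the JDK, not the Jena IRI library. The class and method
names are made up for illustration, and the host check is only a rough
stand-in for the scheme rule quoted above, not how the IRI validator
implements it.

```java
import java.net.URI;

class SingleSlashHttpUri {

    /** "Absolute" in the generic syntax only means the URI has a scheme. */
    static boolean isAbsolute(URI uri) {
        return uri.getScheme() != null;
    }

    /**
     * Hypothetical scheme-specific check: an "http" URI with no authority
     * (and therefore no host) is treated as invalid.
     */
    static boolean isHttpWithoutHost(URI uri) {
        if (!"http".equalsIgnoreCase(uri.getScheme())) {
            return false;
        }
        String authority = uri.getRawAuthority();
        return authority == null || authority.isEmpty();
    }

    public static void main(String[] args) {
        URI odd = URI.create("http:/google.com");
        URI usual = URI.create("http://google.com/");

        // Generic syntax: both parse; the odd one has a scheme and a path only.
        System.out.println(odd + " absolute=" + isAbsolute(odd)
                + " path=" + odd.getPath());

        // Scheme-specific rule: only the single-slash form lacks a host.
        System.out.println(odd + " missing host=" + isHttpWithoutHost(odd));
        System.out.println(usual + " missing host=" + isHttpWithoutHost(usual));
    }
}
```

The generic parser accepts both forms, so any rejection has to come from a
separate per-scheme check like the second method.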

Rob

On 16/09/2014 18:02, "Andy Seaborne" <[email protected]> wrote:

>On 16/09/14 08:47, Rob Vesse wrote:
>> Yes, looks like email mangled it a bit but you got the correct gist: an
>> IRI with http:/ i.e. only a single slash followed by some further path
>> components
>>
>> If, as you say, this is a valid albeit unusual IRI, how come the IRI
>> validator rejects it?  Is it requiring that all IRIs be absolute?
>
>The IRI code has a bunch of things it detects.  The IRI factory is then
>set to decide what to treat as fatal errors and which to report as
>warnings but continue.
>
>The IRI validator prints out all errors and all warnings that the IRI
>code reports.  It's set more verbose than other code.
>
>------
>Where are you parsing these queries?  App code? Fuseki?
>It might make sense to have relative URIs, in some more circumstances,
>default to at least logged warnings, and maybe to errors.
>
>(actually: http:/foo is an absolute URI! All "absolute" means is that it
>has a scheme name.  As an http URI it is incomplete (I'm not sure if
>there is a technical term for a "complete" HTTP URI with authority and
>path).)
>
>
>       URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
>
>       hier-part   = "//" authority path-abempty
>                   / path-absolute
>                   / path-rootless
>                   / path-empty
>
>and the example is case 2 : path-absolute, not case 1.
>
>       Andy
>
>
>>
>> Rob
>>
>> On 15/09/2014 19:25, "Andy Seaborne" <[email protected]> wrote:
>>
>>> On 15/09/14 11:25, Rob Vesse wrote:
>>>> Found one way of doing this:
>>>>
>>>> query.setBaseURI(new IRIResolver());
>>>>
>>>> However you have to do this in the setup of the parser before the
>>>> query is parsed, which is not something your average user will have
>>>> access to, and setting it after parsing has happened has no effect.
>>>>
>>>> So how would an average user who is not customising the query parser
>>>> enable strict IRI parsing?
>>>>
>>>> Rob
>>>>
>>>> On 15/09/2014 10:31, "Rob Vesse" <[email protected]> wrote:
>>>>
>>>>> Is there an easy way to enable strict IRI parsing in the query
>>>>> parser?
>>>>>
>>>>> For example the following user query is accepted by ARQ:
>>>>>
>>>>> SELECT *
>>>>>      WHERE {
>>>>>        ?subject rdfs:subClassOf <http:/google.com <http://google.com/
>>>>> <http://google.com/>>> .
>>>>>      }
>>>
>>> (not sure if email has damaged that example)
>>>
>>>>>
>>>>> Note the incorrect URI. When put through the IRI validator at
>>>>> sparql.org, ARQ produces the following:
>>>>>
>>>>> http:/google.com ==> http:/google.com
>>>>> <http:/google.com> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A
>>>>> component that is required by the scheme is missing.
>>>>>
>>>>> Is there any way to get this behaviour from the query parser?
>>>>>
>>>>> Rob
>>>>>
>>>
>>> http:/path is a valid URI - it's a rather odd one but the host name is
>>> optional and when resolved will be the host name of the base.
>>>
>>> It does occur for real on the web - e.g. https:/login swaps the
>>> protocol to https if you were using http: and it works whatever
>>> hostname you got to that page from.
>
>
>
>>>
>>>     Andy
