Michael Sperberg-McQueen has defined types that match
different flavours of URI in

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd

and

http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd

To see the way these complex regular expressions are constructed, view
these documents at the raw XML level using (for example) curl.

Michael Kay
Saxonica


On 19/12/2012 11:13, Benito van der Zander wrote:
Hi,

By the way, does anyone have a regular expression that matches exactly the anyURI values allowed by XSD 1.0?

I tried to build one by translating the BNF in RFC 2396 and RFC 2732 into a regex: I wrote a regex for each token and substituted it wherever that token is used in the BNF.

But the resulting regex:

((((([a-zA-Z][a-zA-Z0-9+-.]*:)?((//(((([a-zA-Z0-9-_.!~*''();:&=+$,]|%[a-fA-F0-9]{2})*@)?((([a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])?.)*[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?.?)|([0-9]+(.[0-9]+){3})|\[(([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?|([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?::([0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4})*)?)(:[0-9]{1,3}(.[0-9]{1,3}){3})?\])(:[0-9]*)?)?|([a-zA-Z0-9-_.!~*''()$,;:@&=+]|%[a-fA-F0-9]{2})+)(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?)|(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)))|(([a-zA-Z0-9-_.!~*''();@&=+$,]|%[a-fA-F0-9]{2})+(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*(/([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*(;([a-zA-Z0-9-_.!~*''():@&=+$,]|%[a-fA-F0-9]{2})*)*)*)?))([?]([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*)?)|([a-zA-Z][a-zA-Z0-9+-.]*:([a-zA-Z0-9-_.!~*''();?:@&=+$,]|%[a-fA-F0-9]{2})(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2}))*))?(#(([;/?:@&=+$,\][a-zA-Z0-9-_.!~*''()]|%[a-fA-F0-9]{2})*))?

is just horrible
(and it might not even work well with Unicode).
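
As an alternative to maintaining such a regex, one option is to ask the processor itself with "castable as". This is only a sketch: the answer reflects whatever rules that particular processor applies to xs:anyURI, which vary between implementations, so it is not an exact XSD 1.0 check. The helper name local:is-any-uri is made up for illustration.

```xquery
(: Hypothetical helper: true if this processor accepts $s as an xs:anyURI. :)
declare function local:is-any-uri($s as xs:string) as xs:boolean {
    $s castable as xs:anyURI
};

(: Compare an escaped and an unescaped candidate; results are processor-dependent. :)
for $candidate in ("http://www.nxp.com/pip/CX24483%2014LZ",
                   "http://www.nxp.com/pip/CX24483 14LZ")
return concat($candidate, " -> ", local:is-any-uri($candidate))
```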

Benito


On 12/19/2012 10:31 AM, Michael Kay wrote:
The validation rules for xs:anyURI in the XSD 1.0 specification are notoriously troublesome, and it is not surprising that different implementors interpret them differently.

This is what XSD 1.0 says:

<quote>
The ·lexical space· of anyURI is finite-length character sequences which, when the algorithm defined in Section 5.4 of [XML Linking Language] is applied to them, result in strings which are legal URIs according to [RFC 2396], as amended by [RFC 2732].

Note: Spaces are, in principle, allowed in the ·lexical space· of anyURI, however, their use is highly discouraged (unless they are encoded by %20).
</quote>

The "Note" here suggests that Sedna is wrong to reject the value (it also suggests that your query is wrong to supply it, but that you should be able to get away with it).

The "algorithm" referred to in this rule is basically the escaping of special characters such as space.
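
As a rough illustration of that escaping step (a sketch, not the exact XLink 5.4 algorithm), the standard function fn:encode-for-uri can percent-encode the identifier before the cast. The query reuses local:getPipUri from the original post; applying the function only to $id is an assumption about which part of the URI needs escaping.

```xquery
declare function local:getPipUri($id as xs:string) as xs:anyURI {
    (: encode-for-uri escapes the space (and other reserved characters) in the path segment :)
    xs:anyURI(concat("http://www.nxp.com/pip/", encode-for-uri($id)))
};

(: yields http://www.nxp.com/pip/CX24483%2014LZ :)
local:getPipUri("CX24483 14LZ")
```

Since fn:encode-for-uri also escapes "/" and other reserved characters, it suits a single path segment like this one rather than a complete URI.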

Note that in XSD 1.1, the spec gives up trying to define what's valid in an xs:anyURI and what isn't - all strings are now valid in the lexical space of xs:anyURI.

Michael Kay
Saxonica

On 19/12/2012 09:11, Robby Pelssers wrote:
Hi all,


I tested the following XQuery with Sedna and Zorba:

declare function local:getPipUri($id as xs:string) as xs:anyURI {
    xs:anyURI(concat("http://www.nxp.com/pip/", $id))
};

local:getPipUri("CX24483 14LZ")


Sedna throws an exception:
2012/12/19 10:07:09 database query/update failed (SEDNA Message: ERROR FORG0001
Invalid value for cast/constructor.
Details: The value does not conform to the lexical constraints defined for the xs:anyURI type.
Query line: 6, column:4
)


http://www.zorba-xquery.com/html/demo happily returns "http://www.nxp.com/pip/CX24483 14LZ"

So how does the xs:anyURI cast work? Is the developer supposed to encode the string before passing it to xs:anyURI, or is the xs:anyURI constructor supposed to do this?

Thx in advance,
Robby

_______________________________________________
[email protected]
http://x-query.com/mailman/listinfo/talk

