Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 25/11/2023 13:47, Marco Neumann wrote:
> I was looking for an IRI validator and this one didn't come up in the
> search engines. This service might need a bit more visibility and some
> incoming links.

It gets lost in all the code library "validators"

> Marco
>
> On Sat, Nov 25, 2023 at 1:34 PM Andy Seaborne wrote:
>
> > On 24/11/2023 10:05, Marco Neumann wrote:
> > > (side note) preferably the local name of a URI should not start with a
> > > number but a letter or underscore.
> >
> > It's a hangover from XML QNames.
> >
> > Turtle doesn't care.
> >
> > Style-wise, yes, avoid an initial number.
> >
> > > What do you mean by human-readable here? For large technical systems it's
> > > simply not feasible to encode meaning into the URI and I might even
> > > consider it an anti-pattern.
> > >
> > > There are some community efforts that have introduced single letters and
> > > number sequences for vocabulary development like CIDOC CRM which was later
> > > also adopted by community projects like wikidata. But instance data
> > > typically doesn't have that requirement and can be random but has to be
> > > syntax compliant of course.
> > >
> > > I am sure Andy can elaborate on the details of the encoding here.
> >
> > There's an online IRI validator.
> >
> > https://sparql.org/iri-validator.html
> >
> > using the jena-iri package.

Re: Querying URL with square brackets

2023-11-25 Thread Marco Neumann
I was looking for an IRI validator and this one didn't come up in the
search engines. This service might need a bit more visibility and some
incoming links.

Marco

On Sat, Nov 25, 2023 at 1:34 PM Andy Seaborne  wrote:

>
>
> On 24/11/2023 10:05, Marco Neumann wrote:
> > (side note) preferably the local name of a URI should not start with a
> > number but a letter or underscore.
>
> It's a hangover from XML QNames.
>
> Turtle doesn't care.
>
> Style-wise, yes, avoid an initial number.
>
> > What do you mean by human-readable here? For large technical systems it's
> > simply not feasible to encode meaning into the URI and I might even
> > consider it an anti-pattern.
> >
> > There are some community efforts that have introduced single letters and
> > number sequences for vocabulary development like CIDOC CRM which was later
> > also adopted by community projects like wikidata. But instance data
> > typically doesn't have that requirement and can be random but has to be
> > syntax compliant of course.
> >
> > I am sure Andy can elaborate on the details of the encoding here.
>
> There's an online IRI validator.
>
> https://sparql.org/iri-validator.html
>
> using the jena-iri package.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 10:05, Marco Neumann wrote:

> (side note) preferably the local name of a URI should not start with a
> number but a letter or underscore.


It's a hangover from XML QNames.

Turtle doesn't care.

Style-wise, yes, avoid an initial number.
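
The difference between the two rules can be sketched in a few lines. This is a simplified, ASCII-only approximation of the XML name-start rule (the real rule also admits many non-ASCII letters), and the helper name is hypothetical. Turtle's prefixed-name grammar, by contrast, allows a local part such as :1book.

```python
import re

# ASCII-only simplification of the XML name-start rule: a letter or underscore.
# Assumption: non-ASCII start characters are ignored here for brevity.
_XML_START = re.compile(r"[A-Za-z_]")

def starts_like_xml_name(local_name: str) -> bool:
    """True if the first character could start an XML QName local part (ASCII subset)."""
    return bool(local_name) and _XML_START.match(local_name[0]) is not None

for name in ["book1", "_1", "1book"]:
    # "1book" fails the XML-style check even though Turtle accepts it as a local name.
    print(name, starts_like_xml_name(name))
```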


> What do you mean by human-readable here? For large technical systems it's
> simply not feasible to encode meaning into the URI and I might even
> consider it an anti-pattern.
>
> There are some community efforts that have introduced single letters and
> number sequences for vocabulary development like CIDOC CRM which was later
> also adopted by community projects like wikidata. But instance data
> typically doesn't have that requirement and can be random but has to be
> syntax compliant of course.
>
> I am sure Andy can elaborate on the details of the encoding here.


There's an online IRI validator.

https://sparql.org/iri-validator.html

using the jena-iri package.


Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 08:55, Marco Neumann wrote:

> Laura, see jena issue #2102
> https://github.com/apache/jena/issues/2102


It's specific to [].

Because data formats accept these bad URIs (with a warning), the fact 
SPARQL generates errors is a bug to be fixed.


Andy



> Marco
>
> On Fri, Nov 24, 2023 at 7:12 AM Laura Morales wrote:
>
> > I have a few URLs containing square brackets like
> > http://example.org/foo[1]bar
> > I can create a TDB2 dataset without much problems, with warnings

Warnings exist for a reason!

> > but no errors.
> >
> > I tried escaping, "foo\[1\]bar" but it doesn't work.

URIs don't accept \ escapes.

And U+ doesn't help because the check isn't just in the parser.


Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 10:40, Marco Neumann wrote:

> The URI syntax is defined by the Internet Engineering Task Force (IETF) in
> RFC 3986.
>
> W3C RDF is just a rule-taker here ;)
>
> https://datatracker.ietf.org/doc/html/rfc3986


We've drafted a non-normative section:

https://www.w3.org/TR/rdf12-concepts/#iri-abnf

which is all the RFCs we could find and adopting the current state of 
terminology.


Nowadays, URI and IRI are interchangeable. Only use in HTTP requests 
worries about ASCII vs UTF-8 and then only in old software. Use a 
toolkit and it'll sort it out.
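
As a rough illustration of what a toolkit does when an IRI has to go over HTTP as an ASCII URI, here is a minimal sketch using Python's standard urllib.parse. The helper name and the sample path are made up for the example; only the path is handled.

```python
from urllib.parse import quote, unquote

def iri_path_to_uri_path(path: str) -> str:
    # Keep "/" as the segment separator; percent-encode the UTF-8 bytes of
    # every other character that is not an unreserved ASCII character.
    return quote(path, safe="/", encoding="utf-8")

encoded = iri_path_to_uri_path("/résumé/1")
print(encoded)           # ASCII-only form for the wire
print(unquote(encoded))  # decodes back to the readable IRI path
```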


Only the URI scheme name is restricted to A-Z.
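
For reference, RFC 3986 defines the scheme as a letter followed by letters, digits, "+", "-" or ".", all ASCII. A small sketch of that rule as a regular expression (the function name is hypothetical):

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(name: str) -> bool:
    """True if `name` matches the RFC 3986 scheme production."""
    return _SCHEME.fullmatch(name) is not None

for candidate in ["http", "git+ssh", "1http", "schéma"]:
    print(candidate, is_valid_scheme(candidate))
```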

   Andy



> Marco
>
> On Fri, Nov 24, 2023 at 10:36 AM Laura Morales wrote:
>
> > > What do you mean by human-readable here? For large technical systems it's
> > > simply not feasible to encode meaning into the URI and I might even
> > > consider it an anti-pattern.
> >
> > This is my problem. I do NOT want to encode any meaning into URLs, but I
> > do want them to be human readable simply because 1) properties are URLs
> > too, 2) they can be used online, and 3) they are simpler to work with, for
> > example editing in a Turtle file or writing a query.
> >
> > :alice :knows :bob
> > vs
> > :dsa7hdsahdsa782j :d93ifg75jgueeywu :s93oeirugj290sjf
> >
> > I can avoid [ entirely, but it raises the question of what other characters
> > I MUST avoid.


{} {}

You can use () but hierarchical names are better.

Be careful about ':' because it can't be in the first segment of a path 
of a relative URI (it looks like a scheme name).
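
The ':' pitfall is easy to see with a generic URI parser. A minimal sketch with Python's urllib.parse, using made-up relative references:

```python
from urllib.parse import urlsplit

# A colon in the first path segment makes the reference parse as scheme + path.
ambiguous = urlsplit("a:b/c")
# Prefixing "./" keeps it a relative path, as RFC 3986 suggests.
relative = urlsplit("./a:b/c")

print(repr(ambiguous.scheme), repr(ambiguous.path))
print(repr(relative.scheme), repr(relative.path))
```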


Andy








RDF URI references [Was: Querying URL with square brackets]

2023-11-25 Thread Andy Seaborne
Another option is the HTTP query string - think of it as asking a 
question of resource "http://example.org/book".
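
A minimal sketch of the query-string approach, assuming a hypothetical parameter name "id". Unlike a fragment, the query string is part of what is sent to the server.

```python
from urllib.parse import urlencode, urlsplit

BASE = "http://example.org/book"

def book_url(book_id: int) -> str:
    # "id" is an illustrative parameter name, not something the thread specifies.
    return f"{BASE}?{urlencode({'id': book_id})}"

url = book_url(1)
parts = urlsplit(url)
print(url)
print(parts.query, repr(parts.fragment))
```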


Andy

On 24/11/2023 11:03, Martynas Jusevičius wrote:

> On Fri, Nov 24, 2023 at 11:46 AM Laura Morales wrote:
> >
> > > > in the case that I want to use these URLs with a web browser.
> > >
> > > I don't understand what the trouble with the above example is?
> >
> > The problem with # is that browsers treat them as the start of a local
> > reference. When you open http://example.org/book#1 the server only receives
> > http://example.org/book. In other words it would be an error to create nodes
> > for n different books (#1 #2 #3 #n) if my goal is also to use these URLs with
> > a browser (for example if I want to show one page for every book). It's not a
> > problem with Jena, it's a problem with the way browsers treat the fragment.
>
> If you want a page for every book, don't use fragment URIs. Use
> http://example.org/book/1 or http://example.org/book/1#this instead of
> http://example.org/book#1.


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
Martynas, I think you have to go way back in time to fully appreciate the
anchor reference and its "interference" with URI local names. :)

Fundamentally URIs as identifiers are not meant to be retrieved as such
Laura. So a web browser is not designed to follow the implicit "physical"
link of an identifier.

To "browse" URIs as identifiers only you need a RDF browser or plugin that
may dereference documents from objects for display as URLs.

Marco


On Fri, Nov 24, 2023 at 1:55 PM Martynas Jusevičius 
wrote:

> On Fri, Nov 24, 2023 at 12:50 PM Laura Morales  wrote:
> >
> > > If you want a page for every book, don't use fragment URIs. Use
> > > http://example.org/book/1 or http://example.org/book/1#this instead of
> > >  http://example.org/book#1.
> >
> > yes yes I agree with this. I only tried to present an example of yet
> > another "quirk" between raw data and browsers (where this kind of data is
> > supposed to be used).
>
> Still don't understand the problem :) http://example.org/book#1
> uniquely identifies a resource, but you'll need to get the whole
> http://example.org/book document to retrieve it. That's just how HTTP
> works.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Martynas Jusevičius
On Fri, Nov 24, 2023 at 12:50 PM Laura Morales  wrote:
>
> > If you want a page for every book, don't use fragment URIs. Use
> > http://example.org/book/1 or http://example.org/book/1#this instead of
> >  http://example.org/book#1.
>
> yes yes I agree with this. I only tried to present an example of yet another 
> "quirk" between raw data and browsers (where this kind of data is supposed to 
> be used).

Still don't understand the problem :) http://example.org/book#1
uniquely identifies a resource, but you'll need to get the whole
http://example.org/book document to retrieve it. That's just how HTTP
works.


Re: Querying URL with square brackets

2023-11-24 Thread Laura Morales
> If you want a page for every book, don't use fragment URIs. Use
> http://example.org/book/1 or http://example.org/book/1#this instead of
>  http://example.org/book#1.

yes yes I agree with this. I only tried to present an example of yet another 
"quirk" between raw data and browsers (where this kind of data is supposed to 
be used).


Re: Querying URL with square brackets

2023-11-24 Thread Martynas Jusevičius
On Fri, Nov 24, 2023 at 11:46 AM Laura Morales  wrote:
>
> > > in the case that I want to use these URLs with a web browser.
> >
> > I don't understand what the trouble with the above example is?
>
> The problem with # is that browsers treat them as the start of a local 
> reference. When you open http://example.org/book#1 the server only receives 
> http://example.org/book. In other words it would be an error to create nodes 
> for n different books (#1 #2 #3 #n) if my goal is also to use these URLs with 
> a browser (for example if I want to show one page for every book). It's not a 
> problem with Jena, it's a problem with the way browsers treat the fragment.

If you want a page for every book, don't use fragment URIs. Use
http://example.org/book/1 or http://example.org/book/1#this instead of
 http://example.org/book#1.


Re: Querying URL with square brackets

2023-11-24 Thread Laura Morales
> > in the case that I want to use these URLs with a web browser.
>
> I don't understand what the trouble with the above example is?

The problem with # is that browsers treat them as the start of a local 
reference. When you open http://example.org/book#1 the server only receives 
http://example.org/book. In other words it would be an error to create nodes 
for n different books (#1 #2 #3 #n) if my goal is also to use these URLs with a 
browser (for example if I want to show one page for every book). It's not a 
problem with Jena, it's a problem with the way browsers treat the fragment.
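
The effect described above can be reproduced without a browser. A small sketch using urllib.parse.urldefrag, which splits off the part a client keeps to itself:

```python
from urllib.parse import urldefrag

ids = [
    "http://example.org/book#1",
    "http://example.org/book#2",
    "http://example.org/book#3",
]

# Every fragment IRI resolves to the same document on the server side.
documents = {urldefrag(i).url for i in ids}
print(documents)

# A path-based IRI keeps one document per book, even with a fragment added.
per_book = urldefrag("http://example.org/book/1#this").url
print(per_book)
```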


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
The URI syntax is defined by the Internet Engineering Task Force (IETF) in
RFC 3986.

W3C RDF is just a rule-taker here ;)

https://datatracker.ietf.org/doc/html/rfc3986

Marco

On Fri, Nov 24, 2023 at 10:36 AM Laura Morales  wrote:

> > What do you mean by human-readable here? For large technical systems it's
> > simply not feasible to encode meaning into the URI and I might even
> > consider it an anti-pattern.
>
> This is my problem. I do NOT want to encode any meaning into URLs, but I
> do want them to be human readable simply because 1) properties are URLs
> too, 2) they can be used online, and 3) they are simpler to work with, for
> example editing in a Turtle file or writing a query.
>
> :alice :knows :bob
> vs
> :dsa7hdsahdsa782j :d93ifg75jgueeywu :s93oeirugj290sjf
>
> I can avoid [ entirely, but it raises the question of what other characters
> I MUST avoid.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Laura Morales
> What do you mean by human-readable here? For large technical systems it's
> simply not feasible to encode meaning into the URI and I might even
> consider it an anti-pattern.

This is my problem. I do NOT want to encode any meaning into URLs, but I do 
want them to be human readable simply because 1) properties are URLs too, 2) 
they can be used online, and 3) they are simpler to work with, for example 
editing in a Turtle file or writing a query.

:alice :knows :bob
vs
:dsa7hdsahdsa782j :d93ifg75jgueeywu :s93oeirugj290sjf

I can avoid [ entirely, but it raises the question of what other characters I 
MUST avoid.


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
(side note) preferably the local name of a URI should not start with a
number but a letter or underscore.

What do you mean by human-readable here? For large technical systems it's
simply not feasible to encode meaning into the URI and I might even
consider it an anti-pattern.

There are some community efforts that have introduced single letters and
number sequences for vocabulary development like CIDOC CRM which was later
also adopted by community projects like wikidata. But instance data
typically doesn't have that requirement and can be random but has to be
syntax compliant of course.

I am sure Andy can elaborate on the details of the encoding here.




On Fri, Nov 24, 2023 at 9:31 AM Laura Morales  wrote:

> Thank you a lot. FILTER(STR(?id) = "...") works, as suggested by Andy. I
> do recognize though that it is a hack, and that URLs should probably not
> have a [.
>
> But now I have trouble understanding UTF8 addresses. I would use random
> alphanumeric URLs everywhere if I could, or I would %-encode everything.
> But nodes IDs (URLs) are supposed to be valid, human-readable URLs because
> they're used online. Jena, and browsers, work fine with IRIs (which are
> UTF8), but the way special characters are used is not the same. For example
> it's perfectly fine in my graph to have a URL fragment, such as
> http://example.org/foo#bar but these URLs are not usable with a browser
> because the fragment is a local reference (local to the browser) that is
> not sent to the server. Which means in practice, that if I want to stay out
> of trouble I should not create a graph with IDs
>
> http://example.org/book#1
> http://example.org/book#2
> http://example.org/book#3
>
> in the case that I want to use these URLs with a web browser. Vice versa,
> browsers are perfectly fine with a [ in the path, but Jena is stricter.
>
> So, if I want to use UTF8 addresses (IRIs) in my graph, and if I don't
> want to %-encode them because I want them to be human-readable (also
> because they are much easier to read/edit manually), what is the list of
> characters that MUST be %-encoded?
>
>
> > Sent: Friday, November 24, 2023 at 9:55 AM
> > From: "Marco Neumann" 
> > To: users@jena.apache.org
> > Subject: Re: Querying URL with square brackets
> >
> > Laura, see jena issue #2102
> > https://github.com/apache/jena/issues/2102
> >
> > Marco
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Martynas Jusevičius
On Fri, Nov 24, 2023 at 10:31 AM Laura Morales  wrote:
>
> Thank you a lot. FILTER(STR(?id) = "...") works, as suggested by Andy. I do 
> recognize though that it is a hack, and that URLs should probably not have a 
> [.
>
> But now I have trouble understanding UTF8 addresses. I would use random 
> alphanumeric URLs everywhere if I could, or I would %-encode everything. But 
> nodes IDs (URLs) are supposed to be valid, human-readable URLs because 
> they're used online. Jena, and browsers, work fine with IRIs (which are 
> UTF8), but the way special characters are used is not the same. For example 
> it's perfectly fine in my graph to have a URL fragment, such as 
> http://example.org/foo#bar but these URLs are not usable with a browser 
> because the fragment is a local reference (local to the browser) that is not 
> sent to the server. Which means in practice, that if I want to stay out of 
> trouble I should not create a graph with IDs
>
> http://example.org/book#1
> http://example.org/book#2
> http://example.org/book#3
>
> in the case that I want to use these URLs with a web browser.

I don't understand what the trouble with the above example is?

> Vice versa, browsers are perfectly fine with a [ in the path, but Jena is 
> stricter.

It's not Jena that's stricter, it's the standard specifications. Or
you can say browsers are too lax. They use their own WHATWG URL
"specification".
Sometimes the URL you see in the address bar is not the actual URL
being sent to the server.

>
> So, if I want to use UTF8 addresses (IRIs) in my graph, and if I don't want 
> to %-encode them because I want them to be human-readable (also because they 
> are much easier to read/edit manually), what is the list of characters that 
> MUST be %-encoded?
>
>
> > Sent: Friday, November 24, 2023 at 9:55 AM
> > From: "Marco Neumann" 
> > To: users@jena.apache.org
> > Subject: Re: Querying URL with square brackets
> >
> > Laura, see jena issue #2102
> > https://github.com/apache/jena/issues/2102
> >
> > Marco


Re: Querying URL with square brackets

2023-11-24 Thread Laura Morales
Thank you a lot. FILTER(STR(?id) = "...") works, as suggested by Andy. I do 
recognize though that it is a hack, and that URLs should probably not have a [.
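
For readers landing here, a sketch of what that workaround might look like when the query is built from code. The exact query used is not shown in the thread, so the SELECT shape and the helper name below are assumptions; only the FILTER(STR(?id) = "...") part comes from the message. Because the IRI is compared as a string, the engine may have to scan candidate nodes rather than look the node up directly.

```python
def select_by_string(iri: str) -> str:
    """Build a SPARQL query that matches a subject by its string form."""
    # Escape backslashes and double quotes so the IRI is a valid string literal.
    literal = iri.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?id ?p ?o WHERE {\n"
        "  ?id ?p ?o .\n"
        f'  FILTER(STR(?id) = "{literal}")\n'
        "}"
    )

query = select_by_string("http://example.org/foo[1]bar")
print(query)
```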

But now I have trouble understanding UTF8 addresses. I would use random 
alphanumeric URLs everywhere if I could, or I would %-encode everything. But 
nodes IDs (URLs) are supposed to be valid, human-readable URLs because they're 
used online. Jena, and browsers, work fine with IRIs (which are UTF8), but the 
way special characters are used is not the same. For example it's perfectly 
fine in my graph to have a URL fragment, such as http://example.org/foo#bar but 
these URLs are not usable with a browser because the fragment is a local 
reference (local to the browser) that is not sent to the server. Which means in 
practice, that if I want to stay out of trouble I should not create a graph 
with IDs

http://example.org/book#1
http://example.org/book#2
http://example.org/book#3

in the case that I want to use these URLs with a web browser. Vice versa, 
browsers are perfectly fine with a [ in the path, but Jena is stricter.

So, if I want to use UTF8 addresses (IRIs) in my graph, and if I don't want to 
%-encode them because I want them to be human-readable (also because they are 
much easier to read/edit manually), what is the list of characters that MUST be 
%-encoded?
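
One way to approach this in code, as a sketch rather than a complete answer: the Turtle and SPARQL IRIREF grammar excludes the characters < > " { } | ^ ` \ and everything up to U+0020, and RFC 3986 only allows [ and ] around an IP literal in the host. The helper below (hypothetical name, path component only) percent-encodes just those characters and leaves other Unicode text readable. RFC 3987 remains the authority for the full rules.

```python
from urllib.parse import quote

# Characters excluded by the Turtle/SPARQL IRIREF production (besides U+0000-U+0020).
IRIREF_FORBIDDEN = set('<>"{}|^`\\')
# Square brackets are only legal around an IP literal in the host part.
BRACKETS = set("[]")

def encode_problem_chars(path: str) -> str:
    """Percent-encode only the characters known to cause trouble in an IRI path."""
    out = []
    for ch in path:
        if ch in IRIREF_FORBIDDEN or ch in BRACKETS or ord(ch) <= 0x20:
            out.append(quote(ch, safe=""))
        else:
            out.append(ch)  # letters, digits and non-ASCII text stay readable
    return "".join(out)

print(encode_problem_chars("/foo[1]bar"))
print(encode_problem_chars("/café {x}"))
```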


> Sent: Friday, November 24, 2023 at 9:55 AM
> From: "Marco Neumann" 
> To: users@jena.apache.org
> Subject: Re: Querying URL with square brackets
>
> Laura, see jena issue #2102
> https://github.com/apache/jena/issues/2102
>
> Marco


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
Laura, see jena issue #2102
https://github.com/apache/jena/issues/2102

Marco

On Fri, Nov 24, 2023 at 7:12 AM Laura Morales  wrote:

> I have a few URLs containing square brackets like
> http://example.org/foo[1]bar
> I can create a TDB2 dataset without much problems, with warnings but no
> errors. I can also query these nodes "indirectly", that is if I query them
> by some property and not by URI. My problem is that I cannot query them
> directly by URI. As soon as I try to use the URIs explicitly in a query,
> for example "DESCRIBE <http://example.org/foo[1]bar>", I receive this
> error
>
> ERROR SPARQL  :: [line: 1, col: 10] Bad IRI: '
> http://example.org/foo[1]bar':  Code:
> 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for
> URIs/IRIs.
>
> I tried escaping, "foo\[1\]bar" but it doesn't work.
> I tried converting from a string, FILTER(?id =
> URI("http://example.org/foo[1]bar")) but it doesn't work
> What else could I try?
>


-- 


---
Marco Neumann


Querying URL with square brackets

2023-11-23 Thread Laura Morales
I have a few URLs containing square brackets like http://example.org/foo[1]bar
I can create a TDB2 dataset without much problems, with warnings but no errors. 
I can also query these nodes "indirectly", that is if I query them by some 
property and not by URI. My problem is that I cannot query them directly by 
URI. As soon as I try to use the URIs explicitly in a query, for example 
"DESCRIBE <http://example.org/foo[1]bar>", I receive this error

ERROR SPARQL  :: [line: 1, col: 10] Bad IRI: 
'http://example.org/foo[1]bar':  Code: 
0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for 
URIs/IRIs.

I tried escaping, "foo\[1\]bar" but it doesn't work.
I tried converting from a string, FILTER(?id = 
URI("http://example.org/foo[1]bar")) but it doesn't work
What else could I try?