Thank you for the answer. I think I can see the problem, but I'm still convinced that it should be left to the data producers to standardize on one form, if they need to. Having multiple URIs that represent the same entity looks like a common problem to me, for example when combining different sources or different ontologies. I think data producers get around this by agreeing on one particular URI, by creating a new URI, or by resorting to reasoning (owl:sameAs). But I would not expect the database to automatically change any URI, which is why this was surprising to me.
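A minimal sketch of what producer-side standardization could look like, assuming the producer wants the `file:///path` form for local files. The helper name `canonical_file_uri` is hypothetical and illustrative only; it is not part of Jena and does not reproduce how Jena normalizes URIs. Anything that is not an unambiguous local `file:` URI is returned exactly as written.

```python
from urllib.parse import urlsplit


def canonical_file_uri(uri: str) -> str:
    """Hypothetical producer-side helper: rewrite local file: URIs to file:///path."""
    parts = urlsplit(uri)
    # Leave non-file URIs, and file URIs that name a host, exactly as written.
    if parts.scheme != "file" or parts.netloc != "":
        return uri
    # Forms without a leading slash (e.g. file:C:/path) are ambiguous; leave them alone.
    if not parts.path.startswith("/"):
        return uri
    result = "file://" + parts.path
    if parts.query:
        result += "?" + parts.query
    if parts.fragment:
        result += "#" + parts.fragment
    return result


if __name__ == "__main__":
    for uri in ["file:/path", "file:///path", "file:C:/path",
                "http:/example.org/path/to/file"]:
        print(uri, "->", canonical_file_uri(uri))
```

Applied by the producer before loading, a step like this would make `file:/path` and `file:///path` the same string in the data, without the database having to rewrite anything on import.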
I think a very similar example is

http:example.org/path/to/file
http:/example.org/path/to/file
http://example.org/path/to/file
http:///example.org/path//to///file

When importing these, Jena does not change them, and treats them as different URIs instead. I would expect this behaviour for every URI, unless "file:" needs to be treated differently.

On Mon, 2025-01-06 at 21:05 +0000, Andy Seaborne wrote:
>
>
> On 06/01/2025 19:14, zPlus wrote:
> > > But we need one form for URI matching otherwise "file:/path" does not
> > > match "file:///path"
> >
> > Why does Jena need to match "file:/path" and "file:///path"? Shouldn't it be
> > left to the user to choose one form or the other in their data?
>
> There is no "right" answer for file: URLs.
>
> Having one normalized form means the same name is for data producer
> (load database) and data consumer (SPARQL query) whether they write it
> file:/ or file:/// or a mixture; or when multiple sources of data are
> combined. And across operating systems.
>
> There isn't "the user".
>
> https://datatracker.ietf.org/doc/html/rfc8089.html#appendix-B
>
> """
>   o A traditional file URI for a local file with an empty authority.
>     This is the most common format in use today. For example:
>
>     * "file:///path/to/file"
> """
>
> And on Windows ...
>
> C:/path is the "C:" URI scheme.
>
> file:C:/path is going to be interpreted different on Windows and linux/Mac.
>
> The whole thing is messy.
>
>     Andy
>