Re: [Virtuoso-users] backslash character within URI

Florian Kleedorfer Thu, 23 Sep 2010 12:08:31 +0000

 Hi Patrick,

On 22.9.2010 21:01, Patrick van Kleef wrote:

Hi Florian,

I did some semantic web crawling and I gathered information on football
clubs from freebase and dbpedia.
In the dbpedia data, there is an individual that's equivalent
(owl:sameAs) http://dbpedia.org/resource/Chelsea_F.C. , namely
http://dbpedia.org/resource/Chelse_FC\
(note the backslash at the end of the URI)

When I use the sesame 2 API to access the virtuoso RDF store (directly
via the java repository), to collect all the owl:sameAs triples for
http://dbpedia.org/resource/Chelsea_F.C. like so

st = con.getStatements((Resource) ind, OWL.SAMEAS, null, true,
getContexts());

I get this error:

sparql select *
from named<http://myhost.com/ns/graph/linkeddata>
from named<http://myhost.com/ns/graph/soccer>
from named<http://myhost.com/ns/graph/tennis>
from named<http://myhost.com/ns/graph/entertainment-dbpedia>
from named<http://myhost.com/ns/common#inferred>
from named<http://myhost.com/ns/graph/dbpedia-ontology>
where {
  graph ?g

{<http://dbpedia.org/resource/Chelse_FC\><http://www.w3.org/2002/07/owl#sameAs> ?o }

[virtuoso.jdbc3.VirtuosoException: SQ074: Line 1: syntax error at '\'before 'SELECT']


I googled a bit and found these explanations:
http://docs.openlinksw.com/virtuoso/dbadm.html , specifically, under
6.1.9.1.9 [Client]:

SQL_NO_CHAR_C_ESCAPE=1

and gave it a shot (setting the conf var to 1), but the result was just
another error:

Could not open RepositoryConnection for transaction; nested exception is org.openrdf.repository.RepositoryException: virtuoso.jdbc3.VirtuosoException: Not using UTF-8 encoding of SQL statements, but processing character escapes also disabled



I also tried setting
SQL_UTF8_EXECS = 1
in spite of reading in
http://docs.openlinksw.com/virtuoso/wideidentifiers.html that this may
make the whole content of my database unreadable - luckily, it didn't.
The error, however, persisted.

Do I just need to re-populate my database with the new SQL_UTF8_EXECS =
1 setting or is it something else?

You cannot use a plain \ character in a URI; you need to urlencode it
like so:


sparql select *
from named<http://myhost.com/ns/graph/linkeddata>
from named<http://myhost.com/ns/graph/soccer>
from named<http://myhost.com/ns/graph/tennis>
from named<http://myhost.com/ns/graph/entertainment-dbpedia>
from named<http://myhost.com/ns/common#inferred>
from named<http://myhost.com/ns/graph/dbpedia-ontology>
where {
  graph ?g

{<http://dbpedia.org/resource/Chelse_FC%5C><http://www.w3.org/2002/07/owl#sameAs> ?o }

  }

Just as you would when using a URL in your browser, like:

   http://dbpedia.org/page/Chelse_FC%5C


See also: http://en.wikipedia.org/wiki/Percent-encoding

The triple containing the offending URI was crawled by virtuoso, comes
from freebase ( http://rdf.freebase.com/rdf/en.chelsea_fc ), and the URI
can be found on dbpedia as well, so in fact it's a dbpedia bug, correct?
(Btw, I assume that they already know about the problem from this mail
http://www.mail-archive.com/[email protected]/msg00561.html)

However, shouldn't virtuoso's crawler reject such URIs when encountered
during the crawling process so as to keep such bugs from spreading into
virtuoso-based LOD applications?

As for your suggestion of percent-escaping the backslash: Honestly I
don't know how I'd do that.


In the line that causes the error,

st = con.getStatements((Resource) ind, OWL.SAMEAS, null, true,
getContexts());

ind is an object of class org.openrdf.model.URI and it holds the value
http://dbpedia.org/resource/Chelse_FC\ (I checked via the thread
debugger, the string ends with a single backslash); con is a
RepositoryConnection object obtained from the
virtuoso.sesame2.driver.VirtuosoRepository instance.
The URI is fetched from the repository in a previous execution of the
above statement in a function that loads the transitive owl:sameAs
closure for an individual URI.

I actually thought that any necessary escaping would be handled by the
sesame library (or the virtuoso implementation of the sesame
repo/connection). Do you suggest that I check every URI that I intend to
use in con.getStatements(...) and percent-encode any offending
characters? Aside from performance considerations, would that even work?
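
For illustration, the kind of per-URI check I have in mind would look
roughly like the sketch below. It is only a sketch: the class name
UriEscaper and the method percentEncodeIllegal are hypothetical helpers
of mine, not part of sesame or virtuoso, and the set of characters
treated as illegal is my own reading of the URI syntax rules.

```java
public class UriEscaper {
    // Hypothetical helper: characters assumed not to be allowed literally
    // in a URI (backslash, angle brackets, double quote, braces, pipe,
    // caret, backtick, space). This list is an assumption, not taken from
    // the sesame or virtuoso documentation.
    private static final String ILLEGAL = "\\<>\"{}|^` ";

    /** Percent-encodes the characters above plus ASCII control characters. */
    public static String percentEncodeIllegal(String uri) {
        StringBuilder out = new StringBuilder(uri.length() + 8);
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c < 0x20 || c == 0x7F || ILLEGAL.indexOf(c) >= 0) {
                // All affected characters are ASCII, so one %XX triplet suffices.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                // Everything else, including existing %XX escapes, is kept as-is.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "http://dbpedia.org/resource/Chelse_FC\\";
        System.out.println(percentEncodeIllegal(raw));
    }
}
```

The encoded string could then be turned back into a URI object via
con.getValueFactory().createURI(...) before calling
con.getStatements(...). Whether the store would then match the encoded
form against the triple it crawled with the literal backslash is exactly
the part I am unsure about.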



Thanks,
Florian
