Re: inconsistent blank node id (jena api and sparql endpoint web interface)

Paul Gearon Wed, 06 Mar 2013 17:37:22 -0800

Joshua did answer, but I thought I'd just add...

Blank nodes have no canonical representation. They will typically be
represented with a number internally. When a query asks to show them, then
they should appear as one of the accepted formats for a blank node, which
is usually _: followed by an identifier that is unique to that store. For
instance, a blank node may appear as _:1234.

Just like RDF documents in Turtle, a blank node representation need only be
unique to the current context. The next query result can return data about
a completely different blank node and ALSO use _:1234 as the identifier,
even though the nodes are completely different. In practice, this isn't
likely to happen with most RDF stores (since stores usually just print the
internal identifier), but it's possible. You tend to see it when a store
exports data, since the first blank node may show up as _:0001, the second
as _:0002, and so on. The next export may contain completely different
data but reuse those same identifiers in the document.
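To make the export case concrete, here is a minimal sketch in plain Python.
It is a toy in-memory model, not Jena or any real store: the BNode class and
the export function are made up for illustration, and the only assumption is
that the exporter numbers blank nodes in order of first appearance.

```python
from itertools import count


class BNode:
    """A blank node: it has identity inside the store but no global name."""


def export(triples):
    """Serialize triples, labelling blank nodes _:0001, _:0002, ... per export."""
    labels = {}          # blank node -> label, valid for this export only
    counter = count(1)

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:{next(counter):04d}"
            return labels[t]
        return t

    return [" ".join(term(t) for t in triple) + " ." for triple in triples]


alice, bob = BNode(), BNode()          # two unrelated blank nodes
first = export([(alice, "foaf:name", '"Alice"')])
second = export([(bob, "foaf:name", '"Bob"')])

print(first[0])
print(second[0])
```

Both exports use the label _:0001, yet alice and bob are different nodes, so
a label copied out of one document says nothing about the other.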

Some systems like to skolemize blank nodes, which means that a pseudo
global identifier (in the form of an IRI) is created for them. It looks
like the first endpoint you referred to did this (using a scheme of
"nodeID"). I don't like what the second endpoint did, since if its identifier
started with a letter (possible, since they're hex digits) then it would be
parseable as a URI.
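As a rough illustration of skolemization, the sketch below replaces each
blank node label with a freshly minted IRI, using the same IRI for every
occurrence of the same label. It is plain Python over tuples of strings, not
what either endpoint actually does. The base IRI is hypothetical: the host
example.org is made up, and the /.well-known/genid/ path follows the
convention suggested in RDF 1.1 Concepts rather than the "nodeID" scheme
mentioned above.

```python
import uuid

# Hypothetical base IRI for minted identifiers (an assumption, see above).
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def skolemize(triples):
    """Replace each blank node label (_:x) with a minted IRI, consistently."""
    minted = {}  # blank node label -> skolem IRI

    def term(t):
        if t.startswith("_:"):
            if t not in minted:
                minted[t] = f"<{SKOLEM_BASE}{uuid.uuid4().hex}>"
            return minted[t]
        return t

    return [tuple(term(t) for t in triple) for triple in triples]


data = [
    ("_:b1", "foaf:name", '"species"'),
    ("_:b1", "rdf:type", "rdfs:Class"),
]
skolemized = skolemize(data)
for triple in skolemized:
    print(" ".join(triple), ".")
```

The minted IRI can be used in later queries against the same system, which
is the practical reason some stores choose to do this.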

The *only* way you can refer to a blank node again is by its properties.
Most schemas or ontologies will have a property on the blank node that
uniquely identifies it. In some cases it may be a group of properties that
uniquely identifies it. You need to use a variable to refer to the blank
node, and then include a triple pattern in your query that connects that
variable to the property/value that you need. If you have blank nodes that
cannot be uniquely identified (by any one property, or by a combination of
properties), then you may not be using an appropriate schema for the data.
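The idea of finding a blank node through its properties can be sketched
outside of SPARQL as well. The following is a toy in-memory model in plain
Python, not a query engine: the BNode class, both helper functions, and the
ex:identifier and ex:property names are illustrative assumptions.

```python
class BNode:
    """A blank node: identity only, no name that a client can reuse."""


def find_subjects(triples, predicate, obj):
    """Return every subject s for which (s, predicate, obj) is in triples."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def properties_of(triples, subject):
    """Return the (predicate, object) pairs attached to a subject."""
    return [(p, o) for (s, p, o) in triples if s == subject]


node, other = BNode(), BNode()
store = [
    (node, "ex:identifier", "identifying-value"),
    (node, "ex:property", "result data"),
    (other, "ex:identifier", "another-value"),
    (other, "ex:property", "unrelated data"),
]

# Step 1: locate the node by the value that uniquely identifies it.
matches = find_subjects(store, "ex:identifier", "identifying-value")

# Step 2: read the data hanging off that node.
results = [o for s in matches
           for (p, o) in properties_of(store, s)
           if p == "ex:property"]
print(results)
```

The client never needs the node's internal label. It only needs the
identifying value, which stays the same no matter how the store prints the
node.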

As a convenience, instead of a variable to represent your blank node, you
can use a blank node syntax. This is just a variable without a real name.
So instead of your WHERE clause containing:

$blank ex:identifier "identifying-value" .
$blank ex:property $resultData

You can instead say:

_:b1 ex:identifier "identifying-value" .
_:b1 ex:property $resultData

But remember that this is essentially just an unnamed variable. It might
look like a blank node, but it can bind to anything (including blank nodes
and IRIs).
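A small sketch may help show why the blank node syntax behaves like a
variable. The matcher below is a toy written in plain Python, not a SPARQL
implementation: is_var, match, and the BNode class are made up for
illustration, and the only rule it encodes is that ?x, $x and _:x in a
pattern are all treated as variables.

```python
class BNode:
    """A blank node in the data."""


def is_var(term):
    """In a query pattern, ?x, $x and _:x all behave as variables."""
    return isinstance(term, str) and term.startswith(("?", "$", "_:"))


def match(patterns, triples, binding=None):
    """Yield every binding (a dict) that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield from match(rest, triples, candidate)


blank = BNode()
iri = "<http://example.org/thing>"  # hypothetical IRI
store = [
    (blank, "ex:identifier", '"identifying-value"'),
    (blank, "ex:property", '"from blank node"'),
    (iri, "ex:identifier", '"identifying-value"'),
    (iri, "ex:property", '"from IRI"'),
]
query = [
    ("_:b1", "ex:identifier", '"identifying-value"'),
    ("_:b1", "ex:property", "$resultData"),
]
results = sorted(b["$resultData"] for b in match(query, store))
print(results)
```

The pattern written with _:b1 matches both the blank node and the IRI in the
data, so it returns both values.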

Regards,
Paul

On Wed, Mar 6, 2013 at 10:54 AM, Ziqi Zhang <[email protected]> wrote:

> Hi
>
> I may have misunderstood something but here is my problem.
>
> I am using Jena API to get triples from this SPARQL endpoint:
> http://sparql.sindice.com/sparql
> My query is:
> --------------
> SELECT DISTINCT ?s ?o WHERE {
> ?s rdf:type rdfs:Class .
> {?s foaf:name "species"@en .}
> UNION {?s foaf:name "species" .}
> OPTIONAL {?s owl:equivalentClass ?o .}
> }
> --------------
>
> The query should return 5 results, each about a *blank node*. If you send
> the query using the web interface above, you should get the following
> results:
> ---------------------------
> s                    o
> nodeID://b122741495  http://purl.org/science/protein/bysequence/ncbi_gene.42069
> nodeID://b122741495  http://purl.org/science/protein/bysequence/ncbi_gene.42504
> nodeID://b122741495  http://purl.org/science/protein/bysequence/ncbi_gene.47877
> nodeID://b122741495  http://purl.org/science/protein/bysequence/ncbi_gene.42945
> nodeID://b122741495  nodeID://b122741495
>
>
> ---------------------------
>
> However, using the Java Jena API and the following code, I get completely
> different blank node IDs:
> ----------------------
> s                           o
> 32ec7330:13d4066f80a:-7fff  http://purl.org/science/protein/bysequence/ncbi_gene.42069
> 32ec7330:13d4066f80a:-7fff  http://purl.org/science/protein/bysequence/ncbi_gene.42504
> 32ec7330:13d4066f80a:-7fff  http://purl.org/science/protein/bysequence/ncbi_gene.47877
> 32ec7330:13d4066f80a:-7fff  http://purl.org/science/protein/bysequence/ncbi_gene.42945
> ----------------------
>
> Why are the IDs different? Because they are different, I cannot do further
> queries on the node at the SPARQL endpoint. What I mean is, if I then
> query:
> "Select ?p ?o where{32ec7330:13d4066f80a:-7fff ?p ?o .}"
> I will have no results, because the node ID does not match
> "nodeID://b122741495".
>
>
> Would really appreciate any insight into this!
>
> --
> Ziqi Zhang
>
>
