On 03/31/2013 02:13 PM, Andy Seaborne wrote:
On 31/03/13 11:47, lou1se m1ch3l wrote:
I'm working with input models like this one:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://www.test.org/">
<rdf:Description rdf:about="http://www.test.org/test">
<sometext xml:lang="en" rdf:parseType="Literal">some text</sometext>
<sometext xml:lang="fr" rdf:parseType="Literal">texte</sometext>
</rdf:Description>
</rdf:RDF>
Hi there,
In RDF/XML, if both a datatype and a language are given, the datatype
takes precedence and the language, or any enclosing language (it may be
declared further out), is ignored.
<sometext xml:lang="en"
rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>foo</sometext>
==> "foo"^^xsd:string. No language.
rdf:parseType="Literal" is like writing rdf:datatype=rdf:XMLLiteral,
except it also tells the parser to use the XML content as the literal's
lexical form.
So the language is ignored - XMLLiterals are supposed to be
self-contained XML fragments, independent of the XML document context,
as if wrapped in <div></div>.
To put a language in the content:
<sometext rdf:parseType="Literal"><span
xml:lang="fr">texte</span></sometext>
The language is then not part of the RDF literal and SPARQL LANG() will
not get it.
Your SPARQL query is right - it is the data that does not contain the
language information.
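If the language is carried inside the XMLLiteral as suggested above, an application would have to read it from the lexical form itself. The following is only a sketch under stated assumptions: it uses Python's standard library rather than Jena, the function name langs_in_xml_literal is hypothetical, and it assumes the lexical form is a well-formed XML fragment.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def langs_in_xml_literal(lexical_form: str) -> list:
    """Collect xml:lang values found inside an XMLLiteral lexical form.

    Hypothetical helper: the fragment is wrapped in <div></div> so that it
    parses as a single well-formed document.
    """
    root = ET.fromstring(f"<div>{lexical_form}</div>")
    return [
        elem.get(XML_LANG)
        for elem in root.iter()
        if elem.get(XML_LANG) is not None
    ]


if __name__ == "__main__":
    print(langs_in_xml_literal('<span xml:lang="fr">texte</span>'))
```

This runs outside SPARQL: the query returns the XMLLiteral, and the language is recovered afterwards from its content.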
Andy
Thank you for the detailed and clear answer.
I didn't know (as you can guess) that "in RDF/XML, if both a datatype and
a language are given, the datatype takes precedence and the (enclosing)
language is ignored", though my intuition was that this use of
parseType="Literal" was inappropriate.
As I have no influence over the incoming data model, I "solved" my
problem by pre-processing the input to remove the problematic
parseType attributes. Note that I could not do this at the RDF level
(within the Jena API), as the language information seems to be
irretrievably lost (at least I couldn't retrieve it) once the graph is
built: I had to do the pre-processing at the text/XML level.
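The pre-processing code itself is not shown in this message. As an illustration only, here is a minimal sketch of such a text/XML-level step in Python (standard library, not Jena). The function name strip_literal_parse_type is hypothetical, and the sketch assumes the affected elements contain plain text only, as in the sample data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Clark-notation name that ElementTree uses for rdf:parseType.
PARSE_TYPE = f"{{{RDF_NS}}}parseType"


def strip_literal_parse_type(rdf_xml: str) -> str:
    """Remove rdf:parseType="Literal" from text-only property elements.

    Hypothetical helper: elements with nested markup are left alone,
    because without parseType="Literal" their content would be parsed
    as RDF/XML rather than kept as a literal.
    """
    ET.register_namespace("rdf", RDF_NS)  # keep the usual rdf: prefix
    root = ET.fromstring(rdf_xml)
    for elem in root.iter():
        if elem.get(PARSE_TYPE) == "Literal" and len(elem) == 0:
            del elem.attrib[PARSE_TYPE]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns="http://www.test.org/">'
        '<rdf:Description rdf:about="http://www.test.org/test">'
        '<sometext xml:lang="en" rdf:parseType="Literal">some text</sometext>'
        '</rdf:Description></rdf:RDF>'
    )
    print(strip_literal_parse_type(sample))
```

After this step the xml:lang attribute applies to a plain literal, so the original FILTER on LANG() can match. ElementTree may rewrite namespace prefixes on output; the result is still equivalent RDF/XML.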
Regards.
I'm trying to load the text for a given language, using some SPARQL like
this:
SELECT ?sometext
WHERE {
?x <http://www.test.org/sometext> ?sometext .
FILTER (LANG(?sometext) = 'en')
}
As you can see below, I'm having trouble filtering the result by
language (en):
$ sparql --data=data.rdf.xml --query=test.rq
------------
| sometext |
============
------------
If I comment out the filter, as in
# FILTER (LANG(?sometext) = 'en')
the result is as below:
------------------------------------------------------------------------
| sometext |
========================================================================
| "texte"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>|
| "some text"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>|
------------------------------------------------------------------------
If I remove all the rdf:parseType="Literal" attributes from the model
and use the initial query, the result is:
------------------
| sometext |
==================
| "some text"@en |
------------------
So, it seems that the xml:lang attribute applies to the sometext element
itself, but _not_ to the enclosed literal denoted by
rdf:parseType="Literal".
I'm quite sure it's not a bug in the Jena framework, but rather a
consequence of my ignorance: how should I write the SPARQL query to
filter the results according to the value of the xml:lang attribute,
given that I have to accommodate the input model?
Thanks for any advice or interesting pointer.
Regards.