I investigated further and found that the following implementation in
RDFDataMgr works for our use cases:
    private static ContentType determineCT(String target, String ctStr,
                                           Lang hintLang)
    {
        if ( hintLang != null )
            return hintLang.getContentType() ;
        boolean isTextPlain =
            WebContent.contentTypeTextPlain.equals(ctStr) ;
        if ( ctStr != null )
            ctStr = WebContent.contentTypeCanonical(ctStr) ;
        ContentType ct = (ctStr==null) ? null : ContentType.create(ctStr) ;
        // If it's text plain, we ignore it because a lot of naive
        // server setups return text/plain for any file type.
        // We use the file extension.
        if ( ct == null || isTextPlain )
            ct = RDFLanguages.guessContentType(target) ;
        return ct ;
    }
However, this also revealed that the first five tests of
TestJenaReaderRIOT do not follow the contract and fail with this change:
they call Model.read(url) with no further arguments and expect "content
negotiation" based on file names, but according to the documentation
this variant should only work for RDF/XML syntax. While I acknowledge
that the preference for RDF/XML may be a historical artifact, I do
believe that something is inconsistent here. Instead of changing the
contract of the existing methods, maybe new methods should be
introduced for automated language detection.
In any case, we definitely need a way to bypass the current
implementation and force the explicitly provided language to override
any automatically detected language.
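To make the intended precedence concrete, here is a minimal, Jena-free
sketch. The class name, the method names and the small extension table
are all hypothetical and only illustrate the ordering (explicit language
first, then a usable content type, then the file extension); they are
not Jena's API, and the text/plain check is simplified to an exact
string match.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the proposed precedence; not Jena code.
final class LangPrecedenceSketch {

    // Illustrative extension table (an assumption, not Jena's registry).
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "ttl", "text/turtle",
        "rdf", "application/rdf+xml",
        "owl", "application/rdf+xml",
        "nt",  "application/n-triples");

    /** Guess a content type from the text after the last dot, or null. */
    static String guessFromExtension(String target) {
        if (target == null) return null;
        int dot = target.lastIndexOf('.');
        if (dot < 0 || dot == target.length() - 1) return null;
        String ext = target.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    /**
     * Resolve the content type to parse with.
     * 1. An explicitly provided language always wins.
     * 2. Otherwise use the HTTP content type, unless it is text/plain.
     * 3. Otherwise fall back to the file extension of the target.
     */
    static String resolve(String target, String httpContentType,
                          String explicitContentType) {
        if (explicitContentType != null) return explicitContentType;
        boolean usable = httpContentType != null
            && !httpContentType.equals("text/plain");
        if (usable) return httpContentType;
        return guessFromExtension(target);
    }

    public static void main(String[] args) {
        // Base URI ends in .owl, but the caller says the data is Turtle.
        System.out.println(
            resolve("http://example.org/onto.owl", null, "text/turtle"));
        // No explicit language, untrusted text/plain: use the extension.
        System.out.println(resolve("data.ttl", "text/plain", null));
    }
}
```

With this ordering, a caller that knows the serialization never has to
worry about what the base URI or the server response suggests.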
HTH
Holger
On 8/23/2013 9:10, Holger Knublauch wrote:
On 8/23/2013 2:06, Andy Seaborne wrote:
Based on your experience, when does it make a difference?
A number of our test cases broke, and I tracked it down to the case
where the base URI of a graph ends with something like .owl yet the
file is saved as Turtle (.ttl). A number of our ontologies used such
naming conventions in the past, and I wouldn't be surprised if others
are now in the same situation where they started with some base URI and
used it in owl:imports etc., and then changed the serialization to Turtle.
While the base URI ends with .owl, the input stream is still coming
from a local file ending with .ttl. But our application knows that
these are really Turtle files, and since we take control of
owl:imports this is IMHO a valid scenario.
There are three pieces of information to consider: content type (if
from the web), the user-specified language and the file extension.
What happens when there is contradictory information is a pragmatic
decision. Hence the "don't trust text/plain" part -- too many Turtle
files come back as text/plain!
Yes, that's fine, but our case above is not related to the HTTP response
type; it should rely entirely on the information passed into the read
call as the third parameter (FileUtils.langTurtle etc.). If the latter is
provided then it should override anything else, and things like
response type and file ending should only be used if no language
"hint" has been provided. The general mismatch remains that the
Model.read API states that the language is the file type, while RIOT
changes this to be a hint only. I believe changing the order of the
two if statements from my previous message would be a better
implementation, and would restore the behavior of the Model.read API.
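The difference between the two orderings can be sketched in a few lines.
Everything here is hypothetical: "detected" stands for whatever the
content type or file extension suggested, and treating the hint purely
as a fallback is my reading of "hint only", not a copy of the RIOT code.

```java
// Hypothetical sketch contrasting the two orderings; not Jena code.
final class HintOrderSketch {

    /** Hint consulted last: used only when nothing was detected. */
    static String hintAsFallback(String detected, String hint) {
        return detected != null ? detected : hint;
    }

    /** Hint consulted first: detection is used only without a hint. */
    static String hintAsOverride(String detected, String hint) {
        return hint != null ? hint : detected;
    }

    public static void main(String[] args) {
        // Base URI ends in .owl, so detection suggests RDF/XML,
        // but the caller passed Turtle as the language.
        String detected = "application/rdf+xml";
        String hint = "text/turtle";
        System.out.println("fallback: " + hintAsFallback(detected, hint));
        System.out.println("override: " + hintAsOverride(detected, hint));
    }
}
```

In the fallback ordering the .owl ending wins and a Turtle file would be
handed to the RDF/XML parser; in the override ordering the caller's
language is respected.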
Hope this clarifies things.
Thanks,
Holger