I investigated further and found that the following implementation in RDFDataMgr works for our use cases:

private static ContentType determineCT(String target, String ctStr, Lang hintLang)
{
    // An explicitly supplied language takes precedence over everything else.
    if ( hintLang != null )
        return hintLang.getContentType() ;

    boolean isTextPlain = WebContent.contentTypeTextPlain.equals(ctStr) ;

    if ( ctStr != null )
        ctStr = WebContent.contentTypeCanonical(ctStr) ;
    ContentType ct = (ctStr==null) ? null : ContentType.create(ctStr) ;

    // If it's text plain, we ignore it because a lot of naive
    // server setups return text/plain for any file type.
    // We use the file extension.

    if ( ct == null || isTextPlain )
        ct = RDFLanguages.guessContentType(target) ;

    return ct ;
}

However, this also revealed that the first five tests of TestJenaReaderRIOT do not follow the contract and therefore failed: they call Model.read(url) with no further arguments and expect "content negotiation" based on file names, but according to the documentation this variant should only work for RDF/XML syntax. While I acknowledge that the preference for RDF/XML may be a historical artifact, I do believe that something is inconsistent here. Instead of changing the contract of the existing methods, maybe new methods should be introduced for automated language detection.

In any case we definitely need a way to bypass the current implementation and force the explicitly provided language to overrule any automatically determined languages.

HTH
Holger


On 8/23/2013 9:10, Holger Knublauch wrote:
On 8/23/2013 2:06, Andy Seaborne wrote:
Based on your experience, when does it make a difference?

A number of our test cases broke, and I tracked it down to the case where the base URI of a graph ends with something like .owl yet the file is saved as Turtle. A number of our ontologies used such naming conventions in the past, and I wouldn't be surprised if others are now in the same situation: they started with some base URI, used it in owl:imports etc., and then changed the serialization to Turtle. While the base URI ends with .owl, the input stream still comes from a local file ending with .ttl. Our application knows that these are really Turtle files, and since we take control of owl:imports this is IMHO a valid scenario.


There are three pieces of information to consider: the content type (if from the web), the user-specified language, and the file extension. What happens when there is contradictory information is a pragmatic decision. Hence the "don't trust text/plain" part: too many Turtle files come back as text/plain!

Yes, that's fine, but our case above is not related to the HTTP response type. It should rely entirely on the information passed into the read call as the third parameter (FileUtils.langTurtle etc.). If the latter is provided then it should override anything else, and things like the response type and file ending should only be used if no language "hint" has been provided. The general mismatch remains that the Model.read API states that the language is the file type, while RIOT changes this to be a hint only. I believe changing the order of the two if statements from my previous message would be a better implementation, and would restore the behavior of the Model.read API.
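
To make the proposed precedence concrete, here is a minimal, self-contained sketch of the resolution order described above. It is not Jena code: the class SyntaxResolution, its methods, and the small extension table are hypothetical stand-ins, and content types are plain strings rather than Jena's ContentType. It only illustrates the order "explicit language first, then a trustworthy declared content type, then the file extension".

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of the proposed precedence; not part of Jena. */
public class SyntaxResolution {

    // Illustrative extension table (an assumption, not Jena's language registry).
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "ttl", "text/turtle",
        "owl", "application/rdf+xml",
        "rdf", "application/rdf+xml",
        "nt",  "application/n-triples");

    /** Guess a content type from the file extension of the target, or null if unknown. */
    public static String guessFromExtension(String target) {
        if (target == null) return null;
        int dot = target.lastIndexOf('.');
        if (dot < 0 || dot == target.length() - 1) return null;
        String ext = target.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    /**
     * Resolve the syntax to parse with.
     *
     * @param target       URL, file name or base URI of the source
     * @param declaredType content type reported by the server, may be null
     * @param explicitType content type of the language the caller asked for, may be null
     */
    public static String resolve(String target, String declaredType, String explicitType) {
        // 1. An explicitly requested language overrides everything else.
        if (explicitType != null) return explicitType;

        // 2. Otherwise use the declared content type, unless it is text/plain,
        //    which many servers return for any file type.
        if (declaredType != null && !"text/plain".equals(declaredType)) return declaredType;

        // 3. Fall back to the file extension.
        return guessFromExtension(target);
    }

    public static void main(String[] args) {
        // Base URI ends in .owl, but the caller knows the content is Turtle.
        System.out.println(resolve("http://example.org/ontology.owl", null, "text/turtle"));
        // No explicit language and an untrusted text/plain response: use the extension.
        System.out.println(resolve("http://example.org/data.ttl", "text/plain", null));
    }
}
```

With this order, the .owl/.ttl case from above resolves to Turtle because the caller said so, while the "don't trust text/plain" fallback still applies whenever no language is given.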

Hope this clarifies things.

Thanks,
Holger

