Re: RIOT language selection logic

Holger Knublauch Thu, 22 Aug 2013 22:23:06 -0700

I investigated further and found that the following implementation in RDFDataMgr works for our use cases:

    private static ContentType determineCT(String target, String ctStr, Lang hintLang)
    {
        if ( hintLang != null )
            return hintLang.getContentType() ;

        boolean isTextPlain = WebContent.contentTypeTextPlain.equals(ctStr) ;

        if ( ctStr != null )
            ctStr = WebContent.contentTypeCanonical(ctStr) ;
        ContentType ct = (ctStr==null) ? null : ContentType.create(ctStr) ;

        // If it's text plain, we ignore it because a lot of naive
        // server setups return text/plain for any file type.
        // We use the file extension.

        if ( ct == null || isTextPlain )
            ct = RDFLanguages.guessContentType(target) ;

        return ct ;
    }

However, this also revealed that the first five tests of TestJenaReaderRIOT do not follow the contract and failed: they call Model.read(url) with no further arguments and expect "content negotiation" based on file names, but according to the documentation this variant should only work for RDF/XML syntax. While I acknowledge that the preference of RDF/XML may be a historical artifact, I do believe that something is inconsistent here. Instead of changing the contract of the existing implementation, maybe new methods should be introduced for the automated language detection.

In any case we definitely need a way to bypass the current implementation and force the explicitly provided language to overrule any automatically determined languages.
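
To make the intended precedence concrete, here is a minimal, Jena-free sketch of the ordering I have in mind: explicit language first, then the HTTP content type unless it is text/plain, then the file extension. The class name, the extension table and the use of plain strings for content types are all assumptions made for illustration. It is not the RIOT code, and it ignores details such as charset parameters on the header.

```java
import java.util.Locale;
import java.util.Map;

public class ContentTypePrecedenceSketch {

    // Hypothetical extension table, for illustration only (not RIOT's registry).
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "ttl", "text/turtle",
            "owl", "application/rdf+xml",
            "rdf", "application/rdf+xml",
            "nt", "application/n-triples");

    // Guess a content type from the text after the last '.', or null if unknown.
    static String guessFromExtension(String target) {
        if (target == null)
            return null;
        int dot = target.lastIndexOf('.');
        if (dot < 0 || dot == target.length() - 1)
            return null;
        String ext = target.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    // Precedence: explicit language, then a non-text/plain header, then the extension.
    static String determine(String target, String headerContentType, String explicitLang) {
        // 1. A language passed in by the caller overrides everything else.
        if (explicitLang != null)
            return explicitLang;

        // 2. Trust the HTTP content type unless it is text/plain.
        //    (Simplification: header parameters such as charset are not handled.)
        boolean isTextPlain = "text/plain".equals(headerContentType);
        if (headerContentType != null && !isTextPlain)
            return headerContentType;

        // 3. Fall back to the file extension of the target.
        return guessFromExtension(target);
    }

    public static void main(String[] args) {
        // Base URI ends in .owl, but the caller knows the data is Turtle.
        System.out.println(determine("http://example.org/onto.owl", null, "text/turtle"));
        // Server says text/plain, so the .ttl extension decides.
        System.out.println(determine("file:///data/onto.ttl", "text/plain", null));
        // No hint and no header: the extension decides.
        System.out.println(determine("http://example.org/onto.owl", null, null));
    }
}
```

With this ordering, the .owl base URI combined with an explicit Turtle language resolves to Turtle, which is the case that broke our tests.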


HTH
Holger


On 8/23/2013 9:10, Holger Knublauch wrote:

On 8/23/2013 2:06, Andy Seaborne wrote:
Based on your experience, when does it make a difference?
A number of our test cases broke, and I tracked it down to the case where the base URI of a graph ends with something like .owl yet the file is saved in ttl. A number of our ontologies used such naming conventions in the past, and I wouldn't be surprised if others are now in the same situation where they started with some base URI and used it in owl:imports etc, and then changed the serialization to Turtle. While the base URI ends with .owl, the input stream is still coming from a local file ending with .ttl. But our application knows that these are really turtle files, and since we take control of owl:imports this is IMHO a valid scenario.
There are three pieces of information to consider: content type (if from the web), the user specified language and the file extension. What happens when there is contradictory information is a pragmatic decision. Hence the "don't trust text/plain" part -- too many Turtle files come back text plain!
Yes that's fine but our case above is not related to the HTTP response type, but should rely entirely on the information passed into the read as third parameter (FileUtils.langTurtle etc). If the latter is provided then it should override anything else, and things like response type and file ending should only be used if no language "hint" has been provided. The general mismatch remains that the Model.read API states that the language is the file type, while RIOT changes this to be a hint only. I believe changing the order of the two if statements from my previous message would be a better implementation, and restore the behavior of the Model.read API.
Hope this clarifies things.

Thanks,
Holger
