On 20/02/15 08:01, Marco Tenti wrote:
Hi everyone, I'm loading millions of triples, split across hundreds of files, into the
same server. It is very simple, but during the process I sometimes fail to
read some files with the .nt extension. These files are generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/).
Specifically, I get these errors:

InputStream in = FileManager.get().open("filename");

//1)
model.read(in, "NT");

That is:
model.read(in, baseURI)

which does not set the language.


Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *

It thinks it's RDF/XML because you set the base URI to "NT" and the default language is RDF/XML.

Better:

RDFDataMgr.read(model, in, Lang.NT) ;

as it uses typed constants.

RDFDataMgr.read(model, "filename") ;

will work out the syntax from the file extension (.nt, .ttl, etc.)

(actually, model.read("filename") works nowadays)

//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception  org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*

Any idea why Jena throws these exceptions?

Bad data.

If you get

  java.nio.charset.MalformedInputException

it means the file is not valid UTF-8. Exactly where is hard to determine from the error, because Jena reads the input in blocks of 128K bytes for efficiency reasons (it's a major cost of N-Triples parsing) and the Java bytes-to-chars conversion for UTF-8 does not say where the error occurs.

A common cause is ISO-8859-1 data. N-Triples is UTF-8 only.
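
If the data does turn out to be ISO-8859-1, re-encoding it is one way out. Below is a minimal sketch using only the JDK, assuming the whole file really is ISO-8859-1. The class name Latin1ToUtf8 and the method name are made up for illustration and are not part of Jena. Note that ISO-8859-1 decoding never fails, so running this over a file that is already UTF-8 would silently double-encode it; check first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Jena.
final class Latin1ToUtf8 {

    /** Copies source to target, decoding as ISO-8859-1 and re-encoding as UTF-8. */
    static void transcode(Path source, Path target) throws IOException {
        try (Reader reader = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            // Stream in chunks so large files do not need to fit in memory.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

The re-encoded file can then be read with RDFDataMgr.read(model, in, Lang.NT) as above.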

There is a utility in Jena, "riotcmd.utf8", that does a careful UTF-8 read of the file, character by character.
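
For a rough idea of what such a check involves, here is a minimal sketch using only the JDK. It is not the Jena utility and makes no claim about how that utility works internally; the class Utf8Check and its methods are hypothetical names. It asks a strict UTF-8 decoder to report (rather than replace) malformed input, and returns the byte offset and line number where decoding first fails. It reads the whole file into memory, which is an assumption that suits files of moderate size.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Jena.
final class Utf8Check {

    /** Returns the byte offset of the first invalid UTF-8 sequence, or -1 if the data is valid. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // With REPORT, the input position is left at the start of the bad bytes.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // output buffer full: discard decoded chars and continue
        }
    }

    /** Returns the 1-based line number containing the given byte offset. */
    static int lineOf(byte[] data, int offset) {
        int line = 1;
        for (int i = 0; i < offset && i < data.length; i++) {
            if (data[i] == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("Valid UTF-8");
        } else {
            System.out.println("Invalid UTF-8 at byte " + offset + ", line " + lineOf(data, offset));
        }
    }
}
```

Run it with the file name as the only argument; the reported line is a good place to start looking at the raw bytes.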

Look at your data and very carefully check how the program you are using is set up. It's all too easy to accidentally view a file in the platform's native encoding.

Thanks in advance. Greetings.


        Andy
