Re: EXCEPTION: org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1, while i read a NT file

Andy Seaborne Fri, 20 Feb 2015 01:37:55 -0800

On 20/02/15 08:01, Marco Tenti wrote:

Hi everyone, I'm loading millions of triples, split across hundreds of files, on
the same server. It is very simple, but during the process some files with the
.nt extension sometimes fail to load. These files are generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/).
Specifically, I get these errors:


InputStream in = FileManager.get().open("filename");

//1)
model.read(in, "NT");


That is:
model.read(in, baseURI)

The second argument is the base URI, so this call does not set the language.


Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *

It thinks it's RDF/XML because you set the base URI to "NT" and the default language is RDF/XML.


Better:

RDFDataMgr.read(model, in, Lang.NT) ;

as it uses typed constants.

RDFDataMgr.read(model, "filename") ;

will work with file extensions .nt, .ttl, etc.

(actually, model.read("filename") works nowadays)

:
//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception  org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*

Any idea why Jena throws these exceptions?


Bad data.

If you get

  java.nio.charset.MalformedInputException

it means the file is not valid UTF-8. Exactly where is hard to determine from the error because Jena reads a block of 128K bytes for efficiency reasons (it's a major cost of N-Triples parsing) and the Java bytes-to-chars conversion for UTF-8 does not say where the error occurs.


A common cause is iso-8859-1 data.  N-Triples is UTF-8 only.
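
If the data really is ISO-8859-1 (that is an assumption you need to confirm by looking at the bytes), one way to repair it is to re-encode it as UTF-8 before loading. The following is only a sketch using the standard Java library; the class and method names (Latin1ToUtf8, toUtf8, transcode) are made up for illustration and are not part of Jena.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a Jena class.
final class Latin1ToUtf8 {

    // Re-encode a byte array that is ASSUMED to be ISO-8859-1 as UTF-8.
    static byte[] toUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Streaming variant for large files: read as ISO-8859-1, write as UTF-8.
    static void transcode(Path source, Path target) throws IOException {
        try (Reader reader = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

Do not run this on a file that is already valid UTF-8: every ISO-8859-1 byte decodes to some character, so the conversion never fails and would silently double-encode the non-ASCII characters.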

There is a utility in Jena, "riotcmd.utf8", that does a careful UTF-8 read of the file character by character.
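
To show the idea of a strict check that reports a position, here is a rough sketch using only the standard Java library. It is not how riotcmd.utf8 is implemented; the class name Utf8Check and the method firstMalformedOffset are hypothetical. It uses a CharsetDecoder set to REPORT errors, so that decoding stops at the first bad byte sequence and the input buffer position gives its offset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Jena class.
final class Utf8Check {

    // Returns the byte offset of the first invalid UTF-8 sequence,
    // or -1 if the whole array is valid UTF-8.
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            // endOfInput = true: a truncated sequence at the end is also an error.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // With REPORT, the position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard decoded chars and continue
        }
    }
}
```

For a file small enough to fit in memory, the bytes could come from java.nio.file.Files.readAllBytes; a very large file would need a chunked variant that keeps a running byte count.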

Look at your data and very carefully check how the program you are using is set up. It's all too easy to accidentally view a file in the platform native setup.

Thanks in advance. Greetings.


        Andy
