On 20/02/15 08:01, Marco Tenti wrote:
Hi everyone, I'm loading millions of triples, split across hundreds of
files, into the same server. The setup is very simple, but during the
process I sometimes fail to read some files with the .nt extension. These
files are generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/)
Specifically, I get these errors:
InputStream in = FileManager.get().open("filename");
//1)
model.read(in, "NT");
That is:
model.read(in, baseURI)
which does not set the language.
Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *
It thinks it's RDF/XML because the second argument, "NT", is taken as
the base URI, and the default language is RDF/XML.
Better:
RDFDataMgr.read(model, in, Lang.NT) ;
as it uses typed constants.
RDFDataMgr.read(model, "filename") ;
will work out the language from the file extension (.nt, .ttl, etc.)
(actually, model.read("filename") works nowadays)
//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*
Any idea why Jena throws these exceptions?
Bad data.
If you get
java.nio.charset.MalformedInputException
it means the file is not valid UTF-8. Exactly where is hard to
determine from the error, because Jena reads a block of 128K bytes for
efficiency reasons (it's a major cost of N-Triples parsing) and the Java
bytes-to-chars conversion for UTF-8 does not say where the error occurs.
A common cause is ISO-8859-1 data. N-Triples is UTF-8 only.
There is a utility in Jena, "riotcmd.utf8", that does a careful UTF-8
read of the file, character by character.
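If you would rather narrow it down from your own code, a strict decode with the plain JDK can report the offset of the first bad byte. This is only a sketch and not what riotcmd.utf8 does internally; the class name Utf8Check and the method firstMalformedOffset are made up for illustration, and it assumes the file contents fit in a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena.
final class Utf8Check {

    /**
     * Returns the byte offset of the first sequence that is not valid UTF-8,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstMalformedOffset(byte[] data) {
        // REPORT makes the decoder stop on bad input instead of silently
        // substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096); // scratch space, contents discarded

        while (true) {
            // endOfInput = true: a truncated sequence at the end is also an error.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On error the input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: scratch buffer full, reuse it and continue
        }
    }
}
```

Feeding it the bytes of a failing file (for example from Files.readAllBytes) gives an offset you can jump to in a hex viewer to see what the data really contains.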
Look at your data, and very carefully check how the program you are
using is set up. It's all too easy to accidentally view a file in the
platform's native encoding.
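If the data does turn out to be ISO-8859-1 (something you would need to confirm first; it is only mentioned here as a common cause), one possible repair is to re-encode the files as UTF-8 before loading them. A minimal sketch with the plain JDK follows; the class and method names are hypothetical. Note that decoding as ISO-8859-1 never fails, so running this on a file that is already UTF-8 would silently corrupt it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Jena.
final class Latin1ToUtf8 {

    /** Re-encodes ISO-8859-1 bytes as UTF-8, in memory. */
    static byte[] transcode(byte[] latin1Bytes) {
        // Every byte maps to exactly one character in ISO-8859-1.
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Streams an ISO-8859-1 file into a new UTF-8 file. */
    static void transcodeFile(Path source, Path target) throws IOException {
        // Both charsets are given explicitly so the platform default is never used.
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.ISO_8859_1);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

Writing to a separate target file keeps the original intact, so the result can be checked before anything is replaced.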
Thanks in advance. Greetings.
Andy