OK, solved. Thanks Dave and Andy for the responses, and sorry for the badly written call; I was in a rush. Specifically, my problem was the ISO-8859-1 encoding, as Andy said. Thanks all.
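
For anyone who finds this thread with the same MalformedInputException, here is a minimal sketch of the check-and-convert step, using only the JDK. It assumes the offending files really are ISO-8859-1 (as mine were); the class name EncodingCheck and its two helper methods are made up for illustration and are not part of Jena. It also reads the whole file into memory, so very large files would need a streaming variant.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jena: checks whether bytes are valid
// UTF-8 and, if not, transcodes them on the ASSUMPTION they are ISO-8859-1.
final class EncodingCheck {

    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Interprets the bytes as ISO-8859-1 and re-encodes them as UTF-8.
    // Every byte value is a valid ISO-8859-1 character, so this never fails,
    // but the result is only meaningful if the assumption about the source
    // encoding is right.
    static byte[] latin1ToUtf8(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: EncodingCheck <input.nt> <output.nt>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        if (isValidUtf8(data)) {
            System.out.println("already valid UTF-8, nothing to do");
        } else {
            Files.write(Paths.get(args[1]), latin1ToUtf8(data));
            System.out.println("not valid UTF-8; wrote a copy transcoded from ISO-8859-1");
        }
    }
}
```

The converted copy can then be loaded with RDFDataMgr.read(model, in, Lang.NT) as Andy suggests below.
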
2015-02-20 10:35 GMT+01:00 Andy Seaborne <[email protected]>:

> On 20/02/15 08:01, Marco Tenti wrote:
>
>> Hi everyone, I'm loading millions of triples, split across hundreds of
>> files, into the same server. It is very simple, but during the process
>> some files with the .nt extension sometimes fail to be read. These files
>> are generated with SILK
>> (http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/).
>> Specifically, I get these errors:
>>
>> InputStream in = FileManager.get().open("filename");
>>
>> //1)
>> model.read(in, "NT");
>>
>
> That is:
>     model.read(in, baseURI)
>
> not setting the language.
>
>> Console: [line: 1, col: 7 ] Element or attribute do not match QName
>> production: QName::=(NCName':')?NCName.
>> Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
>> col: 7 ] Element or attribute do not match QName production:
>> QName::=(NCName':')?NCName.
>>
>
> It thinks it's RDF/XML because you set the base URI to "NT" and the default
> language is RDF/XML.
>
> Better:
>
>     RDFDataMgr.read(model, in, Lang.NT) ;
>
> as it uses typed constants.
>
>     RDFDataMgr.read(model, "filename") ;
>
> will work with file extension .nt/.ttl etc
>
> (actually, model.read("filename") works nowadays)
>
>> //2)
>> org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
>> Exception org.apache.jena.atlas.AtlasException:
>> java.nio.charset.MalformedInputException: Input length = 1
>>
>> Any idea why Jena throws these exceptions?
>>
>
> Bad data.
>
> If you get
>
>     java.nio.charset.MalformedInputException
>
> it means the file is not valid UTF-8. Exactly where is hard to determine
> from the error because Jena reads a block of 128K bytes for efficiency
> reasons (it's a major cost of N-Triples parsing) and the Java bytes to
> chars conversion for UTF-8 does not say where the error occurs.
>
> A common cause is iso-8859-1 data. N-Triples is UTF-8 only.
>
> There is a utility in Jena, "riotcmd.utf8", that does a careful UTF-8 read
> of the file character by character.
>
> Look at your data and very carefully check how the program you are using
> is set up. It's all too easy to accidentally view a file in the platform
> native setup.
>
>> Thanks in advance. Greetings.
>>
>
> Andy
>
