Hello, Jena Community.

TL;DR: Is there a way I can export and/or import a graph with invalid IRIs (typically, IRIs with spaces in them)?

Details:

When I try to write out a graph that contains the IRI <http://foo.com/bar baz> with a method like this:

    static void writeModelTo(String baseURI, Model model, OutputStream out) {
        RDFWriter.create()
            .base(baseURI)
            .format(RDFFormat.TURTLE_BLOCKS)
            .source(model)
            .output(out);
    }

the result is an error with a stack trace like this:

    Caused by: org.apache.jena.irix.IRIException: <http://foo.com/bar baz> Code: 17/WHITESPACE in PATH: A single whitespace character. These match no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
        at org.apache.jena.irix.IRIProviderJenaIRI.exceptions(IRIProviderJenaIRI.java:256) ~[jena-core-4.3.2.jar:4.3.2]
        at org.apache.jena.irix.IRIProviderJenaIRI.newIRIxJena(IRIProviderJenaIRI.java:137) ~[jena-core-4.3.2.jar:4.3.2]
        at org.apache.jena.irix.IRIProviderJenaIRI.create(IRIProviderJenaIRI.java:145) ~[jena-core-4.3.2.jar:4.3.2]
        at org.apache.jena.irix.IRIx.create(IRIx.java:54) ~[jena-core-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.out.NodeFormatterTTL.abbrevByBase(NodeFormatterTTL.java:100) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.out.NodeFormatterTTL.formatURI(NodeFormatterTTL.java:84) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.out.NodeFormatterBase.formatURI(NodeFormatterBase.java:70) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.out.NodeFormatterBase.format(NodeFormatterBase.java:43) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBase.outputNode(WriterStreamRDFBase.java:159) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBlocks.writePredicateObjectList(WriterStreamRDFBlocks.java:161) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBlocks.printBatch(WriterStreamRDFBlocks.java:140) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBlocks.printBatchTriples(WriterStreamRDFBlocks.java:126) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBatched.finishBatchTriples(WriterStreamRDFBatched.java:100) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBatched.batch(WriterStreamRDFBatched.java:74) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBatched.print(WriterStreamRDFBatched.java:88) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.WriterStreamRDFBase.triple(WriterStreamRDFBase.java:116) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.system.StreamRDFOps.sendTriplesToStream(StreamRDFOps.java:122) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.system.StreamRDFOps.sendGraphToStream(StreamRDFOps.java:108) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.TurtleWriterBlocks.output(TurtleWriterBlocks.java:36) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.TurtleWriterBase.output$(TurtleWriterBase.java:53) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.writer.TurtleWriterBase.write(TurtleWriterBase.java:47) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:236) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:195) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:146) ~[jena-arq-4.3.2.jar:4.3.2]
        at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:205) ~[jena-arq-4.3.2.jar:4.3.2]

Likewise, when I import a graph that contains the IRI <http://foo.com/bar baz#xxxx> with a method like this:

    static Model modelFrom(InputStream in, String baseURI) {
        Model model = ModelFactory.createDefaultModel();
        RDFParser.create()
            .source(in)
            .lang(Lang.TURTLE)
            .base(baseURI)
            .parse(model);
        return model;
    }

the result is an error with a stack trace like this:

    Caused by: org.apache.jena.riot.RiotException: [line: 30, col: 29] Bad character in IRI (space): <http://foo.com/bar[space]...>
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:156)
        at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1334)
        at org.apache.jena.riot.tokens.TokenizerText.readIRI(TokenizerText.java:532)
        at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:194)
        at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:90)
        at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
        at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
        at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:98)
        at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:340)
        at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:314)
        at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:178)
        at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
        at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
        at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
        at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:356)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:306)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:252)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:261)
        at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:576)

Is there any way to configure the Turtle Writer and/or Reader to simply log these errors and continue processing, assuming the issues are not too fatal? It appears there are a number of ways to configure the IRI writing and reading validation, but the indirection was a bit too deep for me to figure out how to configure the validation used by the Turtle Writer and Reader.

Configuring the Turtle Reader/Writer validation would be very helpful (for me, at least) for several reasons:

- Often, I have no control over the contents of the graphs, but I still want to export and import the graphs.
- It seems reasonable that, if I can store an invalid IRI in a Jena TDB, I should be able to export that data and re-import it. This would allow me to restore a graph, invalid data and all, to its original state from its previously-exported Turtle file.
- These exceptions stop the export/import dead in its tracks. Therefore, if a graph has multiple invalid IRIs, the export/import must be executed at least once for each invalid IRI, after each error is fixed. It would be much nicer (particularly for a large graph) to report multiple errors per execution.

I would greatly appreciate any and all help. :-)

Brian
