On 22/11/15 14:17, François-Paul Servant wrote:
Andy,
I improved my test (running on the same file): several runs and several
RDFFormat variations. Results and code below (SSD disk, otherwise a
5-year-old Mac PowerBook).
Main fact remains: JSON-LD serialization is slow (~10 times slower than
Turtle). But people want JSON-LD. I think I’ll have a look at JavaScript to
convert Turtle to JSON-LD in the browser.
Do they really want idiomatic JSON and JSON-LD? i.e. JSON-LD that looks
like, and can be used as, JSON? An RDF model output to JSON-LD is not
very JSON-ish (depends on the RDF shape)
To get the more natural JSON aspect, I've had some success with a simple
pipeline of SPARQL SELECT to JSON-LD (this is not CONSTRUCT as JSON-LD).
    SELECT ?var1 ?var2 ?var3 .... {
        ?x :property1 ?var1 ;
           :property2 ?var2 ;
           :property3 ?var3 ;
           ...
    }
and then output something like:
    { "@context" : { ... } ,
      "@id" : "request-id" ,
      "documentation_api" : "http://URL-of-API-documentation" ,
      "documentation_data" : "http://URL-of-data-documentation" ,
      "data" : {
          "property1" : value-of-var1 ,
          "property2" : value-of-var2 ,
          "property3" : value-of-var3
      }
    }
which is a bit of metadata about the request and the data from the query.
It's more JSON-like and the details of syntax are more controlled which
is in keeping with JSON. It's predictable so JSON access by path can
be used.
c.f. XML and RDF/XML
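As a rough illustration of the output side of that pipeline, here is a
minimal Java sketch using only the standard library. It assumes one SELECT
result row has already been copied into a map of variable name to value; the
query execution itself is not shown. The class and method names
(SelectRowToJson, toJson, value, quote) are hypothetical, the "@context" is
left as an empty placeholder, and the documentation links are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: wraps one SELECT result row in a fixed JSON envelope. */
final class SelectRowToJson {

    /** Escape a string and wrap it in double quotes as a JSON string literal. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Numbers and booleans are written bare, null as null, anything else as a string. */
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(String.valueOf(v));
    }

    /** Fixed key order and fixed layout, so access by path stays predictable. */
    static String toJson(String requestId, Map<String, Object> row) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\n");
        sb.append("  \"@context\" : { },\n"); // placeholder (assumption): a real context maps keys to IRIs
        sb.append("  \"@id\" : ").append(quote(requestId)).append(",\n");
        sb.append("  \"data\" : {");
        String sep = "\n";
        for (Map.Entry<String, Object> e : row.entrySet()) {
            sb.append(sep).append("    ")
              .append(quote(e.getKey())).append(" : ").append(value(e.getValue()));
            sep = ",\n";
        }
        sb.append("\n  }\n}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("property1", "some text");
        row.put("property2", 42);
        System.out.println(toJson("request-id", row));
    }
}
```

Because the envelope is written by hand rather than derived from the RDF
shape, the keys and nesting are the same for every request, which is what
makes it usable as plain JSON.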
> // warm it up
> System.out.println("*** WARM-UP ***");
> for (RDFFormat format : formats) {
> doIt(m, format, 1);
> }
It may not make a difference here, but a warm-up of one run is not enough
to account for all the overheads. It will have triggered class loading, a
significant cost, but not all of the JIT compilation.
For predictability, writing to a null output stream (or to /dev/null)
isolates the test from the vagaries of local I/O.
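A minimal sketch of what those two suggestions could look like together,
using only the standard library. The names (WriteBench, StreamWriter,
CountingNullOutputStream, meanMillis) are hypothetical, and the number of
warm-up runs is something to tune by experiment rather than a known-good
value.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Sketch only: time a writer after several warm-up runs, with output discarded. */
final class WriteBench {

    /** Anything that serializes to a stream. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Discards all bytes; the count is kept only as a sanity check. */
    static final class CountingNullOutputStream extends OutputStream {
        long count = 0;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Returns the mean time per measured run, in milliseconds. */
    static double meanMillis(StreamWriter w, int warmupRuns, int measuredRuns)
            throws IOException {
        // Warm-up: same work as the measured runs, results thrown away.
        for (int i = 0; i < warmupRuns; i++) {
            w.writeTo(new CountingNullOutputStream());
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            OutputStream out = new CountingNullOutputStream();
            long start = System.nanoTime();
            w.writeTo(out);
            out.flush();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / measuredRuns;
    }
}
```

With the test code quoted in this thread, a call might look like
`WriteBench.meanMillis(out -> RDFDataMgr.write(out, m, format), 50, 20)`,
where 50 warm-up runs is an arbitrary choice for illustration.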
Andy
Best Regards,
fps
model.size() 7559
*** WARM-UP ***
JSON-LD/pretty TIME: 766 ms
JSON-LD/flat TIME: 563 ms
N-Triples/utf-8 TIME: 80 ms
RDF/XML/pretty TIME: 560 ms
RDF/XML/plain TIME: 227 ms
RDF/XML/pretty TIME: 518 ms
Turtle/blocks TIME: 120 ms
Turtle/flat TIME: 142 ms
Turtle/pretty TIME: 110 ms
N-Triples/utf-8 TIME: 50 ms
*** RESULTS ***
JSON-LD/pretty TIME: 497 ms
JSON-LD/flat TIME: 475 ms
N-Triples/utf-8 TIME: 31 ms
RDF/XML/pretty TIME: 253 ms
RDF/XML/plain TIME: 140 ms
RDF/XML/pretty TIME: 215 ms
Turtle/blocks TIME: 50 ms
Turtle/flat TIME: 46 ms
Turtle/pretty TIME: 52 ms
N-Triples/utf-8 TIME: 34 ms
package testperfs;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
public class serialization {

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
    }

    @AfterClass
    public static void tearDownAfterClass() throws Exception {
    }

    @Before
    public void setUp() throws Exception {
    }

    @After
    public void tearDown() throws Exception {
    }

    @Test
    public final void test() throws IOException {
        Model m = loadModel();
        System.out.println("model.size() " + m.size());
        RDFFormat[] formats = {
            RDFFormat.JSONLD_PRETTY,
            RDFFormat.JSONLD_FLAT,
            RDFFormat.NTRIPLES_UTF8,
            RDFFormat.RDFXML_ABBREV,
            RDFFormat.RDFXML_PLAIN,
            RDFFormat.RDFXML_PRETTY,
            RDFFormat.TURTLE_BLOCKS,
            RDFFormat.TURTLE_FLAT,
            RDFFormat.TURTLE_PRETTY,
            RDFFormat.NTRIPLES_UTF8};
        // warm it up
        System.out.println("*** WARM-UP ***");
        for (RDFFormat format : formats) {
            doIt(m, format, 1);
        }
        // now for real
        System.out.println("*** RESULTS ***");
        for (RDFFormat format : formats) {
            doIt(m, format, 20);
        }
    }

    private void doIt(Model m, RDFFormat format, int n) throws IOException {
        File f = new File(getFile("/testperfs"),"output.txt");
        long time = 0;
        for (int i = 0 ; i < n ; i++) {
            if (f.exists()) f.delete();
            OutputStream out = new BufferedOutputStream(new FileOutputStream(f));
            long start = System.currentTimeMillis();
            RDFDataMgr.write(out, m, format) ;
            out.flush();
            out.close();
            long end = System.currentTimeMillis();
            time += (end-start);
        }
        String x = format + " TIME: " + time/n + " ms";
        System.out.println(x);
    }

    private void doItOld(Model m, String lang) throws IOException {
        File f = new File(getFile("/testperfs"),"output.txt");
        if (f.exists()) f.delete();
        OutputStream out = new BufferedOutputStream(new FileOutputStream(f));
        long start = System.currentTimeMillis();
        m.write(out, lang);
        out.flush();
        out.close();
        long end = System.currentTimeMillis();
        String x = lang + " TIME: " + (end-start) + " ms";
        System.out.println(x);
        f.delete();
    }

    private Model loadModel() throws IOException {
        Model m = ModelFactory.createDefaultModel();
        // Loading the model
        File f = getTestFile();
        InputStream in = new BufferedInputStream(new FileInputStream(f));
        m.read(in, null, "JSON-LD");
        in.close();
        return m;
    }

    private File getTestFile() {
        return getFile("/testperfs/docs.jsonld");
    }

    private File getFile(String name) {
        URL resourceUrl = getClass().getResource(name);
        return new File(resourceUrl.getFile());
    }
}
On 21 Nov 2015, at 18:06, Andy Seaborne <[email protected]> wrote:
On 21/11/15 01:28, François-Paul Servant wrote:
Hi,
it seems to me that JSON-LD serialization is slow. Do you have the same feeling?
Here are the results of a comparative test that I ran on my machine (outputting
one model to a file, using Jena 3.0.1-SNAPSHOT)
Are these single run costs? i.e. from cold?
model.size() 7559
JSON-LD TIME: 649 ms
Jena uses a separate self-contained engine, jsonld-java, which in turn uses
Jackson.
It means taking a copy of much of the material to be printed, getting an
in-memory structure, then traversing it for output, including formatting the
JSON so that it is indented rather than all on one line.
TURTLE TIME: 136 ms
That is the normal Turtle writer? It's pretty printing and so non-streaming;
there are variations (RDFFormat) that print in a streaming style, with less
prettiness but still using prefixed names. Pretty is not free.
RDF/XML TIME: 548 ms
Ditto - RDF/XML or RDF/XML-ABBREV. The default using RDFDataMgr for Lang.RDFXML is
"pretty" (RDF/XML-ABBREV).
N-TRIPLE TIME: 61 ms
Is that to a spinning disk or SSD? (Given that a disk write including sync is
of the order of 10 ms.)
Andy
annoying...
Best,
fps