Andy,

As always, I really appreciate your prompt response and fixes. I continue to be amazed at how quickly Jena responds to bugs and even feature requests. And again, jena-text integration is crucial for my project, so I greatly appreciate the integration of this work to replace Fuseki/LARQ.

I tried rebuilding this morning, and yes, efficiency is greatly improved by not using RDF/XML-ABBREV. (It appears that s-get uses Turtle by default now...)

BTW, I'm a big fan of JSON-LD eclipsing RDF/XML. I currently use jsonld-java for that, and I believe you mentioned to me on that forum that you hope to have that fully integrated into Jena at some point. The biggest selling point for me is that I am able to give my data as JSON-LD to customers and they are able to adapt to use it very easily as regular JSON, without them even knowing that they are actually working with RDF (though I feel a bit guilty about the subterfuge ;-).

-Elli

>________________________________
> From: Andy Seaborne <[email protected]>
>To: [email protected]
>Sent: Friday, June 28, 2013 6:17 AM
>Subject: Re: Problem with Fuseki generating RDF/XML
>
>
>Hi there,
>
>I've switched back SPARQL Graph Store protocol GET to use plain RDF/XML.
>
>Details:
>
>The default when using RIOT to write in Lang.RDFXML is to use the pretty
>form, i.e. when using RDFDataMgr.write(model,Lang.RDFXML). RIOT I/O is
>not automatically used if available.
>
>Fuseki uses new style RDFDataMgr, not model.write, so it got affected by
>the change.
>
>Writing with model.write() isn't affected.
>
>Yes - RDF/XML-ABBREV is expensive. I'm not completely sure why - the
>Turtle writer is doing a similar, but not identical, analysis of the
>model before writing. However, the RDF/XML-ABBREV writer has more
>choices and more options to consider.
>
>>> is anyone really using
>>> RDF/XML anymore as a human-readable format anyway?
>
>Absolutely!
>
>But, today, it's the standard. Tomorrow, it won't be the only choice
>and I'm guessing that Turtle-only toolkits will emerge.
>
>Next ...
>
>DatasetAccessor:
>
>It does not seem to be setting the accept header at all so it gets the
>default. Which is application/rdf+xml.
>
>I've recorded the need to set the accept header to a list based on
>efficiency as:
>
>https://issues.apache.org/jira/browse/JENA-481
>
>I'm thinking the order should be N-Triples, Turtle, RDF/XML, "whatever you
>can give me".
>
>For reference, the accept string for reading RDF with
>RDFDataMgr.loadModel(URL) or model.read(URL) is currently:
>
>text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;
>
>Maybe that should include "application/n-triples" - including the
>original MIME type of text/plain is distinctly unhelpful.
>
>    Andy
>
>
>On 27/06/13 19:56, Rob Vesse wrote:
>> Andy can probably give you a definitive answer here
>>
>> I know that there were significant improvements to the RDF output
>> infrastructure made in 2.10.1 so my guess is that somehow the default
>> RDF/XML output got switched as part of this upgrade (not necessarily
>> intentionally).
>>
>> If this is the case Andy can likely make the fix easily, I however don't
>> know where to look for this setting.
>>
>> Rob
>>
>>
>> On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]> wrote:
>>
>>> I think I may have tracked down what is causing my slow performance of
>>> GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
>>> the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
>>> release, I discovered that the 0.28 snapshot is creating the XML in
>>> hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
>>> 0.26, it would output the RDF in the regular flattened RDF/XML format.
>>> Obviously, creating the flattened form is much more efficient.
>>>
>>> While I understand that RDF/XML-ABBREV is more human readable, there's a
>>> big price to pay in efficiency, at least for my data.
>>> In my case, I'm
>>> accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
>>> as I know, there's no way for me to tell Fuseki through this API that I
>>> want the data to be serialized as N-TRIPLES (since it's just going to be
>>> loaded in a Jena model anyway and not read by a human). Is there a way I
>>> can control how Fuseki serializes by default? And why was the default
>>> serialization format changed to RDF/XML-ABBREV - is anyone really using
>>> RDF/XML anymore as a human-readable format anyway? ;-)
>>>
>>> I really appreciate any advice, workarounds, or fixes for this issue. I
>>> can't really switch back to the earlier Fuseki versions anymore, since
>>> the new jena-text makes my life so much easier since I no longer have to
>>> worry about manually reindexing after SPARQL Update, like I did with
>>> Fuseki and LARQ. Thanks for incorporating jena-text!
>>>
>>> Thank you,
>>> Elli
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Elli Schwarz <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Wednesday, June 26, 2013 9:48 AM
>>>> Subject: Problem with Fuseki generating RDF/XML
>>>>
>>>>
>>>> Rob,
>>>>
>>>> (This email previously had the subject JENA-378 Redux)
>>>>
>>>> I think I tracked down the problem with getModel() a bit more. Using
>>>> s-get, I can get data back as TTL immediately:
>>>> ./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data
>>>>
>>>>
>>>> If I modify the s-get script to get results as RDF/XML, then it takes
>>>> several minutes for Fuseki 0.28-SNAPSHOT to respond.
>>>>
>>>> I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
>>>> but with the config-tdb.ttl assembler):
>>>> /usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M -jar
>>>> /opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar --update
>>>> --config=config-tdb-text.ttl --port=3131
>>>>
>>>>
>>>> If I point the same modified s-get script to the Fuseki 0.26 release,
>>>> the RDF/XML comes back immediately. My guess is that the
>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
>>>> command I use gets data back as RDF/XML, and for some
>>>> reason Fuseki 0.28 takes a long time to generate RDF/XML. Any ideas as
>>>> to what changed in the latest version of Fuseki that would cause this
>>>> problem? Is there any way I can set Fuseki (or the client
>>>> DatasetAccessor) to use TTL serialization?
>>>>
>>>> (BTW, I created JENA-479 for the other bug I discovered with SPARQL
>>>> Insert scripts.)
>>>>
>>>> Thank you very much for your help,
>>>> Elli
>>>>
>>>>
>>>>
>>>>> ________________________________
>>>>> From: Rob Vesse <[email protected]>
>>>>> To: "[email protected]" <[email protected]>; Elli Schwarz
>>>>> <[email protected]>
>>>>> Sent: Tuesday, June 25, 2013 4:40 PM
>>>>> Subject: Re: JENA-378 Redux
>>>>>
>>>>>
>>>>>> I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
>>>>>> 0.2.6
>>>>>
>>>>> The current stable releases are jena-core and jena-arq 2.10.1 and
>>>>> jena-fuseki 0.2.7
>>>>>
>>>>> Do you experience the problem with those versions?
>>>>>
>>>>> Fuseki config file or arguments used to start would be useful.
>>>>>
>>>>> Rob
>>>>>
>>>>>
>>>>> On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]> wrote:
>>>>>
>>>>>> This past January, I reported a bug to this list which was recorded as
>>>>>> JENA-378.
>>>>>> I'm now experiencing what appears to be the same problem, where
>>>>>> [ ] syntax in an Insert script doesn't work when using
>>>>>> UpdateExecutionFactory:
>>>>>>
>>>>>> String updateString = "INSERT {} WHERE { ?x ?p [ ?a ?b ] }";
>>>>>> UpdateRequest update = UpdateFactory.create(updateString);
>>>>>>
>>>>>> UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
>>>>>>     "http://localhost:3131/ds/update");
>>>>>> up.execute();
>>>>>>
>>>>>> The error is: 400 Encountered " "?" "? ""
>>>>>> caused by the client generating incorrect SPARQL with an extra ? (as
>>>>>> viewed from the Fuseki log):
>>>>>> INSERT { } WHERE { ?x ?p ??0 . ??0 ?a ?b }
>>>>>>
>>>>>> This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with jena-fuseki
>>>>>> 0.2.8-SNAPSHOT (compiled today).
>>>>>> --
>>>>>> Another problem I'm having which I can't track down is that the
>>>>>> following code takes a VERY long time to execute (10 minutes):
>>>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);
>>>>>>
>>>>>> With earlier versions of Fuseki, it would take seconds, with the same
>>>>>> data. The problem seems to be related to my Fuseki server instance
>>>>>> itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
>>>>>> since even if I use the older stable jena-core and jena-arq 2.10.0 and
>>>>>> jena-fuseki 0.2.6, I also have the problem (but not if I connect it to
>>>>>> an earlier Fuseki release). Upon debugging, it appears that for some
>>>>>> reason the HTTP request itself is taking a long time to complete. In
>>>>>> fact, I'm not even getting anything in the Fuseki log for about a
>>>>>> minute after the request is made, but once the request is made I
>>>>>> immediately see a spike in CPU usage on the server. This doesn't appear
>>>>>> to be a network latency issue since other access to the server isn't
>>>>>> affected, it appears to be just this call.
>>>>>> It would seem that Fuseki is spinning its wheels on something.
>>>>>>
>>>>>> I realize this may not be enough info for you to determine what is
>>>>>> causing the problem, but I don't know how else to track down the issue.
>>>>>> Using s-get I can get back the data quickly, which is strange since I
>>>>>> thought it would be doing the same thing as the getModel().
>>>>>>
>>>>>> Thank you,
>>>>>> Elli
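
For reference, the Accept-header ordering discussed in this thread (N-Triples, Turtle, RDF/XML, then "whatever you can give me") can be sketched with plain java.net, without Jena. This is only a minimal illustration under stated assumptions: the class name GraphStoreRequest, its helper methods and the stepwise q-values are made up for the example, and none of it is what DatasetAccessor or JENA-481 actually implements. The endpoint and graph URI are the ones from the s-get command quoted above, and the ?graph= parameter is the Graph Store Protocol's way of naming a graph indirectly.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jena: prepares a Graph Store Protocol GET
// that asks for cheap-to-write RDF syntaxes first.
final class GraphStoreRequest {

    // Builds an Accept value from media types listed most-preferred first.
    // The first type gets the implicit q=1; each later one drops by 0.1,
    // never going below 0.1. (The q-value scheme is an assumption.)
    static String acceptHeader(List<String> mediaTypes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mediaTypes.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(mediaTypes.get(i));
            if (i > 0) {
                int tenths = Math.max(1, 10 - i);
                sb.append(";q=0.").append(tenths);
            }
        }
        return sb.toString();
    }

    // Builds the request URL for a named graph: <endpoint>?graph=<encoded URI>.
    static String graphUrl(String endpoint, String graphUri) {
        try {
            return endpoint + "?graph=" + URLEncoder.encode(graphUri, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Prepares (but does not send) the GET request with the Accept header set.
    static HttpURLConnection prepareGet(String endpoint, String graphUri,
                                        List<String> mediaTypes) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(graphUrl(endpoint, graphUri)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", acceptHeader(mediaTypes));
        return conn;
    }

    public static void main(String[] args) {
        List<String> prefs = Arrays.asList(
                "application/n-triples", "text/turtle", "application/rdf+xml", "*/*");
        System.out.println(graphUrl("http://localhost:3131/ds/data",
                "http://192.168.6.37/graph/uri_data"));
        System.out.println("Accept: " + acceptHeader(prefs));
    }
}
```

The response body from such a request could then be handed to whichever RDF parser matches the returned Content-Type; that step is left out here because it depends on the Jena version in use.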
