Re: Problem with Fuseki generating RDF/XML

Elli Schwarz Fri, 28 Jun 2013 06:20:00 -0700

Andy,

As always, I really appreciate your prompt response and fixes. I continue to be 
amazed at how quickly Jena responds to bugs and even feature requests. And 
again, jena-text integration is crucial for my project, so I greatly appreciate 
the integration of this work to replace Fuseki/LARQ.


I tried rebuilding this morning, and yes, efficiency is greatly improved by not 
using RDF/XML-ABBREV. (It appears that s-get uses Turtle by default now...)

BTW, I'm a big fan of JSON-LD, which I see as eclipsing RDF/XML. I currently use 
jsonld-java for that, and I believe you mentioned to me on that forum that you 
hope to have it fully integrated into Jena at some point. The biggest selling 
point for me is that I can give my data to customers as JSON-LD and they can 
easily use it as regular JSON, without even knowing that they are actually 
working with RDF (though I feel a bit guilty about the subterfuge ;-).

-Elli



>________________________________
> From: Andy Seaborne <[email protected]>
>To: [email protected] 
>Sent: Friday, June 28, 2013 6:17 AM
>Subject: Re: Problem with Fuseki generating RDF/XML
> 
>
>Hi there,
>
>I've switched back SPARQL Graph Store protocol GET to use plain RDF/XML.
>
>Details:
>
>The default when using RIOT to write Lang.RDFXML is the pretty form, 
>i.e. when using RDFDataMgr.write(out, model, Lang.RDFXML). RIOT I/O is 
>not automatically used if available.
>
>Fuseki uses the new-style RDFDataMgr, not model.write(), so it was 
>affected by the change.
>
>Writing with model.write() isn't affected.
>
>Yes, RDF/XML-ABBREV is expensive. I'm not completely sure why: the 
>Turtle writer does a similar, but not identical, analysis of the 
>model before writing. However, the RDF/XML-ABBREV writer has more 
>choices and more options to consider.
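To illustrate why the plain form is cheap, here is a minimal sketch (not Jena's writer) of flat RDF/XML output: each triple becomes its own `rdf:Description`, written in one pass with no look-ahead over the rest of the model. The class `PlainRdfXmlSketch` is hypothetical, and it assumes IRI subjects, predicate IRIs whose part after the last `#` or `/` is a valid XML local name, and objects that are either IRIs or plain literals.

```java
import java.util.List;

/** Hypothetical sketch of a flat ("plain") RDF/XML writer; not Jena's implementation. */
final class PlainRdfXmlSketch {

    /** One triple; objectIsLiteral says whether the object is a plain literal or an IRI. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    /** Writes every triple independently, in input order, in a single pass. */
    static String write(List<Triple> triples) {
        StringBuilder out = new StringBuilder();
        out.append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n");
        for (Triple t : triples) {
            // Split the predicate IRI after its last '#' or '/': namespace + local name.
            int cut = Math.max(t.predicate.lastIndexOf('#'), t.predicate.lastIndexOf('/')) + 1;
            String ns = t.predicate.substring(0, cut);
            String local = t.predicate.substring(cut);

            out.append("  <rdf:Description rdf:about=\"").append(escape(t.subject)).append("\">\n");
            // Declare the predicate's namespace on the property element itself.
            out.append("    <p:").append(local).append(" xmlns:p=\"").append(escape(ns)).append('"');
            if (t.objectIsLiteral) {
                out.append('>').append(escape(t.object)).append("</p:").append(local).append(">\n");
            } else {
                out.append(" rdf:resource=\"").append(escape(t.object)).append("\"/>\n");
            }
            out.append("  </rdf:Description>\n");
        }
        return out.append("</rdf:RDF>\n").toString();
    }

    /** Minimal XML escaping for attribute values and text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

An abbreviating writer, by contrast, typically has to examine the model as a whole (grouping properties by subject, deciding which nodes can be nested) before emitting anything, which is where the extra choices Andy mentions come from.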
>
>>> is anyone really using
>>> RDF/XML anymore as a human-readable format anyway?
>
>Absolutely!
>
>But, today, it's the standard.  Tomorrow, it won't be the only choice 
>and I'm guessing that Turtle-only toolkits will emerge.
>
>Next ...
>
>DatasetAccessor:
>
>It does not seem to set the Accept header at all, so it gets the 
>default, which is application/rdf+xml.
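As a client-side workaround, one option is to skip DatasetAccessor for reads and issue the SPARQL Graph Store Protocol GET directly with an explicit Accept header. This is a minimal sketch using the JDK's `java.net.http` (Java 11+); the class name `GraphStoreGet` is hypothetical, and it assumes the `?graph=` (indirect identification) form of the protocol against the endpoint used in this thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: Graph Store Protocol GET with an explicit Accept header. */
final class GraphStoreGet {

    static HttpRequest buildGet(String graphStoreEndpoint, String graphUri, String accept) {
        // Indirect graph identification: ?graph=<percent-encoded graph IRI>
        String target = graphStoreEndpoint + "?graph="
                + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(target))
                .header("Accept", accept)   // ask for a cheap-to-produce syntax
                .GET()
                .build();
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofInputStream())` yields a stream that can be handed to an RDF parser for whatever syntax the response's Content-Type reports.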
>
>I've recorded the need to set the Accept header to a list ordered by 
>efficiency as:
>
>https://issues.apache.org/jira/browse/JENA-481
>
>I think the order should be N-Triples, Turtle, RDF/XML, "whatever you 
>can give me".
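A sketch of building such a preference-ordered Accept header, assuming (hypothetically) that each later entry simply gets a q-value 0.1 lower than the one before; the weights are illustrative, not what JENA-481 settled on, and `AcceptHeaderBuilder` is a made-up name.

```java
import java.util.List;

/** Hypothetical helper: Accept header from media types listed most-preferred first. */
final class AcceptHeaderBuilder {

    static String build(List<String> mediaTypesMostPreferredFirst) {
        StringBuilder header = new StringBuilder();
        for (int i = 0; i < mediaTypesMostPreferredFirst.size(); i++) {
            if (i > 0) {
                header.append(',');
            }
            header.append(mediaTypesMostPreferredFirst.get(i));
            if (i > 0) {
                // First entry keeps the implicit q=1; later ones step down by 0.1, floored at 0.1.
                header.append(";q=0.").append(Math.max(1, 10 - i));
            }
        }
        return header.toString();
    }
}
```

For the order above, `build(List.of("application/n-triples", "text/turtle", "application/rdf+xml", "*/*"))` gives `application/n-triples,text/turtle;q=0.9,application/rdf+xml;q=0.8,*/*;q=0.7`.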
>
>For reference, the accept string for reading RDF with 
>RDFDataMgr.loadModel(URL) or model.read(URL) is currently:
>
>text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;
>
>Maybe that should include "application/n-triples" - including the 
>original MIME type of text/plain is distinctly unhelpful.
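On the server side, a header like the one quoted above is resolved by content negotiation: each media type the server can produce is scored by the q-value of the most specific matching range, and the highest score wins. This is a simplified sketch; `ConnegSketch` is hypothetical, and it ignores media-type parameters other than `q` as well as case-insensitivity.

```java
import java.util.List;

/** Simplified content-negotiation sketch (hypothetical; not Fuseki's code). */
final class ConnegSketch {

    /** Returns the supported type with the highest q-value, or null if none is acceptable. */
    static String choose(String acceptHeader, List<String> supportedInServerOrder) {
        String best = null;
        double bestQ = 0.0;
        for (String offer : supportedInServerOrder) {
            double q = qualityFor(acceptHeader, offer);
            if (q > bestQ) {          // strict '>' keeps the server's order on ties
                best = offer;
                bestQ = q;
            }
        }
        return best;                  // null would map to HTTP 406 Not Acceptable
    }

    /** q-value from the most specific range that matches mediaType (0.0 if none). */
    static double qualityFor(String acceptHeader, String mediaType) {
        double q = 0.0;
        int bestSpecificity = -1;
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.split(";");
            String type = parts[0].trim();
            int specificity;
            if (type.equals(mediaType)) {
                specificity = 2;                         // exact match
            } else if (type.equals("*/*")) {
                specificity = 0;                         // matches anything
            } else if (type.endsWith("/*")
                    && mediaType.startsWith(type.substring(0, type.length() - 1))) {
                specificity = 1;                         // e.g. "text/*"
            } else {
                continue;
            }
            double rangeQ = 1.0;                         // q defaults to 1
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    rangeQ = Double.parseDouble(param.substring(2));
                }
            }
            if (specificity > bestSpecificity) {
                bestSpecificity = specificity;
                q = rangeQ;
            }
        }
        return q;
    }
}
```

With the current accept string, Turtle beats RDF/XML (1.0 vs 0.9), while N-Triples only matches `*/*` at 0.5, so listing `application/n-triples` explicitly would change how it ranks.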
>
>    Andy
>
>
>On 27/06/13 19:56, Rob Vesse wrote:
>> Andy can probably give you a definitive answer here.
>>
>> I know that there were significant improvements to the RDF output
>> infrastructure made in 2.10.1 so my guess is that somehow the default
>> RDF/XML output got switched as part of this upgrade (not necessarily
>> intentionally).
>>
>> If this is the case, Andy can likely make the fix easily; I, however,
>> don't know where to look for this setting.
>>
>> Rob
>>
>>
>> On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]> wrote:
>>
>>> I think I may have tracked down what is causing my slow performance of
>>> GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
>>> the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
>>> release, I discovered that the 0.28 snapshot is creating the XML in
>>> hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
>>> 0.26, it would output the RDF in the regular flattened RDF/XML format.
>>> Obviously, creating the flattened form is much more efficient.
>>>
>>> While I understand that RDF/XML-ABBREV is more human readable, there's a
>>> big price to pay in efficiency, at least for my data. In my case, I'm
>>> accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
>>> as I know, there's no way for me to tell Fuseki through this API that I
>>> want the data to be serialized as N-TRIPLES (since it's just going to be
>>> loaded in a Jena model anyway and not read by a human). Is there a way I
>>> can control how Fuseki serializes by default? And why was the default
>>> serialization format changed to RDF/XML-ABBREV - is anyone really using
>>> RDF/XML anymore as a human-readable format anyway? ;-)
>>>
>>> I would really appreciate any advice, workarounds, or fixes for this
>>> issue. I can't really switch back to the earlier Fuseki versions anymore,
>>> because the new jena-text makes my life so much easier: I no longer have
>>> to worry about manually reindexing after SPARQL Update, as I did with
>>> Fuseki and LARQ. Thanks for incorporating jena-text!
>>>
>>> Thank you,
>>> Elli
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Elli Schwarz <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Wednesday, June 26, 2013 9:48 AM
>>>> Subject: Problem with Fuseki generating RDF/XML
>>>>
>>>>
>>>> Rob,
>>>>
>>>> (This email previously had the subject JENA-378 Redux)
>>>>
>>>> I think I tracked down the problem with getModel() a bit more. Using
>>>> s-get, I can get data back as TTL immediately:
>>>> ./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data
>>>>
>>>>
>>>> If I modify the s-get script to get results as RDF/XML, then it takes
>>>> several minutes for Fuseki 0.28-SNAPSHOT to respond.
>>>>
>>>> I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
>>>> but with the config-tdb.ttl assembler):
>>>> /usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M \
>>>>     -jar /opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar \
>>>>     --update --config=config-tdb-text.ttl --port=3131
>>>>
>>>>
>>>> If I point the same modified s-get script to the Fuseki 0.26 release,
>>>> the RDF/XML comes back immediately. My guess is that the
>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
>>>> command I use gets data back as RDF/XML, and for some reason Fuseki 0.28
>>>> takes a long time to generate RDF/XML. Any ideas as to what changed in
>>>> the latest version of Fuseki that would cause this problem? Is there any
>>>> way I can set Fuseki (or the client DatasetAccessor) to use TTL
>>>> serialization?
>>>>
>>>> (BTW, I created JENA-479 for the other bug I discovered with SPARQL
>>>> Insert scripts.)
>>>>
>>>> Thank you very much for your help,
>>>> Elli
>>>>
>>>>
>>>>
>>>>> ________________________________
>>>>> From: Rob Vesse <[email protected]>
>>>>> To: "[email protected]" <[email protected]>; Elli Schwarz
>>>>> <[email protected]>
>>>>> Sent: Tuesday, June 25, 2013 4:40 PM
>>>>> Subject: Re: JENA-378 Redux
>>>>>
>>>>>
>>>>>> I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
>>>>>> 0.2.6
>>>>>
>>>>> The current stable releases are jena-core and jena-arq 2.10.1 and
>>>>> jena-fuseki 0.2.7
>>>>>
>>>>> Do you experience the problem with those versions?
>>>>>
>>>>> The Fuseki config file, or the arguments used to start it, would be useful.
>>>>>
>>>>> Rob
>>>>>
>>>>>
>>>>> On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]> wrote:
>>>>>
>>>>>> This past January, I reported a bug to this list which was recorded as
>>>>>> JENA-378. I'm now experiencing what appears to be the same problem,
>>>>>> where [ ] syntax in an Insert script doesn't work when using
>>>>>> UpdateExecutionFactory:
>>>>>>
>>>>>>   String updateString = "INSERT {} WHERE { ?x ?p [ ?a  ?b ] }";
>>>>>>   UpdateRequest update = UpdateFactory.create(updateString);
>>>>>>
>>>>>>   UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
>>>>>>       "http://localhost:3131/ds/update");
>>>>>>   up.execute();
>>>>>>
>>>>>> The error is: 400 Encountered " "?" "? ""
>>>>>> caused by the client generating incorrect SPARQL with an extra ? (as
>>>>>> viewed from the Fuseki log):  INSERT { } WHERE   { ?x ?p ??0 . ??0 ?a ?b }
>>>>>>
>>>>>> This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with
>>>>>> jena-fuseki 0.2.8-SNAPSHOT (compiled today).
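Until the client-side serialization is fixed, one possible workaround is to bypass the parse/re-serialize round trip and POST the update text exactly as written, using the SPARQL 1.1 Protocol's "update via POST directly" form with the `application/sparql-update` media type. A minimal sketch with `java.net.http`; `RawSparqlUpdate` is a hypothetical name, and this is not a Jena API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: send a SPARQL Update string verbatim, with no client-side re-serialization. */
final class RawSparqlUpdate {

    static HttpRequest buildUpdate(String updateEndpoint, String updateText) {
        return HttpRequest.newBuilder(URI.create(updateEndpoint))
                // SPARQL 1.1 Protocol: the request body is the update text itself.
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(updateText))  // UTF-8 by default
                .build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` leaves Fuseki to parse the `[ ]` syntax itself; a 2xx status indicates the update was accepted.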
>>>>>> --
>>>>>> Another problem I'm having, which I can't track down, is that the
>>>>>> following code takes a VERY long time to execute (10 minutes):
>>>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);
>>>>>>
>>>>>> With earlier versions of Fuseki, it would take seconds, with the same
>>>>>> data. The problem seems to be related to my Fuseki server instance
>>>>>> itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
>>>>>> since even if I use the older stable jena-core and jena-arq 2.10.0 and
>>>>>> jena-fuseki 0.2.6, I also have the problem (but not if I connect it to
>>>>>> an earlier Fuseki release). Upon debugging, it appears that for some
>>>>>> reason the HTTP request itself is taking a long time to complete. In
>>>>>> fact, I'm not even getting anything in the Fuseki log for about a
>>>>>> minute after the request is made, but once the request is made I
>>>>>> immediately see a spike in CPU usage on the server. This doesn't appear
>>>>>> to be a network latency issue, since other access to the server isn't
>>>>>> affected; it appears to be just this call. It would seem that Fuseki is
>>>>>> spinning its wheels on something.
>>>>>>
>>>>>> I realize this may not be enough info for you to determine what is
>>>>>> causing the problem, but I don't know how else to track down the issue.
>>>>>> Using s-get I can get back the data quickly, which is strange since I
>>>>>> thought it would be doing the same thing as getModel().
>>>>>>
>>>>>> Thank you,
>>>>>> Elli
>>>>>
>>>>>
>>>>>
>>>>
>>
>
>
>
>
