Re: Problem with Fuseki generating RDF/XML

Andy Seaborne Fri, 28 Jun 2013 13:05:36 -0700

On 28/06/13 14:19, Elli Schwarz wrote:

Andy,


As always, I really appreciate your prompt response and fixes. I
continue to be amazed at how quickly Jena responds to bugs and even
feature requests.

Sometimes it's easier to just do the change rather than the paperwork and risk forgetting it :-)


This wasn't an intentional change.

And again, jena-text integration is crucial for my
project, so I greatly appreciate the integration of this work to replace
Fuseki/LARQ.

I tried rebuilding this morning, and yes, efficiency is greatly improved
by not using RDF/XML-ABBREV. (It appears that s-get uses Turtle by
default now...)


Yes :-)

BTW, I'm a big fan of JSON-LD as eclipsing RDF/XML. I currently use
jsonld-java for that, and I believe you mentioned to me on that forum
that you hope to have that fully integrated into Jena at some point. The
biggest selling point for me is that I am able to give my data as
JSON-LD to customers and they are able to adapt to use it very easily as
regular JSON, without them even knowing that they are actually working
with RDF (though I feel a bit guilty about the subterfuge ;-).


There is an adapter at

https://github.com/afs/jena-jsonld

with example of integration (== call JenaJSONLD.init())

which is using

https://github.com/jsonld-java/jsonld-java

to do all the real JSON-LD work.

One issue with JSON-LD is that if you work with it as JSON then it may not remain JSON-LD/RDF. It's OK to read but an update to the JSON does not necessarily remain correct JSON-LD.

JSON-LD does not scale. The processing model assumes you have the whole document available. It may be possible to write a direct JSON-LD->triples parser which is streaming but some of the other algorithms work on documents. jsonld-java builds the JSON-LD in-memory first.


        Andy


-Elli

    ------------------------------------------------------------------------
    *From:* Andy Seaborne <[email protected]>
    *To:* [email protected]
    *Sent:* Friday, June 28, 2013 6:17 AM
    *Subject:* Re: Problem with Fuseki generating RDF/XML

    Hi there,

    I've switched back SPARQL Graph Store protocol GET to use plain RDF/XML.

    Details:

    The default when using RIOT to write in Lang.RDFXML is to use the
    pretty form, i.e. when using RDFDataMgr.write(model,Lang.RDFXML).
    RIOT I/O is not automatically used if available.

    Fuseki uses the new-style RDFDataMgr, not model.write, so it was
    affected by the change.

    Writing model.write() isn't affected.

    Yes - RDF/XML-ABBREV is  expensive.  I'm not completely sure why - the
    Turtle writer is doing a similar, but not identical, analysis of the
    model before writing.  However, the RDF/XML-ABBREV writer has more
    choices and more options to consider.

     >> is anyone really using
     >> RDF/XML anymore as a human-readable format anyway?

    Absolutely!

    But, today, it's the standard.  Tomorrow, it won't be the only choice
    and I'm guessing that Turtle-only toolkits will emerge.

    Next ...

    DatasetAccessor:

    It does not seem to be setting the accept header at all so it gets the
    default.  Which is application/rdf+xml.

    I've recorded the need to set the accept header to a list based on
    efficiency as:

    https://issues.apache.org/jira/browse/JENA-481

    I'm thinking the order should be N-Triples, Turtle, RDF/XML,
    "whatever you can give me".
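
The preference order above (N-Triples, Turtle, RDF/XML, then anything) could be turned into an Accept header roughly as follows. This is a minimal sketch using only the JDK, not what JENA-481 implements: the class name AcceptHeaderSketch, both helper methods, and the 0.1 step between q-values are assumptions made for illustration. The endpoint and graph URIs are the ones quoted later in this thread, and "?graph=" is the Graph Store protocol's parameter for naming a graph.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

public class AcceptHeaderSketch {

    // Hypothetical helper: media types are listed most-preferred first and
    // q drops by 0.1 per position, never going below 0.1.
    static String buildAccept(List<String> preferred) {
        StringJoiner header = new StringJoiner(",");
        for (int i = 0; i < preferred.size(); i++) {
            String type = preferred.get(i);
            if (i == 0) {
                // q=1 is the default, so the first entry needs no parameter.
                header.add(type);
            } else {
                double q = Math.max(0.1, 1.0 - 0.1 * i);
                header.add(type + String.format(Locale.ROOT, ";q=%.1f", q));
            }
        }
        return header.toString();
    }

    // Hypothetical helper: builds (but does not send) a Graph Store GET
    // for one named graph with an explicit Accept header.
    static HttpRequest graphGet(String serviceUrl, String graphUri, String accept) {
        String target = serviceUrl + "?graph="
                + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(target))
                .header("Accept", accept)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String accept = buildAccept(List.of(
                "application/n-triples", "text/turtle", "application/rdf+xml", "*/*"));
        HttpRequest request = graphGet(
                "http://localhost:3131/ds/data",
                "http://192.168.6.37/graph/uri_data",
                accept);
        System.out.println(request.uri());
        System.out.println("Accept: " + accept);
    }
}
```

The request is only constructed here; sending it with java.net.http.HttpClient and parsing the response into a model is left out of the sketch.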

    For reference, the accept string for reading RDF with
    RDFDataMgr.loadModel(URL) or model.read(URL) is currently:

    text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;

    Maybe that should include "application/n-triples" - including the
    original MIME type of text/plain is distinctly unhelpful.
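
To see what the accept string quoted above asks a server for, it can be split into media ranges and ordered by q-value. The following is a simplified sketch using only the JDK; the class AcceptRankingSketch and its rank method are hypothetical, and the parser handles only the plain "type;q=value" form used here (no quoted parameters and no specificity rules).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AcceptRankingSketch {

    // Hypothetical simplified parser: returns the media ranges of an Accept
    // header ordered from most to least preferred.
    static List<String> rank(String accept) {
        Map<String, Double> quality = new LinkedHashMap<>();
        for (String part : accept.split(",")) {
            String[] pieces = part.trim().split(";");
            String type = pieces[0].trim();
            if (type.isEmpty()) {
                continue;
            }
            double q = 1.0; // a missing q parameter means q=1
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            quality.put(type, q);
        }
        // List.sort is stable, so ranges with equal q keep their header order.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(quality.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            ranked.add(entry.getKey());
        }
        return ranked;
    }

    public static void main(String[] args) {
        String current =
                "text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;";
        System.out.println(rank(current));
        // Putting N-Triples first, as suggested, with no q parameter:
        System.out.println(rank("application/n-triples," + current));
    }
}
```

In the second call application/n-triples and text/turtle both have q=1, so a server is free to pick either; giving text/turtle an explicit lower q would make the preference unambiguous.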

         Andy


    On 27/06/13 19:56, Rob Vesse wrote:
     > Andy can probably give you a definitive answer here
     >
     > I know that there were significant improvements to the RDF output
     > infrastructure made in 2.10.1 so my guess is that somehow the default
     > RDF/XML output got switched as part of this upgrade (not necessarily
     > intentionally).
     >
     > If this is the case Andy can likely make the fix easily, I however don't
     > know where to look for this setting.
     >
     > Rob
     >
     >
     > On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]
    <mailto:[email protected]>> wrote:
     >
     >> I think I may have tracked down what is causing my slow performance of
     >> GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
     >> the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
     >> release, I discovered that the 0.28 snapshot is creating the XML in
     >> hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
     >> 0.26, it would output the RDF in the regular flattened RDF/XML format.
     >> Obviously, creating the flattened form is much more efficient.
     >>
     >> While I understand that RDF/XML-ABBREV is more human readable, there's a
     >> big price to pay in efficiency, at least for my data. In my case, I'm
     >> accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
     >> as I know, there's no way for me to tell Fuseki through this API that I
     >> want the data to be serialized as N-TRIPLES (since it's just going to be
     >> loaded in a Jena model anyway and not read by a human). Is there a way I
     >> can control how Fuseki serializes by default? And why was the default
     >> serialization format changed to RDF/XML-ABBREV - is anyone really using
     >> RDF/XML anymore as a human-readable format anyway? ;-)
     >>
     >> I really appreciate any advice, workarounds, or fixes for this issue. I
     >> can't really switch back to the earlier Fuseki versions anymore, since
     >> the new jena-text makes my life so much easier since I no longer have to
     >> worry about manually reindexing after SPARQL Update, like I did with
     >> Fuseki and LARQ. Thanks for incorporating jena-text!
     >>
     >> Thank you,
     >> Elli
     >>
     >>
     >>
     >>> ________________________________
     >>> From: Elli Schwarz <[email protected]
    <mailto:[email protected]>>
     >>> To: "[email protected] <mailto:[email protected]>"
    <[email protected] <mailto:[email protected]>>
     >>> Sent: Wednesday, June 26, 2013 9:48 AM
     >>> Subject: Problem with Fuseki generating RDF/XML
     >>>
     >>>
     >>> Rob,
     >>>
     >>> (This email previously had the subject JENA-378 Redux)
     >>>
     >>> I think I tracked down the problem with getModel() a bit more. Using
     >>> s-get, I can get data back as TTL immediately:
     >>> ./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data
     >>>
     >>>
     >>> If I modify the s-get script to get results as RDF/XML, then it takes
     >>> several minutes for Fuseki 0.28-SNAPSHOT to respond.
     >>>
     >>> I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
     >>> but with the config-tdb.ttl assembler):
     >>> /usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M -jar
     >>> /opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar --update
     >>> --config=config-tdb-text.ttl --port=3131
     >>>
     >>>
     >>> If I point the same modified s-get script to the Fuseki 0.26 release,
     >>> the RDF/XML comes back immediately. My guess is that the
     >>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
     >>> command I use gets data back as RDF/XML, and for some
     >>> reason Fuseki 0.28 takes a long time to generate RDF/XML. Any ideas as
     >>> to what changed in the latest version of Fuseki that would cause this
     >>> problem? Is there any way I can set Fuseki (or the client
     >>> DatasetAccessor) to use TTL serialization?
     >>>
     >>> (BTW, I created JENA-479 for the other bug I discovered with SPARQL
     >>> Insert scripts.)
     >>>
     >>> Thank you very much for your help,
     >>> Elli
     >>>
     >>>
     >>>
     >>>> ________________________________
     >>>> From: Rob Vesse <[email protected] <mailto:[email protected]>>
     >>>> To: "[email protected] <mailto:[email protected]>"
    <[email protected] <mailto:[email protected]>>; Elli Schwarz
     >>>> <[email protected] <mailto:[email protected]>>
     >>>> Sent: Tuesday, June 25, 2013 4:40 PM
     >>>> Subject: Re: JENA-378 Redux
     >>>>
     >>>>
     >>>>> I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
     >>>>> 0.2.6
     >>>>
     >>>> The current stable releases are jena-core and jena-arq 2.10.1 and
     >>>> jena-fuseki 0.2.7
     >>>>
     >>>> Do you experience the problem with those versions?
     >>>>
     >>>> Fuseki config file or arguments used to start would be useful.
     >>>>
     >>>> Rob
     >>>>
     >>>>
     >>>> On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]
    <mailto:[email protected]>> wrote:
     >>>>
     >>>>> This past January, I reported a bug to this list which was recorded as
     >>>>> JENA-378. I'm now experiencing what appears to be the same problem,
     >>>>> where [ ] syntax in an Insert script doesn't work when using
     >>>>> UpdateExecutionFactory:
     >>>>>
     >>>>>  String updateString = "INSERT {} WHERE { ?x ?p [ ?a  ?b ] }";
     >>>>>  UpdateRequest update = UpdateFactory.create(updateString);
     >>>>>
     >>>>>  UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
     >>>>>      "http://localhost:3131/ds/update");
     >>>>>  up.execute();
     >>>>>
     >>>>> The error is: 400 Encountered " "?" "? ""
     >>>>> caused by the client generating incorrect SPARQL with an extra ? (as
     >>>>> viewed from the Fuseki log):  INSERT { } WHERE  { ?x ?p ??0 . ??0 ?a ?b }
     >>>>>
     >>>>> This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with jena-fuseki
     >>>>> 0.2.8-SNAPSHOT (compiled today).
     >>>>> --
     >>>>> Another problem I'm having which I can't track down is that the following
     >>>>> code takes a VERY long time to execute (10 minutes):
     >>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);
     >>>>>
     >>>>> With earlier versions of Fuseki, it would take seconds, with the same
     >>>>> data. The problem seems to be related to my Fuseki server instance
     >>>>> itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
     >>>>> since even if I use the older stable jena-core and jena-arq 2.10.0 and
     >>>>> jena-fuseki 0.2.6, I also have the problem (but not if I connect it to
     >>>>> an earlier Fuseki release). Upon debugging, it appears that for some
     >>>>> reason the HTTP request itself is taking a long time to complete. In
     >>>>> fact, I'm not even getting anything in the Fuseki log for about a
     >>>>> minute after the request is made, but once the request is made I
     >>>>> immediately see a spike in CPU usage on the server. This doesn't appear
     >>>>> to be a network latency issue since other access to the server isn't
     >>>>> affected, it appears to be just this call. It would seem that Fuseki
     >>>>> is spinning its wheels on something.
     >>>>>
     >>>>> I realize this may not be enough info for you to determine what is
     >>>>> causing the problem, but I don't know how else to track down the issue.
     >>>>> Using s-get I can get back the data quickly, which is strange since I
     >>>>> thought it would be doing the same thing as the getModel().
     >>>>>
     >>>>> Thank you,
     >>>>> Elli
     >>>>
     >>>>
     >>>>
     >>>
     >
