On 28/06/13 14:19, Elli Schwarz wrote:
Andy,
As always, I really appreciate your prompt response and fixes. I
continue to be amazed at how quickly Jena responds to bugs and even
feature requests.
Sometimes it's easier to just make the change than to do the paperwork
and risk forgetting it :-)
This wasn't an intentional change.
And again, jena-text integration is crucial for my
project, so I greatly appreciate the integration of this work to replace
Fuseki/LARQ.
I tried rebuilding this morning, and yes, efficiency is greatly improved
by not using RDF/XML-ABBREV. (It appears that s-get uses Turtle by
default now...)
Yes :-)
BTW, I'm a big fan of JSON-LD as eclipsing RDF/XML. I currently use
jsonld-java for that, and I believe you mentioned to me on that forum
that you hope to have that fully integrated into Jena at some point. The
biggest selling point for me is that I am able to give my data as
JSON-LD to customers and they are able to adapt to use it very easily as
regular JSON, without them even knowing that they are actually working
with RDF (though I feel a bit guilty about the subterfuge ;-).
There is an adapter at
https://github.com/afs/jena-jsonld
with an example of integration (== call JenaJSONLD.init())
which is using
https://github.com/jsonld-java/jsonld-java
to do all the real JSON-LD work.
One issue with JSON-LD is that if you work with it as JSON then it may
not remain JSON-LD/RDF. It's OK to read but an update to the JSON does
not necessarily remain correct JSON-LD.
JSON-LD does not scale. The processing model assumes you have the whole
document available. It may be possible to write a direct
JSON-LD->triples parser which is streaming but some of the other
algorithms work on documents. jsonld-java builds the JSON-LD in-memory
first.
Andy
-Elli
------------------------------------------------------------------------
*From:* Andy Seaborne <[email protected]>
*To:* [email protected]
*Sent:* Friday, June 28, 2013 6:17 AM
*Subject:* Re: Problem with Fuseki generating RDF/XML
Hi there,
I've switched back SPARQL Graph Store protocol GET to use plain RDF/XML.
Details:
The default when using RIOT to write in Lang.RDFXML is to use the pretty
form, i.e. when using RDFDataMgr.write(model,Lang.RDFXML). RIOT I/O is
not automatically used if available.
Fuseki uses the new-style RDFDataMgr, not model.write, so it got affected
by the change.
Writing with model.write() isn't affected.
Yes - RDF/XML-ABBREV is expensive. I'm not completely sure why - the
Turtle writer is doing a similar, but not identical, analysis of the
model before writing. However, the RDF/XML-ABBREV writer has more
choices and more options to consider.
>> is anyone really using
>> RDF/XML anymore as a human-readable format anyway?
Absolutely!
But, today, it's the standard. Tomorrow, it won't be the only choice
and I'm guessing that Turtle-only toolkits will emerge.
Next ...
DatasetAccessor:
It does not seem to be setting the accept header at all so it gets the
default, which is application/rdf+xml.
I've recorded the need to set the accept header to a list based on
efficiency as:
https://issues.apache.org/jira/browse/JENA-481
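Until the client sets the header itself, one possible workaround is to issue the Graph Store Protocol GET by hand with an explicit Accept header. The following is a minimal sketch using only the JDK HTTP client (Java 11+). GraphStoreGet and build are made-up names for illustration, not Jena API, and the service and graph URLs are the ones Elli used with s-get.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class GraphStoreGet {
    // Builds (but does not send) a Graph Store Protocol GET for one named graph.
    // The graph URI is passed indirectly via the ?graph= query parameter.
    static HttpRequest build(String serviceUrl, String graphUri, String accept) {
        String query = "graph=" + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(serviceUrl + "?" + query))
                .header("Accept", accept)   // ask for a cheap-to-write syntax
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("http://localhost:3131/ds/data",
                "http://192.168.6.37/graph/uri_data", "text/turtle");
        System.out.println(req.method() + " " + req.uri());
        System.out.println("Accept: " + req.headers().firstValue("Accept").orElse(""));
    }
}
```

Sending the request with HttpClient and handing the response stream to model.read(in, null, "TURTLE") should then avoid the RDF/XML path entirely, assuming the server honours the header.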
I'm thinking the order should be N-triples, Turtle, RDF/XML, "whatever
you can give me".
For reference, the accept string for reading RDF with
RDFDataMgr.loadModel(URL) or model.read(URL) is currently:
text/turtle,application/rdf+xml;q=0.9,application/xml;q=0.8,*/*;q=0.5;
Maybe that should include "application/n-triples" - including the
original MIME type of text/plain is distinctly unhelpful.
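The efficiency ordering suggested above could be turned into a header value along these lines. This is a sketch only: AcceptHeader is a hypothetical helper, not something in Jena, and lowering the q-value by 0.1 per position is an arbitrary choice.

```java
import java.util.List;

class AcceptHeader {
    // Joins media types in preference order. The first entry keeps the
    // implicit q=1.0; each later entry gets a q-value 0.1 lower, never below 0.1.
    static String build(List<String> mediaTypes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mediaTypes.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(mediaTypes.get(i));
            if (i > 0) {
                sb.append(";q=0.").append(Math.max(1, 10 - i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // N-Triples first, then Turtle, then RDF/XML, then "whatever you can give me".
        System.out.println(build(List.of(
                "application/n-triples", "text/turtle", "application/rdf+xml", "*/*")));
        // prints application/n-triples,text/turtle;q=0.9,application/rdf+xml;q=0.8,*/*;q=0.7
    }
}
```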
Andy
On 27/06/13 19:56, Rob Vesse wrote:
> Andy can probably give you a definitive answer here
>
> I know that there were significant improvements to the RDF output
> infrastructure made in 2.10.1 so my guess is that somehow the default
> RDF/XML output got switched as part of this upgrade (not necessarily
> intentionally).
>
> If this is the case Andy can likely make the fix easily, I however don't
> know where to look for this setting.
>
> Rob
>
>
> On 6/27/13 11:38 AM, "Elli Schwarz" <[email protected]> wrote:
>
>> I think I may have tracked down what is causing my slow performance of
>> GET with the new Fuseki 0.28 snapshot. Comparing the output of s-get for
>> the same data from the latest Fuseki 0.28 snapshot, and from the 0.26
>> release, I discovered that the 0.28 snapshot is creating the XML in
>> hierarchical form, with nesting of elements (RDF/XML-ABBREV). In Fuseki
>> 0.26, it would output the RDF in the regular flattened RDF/XML format.
>> Obviously, creating the flattened form is much more efficient.
>>
>> While I understand that RDF/XML-ABBREV is more human readable, there's a
>> big price to pay in efficiency, at least for my data. In my case, I'm
>> accessing my Fuseki endpoint via datasetAccessor.getModel(), and as far
>> as I know, there's no way for me to tell Fuseki through this API that I
>> want the data to be serialized as N-TRIPLES (since it's just going to be
>> loaded in a Jena model anyway and not read by a human). Is there a way I
>> can control how Fuseki serializes by default? And why was the default
>> serialization format changed to RDF/XML-ABBREV - is anyone really using
>> RDF/XML anymore as a human-readable format anyway? ;-)
>>
>> I really appreciate any advice, workarounds, or fixes for this issue. I
>> can't really switch back to the earlier Fuseki versions anymore, since
>> the new jena-text makes my life so much easier since I no longer have to
>> worry about manually reindexing after SPARQL Update, like I did with
>> Fuseki and LARQ. Thanks for incorporating jena-text!
>>
>> Thank you,
>> Elli
>>
>>
>>
>>> ________________________________
>>> From: Elli Schwarz <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Wednesday, June 26, 2013 9:48 AM
>>> Subject: Problem with Fuseki generating RDF/XML
>>>
>>>
>>> Rob,
>>>
>>> (This email previously had the subject JENA-378 Redux)
>>>
>>> I think I tracked down the problem with getModel() a bit more. Using
>>> s-get, I can get data back as TTL immediately:
>>> ./s-get http://localhost:3131/ds/data http://192.168.6.37/graph/uri_data
>>>
>>>
>>> If I modify the s-get script to get results as RDF/XML, then it takes
>>> several minutes for Fuseki 0.28-SNAPSHOT to respond.
>>>
>>> I start Fuseki 0.28 with this command (Fuseki 0.26 is started similarly,
>>> but with the config-tdb.ttl assembler):
>>> /usr/bin/java -Dlog4j.configuration=log4j.properties -Xmx3200M -jar
>>> /opt/jena-2.10/jena-fuseki-0.2.8-SNAPSHOT/fuseki-server.jar --update
>>> --config=config-tdb-text.ttl --port=3131
>>>
>>>
>>> If I point the same modified s-get script to the Fuseki 0.26 release,
>>> the RDF/XML comes back immediately. My guess is that the
>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/data").getModel(modelName)
>>> command I use gets data back as RDF/XML, and for some
>>> reason Fuseki 0.28 takes a long time to generate RDF/XML. Any ideas as
>>> to what changed in the latest version of Fuseki that would cause this
>>> problem? Is there any way I can set Fuseki (or the client
>>> DatasetAccessor) to use TTL serialization?
>>>
>>> (BTW, I created JENA-479 for the other bug I discovered with SPARQL
>>> Insert scripts.)
>>>
>>> Thank you very much for your help,
>>> Elli
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Rob Vesse <[email protected]>
>>>> To: "[email protected]" <[email protected]>; Elli Schwarz
>>>> <[email protected]>
>>>> Sent: Tuesday, June 25, 2013 4:40 PM
>>>> Subject: Re: JENA-378 Redux
>>>>
>>>>
>>>>> I use the older stable jena-core and jena-arq 2.10.0 and jena-fuseki
>>>>> 0.2.6
>>>>
>>>> The current stable releases are jena-core and jena-arq 2.10.1 and
>>>> jena-fuseki 0.2.7
>>>>
>>>> Do you experience the problem with those versions?
>>>>
>>>> Fuseki config file or arguments used to start would be useful.
>>>>
>>>> Rob
>>>>
>>>>
>>>> On 6/25/13 1:35 PM, "Elli Schwarz" <[email protected]> wrote:
>>>>
>>>>> This past January, I reported a bug to this list which was recorded as
>>>>> JENA-378. I'm now experiencing what appears to be the same problem,
>>>>> where [ ] syntax in an Insert script doesn't work when using
>>>>> UpdateExecutionFactory:
>>>>>
>>>>> String updateString = "INSERT {} WHERE { ?x ?p [ ?a ?b ] }";
>>>>> UpdateRequest update = UpdateFactory.create(updateString);
>>>>>
>>>>> UpdateProcessor up = UpdateExecutionFactory.createRemote(update,
>>>>> "http://localhost:3131/ds/update");
>>>>> up.execute();
>>>>>
>>>>> The error is: 400 Encountered " "?" "? ""
>>>>> caused by the client generating incorrect SPARQL with an extra ? (as
>>>>> viewed from the Fuseki log): INSERT { } WHERE { ?x ?p ??0 . ??0 ?a ?b }
>>>>>
>>>>> This is with jena-core & jena-arq 2.10.2-SNAPSHOT, and with
>>>>> jena-fuseki 0.2.8-SNAPSHOT (compiled today).
>>>>> --
>>>>> Another problem I'm having which I can't track down is that the
>>>>> following code takes a VERY long time to execute (10 minutes):
>>>>> DatasetAccessorFactory.createHTTP("http://localhost:3131/ds/update").getModel(modelName);
>>>>>
>>>>> With earlier versions of Fuseki, it would take seconds, with the same
>>>>> data. The problem seems to be related to my Fuseki server instance
>>>>> itself, which is 0.2.8-SNAPSHOT (r1496513), and not to my client code,
>>>>> since even if I use the older stable jena-core and jena-arq 2.10.0 and
>>>>> jena-fuseki 0.2.6, I also have the problem (but not if I connect it to
>>>>> an earlier Fuseki release). Upon debugging, it appears that for some
>>>>> reason the HTTP request itself is taking a long time to complete. In
>>>>> fact, I'm not even getting anything in the Fuseki log for about a
>>>>> minute after the request is made, but once the request is made I
>>>>> immediately see a spike in CPU usage on the server. This doesn't appear
>>>>> to be a network latency issue since other access to the server isn't
>>>>> affected, it appears to be just this call. It would seem that Fuseki is
>>>>> spinning its wheels on something.
>>>>>
>>>>> I realize this may not be enough info for you to determine what is
>>>>> causing the problem, but I don't know how else to track down the issue.
>>>>> Using s-get I can get back the data quickly, which is strange since I
>>>>> thought it would be doing the same thing as the getModel().
>>>>>
>>>>> Thank you,
>>>>> Elli
>>>>
>>>>
>>>>
>>>
>