See my previous reply.
The ordering issue is really the same as the comment issue (both are
non-semantic aspects of an RDF document).
The comma issue is something very different, and (maybe you noticed 😉) I have a
very strong opinion here.
I will try to reformulate one more time and then shut up.
1. A directed graph, when represented as triples, has bad space complexity.
2. Turtle improves the space complexity by systematically shortcutting the
three components of a triple:
* (a) Reuse of a subject > ;
* (b) Reuse of a predicate > ,
* (c) Reuse of an object > […]
3. All these things have nothing to do with semantics, only with syntactic
sugar and the associated space and time complexities.
4. a, b and c are fully orthogonal mechanisms which independently improve
the space complexity of any directed graph.
5. Maximum improvement of space complexity is obtained when all three
mechanisms are applied.
6. Applying no mechanism means the output is still Turtle but also just
triples (no gain, no syntactic sugar).
7. Applying only a and c (as your writer does) is also Turtle, but it does not
make maximal use of the b-advantage, i.e. using the commas.
Note 1: time complexity typically works the other way round
(spacetime complexity is constant), but that wasn't the issue here. If
time complexity IS the issue, you had better forget Turtle and stick with
triples. (This is true in general; in theory it depends on the time-penalty
balance between reading an item and processing an item: better space
complexity means there is less to read, but the actual processing normally
costs more than you gain.)
Note 2: a, b and c all replace strings by predefined characters (;, ,, […]), so
the end result is always shorter than the original.
In my earlier example:
Triples:
a1 b1 c1.
a1 b2 c2.
a1 b2 c3.
a1 b3 c4.
c4 b4 c5.
Applying a, b and c:
a1 b1 c1; b2 c2, c3; b3 [c4 b4 c5].
Applying only a and c:
a1 b1 c1; b2 c2; b2 c3; b3 [c4 b4 c5].
As can be seen, the second serialization is longer than the first.
David said earlier: "efficient" is in the eye of the beholder.
This is of course related. "Efficient" only makes sense if you specify space or
time. Here we were talking about space complexity/efficiency.
It could of course be that for some reason you optimized space and time
complexity by just leaving out "b".
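As a toy illustration of why the shortcuts are purely syntactic (a minimal Python sketch; `expand` is a hypothetical helper, not a real Turtle parser — it handles only the ";" and "," mechanisms for examples shaped like the ones above, and ignores the "[]" mechanism):

```python
# Expand the ';' (subject reuse) and ',' (predicate reuse) shortcuts back
# into full triples, and check that plain and compact forms denote the
# same triple set.

def expand(doc):
    """Toy expander for ';' and ',' in simple whitespace-separated statements."""
    triples = set()
    for stmt in doc.split("."):
        stmt = stmt.strip()
        if not stmt:
            continue
        subject = None
        for part in stmt.split(";"):
            tokens = part.replace(",", " , ").split()
            if subject is None:
                # First part of a statement carries subject and predicate.
                subject, predicate = tokens[0], tokens[1]
                objects = tokens[2:]
            else:
                # After ';' the subject is reused; a new predicate follows.
                predicate = tokens[0]
                objects = tokens[1:]
            for obj in objects:
                if obj != ",":
                    triples.add((subject, predicate, obj))
    return triples

plain = "a1 b1 c1. a1 b2 c2. a1 b2 c3. a1 b3 c4."
compact = "a1 b1 c1; b2 c2, c3; b3 c4."
assert expand(plain) == expand(compact)  # same graph, fewer bytes
```

The compact form is shorter on the wire, yet both expand to the identical set of four triples, which is the whole point: the shortcuts affect space complexity only, never semantics.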
Gr Michel
Dr. ir. H.M. (Michel) Böhms
Senior Data Scientist
T +31888663107
M +31630381220
E [email protected]<mailto:[email protected]>
Location<https://www.google.com/maps/place/TNO+-+Locatie+Delft+-+Stieltjesweg/@52.000788,4.3745183,17z/data=!3m1!4b1!4m5!3m4!1s0x47c5b58c52869997:0x56681566be3b8c88!8m2!3d52.000788!4d4.376707>
This message may contain information that is not intended for you. If you are
not the addressee or if this message was sent to you by mistake, you are
requested to inform the sender and delete the message. TNO accepts no liability
for the content of this e-mail, for the manner in which you use it and for
damage of any kind resulting from the risks inherent to the electronic
transmission of messages.
From: [email protected] [mailto:[email protected]]
On Behalf Of Irene Polikoff
Sent: Friday, July 14, 2017 00:16
To: [email protected]
Subject: Re: [topbraid-users] tbc ttl file questions
Dear Michel,
I do not have an assumption that RDF data is always in some database, but I do
assume that the files get ingested for processing into some tool/system that is
capable of deserializing them. With this, I do not really understand the issue
with respect to commas and ordering.
I do understand the issue with losing comments recorded using ##. As a
solution, I recommend capturing them directly in RDF. Or using some other, more
specific, properties such as prov:wasDerivedFrom or rdfs:seeAlso.
Irene
On Jul 13, 2017, at 4:35 PM, Bohms, H.M. (Michel)
<[email protected]> wrote:
Dear Irene
Under the assumption:
"RDF data is always in an RDF database (accessible via SPARQL) and RDF documents
are ONLY for semantic exchange between such systems"
I fully agree 100 % with all your statements (and those of your colleagues).
So the issue is about the "assumption".
I observe many situations where the primary/reference data (typically an
ontology) is in an RDF document. You might say "this is not good" or "this
won't be the case in future" etc., but that is simply a situation I observe a
lot (and the actual RDF 1.1 specifications are certainly not clear that this
shouldn't be the case either).
An example (other than my own cmo.ttl): https://www.w3.org/ns/prov.ttl
This is a reference specification published on the web as an RDF document (in
various dereferenceable serialisations). It can be imported into any RDF DB you
like, but the reference spec IS an RDF document (so this is another status than
"just for exchange").
In this ontology you find many comments like:
"## Definitions from other ontologies"
Or
"# The following was imported from http://www.w3.org/ns/prov-dc#"
Now imagine you are the owner of such a reference ontology as an RDF document.
You should be aware that when editing your ontology, most tools (other than
text editors) will delete both order and comments (in general: all non-semantic
aspects of that document).
When I say "delete" I of course mean the sequence of parsing without recording
them, then writing, which effectively deletes the comments and the specific
order after editing.
The current formal specifications do not tell us much about the rightness of
the assumptions above. Turtle specifies a comment mechanism but does not say,
e.g., "be careful: comments are only relevant when writing files; after parsing
all will be lost" or something similar.
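A toy round trip makes the point concrete (plain Python with hypothetical `parse`/`serialize` helpers, not a real RDF toolkit — real parsers behave analogously because comments are simply not part of the data model):

```python
# A 'parse then serialize' round trip that, like most RDF tools, keeps only
# the triples: '#' comments and the original statement order are discarded.

def parse(document):
    """Keep only statement lines; strip '#' comments (not in the data model)."""
    triples = set()
    for line in document.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            triples.add(line)
    return triples

def serialize(triples):
    """Write the triples back out in an arbitrary (here: sorted) order."""
    return "\n".join(sorted(triples))

source = """\
## Definitions from other ontologies
prov:Entity rdf:type owl:Class .
# The following was imported from prov-dc
prov:Agent rdf:type owl:Class .
"""
round_tripped = serialize(parse(source))
assert "#" not in round_tripped  # both comments are gone after the round trip
```

The triples survive intact, but an owner of a reference ontology who relied on those "##" section headers has lost them, which is exactly the edit-with-a-tool scenario described above.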
I also agree (with Holger) that if that second interpretation (RDF document as
primary/reference) is assumed, it is quite a job to record the non-semantic
info and reuse it when writing out again. (In the ISO STEP world the same issue
is relevant, and some tools actually retain the non-semantic data in STEP
Physical Files to support more deterministic documents. In some situations this
simplifies model comparisons; I am not saying this is the right approach, only
that it happens.)
I hope I have made the issue at least a bit clearer now.
Greetings, Michel
From: [email protected]<mailto:[email protected]>
[mailto:[email protected]] On Behalf Of Irene Polikoff
Sent: donderdag 13 juli 2017 20:32
To: [email protected]<mailto:[email protected]>
Subject: Re: [topbraid-users] tbc ttl file questions
Michel,
Serialization and deserialization provide a way for data to be translated into
a format that can be used for transmission, interchange, storage in a file
system, etc., with the ability for it to be later reconstructed into a
semantically identical clone of the data.
The goal of RDF serializations and tool interoperability is to ensure that if
tool A produces a serialization of a graph X, tool B can read it in and
understand it as graph X. Tool B can then, in its turn, produce a serialization
of graph X, and tool A can import it and it is still the same graph. The
serialization output of A may not look exactly the same as the serialization
output of B, but their semantic interpretation is always the same.
The serialization/deserialization process is not intended to ensure that the
sequence of bytes in a file will be exactly the same. In the case of both the
RDF/XML and Turtle formats, there are several syntactic variations for
representing the same information. The simplest RDF serialization is N-Triples.
There is little room in it for syntactic variation, as it just contains triple
statements. However, even with that simplicity, there are variants, since the
order of statements may vary. The bottom line is that if you are using
serializations in the interchange and parse them to deserialize for use in some
target system, you need a parser that will understand what the serialization
means semantically and will not rely purely on the byte sequence.
If the TBC parser were ignoring something that captured the semantics of the
data, that would be a bug. I do not think that is the case. The comma is not
ignored; it is correctly understood during deserialization when data is
imported into TBC. "Deleting it" is not even a concept, because once the data
is deserialized, the comma no longer exists. We now have a graph. When you save
it, it is serialized anew, without any memory or consideration of how its
serialization looked when it came in. As long as the serialization still
represents a semantically identical object, it is correct.
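The "bytes may differ, semantics must not" contract can be sketched in a few lines (plain Python, hypothetical `parse`/`serialize_*` helpers standing in for two different tools, not a real RDF toolkit):

```python
# Two 'tools' serialize the same triple set differently: tool A sorts the
# statements, tool B writes them in reverse order. The byte sequences
# differ, but both parse back to the identical graph.

def parse(document):
    """Read whitespace-separated triples back into a set (order-free)."""
    return {tuple(line.split()) for line in document.splitlines() if line.strip()}

def serialize_a(graph):
    """Tool A's writer: statements in sorted order."""
    return "\n".join(" ".join(t) for t in sorted(graph))

def serialize_b(graph):
    """Tool B's writer: statements in reverse-sorted order."""
    return "\n".join(" ".join(t) for t in sorted(graph, reverse=True))

graph = {("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a1", "b3", "c4")}
out_a, out_b = serialize_a(graph), serialize_b(graph)
assert out_a != out_b                          # byte sequences differ
assert parse(out_a) == parse(out_b) == graph   # semantics are identical
```

A byte-for-byte comparison would call these files different; a semantic comparison (the only one the RDF model defines) calls them the same graph.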
Regards,
Irene Polikoff
On Jul 13, 2017, at 4:13 AM, Bohms, H.M. (Michel)
<[email protected]> wrote:
Seriously, if these low-level details of the TTL syntax are relevant to you,
just use text editors.
* Yes, low-level syntax issues ARE very relevant. They are the foundation
under everything we do in the end. When we convince our clients to move from
SPFF or XML to RDF and its serializations, they expect implementations that
100% support these specs. If a comment is a feature of that spec, or a comma is
a feature of that spec, they do not expect a parser and/or writer to ignore or
even delete them. Anyway, as said before, let's agree to disagree (although
your views on these matters highly surprise me, I must say).
--
You received this message because you are subscribed to the Google Groups
"TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.