See my previous reply.
The ordering issue is really the same as the comment issue (both are
non-semantic aspects of an RDF document).
The comma issue is something very different, and (maybe you noticed 😉) I have a
very strong opinion here.
I will try to reformulate one more time and then shut up.
1. A directed graph, when represented as triples, has bad space complexity.
2. Turtle improves the space complexity by systematically shortcutting the
three components of a triple:
* (a) Reuse of a subject > ;
* (b) Reuse of a predicate > ,
* (c) Reuse of an object > […]
3. All these things have nothing to do with semantics, only with syntactic
sugar and the associated space and time complexities.
4. a, b and c are fully orthogonal mechanisms which independently improve
the space complexity of any directed graph.
5. Maximum improvement of space complexity is obtained when all three
mechanisms are applied.
6. Applying no mechanism means the output is still Turtle but also just
triples (no gain, no syntactic sugar).
7. Applying only a and c (as your writer does) is also Turtle, but it does not
make maximal use of the b-advantage, i.e. using the commas.
Note 1: time complexity typically works the other way round
(spacetime complexity is constant), but that wasn't the issue here. If
time complexity IS the issue, you had better forget Turtle and stick with
triples. (This is true in general; in theory it depends on the time-penalty
balance between reading an item and processing an item: better space
complexity means there is less to read, but the actual processing normally
costs more than you gain.)
Note 2: a, b and c all replace strings by predefined characters (;, ,, […]), so
the end result is always shorter than the original.
In my earlier example:
Triples:
a1 b1 c1.
a1 b2 c2.
a1 b2 c3.
a1 b3 c4.
c4 b4 c5.
Applying a, b and c:
a1 b1 c1; b2 c2, c3; b3 [c4 b4 c5].
Applying only a and c:
a1 b1 c1; b2 c2; b2 c3; b3 [c4 b4 c5].
As can be seen, the second serialization is longer than the first.
David said earlier: "efficient" is in the eye of the beholder.
This is of course related. "Efficient" only makes sense if you specify space or
time. Here we were talking about space complexity/efficiency.
It could of course be that for some reason you optimized space and time
complexity by just leaving out "b".
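As a toy illustration of why the shortcuts are purely syntactic (a minimal Python sketch; `expand` is a hypothetical helper, not a real Turtle parser — it handles only the ";" and "," mechanisms for examples shaped like the ones above, and ignores the "[]" mechanism):

```python
# Expand the ';' (subject reuse) and ',' (predicate reuse) shortcuts back
# into full triples, and check that plain and compact forms denote the
# same triple set.

def expand(doc):
    """Toy expander for ';' and ',' in simple whitespace-separated statements."""
    triples = set()
    for stmt in doc.split("."):
        stmt = stmt.strip()
        if not stmt:
            continue
        subject = None
        for part in stmt.split(";"):
            tokens = part.replace(",", " , ").split()
            if subject is None:
                # First part of a statement carries subject and predicate.
                subject, predicate = tokens[0], tokens[1]
                objects = tokens[2:]
            else:
                # After ';' the subject is reused; a new predicate follows.
                predicate = tokens[0]
                objects = tokens[1:]
            for obj in objects:
                if obj != ",":
                    triples.add((subject, predicate, obj))
    return triples

plain = "a1 b1 c1. a1 b2 c2. a1 b2 c3. a1 b3 c4."
compact = "a1 b1 c1; b2 c2, c3; b3 c4."
assert expand(plain) == expand(compact)  # same graph, fewer bytes
```

The compact form is shorter on the wire, yet both expand to the identical set of four triples, which is the whole point: the shortcuts affect space complexity only, never semantics.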
Gr Michel
Dr. ir. H.M. (Michel) Böhms
Senior Data Scientist
T +31888663107
M +31630381220
E [email protected]<mailto:[email protected]>
Location<https://www.google.com/maps/place/TNO+-+Locatie+Delft+-+Stieltjesweg/@52.000788,4.3745183,17z/data=!3m1!4b1!4m5!3m4!1s0x47c5b58c52869997:0x56681566be3b8c88!8m2!3d52.000788!4d4.376707>
This message may contain information that is not intended for you. If you are
not the addressee or if this message was sent to you by mistake, you are
requested to inform the sender and delete the message. TNO accepts no liability
for the content of this e-mail, for the manner in which you use it and for
damage of any kind resulting from the risks inherent to the electronic
transmission of messages.
From: [email protected] [mailto:[email protected]]
On Behalf Of Irene Polikoff
Sent: Friday, July 14, 2017 00:16
To: [email protected]
Subject: Re: [topbraid-users] tbc ttl file questions
Dear Michel,
I do not have an assumption that RDF data is always in some database, but I do
assume that the files get ingested for processing into some tool/system that is
capable of deserializing them. With this, I do not really understand the issue
with respect to commas and ordering.
I do understand the issue with losing comments recorded using ##. As a
solution, I recommend capturing them directly in RDF. Or using some other, more
specific, properties such as prov:wasDerivedFrom or rdfs:seeAlso.
Irene
On Jul 13, 2017, at 4:35 PM, Bohms, H.M. (Michel)
<[email protected]> wrote:
Dear Irene
Under the assumption:
"RDF data is always in an RDF database (accessible via SPARQL) and RDF documents
are ONLY for semantic exchange between such systems"
I fully agree 100 % with all your statements (and those of your colleagues).
So the issue is about the "assumption".
I observe many situations where the primary/reference data (typically an
ontology) is in an RDF document. You might say "this is not good" or "this
won't be the case in future" etc., but that is simply a situation I observe a
lot (and the actual RDF 1.1 specifications are certainly not clear that this
shouldn't be the case either).
An example (other than my own cmo.ttl): https://www.w3.org/ns/prov.ttl
This is a reference specification published on the web as an RDF document (in
various dereferenceable serialisations). It can be imported into any RDF DB you
like, but the reference spec IS an RDF document (so this is another status than
"just for exchange").
In this ontology you find many comments like:
"## Definitions from other ontologies"
Or
"# The following was imported from http://www.w3.org/ns/prov-dc#"
Now imagine you are the owner of such a reference ontology as an RDF document.
You should be aware that when editing your ontology, most tools (other than
text editors) will delete both order and comments (in general: all non-semantic
aspects of that document).
When I say "delete" I of course mean the sequence of parsing without recording
them, then writing, which effectively deletes the comments and the specific
order after editing.
The current formal specifications do not tell us much about the rightness of
the assumptions above. Turtle specifies a comment mechanism but does not say,
e.g., "be careful: comments are only relevant when writing files; after parsing
all will be lost" or something similar.
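A toy round trip makes the point concrete (plain Python with hypothetical `parse`/`serialize` helpers, not a real RDF toolkit — real parsers behave analogously because comments are simply not part of the data model):

```python
# A 'parse then serialize' round trip that, like most RDF tools, keeps only
# the triples: '#' comments and the original statement order are discarded.

def parse(document):
    """Keep only statement lines; strip '#' comments (not in the data model)."""
    triples = set()
    for line in document.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            triples.add(line)
    return triples

def serialize(triples):
    """Write the triples back out in an arbitrary (here: sorted) order."""
    return "\n".join(sorted(triples))

source = """\
## Definitions from other ontologies
prov:Entity rdf:type owl:Class .
# The following was imported from prov-dc
prov:Agent rdf:type owl:Class .
"""
round_tripped = serialize(parse(source))
assert "#" not in round_tripped  # both comments are gone after the round trip
```

The triples survive intact, but an owner of a reference ontology who relied on those "##" section headers has lost them, which is exactly the edit-with-a-tool scenario described above.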
I also agree (with Holger) that if that second interpretation (RDF document as
primary/reference) is assumed, it is quite a job to record the non-semantic
info and reuse it when writing out again. (In the ISO STEP world the same issue
is relevant, and some tools actually retain the non-semantic data in STEP
Physical Files to support more deterministic documents. In some situations this
simplifies model comparisons; I am not saying this is the right approach, only
that it happens.)
I hope I have made the issue at least a bit clearer now.
Greetings, Michel
From: [email protected]<mailto:[email protected]>
[mailto:[email protected]] On Behalf Of Irene Polikoff
Sent: donderdag 13 juli 2017 20:32
To: [email protected]<mailto:[email protected]>
Subject: Re: [topbraid-users] tbc ttl file questions
Michel,
Serialization and deserialization provide a way for data to be translated into
a format that can be used for transmission, interchange, storage in a file
system, etc., with the ability for it to be later reconstructed into a
semantically identical clone of the data.
The goal of RDF serializations and tool interoperability is to ensure that if
tool A produces a serialization of a graph X, tool B can read it in and
understand it as graph X. Tool B can then, in its turn, produce a serialization
of graph X, and tool A can import it and it is still the same graph. The
serialization output of A may not look exactly the same as the serialization
output of B, but their semantic interpretation is always the same.
The serialization/deserialization process is not intended to ensure that the
sequence of bytes in a file will be exactly the same. In the case of both the
RDF/XML and Turtle formats, there are several syntactic variations for
representing the same information. The simplest RDF serialization is N-Triples.
There is little room in it for syntactic variation, as it just contains triple
statements. However, even with that simplicity, there are variants, since the
order of statements may vary. The bottom line is that if you are using
serializations in the interchange and parse them to deserialize for use in some
target system, you need a parser that will understand what the serialization
means semantically and will not rely purely on the byte sequence.
If the TBC parser were ignoring something that captured the semantics of the
data, that would be a bug. I do not think that is the case. The comma is not
ignored; it is correctly understood during deserialization when data is
imported into TBC. "Deleting it" is not even a concept, because once the data
is deserialized, the comma no longer exists. We now have a graph. When you save
it, it is serialized anew, without any memory or consideration of how its
serialization looked when it came in. As long as the serialization still
represents a semantically identical object, it is correct.
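The "bytes may differ, semantics must not" contract can be sketched in a few lines (plain Python, hypothetical `parse`/`serialize_*` helpers standing in for two different tools, not a real RDF toolkit):

```python
# Two 'tools' serialize the same triple set differently: tool A sorts the
# statements, tool B writes them in reverse order. The byte sequences
# differ, but both parse back to the identical graph.

def parse(document):
    """Read whitespace-separated triples back into a set (order-free)."""
    return {tuple(line.split()) for line in document.splitlines() if line.strip()}

def serialize_a(graph):
    """Tool A's writer: statements in sorted order."""
    return "\n".join(" ".join(t) for t in sorted(graph))

def serialize_b(graph):
    """Tool B's writer: statements in reverse-sorted order."""
    return "\n".join(" ".join(t) for t in sorted(graph, reverse=True))

graph = {("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a1", "b3", "c4")}
out_a, out_b = serialize_a(graph), serialize_b(graph)
assert out_a != out_b                          # byte sequences differ
assert parse(out_a) == parse(out_b) == graph   # semantics are identical
```

A byte-for-byte comparison would call these files different; a semantic comparison (the only one the RDF model defines) calls them the same graph.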
Regards,
Irene Polikoff
On Jul 13, 2017, at 4:13 AM, Bohms, H.M. (Michel)
<[email protected]> wrote:
Seriously, if these low-level details of the TTL syntax are relevant to you,
just use text editors.
* Yes, low-level syntax issues ARE very relevant. They are the foundation
under everything we do in the end. When we convince our clients to move from
SPFF or XML to RDF and its serializations, they expect implementations that
100% support these specs. If a comment is a feature of that spec, or a comma is
a feature of that spec, they do not expect a parser and/or writer to ignore or
even delete them. Anyway, as said before, let's agree to disagree (although
your views on these matters highly surprise me, I must say).
--
You received this message because you are subscribed to the Google Groups
"TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.