And this one also doesn't seem to have made it to the list.
Bob
From: Sean Barnum <[email protected]>
Date: Tuesday, August 1, 2023 at 2:06 PM
To: [email protected] <[email protected]>
Subject: FW: Some fodder for the discussion of blank nodes
I just noticed that this has still not shown up in my inbox, though it looks like it made it out of my outbox.
This is the email I was referring to during the call, thinking that you all already had it.
Sorry about that.
Hopefully it makes it past the SPDX list server this time.
sean
From: Sean Barnum <[email protected]>
Date: Tuesday, August 1, 2023 at 11:50 AM
To: [email protected] <[email protected]>
Subject: Some fodder for the discussion of blank nodes
All,
I apologize for the lateness of this. I threw it together yesterday and sent it to the list, but I just noticed that it never left my outbox, so I must have messed something up.
This is a VERY simple overview of some of the aspects of blank nodes we should consider when discussing whether they should be used for SPDX 3.0.
It is VERY informal and quickly thrown together, so please do not interpret it as anything rigorous. Rather than spending time writing up rigorous arguments, I pulled together several reference links addressing various aspects and let those do the talking, with only a simple summary of each issue from me.
- Some VERY quick notes on the question of whether or not to use blank nodes
- The short outline below includes several relevant links to resources on various aspects of this issue. All of these links were found within 20 minutes of very simple Google searching, and each was within the first 5-10 results for its query.
- There is broad consensus that using blank nodes raises significant issues and challenges. 15-30 minutes of searching will turn up scores of papers, blog posts, and articles explaining why blank nodes are problematic and should be avoided wherever possible in the large majority of situations. The defined semantics and specifications for Bnodes are inconsistent and contradictory, which leads to inconsistencies between tools and ambiguity in how Bnodes will be processed, interpreted, or queried.
- When blank nodes are used, it is typically for the convenience of the producer, but this often comes at significant cost to the consumer in the form of ambiguity, uncertainty, complexity, and resources (time and computing resources).
- IF we decide to use them, they are valid ONLY within a single scope (a single datastore or a single serialized document) and NOT for global or cross-scope use. This is explicitly stated in all of the W3C specs dealing with Bnodes. Using them across scopes, as is intended for SPDX 3.0, leads to significant potential data integrity issues.
- Avoiding these potential issues typically requires skolemization of the Bnodes (replacing the locally scoped ids with globally unique IRIs). This extra effort is forced on the consumer and is often done by processors and graph stores as part of deserialization/ingestion. However, because the specs are inconsistent regarding Bnodes, this is not done consistently. Some processors and stores do not perform skolemization and simply use the local Bnode ids (especially if those ids are asserted by the producer in any way). This leads to significant integrity issues because these ids collide (a simple example appears explicitly in some W3C docs/specs and on the Wikipedia page), and the risk grows significantly with the volume of cross-scope content ingested. Skolemization also does not provide any id-related context about the source of the nodes, such as the context provided by namespaces in producer-specified IRIs.
- Bnodes also cause very significant issues for SPARQL, the definitive standard mechanism for querying RDF graphs. The two do NOT play well together at all, due to inherent issues in Bnode design and inconsistencies in the RDF specs related to Bnodes. Many queries can yield inconsistent results or results that lack integrity. Various academics and companies have offered workarounds and schemes that attempt to address this disconnect, but they all come with significant computational complexity and cost. These issues grow significantly with the number of Bnodes in the overall graph being queried.
- Bnodes also cause significant issues with semantic entailment (the ability to determine full semantic integrity and correctness) of the graph. Entailment is required for any higher-order semantic inferencing and analysis. A couple of academics have published papers that purport to mathematically prove that entailment involving Bnodes is NP-complete, though there is broad consensus that, while such entailment is likely possible, it is almost always impractical and problematic.
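To make the scope, collision, and skolemization points a bit more concrete, here is a minimal standard-library Python sketch. It assumes triples are plain tuples of strings with blank nodes written as `_:label` (N-Triples style); the helper names, the example IRIs, and the `https://example.org` authority are all hypothetical. The `/.well-known/genid/` path follows the skolem IRI convention suggested in RDF 1.1 Concepts.

```python
import uuid

# Hypothetical example data: two independently produced documents that both
# use the local blank node label "_:b0", each for a different package.
NS = "http://example.org/ns#"
DOC_A = [
    ("_:b0", NS + "name", '"libfoo"'),
    ("http://example.org/docA", NS + "describes", "_:b0"),
]
DOC_B = [
    ("_:b0", NS + "name", '"libbar"'),
    ("http://example.org/docB", NS + "describes", "_:b0"),
]


def is_bnode(term):
    """Blank nodes are written with the N-Triples style '_:' prefix."""
    return term.startswith("_:")


def naive_merge(*docs):
    """Union the triples, keeping producer-assigned blank node labels as-is.

    This is the unsafe behavior: a label is only meaningful inside its own
    document, so equal labels from different documents collide.
    """
    merged = set()
    for doc in docs:
        merged.update(doc)
    return merged


def skolemize(doc, authority="https://example.org"):
    """Replace every blank node with a fresh skolem IRI (hypothetical authority).

    A new mapping is built per document, so "_:b0" in one document and "_:b0"
    in another receive different, globally unique IRIs.
    """
    mapping = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
        return mapping[term]

    return [tuple(rename(t) for t in triple) for triple in doc]


def names_described_by(triples, doc_iri):
    """Names of whatever node(s) the given document IRI 'describes'."""
    described = {o for s, p, o in triples if s == doc_iri and p == NS + "describes"}
    return {o for s, p, o in triples if s in described and p == NS + "name"}


naive = naive_merge(DOC_A, DOC_B)
safe = naive_merge(skolemize(DOC_A), skolemize(DOC_B))

print(sorted(names_described_by(naive, "http://example.org/docA")))
# ['"libbar"', '"libfoo"']  (docA now appears to describe both packages)
print(sorted(names_described_by(safe, "http://example.org/docA")))
# ['"libfoo"']
```

In this sketch the authority in the skolem IRIs is chosen by whoever runs `skolemize` (typically the consumer), not by the producer, so the minted IRIs say nothing about where the nodes originally came from.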
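The entailment point can also be sketched. Under RDF simple entailment, a graph G entails a graph H containing blank nodes when some mapping of H's blank nodes to terms of G turns every triple of H into a triple of G, i.e. a graph homomorphism search, which is the kind of problem those complexity results concern. The backtracking checker below is illustrative only: it assumes the same tuple encoding and `_:` convention, ignores literal datatypes and every entailment regime beyond simple entailment, and its function name and example data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def find_instance_mapping(g: Set[Triple], h: Iterable[Triple]) -> Optional[Dict[str, str]]:
    """Return a blank-node mapping showing that g simply entails h, or None.

    Each triple of h is tried against every triple of g, extending the mapping
    for h's blank nodes consistently and backtracking on conflicts. In the
    worst case the search explores exponentially many partial mappings.
    """
    pattern = list(h)

    def match(term, target, mapping):
        if is_bnode(term):
            bound = mapping.get(term)
            if bound is None:
                return {**mapping, term: target}
            return mapping if bound == target else None
        return mapping if term == target else None

    def search(i, mapping):
        if i == len(pattern):
            return mapping
        for candidate in g:
            m = mapping
            for term, target in zip(pattern[i], candidate):
                m = match(term, target, m)
                if m is None:
                    break
            if m is not None:
                result = search(i + 1, m)
                if result is not None:
                    return result
        return None

    return search(0, {})


EX = "http://example.org/"
G = {
    (EX + "pkg1", EX + "ns#dependsOn", EX + "pkg2"),
    (EX + "pkg2", EX + "ns#name", '"libbar"'),
}
# "Something depends on something named libbar": entailed by G.
H1 = [("_:a", EX + "ns#dependsOn", "_:b"), ("_:b", EX + "ns#name", '"libbar"')]
# "Something depends on itself": not entailed by G.
H2 = [("_:a", EX + "ns#dependsOn", "_:a")]

print(find_instance_mapping(G, H1))
# {'_:a': 'http://example.org/pkg1', '_:b': 'http://example.org/pkg2'}
print(find_instance_mapping(G, H2))
# None
```

Real reasoners use far smarter strategies than this brute-force search, but the worst case stays hard: many blank nodes shared across many triples force the search to consider many combinations of bindings.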
Sean
