Re: JSON-LD writer and the Titanium RdfDataset

Nicholas Car Thu, 11 Jul 2024 01:54:45 -0700

We will be getting close to that. We are about to make a series of updates to 
RDFLib's JSON-LD capabilities and some sort of frame interpretation as you 
describe is probably needed. I think we might use new fundamental capabilities 
in RDFLib to make RDFrame do the JSON-LD Frame to SPARQL translation.
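
As a rough illustration of the frame -> SPARQL idea, here is a minimal Python sketch. It is not how RDFrame or RDFLib does it. It assumes a heavily simplified frame: keys are already full IRIs (no "@context" resolution), a nested object means "follow this property", and "@type" is the only keyword honoured. The name frame_to_construct is hypothetical.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def frame_to_construct(frame, start_iri):
    """Build a SPARQL CONSTRUCT query for the triples a simplified frame visits.

    Assumptions (not taken from any real library): keys are full IRIs,
    a nested dict means "follow this property", and every property is
    wrapped in OPTIONAL so a missing value does not drop the whole result.
    """
    counter = count(1)
    template = []  # triple patterns for the CONSTRUCT template

    def walk(node, subject):
        where = []
        node_type = node.get("@type")
        if isinstance(node_type, str):
            pattern = f"{subject} <{RDF_TYPE}> <{node_type}> ."
            template.append(pattern)
            where.append(pattern)
        for key, value in node.items():
            if key.startswith("@"):
                continue  # other JSON-LD keywords are ignored in this sketch
            obj = f"?v{next(counter)}"
            pattern = f"{subject} <{key}> {obj} ."
            template.append(pattern)
            inner = [pattern]
            if isinstance(value, dict):
                inner.extend(walk(value, obj))
            where.append("OPTIONAL { " + " ".join(inner) + " }")
        return where

    where = walk(frame, f"<{start_iri}>")
    return (
        "CONSTRUCT {\n  " + "\n  ".join(template) + "\n}\n"
        "WHERE {\n  " + "\n  ".join(where) + "\n}"
    )


# Example frame: a person, who they know, and the names of those people.
example_frame = {
    "@type": "http://example.org/Person",
    "http://example.org/knows": {"http://example.org/name": {}},
}
example_query = frame_to_construct(example_frame, "http://example.org/alice")
```

The resulting query can be sent to the store so that only the triples the frame touches are pulled into memory before any JSON-LD processing happens.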


Nick




On Thursday, 11 July 2024 at 18:09, Holger Knublauch <hol...@topquadrant.com> 
wrote:

> Yes, using shapes to describe the closure is one option. I wonder if anyone 
> has a similar algorithm that takes a JSON-LD frame and generates SPARQL 
> queries for the triples that are visited by the frame, i.e. directly working 
> on the frame JSON only?
> 
> Holger
> 
> > On 11 Jul 2024, at 7:18 AM, Nicholas Car n...@kurrawong.net wrote:
> > 
> > Hi Holger and all,
> > 
> > We do something similar to what I think you want done here with RDFrame, a 
> > prototype tool we use so our APIs can extract triples from a store 
> > according to a (SHACL or CQL) frame:
> > 
> > https://rdframe.dev.kurrawong.ai/
> > 
> > I suspect what we are doing is early days/simple stuff compared to what you 
> > need but the principle of frame -> SPARQL seems relevant.
> > 
> > Cheers, Nick
> > 
> > On Thursday, 11 July 2024 at 15:03, Holger Knublauch hol...@topquadrant.com 
> > wrote:
> > 
> > > Hi Andy,
> > > 
> > > thanks for your response. To clarify, it would be a scenario such as a 
> > > TDB with 1 million triples and the request is to produce a JSON-LD 
> > > document from the "closure" around a given resource (in TopBraid's Source 
> > > Code panel when the user navigates to a resource or through API calls). 
> > > In other words: input is a Jena Graph, a start node and a JSON-LD frame 
> > > document, and the output should be a JSON-LD describing the node and all 
> > > reachable triples described by the frame.
> > > 
> > > So it sounds like Titanium cannot really be used for this as its 
> > > algorithms can only operate on their own in-memory copy of a graph, and 
> > > we cannot copy all 1 million triples into memory each time.
> > > 
> > > Holger
> > > 
> > > > On 10 Jul 2024, at 5:53 PM, Andy Seaborne a...@apache.org wrote:
> > > > 
> > > > Hi Holger,
> > > > 
> > > > How big is the database?
> > > > What sort of framing are you aiming to do?
> > > > Using framing to select some from a large database doesn't feel like 
> > > > the way to extract triples as you've discovered. Framing can touch 
> > > > anywhere in the JSON document.
> > > > 
> > > > This recent thread is relevant --
> > > > https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
> > > > 
> > > > That JSON-LD file is 280 million triples.
> > > > 
> > > > Its structure is
> > > > 
> > > > [{"@context": <url> , ... }
> > > > ,{"@context": <url> , ... }
> > > > ,{"@context": <url> , ... }
> > > > ...
> > > > ,{"@context": <url> , ... }
> > > > ]
> > > > 
> > > > 9 million array entries.
> > > > 
> > > > It looks to me like it has been produced by text manipulation, taking 
> > > > each entity, writing a separate, self-contained JSON-LD object then, by 
> > > > text, making a big array. That, or a tool that is designed specially to 
> > > > write large JSON-LD. e.g. the outer array.
> > > > 
> > > > That's the same context URL and would be a denial of service attack 
> > > > except Titanium reads the whole file as JSON and runs out of space.
> > > > 
> > > > The JSON-LD algorithms do assume the whole document is available. 
> > > > Titanium is a faithful implementation of the spec.
> > > > 
> > > > It is hard to work with.
> > > > 
> > > > In JSON the whole object needs to be seen - repeated member names (de 
> > > > facto - last duplicate wins) and "@context" being at the end are 
> > > > possible. Cases that don't occur in XML. Streaming JSON or JSON-LD is 
> > > > going to have to relax the strictness somehow.
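
The duplicate-member point is easy to see with Python's standard json module, used here only as an illustration (Titanium and Jena are Java). The default parser keeps the last duplicate, and "@context" may arrive after the data it governs, so neither can be resolved until the whole object has been read.

```python
import json

doc = (
    '{"name": "first", "name": "second", '
    '"@context": {"name": "http://schema.org/name"}}'
)

# Default behaviour: the last duplicate member silently wins.
parsed = json.loads(doc)

# object_pairs_hook receives every member in document order, duplicates
# included, which is what a stricter or streaming reader would have to track.
pairs = json.loads(doc, object_pairs_hook=list)
member_names = [name for name, _ in pairs]
```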
> > > > 
> > > > JSON-LD is designed around the assumption of small/medium sized data.
> > > > 
> > > > And this affects writing. That large file looks like it was specially 
> > > > written or at least with a tool that is designed specially to write 
> > > > large JSON-LD. e.g. the outer array.
> > > > 
> > > > Jena could do with some RDFFormats + writers for JSONLD at scale. One 
> > > > obvious one is the one extending WriterStreamRDFBatched where a batch 
> > > > is the subject and its immediate triples, then write similar to the 
> > > > case above except in a way that is one context then the array is with 
> > > > "@graph".
> > > > 
> > > > https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
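
A minimal sketch of that output shape, written in Python rather than as a Jena WriterStreamRDFBatched subclass: triples arrive grouped by subject, the context is written once, and each subject becomes one entry in the "@graph" array as soon as its batch is complete. The function name, the tuple representation of triples and the treatment of every object as a plain string are assumptions for illustration only.

```python
import io
import json
from itertools import groupby
from operator import itemgetter


def write_jsonld_batched(triples, context, out):
    """Stream (subject, predicate, object) tuples as one JSON-LD document.

    Assumes triples for the same subject are adjacent and that every
    object is a plain string; IRIs, datatypes and blank nodes are not handled.
    """
    out.write('{"@context": ' + json.dumps(context) + ', "@graph": [\n')
    first = True
    for subject, batch in groupby(triples, key=itemgetter(0)):
        node = {"@id": subject}
        for _, predicate, obj in batch:
            node.setdefault(predicate, []).append(obj)
        if not first:
            out.write(",\n")
        out.write(json.dumps(node))  # only one subject is held in memory
        first = False
    out.write("\n]}\n")


# Usage: any iterable works, so the triples could come from a database cursor.
sample = [
    ("ex:a", "ex:name", "Alice"),
    ("ex:a", "ex:nick", "Al"),
    ("ex:b", "ex:name", "Bob"),
]
buffer = io.StringIO()
write_jsonld_batched(sample, {"ex": "http://example.org/"}, buffer)
```

Because the writer never builds the whole document, memory use is bounded by the largest single subject rather than by the size of the graph.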
> > > > 
> > > > That doesn't solve the reading side - a companion reader would be 
> > > > needed that stream-reads JSON.
> > > > 
> > > > Contributions welcome!
> > > > 
> > > > Andy
> > > > 
> > > > On 10/07/2024 12:36, Holger Knublauch wrote:
> > > > 
> > > > > I am working on serializing partial RDF graphs to JSON-LD using the 
> > > > > Jena-Titanium bridge.
> > > > > Problem: For Titanium to "see" the triples it needs to have a 
> > > > > complete copy. See JenaTitanium.convert which copies all Jena triples 
> > > > > into a corresponding RdfDataset. This cannot scale if the graph is 
> > > > > backed by a database, and we only want to export certain triples (esp 
> > > > > for Framing). Titanium's RdfGraph does not provide an incremental 
> > > > > function similar to Graph.find() but only returns a complete Java 
> > > > > List of all triples.
> > > > > Has anyone here run into the same problem and what would be a 
> > > > > solution?
> > > > > I guess one solution would be an incremental algorithm that "walks" a 
> > > > > @context and JSON-LD frame document to collect all required Jena 
> > > > > triples, producing a sub-graph that can then be sent to Titanium. But 
> > > > > the complexity of such an algorithm is similar to having to implement 
> > > > > my own JSON-LD engine, which feels like an overkill.
> > > > > Holger
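
The incremental "walk" described in the original question could look roughly like the following Python sketch. It assumes a simplified frame whose keys are full property IRIs and ignores "@context" and all framing flags. `find(s, p)` is a hypothetical stand-in for an incremental lookup such as Jena's Graph.find(), and collect_closure is not an existing API.

```python
def collect_closure(find, start, frame):
    """Collect the triples reachable from `start` by following the frame.

    `find(subject, predicate)` must return an iterable of
    (subject, predicate, object) tuples; it is the only access to the
    underlying graph, so the full graph is never copied.
    """
    collected = {}  # dict used as an insertion-ordered set
    stack = [(start, frame)]
    while stack:
        node, sub_frame = stack.pop()
        for prop, nested in sub_frame.items():
            if prop.startswith("@"):
                continue  # JSON-LD keywords are ignored in this sketch
            for triple in find(node, prop):
                collected[triple] = None
                if isinstance(nested, dict):
                    # Descend only as deep as the frame itself nests.
                    stack.append((triple[2], nested))
    return list(collected)


# Usage with an in-memory list standing in for a database-backed graph.
store = [
    ("a", "knows", "b"),
    ("b", "name", "Bob"),
    ("a", "age", "30"),
    ("c", "name", "Carol"),
]


def find_in_store(subject, predicate):
    return [t for t in store if t[0] == subject and t[1] == predicate]


subgraph = collect_closure(find_in_store, "a", {"knows": {"name": {}}})
```

The walk terminates because its depth is bounded by the nesting of the frame, and the resulting sub-graph is small enough to hand to a JSON-LD processor that needs an in-memory copy.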
