Re: JSON-LD writer and the Titanium RdfDataset

Nicholas Car Wed, 10 Jul 2024 22:18:35 -0700

Hi Holger and all,

We do something similar to what I think you want done here with RDFrame, a 
prototype tool we use so our APIs can extract triples from a store according to 
a (SHACL or CQL) frame:


https://rdframe.dev.kurrawong.ai/

I suspect what we are doing is early days/simple stuff compared to what you 
need but the principle of frame -> SPARQL seems relevant.
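
To make the frame -> SPARQL idea concrete, here is a minimal sketch in plain 
Python. It is not RDFrame's actual code: the function name frame_to_construct 
and the simplified "frame" (nested dicts keyed by full property IRIs, rather 
than a real JSON-LD frame with a @context) are assumptions made for 
illustration. It turns a start node plus such a frame into a CONSTRUCT query, 
so that only the matching triples leave the store:

```python
from itertools import count


def frame_to_construct(start_iri, frame):
    """Build a SPARQL CONSTRUCT query from a simplified, frame-like dict.

    `frame` maps full property IRIs to nested frames of the same shape
    (an empty dict means "stop here"). This is an illustrative stand-in,
    not the JSON-LD framing algorithm.
    """
    counter = count(1)
    templates = []  # triple patterns for the CONSTRUCT template

    def walk(subject, subframe, indent):
        lines = []
        pad = "  " * indent
        for prop, nested in subframe.items():
            var = f"?v{next(counter)}"
            triple = f"{subject} <{prop}> {var} ."
            templates.append(triple)
            # OPTIONAL keeps other branches when this one has no match.
            lines.append(f"{pad}OPTIONAL {{")
            lines.append(f"{pad}  {triple}")
            lines.extend(walk(var, nested, indent + 1))
            lines.append(f"{pad}}}")
        return lines

    where = "\n".join(walk(f"<{start_iri}>", frame, 1))
    template = "\n".join(f"  {t}" for t in templates)
    return f"CONSTRUCT {{\n{template}\n}}\nWHERE {{\n{where}\n}}"
```

Every property is wrapped in OPTIONAL so that a missing branch does not 
discard the others. Sibling OPTIONALs can multiply intermediate solutions, so 
a real implementation might prefer UNION or one query per branch. IRIs are 
assumed to be well-formed and trusted.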

Cheers, Nick




On Thursday, 11 July 2024 at 15:03, Holger Knublauch <hol...@topquadrant.com> 
wrote:

> Hi Andy,
> 
> thanks for your response. To clarify, it would be a scenario such as a TDB 
> with 1 million triples and the request is to produce a JSON-LD document from 
> the "closure" around a given resource (in TopBraid's Source Code panel when 
> the user navigates to a resource or through API calls). In other words: input 
> is a Jena Graph, a start node and a JSON-LD frame document, and the output 
> should be a JSON-LD document describing the node and all reachable triples 
> described by the frame.
> 
> So it sounds like Titanium cannot really be used for this as its algorithms 
> can only operate on their own in-memory copy of a graph, and we cannot copy 
> all 1 million triples into memory each time.
> 
> Holger
> 
> > On 10 Jul 2024, at 5:53 PM, Andy Seaborne a...@apache.org wrote:
> > 
> > Hi Holger,
> > 
> > How big is the database?
> > What sort of framing are you aiming to do?
> > Using framing to select some from a large database doesn't feel like the 
> > way to extract triples as you've discovered. Framing can touch anywhere in 
> > the JSON document.
> > 
> > This recent thread is relevant --
> > https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
> > 
> > That JSON-LD file is 280 million triples.
> > 
> > Its structure is
> > 
> > [{"@context": <url> , ... }
> > ,{"@context": <url> , ... }
> > ,{"@context": <url> , ... }
> > ...
> > ,{"@context": <url> , ... }
> > ]
> > 
> > 9 million array entries.
> > 
> > It looks to me like it has been produced by text manipulation, taking each 
> > entity, writing a separate, self-contained JSON-LD object then, by text, 
> > making a big array. That, or a tool that is designed specially to write 
> > large JSON-LD. e.g. the outer array.
> > 
> > That's the same context URL every time, and it would be a denial of 
> > service attack except that Titanium reads the whole file as JSON and runs 
> > out of space.
> > 
> > The JSON-LD algorithms do assume the whole document is available. Titanium 
> > is a faithful implementation of the spec.
> > 
> > It is hard to work with.
> > 
> > In JSON the whole object needs to be seen: repeated member names (de 
> > facto, the last duplicate wins) and "@context" being at the end are both 
> > possible. 
> > Cases that don't occur in XML. Streaming JSON or JSON-LD is going to have 
> > to relax the strictness somehow.
> > 
> > JSON-LD is designed around the assumption of small/medium sized data.
> > 
> > And this affects writing. That large file looks like it was specially 
> > written or at least with a tool that is designed specially to write large 
> > JSON-LD. e.g. the outer array.
> > 
> > Jena could do with some RDFFormats + writers for JSON-LD at scale. One 
> > obvious one is a writer extending WriterStreamRDFBatched, where a batch is 
> > the subject and its immediate triples, that writes output similar to the 
> > case above except with a single context and the array under "@graph".
> > 
> > https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
> > 
> > That doesn't solve the reading side - a companion reader would be needed 
> > that stream-reads JSON.
> > 
> > Contributions welcome!
> > 
> > Andy
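
A rough sketch of the output shape Andy describes, in plain Python rather 
than as a Jena writer. The function name write_jsonld_stream and the input 
convention are assumptions for illustration: triples arrive already grouped 
by subject, predicates are full IRIs or already-compacted terms, and objects 
are ready-made {"@id": ...} or {"@value": ...} dicts. No compaction against 
the context is attempted. Only one node object is held in memory at a time:

```python
import io
import json
from itertools import groupby


def write_jsonld_stream(triples, context, out):
    """Write one shared @context, then stream node objects under @graph.

    `triples` is an iterable of (subject, predicate, object) tuples that is
    assumed to be grouped by subject; each group is one "batch".
    """
    out.write('{"@context": ')
    out.write(json.dumps(context))
    out.write(', "@graph": [')
    first = True
    for subject, batch in groupby(triples, key=lambda t: t[0]):
        # Build the node object for this subject only.
        node = {"@id": subject}
        for _, predicate, obj in batch:
            node.setdefault(predicate, []).append(obj)
        if not first:
            out.write(",")
        out.write("\n")
        out.write(json.dumps(node))
        first = False
    out.write("\n]}\n")
```

If the same subject turns up in two separate batches, this writes two node 
objects with the same "@id", which a JSON-LD processor merges on reading.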
> > 
> > On 10/07/2024 12:36, Holger Knublauch wrote:
> > 
> > > I am working on serializing partial RDF graphs to JSON-LD using the 
> > > Jena-Titanium bridge.
> > > Problem: For Titanium to "see" the triples it needs to have a complete 
> > > copy. See JenaTitanium.convert, which copies all Jena triples into a 
> > > corresponding RdfDataset. This cannot scale if the graph is backed by a 
> > > database, and we only want to export certain triples (esp for Framing). 
> > > Titanium's RdfGraph does not provide an incremental function similar to 
> > > Graph.find() but only returns a complete Java List of all triples.
> > > Has anyone here run into the same problem and what would be a solution?
> > > I guess one solution would be an incremental algorithm that "walks" a 
> > > @context and JSON-LD frame document to collect all required Jena triples, 
> > > producing a sub-graph that can then be sent to Titanium. But the 
> > > complexity of such an algorithm is similar to having to implement my own 
> > > JSON-LD engine, which feels like overkill.
> > > Holger
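
The incremental walk Holger describes could look roughly like the following 
Python sketch. It assumes a greatly simplified frame (nested dicts keyed by 
property, ignoring @context, @type matching, @reverse and the other framing 
keywords) and a hypothetical find(s, p, o) callable in the spirit of 
Graph.find(), with None as the wildcard. The names collect_subgraph and 
make_find are made up for this example. The small set of triples it returns 
is what would then be copied into Titanium:

```python
def make_find(triples):
    """Return a find(s, p, o) function over an in-memory list (demo only)."""
    def find(s, p, o):
        for t in triples:
            if ((s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t
    return find


def collect_subgraph(find, start, frame):
    """Collect triples reachable from `start` along the properties in `frame`.

    `frame` maps property names to nested frames of the same shape; an
    empty dict ends the walk on that branch.
    """
    collected = set()
    seen = set()  # (node, frame level) pairs already expanded
    stack = [(start, frame)]
    while stack:
        node, subframe = stack.pop()
        key = (node, id(subframe))
        if key in seen:
            continue
        seen.add(key)
        for prop, nested in subframe.items():
            # One targeted lookup per (node, property); no full scan needed
            # when the backing store indexes by subject and predicate.
            for triple in find(node, prop, None):
                collected.add(triple)
                if nested:
                    stack.append((triple[2], nested))
    return collected
```

The depth of the walk is bounded by the depth of the frame, so cycles in the 
data cannot make it loop forever. Handling real frames would still require 
resolving the @context and the framing keywords, which is the complexity 
Holger is worried about.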
