Hi Holger and all,

We do something similar to what I think you want done here with RDFrame, a prototype tool we use so our APIs can extract triples from a store according to a (SHACL or CQL) frame:
https://rdframe.dev.kurrawong.ai/

I suspect what we are doing is early days/simple stuff compared to what you need, but the principle of frame -> SPARQL seems relevant.

Cheers, Nick

On Thursday, 11 July 2024 at 15:03, Holger Knublauch <hol...@topquadrant.com> wrote:

> Hi Andy,
>
> thanks for your response. To clarify, it would be a scenario such as a TDB with 1 million triples, and the request is to produce a JSON-LD document from the "closure" around a given resource (in TopBraid's Source Code panel when the user navigates to a resource, or through API calls). In other words: the input is a Jena Graph, a start node and a JSON-LD frame document, and the output should be a JSON-LD document describing the node and all reachable triples described by the frame.
>
> So it sounds like Titanium cannot really be used for this, as its algorithms can only operate on their own in-memory copy of a graph, and we cannot copy all 1 million triples into memory each time.
>
> Holger
>
> > On 10 Jul 2024, at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:
> >
> > Hi Holger,
> >
> > How big is the database?
> > What sort of framing are you aiming to do?
> >
> > Using framing to select a subset of a large database doesn't feel like the way to extract triples, as you've discovered. Framing can touch anywhere in the JSON document.
> >
> > This recent thread is relevant --
> > https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
> >
> > That JSON-LD file is 280 million triples.
> >
> > Its structure is
> >
> > [{"@context": <url> , ... }
> > ,{"@context": <url> , ... }
> > ,{"@context": <url> , ... }
> > ...
> > ,{"@context": <url> , ... }
> > ]
> >
> > 9 million array entries.
> >
> > It looks to me like it has been produced by text manipulation: taking each entity, writing a separate, self-contained JSON-LD object, then, by text, making a big array. That, or a tool that is designed specially to write large JSON-LD, e.g. the outer array.
> >
> > It's the same context URL every time, and that would be a denial-of-service attack except that Titanium reads the whole file as JSON and runs out of space first.
> >
> > The JSON-LD algorithms do assume the whole document is available. Titanium is a faithful implementation of the spec.
> >
> > It is hard to work with.
> >
> > In JSON the whole object needs to be seen - repeated member names (de facto, last duplicate wins) and "@context" appearing at the end are possible. These are cases that don't occur in XML. Streaming JSON or JSON-LD is going to have to relax the strictness somehow.
> >
> > JSON-LD is designed around the assumption of small/medium sized data.
> >
> > And this affects writing. That large file looks like it was specially written, or at least written with a tool that is designed specially to write large JSON-LD, e.g. the outer array.
> >
> > Jena could do with some RDFFormats + writers for JSON-LD at scale. One obvious one extends WriterStreamRDFBatched, where a batch is a subject and its immediate triples, and then writes similarly to the case above, except with one shared context and the array under "@graph".
> >
> > https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
> >
> > That doesn't solve the reading side - a companion reader would be needed that stream-reads JSON.
> >
> > Contributions welcome!
> >
> > Andy
> >
> > On 10/07/2024 12:36, Holger Knublauch wrote:
> > >
> > > I am working on serializing partial RDF graphs to JSON-LD using the Jena-Titanium bridge.
> > >
> > > Problem: For Titanium to "see" the triples, it needs to have a complete copy. See JenaTitanium.convert, which copies all Jena triples into a corresponding RdfDataset. This cannot scale if the graph is backed by a database and we only want to export certain triples (esp. for Framing).
> > >
> > > Titanium's RdfGraph does not provide an incremental function similar to Graph.find() but only returns a complete Java List of all triples.
> > >
> > > Has anyone here run into the same problem, and what would be a solution?
> > >
> > > I guess one solution would be an incremental algorithm that "walks" a @context and JSON-LD frame document to collect all required Jena triples, producing a sub-graph that can then be sent to Titanium. But the complexity of such an algorithm is similar to having to implement my own JSON-LD engine, which feels like overkill.
> > >
> > > Holger
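A minimal sketch of the "collect a sub-graph first" idea from Holger's message above, assuming a plain walk over object links is an acceptable over-approximation of what the frame will touch. The class name ClosureExtract and the method reachableClosure are illustrative; only Graph.find(), Node, Triple and DatasetGraphFactory are standard Jena API, and any read transaction on a TDB-backed source is left to the caller.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphFactory;

    public class ClosureExtract {

        // Collect every triple reachable from 'start' by following URI and
        // blank-node objects, copying only those triples into a small
        // in-memory dataset. The backing store is only touched for subjects
        // the walk actually reaches.
        public static DatasetGraph reachableClosure(Graph source, Node start) {
            DatasetGraph target = DatasetGraphFactory.createTxnMem();
            Deque<Node> queue = new ArrayDeque<>();
            Set<Node> visited = new HashSet<>();
            queue.push(start);
            while (!queue.isEmpty()) {
                Node subject = queue.pop();
                if (!visited.add(subject))
                    continue;                       // already expanded
                for (Triple t : source.find(subject, Node.ANY, Node.ANY).toList()) {
                    target.getDefaultGraph().add(t);
                    Node obj = t.getObject();
                    // Literals end the walk; URIs and blank nodes are expanded further.
                    if (obj.isURI() || obj.isBlank())
                        queue.push(obj);
                }
            }
            return target;
        }
    }

The resulting dataset fits in memory, so it can go through JenaTitanium.convert and Titanium's framing as today. The walk over-selects relative to the frame, but the scan of the 1-million-triple store stays bounded by what is reachable from the start node.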
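And a rough sketch of the subject-batched writer Andy describes: one shared "@context" and a "@graph" array with one node object per subject. To keep it self-contained it extends Jena's StreamRDFBase instead of WriterStreamRDFBatched, writes predicate IRIs uncompacted, and drops datatypes and language tags; the class name StreamingJsonLdWriter and the exact output layout are illustrative only.

    import java.io.PrintStream;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.system.StreamRDFBase;

    // Streams triples out as one JSON-LD document: a single shared "@context"
    // followed by a "@graph" array with one node object per subject. Nothing is
    // buffered beyond the current subject's triples; the input is assumed to
    // arrive grouped by subject.
    public class StreamingJsonLdWriter extends StreamRDFBase {

        private final PrintStream out;
        private final String contextUrl;
        private final Map<String, List<Node>> byPredicate = new LinkedHashMap<>();
        private Node currentSubject = null;
        private boolean firstNodeObject = true;

        public StreamingJsonLdWriter(PrintStream out, String contextUrl) {
            this.out = out;
            this.contextUrl = contextUrl;
        }

        @Override public void start() {
            out.println("{");
            out.println("  \"@context\": \"" + escape(contextUrl) + "\",");
            out.println("  \"@graph\": [");
        }

        @Override public void triple(Triple triple) {
            if (currentSubject != null && !triple.getSubject().equals(currentSubject))
                flushSubject();                     // new subject: emit the previous batch
            currentSubject = triple.getSubject();
            byPredicate.computeIfAbsent(triple.getPredicate().getURI(), k -> new ArrayList<>())
                       .add(triple.getObject());
        }

        @Override public void finish() {
            if (currentSubject != null)
                flushSubject();
            out.println();
            out.println("  ]");
            out.println("}");
        }

        // One node object per subject, predicates as full IRIs, objects as arrays.
        private void flushSubject() {
            if (!firstNodeObject)
                out.println(",");
            firstNodeObject = false;
            StringBuilder sb = new StringBuilder("    { \"@id\": \"" + id(currentSubject) + "\"");
            byPredicate.forEach((pred, objects) -> {
                sb.append(", \"").append(escape(pred)).append("\": [");
                for (int i = 0; i < objects.size(); i++)
                    sb.append(i == 0 ? "" : ", ").append(objectValue(objects.get(i)));
                sb.append("]");
            });
            sb.append(" }");
            out.print(sb);
            byPredicate.clear();
        }

        private static String objectValue(Node o) {
            if (o.isLiteral())                      // datatype/language omitted in this sketch
                return "{ \"@value\": \"" + escape(o.getLiteralLexicalForm()) + "\" }";
            return "{ \"@id\": \"" + id(o) + "\" }";
        }

        private static String id(Node n) {
            return n.isBlank() ? "_:" + n.getBlankNodeLabel() : escape(n.getURI());
        }

        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    }

Driving it is a matter of calling start(), feeding triples (for instance by iterating graph.find(Node.ANY, Node.ANY, Node.ANY), provided that yields triples grouped by subject) and calling finish(). As Andy notes, this only covers the writing side; a companion reader that stream-reads the JSON would still be needed.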