We will be getting close to that. We are about to make a series of updates to RDFLib's JSON-LD capabilities, and some sort of frame interpretation as you describe is probably needed. I think we might use new fundamental capabilities in RDFLib to make RDFrame do the JSON-LD Frame to SPARQL translation.
Nick

On Thursday, 11 July 2024 at 18:09, Holger Knublauch <hol...@topquadrant.com> wrote:

> Yes, using shapes to describe the closure is one option. I wonder if anyone
> has a similar algorithm that takes a JSON-LD frame and generates SPARQL
> queries for the triples that are visited by the frame, i.e. directly working
> on the frame JSON only?
>
> Holger
>
> > On 11 Jul 2024, at 7:18 AM, Nicholas Car n...@kurrawong.net wrote:
> >
> > Hi Holger and all,
> >
> > We do something similar to what I think you want done here with RDFrame, a
> > prototype tool we use so our APIs can extract triples from a store
> > according to a (SHACL or CQL) frame:
> >
> > https://rdframe.dev.kurrawong.ai/
> >
> > I suspect what we are doing is early days/simple stuff compared to what you
> > need but the principle of frame -> SPARQL seems relevant.
> >
> > Cheers, Nick
> >
> > On Thursday, 11 July 2024 at 15:03, Holger Knublauch hol...@topquadrant.com
> > wrote:
> >
> > > Hi Andy,
> > >
> > > thanks for your response. To clarify, it would be a scenario such as a
> > > TDB with 1 million triples and the request is to produce a JSON-LD
> > > document from the "closure" around a given resource (in TopBraid's Source
> > > Code panel when the user navigates to a resource or through API calls).
> > > In other words: input is a Jena Graph, a start node and a JSON-LD frame
> > > document, and the output should be a JSON-LD describing the node and all
> > > reachable triples described by the frame.
> > >
> > > So it sounds like Titanium cannot really be used for this as its
> > > algorithms can only operate on their own in-memory copy of a graph, and
> > > we cannot copy all 1 million triples into memory each time.
> > >
> > > Holger
> > >
> > > > On 10 Jul 2024, at 5:53 PM, Andy Seaborne a...@apache.org wrote:
> > > >
> > > > Hi Holger,
> > > >
> > > > How big is the database?
> > > > What sort of framing are you aiming to do?
> > > > Using framing to select some from a large database doesn't feel like
> > > > the way to extract triples as you've discovered. Framing can touch
> > > > anywhere in the JSON document.
> > > >
> > > > This recent thread is relevant --
> > > > https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
> > > >
> > > > That JSON-LD file is 280 million triples.
> > > >
> > > > Its structure is
> > > >
> > > > [{"@context": <url> , ... }
> > > > ,{"@context": <url> , ... }
> > > > ,{"@context": <url> , ... }
> > > > ...
> > > > ,{"@context": <url> , ... }
> > > > ]
> > > >
> > > > 9 million array entries.
> > > >
> > > > It looks to me like it has been produced by text manipulation, taking
> > > > each entity, writing a separate, self-contained JSON-LD object then, by
> > > > text, making a big array. That, or a tool that is designed specially to
> > > > write large JSON-LD. e.g. the outer array.
> > > >
> > > > That's the same context URL and would be a denial of service attack
> > > > except Titanium reads the whole file as JSON and runs out of space.
> > > >
> > > > The JSON-LD algorithms do assume the whole document is available.
> > > > Titanium is a faithful implementation of the spec.
> > > >
> > > > It is hard to work with.
> > > >
> > > > In JSON the whole object needs to be seen - repeated member names (de
> > > > facto - last duplicate wins) and "@context" being at the end are
> > > > possible. Cases that don't occur in XML. Streaming JSON or JSON-LD is
> > > > going to have to relax the strictness somehow.
> > > >
> > > > JSON-LD is designed around the assumption of small/medium sized data.
> > > >
> > > > And this affects writing. That large file looks like it was specially
> > > > written or at least with a tool that is designed specially to write
> > > > large JSON-LD. e.g. the outer array.
> > > >
> > > > Jena could do with some RDFFormats + writers for JSONLD at scale.
> > > > One obvious one is the one extending WriterStreamRDFBatched where a
> > > > batch is the subject and its immediate triples, then write similar to
> > > > the case above except in a way that is one context then the array is
> > > > with "@graph".
> > > >
> > > > https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
> > > >
> > > > That doesn't solve the reading side - a companion reader would be
> > > > needed that stream-reads JSON.
> > > >
> > > > Contributions welcome!
> > > >
> > > > Andy
> > > >
> > > > On 10/07/2024 12:36, Holger Knublauch wrote:
> > > >
> > > > > I am working on serializing partial RDF graphs to JSON-LD using the
> > > > > Jena-Titanium bridge.
> > > > > Problem: For Titanium to "see" the triples it needs to have a
> > > > > complete copy. See JenaTitanium.convert which copies all Jena triples
> > > > > into a corresponding RdfDataset. This cannot scale if the graph is
> > > > > backed by a database, and we only want to export certain triples (esp
> > > > > for Framing). Titanium's RdfGraph does not provide an incremental
> > > > > function similar to Graph.find() but only returns a complete Java
> > > > > List of all triples.
> > > > > Has anyone here run into the same problem and what would be a
> > > > > solution?
> > > > > I guess one solution would be an incremental algorithm that "walks" a
> > > > > @context and JSON-LD frame document to collect all required Jena
> > > > > triples, producing a sub-graph that can then be sent to Titanium. But
> > > > > the complexity of such an algorithm is similar to having to implement
> > > > > my own JSON-LD engine, which feels like overkill.
> > > > > Holger
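
For illustration, here is a rough Python sketch of the frame -> SPARQL idea discussed in this thread. It is not how RDFrame, RDFLib, Jena or Titanium do it. The function name `frame_to_construct` and the example.org IRIs are made up, and it assumes a heavily simplified frame: the @context has already been applied so every non-keyword key is a full property IRI, each value is a nested frame object, and all JSON-LD keywords (@type matching, @embed, @reverse and so on) are ignored. It walks the frame once and emits a CONSTRUCT query with one UNION branch per property path, so the store only has to return the triples the frame can reach from the start node.

```python
from itertools import count


def frame_to_construct(frame: dict, start_iri: str) -> str:
    """Build a SPARQL CONSTRUCT query for the triples a frame would visit.

    Assumption: `frame` is already expanded, i.e. every key that is not a
    JSON-LD keyword is a full property IRI and every value is a nested
    frame dict ({} for a leaf). IRIs are not validated or escaped here.
    """
    fresh = count(1)   # source of fresh variable numbers
    template = []      # triple patterns for the CONSTRUCT template
    branches = []      # one UNION branch per property path in the frame

    def walk(node: dict, subject: str, path: list) -> None:
        for prop, child in node.items():
            if prop.startswith("@"):
                continue  # JSON-LD keywords are ignored in this sketch
            obj = f"?v{next(fresh)}"
            triple = f"{subject} <{prop}> {obj} ."
            template.append(triple)
            # A branch repeats the whole chain from the start node, so
            # nested values stay anchored to the resource being described.
            branches.append(path + [triple])
            if isinstance(child, dict):
                walk(child, obj, path + [triple])

    walk(frame, f"<{start_iri}>", [])
    if not branches:
        raise ValueError("frame contains no property keys")

    where = "\n  UNION\n".join("  { " + " ".join(b) + " }" for b in branches)
    body = "\n".join("  " + t for t in template)
    return f"CONSTRUCT {{\n{body}\n}}\nWHERE {{\n{where}\n}}"


# Hypothetical frame and start node, for illustration only.
frame = {
    "@type": "http://example.org/Person",
    "http://example.org/name": {},
    "http://example.org/address": {"http://example.org/city": {}},
}
query = frame_to_construct(frame, "http://example.org/alice")
print(query)
```

Template triples whose variables are unbound in a given solution are left out of a CONSTRUCT result, which is why each UNION branch can contribute only its own path. The resulting sub-graph could then be copied into memory and handed to a JSON-LD processor for the actual framing step.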