Hi Holger,
How big is the database?
What sort of framing are you aiming to do?
Using framing to select a subset of a large database isn't really a way
to extract triples, as you've discovered. Framing can touch anywhere
in the JSON document.
This recent thread is relevant --
https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl
That JSON-LD file is 280 million triples.
Its structure is
[{"@context": <url> , ... }
,{"@context": <url> , ... }
,{"@context": <url> , ... }
...
,{"@context": <url> , ... }
]
9 million array entries.
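That outer-array layout is, at least in principle, streamable: each entry is self-contained, so a reader can decode one entry at a time instead of materialising the whole array. A rough illustration in Python (the helper is hypothetical, not anything Titanium or Jena provides, and real code would read from a file incrementally rather than hold the whole text in memory):

```python
import json

def stream_array_entries(text):
    """Yield the entries of a top-level JSON array one at a time,
    without building the whole array as one in-memory list."""
    decoder = json.JSONDecoder()
    i = text.index('[') + 1
    while True:
        # Skip whitespace and the ',' separators between entries.
        while i < len(text) and text[i] in ' \t\r\n,':
            i += 1
        if i >= len(text) or text[i] == ']':
            return
        entry, i = decoder.raw_decode(text, i)
        yield entry

doc = ('[{"@context": "http://example/ctx", "@id": "a"},\n'
       ' {"@context": "http://example/ctx", "@id": "b"}]')
ids = [e["@id"] for e in stream_array_entries(doc)]
```

Each yielded entry could then be expanded as its own small JSON-LD document.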
It looks to me like it was produced by text manipulation: taking each
entity, writing a separate, self-contained JSON-LD object, then joining
them into one big array as text. That, or a tool designed specially to
write large JSON-LD (e.g. the outer array).
That repeats the same context URL for every entry, which would amount to
a denial-of-service attack on the server hosting the context, except
that Titanium reads the whole file as JSON first and runs out of space.
The JSON-LD algorithms do assume the whole document is available, and
Titanium is a faithful implementation of the spec. That makes large
documents hard to work with.
In JSON, the whole object needs to be seen: repeated member names
(where, de facto, the last duplicate wins) and "@context" appearing at
the end are both possible. These are cases that don't occur in XML.
Streaming JSON or JSON-LD is going to have to relax that strictness somehow.
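For example, with duplicate member names most parsers (Python's json module included) keep the last value, and a streaming processor cannot know the final value until the whole object has been read:

```python
import json

# De facto behaviour of most JSON parsers: the last duplicate
# member wins, so the object is only settled once it is complete.
obj = json.loads('{"a": 1, "a": 2}')
```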
JSON-LD is designed around the assumption of small/medium-sized data,
and this affects writing as well: a file that large has to be specially
written, or produced by a tool designed specifically for writing large
JSON-LD (e.g. the outer array).
Jena could do with some RDFFormats + writers for JSON-LD at scale. One
obvious one extends WriterStreamRDFBatched, where a batch is a subject
and its immediate triples, and writes much like the case above except
with a single shared context and the array under "@graph".
https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects
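As a language-neutral sketch of that output shape (the actual writer would be Java extending WriterStreamRDFBatched; the helper name here is hypothetical): emit one "@context", then stream one node object per subject batch into a "@graph" array:

```python
import io
import json

def write_graph_document(out, context_url, node_objects):
    """Stream-write {"@context": <url>, "@graph": [ ... ]},
    serialising one node object at a time so the whole graph
    never has to be held in memory at once."""
    out.write('{"@context": ' + json.dumps(context_url) + ',\n "@graph": [\n')
    first = True
    for node in node_objects:
        if not first:
            out.write(',\n')
        out.write('  ' + json.dumps(node))
        first = False
    out.write('\n]}\n')

buf = io.StringIO()
write_graph_document(buf, "http://example/ctx",
                     [{"@id": "a", "p": 1}, {"@id": "b", "p": 2}])
doc = json.loads(buf.getvalue())
```

The point of the shape is that each "@graph" entry is independent, so a companion reader could consume the array entry by entry.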
That doesn't solve the reading side - a companion reader would be needed
that stream-reads JSON.
Contributions welcome!
Andy
On 10/07/2024 12:36, Holger Knublauch wrote:
I am working on serializing partial RDF graphs to JSON-LD using the
Jena-Titanium bridge.
Problem: For Titanium to "see" the triples it needs to have a complete copy.
See JenaTitanium.convert, which copies all Jena triples into a corresponding RdfDataset.
This cannot scale if the graph is backed by a database and we only want to export
certain triples (esp. for framing). Titanium's RdfGraph does not provide an incremental
function similar to Graph.find(), but only returns a complete Java List of all triples.
Has anyone here run into the same problem and what would be a solution?
I guess one solution would be an incremental algorithm that "walks" a @context
and JSON-LD frame document to collect all required Jena triples, producing a sub-graph
that can then be sent to Titanium. But the complexity of such an algorithm is similar to
having to implement my own JSON-LD engine, which feels like overkill.
Holger