Re: Graph Store compared to tdb2.tdbloader

David Habgood Mon, 30 May 2022 01:31:21 -0700

Thanks Lorenz and Andy

On Mon, May 30, 2022 at 5:43 PM Andy Seaborne <[email protected]> wrote:


> Hi David,
>
> On 30/05/2022 07:27, Lorenz Buehmann wrote:
> > Hi David,
> >
> > On 29.05.22 15:34, David Habgood wrote:
> >> Hi,
> >>
> >> I've been running Apache Jena Fuseki 4.5.0 in a docker container. I've
> >> loaded data to it two ways: through the graph store protocol, and using
> >> tdb2.tdbloader before starting Jena Fuseki. No issues with either;
> >> however, I'm interested in what differences the two methods have.
> >>
> >> With the graph store protocol, I can put larger RDF files 'close' to
> >> where the docker container is running and handle any network issues, so
> >> the loads have been fine. Loading data this way is convenient and allows
> >> updates while Jena Fuseki is running. Are indexes continually updated as
> >> more data is loaded through the graph store protocol?
>
> Yes.  The storage database is updated as data is loaded.
>
> >> Are there any other disadvantages to this method or reasons it (may)
> >> not be advised for large datasets? Conversely, I'm aware tdb2.tdbloader
> >> can load large datasets; is there any reason/s it should be used over
> >> graph store protocol?
>
> The difference is how fast the data is loaded. The graph store protocol
> doesn't do anything special for large data - it transactionally loads
> the incoming stream.
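
For readers following the thread, here is a minimal sketch of what a Graph Store Protocol load looks like from a client, using only the Python standard library. The dataset name "ds", the endpoint path "/ds/data", the port, and the example graph URI are assumptions for illustration and depend on how Fuseki is configured. The helper name build_gsp_request is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_gsp_request(endpoint: str, rdf_bytes: bytes,
                      graph_uri: Optional[str] = None,
                      replace: bool = False,
                      content_type: str = "text/turtle") -> Request:
    """Build (but do not send) a Graph Store Protocol request."""
    # "?default" targets the default graph, "?graph=<uri>" a named graph.
    query = "default" if graph_uri is None else urlencode({"graph": graph_uri})
    # POST merges the payload into the target graph; PUT replaces its contents.
    method = "PUT" if replace else "POST"
    return Request(f"{endpoint}?{query}", data=rdf_bytes, method=method,
                   headers={"Content-Type": content_type})


# Usage sketch (assumed endpoint and file name, not executed here):
#   from urllib.request import urlopen
#   with open("data.ttl", "rb") as f:
#       req = build_gsp_request("http://localhost:3030/ds/data", f.read(),
#                               graph_uri="http://example.org/g1")
#   with urlopen(req) as resp:
#       print(resp.status)
```

Note that this sketch reads the whole file into memory before sending, which is fine for small files but not for the very large loads discussed in this thread.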
>
> The various varieties of loaders have one task - load large data. They
> manipulate the internal datastructures of TDB directly, need exclusive
> access and only apply when loading an initially empty database. If data
> is already present, the loader command does a simple load like GSP.
>
> ("Large" being 100 million+ - hardware dependent and to some extent data
> shape dependent as to the cut-over).
>
> So less convenient but faster at scale.
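
As a rough illustration of the offline route, the sketch below assembles a tdb2.tdbloader command line in Python. The database directory and file names are hypothetical, and the helper name tdbloader_command is made up for this example. The --loader value shown ("phased") is one of the loader variants as I understand the TDB2 documentation, so check it against your Jena version. Because the loader needs exclusive access, Fuseki should not be running on the same database while the command runs.

```python
import subprocess
from typing import List, Sequence


def tdbloader_command(db_dir: str, files: Sequence[str],
                      loader: str = "phased") -> List[str]:
    """Return the argument list for an offline TDB2 bulk load."""
    if not files:
        raise ValueError("at least one RDF file is required")
    # --loc names the TDB2 database directory; the data files follow it.
    return ["tdb2.tdbloader", f"--loader={loader}", "--loc", db_dir, *files]


def run_bulk_load(db_dir: str, files: Sequence[str]) -> None:
    """Run the loader; assumes tdb2.tdbloader is on PATH and Fuseki is stopped."""
    subprocess.run(tdbloader_command(db_dir, files), check=True)
```

For example, run_bulk_load("/fuseki/databases/ds", ["data.nt"]) would load data.nt into that (assumed) database directory before Fuseki is started.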
>
> >> Are there any other methods I should be considering (other than SPARQL
> >> INSERT)?
>
> Those are all the data loading methods.
>
> >> I'll also be running GeoSPARQL Jena for some instances, and needing to
> >> spatially index data. I think this will necessitate using tdb2.tdbloader
> >> and generating the spatial index 'offline' before starting Jena/Fuseki -
> >> or are there other ways?
> >
> > At least when you have the GeoSPARQL layer enabled in your Fuseki
> > assembler config, the index should be computed on the first start of
> > Fuseki just once and serialized at the configured destination. Only the
> > text index has to be generated offline before
> >
> >>
> >> Thanks
> >> David Habgood
> >>
>
