Re: TDB2 bulk loader - multiple files into different graph per file

Andy Seaborne Fri, 26 Aug 2022 13:30:52 -0700



On 26/08/2022 19:50, Dan Brickley wrote:

On Fri, 26 Aug 2022 at 16:27, Andy Seaborne <a...@apache.org> wrote:



On 26/08/2022 15:03, Lorenz Buehmann wrote:

I'm asking because I thought the initial loading is way faster than
iterating over multiple (graph, file) pairs and running the TDB2 loader
for each pair?


Yes. It is faster when loading from empty in a single run of a loader.

The loaders do some straight-to-index work which makes proper
transactions impossible, and so if a load has a parse error, a bypass of
transactions would, at best, break the database with half a load, or, at
worst, break the database.



Is it possible to load into new and dedicated named graphs so that such
partial loads could be easily cleaned up / reverted? Or the corruption is
deeper in the underlying data structures (index etc.)?


What sort of errors are you thinking of?

Loaders are one step of the pipeline from getting data from some 3rd party and into the database. Their role is to get data in as fast as possible within the hardware constraints.

A syntax error will be detected by the parser, and when the parser aborts the whole load aborts. Bulk loading is multiphase - load triples to get a node table, the primary index (SPO, GSPO), then build the other indexes. It is faster this way - and can have parallelism. Several loaders have various degrees of parallelism.

If it aborts, there is, at best, a partial SPO table, no other indexes. The rest of the system assumes a valid database.
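
To make the failure mode concrete, here is a toy sketch in Python. It is not how the TDB2 loaders are implemented; the names `parse` and `bulk_load`, the whitespace-separated "triple" format, and the in-memory dict standing in for the database are all made up for illustration. It only shows the shape of the problem: phase 1 writes straight into the node table and primary index, and the other indexes are built afterwards.

```python
# Toy model of a multiphase, non-transactional bulk load.
# Hypothetical names and data layout; not Jena's actual implementation.

def parse(lines):
    """Yield (s, p, o) tuples; raise ValueError on a malformed line."""
    for n, line in enumerate(lines, 1):
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {n}: expected 3 terms, got {len(parts)}")
        yield tuple(parts)

def bulk_load(db, lines):
    """Load triples into db in place, with no transaction around it."""
    nodes = db.setdefault("nodes", {})  # node table: term -> integer id
    spo = db.setdefault("SPO", [])      # primary index, as id triples

    # Phase 1: parse, filling the node table and the primary index.
    for triple in parse(lines):
        spo.append(tuple(nodes.setdefault(t, len(nodes)) for t in triple))
    spo.sort()

    # Phase 2: derive the other indexes from the primary one.
    db["POS"] = sorted((p, o, s) for s, p, o in spo)
    db["OSP"] = sorted((o, s, p) for s, p, o in spo)
```

If `parse` raises part-way through, `db` is left with a partial `SPO` list and no `POS` or `OSP` entries at all, which is the state described above.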

Syntax errors should be caught by checking first with 'riot' if you can't trust the source.
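
A pre-check along these lines could be scripted. The sketch below is only an illustration: the helper names are invented, and it assumes that `riot` is on the PATH and that `riot --validate` exits with a non-zero status when a file fails to parse.

```python
import shutil
import subprocess

def riot_validate_cmd(path, riot="riot"):
    """Build the command line for a parse-only check of one file."""
    return [riot, "--validate", path]

def all_files_parse(paths, riot="riot"):
    """Return True only if riot exits with status 0 for every file.

    Assumption: a non-zero exit status means a parse error was reported.
    """
    if shutil.which(riot) is None:
        raise FileNotFoundError(f"{riot!r} not found on PATH")
    return all(
        subprocess.run(riot_validate_cmd(p, riot)).returncode == 0
        for p in paths
    )
```

The bulk loader would then be started only when `all_files_parse(files)` returns True.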

The single-threaded loaders are transactional and will abort the load transaction. No data is loaded, and the database is in the same state as when the load started. They also work on already-existing databases.

As for schema errors: schema checks (SHACL, ShEx) work on valid RDF, and all loaders will work. The loaders "only" need syntactically valid RDF.


Schema fixup is later.

        Andy

Dan

         Andy
