Well, so what is the best way to do this? I have tried several approaches.
Currently I load the default model into memory from the dataset with the
getDefaultModel method, and each time I append triples to another model
created with createDefaultModel(). Once that model holds 200,000 triples, I
merge it into the dataset's default model with the add() function. But I
don't think that is optimal. Another approach I tried was loading models
directly into the Fuseki server over a remote connection, building models
of 200M size and loading them, 200M triples at a time, into the server. But
that causes a "GC overhead limit exceeded" error. So, what is the best way
to perform this?
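For what it's worth, the batch-and-flush pattern described above can be sketched in plain Java. The flush() below is a stub standing in for the real Jena call (e.g. committing the buffered model into the dataset inside a single write transaction); the class and method names are illustrative, not Jena API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batch-and-flush loader: buffers items in memory and flushes
// them in fixed-size batches, so each flush can be one write transaction
// instead of one transaction per triple. "String" stands in for a triple.
public class BatchLoader {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    public BatchLoader(int batchSize) { this.batchSize = batchSize; }

    public void add(String triple) {
        buffer.add(triple);
        if (buffer.size() >= batchSize)
            flush();
    }

    // Stub: in a real loader this would add the buffered triples to the
    // dataset's default model inside one write transaction.
    private void flush() {
        flushes++;          // count batches committed
        buffer.clear();     // release the buffer so heap use stays bounded
    }

    public void close() {   // flush any trailing partial batch
        if (!buffer.isEmpty())
            flush();
    }

    public int flushCount() { return flushes; }

    public static void main(String[] args) {
        BatchLoader loader = new BatchLoader(200_000);
        for (int i = 0; i < 450_000; i++)
            loader.add("triple-" + i);
        loader.close();
        System.out.println(loader.flushCount()); // 2 full batches + 1 partial
    }
}
```

Clearing the buffer after each flush is what keeps heap use bounded; holding all batches in memory at once is what tends to trigger the "GC overhead limit exceeded" error.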

2018-03-21 11:22 GMT+01:00 Andy Seaborne <[email protected]>:

> Bulkloading (TDB1) is for working from an empty dataset.  The tricks it
> uses do not work when there is already data in the dataset.  For TDB1, one
> of the bulkloaders simply loads triples/quads; the other refuses to load.
>
> For TDB2, which has no limits on the size of transactions, a batch size of
> 20K, or even 200M, should work. The larger the batch size, the more the
> transaction overheads are amortized.
>
>     Andy
>
>
> On 19/03/18 15:50, Davide wrote:
>
>> I have about 20000 triples to load each time. I load the data into models
>> with the Jena API and write the data to a StreamWriter. When the buffer
>> reaches a certain size, I load the data into the dataset with the
>> bulkloader. Now I'm trying TDB2 with the Loader.Bulkload method to see if
>> there are improvements, but I have a problem. I retrieve the dataset with
>> TDB2Factory.connectDataset(location) and pass it to the Bulkload
>> function, but I get a ClassCastException at runtime:
>> "org.apache.jena.tdb2.store.DatasetGraphSwitchable cannot be cast to
>> org.apache.jena.tdb2.store.DatasetGraphTDB". How can I resolve this?
>>
>>
