Hi, now I'm trying to load 200M triples. By "I don't think it's optimal" I meant that I don't know whether it's the optimal approach. Anyway, now I'm trying TDB2 with the Fuseki server.
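For the Fuseki route, this is a minimal sketch of batched remote loading with RDFConnection instead of building one huge client-side model; the endpoint URL, input file name, and batch size are assumptions, and the right batch size depends on client memory:

```java
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDFBase;

public class BatchedFusekiLoad {
    static final int BATCH = 200_000;   // tuning assumption, not a fixed rule

    public static void main(String[] args) {
        // "http://localhost:3030/ds" and "data.nt" are assumed names.
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
            Model batch = ModelFactory.createDefaultModel();
            // Stream-parse the input; ship a batch to the server whenever
            // the client-side model reaches BATCH statements.
            RDFDataMgr.parse(new StreamRDFBase() {
                @Override public void triple(Triple t) {
                    batch.getGraph().add(t);
                    if (batch.size() >= BATCH) {
                        conn.load(batch);    // add the batch to the default graph
                        batch.removeAll();   // free client memory before the next batch
                    }
                }
                @Override public void finish() {
                    if (!batch.isEmpty())
                        conn.load(batch);    // send the final partial batch
                }
            }, "data.nt");
        }
    }
}
```

Keeping only one batch in memory at a time is what avoids the "GC overhead limit exceeded" error that a single 200M-triple model causes.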
Thanks so much

On 23 Mar 2018 at 2:57 PM, "Andy Seaborne" <[email protected]> wrote:

On 23/03/18 10:32, Davide wrote:
> Well, so what is the best way to do this? I'm trying a lot of ways. Now I
> load the default model in memory from the dataset with the getDefaultModel
> method, and each time I append triples to another model created with
> createDefaultModel(). When I have 200,000 triples in that model, I load it
> into the dataset by adding it to the default model with the add() function.

20,000 or 200,000? Previously it was 20K.

> But I don't think it is optimal.

Could you expand on that please? What load rates, on what hardware, do you
get for TDB1 and for TDB2?

> Another way that I was trying is to load models directly into the Fuseki
> server, over a remote connection, creating models of 200M size and loading
> them, 200M triples each, into the server. But that causes a "GC overhead
> limit exceeded" error. So what is the best way to perform this?

TDB2 can do that. For TDB1, you'll have to break it into chunks depending
on the memory available.

    Andy

> 2018-03-21 11:22 GMT+01:00 Andy Seaborne <[email protected]>:
>
>> Bulkloading (TDB1) is for working from an empty dataset. The tricks it
>> uses do not work when there is already data in the dataset. For TDB1, one
>> of the bulkloaders simply loads triples/quads; the other refuses to load.
>>
>> For TDB2, which has no limits on the size of transactions, a batch size
>> of 20K, or even 200M, should work. The larger the batch size, the more
>> the transaction overheads are amortized.
>>
>> Andy
>>
>> On 19/03/18 15:50, Davide wrote:
>>
>>> I've about 20,000 triples to load each time. I load data into models
>>> with the Jena API, and write the data inside a StreamWriter. When the
>>> buffer reaches a certain size, I load the data into the dataset with the
>>> Bulkloader. But now I'm trying to use TDB2 with the Loader.Bulkload
>>> method to see if there are improvements, and I have a problem.
>>> I retrieve the dataset with
>>> TDB2Factory.connectDataset(location) and pass it to the Bulkload
>>> function. But I get a ClassCastException at runtime:
>>> "org.apache.jena.tdb2.store.DatasetGraphSwitchable cannot be cast to
>>> org.apache.jena.tdb2.store.DatasetGraphTDB". How can I resolve this?
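On the ClassCastException: TDB2Factory.connectDataset returns a DatasetGraphSwitchable, a container that wraps the concrete DatasetGraphTDB (so the database can be compacted and switched underneath). If a loader really requires the concrete DatasetGraphTDB, one possible approach (a sketch, assuming TDBInternal.getDatasetGraphTDB behaves as described and that "DB2" is your database directory) is to unwrap the container rather than cast it:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.tdb2.store.DatasetGraphTDB;
import org.apache.jena.tdb2.sys.TDBInternal;

public class UnwrapTDB2 {
    public static void main(String[] args) {
        // "DB2" is an assumed database directory.
        Dataset ds = TDB2Factory.connectDataset("DB2");
        DatasetGraph dsg = ds.asDatasetGraph();   // actually a DatasetGraphSwitchable

        // Unwrap the switchable container to reach the underlying storage
        // dataset, instead of casting (which throws the ClassCastException).
        DatasetGraphTDB dsgtdb = TDBInternal.getDatasetGraphTDB(dsg);

        // ... pass dsgtdb to the loader that expects a DatasetGraphTDB ...
    }
}
```

Note that TDBInternal is, as the name says, internal API, so it may change between Jena releases; a loader entry point that accepts the switchable dataset directly would be preferable if one exists.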
