Hi, now I'm trying to load 200M triples. By "I don't think it's optimal" I meant that I don't know whether it's the optimal approach. Anyway, now I'm trying TDB2 with the Fuseki server.
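For the Fuseki route, this is a minimal sketch of batched remote loading with RDFConnection instead of building one huge client-side model; the endpoint URL, input file name, and batch size are assumptions, and the right batch size depends on client memory:

```java
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDFBase;

public class BatchedFusekiLoad {
    static final int BATCH = 200_000;   // tuning assumption, not a fixed rule

    public static void main(String[] args) {
        // "http://localhost:3030/ds" and "data.nt" are assumed names.
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
            Model batch = ModelFactory.createDefaultModel();
            // Stream-parse the input; ship a batch to the server whenever
            // the client-side model reaches BATCH statements.
            RDFDataMgr.parse(new StreamRDFBase() {
                @Override public void triple(Triple t) {
                    batch.getGraph().add(t);
                    if (batch.size() >= BATCH) {
                        conn.load(batch);    // add the batch to the default graph
                        batch.removeAll();   // free client memory before the next batch
                    }
                }
                @Override public void finish() {
                    if (!batch.isEmpty())
                        conn.load(batch);    // send the final partial batch
                }
            }, "data.nt");
        }
    }
}
```

Keeping only one batch in memory at a time is what avoids the "GC overhead limit exceeded" error that a single 200M-triple model causes.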
Thanks so much

On 23 Mar 2018 at 2:57 PM, "Andy Seaborne" <[email protected]> wrote:

On 23/03/18 10:32, Davide wrote:
> Well, so what is the best way to do this? I'm trying a lot of ways. Now I
> load the default model in memory from the dataset with the getDefaultModel
> method, and each time I append triples to another model created with
> createDefaultModel(). When I have 200,000 triples in that model, I load it
> into the dataset by adding it to the default model with the add() function.

20,000 or 200,000? Previously it was 20K.

> But I don't think it is optimal.

Could you expand on that please? What load rates, on what hardware, do you
get for TDB1 and for TDB2?

> Another way that I was trying is to load models directly into the Fuseki
> server, over a remote connection, creating models of 200M size and loading
> them, 200M triples each, into the server. But that causes a "GC overhead
> limit exceeded" error. So what is the best way to perform this?

TDB2 can do that. For TDB1, you'll have to break it into chunks depending
on the memory available.

    Andy

> 2018-03-21 11:22 GMT+01:00 Andy Seaborne <[email protected]>:
>
>> Bulkloading (TDB1) is for working from an empty dataset. The tricks it
>> uses do not work when there is already data in the dataset. For TDB1, one
>> of the bulkloaders simply loads triples/quads; the other refuses to load.
>>
>> For TDB2, which has no limits on the size of transactions, a batch size
>> of 20K, or even 200M, should work. The larger the batch size, the more
>> the transaction overheads are amortized.
>>
>> Andy
>>
>> On 19/03/18 15:50, Davide wrote:
>>
>>> I've about 20,000 triples to load each time. I load data into models
>>> with the Jena API, and write the data inside a StreamWriter. When the
>>> buffer reaches a certain size, I load the data into the dataset with the
>>> Bulkloader. But now I'm trying to use TDB2 with the Loader.Bulkload
>>> method to see if there are improvements, and I have a problem.
>>> I retrieve the dataset with
>>> TDB2Factory.connectDataset(location) and pass it to the Bulkload
>>> function. But I get a ClassCastException at runtime:
>>> "org.apache.jena.tdb2.store.DatasetGraphSwitchable cannot be cast to
>>> org.apache.jena.tdb2.store.DatasetGraphTDB". How can I resolve this?
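On the ClassCastException: TDB2Factory.connectDataset returns a DatasetGraphSwitchable, a container that wraps the concrete DatasetGraphTDB (so the database can be compacted and switched underneath). If a loader really requires the concrete DatasetGraphTDB, one possible approach (a sketch, assuming TDBInternal.getDatasetGraphTDB behaves as described and that "DB2" is your database directory) is to unwrap the container rather than cast it:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.tdb2.store.DatasetGraphTDB;
import org.apache.jena.tdb2.sys.TDBInternal;

public class UnwrapTDB2 {
    public static void main(String[] args) {
        // "DB2" is an assumed database directory.
        Dataset ds = TDB2Factory.connectDataset("DB2");
        DatasetGraph dsg = ds.asDatasetGraph();   // actually a DatasetGraphSwitchable

        // Unwrap the switchable container to reach the underlying storage
        // dataset, instead of casting (which throws the ClassCastException).
        DatasetGraphTDB dsgtdb = TDBInternal.getDatasetGraphTDB(dsg);

        // ... pass dsgtdb to the loader that expects a DatasetGraphTDB ...
    }
}
```

Note that TDBInternal is, as the name says, internal API, so it may change between Jena releases; a loader entry point that accepts the switchable dataset directly would be preferable if one exists.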
