Re: Data backup and restore

2020-06-09 Thread Andy Seaborne
On 09/06/2020 15:25, Tim Flicker wrote: Hi Andy, Thanks for the response. The plan is to implement new endpoints server side to do backup and restore. The backup process will run in the same JVM as the server and use the same logic that is implemented in

Re: Data backup and restore

2020-06-09 Thread Tim Flicker
Hi Andy, Thanks for the response. The plan is to implement new endpoints server side to do backup and restore. The backup process will run in the same JVM as the server and use the same logic that is implemented in org.apache.jena.tdb.TDBBackup.backup(...). My only concern is if this

Re: Literal, variable, resource in one object?

2020-06-09 Thread Andy Seaborne
On 09/06/2020 12:23, Steve Vestal wrote: I'm curious if there is an elegant, Jena-style way to do the following (which can be done pragmatically in many ways). I'd like to have a single object that can be any of a literal, a variable, or a resource in a specific model. RDFNode can be either

RE: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Hoffart, Johannes
Hi Andy, Thanks for the helpful pointers from you and others. I will change the heap settings to see if this at least allows the process to finish. For reference, the machine has 128GB of main memory and a regular HDD attached. I also changed the logging settings to see the progress (would be

Literal, variable, resource in one object?

2020-06-09 Thread Steve Vestal
I'm curious if there is an elegant, Jena-style way to do the following (which can be done pragmatically in many ways). I'd like to have a single object that can be any of a literal, a variable, or a resource in a specific model. RDFNode can be either a literal or a resource in a specific model.

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Wolfgang Fahl
Marco, thank you for sharing your results. Could you please try to make the sample size 10 and 100 times bigger for the discussion we currently have at hand? Getting to a billion triples has not been a problem for the WikiData import. From 1-10 billion triples it gets tougher and for >10 billion

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Marco Neumann
Same here, I get the best performance on single iron with SSD and fast DDRAM. The datacenters in the cloud tend to be very selective and you can only get the fast dedicated hardware in a few locations in the cloud. http://www.lotico.com/index.php/JENA_Loader_Benchmarks In addition, keep in mind

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Andy Seaborne
It may be that SSD is the important factor. 1/ From a while ago, on truthy: https://lists.apache.org/thread.html/70dde8e3d99ce3d69de613b5013c3f4c583d96161dec494ece49a412%40%3Cusers.jena.apache.org%3E before tdb2.tdbloader was a thing. 2/ I did some (not open) testing on a mere 800M and

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Wolfgang Fahl
Hi Johannes, thank you for bringing the issue to this mailing list again. At https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits there is a question describing the issue and at http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Test_with_Apache_Jena a