On Thu, Jun 11, 2020 at 11:13 AM, David Causse <[email protected]> wrote:
>
> Hi,
>
> did you "munge"[0] the dumps prior to loading them?
> As a comparison, loading the munged dump on a WMF production machine
> (128 GB RAM, 32 cores, SSD drives) takes around 8 days.
>
> 0: https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_preparation
>
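For context, the munge step rewrites the dump and splits it into many smaller
chunk files that are then loaded one by one. A rough sketch of the usual
invocation (the paths here are placeholders of my own; check the script itself,
linked just below, and the Wikitech page above for the exact flags):

    ./munge.sh -f data/latest-all.ttl.gz -d data/split

where -f points at the dump to read and -d at the directory where the numbered
chunk files get written.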
munge.sh can be found at
https://github.com/wikimedia/wikidata-query-deploy/blob/master/munge.sh

The source is available at
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/Munge.java

> On Thu, Jun 11, 2020 at 12:37 AM Denny Vrandečić <[email protected]> wrote:
>>
>> Did you see this?
>>
>> https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/

I read "Total time: ~5.5 days"; that is really impressive. My latest attempt at
loading latest-lexemes.nt (10 GB uncompressed) into my triple store took me one
day and requires 10 GB of disk space. I have made progress, but I am still far
from being able to compete with Blazegraph on that front. I have some ideas for
optimizations; munge.sh will help.

>> On Wed, Jun 10, 2020, 12:51 Leandro Tabares Martín
>> <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> I'm loading the whole Wikidata dataset into Blazegraph using a
>>> high-performance computer. I gave 120 GB of RAM and 3 processing cores to
>>> the job. After almost 24 hours of loading, the "wikidata.jnl" file is only
>>> 28 GB in size. Initially the process was fast, but the loading speed has
>>> decreased as the file has grown. I notice that only 14 GB of RAM are being
>>> used. I have already implemented the recommendations given in
>>> https://github.com/blazegraph/database/wiki/IOOptimization
>>> Do you have any other recommendations to increase the loading speed?
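As for the loading-speed question at the bottom: the addshore post linked above
loads the munged chunks with the loadData.sh script that ships alongside
munge.sh, rather than importing the whole dump in one go. Roughly (namespace
name and path below are placeholders; verify against the script itself):

    ./loadData.sh -n wdq -d /path/to/data/split

where -n is the Blazegraph namespace the data is loaded into and -d is the
directory containing the munged chunk files.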
