On Thu, Jun 11, 2020 at 11:13 AM, David Causse <[email protected]> wrote:
>
> Hi,
>
> Did you "munge"[0] the dumps prior to loading them?
> As a comparison, loading the munged dump on a WMF production machine (128 GB
> RAM, 32 cores, SSD drives) takes around 8 days.
>
> 0: https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_preparation
>

munge.sh can be found at
https://github.com/wikimedia/wikidata-query-deploy/blob/master/munge.sh

The source is available at
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/Munge.java
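
For reference, the data-preparation page above shows an invocation along
these lines (the paths are just placeholders and the exact flags may differ
between versions, so treat this as a sketch rather than the canonical
command line):

  ./munge.sh -f data/latest-all.ttl.gz -d data/split

Here -f is the input dump and -d the directory where the munged, chunked
files are written; the script also has options to restrict label languages
and skip sitelinks (check munge.sh itself for the exact flags), which shrink
the data further.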

>
> On Thu, Jun 11, 2020 at 12:37 AM Denny Vrandečić <[email protected]> wrote:
>>
>> Did you see this?
>>
>> https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/

I read "Total time: ~5.5 days"; that is really impressive. My latest
attempt at loading latest-lexemes.nt (10 GB uncompressed) into my
triple store took 1 day, and the resulting store requires 10 GB of disk
space. I have made progress, but I am still far from being able to
compete with Blazegraph in that regard. I have an idea about some
optimizations to make; munge.sh will help.
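
For Blazegraph itself, the munged chunks are then fed to a running server
with the loadData.sh script that ships alongside munge.sh; if I read the
wikitech page correctly it is invoked roughly like this (the namespace name
and path are illustrative, so double-check against the documentation):

  ./loadData.sh -n wdq -d data/split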

>>
>> On Wed, Jun 10, 2020, 12:51 Leandro Tabares Martín 
>> <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> I'm loading the whole Wikidata dataset into Blazegraph using a high
>>> performance computer. I gave 120 GB of RAM and 3 processing cores to the
>>> job. After almost 24 hours of loading, the "wikidata.jnl" file is only
>>> 28 GB in size. Initially the process was fast, but as the file grew the
>>> loading speed decreased. I noticed that only 14 GB of RAM are being used.
>>> I already implemented the recommendations given in the Blazegraph
>>> IOOptimization wiki page
>>> (https://github.com/blazegraph/database/wiki/IOOptimization). Do you have
>>> some other recommendations to increase the loading speed?
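
Regarding the 14 GB observation above: Blazegraph will not use more heap
than the JVM is given, so the cap usually comes from the -Xmx value in
whatever startup script you use; for the standalone jar that would be
something like

  java -server -Xmx16g -jar blazegraph.jar

and the common advice is to keep the heap moderate and leave the rest of
the RAM to the operating system's page cache for the .jnl file. The exact
script and flags depend on your setup, so take this only as a sketch.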

_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
