Hi Adam,
the server specs you posted are not the deciding factor. What disks did you use?

They should be SSDs, or at least 15k RPM SAS drives, to make loading faster.

Virtuoso can parse with multiple threads if you split the files before
loading, but disk speed is still the bottleneck.
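
A rough Python sketch of such a split (the file name and chunk size are
placeholders, and cutting at statement-final lines is a heuristic, so
treat this as a starting point only):

#!/usr/bin/env python3
# Split a large Turtle dump into chunks that can be bulk loaded in
# parallel. The @prefix header is copied into every chunk so each
# file parses on its own, and chunks are only cut at lines ending a
# statement (".") so multi-line subjects stay intact.

SRC = "wikidata-20190610-all-BETA.ttl"  # placeholder path
LINES_PER_CHUNK = 10_000_000            # tune for your hardware

def split_turtle(src, lines_per_chunk):
    header, chunk_no, out, count = [], 0, None, 0
    with open(src, encoding="utf-8") as f:
        for line in f:
            if line.startswith("@prefix"):
                header.append(line)
                continue
            if out is None:
                out = open("chunk_%05d.ttl" % chunk_no, "w",
                           encoding="utf-8")
                out.writelines(header)
            out.write(line)
            count += 1
            if count >= lines_per_chunk and line.rstrip().endswith("."):
                out.close()
                out, count, chunk_no = None, 0, chunk_no + 1
    if out is not None:
        out.close()

if __name__ == "__main__":
    split_turtle(SRC, LINES_PER_CHUNK)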

Sebastian 

On June 20, 2019 2:37:16 PM GMT+02:00, Adam Sanchez <[email protected]> 
wrote:
>For your information
>
>a) It took 10.2 days to load the Wikidata RDF dump
>(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
>The bigdata.jnl file turned out to be 1.3T.
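>
>For reference, a minimal sketch of this kind of bulk load using
>Blazegraph's bundled DataLoader, driven from Python; the jar path,
>heap size and properties file are illustrative, not the exact setup
>used:
>
>import subprocess
>
># com.bigdata.rdf.store.DataLoader is Blazegraph's bulk loader; it
># reads the journal (bigdata.jnl) location from the properties file.
>subprocess.run([
>    "java", "-Xmx16g", "-cp", "blazegraph.jar",
>    "com.bigdata.rdf.store.DataLoader",
>    "journal.properties",                  # placeholder config
>    "wikidata-20190513-all-BETA.ttl",
>], check=True)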
>
>Server technical features
>
>Architecture:          x86_64
>CPU op-mode(s):        32-bit, 64-bit
>Byte Order:            Little Endian
>CPU(s):                16
>On-line CPU(s) list:   0-15
>Thread(s) per core:    2
>Core(s) per socket:    8
>Socket(s):             1
>NUMA node(s):          1
>Vendor ID:             GenuineIntel
>CPU family:            6
>Model:                 79
>Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>Stepping:              1
>CPU MHz:               1200.476
>CPU max MHz:           3000.0000
>CPU min MHz:           1200.0000
>BogoMIPS:              4197.65
>Virtualization:        VT-x
>L1d cache:             32K
>L1i cache:             32K
>L2 cache:              256K
>L3 cache:              20480K
>RAM: 128G
>
>b) It took 43 hours to load the Wikidata RDF dump
>(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
>07.20.3230.
>I had to patch Virtuoso because it threw the following error each
>time I loaded the RDF data:
>
>09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
>42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
>RDF type and a non-geometry content
>
>The virtuoso.db file turned out to be 340G.
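>
>For reference, a sketch of driving Virtuoso's documented bulk loader
>(ld_dir / rdf_loader_run) from Python, assuming the dump was split
>into chunks under /data/chunks; paths, credentials and the worker
>count are illustrative:
>
>import subprocess
>
>ISQL = ["isql", "1111", "dba", "dba"]  # placeholder credentials
>
>def isql_exec(sql):
>    subprocess.run(ISQL + ["exec=" + sql], check=True)
>
># register every chunk with Virtuoso's load queue
>isql_exec("ld_dir('/data/chunks', '*.ttl', 'http://www.wikidata.org/');")
>
># several rdf_loader_run() sessions drain the queue in parallel
>workers = [subprocess.Popen(ISQL + ["exec=rdf_loader_run();"])
>           for _ in range(6)]
>for w in workers:
>    w.wait()
>
>isql_exec("checkpoint;")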
>
>Server technical features
>
>Architecture:          x86_64
>CPU op-mode(s):        32-bit, 64-bit
>Byte Order:            Little Endian
>CPU(s):                12
>On-line CPU(s) list:   0-11
>Thread(s) per core:    2
>Core(s) per socket:    6
>Socket(s):             1
>NUMA node(s):          1
>Vendor ID:             GenuineIntel
>CPU family:            6
>Model:                 63
>Model name:            Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
>Stepping:              2
>CPU MHz:               1199.920
>CPU max MHz:           3800.0000
>CPU min MHz:           1200.0000
>BogoMIPS:              6984.39
>Virtualization:        VT-x
>L1d cache:             32K
>L1i cache:             32K
>L2 cache:              256K
>L3 cache:              15360K
>NUMA node0 CPU(s):     0-11
>RAM: 128G
>
>Best,
>
>
>On Tue, Jun 4, 2019 at 4:37 PM, Vi to <[email protected]> wrote:
>>
>> V4 has 8 cores instead of 6.
>>
>> But well, it's a server-grade config on purpose!
>>
>> Vito
>>
>> On Tue, Jun 4, 2019 at 4:32 PM Guillaume Lederrey
>> <[email protected]> wrote:
>>>
>>> On Tue, Jun 4, 2019 at 3:14 PM Vi to <[email protected]> wrote:
>>> >
>>> > AFAIR it's a dual Xeon E5-2620 v3.
>>> > With modern CPUs, frequency is not that significant.
>>>
>>> Our latest batch of servers is: Intel(R) Xeon(R) CPU E5-2620 v4 @
>>> 2.10GHz (so v4 instead of v3, but the difference is probably
>>> minimal).
>>>
>>> > Vito
>>> >
>>> > On Tue, Jun 4, 2019 at 1:00 PM Adam Sanchez
>>> > <[email protected]> wrote:
>>> >>
>>> >> Thanks Guillaume!
>>> >> One question more, what is the CPU frequency (GHz)?
>>> >>
>>> >> On Tue, Jun 4, 2019 at 12:25 PM, Guillaume Lederrey
>>> >> <[email protected]> wrote:
>>> >> >
>>> >> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez
>>> >> > <[email protected]> wrote:
>>> >> > >
>>> >> > > Hello,
>>> >> > >
>>> >> > > Does somebody know the minimal hardware requirements (disk
>>> >> > > size and RAM) for loading the Wikidata dump in Blazegraph?
>>> >> >
>>> >> > The actual hardware requirements will depend on your use case.
>>> >> > But for comparison, our production servers are:
>>> >> >
>>> >> > * 16 cores (hyper threaded, 32 threads)
>>> >> > * 128G RAM
>>> >> > * 1.5T of SSD storage
>>> >> >
>>> >> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is
>>> >> > > 379G.
>>> >> > > The bigdata.jnl file which stores all the triples data in
>>> >> > > Blazegraph is 478G but still growing.
>>> >> > > I had a 1T disk, but it is almost full now.
>>> >> >
>>> >> > The current size of our jnl file in production is ~670G.
>>> >> >
>>> >> > Hope that helps!
>>> >> >
>>> >> >     Guillaume
>>> >> >
>>> >> > > Thanks,
>>> >> > >
>>> >> > > Adam
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Guillaume Lederrey
>>> >> > Engineering Manager, Search Platform
>>> >> > Wikimedia Foundation
>>> >> > UTC+2 / CEST
>>> >> >
>>> >
>>>
>>>
>>>
>>> --
>>> Guillaume Lederrey
>>> Engineering Manager, Search Platform
>>> Wikimedia Foundation
>>> UTC+2 / CEST
>>>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.